Memory and Teaching an L2

Memory: What Every Language Teacher Should Know is an attempt to persuade teachers of an L2 that the best way to teach is by presenting and then practicing carefully selected bits of the language using Delayed Dictation, Drill and Skill, Sentence Stealer, Disappearing Text, Sentence Builders, Shadow reading, Blind Mimes, Rhyming Pairs, Zero Prep Retrieval Starters, Match Up, Spot the Nonsense, Spot the Error, Sentence Puzzles, the Keyword Method, Sentence Chaos, and the MARS-EARS sequencing of lessons.

The book gives an inadequate and incomplete discussion of how memory affects second language learning; it over-emphasises the importance of explicit teaching and is likely to encourage teachers to believe that the Presentation-Practice-Production (PPP) approach is supported by evidence from cognitive science and from research into SLA. In this review, I begin with a summary of the book and then discuss the authors’ underlying view of language and second language learning. The Notes in brackets are my annotations.

Chapter 1 begins by describing some key terms.

Implicit learning is ‘learning without awareness’. It’s how children pick up their first language(s), with no intention to learn. It’s unconscious. It’s also the most powerful process in second language acquisition. In contrast, explicit learning happens consciously, with intention.

The result of implicit learning is implicit or procedural knowledge – knowing how to do something without necessarily being able to explain it. In contrast, the result of explicit learning is explicit or declarative knowledge – being able to describe what you know, for example being able to say that we usually form the plural in English by adding an ‘s’.

Explicit knowledge develops differently and, although it’s a hotly debated issue, some researchers (for example DeKeyser, 1995), and most teachers, believe that with enough practice, explicit knowledge can become implicit. (Note: Most SLA researchers reject this view.)

The focus in the book is partly on explicit (conscious) learning.

So, what is memory?

The brain records, very quickly, all the information it is bombarded with in a sensory register. This is a sort of ultra-short term memory for each of the five senses. This initially encoded information is sub-conscious. But when we pay conscious attention to something, it is transferred from the sensory register into memory.

Working (or short term) memory holds the information in mind; unwanted information is filtered out so as to allow us to focus on what is worth processing. This allows us to process information more efficiently.

Long-term memory, on the other hand, is like a huge warehouse where you can access all the things you know about the world, whether it be the capital of Spain, the price of a litre of milk or the meaning of the fact that French adjectives agree with nouns.

Memory can be seen as a large, leaky bottle with a very narrow neck and huge body. Working memory is the neck which only a small amount of liquid (information) can enter at a time (Baddeley, 2000). If you pour liquid into it too fast much will be lost. The implications of this are clear enough. We need to be really careful to limit the amount of new language presented and practised, especially with beginners, in order for it to pass through the neck of the bottle. (Note: It doesn’t take long for the authors to make their view of language teaching clear: teaching consists of presenting and practicing language. Those who reject basing teaching on implementing a synthetic syllabus reject this view. What’s more, the (false) implication, right from the start, is that language acquisition depends crucially on input passing through working memory.)

Working memory is defined as:

“those mechanisms or processes that are involved in the control, regulation and active maintenance of task-relevant information in the service of complex cognition, including novel as well as familiar, skilled tasks” (Miyake and Shah, 1999, p.450).

We only use working memory when we process new information or carry out tasks consciously. When we perform routine tasks or process familiar information, we use subconscious processes which by-pass working memory. Long-term memory plays an integral role in working memory. Think of a complex, dynamic two-way relationship between working and long-term memory. (Note: the authors acknowledge here that not all language learning relies on the processing of information in working memory. However, the book as a whole gives the strong impression, in my opinion, that SLA relies on the processing of information in working memory – of getting input through the very narrow neck of the bottle. Such an argument is plain wrong.)

The term automaticity comes from a skill acquisition model of learning proposed by John Anderson. The idea is that, with practice, knowledge can become automatically retrievable without having to think about it (Anderson, 1982). This means that, when we perform a complex task, the brain can bypass working memory, calling on automatised knowledge from long-term memory. So the main point is this: working memory capacity is very limited and language learners, especially beginners, don’t have much long-term memory knowledge to help them deal with incoming language. Work focused on automaticity can speed up retrieval and lighten the load on working memory. (Note: Anderson’s ACT theory is almost impossible to test empirically and is rejected by most SLA research scholars.)

Every operation the brain performs when decoding a message takes place in working memory. Take vocabulary learning: any rehearsal we do when trying to commit vocabulary to long-term memory (for example, repeating aloud) is performed in working memory, which temporarily holds that information for as long as we rehearse it. In speaking and writing, all the operations needed to put ideas into words, all the while monitoring for accuracy, happen in working memory too.

As new information is noticed, it interacts with all sorts of information held in long-term memory (phonological, lexical, grammatical and semantic). The new information is processed, combined with pre-known information and creates new memories. It’s a highly complex, fast process, in constant flux.

The limited capacity of the Phonological Loop means that a novice second language learner can hold fewer words in working memory than in their first language as they pronounce the words more slowly. The more rapidly a second language speaker can utter a word or phrase, the less space it takes in working memory. So the more you know, and the more fluently you can speak, the easier the job becomes for working memory. This means that teachers need to carefully control the amount and difficulty level of language that students hear and read. (Note: it means no such thing. Not everything we learn about the L2, by a very long stretch, passes through working memory.)

For memory, students need to hear lots of comprehensible target language. The more you know and can say fluently, the less space is taken up in working memory. Particular attention is needed to deal with discrepancies between the phonotactics of different languages and to ensure learners get regular practice hearing and using utterances beyond the single word level.

Chapter 5 discusses Visual Space Memory, and I’ll leave it out, but note that the authors fail to discuss the advantages of multi-modal texts.

Chapter 6 is on Cognitive Load Theory (CLT). CLT is based on the limitations of working memory. Given these limitations, the argument is that new information presented to working memory can soon overload it. Sweller’s model of cognitive load claims that there are 3 types of load involved when processing new information: intrinsic, extraneous and germane (Sweller, 1988).

Intrinsic load = how many new, interacting things a student has to do simultaneously in order to complete a task.

Extraneous load = demands placed on students by the teacher through the way they choose resources, present information or design teaching activities.

Germane load = the load needed to build knowledge schemas in long-term memory and increase learning. In the case of language teaching, it refers to the process of linking new information with information already stored in long-term memory in order to create new schemas (for example, chunks of language). So when learning a new verb tense, you may call on knowledge of an existing tense. Germane load can be seen as a measure of the extra load imposed by the teaching activity which supports learning. It is where metacognitive strategies come into play; where students are aware of their thinking processes and able to adapt new information accordingly. In sum, you could say it’s where the real learning happens! (Note: you could, but it isn’t. See Discussion, below.)

There follow lots of sections which are, not surprisingly, to do with explicit teaching. Examples are:

  • the information store principle;
  • the borrowing and reorganising principle;
  • the randomness-as-genesis principle;
  • the narrow limits of change principle;
  • the environmental organizing and linking principle.

The chapter goes on to look at “Factors affecting cognitive load”, and there are a lot of them, including the “Worked Example Effect”, which is well-discussed in the literature. The way the authors use it is, I think, a good indication of their whole approach. Just as in maths a teacher might work through a problem on the board to show how it is solved, they say, a language teacher can work through how to solve a translation by applying grammatical knowledge. In a two-way process, suggestions can be sought, questions asked, prompts provided and explanations offered. Sentence builders (the bright star in the Smith and Conti teaching almanac, also known as substitution tables) can be used as part of the process. Other effects discussed include

  • the modality effect,
  • the transient information effect,
  • the temporal congruity effect,
  • the segmentation effect,
  • the pre-training effect,
  • the variability effect.

As if this weren’t enough, the next chapter is devoted to considering in more detail teaching hints for managing cognitive load when teaching students new information that is processed in working memory. Sections include:

  • Building phonological memory;
  • The skilled use of questioning;
  • Working step by step;
  • Preventing divided attention;
  • The role of comprehensible input;
  • Chunking the input, including the use of sentence builder frames and knowledge organisers;
  • Learnability and processability;
  • Preventing inattentional blindness;
  • Metacognitive strategies;
  • Managing cognitive load in Task-Based Language Teaching;
  • Cognitive fatigue.

Finally, Rosenshine’s Principles of Instruction are applied to language learning. (Note: These give a good indication of the authors’ inclination towards imparting declarative knowledge, as if they were teaching any other subject in a school curriculum, as if learning an L2 were the same as learning Geography, for example. It’s all about declarative knowledge.)  

Barak Rosenshine’s Principles of Instruction

1. Begin a lesson with a short review of previous learning.
2. Present new material in small steps with student practice after each step.
3. Ask a large number of questions and check the responses of all students.
4. Provide models.
5. Guide student practice.
6. Check for student understanding.
7. Obtain a high success rate.
8. Provide scaffolds for difficult tasks.
9. Require and monitor independent practice.
10. Engage students in weekly and monthly review.

Chapter 8 deals with long term memory.  

If we don’t pay attention to information, it is not consciously available, so it can’t enter working memory at all and cannot pass into long-term memory by that route. To reiterate, however, some of the information we don’t pay attention to may potentially pass directly into long-term memory through implicit (unconscious) learning. (Note: it is not that some of the information we don’t pay attention to “may potentially” pass directly into long-term memory, but rather that most of the process of becoming a competent user of an L2 consists of implicit learning. I will return to this vital point later.)

Schmidt’s Noticing Hypothesis (Schmidt, 1990; 2001) claims that attention is crucial for input to become intake. What is noticed becomes intake. Intake cannot happen without some level of awareness. It’s worth noting that there remains a good deal of debate about what ‘noticing’ actually means and that students seem to be able to pick up new language without appearing to have noticed it at all (implicit learning). As researcher Lourdes Ortega puts it, “the jury is still out on the question of whether learning can happen without attention” (Ortega, 2013, p. 96). (Note: Ortega acknowledges that this is not true. Schmidt admitted that learning can and does happen without attention. Nick Ellis, who the authors are fond of quoting when it suits them, says: “the bulk of language acquisition is implicit learning from usage. Most knowledge is tacit knowledge; most learning is implicit; the vast majority of our cognitive processing is unconscious” (Ellis, 2005).)

Even if you pay attention to something and notice it you may not end up storing it properly in memory. As far as language teaching is concerned, language has to be transferred from working memory to long-term memory. For this to happen a process of maintenance rehearsal is needed – reviewing material over and over again. (Note: this is a truism, based on a false assumption. Only if teaching (wrongly!) concentrates on declarative knowledge does what is taught (explicitly) have to be transferred from working memory to long term memory.)

The Schmidt Noticing Hypothesis claims we usually need to notice patterns in the language to internalise them. Research suggests that students can also pick up patterns implicitly. Students find it very hard to focus on the form and meaning of language at the same time. We cannot assume students will notice patterns unless we get students to look for them or point them out. Input can be manipulated to encourage students to notice patterns. On balance it seems that listening to music with lyrics while studying is distracting, but research supplies mixed messages on this subject. For memories to become more permanent, maintenance rehearsal is needed. The Ebbinghaus Forgetting Curve is a clear reminder of how quickly students can forget language. Regular distributed practice is usually needed for memories to stick. (Note: this comes close to distilling all that’s wrong with the whole book. It’s a classic example of covering all bases while insisting on an erroneous argument, namely that concentrating on the limitations of working memory is the key to successful second language learning.)

Chapter 9: Declarative and Procedural Knowledge

Nick Ellis (2017) points out that implicit and explicit learning are distinct processes; humans have separate implicit and explicit memory systems; there are different types of knowledge of and about language; these are stored in different areas of the brain. This ties in with what cognitive psychologists call declarative memory and procedural memory. Broadly speaking, explicit learning tends to produce declarative knowledge (‘knowing that’, for example, knowing the endings of a verb), while implicit learning tends to produce procedural knowledge (‘knowing how’ – being able to use those verb endings without having to think about it).

The relationship between explicit/declarative and implicit/procedural knowledge may not be simple. One question is whether, in second language learning, declarative knowledge can become procedural. In the literature about SLA, the term interface is used to denote the potential barrier between declarative and procedural learning. Researchers disagree about the extent to which the interface can be crossed, with most believing that it can, under certain conditions, for example, N. Ellis (2005) and Ullman (2006). Our own belief is that by combining explicit teaching with repeated implicit exposure, students do gradually internalise language patterns.

Priming

This brings us to a really important concept for language learning: priming. Speaking our first language at normal speed seems pretty effortless. We’re able to do this because every time we utter a word or phrase we are sub-consciously associating it with previous and possible future words or phrases. Our vast experience with the language gives us a huge range of possibilities since we’ve heard or read a myriad of possible combinations. So when we’re about to utter the next word or phrase, in a fraction of a second (around 50 milliseconds to be precise), we subconsciously choose the right one from the range of possibilities. This subconscious process of words affecting the following ones is called priming. One word or phrase primes the next. (Note: this is a bizarre account of priming. I’ll discuss it later.)

There are two main types of priming which have powerful learning effects: Perceptual priming and conceptual priming. Priming is known to activate the brain areas in the cortex associated with the thing being primed. So (Note: “So”? Really???) priming the word transport causes all the areas of the brain associated with transport to become active for a brief moment. This extra bit of activity makes it easier for additional information to be activated fully. (Note: Truly bizarre!)  

Manipulating the language input is likely to lead students to use and remember structures more successfully. That’s why it’s a good idea to repeatedly use high frequency grammatical patterns in the expectation that students will pick them up both in the short and long term. This can be done, for example, by means of sentence builders, question-answer sequences or audio-lingual style drills, as well as flooding input language with the patterns you want students to pick up.

We have seen that priming means repeating the presentation of something affects the way it’s processed a second time. If students are frequently exposed to a repertoire of chunked language it is more likely that one word, phrase or sentence will prime the next, allowing fluency to develop. In time-poor classroom settings, to achieve the amount of recycling needed for priming effects to develop, it’s wise to limit the amount of language input. You might like to think of it this way: at the start you have a small snowball of language. Over time, as new language is added, the snowball gets larger and larger as you add new language to the existing repertoire.

There are two types of learning happening in language lessons, implicit (unconscious) and explicit (conscious). The more comprehensible language that students hear and read, the more chance there is for implicit learning to occur. Priming is a type of implicit learning where previous learning events affect those in the future, or one word or pattern influences the next. (Note: the authors concede that priming is implicit learning – nothing to do with working memory.)  

It is sometimes said, therefore, that language learners have an in-built syllabus which affects what they can and can’t easily acquire. But this is a much debated and messy area of research and more recently doubt has been cast on the extent to which natural orders apply in second language acquisition (discussed in Kwon, 2005). Perhaps other factors such as frequency, inherent difficulty of grammar and differences between the first and second language come into play (see below). Others have pointed out that the social context may have an effect on sequence of acquisition, for example whether the language being learned is in a formal classroom or in other social settings (Ellis, 2015). However, it is safe to say one thing: teaching can only have at best a partial effect on the order in which learners acquire grammar. Remember that by acquire we mean possess the internalized (automatised) ability to use grammar, not just explain it. In other words the grammatical system needs to be in procedural long-term memory and this takes time. In sum, whether students are immune to the order in which you teach grammar or not, it’s important to have a sense of whether students are developmentally ready to acquire new grammar. (Note: this concession actually challenges the main argument of the book. Furthermore, I’ve never seen such a poor treatment of interlanguage development.)

Pienemann’s Processability Theory

The basic idea underlying second language acquisition researcher Manfred Pienemann’s Processability Theory is that at any stage of development a learner can produce and comprehend only those second language linguistic forms which the current state of the ‘language processor’ in the brain can handle. A student may be ready to acquire a new structure, or not ready. Knowing if a student is ready or not for a structure is therefore hard to gauge and, in the end, comes down to the teacher’s knowledge of each class and each student. In reality, because the range of natural aptitude and achievement in any class is considerable, deciding when to move on is bound to be a compromise. Because a traditional grammatical syllabus fails to take account of a student’s current state of second language development, you have to select the most important structures, supplement the text book and build in more practice. (Note: Another crass misrepresentation of a scholar’s work. I’ll say more in the discussion.)

To get around the difficulty of internalising grammar, some researchers, writers and teachers (including ourselves) suggest that combining vocabulary and grammar through a chunking approach makes learning easier, particularly for students learning a language in school settings. As a reminder, this is termed a lexicogrammatical approach (combining lexis – vocabulary – with grammar) to provide learners with lots of ready-made language chunks which they can learn and manipulate communicatively.

Surprise drives learning. Rescorla and Wagner hypothesised that the brain learns only if it perceives a gap between what it predicts and what it receives (cognitive conflict). (Note: this is an inadequate treatment of the important matter of parsing, which I’ll discuss below.) Long’s Interaction Hypothesis claims we need to test our utterances with other speakers to get feedback and to notice when we make mistakes in order to improve. When a student makes a mistake they are trying out a hypothesis. Corrective feedback tells them if it was right.
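The Rescorla and Wagner idea mentioned above is usually written as an error-driven update: on each trial, the change in associative strength is proportional to the gap between the outcome received and the outcome predicted. What follows is a minimal sketch of that textbook rule, not anything taken from the book under review; the function name, the learning rate of 0.3 and the ten identical trials are illustrative assumptions.

```python
def rescorla_wagner(outcomes, learning_rate=0.3, initial_strength=0.0):
    """Apply the textbook update V <- V + rate * (outcome - V) once per trial.

    Returns the final strength and, for each trial, the prediction held
    before the trial together with the prediction error on that trial.
    The names and default values here are hypothetical.
    """
    strength = initial_strength
    history = []
    for outcome in outcomes:
        prediction_error = outcome - strength  # the "surprise" on this trial
        history.append((strength, prediction_error))
        strength += learning_rate * prediction_error
    return strength, history


if __name__ == "__main__":
    # Hypothetical data: the same cue is followed by the outcome (1.0) ten times.
    final, history = rescorla_wagner([1.0] * 10)
    for trial, (predicted, error) in enumerate(history, start=1):
        print(f"trial {trial:2d}: predicted={predicted:.3f} surprise={error:.3f}")
```

As the prediction approaches the outcome, the error term shrinks towards zero and so does the amount learned on each trial, which is the sense in which surprise is said to drive learning.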

Allowing students to make errors is more productive than creating the conditions where errors are avoided at all costs (as in the behaviourist model). Deliberately using errors in input is a productive practice for language teachers, but needs careful timing and implementation. Recasts and prompts are two ways to provide feedback. The latter may be more effective, notably with beginners. Research is unclear about the timing of error feedback, but experience suggests it’s best to focus on only a very few major errors at one time. Although feedback can improve memory, language teachers can easily overestimate the value of correcting errors and may spend too much time doing so.

A selection of grammatical structures has to be made, sequenced in some coherent way, but this doesn’t necessarily mean organising your whole curriculum around an ordered sequence. Research offers little support for a curriculum based on ‘the grammar point of the day’ (a so-called synthetic syllabus). As we previously explained, students become developmentally ready to acquire grammar at different points. In addition, although teachers may find grammar fascinating, this is not necessarily the case for our students. One way around this, as we have mentioned before, is to incorporate grammar and vocabulary through a lexicogrammatical approach. This means presenting and practising language in communicative chunks in a way which is more appealing to students and corresponds better with how memory works. In a lexicogrammatical approach the grammar emerges from the language chunks used in communication. Grammatical points are explained and practised once students have had repeated receptive exposure through flooded input.

If learning a new language is largely a natural, unconscious, implicit process then it’s clear that our main role is to provide language input, allow learners to interact with it, nature will take its course and long-term memory will grow. On the other hand, if learning is a conscious process involving working memory, one where declarative knowledge becomes procedural, then teaching has to take this into account. Our own belief is that, in school settings, both learning routes are necessary to maximise both implicit and explicit learning. (Note: this sums up the “have your cake and eat it” argumentation that characterises the whole book.)

Finally, some motherhood statements conclude the book.

  • Make sure students receive as much meaningful, stimulating input as possible. Make sure students have lots of opportunities to practise orally. Use a balanced mixture of the four skills of listening, speaking, reading and writing.
  • Promote independent learning outside the classroom.
  • Select and sequence the vocabulary and grammar you expose students to. Do not overload them with too much new language at once. Focus on high frequency language.
  • Be prepared to explain how the language works, but don’t spend too much time on this.
  • Aim to enhance proficiency – the ability to independently use the language promptly in real situations.
  • Use listening and reading activities to model good language use rather than test; focus on the process, not the product.
  • Be prepared to judiciously and sensitively correct students and get them to respond to feedback. Research suggests negative feedback can improve acquisition.
  • Translation (both ways) can play a useful role, but if you do too much you may neglect general language input.
  • Make sensible and selective use of digital technology to enhance exposure and practice.
  • Place a significant focus on the second language culture.


Here’s what I say in my blog:

Most teachers are aware that we learn our first language/s unconsciously and that explicit learning about the language plays a minor role, but they don’t know much about how people learn an L2. In particular, few teachers know that the consensus of opinion among SLA scholars is that implicit learning through using the target language for relevant, communicative purposes is far more important than explicit instruction about the language. Here are just 4 examples from the literature:

1. Doughty (2003) concludes her chapter on instructed SLA by saying:

In sum, the findings of a pervasive implicit mode of learning, and the limited role of explicit learning in improving performance in complex control tasks, point to a default mode for SLA that is fundamentally implicit, and to the need to avoid declarative knowledge when designing L2 pedagogical procedures.

2. Nick Ellis (2005) says:

the bulk of language acquisition is implicit learning from usage. Most knowledge is tacit knowledge; most learning is implicit; the vast majority of our cognitive processing is unconscious.

3. Whong, Gil and Marsden’s (2014) review of a wide body of studies in SLA concludes:

“Implicit learning is more basic and more important than explicit learning, and superior. Access to implicit knowledge is automatic and fast, and is what underlies listening comprehension, spontaneous speech, and fluency. It is the result of deeper processing and is more durable as a result, and it obviates the need for explicit knowledge, freeing up attentional resources for a speaker to focus on message content”.

4. Han, Z. and Nassaji, H. (2018) review 35 years of instructed SLA research and, citing the latest meta-analysis, say:

On the relative effectiveness of explicit vs. implicit instruction, Kang et al. reported no significant difference in short-term effects but a significant difference in longer-term effects with implicit instruction outperforming explicit instruction.

Despite lots of other disagreements among themselves, the vast majority of SLA scholars agree on this crucial matter. The evidence from research into instructed SLA gives massive support to the claim that concentrating on activities which help implicit knowledge (by developing the learners’ ability to make meaning in the L2, through exposure to comprehensible input, participation in discourse, and implicit or explicit feedback) leads to far greater gains in interlanguage development than concentrating on the presentation and practice of pre-selected bits and pieces of language.

Now, while the book under discussion covers its back, so to speak, by recognizing the importance of implicit learning, its message is clear: explicit teaching of bits of language and their underlying grammar is the name of the game, and thus, considerations of the limitations of working memory are vital. Hence the overriding importance given to the discussion of cognitive load. And hence so much reliance on all the stuff, the awful jargon-ridden, acronym-clogged stuff, that characterises Conti’s confident “Here’s the Way to Do It” prescriptions that he goes round the world promoting. The book sends a clear message to readers: follow Conti and the MARS-EARS approach to teaching.

Conti’s approach is wrong. I’ve discussed his approach elsewhere, and I’ve argued in several posts that Long’s TBLT approach is better. As McLaughlin (1990) says, a cognitive description of second language learning provides a partial account and needs to be linked to linguistic theories of second language acquisition. “By itself, for example, the cognitive perspective cannot explain such linguistic constraints as are implied in markedness theory or that may result from linguistic universals. These specifically linguistic considerations are not addressed by an approach that sees learning a second language in terms of the acquisition of a complex cognitive skill” (p. 126).  

An important weakness of the book is how it deals – or fails to deal – with the constructs of input and intake. Smith and Conti talk about input going through working memory and into long term memory. But what is “input”? Without bothering much to define input, everybody agrees that “comprehensible input” is the key term. And what is comprehensible input? It’s that part of the language which the learner hears or reads and comprehends! This, of course, begs the question of what “the language they understand” consists of. In fact, input is noise, or stuff that hits the retina when you read, plus stuff you feel in your gut, and so on. So here’s the crucial point: we don’t get language from the environment, we get sensory stimuli. Jumping the gun a bit, when Schmidt claims that input becomes intake as the result of being “noticed” he uses three constructs that make up a circular argument. Sensory input from the external environment does not include information about the formal properties of a language which can be either ignored or noticed. You cannot notice from input that “went” is the irregular past tense form of the verb to go. Something already in your mind has to do something quite special for that to happen.

What you do, of course, is infer things from sensory stimuli. The question is: What helps you make these inferences? The two big contenders for an answer to this question are Chomsky’s UG theory and usage-based theories, perhaps articulated best these days by Nick Ellis. Personally, I think that William O’Grady’s theory of SLA and Suzanne Carroll’s theory of SLA are both much better than those that rely entirely on UG or on emergentism, but let’s keep it simple. Chomsky says that language learning relies on the workings of a special module of the human mind. Human beings are born with an innate capacity to make sense of the stimuli they get from the environment. This innate ability enables humans to make sense of stimuli thanks to a module of mind devoted to parsing all the stimuli we call language, informed by principles that underlie all languages, plus parameters that further refine the principles. So we don’t really learn languages, any more than we learn how to use our lungs – we just grow into proficient users. Note: we’re talking about L1 acquisition. Chomsky didn’t really care that much about developing a complete theory of anything. What he was interested in was describing the underlying grammar that unifies all human languages; a truly magnificent task which has had extraordinarily widespread practical results. The most powerful argument in favour of Chomsky’s view is the “Poverty of the stimulus” argument: children know more about their L1 than can be explained by an appeal to their encounters with the language they’ve been exposed to. Given the knowledge children show of their L1, which could not have come from their exposure to it, we conclude that the knowledge they demonstrate is innate. This is called “inference to the best explanation” (see Hacking for the best discussion) and I’ve yet to see a good reply to it.

On the other hand, there are various theories, these days put under the umbrella of “usage-based” theories, that explain all language learning as the result of much more general, very simple, operations of the mind. The most extreme of these theories wants to return to true empiricist principles, where any suggestion of a mind is outlawed. This “behaviorism revisited” view has been dolled up in various ways, most fancifully by Larsen-Freeman, who somehow manages to keep a straight face while explaining how flocks of birds and bits of broccoli support her new view of SLA. Such is Larsen-Freeman’s clout, or maybe, such is his gullibility, that my hero, Scott Thornbury, was heard for a while almost parroting this “chaos-theory” nonsense. Scott talked of how Descartes got uncomfortably stuck in a non-existent bit of himself, how the corporal body leans (or was it “sways”?) naturally towards the present perfect when talking of things present, how a proper appreciation of the definite article and two-letter prepositions can slowly release the whole rich grammar of English, and other mystic flights of thought.

The usage-based school is more reasonably represented by Nick Ellis, and I’m dismayed to see that he has such enormous support these days. Nick (I use his first name only to avoid confusion with Rod Ellis, whose own position is as clumsily ambiguous as always) makes a necessarily complicated case (I mean that there’s no simple case to be made for it) for his view that we should see language as one more tool in an evolving set of skills that has emerged in our attempts to communicate with each other. But language is not just a tool for social interaction. How about your innermost emotional musings, the way you think and talk to yourself, your half remembered thoughts when you wake up from a torrid dream, or the unwritten thoughts of your granny, or Socrates or Wittgenstein, for example? Silly examples, but language is not, pace Holliday and the rest, only a tool for communication. I agree with Pinker that you don’t need language to think, but language is more than a tool for communication. Furthermore, Nick Ellis has still not given any good reply to Gregg’s (2003) resounding criticism: “Emergentists have so far failed to take into account, let alone defeat, standard Poverty of the Stimulus arguments for ‘special nativism’, and have equally failed to show how language competence could ‘emerge’.”

Whatever theory of SLA you fancy, they all agree on one thing: language learning is mostly implicit. It’s a question of learning by doing and letting whatever processes of the mind you want to hold responsible take their course. My own view is that the stimuli that make up the L2 are parsed by various processors which are tuned to the L1. When the parsers hit a problem, various interventions occur, trying to solve the various parts of the problem. Teachers explaining things can help, but what they can’t do is give their students procedural knowledge by telling them about the language. It follows that teachers should find out what their students need to DO with the L2 and then help them do it through scaffolded practice. That’s my view, and it’s why I advocate a TBLT approach and criticise General English Coursebooks.

Smith and Conti’s book fails to discuss the vexed, but essential, question of the roles of explicit and implicit learning of a second language. It suggests to teachers that declarative knowledge about the L2 can be converted into procedural knowledge of how to use the L2 by careful attention to cognitive load. It fails to discuss the way that LTM stores linguistic information, ignores the evidence that suggests a fundamental division between declarative and procedural knowledge, and ignores the evidence that learners develop their own idiosyncratic interlanguage in a way that is impervious to explicit teaching. It serves the purpose of promoting teaching practices that do almost nothing to reform the classic PPP approach that blights current L2 teaching practice.


  1. I promised to talk about Smith & Conti’s use of priming and didn’t do so. See this post for a discussion.
  2. As to Pienemann’s work, I discuss it in this post, one of the SLA series of posts.
  3. Cognitive load is narrowly discussed in this book. It’s an interesting subject, poorly dealt with in the book, which only looks at its effects on presenting bits of language and the effects on working memory. Much more interesting, IMHO, are the discussions of cognitive load when applied to tasks in a TBLT syllabus. Long (2015) insists that cognitive load should refer to the demands of pedagogic tasks, which ask students to do things with the language in order to achieve an identified target task, like giving a presentation or writing a report. So it’s the complexity of the task, not its putative linguistic complexity, that is the organising principle of the syllabus. I think he’s absolutely right. We should sequence pedagogic tasks by slowly increasing their cognitive demands, and these demands have to do with their effects on CAF – the complexity, accuracy and fluency of production. Long inclined towards Robinson’s (2005; 2007) complicated theory of task complexity (which assumes that learners will respond to the increasing demands of successively more demanding tasks unhindered by restrictions of working memory), but later agreed more with Skehan’s (1998; 2003) “trade-off” view, which is based on considerations of the limitations of working memory. Smith and Conti’s book touches on these issues, but doesn’t discuss them well. The differences in the positions of Robinson and Skehan, when talking about the way communicative tasks should be sequenced, are very interesting. I think Skehan’s right to say that Robinson’s theory is fanciful, and I agree with Skehan about trade-offs, but I think Skehan is wrong when he sides with Willis, emphasising the importance of explicit teaching. All very interesting stuff, none of it properly discussed in the book under review. I predict that the authors, if they respond to this review, will point to that bit of the book which gives a table of task types (open / closed, etc.) and their different demands. If they do, let them tell us more than the book does.


Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89(4), 369–406.

Bryfonski, L., & McKay, T. H. (2017). TBLT implementation and evaluation: A meta-analysis. Language Teaching Research, 23, 5, 603-632.

Carroll, S. (2000). Input and evidence: The raw materials of second language. Amsterdam, Benjamins.

Chomsky, N. (1959). Review of B.F. Skinner Verbal behavior. Language 35, 26–8.

DeKeyser, R. (1995). Learning Second Language Grammar Rules: An Experiment With a Miniature Linguistic System. Studies in Second Language Acquisition, 17(3), 379-410.

Doughty, C. (2003). Instructed SLA. In Doughty, C. & Long, M. Handbook of SLA, pp 256 – 310. New York, Blackwell.

Ellis, N. (2019). Essentials of a Theory of Language Cognition. The Modern Language Journal, 103 (Supplement 2019).

Ellis, N. C. (2013). Second language acquisition. In Oxford Handbook of Construction Grammar (pp. 365-378), G. Trousdale & T. Hoffmann (Eds.). Oxford: Oxford University Press.  

Ellis, N. C. (2005). At the interface: Dynamic interactions of explicit and implicit language knowledge. Studies in Second Language Acquisition, 27, 305–352.

Ellis, R. (2015). Researching Acquisition Sequences: Idealization and De‐idealization in SLA. Language Learning, 65, 181-209.

Eubank, L. and Gregg, K. R. (2002) News Flash–Hume Still Dead. Studies in Second Language Acquisition, 24, 237-24.

Gregg, K.R. (2003) The state of emergentism in second language acquisition. Second Language Research, 19,2, 95–128.

Long, M. (2015). Second Language Acquisition and Task-Based Language Teaching. Oxford, Wiley.

McLaughlin, B. (1987). Theories of Second-Language Learning. London: Arnold.

O’Grady, W. (2005). How children learn language. Cambridge: Cambridge University Press.

Ortega, L. (2013). Understanding Second Language Acquisition. London: Routledge.

Pienemann, M. (1998). Language processing and second language development: Processability theory. Amsterdam/Philadelphia: John Benjamins.

Robinson, P. (2005). Cognitive complexity and task sequencing: studies in a componential framework for second language task design. International Review of Applied Linguistics, 43, 1-32.

Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: effects on L2 speech production, interaction, uptake and perceptions of task difficulty. International Review of Applied Linguistics, 45, 3, 193-213.

Rosenshine, B. (2012). Principles of Instruction. Research-Based Strategies that All Teachers Should Know. American Educator.

Schmidt, R. (1990). The Role of Consciousness in Second Language Learning. Applied Linguistics, 11, 129-158.

Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction, 3-32. Cambridge: Cambridge University Press.

Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.

Skehan, P. (2003). Task-based instruction, Language Teaching, 36, 1-14.

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science 12,2, 257-285.

Whong, M., Gil, K.H. and Marsden, H. (2014). Beyond paradigm: The ‘what’ and the ‘how’ of classroom research. Second Language Research, 30, 4, 551-568.

ZhaoHong, H. and Nassaji, H. (2018). Introduction: A snapshot of thirty-five years of instructed second language acquisition. Language Teaching Research, 23, 4, 393-402.

20 thoughts on “Memory and Teaching an L2”

  1. Good job, Geoff; I can see why you’re exhausted. But when you’ve recuperated sufficiently, could you maybe provide a reference list?
    One comment on S&C on priming. They say (I assume you’re quoting them indirectly)
    Our vast experience with the language gives us a huge range of possibilities since we’ve heard or read a myriad of possible combinations. So when we’re about to utter the next word or phrase, in a fraction of a second (around 50 milliseconds to be precise), we subconsciously choose the right one from the range of possibilities. This subconscious process of words affecting the following ones is called priming.
    Are they serious? Say I intend to utter the sentence, “I have a pain in my big toe”. Having reached ‘my’, I then scan my vocabulary and in 50ms select ‘big toe’ rather than ‘fat butt’ or ‘green soup’ or ‘old typewriter’ or ….? I don’t think I’ve ever come across this kind of account of priming, and I’m not surprised. Note that as a native speaker with a much larger vocabulary, I’m at a disadvantage compared with the English student with a painful toe.

  2. Hi Kevin,
    Thanks for this. I’ll do the references today.
    Maybe “Tell me about priming” should be established as a baloney test.
    I hope your toe gets better soon.

  3. Hi Geoff
    Thanks for the review.

    I do wonder if the problem of “having your cake and eating it” (book recognises implicit learning but promotes explicit learning) is more pronounced in the context of language teaching in public institutions (like schools) than in private institutions as in ELT? I assume the pressure to be like another school subject is greater in state schooling?

    I think that in any case the awareness of teachers in ELT is similarly low about the key importance of implicit learning. As a language teacher it is still a struggle for me to accept that learners will come into the language and external things like having a teacher have minimal effect on internal implicit processes!

  4. Hi Mura,

    Thanks for this.

    As far as I know, Smith and Conti’s audience consists mostly of teachers of modern languages (ML) working in primary and secondary education, although Conti in particular likes to throw his net as wide as possible and frequently talks about ELT. I’m sure there’s pressure on teachers to adopt a syllabus and pedagogic procedures which fit in with other school subjects in the curriculum and which prepare students for end of year exams where declarative knowledge is tested more than procedural knowledge, and I would guess that most teachers themselves approve of the PPP approach Smith and Conti recommend. But none of this justifies the weaknesses that I try to highlight in my review.

    First, the book claims to be a serious, research-based account of its topic. In fact, it misrepresents the literature on memory in the ways I indicate, and gives very flawed accounts of interlanguage development, noticing, priming, parsing, and other important matters along the way. These weaknesses need pointing out. Second, in my opinion, through an unbalanced, incomplete and faulty review of language learning, the book encourages an inefficacious approach to teaching. That’s my opinion, supported by arguments and evidence from SLA research. Readers will have to decide for themselves.

    I agree that the awareness of teachers in ELT about the key importance of implicit learning is low. But that’s no accident: ELT is a multi-billion-dollar industry, based on commodification, on turning ELT into the manufacture, marketing, sale and promotion of commodities like coursebooks, CELTA, and the IELTS exam. (Conti is a good example of a strident purveyor of language teaching commodities.) It doesn’t suit those who profit from this industry to tell teachers about how people actually learn languages, because it would upset a very lucrative apple cart.

    You say that, as a language teacher, you struggle to accept that having a teacher has minimal effect on your students’ internal implicit learning processes. I didn’t suggest any such thing. Teachers can play a vital, facilitative role in helping students learn and in speeding up the process. I’ve discussed how I think they can best do this at length elsewhere in posts about Long’s version of TBLT, which stresses the importance of attention to grammar and other formal aspects of the L2.

    Some say that we all agree that teaching languages involves both declarative and procedural knowledge and it’s just a question of emphasis. But it isn’t. The difference between an approach to teaching based on using a synthetic syllabus, proficiency levels, and high stakes exams on the one hand, and an approach based on using an analytic syllabus (e.g., Dogme, or Long’s TBLT), needs analysis and criterion-referenced, performance-based tests on the other hand, is fundamental. All the evidence suggests that the latter approach gets better results, if communicative competence is the goal.



  5. Hi Geoff,

    Yes I agree with your points in your review.

    Regarding my comment about teaching, I am taking my cue from VanPatten who says:

    “No strong case can be made for instruction speeding up acquisition.
    Classroom learners may be faster than non-classroom learners for a
    variety of reasons unrelated to instruction.” (VanPatten et al 2020:250)

    VanPatten, B., Smith, M., & Benati, A. (2020) Key Questions in Second Language Acquisition: An Introduction


  6. Hi Mura,

    Well, I disagree with VanPatten – a bit!

    Mike Long gave a list of studies revealing positive contributions that instruction can make in 1987, while ZhaoHong & Nassaji’s (2018) article, cited in the post, reviews more than 60 studies over the last 35 years of instructed SLA research, and paints a far more positive picture.

    Still, I take the point. Most of the studies conclude that instruction has lots of “potential”. But, as you well know, I strongly agree with Bill VanPatten’s reservations about the efficacy of the vast majority of the stuff that today passes for “instruction” – including the stuff you’ll find in Conti’s Language Gym.

  7. It seems pointless to say that ‘instruction’ does or does not contribute to acquisition without specifying what sort of instruction and what sort of acquisition. If you tell me that the past tense of ‘eat’ is ‘ate’, I could very well acquire the knowledge of the past tense of ‘eat’. If you tell me that vowels agree in height with preceding vowels, I could very well be no wiser than before. I’d like to see an example from the list Mike gave, or a study from Z & N.

  8. Does the book indicate at all the individual differences approach that working memory and second language research is situated in?
    VanPatten et al (2020) suggest that, like other individual differences (motivation and aptitude), working memory may be important for rate of acquisition but not for how language is acquired or what is acquired?

  9. Mura, RE individual differences

    Yes, there is some mention of individual differences in learners’ WM and the effects on SLA. Smith & Conti stress the importance of this in avoiding “cognitive overload.”

    I’ve just checked and I haven’t got a copy of the VanPatten et al (2020) book, so I’ll have to get one and read it 😦

  10. Kevin, RE Effects of instruction

    Yes, agreed.

    I’m sure you’ve seen Mike’s paper, or know how to get hold of it. ZhaoHong & Nassaji’s (2018) article is in the Language Teaching Research journal.

  11. Yeah, I like it, it has some nice arguments. For example, recalling your critique against a lexical approach as somehow “releasing grammar” – if you consider learners inferring meaning first from input (from general cognitive resources), then as you understand more of the meaning of a word/collocation, this can free up resources to process syntax (using UG).

  12. He makes that argument in the 2015 book, doesn’t he? I’m keen to see what’s new.

    VanPatten, B. (2015). Input Processing in Adult Second Language Acquisition. In B. VanPatten, & J. Williams (Eds.), ‘Theories in Second Language Acquisition: an Introduction’ (pp. 113-134). New York, NY: Routledge.

  13. It’s been a long time since I’ve looked at VanP’s stuff, but I have a clear enough memory of thinking that he has no coherent account of input processing. Specifically, he offers no grounds for believing that his subjects were processing the input as they should. Sorry I can’t be more specific; I had to dump the hundreds of papers I’d kept when I had to vacate my office. I will say that William O’Grady shares my opinion of VanPatten’s work. Of course, this may all be irrelevant to the current discussion.

  14. Kevin,
    I haven’t looked at VanPatten’s account since the 2015 book, which I didn’t get much from. I’ve just downloaded the Kindle version of the 2020 book that Mura mentioned, and it adds nothing much as far as I can see from a quick look. It’s a terrible tease – it promises that all will be revealed in the final chapter, which turns out to be very thin soup. I think VanPatten’s IP account is coherent – it makes sense – but it’s flimsy. Why do you say it’s incoherent? PLEASE don’t say you’ve got a train to catch. And where can we read William O’Grady’s view?

  15. Sorry; as I said, it’s been years since I’ve read anything of VanP’s. I suppose ‘coherent’ wasn’t the best choice of words; anyway, I found his account unsatisfying. As for William, all I have (had) was an e-mail exchange; he’s been reading VanPatten and Pienemann for some reason, and when I expressed my opinion of VanP’s research, he agreed.

  16. I can’t imagine why. Mind you, I have no interest in language instruction, but what is Processability Theory that it would have an approach to it?
    Geoff; I have no idea what the paper is that Mike wrote on the effects of instruction; but I’m sure I don’t have it or know where to get it.

  17. Hi Kevin, Sorry not to have replied to this.
    Processability Theory leads to the Teachability Hypothesis.
    As for Mike’s paper, it was in Vol. 1. Issue 1 of the new journal Instructed SLA, can’t remember the title, but 2017, I think. VanPatten wrote the other paper for that first issue.
