Teacher Trainers in ELT

This blog is dedicated to improving the quality of teacher training and development in ELT.


The Teacher Trainers 

The most influential ELT teacher trainers are those who publish “How to teach” books and articles, have online blogs and a big presence on social media, give presentations at ELT conferences, and travel around the world giving workshops and teacher training & development courses. Among them are: Jeremy Harmer, Penny Ur, Nicky Hockly, Adrian Underhill, Hugh Dellar, Sandy Millin, David Deubelbeiss, Jim Scrivener, Willy Cardoso, Peter Medgyes, Mario Saraceni, Dat Bao, Tom Farrell, Tamas Kiss, Richard Watson-Todd, David Hill, Brian Tomlinson, Rod Bolitho, Adi Rajan, Chris Farrell, Marisa Constantinides, Vicki Hollett, Scott Thornbury, and Lizzie Pinard. I appreciate that this is a rather “British” list, and I’d be interested to hear suggestions about who else should be included. Apart from these individuals, the Teacher Development Special Interest Groups (TD SIGs) in TESOL and IATEFL also have some influence.

What’s the problem? 

Most current teacher trainers and TD groups pay too little attention to the question “What are we doing?”, and the follow-up question “Is what we’re doing effective?”. The assumption that students will learn what they’re taught is left unchallenged, and trainers concentrate either on coping with the trials and tribulations of being a language teacher (keeping fresh, avoiding burn-out, growing professionally and personally) or on improving classroom practice. As to the latter, they look at new ways to present grammar structures and vocabulary, better ways to check comprehension of what’s been presented, more imaginative ways to use the whiteboard to summarise it, and more engaging activities to practise it. A good example of this is Adrian Underhill and Jim Scrivener’s “Demand High” project, which leaves the well-established framework for ELT unquestioned and concentrates on doing the same things better. In all this, those responsible for teacher development simply assume that current ELT practice efficiently facilitates language learning. But does it? Does the present model of ELT actually deliver the goods, and is making small, incremental changes to it the best way to bring about improvements? To put it another way, is current ELT practice efficacious, and is current TD leading to significant improvement? Are teachers making the most effective use of their time? Are they maximising their students’ chances of reaching their goals?

As Bill VanPatten argued in his plenary at the BAAL 2018 conference, language teaching can only be effective if it comes from an understanding of how people learn languages. In 1967, Pit Corder was the first to suggest that the only way to make progress in language teaching is to start from knowledge about how people actually learn languages. Then, in 1972, Larry Selinker suggested that instruction on the formal properties of language has a negligible impact (if any) on real development in the learner. Next, in 1983, Mike Long again raised the issue of whether instruction on formal properties of language makes a difference to acquisition. Since these important publications, hundreds of empirical studies have been published on everything from the effects of instruction to the effects of error correction and feedback. This research has in turn resulted in meta-analyses and overviews that can be used to measure the impact of instruction on SLA. All the research indicates that the current, deeply entrenched approach to ELT, where most classroom time is dedicated to explicit instruction, vastly over-estimates the efficacy of such instruction.

So in order to answer the question “Is what we’re doing effective?”, we need to periodically re-visit questions about how people learn languages. Most teachers are aware that we learn our first language/s unconsciously and that explicit learning about the language plays a minor role, but they don’t know much about how people learn an L2. In particular, few teachers know that the consensus among SLA scholars is that implicit learning through using the target language for relevant, communicative purposes is far more important than explicit instruction about the language. Here are just four examples from the literature:

1. Doughty (2003) concludes her chapter on instructed SLA by saying:

In sum, the findings of a pervasive implicit mode of learning, and the limited role of explicit learning in improving performance in complex control tasks, point to a default mode for SLA that is fundamentally implicit, and to the need to avoid declarative knowledge when designing L2 pedagogical procedures.

2. Nick Ellis (2005) says:

the bulk of language acquisition is implicit learning from usage. Most knowledge is tacit knowledge; most learning is implicit; the vast majority of our cognitive processing is unconscious.

3. Whong, Gil and Marsden’s (2014) review of a wide body of studies in SLA concludes:

“Implicit learning is more basic and more important than explicit learning, and superior. Access to implicit knowledge is automatic and fast, and is what underlies listening comprehension, spontaneous speech, and fluency. It is the result of deeper processing and is more durable as a result, and it obviates the need for explicit knowledge, freeing up attentional resources for a speaker to focus on message content”.

4. Han and Nassaji (2018) review 35 years of instructed SLA research and, citing the latest meta-analysis, say:

On the relative effectiveness of explicit vs. implicit instruction, Kang et al. reported no significant difference in short-term effects but a significant difference in longer-term effects with implicit instruction outperforming explicit instruction.

Despite lots of other disagreements among themselves, the vast majority of SLA scholars agree on this crucial matter. The evidence from research into instructed SLA gives massive support to the claim that concentrating on activities which foster the development of implicit knowledge (by developing the learners’ ability to make meaning in the L2, through exposure to comprehensible input, participation in discourse, and implicit or explicit feedback) leads to far greater gains in interlanguage development than concentrating on the presentation and practice of pre-selected bits and pieces of language.

One of the reasons why so many teachers are unaware of the crucial importance of implicit learning is that so few teacher trainers talk about it. Teacher trainers don’t tell their trainees about the research findings on interlanguage development, or that language learning is not a matter of assimilating knowledge bit by bit; or that the characteristics of working memory constrain rote learning; or that by varying different factors in tasks we can significantly affect the outcomes. And there’s a great deal more we know about language learning that teacher trainers don’t pass on to trainees, even though it has important implications for everything in ELT: from syllabus design to the use of the whiteboard, from methodological principles to the use of IT, from materials design to assessment.

We know that in the not so distant past, generations of school children learnt foreign languages for seven or eight years, and the vast majority of them left school without the ability to maintain an elementary conversational exchange in the L2. Things have improved only to the extent that teachers have been informed about, and encouraged to critically evaluate, what we know about language learning, and have constantly experimented with different ways of engaging their students in communicative activities. To the extent that teachers continue to spend most of the time talking to their students about the language, those improvements have been minimal. So why do so many teacher trainers ignore all this? Why is all this knowledge not properly disseminated?

Most teacher trainers, including Penny Ur (see below), say that, whatever its faults, coursebook-driven ELT is practical, and that alternatives such as TBLT are not. Ur actually goes as far as to say that there’s no research evidence to support the view that TBLT is a viable alternative to coursebooks. Such an assertion is contradicted by the evidence. In a recent statistical meta-analysis by Bryfonski & McKay (2017) of 52 evaluations of program-level implementations of TBLT in real classroom settings, “results revealed an overall positive and strong effect (d = 0.93) for TBLT implementation on a variety of learning outcomes” in a variety of settings, including parts of the Middle East and East Asia, where many have flatly stated that TBLT could never work for “cultural” reasons, and “three-hours-a-week” primary and secondary foreign language settings, where the same opinion is widely voiced. So there are alternatives to the coursebook approach, but teacher trainers too often dismiss them out of hand, or simply ignore them.
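For readers unfamiliar with effect sizes: the d reported here is, I assume (as is standard in such meta-analyses), Cohen’s d, the difference between the means of the treatment and comparison groups scaled by their pooled standard deviation:

d = \frac{\bar{x}_{\text{TBLT}} - \bar{x}_{\text{comparison}}}{s_{\text{pooled}}}

By the usual conventions, values around 0.2 count as small, 0.5 as medium, and 0.8 or above as large, so d = 0.93 is a large effect.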

How many TD courses today include a sizeable component devoted to the subject of language learning, where different theories are properly discussed so as to reveal the methodological principles that inform teaching practice?  Or, more bluntly: how many TD courses give serious attention to examining the complex nature of language learning, which is likely to lead teachers to seriously question the efficacy of basing teaching on the presentation and practice of a succession of bits of language? Today’s TD efforts don’t encourage teachers to take a critical view of what they’re doing, or to base their teaching on what we know about how people learn an L2. Too many teacher trainers base their approach to ELT on personal experience, and on the prevalent “received wisdom” about what and how to teach. For thirty years now, ELT orthodoxy has required teachers to use a coursebook to guide students through a “General English” course which implements a grammar-based, synthetic syllabus through a PPP methodology. During these courses, a great deal of time is taken up by the teacher talking about the language, and much of the rest of the time is devoted to activities which are supposed to develop “the 4 skills”, often in isolation. There is good reason to think that this is a hopelessly inefficient way to teach English as an L2, and yet, it goes virtually unchallenged.


The published work of most of the influential teacher trainers demonstrates a poor grasp of what’s involved in language learning, and little appetite to discuss it. Penny Ur is a good example. In her books on how to teach English as an L2, Ur spends very little time discussing the question of how people learn an L2, or encouraging teachers to critically evaluate the theoretical assumptions which underpin her practical teaching tips. The latest edition of Ur’s widely recommended A Course in Language Teaching includes a new sub-section where precisely half a page is devoted to theories of SLA. For the rest of the 300 pages, Ur expects readers to take her word for it when she says, as if she knew, that the findings of applied linguistics research have very limited relevance to teachers’ jobs. Nowhere in any of her books, articles or presentations does Ur attempt to seriously describe and evaluate evidence and arguments from academics whose work challenges her approach, and nowhere does she encourage teachers to do so. How can we expect teachers to be well-informed, critically acute professionals in the world of education if their training is restricted to instruction in classroom skills, and their on-going professional development gives them no opportunities to consider theories of language, theories of language learning, and theories of teaching and education? Teaching English as an L2 is more art than science; there’s no “best way”, no “magic bullet”, no “one size fits all”. But while there’s still so much more to discover, we now know enough about the psychological process of language learning to know that some types of teaching are very unlikely to help, and that other types are more likely to do so. Teacher trainers have a duty to know about this stuff and to discuss it with their trainees.

Scholarly Criticism? Where?  

Reading the published work of leading ELT trainers is a depressing affair; few texts used for the purpose of training teachers to work in school or adult education demonstrate such poor scholarship as that found in Harmer’s The Practice of English Language Teaching, Ur’s A Course in Language Teaching, or Dellar and Walkley’s Teaching Lexically, for example. Why are these books so widely recommended? Where is the critical evaluation of them? Why does nobody complain about the poor argumentation and the lack of attention to research findings which affect ELT? Alas, these books typify the general “practical” nature of TD programmes in ELT, and their reluctance to engage in any kind of critical reflection on theory and practice. Go through the recommended reading for most TD courses and you’ll find few texts informed by scholarly criticism. Look at the content of TD courses and you’ll be hard pushed to find a course which includes a component devoted to a critical evaluation of research findings on language learning and ELT classroom practice.

There is a general “craft” culture in ELT which rather frowns on scholarship and seeks to promote the view that teachers have little to learn from academics. Teacher trainers are, in my opinion, partly responsible for this culture. While it’s unreasonable to expect all teachers to be well informed about research findings regarding language learning, syllabus design, assessment, and so on, it is surely entirely reasonable to expect the top teacher trainers to be so. I suggest that teacher trainers have a duty to lead discussions, informed by relevant scholarly texts, which question common sense assumptions about the English language, how people learn languages, how languages are taught, and the aims of education. Furthermore, they should do far more to encourage their trainees to constantly challenge received opinion and orthodox ELT practices. This, surely, is the best way to help teachers enjoy their jobs, be more effective, and identify the weaknesses of current ELT practice.

My intention in this blog is to point out the weaknesses I see in the works of some influential ELT teacher trainers and invite them to respond. They may, of course, respond anywhere they like, in any way they like, but the easier it is for all of us to read what they say and join in the conversation, the better. I hope this will raise awareness of the huge problem currently facing ELT: it is in the hands of those who have more interest in the commercialisation and commodification of education than in improving the real efficacy of ELT. Teacher trainers do little to halt this slide, or to defend the core principles of liberal education which Long so succinctly discusses in Chapter 4 of his book SLA and Task-Based Language Teaching.

The Questions

I invite teacher trainers to answer the following questions:


  1. What is your view of the English language? How do you transmit this view to teachers?
  2. How do you think people learn an L2? How do you explain language learning to teachers?
  3. What types of syllabus do you discuss with teachers? Which type do you recommend to them?
  4. What materials do you recommend?
  5. What methodological principles do you discuss with teachers? Which do you recommend to them?



Bryfonski, L. and McKay, T.H. (2017) TBLT implementation and evaluation: A meta-analysis. Language Teaching Research.

Dellar, H. and Walkley, A. (2016) Teaching Lexically. Delta Publishing.

Doughty, C. (2003) Instructed SLA. In Doughty, C. and Long, M. (eds.) The Handbook of Second Language Acquisition, pp. 256-310. Malden, MA: Blackwell.

Han, Z. and Nassaji, H. (2018) Introduction: A snapshot of thirty-five years of instructed second language acquisition. Language Teaching Research, in press.

Long, M. (2015) Second Language Acquisition and Task-Based Language Teaching. Oxford: Wiley.

Ur, P. (2012) A Course in Language Teaching (2nd edition). Cambridge: CUP.

Whong, M., Gil, K.H. and Marsden, H. (2014) Beyond paradigm: The ‘what’ and the ‘how’ of classroom research. Second Language Research, 30(4), 551-568.


SLA Part 5: Pienemann’s Processability Theory

This theory started out as the Multidimensional Model, which came from work done by the ZISA group, based mainly at the University of Hamburg, in the late seventies. One of the group’s first findings was that the child and adult learners of German as a second language in the study all followed a five-stage developmental sequence for word order.

Stage X – Canonical order (SVO)

die kinder spielen mim ball (“the children play with the ball”)

Stage X+1 – Adverb preposing (ADV)

da kinder spielen (“there children play”)

Stage X+2 – Verb separation (SEP)

alle kinder muss die pause machen (“all children must the break have”)

Stage X+3 – Inversion (INV)

dann hat sie wieder die knoch gebringt (“then has she again the bone brought”)

Stage X+4 – Verb-end (V-END)

er sagte, dass er nach hause kommt (“he said that he home comes”)

Learners didn’t abandon one interlanguage rule for the next as they progressed; they added new ones while retaining the old, and thus the presence of one rule implies the presence of earlier rules.

The explanation offered for this developmental sequence is that each stage reflects the learner’s use of three speech-processing strategies. Clahsen and Pienemann argue that processing is “constrained” by the strategies and development consists of the gradual removal of these constraints, or the “shedding of the strategies”, which allows the processing of progressively more complex structures. The strategies are:

(i) The Canonical Order Strategy.  The construction of sentences at Stage X obeys simple canonical order that is generally assumed to be “actor – action – acted upon.”  This is a pre-linguistic phase of acquisition where learners build sentences according to meaning, not on the basis of any grammatical knowledge.

(ii) The Initialisation-Finalisation Strategy. Stage X+1 occurs when learners notice discrepancies between their rule and input.  But the areas of input where discrepancies are noticed are constrained by perceptual saliency – it is easier to notice differences at the beginnings or the ends of sentences since these are more salient than the middle of sentences. As a result, elements at the initial and final positions may be moved around, while leaving the canonical order undisturbed.

Stage X+2 also involves this strategy, but verb separation is considered more difficult than adverb fronting, because the former requires not just movement to the end position but also disruption of a continuous constituent (verb + particle, verb + infinitive, or verb + participle).

Stage X+3 is even more complex, since it involves both disruption and movement of an internal element to a non-salient position, and so requires the learner to abandon salience and recognise different grammatical categories.

(iii) The Subordinate Clause Strategy. This is used in Stage X+4 and requires the most advanced processing skills, because the learner has to produce a hierarchical structure, which involves identifying sub-strings within a string and moving elements out of those sub-strings into other positions.

These constraints on interlanguage development are argued to be universal; they cover all developmental structures, not just word order, and they apply to all second languages, not just German.

The ZISA model also proposed a variational dimension to SLA, hence the name “Multidimensional”. While the developmental sequence of SLA is fixed by universal processing constraints, individual learners follow different routes in SLA, depending primarily on whether they adopt a predominantly “standard” orientation, favouring accuracy, or a predominantly “simplifying” one, favouring communicative effectiveness.

Processability Theory

Pienemann’s next step (1998) was to expand the Multidimensional Model into a Processability Theory, which predicts which grammatical structures an L2 learner can process at a given level of development.

This capacity to predict which formal hypotheses are processable at which point in development provides the basis for a uniform explanatory framework which can account for a diverse range of phenomena related to language development (Pienemann, 1998: xv).

The important thing about this theory is that while Pienemann describes the same route for interlanguage development as other scholars have done, he now also offers an explanation of why interlanguage grammars develop in the way they do. His theory proposes that

for linguistic hypotheses to transform into executable procedural knowledge the processor needs to have the capacity of processing those hypotheses (Pienemann, 1998: 4).

Pienemann, in other words, argues that there will be certain linguistic hypotheses that, at a particular stage of development, the L2 learner cannot access because he or she doesn’t have the necessary processing resources available. At any stage of development, the learner can produce and comprehend only those L2 linguistic forms which the current state of the language processor can handle.

The processing resources that have to be acquired by the L2 learner will, according to Processability Theory, be acquired in the following sequence:

  1. lemma access,
  2. the category procedure,
  3. the phrasal procedure,
  4. the S-procedure,
  5. the subordinate clause procedure – if applicable. (Pienemann, 1998: 7)

The theory states that each procedure is a necessary prerequisite for the following procedure, and that

the hierarchy will be cut off in the learner grammar at the point of the missing processing procedures and the rest of the hierarchy will be replaced by a direct mapping of conceptual structures onto surface form (Pienemann, 1998: 7).

The SLA process can therefore be seen as one in which the L2 learner entertains hypotheses about the L2 grammar and that this “hypothesis space” is determined by the processability hierarchy.
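To make the implicational logic of this hierarchy concrete, here is a minimal sketch in Python (purely illustrative: the procedure names follow Pienemann’s list above, but the numeric levels and the example learner are my own invention):

# Pienemann's processability hierarchy, ordered from lowest to highest.
PROCEDURES = [
    "lemma access",
    "the category procedure",
    "the phrasal procedure",
    "the S-procedure",
    "the subordinate clause procedure",
]

def processable(structure_level, learner_level):
    # The hierarchy is implicational: a structure is processable only if
    # the procedure it requires, and every procedure below it, has been acquired.
    return structure_level <= learner_level

# A hypothetical learner who has acquired the first three procedures:
learner_level = 2
for level, name in enumerate(PROCEDURES):
    status = "processable" if processable(level, learner_level) else "cut off"
    print(f"{level}: {name} -> {status}")

On this toy picture, structures above the cut-off are not produced by the missing procedures; as the quotation above says, the rest of the hierarchy is replaced by a direct mapping of conceptual structures onto surface form.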


In this account of the SLA process, the mechanism at work is an information processing device which is constrained by limitations in its ability to process input. The device adds new rules while retaining the old ones, and as the “speech-processing strategies” which constrain processing are shed, progressively more complex structures can be processed.

What is most impressive about the theory (it provides an explanation for the interlanguage development route) is also most problematic, since the theory takes as self-evident that our cognition works in the way the model suggests. We are told that people see things in a canonical order of “actor – action – acted upon”, that people prefer continuous to discontinuous entities, that the beginnings and ends of sentences are more salient than the middles, and so on, without being offered much justification for such a view beyond general assumptions about what is easy and difficult to process. As Towell and Hawkins say of the Multidimensional Model:

They require us to take on faith assumptions about the nature of perception. The perceptual constructs are essentially mysterious, and what is more, any number of new ones may be invented in an unconstrained way (Towell and Hawkins, 1994: 50).

This criticism isn’t actually as damning as it might appear – there are obviously good reasons to suppose that simple things will be more easily processed than complex ones, there is a mountain of evidence from L1 acquisition studies to support some of the claims, and, of course, whatever new assumptions “may be invented” can be dealt with if and when they appear. As Pienemann makes clear, the assumptions he makes are common to most cognitive models, and, importantly, they result in making predictions that are highly falsifiable.

Apart from some vagueness about precisely how the processing mechanism works, and about exactly what constitutes the acquisition of each level, the theory has little to say about transfer, and it deals with a limited domain, restricting itself to an account of speech production and avoiding any discussion of linguistic theory.

In brief, the two main strengths of this theory are that it provides not just a description, but an explanation of interlanguage development, and that it is testable. The explanation is taken from experimental psycholinguistics, not from the data, and the theory is thus able to make wide, strong predictions and to apply to all future data. The predictions it makes are widely applicable and, to some extent, testable: if we can find an L2 learner who has skipped a stage in the developmental sequence, then we will have found empirical evidence that challenges the theory. Since the theory also claims that the constraints on processability are not affected by context, even classroom instruction should not be able to change or reduce these stages.

The Teachability Hypothesis 

Which brings us to the most important implication of Pienemann’s theory: the Teachability Hypothesis. First proposed in 1984, this predicts that items can only be successfully taught when learners are at the right stage of interlanguage development to learn them. Note immediately that neither Pienemann nor anybody else claims to know anything but the outlines of the interlanguage development route. We don’t have any route map, and even if we did, and even if we could identify the point where each of our students was on the map (i.e., where he or she was on his or her interlanguage trajectory), this wouldn’t mean that explicit teaching of any particular grammar point or lexical chunk, for example, would lead to procedural knowledge of it. No; what Pienemann’s work does is give further support to the view that interlanguage development is a cognitive process involving slow, dynamic reformulation, constrained by processing limitations.

Whether Pienemann’s theory gives a good explanation of SLA is open to question, to be settled by an appeal to empirical research and more critical interrogation of the constructs. But there’s no question that Pienemann’s research adds significantly to the evidence for the claim that SLA is a process whose route is unaffected by teaching. In order to respect our students’ interlanguage development, we must teach in such a way that they are given the maximum opportunities to work things out for themselves, and avoid the mistake of trying to teach them things they’re not ready, or motivated, to learn.


For a good discussion of Pienemann’s theory, see the peer commentaries in the first issue of Bilingualism: Language and Cognition (Vol. 1, No. 1, 1998), which is entirely devoted to Processability Theory.

SLA Part 4: Schmidt’s Noticing Hypothesis

(Note: Following a few emails I’ve received, I should make it clear that unless referring to UG, I use the word “grammar” in the sense that linguists use it; viz., “knowledge of a language”.)

Schmidt, undeterred by McLaughlin’s warning to steer clear of attempts to define “consciousness”, tries to do away with its “terminological vagueness” by examining three senses of the term:

  1. consciousness as awareness,
  2. consciousness as intention,
  3. consciousness as knowledge.

1 Consciousness as awareness

Schmidt distinguishes between three levels of awareness: Perception, Noticing and Understanding. The second level, Noticing, is the key to Schmidt’s eventual hypothesis. Noticing is focal awareness.

When reading, for example, we are normally aware of (notice) the content of what we are reading, rather than the syntactic peculiarities of the writer’s style, the style of type in which the text is set, music playing on a radio in the next room, or background noise outside a window.  However, we still perceive these competing stimuli and may pay attention to them if we choose (Schmidt, 1990: 132).

Noticing refers to a private experience, but it can be operationally defined as “availability for verbal report”, and these reports can be used to both verify and falsify claims concerning the role of noticing in cognition.

2 Consciousness as intention

This sense distinguishes between awareness and intention. “He did it consciously”, in this second sense, means “He did it intentionally.” Intentional learning is not the same as noticing.

3 Consciousness as knowledge

Schmidt suggests that six different contrasts (C) need to be distinguished:

C1: Unconscious learning refers to unawareness of having learned something.

C2: Conscious learning refers to noticing and unconscious learning to picking up stretches of speech without noticing them.  Schmidt calls this the “subliminal”  learning question: is it possible to learn aspects of a second language that are not consciously noticed?

C3: Conscious learning refers to intention and effort.  This is the incidental learning question: if noticing is required, must learners consciously pay attention?

C4: Conscious learning is understanding principles of the language, and unconscious learning is the induction of such principles.  This is the implicit learning question: can second language learners acquire rules without any conscious understanding of them?

C5: Conscious learning is a deliberate plan involving study and other intentional learning strategies, unconscious learning is an unintended by-product of communicative interaction.

C6: Conscious learning allows the learner to say what they appear to “know”.

Addressing C2, Schmidt points to disagreement over a definition of intake. While Krashen seems to equate intake with comprehensible input, Corder distinguishes between what is available for going in and what actually goes in, but neither Krashen nor Corder explains what part of input functions as intake for the learning of form. Schmidt also notes the distinction Slobin (1985) and Chaudron (1985) make between preliminary intake (the processes used to convert input into stored data that can later be used to construct language) and final intake (the processes used to organise stored data into linguistic systems).

Schmidt proposes that all this confusion is resolved by defining intake as:

that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently.  If noticed, it becomes intake (Schmidt, 1990: 139).

The implication of this is that:

subliminal language learning is impossible, and that noticing is the necessary and sufficient condition for converting input into intake (Schmidt, 1990:  130).

The only study mentioned by Schmidt in support of his hypothesis is Schmidt and Frota (1986), which examined Schmidt’s own attempts to learn Portuguese and found that his notes matched his output quite closely. Schmidt himself admits that the study does not show that noticing is sufficient for learning, or that noticing is necessary for intake. Nevertheless, Schmidt does not base himself on this study alone; there is, Schmidt claims, evidence from a wider source:

… the primary evidence for the claim that noticing is a necessary condition for storage comes from studies in which the focus of attention is experimentally controlled. The basic finding, that memory requires attention and awareness, was established at the very beginning of research within the information processing model (Schmidt, 1990: 141).

Addressing C3, the issue of incidental learning versus paying attention, Schmidt acknowledges that the claim that conscious attention is necessary for SLA runs counter to both Chomsky’s rejection of any role for conscious attention or choice in L1 learning, and the arguments made by Krashen, Pienemann and others for the existence of a natural order or a developmental sequence in SLA.  Schmidt says that Chomsky’s arguments do not necessarily apply to SLA, and that

natural orders and acquisition sequences do not pose a serious challenge to my claim of the importance of noticing in language learning, …they constrain but do not eliminate the possibility of a role for selective, voluntary attention (Schmidt, 1990: 142).

Schmidt accepts that “language learners are not free to notice whatever they want” (Schmidt, 1990: 144), but, having discussed a number of factors that might influence noticing, such as expectations, frequency, perceptual salience, skill level, and task demands, concludes that

those who notice most, learn most, and it may be that those who notice most are those who pay attention most.  (Schmidt, 1990: 144)

As for C4, the issue of implicit learning versus learning based on understanding, Schmidt judges the question of implicit second language learning to be the most difficult “because it cannot be separated from questions concerning the plausibility of linguistic theories” (Schmidt, 1990: 149). But Schmidt rejects the “null hypothesis” which claims that, as he puts it, “understanding is epiphenomenal to learning, or that most second language learning is implicit” (Schmidt, 1990: 149).


Schmidt’s hypothesis caused an immediate stir within the academic community and quickly became widely accepted. It caused Mike Long to re-write his Interaction Hypothesis, and it has been used by many scholars as the basis for studies of SLA. More importantly for my thesis, “noticing” is increasingly being used by teacher trainers, often with scant understanding of it, to justify concentrating on explicit grammar teaching.

I have the following criticisms to make of Schmidt’s noticing hypothesis.

1. Empirical support for the Noticing Hypothesis is weak

In response to a series of criticisms of his original 1990 paper, Schmidt’s 2001 paper gives various sources of evidence of noticing, all of which have been subsequently challenged:

a) Schmidt says learner production is a source of evidence, but no clear method for identifying what has been noticed is given.

b) Likewise, learner reports in diaries. Schmidt cites Schmidt and Frota (1986) and Warden, Lapkin, Swain and Hart (1995), but, as Schmidt himself points out, diaries span months, while cognitive processing of L2 input takes place in seconds. Furthermore, as Schmidt admits, keeping a diary requires not just noticing but reflexive self-awareness.

c) Think-aloud protocols. Schmidt agrees with the objection that studies based on such protocols cannot assume that the protocols include everything that is noticed. Schmidt cites Leow (1997) and Jourdenais, Ota, Stauffer, Boyson, and Doughty (1995), who used think-aloud protocols in focus-on-form instruction, and concludes that such experiments cannot identify all the examples of target features that were noticed.

d) Learner reports in a CALL context (Chapelle, 1998) and programs that track the interface between user and program – recording mouse clicks and eye movements (Crosby, 1998). Again, Schmidt concedes that it is still not possible to identify with any certainty what has been noticed.

e) Schmidt claims that the noticing hypothesis could be falsified by demonstrating the existence of subliminal learning either by showing positive priming of unattended and unnoticed novel stimuli or by showing learning in dual task studies in which central processing capacity is exhausted by the primary task. The problem in this case is that in positive priming studies one can never really be sure that subjects did not allocate any attention to what they could not later report, and similarly, in dual task experiments one cannot be sure that no attention is devoted to the secondary task. Jacoby, Lindsay, & Toth (1996, cited in Schmidt, 2001: 28) argue that the way to demonstrate true non-attentional learning is to use the logic of opposition, to arrange experiments where unconscious processes oppose the aims of conscious processes.

f) Merikle and Cheesman distinguish between the objective and subjective thresholds of perception. The clearest evidence that something has exceeded the subjective threshold and been consciously perceived or noticed is a concurrent verbal report, since nothing can be verbally reported other than the current contents of awareness. Schmidt argues that this is the best test of noticing, and that after-the-fact recall is also good evidence that something was noticed, providing that prior knowledge and guessing can be controlled. For example, if beginner level students of Spanish are presented with a series of Spanish utterances containing unfamiliar verb forms, are forced to recall immediately afterwards the forms that occurred in each utterance, and can do so, that is good evidence that they did notice them. On the other hand, it is not safe to assume that failure to do so means that they did not notice. It seems that it is easier to confirm that a particular form has not been noticed than that it has: failure to achieve above-chance performance in a forced-choice recognition test is a much better indication that the subjective threshold has not been exceeded and that noticing did not take place (see the sketch after this list for what “above chance” amounts to in practice).

g) Truscott (1998) points out that the reviews by Brewer (1974) and Dawson and Schell (1987), cited by Schmidt (1990), dealt with simple conditioning experiments and that, therefore, inferences regarding learning an L2 were not legitimate. Brewer specifically notes that his conclusions do not apply to the acquisition of syntax, which probably occurs “in a relatively unconscious, automatic fashion” (p. 29). Truscott further points out that while most current research on unconscious learning is plagued by continuing controversy, “one can safely conclude that the evidence does not show that awareness of the information to be acquired is necessary for learning” (p. 108).

h) Altman (1990) gathered data in a similar way to Schmidt and Frota (1986) in studying her own learning of Hebrew over a five-year period. Altman found that while half her verbalisations of Hebrew verbs could be traced to diary entries of noticing, it was not possible to identify the source of the other half, and they may have become intake subconsciously.

i) Alanen’s (1992) study of Finnish L2 learning found no significant statistical difference between an enhanced input condition group and the control group.

j) Robinson’s (1997) study found mixed results for noticing under implicit, incidental, rule-search and instructed conditions.
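Since several of the points above turn on whether recognition performance is “above chance”, here is a minimal sketch of the kind of statistical check involved (the scenario and the numbers are invented for illustration; this is not a reconstruction of any of the cited studies):

from scipy.stats import binomtest

# A learner picks the previously presented verb form out of two options
# on 18 of 24 trials; chance performance would be p = 0.5.
result = binomtest(k=18, n=24, p=0.5, alternative="greater")
print(result.pvalue)  # p ≈ 0.011: above-chance recognition, i.e. some
                      # evidence that the forms exceeded the subjective threshold

Failure to beat chance on such a test is, as noted in (f), better evidence that noticing did not occur than success is that it did.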

Furthermore, studies of ‘noticing’ have been criticised for serious methodological problems:

i) The studies are not comparable due to variations in focus and in the conditions operationalized.

ii) The level of noticing in the studies may have been affected by uncontrolled variables, which casts doubt on the reliability of the findings.

iii) Cross (2002) notes that “only Schmidt and Frota’s (1986) and Altman’s (1990) research considers how noticing target structures positively relates to their production as verbal output (in a communicative sense), which seems to be the true test of whether noticing has an effect on second language acquisition. A dilemma associated with this is that, as Fotos (1993) states, there is a gap of indeterminate length between what is noticed and when it appears as output, which makes data collection, analysis and correlation problematic.”

iv) Ahn (2014) points to a number of problems that have been identified in eye-tracking studies, especially those using heat map analyses. (See Ahn (2014) for the references that follow.) First, heat maps are only “exploratory” (p. 239), and they cannot provide temporal information on eye movement, such as regression duration, “the duration of the fixations when the reader returns to the lookzone” (Simard & Foucambert, 2013, p. 213), which might tempt researchers to rush into conclusions that favour their own predictions. Second, as Godfroid et al. (2013) note, the heat map analyses in Smith (2012) could not control the confounding effects of “word length, word frequency, and predictability, among other factors” (p. 490), which might have yielded considerable confounding effects as well. As we can infer from the analyses in Smith (2012), what the field most urgently needs is its own specific guidelines for using eye-tracking methodology in research on L2 phenomena (Spinner, Gass, & Behney, 2013). Because little guidance is available, the use of eye tracking often risks misleading researchers into unreliable interpretations of their results.


2. The construct of “noticing” is not clearly defined. Thus, it’s not clear what exactly it refers to, and, as has already been suggested above, there’s no way of ascertaining when it is, and when it isn’t, being used by L2 learners.

Recall that in his original 1990 paper, Schmidt claimed that “intake” was the sub-set of  input which is noticed, and that the parts of input that aren’t noticed are lost. Thus, Schmidt’s Noticing Hypothesis, in its 1990 version, claims that noticing is the necessary condition for learning an L2. Noticing is said to be the first stage of the process of converting input into implicit knowledge. It takes place in short-term memory (where, according to the original claim, the noticed ‘feature’ is compared to features produced as output) and it is triggered by these factors: instruction, perceptual salience, frequency, skill level, task demands, and comparing.

But what is it? It’s “focused attention”, and, Schmidt argues, attention research supports the claim that consciousness in the form of attention is necessary for learning. Truscott (1998) points out that such claims are “difficult to evaluate and interpret”. He cites a number of scholars and studies to support the view that the notion of attention is “very confused”, and that it’s “very difficult to say exactly what attention is and to determine when it is or is not allocated to a given task. Its relation to the notoriously confused notion of consciousness is no less problematic”. He concludes (1998, p. 107): “The essential point is that current research and theory on attention, awareness and learning are not clear enough to support any strong claims about relations among the three.”

In an attempt to clarify matters and answer his critics, Schmidt re-formulated his Noticing Hypothesis in 2001. A number of concessions are made, resulting in a much weaker version of the hypothesis. To minimise confusion, Schmidt says he will use ‘noticing’ as a technical term equivalent to what Gass (1988) calls “apperception”, what Tomlin and Villa (1994) call “detection within selective attention”, and what Robinson (1995) calls “detection plus rehearsal in short term memory”. So now, what is noticed are “elements of the surface structure of utterances in the input, instances of language” and not “rules or principles of which such instances may be exemplars”. Noticing does not refer to comparisons across instances or to reflecting on what has been noticed.

In a further concession, in the section “Can there be learning without attention?”, Schmidt admits there can, with the L1 as a source that helps learners of an L2 being an obvious example. Schmidt says that it’s “clear that successful second language learning goes beyond what is present in input”. Schmidt presents evidence which, he admits, “appears to falsify the claim that attention is necessary for any learning whatsoever”, and this prompts him to propose the weaker version of the Noticing Hypothesis, namely “the more noticing, the more learning”.

There are a number of problems with this reformulation.

Gass: Apperception

As was mentioned, Schmidt (2001) says that he is using ‘noticing’ as a technical term equivalent to Gass’ apperception. True to dictionary definitions of apperception, Gass defines apperception as “the process of understanding by which newly observed qualities of an object are initially related to past experiences”. The light goes on, the learner realises that something new needs to be learned. It’s “an internal cognitive act in which a linguistic form is related to some bit of existing knowledge (or gap in knowledge)”. It shines a spotlight on the identified form and prepares it for further analysis. This seems to clash with Schmidt’s insistence that noticing does not refer to comparisons across instances or to reflecting on what has been noticed, and in any case, Gass provides no clear explanation of how the subsequent stages of her model convert apperceptions into implicit knowledge of the L2 grammar.

Tomlin and Villa: Detection

Schmidt says that ‘noticing’ is also equivalent to what Tomlin and Villa (1994) call “detection within selective attention.” But is it? Surely Tomlin and Villa’s main concern is detection that does not require awareness. According to Tomlin and Villa, the three components of attention are alertness, orientation, and detection, but only detection is essential for further processing and awareness plays no important role in L2 learning.

Carroll: input doesn’t contain mental constructs; therefore they can’t be noticed

As Gregg commented when I discussed Schmidt’s hypothesis in my earlier blog post “You can’t notice grammar!”, Schmidt’s 2010 paper attempts to deal with Suzanne Carroll’s objection by first succinctly summarising Carroll’s view that attention to input plays little role in L2 learning because most of what constitutes linguistic knowledge is not in the input to begin with. She argues that Krashen, Schmidt and Gass all see “input” as observable sensory stimuli in the environment from which forms can be noticed,

whereas in reality the stuff of acquisition (phonemes, syllables, morphemes, nouns, verbs, cases, etc.) consists of mental constructs that exist in the mind and not in the environment at all. If not present in the external environment, there is no possibility of noticing them (Carroll, 2001, p.47).

Schmidt’s answer is:

In general, ideas about attention, noticing, and understanding are more compatible with instance-based, construction-based and usage-based theories (Bley-Vroman, 2009; Bybee & Eddington, 2006; Goldberg, 1995) than with generative theories.

It seems that Schmidt, in an attempt to save his hypothesis, is prepared to ditch what Carroll refers to as “100 years of linguistic research, which demonstrates that linguistic cognition is structure dependent”, and to adopt the connectionist view that linguistic knowledge is encoded as activated neural nets, and that it is linked to acoustic events by no more than association.

I think it’s worth quoting a bit more from Carroll’s impressive 2001 book. Commenting on all those who start with input, she says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

Learners do not attend to things in the input as such; they respond to speech signals by attempting to parse them, and failures to do so trigger attention to parts of the signal. Thus, it is possible to have speech-signal processing without attention-as-noticing or attention-as-awareness. Learners may unconsciously and without awareness detect, encode and respond to linguistic sounds; learners don’t always notice their own processing of segments or the internal organization of their own conceptual representations; the processing of forms and meanings often goes unnoticed; and attention is thus the result of processing, not a prerequisite for it.


In brief:

1. In his 2010 paper, Schmidt confirms the concessions made in 2001, which amount to saying that ‘noticing’ is not needed for all L2 learning, but that the more you notice the more you learn. He also confirms that noticing does not refer to reflecting on what is noticed.

2. The Noticing Hypothesis even in its weaker version doesn’t clearly describe the construct of ‘noticing’.

3. The empirical support claimed for the Noticing Hypothesis is not as strong as Schmidt (2010) claims.

4. A theory of SLA based on noticing a succession of forms faces the impassable obstacle that, as Schmidt seemed to finally admit, you can’t ‘notice’ rules, or principles of grammar.

5. “Noticing the gap” is not sanctioned by Schmidt’s amended Noticing Hypothesis.

6. The way that so many writers and ELT trainers use “noticing” to justify all kinds of explicit grammar and vocabulary teaching demonstrates that Schmidt’s Noticing Hypothesis is widely misunderstood and misused.



Ahn, J.I. (2014) Attention, awareness, and noticing in SLA: A methodological review. MSU Working Papers in SLS, 5.

Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Corder, P. (1967) The significance of learners’ errors. International Review of Applied Linguistics, 5, 161-169.

Cross, J. (2002) ‘Noticing’ in SLA: Is it a valid concept? TESL-EJ. Retrieved from http://tesl-ej.org/ej23/a2.html

Ellis, N. (1998) Emergentism, connectionism and language learning. Language Learning, 48(4), 631-664.

O’Grady, W. (2005) How Children Learn Language. Cambridge: CUP.

Schmidt, R.W. (1990) The role of consciousness in second language learning. Applied Linguistics, 11, 129-158.

Schmidt, R. (2001) Attention. In Robinson, P. (ed.) Cognition and Second Language Instruction, pp. 3-32. Cambridge: Cambridge University Press.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In Chan, W.M., Chi, S., Cin, K.N., Istanto, J., Nagami, M., Sew, J.W., Suthiwan, T. and Walker, I. (eds.) Proceedings of CLaSIC 2010, Singapore, December 2-4, pp. 721-737. Singapore: National University of Singapore, Centre for Language Studies.

Schmidt, R. and Frota, S.N. (1986) Developing basic conversational ability in a second language: a case study of an adult learner of Portuguese. In Day, R.R. (ed.) Talking to Learn: Conversation in Second Language Acquisition. Rowley, MA: Newbury House.

Truscott, J. (1998) Noticing in second language acquisition: a critical review. Second Language Research, 14(2), 103-135.

SLA: Behaviourism and Mentalism

The Shift From a Behaviourist to a Cognitivist View of SLA

Before proceeding with the review of SLA, I need to recap the story so far, in order to highlight the difference between two contradictory epistemologies. I do so for two reasons. Firstly, we are seeing a return to behaviourism in the guise of increasingly popular, and increasingly misinterpreted, usage-based theories of language learning such as emergentism. The epistemological underpinnings of these theories are rarely mentioned, particularly by ELT teacher trainers who either clumsily endorse them or airily dismiss them. Secondly, it gives me an opportunity to restate the implications of the shift to a more cognitive view of the SLA process.


Behaviourism has much in common with logical positivism, the most spectacularly misguided movement in the history of philosophy. Chasing the chimera of absolute truth, the logical positivists, most famously those of the Vienna Circle, came together in the early 1920s in order to clean up language and put science on a sure empirical footing. The mad venture was all over before the Second World War broke out, but not so behaviourism, which slightly preceded it with the 1913 work of the pioneering American psychologist John B. Watson, and went on to outlive it when B.F. Skinner took over after the war.

Watson, influenced by the work of Pavlov (1897) and Bekhterev (1896) on conditioning of animals, but also, later, by the works of Mach (1924) and Carnap (1927) from the Vienna School, attempted to make psychological research “scientific” by using only objective procedures, such as laboratory experiments which were designed to establish statistically significant results. Watson formulated a stimulus-response theory of psychology according to which all complex forms of behaviour are explained in terms of simple muscular and glandular elements that can be observed and measured.  No mental “reasoning”, no speculation about the workings of any “mind”, were allowed. Thousands of researchers adopted this methodology, and from the end of the first world war until the 1950s an enormous amount of research on learning in animals and in humans was conducted under this strict empiricist regime.

In 1950 behaviourism could justly claim to have achieved paradigm status, and at that moment B.F. Skinner became its new champion. Skinner’s contribution to behaviourism was to challenge the stimulus-response idea at the heart of Watson’s work and replace it with operant conditioning, in which behaviour is shaped by reinforcement (see Skinner, 1957, and Toates and Slack, 1990). Important as this modification was, it is Skinner’s insistence on a strict empiricist epistemology, and his claim that language is learned in just the same way as any other complex skill is learned, by social interaction, that is important here.

The strictly empiricist epistemology of behaviourism outlaws any talk of mental structure or of internal mental states. While it’s perfectly OK to talk about these things in everyday parlance, they have no place in scientific discourse. Strictly speaking – which is how scientists, including psychologists, should speak – there is no such thing as the mind, and there is no sense (sic) in talking about feelings or any other stuff that can’t be observed by appeal to the senses. Behaviourism sees psychology as the science of behaviour, not the science of mind. Behaviour can be described and explained without any ultimate reference to mental events or to any internal psychological processes. The sources of behaviour are external (in the environment), not internal (in the mind). If mental terms or concepts are used to describe behaviour, then they must be replaced by behavioural terms or paraphrased into behavioural concepts.

Behaviour is all there is: humans and animals are organisms that can be observed doing things, and the things they do are explained in terms of responses to their environment, which also explains all types of learning.  Learning a language is like learning anything else – it’s the result of repeated responses to stimuli.  There are no innate rules by which organisms learn, which is to say that organisms learn without being innately or pre-experientially provided with explicit procedures by which to learn. Before organisms interact with the environment they know nothing – by definition. Learning doesn’t consist of rule-governed behaviour; learning is what organisms do in response to stimuli. An organism learns from what it does, from its successes and mistakes, as it were.

The minimalist elegance of such a stark view is impressive, even attractive (especially if you’re sick of trying to make sense of Freud, Jung, or Adler, perhaps), but it makes explaining unobservable phenomena, whatever they happen to be, problematic, to say the least. Still, for American scholars immersed in the field of foreign language learning in the post-WW2 era, a field not exactly renowned for its contributions to philosophy or scientific method, behaviourism had a lot going for it: an easily-grasped theory with crystal clear pedagogic implications. The behaviourists’ opposition to the Chomskyan threat was entirely understandable, but, historically at least, we may note that their case collapsed like a house of cards. Casti (1989) points out that no Kuhnian paradigm shift in the 20th century was brought about more completely or swiftly than the one Chomsky effected in linguistics.

In his 1957 Verbal Behavior, Skinner put forward his view that language learning is a process of habit formation involving associations between an environmental stimulus and a particular automatic response, produced through repetition with the help of reinforcement. This view of learning was challenged in Chomsky’s (1959) review of Verbal Behavior, where he argued that language learning is quite different from other types of learning and cannot be explained in terms of habit-formation. Chomsky’s revolutionary argument, begun in Syntactic Structures (1957) and subsequently developed in Aspects of the Theory of Syntax (1965) and Knowledge of Language (1986), was that all human beings are born with an innate grammar: a fixed set of mental rules that enables children to create and utter sentences they have never heard before. Chomsky asserted that language learning was a uniquely human capacity, a result of Homo sapiens’ possession of what he at first referred to as a Language Acquisition Device. Chomsky developed his theory and later claimed that language consists of a set of abstract principles that characterise the core grammars of all natural languages, and that the task of learning one’s L1 is thus simplified, since one has an innate mechanism that constrains possible grammar formation. Children do not have to learn those features of the particular language to which they are exposed that are universal, because they know them already. The job of the linguist was to describe this generative, or universal, grammar as rigorously as possible.

So the lines are clearly drawn. For Skinner, language learning is a behavioural phenomenon; for Chomsky, it’s a mental phenomenon. For Skinner, verbal behaviour is the source of learning; for Chomsky, it’s the manifestation of what has been learned. For Skinner, talk of innate knowledge is little short of gibberish; for Chomsky, it’s the best available explanation of the knowledge children have of language.


In SLA Part 1, I described how, under the sway of a behaviourist paradigm, researchers in SLA viewed the learner’s L1 as a source of interference, resulting in errors. In SLA Part 2, I described how, under the new influence of a mentalist paradigm, researchers came to view learners as drawing on their innate language learning capacity to construct their own distinct linguistic system, or interlanguage. The view of learning an L2 changed from one of accumulating new habits while trying to avoid mistakes (which only entrench bad past habits) to one of a cognitive process, in which errors are evidence of the learner’s ‘creative construction’ of the L2. Research into learner errors and into the learning of specific grammatical features gave clear evidence to support the mentalist view. The research showed that all learners, irrespective of their L1, seemed to make the same errors, which in turn supported the view that learners were testing hypotheses about the target language on the basis of their limited experience, and making appropriate adjustments to their developing interlanguage system. Far from being evidence of non-learning, errors were thus clear signs of interlanguage development.

Furthermore, and very importantly in terms of its pedagogic implications, interlanguage development, seen as a kind of built-in syllabus, could be observed following the same route, regardless of differences in the L1 or in the linguistic environment. It was becoming clear that (leaving aside the question of maturational constraints for a moment) learning an L2 involved moving along a universal route which was unaffected by the L1, or by the learning environment – classroom, workplace, home, wherever. Just as importantly, the research showed that L2 learning is not a matter of successively accumulating parts of the language one bit after another. Rather, SLA is a dynamic process involving the gradual development of a complex system. Learners can sometimes take several months to fully acquire a particular feature, and the learning process is anything but linear: it involves slowly and unsystematically moving through a series of transitional stages, including zigzags, U-shaped patterns, stalls, and plateaus, as learners’ interlanguages are constantly adjusted, reformulated, and rebuilt in such a way that they gradually approximate more closely to the target language model.

A picture is thus emerging of SLA as a learning process with two important characteristics.

  1. Knowledge of the L2 develops along a route which is impervious to instruction, and
  2. it develops in a dynamic, nonlinear way, where lots of different parts of the developing system are being worked on at the same time.

As we continue the review, we’ll look at declarative and procedural knowledge, explicit and implicit knowledge, and explicit and implicit learning, and this will indicate the third important characteristic of the SLA process:

  3. Implicit learning is the default mechanism for learning an L2.

We’ll then be in a stronger position to argue that teacher trainers who advise their trainees to devote the majority of classroom time to the explicit teaching of a sequence of formal elements of the L2 are setting those trainees up for failure.


For references, see the “Bibliography” page in the header.

SLA Part 3: From Krashen to Schmidt

Developing a transition theory

What is the process of SLA? How do people get from no knowledge of the L2 to some level of proficiency? Before going on, I should make it clear that I’m only looking at psycholinguistic theories, thus ignoring important social aspects of L2 learning and, even within the realm of cognition, leaving out such factors as aptitude and motivation.

Krashen’s Monitor Model

Krashen’s (1977a, 1977b, 1978, 1981, 1982, 1985) Monitor Model  came hard on the heels of Corder’s work, and contains the following five hypotheses:

The Acquisition-Learning Hypothesis.

Adults have two ways of developing L2 competence:

  1. via acquisition, that is, picking up a language naturally, more or less like children do their L1, by using language for communication. This is a subconscious process and the resulting acquired competence is also subconscious.
  2. via language learning, which is a conscious process and results in formal knowledge of the language.

For Krashen, the two knowledge systems are separate. “Acquired” knowledge is what explains communicative competence. Knowledge gained through “learning” can’t be internalised, and thus serves only the very minor role of acting as a monitor of the acquired system, checking the correctness of utterances against the formal knowledge stored therein.

The Natural Order Hypothesis

The rules of language are acquired in a predictable way, some rules coming early and others late. The order is not determined solely by formal simplicity, and it is independent of the order in which rules are taught in language classes.

The Monitor Hypothesis

The learned system has only one, limited, function: to act as a Monitor.  Further, the Monitor cannot be used unless three conditions are met:

  1. Enough time. “In order to think about and use conscious rules effectively, a second language performer needs to have sufficient time” (Krashen, 1982:12).
  2. Focused on form. “The performer must also be focused on form, or thinking about correctness” (Krashen, 1982: 12).
  3. Knowledge of the rule.

The Input Hypothesis

Second languages are acquired by receiving “comprehensible input”, that is, by understanding language that contains structure “a bit beyond our current level of competence (i + 1)”. “When the input is understood and there is enough of it, i + 1 will be provided automatically. Production ability emerges. It is not taught directly” (Krashen, 1982: 21-22).

The Affective Filter Hypothesis

The Affective Filter is “that part of the internal processing system that subconsciously screens incoming language based on … the learner’s motives, needs, attitudes, and emotional states” (Dulay, Burt, and Krashen, 1982: 46). If the Affective Filter is high (because of lack of motivation, dislike of the L2 culture, or feelings of inadequacy, for example), input is prevented from passing through, and hence there is no acquisition. The Affective Filter is responsible for individual variation in SLA (it is not something children use) and explains why some learners never acquire full competence.


The biggest problem with Krashen’s account is that there is no way of testing the Acquisition-Learning hypothesis: we are given no evidence to support the claim that two distinct systems exist, nor any means of determining whether they are, or are not, separate. Similarly, there is no way of testing the Monitor hypothesis: with no way to determine whether the Monitor is in operation or not, it is impossible to assess the validity of its extremely strong claims. The Input Hypothesis is equally mysterious and incapable of being tested: the levels of knowledge are nowhere defined, so it is impossible to know whether i + 1 is present in input, and, if it is, whether the learner moves on to the next level as a result. Thus, the first three hypotheses make up a circular and vacuous argument: the Monitor accounts for discrepancies in the natural order, the learning-acquisition distinction justifies the use of the Monitor, and so on.

Further, the model lacks explanatory adequacy. At the heart of the model is the Acquisition-Learning Hypothesis, which simply states that L2 competence is picked up through comprehensible input in a staged, systematic way, without giving any explanation of the process by which comprehensible input leads to acquisition. Similarly, we are given no account of how the Affective Filter works, of how input is filtered out by an unmotivated learner.

Finally, Krashen’s use of key terms such as “acquisition” and “learning”, or “subconscious” and “conscious”, is vague, confusing, and not always consistent.

In summary, while the model is broad in scope and intuitively appealing, Krashen’s key terms are ill-defined and circular, so that the set of hypotheses is incoherent. The lack of empirical content in the five hypotheses means that there is no way of testing them. As a theory, it has such serious faults that it is not really a theory at all.

And yet, Krashen’s work has had an enormous influence, and in my opinion, rightly so. While the acquisition/learning distinction is badly defined, it is, nevertheless, absolutely crucial to current attempts to explain SLA; all the subsequent work on implicit and explicit learning, knowledge, and instruction starts here, as does the work on interlanguage development. Since the questions of conscious and unconscious learning and of interlanguage development are the two with the biggest teaching implications, and since I think Krashen was basically right about both issues, I see Krashen’s work as being of enormous and enduring importance.

Processing Approaches

A) McLaughlin: Automaticity and Restructuring

McLaughlin’s (1987) review of Krashen’s Monitor Model is considered one of the most complete rebuttals offered (but see Gregg, 1984). In an attempt to overcome the problems of finding operational definitions for concepts used to describe and explain the SLA process, McLaughlin went on to argue (1990) that the distinction between conscious and unconscious should be abandoned in favour of clearly-defined empirical concepts. McLaughlin replaces the conscious/unconscious dichotomy with the distinction between controlled and automatic processing. Controlled processing requires attention, and humans’ capacity for it is limited; automatic processing does not require attention, and takes up little or no processing capacity. So, McLaughlin argues, the L2 learner begins the process of acquiring a particular aspect of the L2 by relying heavily on controlled processing; then, through practice, the learner’s use of that aspect of the L2 becomes automatic.

McLaughlin uses the twin concepts of Automaticity and Restructuring to describe the cognitive processes involved in SLA. Automaticity develops when an associative connection is established between a certain kind of input and some output pattern. Many typical greetings exchanges illustrate this:

Speaker 1: Morning.

Speaker 2: Morning. How are you?

Speaker 1: Fine, and you?

Speaker 2: Fine.

Since humans have a limited capacity for processing information, automatic routines free up more time for processing new information. The more information that can be handled automatically, the more attentional resources are freed up for new information.  Learning takes place by the transfer of information to long-term memory and is regulated by controlled processes which lay down the stepping stones for automatic processing.
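As an aside for readers who like things concrete, here’s a toy sketch (my own illustration, not McLaughlin’s model – the power-law form and all the numbers are invented assumptions) of how the attentional cost of a routine might fall with practice, freeing capacity for new information:

```python
# Toy illustration of automatisation: the attentional cost of a routine
# falls with practice, freeing capacity for new information.
# The power-law form and all constants are invented assumptions,
# not part of McLaughlin's model.

def processing_cost(trials: int, initial: float = 1.0,
                    floor: float = 0.1, rate: float = 0.5) -> float:
    """Attentional cost of a routine after a number of practice trials."""
    return floor + (initial - floor) * (trials + 1) ** -rate

CAPACITY = 2.0  # fixed attentional capacity (arbitrary units)

for trials in (0, 10, 100, 1000):
    cost = processing_cost(trials)
    print(f"after {trials:>4} trials: cost {cost:.2f}, "
          f"capacity free for new information {CAPACITY - cost:.2f}")
```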

The second concept, restructuring, refers to qualitative changes in the learner’s interlanguage as they move from stage to stage, not to the simple addition of new structural elements.  These restructuring changes are, according to McLaughlin, often reflected in “U-shaped behaviour”, which refers to three stages of linguistic use:

  • Stage 1: correct utterance,
  • Stage 2: deviant utterance,
  • Stage 3: correct usage.

In a study of French L1 speakers learning English, Lightbown (1983) found that, when acquiring the English “ing” form, her subjects passed through the three stages of U-shaped behaviour. Lightbown argued that as the learners, who initially were only presented with the present progressive, took on new information – the present simple – they had to adjust their ideas about the “ing” form. For a while they were confused, and their use of “ing” became less frequent and less correct. The same U-shaped process can be seen with past tense forms, where learners typically move from the correct irregular form (went), through an overgeneralised regular form (goed), and back to the correct form.
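As a rough illustration of what such a U-shaped curve looks like (an invented toy model, not Lightbown’s data – the probabilities and stage boundaries are made up):

```python
# Toy simulation of U-shaped development for an irregular past tense form.
# All probabilities and stage boundaries are invented for illustration;
# this is not Lightbown's (1983) data.

def p_correct(week: int) -> float:
    """Probability of producing 'went' rather than 'goed' at a given week."""
    if week < 4:      # Stage 1: rote-learned 'went', no competing rule
        return 0.9
    elif week < 10:   # Stage 2: '-ed' rule noticed and overgeneralised
        return 0.4
    else:             # Stage 3: rule and exception restructured
        return 0.95

for week in range(0, 14, 2):
    accuracy = p_correct(week)
    print(f"week {week:>2}: {'#' * round(accuracy * 20):<20} {accuracy:.2f}")
```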


McLaughlin suggested getting rid of the unconscious/conscious distinction because it wasn’t properly defined by Krashen, but in doing so he threw the baby out with the bathwater. Furthermore, we have to ask to what extent the terms “controlled processing” and “automatic processing” are any better; after all, measuring the length of time needed to perform a given task is a weak type of measure, and one that does little to solve the problem it raises.

Still, the “U-shaped” nature of staged development has been influential in successive attempts to explain interlanguage development, and we may note that McLaughlin was, with Bialystok, among the first scholars to apply the concepts of computer-based information-processing models from general cognitive psychology to SLA research. Chomsky’s Minimalist Program confirms his commitment to the view that cognition consists in carrying out computations over mental representations. Those adopting a connectionist view, though taking a different view of the mind and how it works, use the same metaphor. Indeed, the basic notion of “input – processing – output” has become an almost unchallenged account of how we think about and react to the world around us. While in my opinion the metaphor can be extremely useful, it is worth making the obvious point that we are not computers. One may well sympathise with Foucault and others who warn us of the blinding power of such metaphors.

Schmidt’s noticing hypothesis

Rather than accept McLaughlin’s advice to abandon the search for a definition of “consciousness”, Schmidt attempts to do away with its “terminological vagueness” by examining it in detail. His work has proved enormously influential, but I think there are serious problems with the “Noticing Hypothesis”, and that it has been widely misinterpreted in order to justify types of explicit instruction that are not actually supported by a more considered view of the evidence. I’ll deal with this in Part 4.

See the “Bibliography” page in the header for all references.

SLA Part 2: Cognitive Theories  


The paradigm shift from behaviourism to “mentalism”, brought about by the publication of Chomsky’s Syntactic Structures in 1957 and the subsequent confrontations with Skinner, meant that language learning was now seen as a psychological process going on in “the mind”, a revived construct which had been proscribed by the behaviourists. SLA scholars turned their attention away from teaching and towards the mental process of learning an L2. We may note that they usually ignored Gregg’s perfectly justifiable demand for a property theory; they worked in different, limited domains; their explanations used confusing, sometimes contradictory constructs; their hypotheses were often difficult, sometimes impossible, to test; their studies got support from inconclusive, insufficient, and flawed research; none of the theories provided a full or satisfactory explanation of SLA; and there was little consensus among researchers about fundamental questions of domain or research methodology. While, 60 years later, I think it’s fair to say that there is still no full or satisfactory explanation of SLA, I think it’s also fair to say that a great deal of progress has been made, and that there are now robust, reliable research findings which, if given the attention they deserve and acted on, would transform ELT practice.

Error Analysis

We may begin with Pit Corder, who, in 1967, argued that errors were neither random nor best explained in terms of interference from the learner’s L1; errors were indications of learners’ attempts to figure out an underlying rule-governed system. Corder distinguished between errors and mistakes: mistakes are slips of the tongue, and not systematic, whereas errors are indications of an as yet non-native-like but nevertheless systematic, rule-based grammar. It’s easy to see such a distinction as reflecting Chomsky’s distinction between performance and competence, and to interpret Corder’s interest in errors as an interest in the development of a learner’s grammar.

But error analysis, by concentrating exclusively on errors, failed to capture the full picture of a learner’s linguistic behaviour. Schachter (1974), in a study of the compositions of Persian, Arabic, Chinese and Japanese learners of English which focused on their use of relative clauses, found that the Persian and Arabic speakers made far more errors, but that the Chinese and Japanese students produced only half as many relative clauses as the Persian and Arabic students did. Schachter then looked at the students’ L1s and found that Persian and Arabic relative clauses are similar to English ones in that the relative clause is placed after the noun it modifies, whereas in Chinese and Japanese the relative clause comes before the noun. She concluded that Chinese and Japanese speakers of English use relative clauses cautiously but accurately because of the difference between the way their L1 and English form relative clauses. While Schachter’s main aim was to challenge the strong claims of the Contrastive Analysis Hypothesis, her study drew attention to the fact that one needs to look at what learners get right as well as what they get wrong.
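Schachter’s point is easy to miss if one only counts errors. The sketch below uses invented figures (they are not Schachter’s data, merely numbers with the same overall shape) to show how an error-only analysis hides avoidance:

```python
# Invented figures illustrating Schachter's avoidance argument;
# these are NOT her data, just numbers with the same overall shape.
groups = {
    # L1 group: (relative clauses produced, errors among them)
    "Persian":  (170, 40),
    "Arabic":   (160, 35),
    "Chinese":  (80, 5),
    "Japanese": (75, 4),
}

for l1, (produced, errors) in groups.items():
    print(f"{l1:<8}: produced {produced:>3}, errors {errors:>2} "
          f"(error rate {errors / produced:.0%})")

# Counting errors alone makes the Persian and Arabic learners look worse;
# counting production as well reveals that the Chinese and Japanese
# learners are avoiding relative clauses - something error analysis,
# which only sees what is produced, cannot detect.
```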


Error analysis had a pedagogical goal: by identifying, classifying, and quantifying errors, remedial work could be planned, based on the kind and frequency of the error. Nevertheless, the seeds of a powerful SLA theory, covering a much wider domain, were planted. Although Corder focused on teaching methodology, the long-term effect of error analysis was to shift SLA research away from teaching and towards studying how L2 learners formulate and internalise a grammar of the L2, on the basis of exposure to the language and some kind of internal processing.


The Morpheme Order Studies

The next development in SLA theory (not strictly the next in chronological terms) was provoked by the morpheme order studies. Dulay and Burt (1975) claimed that fewer than 5% of errors were due to native language interference, and that errors were, as Corder suggested, in some sense systematic: there was something akin to a Language Acquisition Device at work not just in first language acquisition, but also in SLA. Brown’s (1973) morpheme studies in L1 led to studies in L2 by Dulay & Burt (1973, 1974a, 1974b, 1975) and Bailey, Madden & Krashen (1974), all of which suggested that there was a natural order in the acquisition of English morphemes, regardless of L1. This became known as the L1 = L2 Hypothesis, and further studies (by Ravem (1974), Cazden, Cancino, Rosansky & Schumann (1975), Hakuta (1976), and Wode (1978), cited in Larsen-Freeman and Long, 1991) all pointed to systematic staged development in SLA.

Some of these studies, particularly those of Dulay and Burt, and of Bailey, Madden and Krashen, were soon challenged.  Among the objections to the findings were:

  • The Bilingual Syntax Measure was claimed to have skewed results – it was suggested that any group of learners taking the test would have produced similar results.
  • The category “non-Spanish” was too wide.
  • Morphemes of different meanings were categorised together, e.g., the English article system.
  • Accuracy orders do not necessarily reflect developmental sequences. The total picture of a learner’s use of a form was not taken into account.
  • The type of data elicited was “forced”.

After the original studies, over fifty new L2 morpheme studies were carried out, many using more sophisticated data collection and analysis procedures (including an analysis of the subjects’ performance in supplying morphemes in non-obligatory, as well as obligatory, contexts), and the results of these studies went some way to restoring confidence in the earlier findings (Larsen-Freeman and Long, 1991: 91).
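For readers unfamiliar with the methodology: accuracy in these studies was typically scored as suppliance in obligatory contexts, and morphemes were then ranked by score. Here is a minimal sketch of the scoring logic (the morphemes and counts below are invented for illustration):

```python
# Minimal sketch of suppliance-in-obligatory-contexts scoring, the accuracy
# measure behind the morpheme order studies. The counts are invented.

data = {
    # morpheme: (times supplied in obligatory contexts, obligatory contexts)
    "-ing":           (45, 50),
    "plural -s":      (40, 50),
    "irregular past": (25, 50),
    "3rd person -s":  (10, 50),
}

scores = {m: supplied / contexts for m, (supplied, contexts) in data.items()}

# Rank morphemes by accuracy to obtain an "accuracy order".
for rank, (morpheme, score) in enumerate(
        sorted(scores.items(), key=lambda kv: kv[1], reverse=True), start=1):
    print(f"{rank}. {morpheme:<14} {score:.0%}")

# Caveat (as noted in the objections above): an accuracy order at one
# point in time does not necessarily reflect a developmental sequence.
```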


For all its imperfections, this research marked a decisive turning-point in the development of SLA theory, because it focused attention on learners’ staged development in the L2.  Even so, it was a modest start: the morpheme studies left most questions unanswered, since even if English morphemes are acquired in a predictable order, this doesn’t mean that all acquisition takes place in a predictable order. As Gregg (1984) pointed out, the morpheme studies lacked any explanation of why this “natural order” was systematic.







Base Camp: Interlanguages Identified 

The emerging cognitive paradigm of language learning received perhaps its fullest expression in Selinker’s (1972) paper, which argues that L2 learners develop their own autonomous mental grammar (an interlanguage (IL) grammar) with its own internal organising principles.

Question forms provided one of the first well-documented developmental sequences of this interlanguage. In a study of six Spanish L1 students over a 10-month period, Cazden, Cancino, Rosansky and Schumann (1975) found that the participants produced interrogative forms in a predictable sequence:

  1. Rising intonation (e.g., He works today?),
  2. Uninverted WH (e.g., What he (is) saying?),
  3. “Overinversion” (e.g., Do you know where is it?),
  4. Differentiation (e.g., Does she like where she lives?).

Negation was the next example, reported in Larsen-Freeman and Long (1991: 94), where learners from a variety of different L1 backgrounds were seen to go through the same four stages in acquiring English negation:

  1. External (e.g., No this one./No you playing here),
  2. Internal, pre-verbal (e.g., Juana no/don’t have job),
  3. Auxiliary + negative (e.g., I can’t play the guitar),
  4. Analysed don’t (e.g., She doesn’t drink alcohol.)
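
Researchers identify such sequences by coding learner utterances against stage criteria. The sketch below gives a crude flavour of the idea; the pattern-matching heuristics are my own hypothetical simplifications, far cruder than any real coding scheme:

```python
import re

# Hypothetical, deliberately crude heuristics for assigning an utterance
# to one of the four attested ESL negation stages. Real coding schemes
# use context and much finer criteria; this sketch just shows the idea.

def negation_stage(utterance: str) -> int | None:
    u = utterance.lower().strip().rstrip(".!?")
    if u.startswith("no "):                            # Stage 1: external 'no'
        return 1
    if re.search(r"\b(can|could|won|wouldn|isn|aren)'t\b", u):
        return 3                                       # Stage 3: aux + negative
    if re.search(r"\b(doesn't|didn't)\b", u):
        return 4                                       # Stage 4: analysed 'do'
    if re.search(r"\b(no|don't)\b", u):
        return 2                                       # Stage 2: pre-verbal
    return None                                        # no negation found

for example in ("No you playing here.", "Juana don't have job.",
                "I can't play the guitar.", "She doesn't drink alcohol."):
    print(f"stage {negation_stage(example)}: {example}")
```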


Attention is now overtly focused on the phenomena of staged development and systematicity in L2 learning; the work attempts to explain the process of SLA and to suggest what mechanisms are involved. To the extent that such studies can be taken as support for the view that interlanguage development is at least in part explained by language universals, they can be seen as related to UG theory, but we must stress the distance between the two types of analysis. Interlanguage is not seen in terms of principles and parameters; it is concerned, among other things, with surface grammar, with processing, and with skills acquisition.

It’s important to note that researchers at this time agreed on the much-commented-on phenomenon of “incompleteness”: most L2 learners’ ultimate level of L2 attainment falls well short of native-like competence. Here again, the difference between L1 acquisition and SLA, and the resulting differences in the appropriate approaches to research and explanation in the two fields, is clear. Selinker and Lamendella (1978) suggested that both internal factors (e.g., age, lack of desire to acculturate) and external factors (e.g., communicative pressure, lack of input and opportunities to practice, lack of negative feedback) are at work, but the precise role these factors play, and how they interact, was not explained.

So, there are signs of progress. As work progresses, problems of defining key terms and constructs (particularly the implicit/explicit, procedural/declarative, and automatic/controlled dichotomies) are worked on; and, as more stages in interlanguage development are identified (questions, negation, word order, embedded clauses and pronouns being the most important areas – see Braidi, 1999), things get at once clearer and more complex: the dynamic nature of SLA means that differentiating between stages is difficult, the stages overlap, and there are variations within stages, as McLaughlin’s theory (see Part 3) suggests.


References: see the “Bibliography” page in the header.

SLA Part 1: Contrastive Analysis

This is Question 2 of the 5 that I think teacher trainers should answer. Here, I look at Contrastive Analysis. I take the terms “learning an L2” and “SLA” to refer to the same thing.

Contrastive Analysis concentrated on the role of the “native language”, and suggested that language transfer was the key to explaining SLA.  The Contrastive Analysis Hypothesis (CAH) was founded on

  • structural linguistics, which was a corpus-based descriptive approach, providing detailed linguistic descriptions of a particular language, and
  • behavioural psychology, which held that learning was establishing a set of habits.

Lado (1957), following the behaviourist argument, assumed that learning a language was like learning anything else, and that, in line with this general learning theory, learning task A will affect the subsequent learning of task B. Consequently, SLA is crucially affected by prior learning of the L1. If acquisition of the L1 involved the formation of a set of habits, then the same process must also be involved in SLA, with the difference that some of the habits appropriate to the L2 will already have been acquired, while other habits will need to be modified, and still others will have to be learned from scratch. Lado went on to suggest that there were two types of language transfer: positive transfer (facilitation) and negative transfer (interference). There were, in turn, two types of interference: retroactive inhibition, where learning acts back on previously learned material (language loss), and proactive inhibition, where a series of responses already learned tends to appear in situations where a new series of responses is needed.

To summarise, the CAH claimed that language learning is habit formation and SLA involves establishing a new set of habits.  By considering the main differences between L1 and L2, one can anticipate the errors learners will make when learning an L2: errors indicate differences and these differences have to be learned.


The CAH was immediately challenged by evidence from studies which showed that errors occurred where contrastive analysis predicted none, and failed to occur where it predicted them. Initial studies led to subsequent work, and lots more counterevidence. For example, Zobl (1980) found that while English-speaking learners of French negatively transferred English postverbal pronoun placement to produce ungrammatical utterances such as Le chien a mangé les (for Le chien les a mangés), French-speaking learners of English did not make the corresponding errors, even though French has preverbal object pronouns. This is a case of a one-way learning difficulty. Furthermore, not all areas of similarity between an L1 and an L2 lead to positive transfer. Odlin (1989), for example, reported that although Spanish has a copula verb similar to English be in sentences like That’s very simple or The picture’s very dark, Spanish-speaking learners of L2 English usually omit the copula in the early stages of acquisition, saying That very simple and The picture very dark.

More systematic classification of learners’ errors suggested that only a small percentage of them could be attributed to contrasting properties of L1 and L2. Lococo (1975), for example, found that in the corpus she examined only 25% of errors resulted from L1/L2 contrast, and Dulay and Burt’s (1975) study claimed that only 5% of errors were thus accounted for. The Dulay and Burt study was subsequently seriously questioned (see Ellis, 1993: 45), but later morpheme studies did much to restore the credibility of the underlying argument.


Contrastive analysis has a lot to recommend it.  As a theory of SLA, the following points can be made about the CAH:

  • It is a coherent and cohesive consequence of a general theory of learning.
  • It embraces a well-developed theory of languages. Just as learning is seen in behaviourist terms, so languages are seen from a well-defined structuralist viewpoint: languages are studied in the true Baconian, “botanist” tradition, and it is the careful description and analysis of their differences which is the researcher’s main concern.
  • It occupies a limited domain, dealing almost exclusively with the phenomenon of transfer of properties of the L1 grammar into the L2 grammar.
  • It is a testable hypothesis: empirical evidence can support or challenge it and research studies can be replicated. The research methods can be scrutinised and improved.
  • It is extremely economical in its use of key terms and constructs.

It may also be noted that there are crystal-clear pedagogical implications. Contrastive Analysis indicates what particular habits have to be learned, and pedagogical practice – the audio-lingual method (speech is primary, and is learned through drills and practice) – fits perfectly with the theory of SLA.  One would venture to say that this was no coincidence, that the agenda of the early SLA researchers was clearly focused on pedagogical concerns.

The fundamental difficulty of the theory lies in its underlying behaviouristic theory of learning, according to which all learning is repetition, a question of habit-formation. Such a view of learning adopts an empiricist/positivist epistemology which denies the validity of the mind as a construct, and with it the possibility of causal explanations. Following the disastrous trajectory of the logical positivists in the 1930s, behaviourism and positivism were almost universally rejected, although, more recently, connectionist and emergentist approaches to learning seem to herald a return to behaviourism. Of course, a sleight of hand is required in order to allow for the use of theoretical constructs and for the inference to a causal explanation of L2 learning.

As we’ll see, the shift away from behaviourism meant saying farewell to a very comfortable state of affairs in ELT. Before the shift, learning a second language was explained in terms of a general learning theory, and there was no doubt as to the practical applications of that theory: you learn the L2 in the same way as you learned the L1, and in the same way as you learn anything else, by forming stimulus-response behaviour patterns.

It is instructive to see what happened to the CAH.  While the strong claims of the CAH have been refuted by research findings, there has rarely been any doubt that the L1 does indeed affect SLA.  Later studies concentrated on when and how the L1 influenced SLA.

Regarding “When”, Wode (1978) suggested that it is the similarities, not the differences, between L1 and L2 which cause the biggest problems, and Zobl (1982) proposed that “markedness” constrains L1 transfer.  Zobl argued that linguistically unmarked L1 features will transfer, but linguistically marked features will not, where markedness is measured in terms of infrequency or departure from something basic or typical in a language.

Regarding “How”, Zobl identified two patterns of L1 influence on SLA: (a) the pace at which a developmental stage is traversed (where the L1 can inhibit or accelerate the process), and (b) the number of developmental structures in a stage. Larsen-Freeman and Long, in their discussion of markedness, conclude:

When L1 transfer occurs, it generally does so in harmony with developmental processes, modifying learners’ encounters with interlanguage sequences rather than altering them in fundamental ways. (Larsen-Freeman and Long, 1991: 106)

Today, L1 transfer can be seen as playing an important part in all of the most interesting current views of SLA, including Nick Ellis’ and Mike Long’s agreement that one of the biggest tasks facing adult L2 learners is that of “re-setting the dial”. We’ll come to that.

References can be found by clicking on the “Bibliography for Theory Construction in SLA” menu in the Header.

Coursebooks and the commodification of ELT

(Note: This is a copy of a post from my CriticELT blog. I think it’s relevant to the subject of teacher trainers because, with the notable exception of Scott Thornbury and Luke Meddings, leading teacher trainers, and TD SIGs, work on the assumption that modern coursebooks are an essential tool for current ELT practice.)


Coursebooks embody a synthetic approach to syllabus design. Wilkins (1976) distinguished between a ‘synthetic’ approach, where items of language are presented one by one in a linear sequence to the learner, whose job is to build up, or ‘synthesize’, the knowledge incrementally, and an ‘analytic’ approach, where the learner does the ‘analysis’, i.e. ‘works out’ the system, through engagement with natural language data. Coursebook writers take the target language (the L2) as the object of instruction, and they divide the language up into bits of one kind or another – words, collocations, grammar rules, sentence patterns, notions and functions, for example – which are presented and practiced in a sequence. The criteria for sequencing can be things like valency, criticality, frequency, or saliency, but the most common criterion is ‘level of difficulty’, which is intuitively defined by the writers themselves.

The approach is thus based on taking incremental steps towards proficiency; “items”, “entities”, sliced-up bits of the target language are somehow accumulated through a process of presentation, practice and re-cycling, and communicative competence is the result.
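To make the synthetic logic concrete, here is a toy sketch of a synthetic syllabus as a data structure: discrete items ordered by a single criterion. The items and the ‘difficulty’ ratings are invented; as noted above, real coursebook sequencing is usually a matter of the writers’ intuition:

```python
# Toy sketch of a synthetic syllabus as a data structure: discrete items
# ordered by a single criterion. Items and 'difficulty' ratings invented.

items = [
    ("present simple", 1),
    ("past simple", 2),
    ("present perfect", 4),
    ("relative clauses", 5),
    ("third conditional", 8),
]

# One item per unit, easiest first: the learner is assumed (wrongly,
# as argued below) to accumulate the bits in exactly this order.
for unit, (structure, difficulty) in enumerate(sorted(items, key=lambda i: i[1]),
                                               start=1):
    print(f"Unit {unit}: {structure} (difficulty {difficulty})")
```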


Different coursebooks claim to use different types of syllabus – grammatical, lexical, or notional-functional, for example – but they’re all synthetic syllabuses with the same features described above, and they all give pride of place to explicit teaching and learning. The syllabus is delivered by the teacher, who first presents the bits of the L2 chosen by the coursebook writers (in written and spoken texts, grammar boxes, vocabulary lists, diagrams, pictures, and so on), and then leads students through a series of activities aimed at practicing the language, like drills, written exercises, discussions, games, tasks and practice of the four skills.

Among the coursebooks currently on sale from UK and US publishers, and used around the world, are the following:

Headway; English File; Network; Cutting Edge; Language Leader; English in Common; Speakout; Touchstone; Interchange; Mosaic; Inside Out; Outcomes.

Each of these titles consists of a series of five or six books aimed at different levels, from beginner to advanced, and offers a Student’s Book, a Teacher’s Book and a Workbook, plus other materials such as video and on-line resources. Each Student’s Book at each level is divided into a number of units, and each unit consists of a number of activities which teachers lead students through. The Student’s Book is designed to be used systematically from start to finish – not just dipped into wherever the teacher fancies. The different activities are designed to be done one after the other, so that Activity 1 leads into Activity 2, and so on. Two examples follow.

In New Headway, Pre-Intermediate, Unit 3, we see this progression of activities:

  1. Grammar (Past tense) leads into (->)
  2. Reading Text (Travel) ->
  3. Listening (based on reading text) ->
  4. Reading (Travel) ->
  5. Grammar – (Past tense) ->
  6. Pronunciation ->
  7. Listening (based on Pron. activity) ->
  8. Discussing Grammar ->
  9. Speaking (A game & News items) ->
  10. Listening & Speaking (News) ->
  11. Dictation (from listening) ->
  12. Project (News story) ->
  13. Reading and Speaking (About the news) ->
  14. Vocabulary (Adverbs) ->
  15. Listening (Adverbs) ->
  16. Grammar (Word order) ->
  17. Everyday English (Time expressions)

And if we look at Outcomes Intermediate, Unit 2, we see this:

  1. Vocab. (feelings) ->
  2. Grammar (be, feel, look, seem, sound + adj.) ->
  3. Listening (How do they feel?) ->
  4. Developing Conversations (Response expressions) ->
  5. Speaking (Talking about problems) ->
  6. Pronunciation (Rising & falling stress) ->
  7. Conversation Practice (Good / bad news) ->
  8. Speaking (Physical greetings) ->
  9. Reading (The man who hugged) ->
  10. Vocabulary (Adj. Collocations) ->
  11. Grammar (ing and ed adjs.) ->
  12. Speaking (based on reading text) ->
  13. Grammar (Present tenses) ->
  14. Listening (Shopping) ->
  15. Grammar (Present cont.) ->
  16. Developing conversations (Excuses) ->
  17. Speaking (Ideas of heaven and hell).

All the other coursebooks mentioned are similar in that they consist of a number of units, each of them containing activities involving the presentation and practice of target versions of L2 structures, vocabulary, collocations, functions, etc., using the 4 skills. All of them assume that the teacher will lead students through each unit and do the succession of activities in the order that they’re set out. And all of them wrongly assume that if learners are exposed to selected bits of the L2 in this way, one bit at a time in a pre-determined sequence, then, after enough practice, the new bits, one by one, in the same sequence, will become part of the learners’ growing L2 competence. This false assumption flows from a skill-based view of second-language acquisition, which sees language learning as the same as learning any other skill, such as driving a car or playing the piano.

Skills-based theories of SLA

The best known of these theories is John Anderson’s (1983) ‘Adaptive Control of Thought’ model, which makes a distinction between declarative knowledge – conscious knowledge of facts – and procedural knowledge – unconscious knowledge of how an activity is done. When applied to second language learning, the model suggests that learners are first presented with information about the L2 (declarative knowledge) and that then, via practice, this is converted into unconscious knowledge of how to use the L2 (procedural knowledge). The learner moves from controlled to automatic processing, and, through intensive linguistically focused rehearsal, achieves increasingly faster access to, and more fluent control over, the L2 (see DeKeyser, 2007, for example).

The fact that nearly everybody successfully learns at least one language as a child without starting with declarative knowledge, and that millions of people learn additional languages without studying them (migrant workers, for example), might make one doubt that learning a language is the same as learning a skill such as driving a car. Furthermore, the phenomenon of L1 transfer doesn’t fit well with a skills based approach, and neither do putative critical periods for language learning. But the main reason for rejecting such an approach is that it contradicts SLA research findings related to interlanguage development.

Firstly, it doesn’t make sense to present grammatical constructions one by one in isolation because most of them are inextricably inter-related. As Long (2015) says:

Producing English sentences with target-like negation, for example, requires control of word order, tense, and auxiliaries, in addition to knowing where the negator is placed. Learners cannot produce even simple utterances like “John didn’t buy the car” accurately without all of those. It is not surprising, therefore, that Interlanguage development of individual structures has very rarely been found to be sudden, categorical, or linear, with learners achieving native-like ability with structures one at a time, while making no progress with others. Interlanguage development just does not work like that. Accuracy in a given grammatical domain typically progresses in a zigzag fashion, with backsliding, occasional U-shaped behavior, over-suppliance and under-suppliance of target forms, flooding and bleeding of a grammatical domain (Huebner 1983), and considerable synchronic variation, volatility (Long 2003a), and diachronic variation.


Secondly, research has shown that L2 learners follow their own developmental route, a series of interlocking linguistic systems called “interlanguages”. Myles (2013) states that the route of interlanguage (IL) development is one of the best documented findings of SLA research of the past few decades. She asserts that the route is “highly systematic” and that it “remains largely independent of both the learner’s mother tongue and the context of learning (e.g. whether instructed in a classroom or acquired naturally by exposure)”. The claim that instruction can influence the rate but not the route of IL development is probably the most widely accepted claim among SLA scholars today.

Selinker (1972) introduced the construct of interlanguage to explain learners’ transitional versions of the L2. Studies show that interlanguages exhibit common patterns and features, and that learners pass through well-attested developmental sequences on their way to different end-state proficiency levels. Examples of such sequences are found in the morpheme studies, the four-stage sequence for ESL negation, the six-stage sequence for English relative clauses, and the sequence of question formation in German (see Hong and Tarone, 2016, for a review). Regardless of the order or manner in which target-language structures are presented in coursebooks, learners analyse input and create their own interim grammars, slowly mastering the L2 in roughly the same manner and order. The acquisition sequences displayed in interlanguage development don’t reflect the sequences found in any of the coursebooks mentioned; on the contrary, they prove to be impervious to coursebooks, as they are to different classroom methodologies, and even to whether learners attend classroom-based courses or not.

Note that interlanguage development refers not just to grammar; pronunciation, vocabulary, formulaic chunks, collocations and sentence patterns are all part of the development process. To take just one example, U-shaped learning curves can be observed in learning the lexicon. Learners have to master the idiosyncratic nature of words, not just their canonical meaning. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. Only by passing through a period of incorrectness, in which the word is used in a variety of ways, can they climb back up the U-shaped curve.

Interlanguage development takes place in line with what Corder (1967) referred to as the internal “learner syllabus”, not the external syllabus embodied in coursebooks. Students don’t learn different bits of the L2 when and how a coursebook says they should, but only when they are developmentally ready to do so. As Pienemann (e.g. 1987) demonstrates, learnability (i.e., what learners can process at any one time) determines teachability (i.e., what can be taught at any one time). Coursebooks flout the learnability and teachability conditions; they don’t respect the learner’s internal syllabus.
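Pienemann’s teachability claim can be rendered as a toy model (my own sketch, with invented stage numbers – not Pienemann’s actual processability hierarchy): instruction on a form ‘takes’ only when the form sits at the learner’s next developmental stage:

```python
# Toy rendering of the teachability idea: a learner at stage n can only
# acquire forms at stage n + 1; teaching anything further ahead fails.
# The stage numbers are illustrative, not Pienemann's actual hierarchy.

def teach(learner_stage: int, target_stage: int) -> int:
    """Return the learner's stage after instruction on a target-stage form."""
    if target_stage == learner_stage + 1:  # learnable, hence teachable
        return target_stage
    return learner_stage                   # instruction has no effect

stage = 2
for target in (5, 3, 4):                   # a coursebook-style sequence
    new_stage = teach(stage, target)
    outcome = "acquired" if new_stage > stage else "not learnable yet"
    print(f"teaching a stage-{target} form to a stage-{stage} learner: {outcome}")
    stage = new_stage
```

Note what the trace shows: teaching a stage-5 form to a stage-2 learner achieves nothing, however well it is presented and practiced, while the same learner later acquires stage-3 and stage-4 forms once they become learnable. That is the sense in which learnability constrains teachability.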

False Assumptions made by Coursebooks

To summarise the above, we may list the three false assumptions made by coursebooks.

Assumption 1: In SLA, declarative knowledge converts to procedural knowledge. Wrong! No such simple conversion occurs. Knowing that the past tense of has is had, and then doing some controlled practice, does not lead to fluent and correct use of had in real-time communication.

Assumption 2: SLA is a process of accumulating structural items, mastered one by one. Wrong! All the items are inextricably inter-related. As Long (2015, 67) says:

The assumption that learners can move from zero knowledge to mastery of negation, the present tense, subject- verb agreement, conditionals, relative clauses, or whatever, one at a time, and move on to the next item in the list, is a fantasy.

Assumption 3: Learners learn what they’re taught when they’re taught it. Wrong – as every teacher knows! Pienemann (1987) has demonstrated that teachability is constrained by learnability.

Objections to Coursebooks

  1. As the section on interlanguage above indicates, presenting and practicing a pre-set series of linguistic forms (pronunciation contrasts, grammatical structures, notions, functions, lexical items, collocations, etc.) simply does not work, unless a form coincidentally happens to be learnable (by some students in a class), and so teachable, at the time it is presented.
  2. The approach is counterproductive: both teachers and students feel frustrated by the constant mismatch between teaching and learning.
  3. The cutting up of language into manageable pieces (or “McNuggets” as Thornbury (2014) calls them) often results in impoverished input and output opportunities.
  4. Both the content and methodology of the course are externally pre-determined and imposed. This point will be developed below.
  5. Coursebooks pervade the ELT industry and stunt the growth of innovation and teacher training. The publishing companies that produce coursebooks also produce exams, teacher training courses and everything else connected to ELT; Pearson’s GSE initiative is a good example. Publishing companies spend tens of millions of dollars on marketing, aimed at persuading stakeholders that coursebooks represent the best practical way to manage ELT. Pearson is one example; another is the British ELT establishment, where key players – the British Council, the Cambridge examination boards, and the Cambridge CELTA and DELTA teacher training bodies among them – accept the coursebook as central to ELT practice. TESOL and IATEFL, bodies that are supposed to represent teachers’ interests, have also succumbed to the influence of the big publishers, as their annual conferences make clear. So the coursebook rules, at the expense of teachers, of good educational practice, and of language learners.

Coursebooks represent the commodification of ELT. Grammar, vocabulary, lexical chunks, discourse – the whole messy, chaotic stuff of language – is neatly packaged into items, granules, chunks, served up in sanitised short texts and summarised in lists and tables. Communicative competence itself, as Leung (cited in Thornbury 2014) points out, is turned into “inert and decomposed knowledge”, and language teaching is increasingly prepackaged and delivered as if it were a standardised, marketable product. ELT becomes just another market transaction; in this case between de-skilled teachers, who pass on a set of standardised, testable knowledge and skills, and learners, who have been reconfigured as consumers.

An Alternative: The Analytic or Process Syllabus

An analytic syllabus rejects the method of cutting up a language into manageable pieces, and instead organises the syllabus according to the needs of the learners and the kinds of language performance that are necessary to meet those needs. “Analytic” refers not to what the syllabus designer does, but to what learners are invited to do. Grammar isn’t “taught” as such; rather learners are provided with opportunities to engage in meaningful communication on the assumption that they will slowly analyse and induce language rules, by exposure to the language and by the teacher providing scaffolding, feedback, and information about the language.

Breen’s (1987) distinction between product and process syllabuses contrasts the focus on content and the pre-specification of linguistic or skill objectives with a “natural growth” approach, which aims to expose the learners to real-life communication without any pre-selection or arrangement of items.

A process approach focuses on how the language is to be learned. There is no pre-selection or arrangement of items; the syllabus is negotiated between learners and teacher as joint decision makers, and emphasises the process of learning rather than the subject matter. No coursebook is used. The teacher implements the evolving syllabus in consultation with the students who participate in decision-making about course objectives, content, activities and assessment.


Hugh Dellar has made a number of attempts to defend coursebooks, and here are some examples of what he’s said:

  • “Attempts to talk about coursebook use as one unified thing that we all understand and recognise are incredibly myopic. Coursebooks differ greatly in terms of the way they frame the world and in terms of the questions and positions they expect or allow students to take towards these representations. …. So hopefully it’s clear that far from being one homogenous unified mass of media, coursebooks are wildly heterogeneous in both their world views and their presentations of language.”
  • “Teachers mediate coursebooks”.
  • “The kind of broad brush smearing of coursebooks you’re engaging in does those teachers a profound disservice as it’s essentially denying the possibility of them still being excellent practitioners. I’d also suggest that grammar DOES still seem to be the primary – though not the only – thing that the vast majority of teachers around the world expect and demand from material, whether you like it or not (and I don’t, personally, but there you go. We live in an imperfect world). To pretend this isn’t the case or to denigrate all those who believe this is [to] wipe out a huge swathe of the teaching profession and preach mainly to the converted.”
  • Teachers in very poor parts of the world would just love to have coursebooks.
  • Coursebooks are based on the presentation and practice of discrete bits of grammar because that’s what teachers want.
  • Coursebooks help teachers do their jobs.
  • Coursebooks save time on lesson preparation.
  • Coursebooks meet student and parental expectations.

These remarks are echoed by others (e.g. Harmer, Scrivener, Prodromou, Ur, Lansford, Walter), and can be summed up by the following:

  1. Coursebooks are not all the same.
  2. Teachers adapt, modify and supplement them.
  3. They’re convenient.
  4. They give continuity and direction to a language course.

I accept that some coursebooks don’t follow the synthetic syllabus I describe, but these are the exceptions. All the coursebooks I list at the start of this article – and, I’d say, those that make up 90% of the total sales of coursebooks worldwide – use a synthetic syllabus and make the three false assumptions I describe; that includes Dellar’s own Outcomes. All the stuff about coursebooks differing greatly “in terms of the way they frame the world and in terms of the questions and positions they expect or allow students to take towards these representations” has absolutely no relevance to the arguments made against them.

As for teachers adapting, modifying and supplementing coursebooks, the question is to what extent they do so. If they do so to a great extent, then the coursebook no longer serves as the syllabus, the main point of having a coursebook is contradicted, and one wonders how teachers can justify getting their students to buy a book that is only used, let’s say, 30% of the time. If they only modify and supplement to a small extent, then the coursebook drives the course, learners are led through a pre-determined series of steps, and my argument applies. The most important thing to note is that what teachers actually do is ameliorate coursebooks; they make them less terrible, more bearable, in dozens of different clever and inventive ways. But this, of course, is no argument in favour of the coursebook itself; indeed, to the extent that students learn, it will be more despite than because of the damn coursebook.

Which brings us to the claim that the coursebook is convenient, time-saving, etc. Even if it’s true (which it won’t be if you spend lots of time adapting, modifying and supplementing – i.e. ameliorating – it), the trouble is, it doesn’t work: students don’t learn what they’re taught. And that applies to the other arguments used to defend coursebooks, such as that parents expect their kids to use them, or that they give direction to the course, and so on: such arguments simply ignore the evidence that students do not, indeed cannot, learn in the way coursebooks assume.

Thus, the points above fail to address the main criticisms levelled against coursebooks, which are that they fly in the face of robust research findings and that they deprive teachers and learners of control of the learning process, leading to a lose-lose classroom environment. In order to reply to these arguments, those wishing to defend coursebooks must first confront the three false assumptions on which coursebook use is based (i.e. they must confront the evidence of how SLA actually happens) and they must then argue the case for dictating what is learned. That coursebooks are the dream of teachers working in Ethiopia; that coursebooks are cherished by millions of teachers who just really love them; that the Headway team have succeeded in keeping their products fresh and lively; that Outcomes includes recordings of people who don’t have RP accents; that coursebooks are mediated by teachers; that coursebooks are here to stay, so get real and get used to it; none of these statements does anything to answer the case against them, and none carries any weight for those who wish to base their teaching practice on critical thinking and rational argument. No matter how “different” coursebooks are, or how flexibly they can be used, coursebooks rely on false assumptions about L2 learning, and impose a syllabus on learners who are largely excluded from decisions about what and how they learn.

Managing a process syllabus is no more difficult than mastering the complexities of a modern coursebook. All you need to get started is a materials bank and a crystal-clear explanation of roles and procedures. Part 2 of Breen (1987) provides a framework; the collection of articles edited by Breen and Littlejohn (2000) has at least five really helpful “road maps”; Meddings and Thornbury (2009) give a detailed account of their approach in their excellent book Teaching Unplugged; and I outline a process syllabus on my blog. As befits an approach based on libertarian, co-operative educational principles, a process syllabus is best seen in local rather than global settings. If the managers of local ELT centres have the will to break the grip of the coursebook, they only have to make a small initial investment in local training and materials, and then support teachers in their efforts to involve their students in the new venture. I dare say that such efforts will transform the learning experience of everybody involved.


Coursebooks oblige teachers to work within a framework where students are presented with and then practice dislocated bits of English in a sequence which is pre-determined and externally imposed on them by coursebook writers. Most teachers have little say in the syllabus design which shapes their work, and their students have even less say in what and how they’re taught. Furthermore, results of coursebook-based teaching are bad; most learners don’t reach the level they aim for, and most don’t reach the level of proficiency the coursebook promises (English Proficiency Index, 2015). At the same time, alternatives to coursebook-driven ELT which are much more attuned to what we know about psycholinguistic, cognitive, and socio-educational principles for good language teaching don’t get the exposure or the fair critical evaluation that they deserve.

Despite flying in the face of what we know about L2 learning, despite denying teachers and learners a decision-making voice, and despite poor results, the coursebook dominates current ELT practice to an alarming extent. The main pillars of the ELT establishment, from teacher organisations like TESOL and IATEFL, through bodies like the British Council and examination boards like Cambridge English Language Assessment and TOEFL, to the teacher training certification bodies like Cambridge and Trinity, all support the use of coursebooks.

The increasing domination of coursebooks in a global ELT industry worth close to $200 billion (Pearson, 2016) means that they’re not just a symptom but a major cause of the current lose-lose situation we find ourselves in, where both teachers and learners are restrained and restricted by the demonstrably faulty methodological principles which coursebooks embody. I think we have a responsibility to raise awareness of the damage that coursebooks are doing, and to fight against the suffocating effects of continued coursebook consumption.


Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Breen, M.P. (1987) Contemporary Paradigms in Syllabus Design. Part I. Language Teaching, 20, pp 81-92.

Breen, M.P. (1987) Contemporary Paradigms in Syllabus Design. Part II. Language Teaching, 20 (3).

Breen, M.P. and Littlejohn, A. (2000) Classroom Decision Making: Negotiation and Process Syllabuses in Practice. Cambridge: CUP.

English Proficiency Index (2015) Accessed from http://www.ef.edu/epi/ 9th November, 2015.

Hong, Z. and Tarone, E. (Eds.) (2016) Interlanguage: Forty Years Later. Amsterdam: Benjamins.

Long, M.H. (2011) “Language Teaching”. In Doughty, C. and Long, M. (Eds.) Handbook of Language Teaching. New York: Routledge.

Long, M.H. (2015) SLA and Task-Based Language Teaching. New York: Routledge.

Long, M.H. & Crookes, G. (1993). Units of analysis in syllabus design: the case for the task. In G. Crookes & S.M. Gass (Eds.). Tasks in a Pedagogical Context. Cleveland, UK: Multilingual Matters. 9-44.

Meddings, L. and Thornbury, S. (2009) Teaching Unplugged. Delta.

Mitchell, R. and Myles, F. (2004)  Second Language Learning Theories.  London: Arnold.

Myles, F. (2013): Theoretical approaches to second language acquisition research. In Herschensohn, J. & Young-Scholten, M. (Eds.) The Cambridge Handbook of Second Language Acquisition. CUP

Ortega, L. (2009) Sequences and Processes in Language Learning. In Long and Doughty Handbook of Language Teaching. Oxford, Wiley.

Pearson (2016) GSE Global Report. Retrieved from https://www.english.com/blog/global-framework-raising-standards 5/12/2016.

Pienemann, M. (1987) Psychological constraints on the teachability of languages. In C. Pfaff (Ed.) First and Second Language Acquisition Processes. Rowley, MA: Newbury House. 143-168.

Rea-Dickins, P. M. (2001) Mirror, mirror on the wall: identifying processes of classroom assessment. Language Testing 18 (4), p. 429 – 462.

Selinker, L. (1972) Interlanguage. International Review of Applied Linguistics 10, 209-231.

Statista (2015) Publisher sales of ELT books in the United Kingdom from 2009 to 2013. Accessed from http://www.statista.com/statistics/306985/total-publisher-sales-of-elt-books-in-the-uk/ 9th November, 2015.

Thornbury, S. (2014) Who ordered the McNuggets? Accessed from http://eltjam.com/who-ordered-the-mcnuggets/ 9th November, 2015.

Walkley, A. and Dellar, H. (2015) Outcomes: Intermediate. National Geographic Learning.

Wilkins, D. (1976) Notional Syllabuses: A Taxonomy and its Relevance to Foreign Language Curriculum Development. London: Oxford University Press.