SLA Part 9: Emergentism

(NOTE: I’m not at all “happy” with this post, but I’ll leave it here till I’ve finished other new posts. See Empiricist Emergentist for a more up to date view.)

Up to now, I’ve been reviewing theories of SLA that try to explain the psychological processes involved in learning an L2: what goes on between our ears, you could say. In a recent tweet, David Deubelbeiss, in reference to my review, debunked what he referred to as “goofy black box ideas like a special lang. processing device in the brain”. This was enough to provoke a rude comment from me, but of course, David is simply expressing the increasingly widely held view that Chomsky’s time is up and it’s about time we left behind all this “rubbish” about black boxes. There is undoubtedly something unsatisfactory about an appeal to a black box, but, to be fair, Chomsky’s UG theory is, first and foremost, a theory of language, and the claim that the language knowledge we have is partly innate, that we’re “hard wired” for language, is an example of inference to the best explanation, as philosophers call it. In other words, the LAD is a “logical” response to the poverty of the stimulus conundrum: given the knowledge that very young children have of language, and the limitations of the information they get from the environment, the best explanation is that they were born with some boot-strapping device, and we’ll call that the LAD. Furthermore, the cognitive theories I’ve looked at are actually attempts to describe, however indirectly, the black box and what’s going on inside it, and I don’t think it’s entirely fair to call them all “goofy”.

Nick Ellis: Emergentism 

One alternative to the innatist approach to SLA is emergentism, an umbrella term referring to a fast-growing range of usage-based theories which adopt “connectionist” and associative learning views based on the premise that language emerges from communicative use. A leading spokesman for emergentism is Nick Ellis (not to be confused with Rod Ellis, who also writes about SLA). In his article “Frequency Effects in Language Processing” (part of a special issue of Studies in Second Language Acquisition, 2002, Vol. 24, 2, devoted to emergentism) Ellis argues that language processing is “intimately tuned to input frequency”, and expounds a usage-based theory which holds that “acquisition of language is exemplar based”. The paper is a real tour de force and I strongly recommend it. In fact, Nick Ellis writes beautifully; all his papers are master classes in how to talk coherently and cohesively about complex issues, and how to present your case forcefully.

The power law of practice is taken by Ellis as the underpinning for his frequency-based account, and then, through an impressive review of literature on phonology and phonotactics, reading and spelling, lexis, morphosyntax, formulaic language production, language comprehension, grammaticality, and syntax, Ellis argues that “a huge collection of memories of previously experienced utterances”, rather than knowledge of abstract rules, is what underlies the fluent use of language. In short, emergentists take language learning to be “the gradual strengthening of associations between co-occurring elements of the language”, and they see fluent language performance as “the exploitation of this probabilistic knowledge” (Ellis, 2002: 173).
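To make the two ideas in that paragraph concrete, here is a minimal sketch in Python. It is my own toy illustration, not Ellis’s model or anybody’s connectionist network. The function names, the toy utterances and the parameter values (first_trial_time, rate) are all assumptions made up for the example. The first function is the usual textbook form of the power law of practice, where the time taken on trial N falls as a power of N; the second turns raw co-occurrence counts into the kind of “probabilistic knowledge” that emergentists appeal to.

```python
from collections import Counter, defaultdict


def practice_time(trial, first_trial_time=2.0, rate=0.5):
    """Power law of practice: T(N) = T(1) * N ** (-rate).

    first_trial_time and rate are illustrative values, not estimates from data.
    """
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    return first_trial_time * trial ** (-rate)


def association_strengths(utterances):
    """Estimate P(next word | word) from raw co-occurrence counts."""
    pair_counts = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        # Count every adjacent pair of words in the utterance.
        for current, following in zip(words, words[1:]):
            pair_counts[current][following] += 1
    # Normalise the counts for each word into relative frequencies.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in pair_counts.items()
    }


# Hypothetical toy input: three utterances the learner has "experienced".
toy_input = ["the dog barked", "the dog slept", "the cat slept"]
strengths = association_strengths(toy_input)

# Practice gains shrink as trials accumulate.
early_gain = practice_time(1) - practice_time(2)
late_gain = practice_time(100) - practice_time(101)

print(strengths["the"])
print(early_gain > late_gain)
```

The sketch only shows that counting and practice curves are easy to state. Whether mechanisms of this kind can scale up to the full complexity of a natural language is exactly what’s in dispute.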

Ellis often repeats his commitment to a Saussurean view, which sees “the linguistic sign” as a set of mappings between phonological forms and communicative intentions. He claims that “simple associative learning mechanisms operating in and across the human systems for perception, motor-action and cognition, as they are exposed to language data as part of a communicatively-rich human social environment by an organism eager to exploit the functionality of language are what drives the emergence of complex language representations” (see this link for the article).

Seidenberg and MacDonald: Emergentism

Another example of emergentist views is Seidenberg and MacDonald’s 1999 paper, which puts forward a similar “probabilistic constraints approach” to language acquisition. They explain that instead of equating knowing a language with knowing a grammar, emergentists adopt the functionalist assumption that language knowledge is “something that develops in the course of learning how to perform the primary communicative tasks of comprehension and production” (Seidenberg and MacDonald, 1999: 571). This knowledge is viewed as a neural network that maps between forms and meanings, and further levels of linguistic representation, such as syntax and morphology, are said to emerge in the course of learning tasks.

An alternative to “Competence” is also offered by Seidenberg and MacDonald, who argue that the competence-performance distinction excludes information about statistical and probabilistic aspects of language, and that these aspects play an important role in acquisition. The alternative is to characterize a performance system which handles all and only those structures that people actually use. Performance constraints are embodied in the system responsible for producing and comprehending utterances, not extrinsic to it.

Elizabeth Bates and associates

As a final example of emergentism, Bates et al. (1998) attempt to translate innateness claims into empiricist statements. They argue that innateness is often used as a logically inevitable, fall-back explanation.

In the absence of a better theory, innateness is often confused with

  1. domain specificity (Outcome X is so peculiar that it must be innate),
  2. species specificity (we are the only species who do X so X must lie in the human genome),
  3. localization (Outcome X is mediated by a particular part of the brain, so X must be innate), and
  4. learnability (we cannot figure out how X could be learned so X must be innate) (Bates et al., 1998: 590).

Instead of this unsatisfactory “explanation”, Bates et al. believe that an empirically-based theory of interaction, a theory that will explain the process by which nature and nurture, genes and the environment, interact without recourse to innate knowledge, is “around the corner”. Reviewing a taxonomy proposed by Elman et al. to identify different types of innateness and their location in the brain, Bates et al. say:

If the notion of a language instinct means anything at all, it must refer to a claim about cortical microcircuitry, because this is (to the best of our knowledge) the only way that detailed information can be laid out in the brain (Bates et al., 1998: 594).


Emergentism claims that complex systems exhibit ‘higher-level’ properties that are neither explainable, nor predictable from ‘lower-level’ physical properties, which puts them in a bit of a jam if they want to remain faithful to the empiricist doctrine and deny any kind of contribution from innate sources of knowledge. This is the big problem for emergentists: explaining complex representational systems without the concept of innate knowledge (and even of the mind) forces them to take a radical, sub-atomic view of the components of language. Gregg (2003), in his discussion of emergentism in SLA, notes that empiricist emergentism (which excludes the work of O’Grady) wants to do away with innate, domain-specific representational systems, and replace them with “an ability to do distributional analyses and to remember the products of the analyses” (Gregg, 2003: 55). Given this agenda, it’s surprising, says Gregg, that Ellis seems to accept the validity of the linguist’s account of grammatical structure. Surely this is contradictory.

As to the explanation of the language learning process, it is, as Ellis agrees, based on associative learning, and rests on advances in IT that have produced models of  associative learning processes in the form of connectionist networks. The severe limitations of connectionist models are highlighted by Gregg, who goes to the trouble of examining the Ellis and Schmidt model (see Gregg, 2003: 58 – 66) in order to emphasise just how little the model has learned and how much is left unexplained. The sheer implausibility of the enterprise strikes me as forcefully as it seems to strike Gregg. How can emergentists seriously propose that the complexity of language emerges from simple cognitive processes being exposed to frequently co-occurring items in the environment? How can “simple associative learning mechanisms operating in and across the human systems for perception, motor-action and cognition” explain our language knowledge?

Gregg wrote his article in 2003, since when Ellis has written a great deal more about emergentism (see his personal website – many of the articles can be downloaded  and I’d particularly recommend his 2015 article Implicit AND Explicit Language Learning: Their dynamic interface and complexity, available for free download), but, despite lots of powerful argument, Ellis can’t point to much advance in the ability of connectionist models to do what children do, namely, learn the complexities of a natural language.

The Poverty of the Stimulus – again!

At the root of the problem of any empiricist account is the poverty of the stimulus argument. By adopting an associative learning model and an empiricist epistemology (where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations), emergentists have a very difficult job explaining how children come to have the linguistic knowledge they do. How can general conceptual representations acting on stimuli from the environment explain the representational system of language that children demonstrate?

Gregg summarises Laurence and Margolis’ (2001: 221) “lucid formulation” of the poverty of the stimulus argument:

  1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.
  2. The correct set of principles need not be (and typically is not) in any pre-theoretic sense simpler or more natural than the alternatives.
  3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.
  4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.
  5. Children do reliably arrive at the correct grammar for their language.
  6. Therefore children are not empiricist learners. (Gregg, 2003: 48)

Combining observed frequency effects with the power law of practice, and thus explaining acquisition order by appealing to frequency in the input, doesn’t go very far in explaining the acquisition process itself. What role do frequency effects have? How do they interact with other aspects of the SLA process? In other words, we need to know how frequency effects fit into a theory of SLA, because frequency and the power law of practice in themselves don’t provide a sufficient theoretical framework, and neither does connectionism. As Gregg points out, “connectionism itself is not a theory; it is a method, and one that in principle is neutral as to the kind of theory to which it is applied” (Gregg, 2003: 55).

My view is that emergentism stands or falls on connectionist models and that so far the results are disappointing. A theory that will explain the process by which nature and nurture, genes and the environment, interact without recourse to innate knowledge remains “around the corner”. It will be fantastic if Nick Ellis and all those working on emergentism turn out to be right, and I’ll enthusiastically join in the celebrations, partly because if they’re right, then language learning will be shown to be an essentially implicit process, a process that gets little help from teaching based on using coursebooks to implement a grammar-based synthetic syllabus through PPP. I’ll discuss this a bit more in the final episode, Part 10, coming soon.


Bates, E., Elman, J., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1998).  Innateness and emergentism. In W. Bechtel & G. Graham (Eds.), A companion to cognitive science (pp. 590-601). Oxford: Basil Blackwell.

Ellis, N. (2002) Frequency effects in language processing: A Review with Implications for Theories of Implicit and Explicit Language Acquisition. Studies in Second Language Acquisition, 24, 2, 143-188.

Eubank, L. and Gregg, K. R. (2002) News Flash – Hume Still Dead. Studies in Second Language Acquisition, 24, 2, 237-248.

Gregg, K. R. (2003) The state of Emergentism in SLA. Second Language Research, 19, 2, 95-128.

Seidenberg, M. and MacDonald, M. (1999) A Probabilistic Constraints Approach to Language Acquisition and Processing. Cognitive Science, 23, 4, 569–588.

SLA Part 8: Two final processing models

Susanne Carroll

I’ll start by noting Carroll’s objection to Schmidt’s and Gass’s theories. She argues that if input refers to observable sensory stimuli in the environment, then it can’t play any significant role in L2 learning because the stuff of acquisition – phonemes, syllables, morphemes, nouns, verbs, cases, etc. – consists of mental constructs that exist in the mind and not in the external environment. As Gregg has repeatedly said, “You can’t notice grammar!” Carroll (2001) says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

In Carroll’s theory, learners don’t attend to things in the input as such, they respond to speech-signals by attempting to parse the signals, and failures to do so trigger attention to parts of the signal. Basing herself partly on Fodor’s work, Carroll (2017) argues that we have to make a distinction between types of input.

On the one hand, we need to talk about INPUT-TO-LANGUAGE-PROCESSORS, e.g., bits of the speech signal that are fed into language processors and which will be analysable if the current state of the grammar permits it. On the other hand, we need a distinct notion of INPUT-TO-THE-LANGUAGE-ACQUISITION-MECHANISMS, which will be whatever it is that those mechanisms need to create a novel representation. For most of the learning problems that we are interested in, the input-to-the-language acquisition-mechanisms will not be coming directly from the environment.  

The Autonomous Induction Model 

Carroll uses Jackendoff’s (1987) modularity model and Holland, Holyoak, Nisbett, and Thagard’s (1986) induction model to build her own Autonomous Induction model. She sees our linguistic faculty as comprised of a chain of representations acting on different levels: the lowest level interacts with physical stimuli, and the highest with conceptual representations. At the lowest level of representation, the integrative processor combines smaller representations into larger units, while the correspondence processor is responsible for moving the representations from one level to the next. Once representations are formed, they are categorized and combined according to UG-based or long-term memory-based rules. During successful parsing, rules are activated in each processor to categorize and combine representations. Failures occur when the rules are inadequate or missing. When that happens, the rule that comes closest to successfully parsing the specific unit is selected and undergoes the most economical and incremental revision. This process is repeated until parsing succeeds or is at least passable at that given level.

So Carroll’s explanation of how stimuli from the environment end up as linguistic knowledge is that two different types of processing are involved: processing for parsing and processing for acquisition. When the parsers fail, the acquisitional mechanisms are triggered (a view, as I’ve already suggested, which aligns with the notion of incomprehensible input).

For my purposes, what’s important in Carroll’s account is that speech signal processing doesn’t involve noticing – it’s a subconscious process where learners detect, encode and respond to linguistic sounds. Furthermore, Carroll argues that once the representation enters the interlanguage, learners don’t always notice their own processing of segments and the internal organization of their own conceptual representations; and that the processing of forms and meanings is often not noticed.

Carroll sees intake as a subset of stimuli from the environment, while Gass defines intake as a set of processed structures waiting to be incorporated into IL grammar. It seems to me that Carroll provides a better description of input, and she is surely right to say that cognitive comparison of representations (wherever it takes place) is largely automatic and subconscious. Awareness, Carroll concludes, is something that occurs, if at all, only after the fact; and that’s a conclusion which just about all the theories I’ve looked at come to.

Towell and Hawkins’ Model of SLA

Towell and Hawkins’ (1994) model begins with UG, which sets the framework within which linguistic forms in the L1 and L2 are related. Learners of an L2 after the age of seven have only partial access to UG, namely UG principles; they will transfer parameter settings from their L1, and where such settings conflict with L2 data, they may construct rules to mimic the surface properties of the L2. The second internal source is thus the first language. Learners may transfer a parameter setting, or UG may make possible a kind of mimicking.

The adoption of the “partial access to UG” hypothesis leads Towell and Hawkins to assume that there are two different sorts of knowledge involved in interlanguage development:  linguistic competence (derived from UG and L1 transfer) and learned linguistic knowledge (derived from explicit instruction and negative feedback).

To explain the way in which interlanguage develops when simple triggering of parameters doesn’t happen, Towell and Hawkins use Levelt’s model of language production (1989)  to introduce the distinction between procedural (subconscious, automatic) and declarative (conscious) knowledge, and then Anderson’s ACT* (Adaptive Control of Thought) model (1983), to explain how the declarative knowledge gets processed (see below).

The information processing mechanisms condition the way in which input  provides data for hypotheses, the way in which hypotheses must be turned into productions for fluent use, and the final output of productions. (Towell and Hawkins, 1994: 248)

The full model is presented below:

Input and output pass through short-term memory, which determines the information available to long-term memories, and is used to pass information between the two types of long-term memory proposed: the declarative memory and the procedural memory.

Short-term memory consists in that set of nodes activated in memory at the same time and allows certain operations to be performed on relatively small amounts of information for a given time.  The processes are either controlled (the subject is required to pay attention to the process while it is happening), or automatic.  Automatic processes are inflexible and take a long time to set up.  Once processes have been automatised the limited capacity can be used for new tasks.

All knowledge initially goes into declarative memory; the internally derived hypotheses offer substantive suggestions for the core of linguistic knowledge and those parameters common to both L1 and L2.  The other areas of language are worked out by the interaction of data with the internally derived hypotheses.

The model suggests four learning routes:

Route one:

confirmation by external data of an internal hypothesis leading to the creation of a production to be stored in procedural memory first in associative form (i.e. under attentional control) and then in autonomous form for rapid use via the short term memory. (Towell and Hawkins, 1994: 250)

Route two:

initial storage of a form-function pair in declarative memory as an unanalysed whole.  If it cannot be analysed by the learner’s grammar but can be remembered for use in a given context, it may be shifted to procedural memory at the associative level.  It may be re-called into declarative knowledge where it may be re-examined, and if it is now analysable, it may be converted to another level of mental organisation before being passed back to the procedural level. (Towell and Hawkins, 1994: 250-251)

Route three:

concerns explicit rules, like verb paradigms, vocabulary lists, lists of prepositions.  This knowledge can only be recalled in the form in which it was learned, and can be used to revise and correct output. (Towell and Hawkins, 1994: 251)

Route four:

concerns strategies, which facilitate the proceduralisation of mechanisms for faster processing of input and greater fluency.  These strategies do not interact with internal hypotheses. (Towell and Hawkins, 1994: 251)

Hypotheses derived from UG either directly or via L1 are available as declarative knowledge, i.e. hypotheses which are tested via controlled processing where learners pay attention to what they are receiving and producing.  If the hypotheses are confirmed, Towell and Hawkins say they “can be quickly launched on the staged progression described by Anderson (1983, 1985).”


UG  is used to explain transfer, staged development and cross-learner systematicity.  The UG prevents the learner from entertaining “wild” hypotheses about the L2, and allows the learner to “learn” a series of structures by perceiving that a certain relationship between the L1 and L2 exists. Towell and Hawkins’ “partial access” view of UG and SLA is reflected in their belief that there is a lack of positive evidence available to L2 learners to enable them to reset the parameters already set in the L1, and that “the older you are at first exposure to an L2, the more incomplete your grammar will be”.

I’m not convinced! The part played by declarative knowledge seems particularly odd. How does knowledge of UG principles form part of declarative knowledge? And the ACT model looks to me like an awkward bolt-on. The distinction between declarative and procedural knowledge leaves unanswered the question of the nature of the storage of information in declarative and procedural forms, and there’s no explanation of how the externally-provided data interact with the internally-derived hypotheses.

More generally, this is a very complex model which pays scant regard to the Occam’s Razor criterion. There’s a profusion of terms and entities postulated by the theory – principles and parameters, declarative memory and production memory, procedural and declarative knowledge, associative and automatic procedural knowledge, linguistic competence and linguistic knowledge, mimicking, the use of a language module, a conceptualiser, and a formulator – which means that only the accumulation of research results from testing would make a proper evaluation possible. This hasn’t happened.

I’ve outlined it here, first because it was once quite influential; second because it’s another example of an attempt to explain how input leads to L2 knowledge by passing through a series of mental processing routines, and third because it gives me the chance to discuss Anderson’s ACT model.

Anderson’s ACT model

When applied to second language learning, the ACT model suggests that learners are first presented with information about the L2 (declarative knowledge) and then, via practice, this is converted into unconscious knowledge of how to use the L2 (procedural knowledge). The learner moves from controlled to automatic processing, and through intensive linguistically focused rehearsal, achieves increasingly faster access to, and more fluent control over, the L2 (see DeKeyser, 2007, for example).

The fact that nearly everybody successfully learns at least one language as a child without starting with declarative knowledge, and that millions of people learn additional languages without studying them (migrant workers, for example) are good reasons to doubt that learning a language is the same as learning a skill such as driving a car. Furthermore, the phenomenon of L1 transfer doesn’t fit well with a skills-based approach, and neither do putative sensitive periods (critical periods) for language learning. But the main reason for rejecting such an approach is that it contradicts all the SLA research findings related to interlanguage development which we’ve been examining in this review of SLA theories.

Firstly, as has been made clear, it doesn’t make sense to present grammatical constructions one by one in isolation because most of them are inextricably inter-related. As Long (2015) says:

Producing English sentences with target-like negation, for example, requires control of word order, tense, and auxiliaries, in addition to knowing where the negator is placed. Learners cannot produce even simple utterances like “John didn’t buy the car” accurately without all of those. It is not surprising, therefore, that interlanguage development of individual structures has very rarely been found to be sudden, categorical, or linear, with learners achieving native-like ability with structures one at a time, while making no progress with others. Interlanguage development just does not work like that.

Secondly, as we have seen, research has shown that L2 learners follow their own developmental route, a series of interlocking linguistic systems called “interlanguages”. Myles (2013) states that the findings on the route of interlanguage (IL) development are among the best documented findings of SLA research of the past few decades. She asserts that the route is “highly systematic” and that it “remains largely independent of both the learner’s mother tongue and the context of learning (e.g. whether instructed in a classroom or acquired naturally by exposure)”. The claim that instruction can influence the rate but not the route of IL development is probably the most widely accepted claim among SLA scholars today.

Pienemann comments:

Fifteen years later, Anderson appears to have revised his position. He states  “With very little and often no deliberate instruction, children by the time they reach age 10 have accomplished implicitly what generations of Ph.D. linguists have not accomplished explicitly.  They have internalised all the major rules of a language..” (Anderson, 1995, 364). In other words, Anderson no longer sees language acquisition as an instance of the conversion of declarative into procedural knowledge.

In  addition, it is well-documented that procedural knowledge does not have to progress through a declarative phase.  In fact, human participants in experiments on non-conscious learning were not only unaware of the rules they applied, they were not even aware that they had acquired any knowledge  (Pienemann, 1998: 41).

Next up: emergentism. And that will be the last part of the review.



Anderson, J. (1983) The Architecture of Cognition. Cambridge, MA: Harvard University Press.

Carroll, S. (2017) Exposure and input in bilingual development. Bilingualism: Language and Cognition, 20, 1, 16-31.

Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Krashen, S. (1985) The Input Hypothesis: Issues and Implications. New York: Longman.

Levelt, W. (1989) Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Long, M. (2015) SLA and TBLT. Wiley.

Towell, R. and Hawkins, R. (1994) Approaches to second language acquisition. Clevedon: Multilingual Matters.


SLA Part 7: Processing Input

Corder’s (1967) paper is often given as the starting point for SLA discussion of input. It included the famous claim:

The simple fact of presenting a certain linguistic form to a learner in the classroom does not necessarily qualify it for the status of input, for the reason that input is “what goes in” not what is available for going in, and we may reasonably suppose that it is the learner who controls this input, or more properly his intake (p. 165).

Corder here suggests that SLA is a process of learner-controlled development of interlanguages, “interlanguages” referring to learner grammars, their evolving knowledge of the language. This marks a shift in the way SLA researchers perceived input. No longer a strictly external phenomenon, input is now the interface between the external stimuli and learners’ internal systems. Input is potential intake, and intake is what learners use for IL development; but it remains unclear  what mechanisms and sub processes are responsible for the input-to-intake conversion. We can start with Krashen.


Krashen’s Input Model 

Here, comprehensible input is the same as intake. It contains mostly language the learner already knows, but also unknown elements, including some that correspond to the next immediate step along the interlanguage development continuum. This comprehensible input has to get through the affective filter and is then processed by a special language processor which Krashen says is the same as Chomsky’s LAD. Thanks to this processor, some of the new elements in the input are subconsciously acquired and become part of the learner’s interlanguage. A completely different part of the mind processes a different kind of knowledge which is learned by paying conscious attention to what teachers and books and people tell the learner about the language. This conscious knowledge can be used to monitor and change output. It’s referred to in the top right part of the diagram. Just by the way, Hulstijn (2013) points out that nearly 30 years after Krashen made his much-criticised acquisition / learning distinction, cognitive neuro-scientists now agree that declarative, factual knowledge (Krashen’s ‘learned knowledge’) is stored in the medial temporal lobe (in particular in the hippocampus), whereas procedural, relatively unconscious knowledge (Krashen’s ‘acquired knowledge’) is stored and processed in various (mainly frontal) regions of the cortex.

We’ve already seen a number of objections to Krashen’s Theory as a theory, but the important thing here is to see how he relies on the LAD (plus a monitor) to explain how we learn an L2. The theory thus leans heavily on Chomsky’s explanation of L1 acquisition and says that L2 acquisition is more or less the same – all we need to learn a language is comprehensible input, because we’re hard wired with a device that allows us to make enough sense of enough of the input to slowly work out the system for ourselves.

Black boxes: the Processors 

All theories of SLA – even usage-based theories – assume that there are some parts of the mind (or brain for the strict empiricists) involved in processing stimuli from the environment. The LAD is simply one attempt to describe what the processor does; namely, provide rules for making sense of the input. The rules, which Chomsky describes in successive formulations of UG (best understood, I think, in terms of the principles and parameters model), help young children to map form to meaning. O’Grady gives the example of the rules which help the child make sense of information about the type of meaning most often associated with particular word classes.

“For example, the acquisition device might “tell” children that words referring to concrete things must be nouns. So language learners would know right away that words like dog, boy, house, and tree belong to that word class. This might just be enough to get started. Once children knew what some nouns looked like, they could start noticing other things on their own – like the fact that items in the noun class can occur with locator words like this and that, that they can take the plural ending, that they can be used as subjects and direct objects, that they are usually stressed, and so on.

Nouns with locator words: That dog looks tired. This house is ours.

Nouns with the plural ending: Cats make me sneeze. I like cookies.

Nouns used as subject or direct object: Dogs chase cats. A man painted our house.  

Information of this sort can then be used to deal with words like idea and attitude, which cannot be classified on the basis of their meaning. (They are nouns, but they don’t refer to concrete things.) Sooner or later a child will hear these words used with this or that, or with a plural, or in a subject position. If she’s learned that these are the signs of nounhood, it’ll be easy to recognize nouns that don’t refer to concrete things. If all of this is on the right track, then the procedure for identifying words belonging to the noun class would go something like this. (Similar procedures exist for verbs, adjectives, and other categories.)

What the acquisition device tells the child: If a word refers to a concrete object, it’s a noun.

What the child then notices: Noun words can also occur with this and that; they can be pluralized; they can be used as subjects and direct objects.

What the child can then do: Identify less typical nouns (idea, attitude, etc.) based on how they are used in sentences.

This whole process is sometimes called bootstrapping. The basic idea is that the acquisition device gives the child a little bit of information to get started (e.g., a language must distinguish between nouns and verbs; if a word refers to a concrete object, it’s a noun) and then leaves her to pull herself up the rest of the way by these bootstraps.”
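The three steps O’Grady lists can be sketched as a toy program. This is only an illustration under loud assumptions: the word lists, the cue labels and the function names are all invented here, and none of it is O’Grady’s own formalism.

```python
# Toy sketch of bootstrapping. All data and names below are hypothetical.

# Step 1: the seed rule. A word referring to a concrete object is a noun.
CONCRETE_THINGS = {"dog", "boy", "house", "tree"}

# Invented record of the frames in which each word has been heard.
OBSERVED_FRAMES = {
    "dog": {"with_this_that", "plural", "subject_or_object"},
    "house": {"with_this_that", "plural", "subject_or_object"},
    "idea": {"with_this_that", "plural", "subject_or_object"},
    "attitude": {"with_this_that", "subject_or_object"},
    "quickly": set(),
}

def seed_nouns(words):
    """Return the words that the seed rule classifies as nouns."""
    return {w for w in words if w in CONCRETE_THINGS}

def learn_noun_cues(nouns, frames):
    """Step 2: collect the frames shared by all the seed nouns."""
    cue_sets = [frames[n] for n in nouns if n in frames]
    return set.intersection(*cue_sets) if cue_sets else set()

def classify(word, cues, frames):
    """Step 3: treat a word as a noun if it shows any learned noun cue."""
    return bool(cues & frames.get(word, set()))

nouns = seed_nouns(OBSERVED_FRAMES)
cues = learn_noun_cues(nouns, OBSERVED_FRAMES)
```

With this invented data, idea and attitude come out as nouns because they share frames with dog and house, while quickly does not. The sketch says nothing about where the seed rule comes from, which is of course the real point of contention.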

Of course, the rules that govern how the words are put together are also needed. O’Grady calls this “a blueprint”.  In a sentence like Jean helped Roger the three words combine – but how?

“Does the verb combine directly with the two nouns?

Or does it combine first with its subject, forming a larger building block that then combines with the direct object?

Or does it perhaps combine first with its direct object, creating a building block that then combines with the subject?

How could a child possibly figure out which of these design options is right? For that matter, how could an adult? Once again, the acquisition device must come to the rescue by providing the following vital bits of information:

  • Words are grouped into pairs.
  • Subjects (doers) are higher than direct objects (undergoers).

With this information in hand, it’s easy for children to build sentences with the right design” (O’Grady,

So that’s one view of the “black box”, the language processor. It is, in the view of many scholars working on SLA, the best explanation so far of how children acquire linguistic knowledge, and of how they know things about the language which are not present in the input – it answers the poverty of the stimulus question. The LAD offers an innate system of grammatical categories and principles which define language, confine how language can vary and change, and explain how children learn language so successfully. And, using a few additional assumptions, it can explain SLA and why most people find it so challenging, too.

VanPatten’s Input Processing Theory

VanPatten sees things slightly differently. His Input Processing (IP) theory is concerned with how learners derive intake from input, where intake is defined as the linguistic data actually processed from the input and held in working memory for further processing.

As such, IP attempts to explain how learners get form from input and how they parse sentences during the act of comprehension while their primary attention is on meaning. VanPatten’s model consists of a set of principles that interact in working memory, and takes account of the fact that working memory has very limited processing capacity. Content lexical items are searched out first since words are the principal source of referential meaning. When content lexical items and a grammatical form both encode the same meaning and when both are present in an utterance, learners attend to the lexical item, not the grammatical form. Here are VanPatten’s Principles of Input Processing:

P1. Learners process input for meaning before they process it for form.

P1a. Learners process content words in the input first.

P1b. Learners prefer processing lexical items to grammatical items (e.g., morphology) for the same semantic information.

P1c. Learners prefer processing “more meaningful” morphology before “less” or “nonmeaningful” morphology.

P2. For learners to process form that is not meaningful, they must be able to process informational or communicative content at no (or little) cost to attention.

P3. Learners possess a default strategy that assigns the role of agent (or subject) to the first noun (phrase) they encounter in a sentence/utterance. This is called the first-noun strategy.

P3a. The first-noun strategy may be overridden by lexical semantics and event probabilities.

P3b. Learners will adopt other processing strategies for grammatical role assignment only after their developing system has incorporated other cues (e.g., case marking, acoustic stress).

P4. Learners process elements in sentence/utterance initial position first.

P4a. Learners process elements in final position before elements in medial position.
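Principles P3 and P3a lend themselves to a small sketch. What follows is a toy illustration, not VanPatten’s own model: the function name, the table of implausible events and the example nouns are all my assumptions.

```python
# Toy sketch of P3 (first-noun strategy) and P3a (override).
# The table of implausible events is invented for illustration.
IMPLAUSIBLE_EVENTS = {("fence", "kick", "horse")}

def assign_roles(first_noun, verb, second_noun):
    """Treat the first noun as agent (P3), unless the resulting
    event is implausible, in which case swap the roles (P3a)."""
    agent, patient = first_noun, second_noun
    if (agent, verb, patient) in IMPLAUSIBLE_EVENTS:
        agent, patient = patient, agent
    return {"agent": agent, "patient": patient}
```

Given a passive like “The cow was kicked by the horse”, `assign_roles("cow", "kick", "horse")` makes the cow the agent, which is the misreading the first-noun strategy predicts. With “The fence was kicked by the horse”, the implausibility of a fence kicking anything overrides the default and the horse is taken as agent.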

Perhaps the most important construct in the IP model is “communicative value”: the more communicative value a form has, the more likely it is to get processed and made available in the intake data for acquisition. It’s thus the forms with little or no communicative value which are least likely to get processed and which, without help, may never get acquired. Notice that this account, like Pienemann’s discussed in Part 5, and indeed like O’Grady’s (see below), explains input processing in terms of rational decisions taken on the basis of making the best use of relatively scarce processing resources.

I’m zooming through these theories without doing any of them real justice, and I apologise to all the scholars whose work is getting such brief treatment, but I hope that both a picture of the various architectures proposed, and the story of how SLA theories progressed, can be got from all this. Before we go on, I can’t resist quoting what VanPatten says at the end of one of his books:

1. If you teach communicatively, you’d better have a working definition of communication. My argument for this is that you cannot evaluate what is communicative and what is appropriate for the classroom unless you have such a definition.

2. Language is too abstract and complex to teach and learn explicitly. That is, language must be handled in the classroom differently from other subject matter (e.g., history, science, sociology) if the goal is communicative ability. This has profound consequences for how we organize language-teaching materials and approach the classroom.

3. Acquisition is severely constrained by internal (and external) factors. Many teachers labor under the old present + practice + test model. But the research is clear on how acquisition happens. So, understanding something about acquisition pushes the teacher to question the prevailing model of language instruction.

4. Instructors and materials should provide student learners with level-appropriate input and interaction. This principle falls out of the previous one. Since the role of input often gets lip service in language teaching, I hope to give the reader some ideas about moving input from “technique” to the center of the curriculum.

5. Tasks (and not Exercises or Activities) should form the backbone of the curriculum. Again, language teaching is dominated by the present + practice + test model. One reason is that teachers do not understand what their options are, what is truly “communicative” in terms of activities in class, and how to alternatively assess. So, this principle is crucial for teachers to move toward contemporary language instruction.

6. A focus on form should be input-oriented and meaning-based. Teachers are overly preoccupied with teaching and testing grammar. So are textbooks. Students are thus overly preoccupied with the learning of grammar.

O’Grady (How Children Learn Language is the best book you’ll ever read on the subject) offers a different view. He proposes a ‘general nativist’ theory of first and second language acquisition, which describes a modular acquisition device that does not include Universal Grammar. O’Grady sees his work as forming part of the emergentist rubric, but obviously, since he sees the acquisition device as a modular part of mind, he’s a long way from the real empiricists in the emergentist camp. Interestingly, for us, O’Grady accepts that there are sensitive periods involved in language learning, and that problems adults face in L2 acquisition can be explained by the fact that adults have only partial access to the (non-UG) L1 acquisition device.

O’Grady describes a different kind of processor, doing more general things, but it’s still a language processor and it’s still working not just on segments of voice streams, and words, but on syntax, and thus still seeing language as an abstract system governed by rules of syntax. When it comes to the more empiricist type of emergentist – Bates and MacWhinney’s Competition Model, for example – then here the talk is of a very general kind of processor doing the work, and this processor works almost exclusively on words and their meanings. Which brings us to the rub, so to speak.

As O’Grady argues so forcefully, the real disagreement between nativists and the emergentists who, unlike O’Grady, adopt a more or less empiricist epistemology, is that they can’t agree on what syntactic categories and structures are like. The dispute over the nature of how input gets processed is really a dispute about the nature of language. If you see language as a highly complex formal system best described by abstract rules that have no counterparts in other areas of cognition (O’Grady gives the requirement that sentences have a binary branching syntactic structure as one example of such a “rule”), then you see the processor, the acquisition device, as designed specifically for language. But if you see language in terms of its communicative function, then, since communication involves different types of considerations (O’Grady gives the examples of new versus old information, point of view, the status of speaker and addressee, the situation), you’ll see the processor as a multipurpose acquisition device working on very simple data. In my opinion, such a view actually fails to explain either language or the acquisition process, but we’ll come to that. For now, I’m trying to sketch theories of SLA in such a way that we may draw teaching implications from them.

Just to remind you, my argument is that language is best seen as a formal system of representations and that we learn it in a different way to the way we learn other things. We learn language implicitly, subconsciously, but as adults learning an L2, our access to the key processors is limited, so we need to supplement this learning with a bit of attention to some “fragile” (non salient, for example) elements. Which gets us nicely back to the main narrative.

Swain’s (1985) famous study of French immersion programmes led to her claim that comprehensible input alone can allow learners to reach high levels of comprehension, but their proficiency and accuracy in production will lag behind, even after years of exposure. Further studies gave more support to this view, and to the opinion that comprehensible input is the necessary but not sufficient condition for proficiency in an L2. Swain’s argument was that we must give more attention to output, but what took greater hold was the view that we need to “notice” formal features of the input.

Schmidt’s Noticing  (again)

In Part 4, I discussed Schmidt’s view. Here’s the diagram:

As we saw, Schmidt completely rejects Krashen’s model, and insists that it’s ‘noticing’, not the unconscious workings of the LAD, that drives interlanguage development. I outlined my objections to even the modified 2001 version of Schmidt’s noticing construct in Part 4, so let’s focus on the main one here: the construct doesn’t clearly indicate the roles of conscious and subconscious, or explicit and implicit learning. In the case of children learning their L1, the processing of input is mostly a subconscious affair, whether or not UG has anything to do with it. For those over the age of 16 learning an L2, according to Krashen, it’s also mostly a subconscious process, although even Krashen admits that some conscious hard work at learning helps to speed up the process and to reach a higher level of proficiency. But it’s not clear, at least to me, what Schmidt means by noticing, and to what extent he sees SLA as involving conscious learning. I think that his 2001 paper seems to concede that implicit learning is still the main driver of interlanguage development, and I think that’s what Long, for example, takes Schmidt to mean.

Gass: An Integrated view of SLA  

Gass (1997), influenced by Schmidt, offers a more complete picture of what happens to input. She says it goes through stages of apperceived input, comprehended input, intake, integration, and output, thus subdividing Krashen’s comprehensible input into three stages: apperceived input, comprehended input, and intake. I don’t quite get “apperceived input”; Gass says it’s the result of attention, in a similar sense to Tomlin and Villa’s (1994) notion of orientation, and Schmidt says it’s the same as his noticing, which doesn’t help me much. In any case, once the intake has been worked on in working memory, Gass stresses the importance of negotiated interaction during input processing and eventual acquisition. Here, she adopts Long’s highly influential construct of negotiation for meaning, which refers to what learners do when there’s a failure in communicative interaction. As a result of this negotiation, learners get more usable input, they give attention (of some sort) to problematic features in the L2, and make mental comparisons between their IL and the L2. Gass says that negotiated interaction enhances the input in three ways:

  1. it’s made more comprehensible,
  2. problematic forms that impede comprehension are highlighted and forced to be processed to achieve successful communication.
  3. through negotiation, learners receive both positive and negative feedback that are juxtaposed immediately to the problematic form, and the close proximity facilitates hypothesis-testing and revision (Doughty, 2001).

Many scholars have commented that these effects should be regarded as a facilitator of learning, not a mechanism for learning, and I have to say that in general I find the Gass model a rather unsatisfactory compilation of bits. Still, it’s part of the story, and it’s certainly a well-considered, thorough attempt to explain how input gets processed.

We still have to look at the theories of Towell and Hawkins, Susanne Carroll, Jan Hulstijn, and then Bates & MacWhinney and Nick Ellis.  The models reviewed so far agree on the need for comprehensible input; learners decode enough of the input to make some kind of  conceptual representation, which can then be compared with linguistic structures which already form part of the interlanguage.  As is so often the case with theories of learning (Darwin comes to mind), it’s the bits that don’t fit, or that can’t be parsed, that  cause a “mental jolt in processing”, as Sun (2008) calls it. It’s the incomprehensibility of the input that triggers learning, as I’m sure Schmidt would agree.



Corder, S. P. (1967). The significance of learners’ errors. IRAL, 5, 161-170.

Faerch, C., & Kasper, G. (1980). Processing and strategies in foreign language learning and communication. The Interlanguage Studies Bulletin—Utrecht, 5, 47-118.

Gass, S. M. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Lawrence Erlbaum.

Hulstijn, J. (2013). Is the Second Language Acquisition discipline disintegrating? Language Teaching, 46(4), 511-517.

Krashen, S. D. (1982). Principles and practice in second language acquisition. Oxford, UK: Pergamon.

Krashen, S. D. (1985). The input hypothesis: Issues and implications. London: Longman.

speech perception, reading, and psycholinguistics. New York: Academic Press.

O’Grady, W. (2005). How Children Learn Language. Cambridge: Cambridge University Press.

Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3-32). Cambridge UK: Cambridge University Press.

Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. M. Gass & C. G. Madden (Eds.), Input in second language acquisition (pp. 235-253). Rowley, MA: Newbury House.

VanPatten, B. (2017). While we’re on the topic…. BVP on Language, Language Acquisition, and Classroom Practice. Alexandria, VA: The American Council on the Teaching of Foreign Languages.

VanPatten, B. (2003). From Input to Output: A Teacher’s Guide to Second Language Acquisition. New York: McGraw-Hill.

VanPatten, B. (1996). Input Processing and Grammar Instruction: Theory and Research. Norwood, NJ: Ablex.

SLA Part 6: Pienemann’s Processability Theory

This theory started out as the Multidimensional Model, which came from work done by the ZISA group mainly at the University of Hamburg in the late seventies.  One of the first findings of the group was that all the children and adult learners of German as a second language in the study adhered to a five-stage developmental sequence.

Stage X – Canonical order (SVO)

die kinder spielen mim ball   the children play with the ball

Stage X+1 – Adverb preposing (ADV)

da kinder spielen   there children play

Stage X+2 – Verb separation (SEP)

alle kinder muss die pause machen   all children must the break have

Stage X+3 – Inversion (INV)

dann hat sie wieder die knoch gebringt   then has she again the bone brought

Stage X+4 – Verb-end (V-END)

er sagte, dass er nach hause kommt   he said that he home comes

Learners didn’t abandon one interlanguage rule for the next as they progressed; they added new ones while retaining the old, and thus the presence of one rule implies the presence of earlier rules.
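The implicational claim (the presence of one rule implies the presence of all earlier rules) can be stated as a simple check. This is a minimal sketch: the stage labels are taken from the list above, but the function name and the sample learner profiles are my own invention.

```python
# Stage labels in developmental order, as listed above.
STAGES = ["SVO", "ADV", "SEP", "INV", "V-END"]

def is_implicational(rules_present):
    """Return True if every rule present is accompanied by all earlier rules."""
    flags = [stage in rules_present for stage in STAGES]
    # Once a stage is missing, no later stage may be present.
    return all(earlier or not later for earlier, later in zip(flags, flags[1:]))
```

A hypothetical learner showing SVO, ADV and SEP fits the sequence, so `is_implicational({"SVO", "ADV", "SEP"})` is true. One showing SVO and INV but not ADV or SEP would be a counterexample, and the check returns false.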

The explanation offered for this developmental sequence is that each stage reflects the learner’s use of three speech-processing strategies. Clahsen and Pienemann argue that processing is “constrained” by the strategies and development consists of the gradual removal of these constraints, or the “shedding of the strategies”, which allows the processing of progressively more complex structures. The strategies are:

(i) The Canonical Order Strategy.  The construction of sentences at Stage X obeys simple canonical order that is generally assumed to be “actor – action – acted upon.”  This is a pre-linguistic phase of acquisition where learners build sentences according to meaning, not on the basis of any grammatical knowledge.

(ii) The Initialisation-Finalisation Strategy. Stage X+1 occurs when learners notice discrepancies between their rule and input.  But the areas of input where discrepancies are noticed are constrained by perceptual saliency – it is easier to notice differences at the beginnings or the ends of sentences since these are more salient than the middle of sentences. As a result, elements at the initial and final positions may be moved around, while leaving the canonical order undisturbed.

Stage X+2 also involves this strategy, but verb separation is considered more difficult than adverb fronting, because the former requires not just movement to the end position but also disruption of a continuous constituent, the verb + particle, infinitive, or participle.

Stage X+3 is even more complex, since it involves both disruption and movement of an internal element to a non-salient position, and so requires the learner to abandon salience and recognise different grammatical categories.

(iii) The Subordinate Clause Strategy. This is used in Stage X+4 and requires the most advanced processing skills, because the learner has to produce a hierarchical structure, which involves identifying sub-strings within a string and moving elements out of those sub-strings into other positions.

These constraints on interlanguage development are argued to be universal; they include all developmental stages, not just word order, and they apply to all second languages, not just German.

The ZISA model also proposed a variational dimension to SLA, and hence the name “Multidimensional”. While the developmental sequence of SLA is fixed by universal processing constraints, individual learners follow different routes in SLA, depending primarily on whether they adopt a predominantly “standard” orientation, favouring accuracy, or a predominantly “simplifying” one, favouring communicative effectiveness.

Processability Theory

Pienemann’s next development (1998) is to expand the Multidimensional Model into a Processability Theory, which predicts which grammatical structures an L2 learner can process at a given level of development.

This capacity to predict which formal hypotheses are processable at which point in development provides the basis for a uniform explanatory framework which can account for a diverse range of phenomena related to language development (Pienemann, 1998: xv).

The important thing about this theory is that while Pienemann describes the same route as other scholars have done for interlanguage development, in addition now he is offering an explanation for why interlanguage grammars develop in the way they do. His theory proposes that

for linguistic hypotheses to transform into executable procedural knowledge the processor needs to have the capacity of processing those hypotheses (Pienemann, 1998: 4).

Pienemann, in other words, argues that there will be certain linguistic hypotheses that, at a particular stage of development, the L2 learner cannot access because he or she doesn’t have the necessary processing resources available. At any stage of development, the learner can produce and comprehend only those L2 linguistic forms which the current state of the language processor can handle.

The processing resources that have to be acquired by the L2 learner will, according to Processability Theory, be acquired in the following sequence:

  1. lemma access,
  2. the category procedure,
  3. the phrasal procedure,
  4. the S-procedure,
  5. the subordinate clause procedure – if applicable. (Pienemann, 1998: 7)

The theory states that each procedure is a necessary prerequisite for the following procedure, and that

the hierarchy will be cut off in the learner grammar at the point of the missing processing procedures and the rest of the hierarchy will be replaced by a direct mapping of conceptual structures onto surface form (Pienemann, 1998: 7).

The SLA process can therefore be seen as one in which the L2 learner entertains hypotheses about the L2 grammar, and in which this “hypothesis space” is determined by the processability hierarchy.
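To make the cut-off idea concrete, here is a minimal sketch. The procedure names come from Pienemann’s list above; the function name and the sample learner are hypothetical, and the sketch ignores the “direct mapping of conceptual structures onto surface form” that is said to replace the missing part of the hierarchy.

```python
# Processing procedures in the order Processability Theory says they are acquired.
HIERARCHY = [
    "lemma access",
    "category procedure",
    "phrasal procedure",
    "S-procedure",
    "subordinate clause procedure",
]

def available_procedures(acquired):
    """Return the usable procedures: the hierarchy is cut off at the
    first missing procedure, whatever else appears in `acquired`."""
    usable = []
    for procedure in HIERARCHY:
        if procedure not in acquired:
            break  # everything above the gap is unavailable
        usable.append(procedure)
    return usable
```

So a hypothetical learner credited with lemma access, the category procedure and the S-procedure, but not the phrasal procedure, gets only the first two: each procedure is a prerequisite for the next, so the S-procedure can’t be used.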


In this account of the SLA process, the mechanism at work is an information processing device, which is constrained by limitations in its ability to process input. The device adds new rules while retaining the old ones, and as the limiting “speech-processing strategies” which constrain processing are removed, this allows the processing of progressively more complex structures.

What is most impressive about the theory (it provides an explanation for the interlanguage development route) is also most problematic, since the theory takes as self-evident that our cognition works in the way the model suggests. We are told that people see things in a canonical order of “actor – action – acted upon”, that people prefer continuous to discontinuous entities, that the beginnings and ends of sentences are more salient than the middles of sentences and so on, without being offered much justification for such a view, beyond the general assumption of what is easy and difficult to process. As Towell and Hawkins say of the Multidimensional Model:

They require us to take on faith assumptions about the nature of perception. The perceptual constructs are essentially mysterious, and what is more, any number of new ones may be invented in an unconstrained way (Towell and Hawkins, 1994: 50).

This criticism isn’t actually as damning as it might appear – there are obviously good reasons to suppose that simple things will be more easily processed than complex ones, there is a mountain of evidence from L1 acquisition studies to support some of the claims, and, of course, whatever new assumptions “may be invented” can be dealt with if and when they appear. As Pienemann makes clear, the assumptions he makes are common to most cognitive models, and, importantly, they result in making predictions that are highly falsifiable.

Apart from some vagueness about precisely how the processing mechanism works, and exactly what constitutes the acquisition of each level, the theory has little to say about transfer, and deals with a limited domain, restricting itself to an account of the processing involved in speech production, and avoiding any discussion of linguistic theory.

In brief, the two main strengths of this theory are that it provides not just a description, but an explanation of interlanguage development, and that it is testable. The explanation is taken from experimental psycholinguistics, not from the data, and is thus able to make wide, strong predictions, and to apply to all future data.  The predictions the theory makes are widely-applicable and, to some extent, testable: if we can find an L2 learner who has skipped a stage in the developmental sequence, then we will have found empirical evidence that challenges the theory. Since the theory also claims that the constraints on processability are not affected by context, even classroom instruction should not be able to change or reduce these stages.

The Teachability Hypothesis 

Which brings us to the most important implication of Pienemann’s theory: the Teachability Hypothesis. First proposed in 1984, this predicts that items can only be successfully taught when learners are at the right stage of interlanguage development to learn them. Note immediately that neither Pienemann nor anybody else is claiming to know anything but the outlines of the interlanguage development route. We don’t have any route map, and even if we did, and even if we could identify the point where each of our students was on the map (i.e., where he or she was on his or her interlanguage trajectory), this wouldn’t mean that explicit teaching of any particular grammar point or lexical chunk, for example, would lead to procedural knowledge of it. No; what Pienemann’s work does is to give further support to the view that interlanguage development is a cognitive process involving slow, dynamic reformulation and constrained by processing limitations.

Whether Pienemann’s theory gives a good explanation of SLA is open to question, to be settled by an appeal to empirical research and more critical interrogation of the constructs. But there’s no question that Pienemann’s research adds significantly to the evidence for the claim that SLA is a process whose route is unaffected by teaching. In order to respect our students’ interlanguage development, we must teach in such a way that they are given the maximum opportunities to work things out for themselves, and avoid the mistake of trying to teach them things they’re not ready, or motivated, to learn.

For a good discussion of Pienemann’s theory, see the peer commentaries in the first issue of Bilingualism: Language and Cognition (Vol. 1, Number 1, 1998), which is entirely devoted to Processability Theory.

SLA Part 5: Schmidt’s Noticing Hypothesis

(Note: Following a few emails I’ve received, I should make it clear that unless referring to UG, I use the word “grammar” in the sense that linguists use it; viz., “knowledge of a language”.)

Schmidt, undeterred by McLaughlin’s warning to stay clear of attempts to define “consciousness”, attempts to do away with its “terminological vagueness” by examining three senses of the term:

  1. consciousness as awareness,
  2. consciousness as intention,
  3. consciousness as knowledge.

1 Consciousness as awareness

Schmidt distinguishes between three levels of awareness: Perception, Noticing and Understanding. The second level, Noticing, is the key to Schmidt’s eventual hypothesis. Noticing is focal awareness.

When reading, for example, we are normally aware of (notice) the content of what we are reading, rather than the syntactic peculiarities of the writer’s style, the style of type in which the text is set, music playing on a radio in the next room, or background noise outside a window.  However, we still perceive these competing stimuli and may pay attention to them if we choose (Schmidt, 1990: 132).

Noticing refers to a private experience, but it can be operationally defined as “availability for verbal report”, and these reports can be used to both verify and falsify claims concerning the role of noticing in cognition.

2 Consciousness as intention

This distinguishes between awareness and intentional behaviour. “He did it consciously”, in this second sense, means “He did it intentionally.” Intentional learning is not the same as noticing.

3 Consciousness as knowledge

Schmidt suggests that 6 different contrasts (C) need to be distinguished:

C1: Unconscious learning refers to unawareness of having learned something.

C2: Conscious learning refers to noticing and unconscious learning to picking up stretches of speech without noticing them.  Schmidt calls this the “subliminal”  learning question: is it possible to learn aspects of a second language that are not consciously noticed?

C3: Conscious learning refers to intention and effort.  This is the incidental learning question: if noticing is required, must learners consciously pay attention?

C4: Conscious learning is understanding principles of the language, and unconscious learning is the induction of such principles.  This is the implicit learning question: can second language learners acquire rules without any conscious understanding of them?

C5: Conscious learning is a deliberate plan involving study and other intentional learning strategies, unconscious learning is an unintended by-product of communicative interaction.

C6: Conscious learning allows the learner to say what they appear to “know”.

Addressing C2, Schmidt points to disagreement on a definition of intake. While Krashen seems to equate intake with comprehensible input, Corder distinguishes between what is available for going in and what actually goes in, but neither Krashen nor Corder explains what part of input functions as intake for the learning of form. Schmidt also notes the distinction Slobin (1985) and Chaudron (1985) make between preliminary intake (the processes used to convert input into stored data that can later be used to construct language), and final intake (the processes used to organise stored data into linguistic systems).

Schmidt proposes that all this confusion is resolved by defining intake as:

that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently.  If noticed, it becomes intake (Schmidt, 1990: 139).

The implication of this is that:

subliminal language learning is impossible, and that noticing is the necessary and sufficient condition for converting input into intake (Schmidt, 1990:  130).

The only study mentioned by Schmidt in support of his hypothesis is by Schmidt and Frota (1986), which examined Schmidt’s own attempts to learn Portuguese, and found that his notes matched his output quite closely. Schmidt himself admits that the study does not show that noticing is sufficient for learning, or that noticing is necessary for intake. Nevertheless, Schmidt does not base himself on this study alone; there is, Schmidt claims, evidence from a wider source:

… the primary evidence for the claim that noticing is a necessary condition for storage comes from studies in which the focus of attention is experimentally controlled. The basic finding, that memory requires attention and awareness, was established at the very beginning of research within the information processing model (Schmidt, 1990: 141).

Addressing C3, the issue of incidental learning versus paying attention, Schmidt acknowledges that the claim that conscious attention is necessary for SLA runs counter to both Chomsky’s rejection of any role for conscious attention or choice in L1 learning, and the arguments made by Krashen, Pienemann and others for the existence of a natural order or a developmental sequence in SLA.  Schmidt says that Chomsky’s arguments do not necessarily apply to SLA, and that

natural orders and acquisition sequences do not pose a serious challenge to my claim of the importance of noticing in language learning, …they constrain but do not eliminate the possibility of a role for selective, voluntary attention (Schmidt, 1990: 142).

Schmidt accepts that “language learners are not free to notice whatever they want” (Schmidt, 1990: 144), but, having discussed a number of factors that might influence noticing, such as expectations, frequency, perceptual salience, skill level, and task demands, concludes that

those who notice most, learn most, and it may be that those who notice most are those who pay attention most.  (Schmidt, 1990: 144)

As for C4, the issue of implicit learning versus learning based on understanding, Schmidt judges the question of implicit second language learning to be the most difficult “because it cannot be separated from questions concerning the plausibility of linguistic theories” (Schmidt, 1990: 149). But Schmidt rejects the “null hypothesis” which claims that, as he puts it, “understanding is epiphenomenal to learning, or that most second language learning is implicit” (Schmidt, 1990: 149).


Schmidt’s hypothesis caused an immediate stir within the academic community and quickly became widely accepted. It caused Mike Long to re-write his Interaction hypothesis and has been used by many scholars as the basis for studies of SLA. More importantly for my thesis, “noticing” is being increasingly used by teacher trainers, often with scant understanding of it, to justify concentrating on explicit grammar teaching.

I have the following criticisms to make of Schmidt’s noticing hypothesis.

1. Empirical support for the Noticing Hypothesis is weak

In response to a series of criticisms of his original 1990 paper, Schmidt’s 2001 paper gives various sources of evidence of noticing, all of which have been subsequently challenged:

a) Schmidt says learner production is a source of evidence, but no clear method for identifying what has been noticed is given.

b) Likewise, learner reports in diaries. Schmidt cites Schmidt & Frota (1986), and Warden, Lapkin, Swain and Hart (1995), but, as Schmidt himself points out, diaries span months, while cognitive processing of L2 input takes place in seconds. Furthermore, as Schmidt admits, keeping a diary requires not just noticing but reflexive self-awareness.

c) Think-aloud protocols. Schmidt agrees with the objection made to such protocols that studies based on them cannot assume that the protocols include everything that is noticed. Schmidt cites Leow (1997) and Jourdenais, Ota, Stauffer, Boyson, and Doughty (1995), who used think-aloud protocols in focus-on-form instruction, and concludes that such experiments cannot identify all the examples of target features that were noticed.

d) Learner reports in a CALL context (Chapelle, 1998) and programs that track the interface between user and program – recording mouse clicks and eye movements (Crosby, 1998). Again, Schmidt concedes that it is still not possible to identify with any certainty what has been noticed.

e) Schmidt claims that the noticing hypothesis could be falsified by demonstrating the existence of subliminal learning either by showing positive priming of unattended and unnoticed novel stimuli or by showing learning in dual task studies in which central processing capacity is exhausted by the primary task. The problem in this case is that in positive priming studies one can never really be sure that subjects did not allocate any attention to what they could not later report, and similarly, in dual task experiments one cannot be sure that no attention is devoted to the secondary task. Jacoby, Lindsay, & Toth (1996, cited in Schmidt, 2001: 28) argue that the way to demonstrate true non-attentional learning is to use the logic of opposition, to arrange experiments where unconscious processes oppose the aims of conscious processes.

f) Merikle and Cheesman distinguish between the objective and subjective thresholds of perception. The clearest evidence that something has exceeded the subjective threshold and been consciously perceived or noticed is a concurrent verbal report, since nothing can be verbally reported other than the current contents of awareness. Schmidt argues that this is the best test of noticing, and that after-the-fact recall is also good evidence that something was noticed, provided that prior knowledge and guessing can be controlled. For example, if beginner level students of Spanish are presented with a series of Spanish utterances containing unfamiliar verb forms, are forced to recall immediately afterwards the forms that occurred in each utterance, and can do so, that is good evidence that they did notice them. On the other hand, it is not safe to assume that failure to do so means that they did not notice. It seems that it is easier to confirm that a particular form has not been noticed than that it has: failure to achieve above-chance performance in a forced-choice recognition test is a much better indication that the subjective threshold has not been exceeded and that noticing did not take place.
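To make “above-chance performance” concrete, here is a minimal sketch of how one might check it, assuming a two-alternative forced-choice test where guessing gives a 50% hit rate. The function name and the numbers (20 items, 15 or 11 correct) are my own hypothetical illustrations, not anything taken from Schmidt or from Merikle and Cheesman, and the exact binomial test is just one common way of doing this.

```python
from math import comb


def p_value_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value.

    Returns the probability of getting at least `correct` items right
    out of `trials` if the learner is purely guessing at rate `chance`.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical learner A: 15 of 20 items recognised.
print(round(p_value_above_chance(15, 20), 4))  # 0.0207

# Hypothetical learner B: 11 of 20 items recognised.
print(round(p_value_above_chance(11, 20), 4))  # 0.4119
```

With the conventional 0.05 cut-off, learner A is performing above chance, while learner B’s score is what you’d expect from guessing. Note that a score like B’s is consistent with non-noticing but doesn’t prove it, which is part of why the evidence here is so hard to pin down.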

g) Truscott (1998) points out that the reviews by Brewer (1974) and Dawson and Schell (1987), cited by Schmidt (1990), dealt with simple conditioning experiments and that, therefore, inferences regarding learning an L2 were not legitimate. Brewer specifically notes that his conclusions do not apply to the acquisition of syntax, which probably occurs ‘in a relatively unconscious, automatic fashion’ (p. 29). Truscott further points out that while most current research on unconscious learning is plagued by continuing controversy, “one can safely conclude that the evidence does not show that awareness of the information to be acquired is necessary for learning” (p. 108).

h) Altman (1990) gathered data in a similar way to Schmidt and Frota (1986) in studying her learning of Hebrew over a five-year period. Altman found that while half her verbalisation of Hebrew verbs could be traced to diary entries of noticing, it was not possible to identify the source of the other half and they may have become intake subconsciously.

i) Alanen’s (1992) study of Finnish L2 learning found no statistically significant difference between an enhanced input condition group and the control group.

j) Robinson’s (1997) study found mixed results for noticing under implicit, incidental, rule-search and instructed conditions.

Furthermore, studies of ‘noticing’ have been criticised for serious methodological problems:

i) The studies are not comparable due to variations in focus and in the conditions operationalized.

ii) The level of noticing in the studies may have been affected by other variables, which casts doubt on the reliability of the findings.

iii) Cross (2002) notes that “only Schmidt and Frota’s (1986) and Altman’s (1990) research considers how noticing target structures positively relates to their production as verbal output (in a communicative sense), which seems to be the true test of whether noticing has an effect on second language acquisition. A dilemma associated with this is that, as Fotos (1993) states, there is a gap of indeterminate length between what is noticed and when it appears as output, which makes data collection, analysis and correlation problematic.”

iv) Ahn (2014) points to a number of problems that have been identified in eye-tracking studies, especially those using heat map analyses. (See Ahn (2014) for the references that follow.) Heat maps are only “exploratory” (p. 239), and they cannot provide temporal information on eye movement, such as regression duration, “the duration of the fixations when the reader returns to the lookzone” (Simard & Foucambert, 2013, p. 213), which might tempt researchers to rush into a conclusion that favors their own predictions. Second, as Godfroid et al. (2013) accurately noted, the heat map analyses in Smith (2012) could not control the confounding effects of “word length, word frequency, and predictability, among other factors” (p. 490). This might have yielded considerable confounding effects as well. As we can infer from the analyses shown in Smith (2012), currently the utmost need in the field is for our own specific guidelines for using eye-tracking methodology to conduct research focusing on L2 phenomena (Spinner, Gass, & Behney, 2013). Because little guidance is available, the use of eye tracking is often at risk of misleading researchers into making unreliable interpretations of their results.


2. The construct of “noticing” is not clearly defined

Thus, it’s not clear what exactly it refers to, and, as has already been suggested above, there’s no way of ascertaining when it is, and when it isn’t, being used by L2 learners.

Recall that in his original 1990 paper, Schmidt claimed that “intake” was the sub-set of  input which is noticed, and that the parts of input that aren’t noticed are lost. Thus, Schmidt’s Noticing Hypothesis, in its 1990 version, claims that noticing is the necessary condition for learning an L2. Noticing is said to be the first stage of the process of converting input into implicit knowledge. It takes place in short-term memory (where, according to the original claim, the noticed ‘feature’ is compared to features produced as output) and it is triggered by these factors: instruction, perceptual salience, frequency, skill level, task demands, and comparing.

But what is it? It’s “focused attention”, and, Schmidt argues, attention research supports the claim that consciousness in the form of attention is necessary for learning. But Truscott (1998) points out that such claims are “difficult to evaluate and interpret”. He cites a number of scholars and studies to support the view that the notion of attention is “very confused”, and that it’s “very difficult to say exactly what attention is and to determine when it is or is not allocated to a given task. Its relation to the notoriously confused notion of consciousness is no less problematic”. He concludes (1998, p. 107): “The essential point is that current research and theory on attention, awareness and learning are not clear enough to support any strong claims about relations among the three.”

In an attempt to clarify matters and answer his critics, Schmidt re-formulated his Noticing Hypothesis in 2001. A number of concessions are made, resulting in a much weaker version of the hypothesis. To minimise confusion, Schmidt says he will use ‘noticing’ as a technical term equivalent to what Gass (1988) calls “apperception”, what Tomlin and Villa (1994) call “detection within selective attention,” and what Robinson (1995) calls “detection plus rehearsal in short term memory.” So now, what is noticed are “elements of the surface structure of utterances in the input, instances of language” and not “rules or principles of which such instances may be exemplars”. Noticing does not refer to comparisons across instances or to reflecting on what has been noticed.

In a further concession, in the section “Can there be learning without attention?”, Schmidt admits there can, with the L1 as a source that helps learners of an L2 being an obvious example. Schmidt says that it’s “clear that successful second language learning goes beyond what is present in input”. Schmidt presents evidence which, he admits, “appears to falsify the claim that attention is necessary for any learning whatsoever”, and this prompts him to propose the weaker version of the Noticing Hypothesis, namely “the more noticing, the more learning”.

There are a number of problems with this reformulation.

Gass: Apperception

As was mentioned, Schmidt (2001) says that he is using ‘noticing’ as a technical term equivalent to Gass’ apperception. True to dictionary definitions of apperception, Gass defines apperception as “the process of understanding by which newly observed qualities of an object are initially related to past experiences”. The light goes on, the learner realises that something new needs to be learned. It’s “an internal cognitive act in which a linguistic form is related to some bit of existing knowledge (or gap in knowledge)”. It shines a spotlight on the identified form and prepares it for further analysis. This seems to clash with Schmidt’s insistence that noticing does not refer to comparisons across instances or to reflecting on what has been noticed, and in any case, Gass provides no clear explanation of how the subsequent stages of her model convert apperceptions into implicit knowledge of the L2 grammar.

Tomlin and Villa: Detection

Schmidt says that ‘noticing’ is also equivalent to what Tomlin and Villa (1994) call “detection within selective attention.” But is it? Surely Tomlin and Villa’s main concern is detection that does not require awareness. According to Tomlin and Villa, the three components of attention are alertness, orientation, and detection, but only detection is essential for further processing and awareness plays no important role in L2 learning.

Carroll: input doesn’t contain mental constructs; therefore they can’t be noticed

As Gregg commented when I discussed Schmidt’s hypothesis in my earlier blog: “You can’t notice grammar!” Schmidt’s 2010 paper attempts to deal with Suzanne Carroll’s objection by first succinctly summarising Carroll’s view that attention to input plays little role in L2 learning because most of what constitutes linguistic knowledge is not in the input to begin with. She argues that Krashen, Schmidt and Gass all see “input” as observable sensory stimuli in the environment from which forms can be noticed,

whereas in reality the stuff of acquisition (phonemes, syllables, morphemes, nouns, verbs, cases, etc.) consists of mental constructs that exist in the mind and not in the environment at all. If not present in the external environment, there is no possibility of noticing them (Carroll, 2001, p.47).

Schmidt’s answer is:

In general, ideas about attention, noticing, and understanding are more compatible with instance-based, construction-based and usage-based theories (Bley-Vroman, 2009; Bybee & Eddington, 2006; Goldberg, 1995) than with generative theories.

It seems that Schmidt, in an attempt to save his hypothesis, is prepared to ditch what Carroll refers to as “100 years of linguistic research, which demonstrates that linguistic cognition is structure dependent”, and to adopt the connectionist view that linguistic knowledge is encoded as activated neural nets, and that it is linked to acoustic events by no more than association.

I think it’s worth quoting a bit more from Carroll’s impressive 2001 book. Commenting on all those who start with input, she says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

Learners do not attend to things in the input as such, they respond to speech-signals by attempting to parse the signals, and failures to do so trigger attention to parts of the signal.  Thus, it is possible to have speech-signal processing without attention-as-noticing or attention-as-awareness. Learners may unconsciously and without awareness detect, encode and respond to linguistic sounds; learners don’t always notice their own processing of segments and the internal organization of their own conceptual representations; the processing of forms and meanings are often not noticed; and attention is thus the result of processing not a prerequisite for processing.


In brief:

1. In his 2010 paper, Schmidt confirms the concessions made in 2001, which amount to saying that ‘noticing’ is not needed for all L2 learning, but that the more you notice the more you learn. He also confirms that noticing does not refer to reflecting on what is noticed.

2. The Noticing Hypothesis even in its weaker version doesn’t clearly describe the construct of ‘noticing’.

3. The empirical support claimed for the Noticing Hypothesis is not as strong as Schmidt (2010) claims.

4. A theory of SLA based on noticing a succession of forms faces the impassable obstacle that, as Schmidt seemed to finally admit, you can’t ‘notice’ rules, or principles of grammar.

5. “Noticing the gap” is not sanctioned by Schmidt’s amended Noticing Hypothesis.

6. The way that so many writers and ELT trainers use “noticing” to justify all kinds of explicit grammar and vocabulary teaching demonstrates that Schmidt’s Noticing Hypothesis is widely misunderstood and misused.



Ahn, J. I. (2014) Attention, Awareness, and Noticing in SLA: A Methodological Review.  MSU Working Papers in SLS, Vol. 5.

Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Corder, P. (1967) The significance of learners’ errors. International Review of Applied Linguistics, 5, 161-169

Cross, J. (2002) ‘Noticing’ in SLA: Is it a valid concept? Downloaded from

Ellis, N. (1998) Emergentism, Connectionism and Language Learning. Language Learning 48:4,  pp. 631–664.

O’Grady, W. (2005) How Children learn language. CUP.

Schmidt, R. W. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129–58.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. and Frota, S. N. (1986) Developing basic conversational ability in a second language: a case study of an adult learner of Portuguese. In Day, R. R., editor, Talking to learn: conversation in second language acquisition. Rowley, MA: Newbury House.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J.W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Truscott, J. (1998) Noticing in second language acquisition: a critical review. Second Language Research, 14(2), 103–135.

SLA Part 4: Behaviourism and Mentalism

The Shift From a Behaviourist to a Cognitivist View of SLA

Before proceeding with the review of SLA, I need to recap the story so far, in order to highlight the difference between two contradictory epistemologies. I do so for two reasons. Firstly, we are seeing a return to behaviourism in the guise of increasingly popular, and increasingly misinterpreted, usage-based theories of language learning such as emergentism. The epistemological underpinnings of these theories are rarely mentioned, particularly by ELT teacher trainers who either clumsily endorse them or airily dismiss them. Secondly, it gives me an opportunity to restate the implications of the shift to a more cognitive view of the SLA process.


Behaviourism has much in common with logical positivism, the most spectacularly misguided movement in the history of philosophy. Chasing the chimera of absolute truth, the logical positivists, most famously, those in the Vienna Circle formed in the early 1920s, had as their goal nothing less than to clean up language and put science on a sure empirical footing. The mad venture they embarked on didn’t last long – it was all over before the second world war broke out, leaving a legacy of unusually unpleasant academic in-fighting behind it.

But behaviourism has a much longer history. It began with the 1913 work of pioneering American psychologist John B. Watson, and went on when B.F. Skinner took over after WW2. Watson was influenced by the work of Pavlov (1897) and Bekhterev (1896) on conditioning of animals, but later was much taken by the works of two stars of the logical positivist movement, namely Mach (1924) and Carnap (1927) from the Vienna School, under whose influence he attempted to make psychological research “scientific”, by using only “objective procedures”, such as laboratory experiments which were designed to establish statistically significant results. Watson formulated a stimulus-response theory of psychology according to which all complex forms of behaviour are explained in terms of simple muscular and glandular elements that can be observed and measured.  No mental “reasoning”, no speculation about the workings of any “mind”, were allowed. Thousands of researchers adopted this methodology, and from the end of the first world war until the 1950s an enormous amount of research on learning in animals and in humans was conducted under this strict empiricist regime.

In 1950 behaviourism could justly claim to have achieved paradigm status, and at that moment, B.F. Skinner became its new champion.  Skinner’s contribution to behaviourism was to challenge the stimulus-response idea at the heart of Watson’s work and replace it by a type of psychological conditioning known as reinforcement (see Skinner, 1957, and Toates and Slack, 1990).  Important as this modification was, it is Skinner’s insistence on a strict empiricist epistemology, and his claim that language is learned in just the same way as any other complex skill is learned, by social interaction, that is important here.

The strictly empiricist epistemology of behaviourism outlaws any talk of mental structure or of internal mental states. While it’s perfectly OK to talk about these things in everyday parlance, they have no place in scientific discourse. Strictly speaking – which is how scientists, including psychologists, should speak – there is no such thing as the mind, and there is no sense (sic) in talking about feelings or any other stuff that can’t be observed by appeal to the senses. Behaviourism sees psychology as the science of behaviour, not the science of mind. Behaviour can be described and explained without any ultimate reference to mental events or to any internal psychological processes. The sources of behaviour are external (in the environment), not internal (in the mind). If mental terms or concepts are used to describe behaviour, then they must be replaced by behavioural terms or paraphrased into behavioural concepts.

Behaviour is all there is: humans and animals are organisms that can be observed doing things, and the things they do are explained in terms of responses to their environment, which also explains all types of learning.  Learning a language is like learning anything else – it’s the result of repeated responses to stimuli.  There are no innate rules by which organisms learn, which is to say that organisms learn without being innately or pre-experientially provided with explicit procedures by which to learn. Before organisms interact with the environment they know nothing – by definition. Learning doesn’t consist of rule-governed behaviour; learning is what organisms do in response to stimuli. An organism learns from what it does, from its successes and mistakes, as it were.

The minimalist elegance of such a stark view is impressive, even attractive – especially if you’re sick of trying to make sense of Freud, Jung, or Adler, perhaps – but it makes explaining unobservable phenomena, whatever they happen to be, problematic, to say the least. Still, for American scholars immersed in the field of foreign language learning in the post WW2 era, a field not exactly renowned for its contributions to philosophy or scientific method, behaviourism had a lot going for it: an easily-grasped theory with crystal clear pedagogic implications. Their opposition to the Chomskian threat was entirely understandable, but, historically at least, we may note that their case collapsed like a house of cards. Casti (1989) points out that a Kuhnian paradigm shift is nowhere more completely and swiftly brought about in the 20th century than by Chomsky in linguistics.

In his 1957 Verbal Behaviour, Skinner put forward his view that language learning is a process of habit formation involving associations between an environmental stimulus and a particular automatic response, produced through repetition with the help of reinforcement. This view of learning was challenged by Chomsky’s (1959) Review of Skinner’s Verbal Behaviour, where he argued that language learning was quite different from other types of learning and could not be explained in terms of habit-formation. Chomsky’s revolutionary argument, begun in Syntactic Structures (1957), and subsequently developed in Aspects of the Theory of Syntax (1965) and Knowledge of Language (1986), was that all human beings are born with an innate grammar – a fixed set of mental rules that enables children to create and utter sentences they have never heard before. Chomsky asserted that language learning was a uniquely human capacity, a result of Homo sapiens’ possession of what Chomsky at first referred to as a Language Acquisition Device. Chomsky developed his theory and later claimed that language consists of a set of abstract principles that characterise the core grammars of all natural languages, and that the task of learning one’s L1 is thus simplified since one has an innate mechanism that constrains possible grammar formation. Children do not have to learn those features of the particular language to which they are exposed that are universal, because they know them already. The job of the linguist was to describe this generative, or universal, grammar, as rigorously as possible.

So the lines are clearly drawn. For Skinner, language learning is a behavioural phenomenon; for Chomsky, it’s a mental phenomenon. For Skinner, verbal behaviour is the source of learning; for Chomsky it’s the manifestation of what has been learned. For Skinner, talk of innate knowledge is little short of gibberish; for Chomsky it’s the best explanation he can come up with for the knowledge children have of language.

In SLA Part 1, I described how, under the sway of a behaviourist paradigm, researchers in SLA viewed the learner’s L1 as a source of interference, resulting in errors. In SLA Part 2, I described how, under the new influence of a mentalist paradigm, researchers now viewed learners as drawing on their innate language learning capacity to construct their own distinct linguistic system, or interlanguage. The view of learning an L2 changes from one of accumulating new habits while trying to avoid mistakes (which only entrench bad past habits), to one of a cognitive process, where errors are evidence of the learner’s ‘creative construction’ of the L2. Research into learner errors and into the learning of specific grammatical features gave clear evidence to support the mentalist view. The research showed that all learners, irrespective of their L1, seemed to make the same errors, which in turn supported the view that learners were testing hypotheses about the target language on the basis of their limited experience, and making appropriate adjustments to their developing interlanguage system. Far from being evidence of non-learning, errors were thus clear signs of interlanguage development.

Furthermore, and very importantly in terms of its pedagogic implications, interlanguage development, seen as a kind of built-in syllabus, could be observed following the same route, regardless of differences in the L1 or of the linguistic environment. It was becoming clear that (leaving aside the question of maturational constraints for a moment) learning an L2 involved moving along a universal route which was unaffected by the L1, or by the learning environment – classroom, workplace, home, wherever. Just as importantly, the research showed that L2 learning is not a matter of successively accumulating parts of the language one bit after the other. Rather, SLA is a dynamic process involving the gradual development of a complex system. Learners can sometimes take several months to fully acquire a particular feature, and the learning process is anything but linear: it involves slowly and unsystematically moving through a series of transitional stages, including zigzags, u-shaped patterns, stalls, and plateaus, as learners’ interlanguages are constantly adjusted, reformulated, and rebuilt in such a way that they gradually approximate more to the L2 model.

A picture is thus emerging of SLA as a learning process with two important characteristics.

  1. Knowledge of the L2 develops along a route which is impervious to instruction, and
  2. it develops in a dynamic, nonlinear way, where lots of different parts of the developing system are being worked on at the same time.

As we continue the review, we’ll look at declarative and procedural knowledge, explicit and implicit knowledge, and explicit and implicit learning, and this will indicate the third important characteristic of the SLA process:

3. Implicit learning is the default mechanism for learning an L2.

We’ll then be in a stronger position to argue that teacher trainers who advise their trainees to devote the majority of classroom time to the explicit teaching of a sequence of formal elements of the L2 are grooming those trainees for failure.

For References See “Bibliography ..” in Header 

SLA Part 3: From Krashen to Schmidt

Developing a transition theory

What is the process of SLA? How do people get from no knowledge of the L2 to some level of proficiency? Before going on, I should make it clear that I’m only looking at psycholinguistic theories, thus ignoring important social aspects of L2 learning and, even within the realm of cognition, leaving out such factors as aptitude and motivation.

 Krashen’s Monitor Model

Krashen’s (1977a, 1977b, 1978, 1981, 1982, 1985) Monitor Model  came hard on the heels of Corder’s work, and contains the following five hypotheses:

The Acquisition-Learning Hypothesis.

Adults have two ways of developing L2 competence:

  1. via acquisition, that is, picking up a language naturally, more or less like children do their L1, by using language for communication. This is a subconscious process and the resulting acquired competence is also subconscious.
  2. via language learning, which is a conscious process and results in formal knowledge of the language.

For Krashen, the two knowledge systems are separate. “Acquired” knowledge is what explains communicative competence. Knowledge gained through “learning” can’t be internalised and thus serves only the very minor role of acting as a monitor of the acquired system, checking the correctness of utterances against the formal knowledge stored therein.

 The Natural Order Hypothesis

The rules of language are acquired in a predictable way, some rules coming early and others late. The order is not determined solely by formal simplicity, and it is independent of the order in which rules are taught in language classes.

The Monitor Hypothesis

The learned system has only one, limited, function: to act as a Monitor.  Further, the Monitor cannot be used unless three conditions are met:

  1. Enough time. “In order to think about and use conscious rules effectively, a second language performer needs to have sufficient time” (Krashen, 1982:12).
  2. Focus on form. “The performer must also be focused on form, or thinking about correctness” (Krashen, 1982: 12).
  3. Knowledge of the rule.

The Input Hypothesis

Second languages are acquired by understanding language that contains structure “a bit beyond our current level of competence (i + 1)”, that is, by receiving “comprehensible input”.  “When the input is understood and there is enough of it, i + 1 will be provided automatically.  Production ability emerges.  It is not taught directly”  (Krashen, 1982: 21-22).

The Affective Filter Hypothesis

The Affective Filter is “that part of the internal processing system that subconsciously screens incoming language based on … the learner’s motives, needs, attitudes, and emotional states” (Dulay, Burt, and Krashen, 1982: 46). If the Affective Filter is high (because of lack of motivation, dislike of the L2 culture, or feelings of inadequacy, for example), input is prevented from passing through and hence there is no acquisition.  The Affective Filter is responsible for individual variation in SLA (it is not something children use) and explains why some learners never acquire full competence.


The biggest problem with Krashen’s account is that there is no way of testing the Acquisition-Learning hypothesis: we are given no evidence to support the claim that two distinct systems exist, nor any means of determining whether they are, or are not, separate.  Similarly, there is no way of testing the Monitor hypothesis: with no way to determine whether the Monitor is in operation or not, it is impossible to determine the validity of its extremely strong claims. The Input Hypothesis is equally mysterious and incapable of being tested: the levels of knowledge are nowhere defined and so it is impossible to know whether i + 1 is present in input, and, if it is, whether or not the learner moves on to the next level as a result.  Thus, the first three hypotheses make up a circular and vacuous argument.  The Monitor accounts for discrepancies in the natural order, the learning-acquisition distinction justifies the use of the Monitor, and so on.

Further, the model lacks explanatory adequacy. At the heart of the model is the Acquisition-Learning Hypothesis, which simply states that L2 competence is picked up through comprehensible input in a staged, systematic way, without giving any explanation of the process by which comprehensible input leads to acquisition.  Similarly, we are given no account of how the Affective Filter works, of how input is filtered out by an unmotivated learner.

Finally, Krashen’s use of key terms, such as “acquisition and learning”, and “subconscious and conscious”, is vague, confusing, and not always consistent.

In summary, while the model is broad in scope and is intuitively appealing, Krashen’s key terms are ill-defined and circular, so that the set of hypotheses is incoherent. The lack of empirical content in the five hypotheses means that there is no means of testing them.  As a theory it has such serious faults that it is not really a theory at all.

And yet, Krashen’s work has had an enormous influence, and in my opinion, rightly so. While the acquisition / learning distinction is badly defined, it is, nevertheless, absolutely crucial to current attempts to explain SLA; all the subsequent work on implicit and explicit learning, knowledge, and instruction starts here, as does the work on interlanguage development. Since the questions of conscious and unconscious learning, and of interlanguage development are the two with the biggest teaching implications, and since I think Krashen was basically right about both issues, I personally see Krashen’s work as of enormous and enduring importance.

Processing Approaches

A) McLaughlin: Automaticity and Restructuring

McLaughlin’s (1987) review of Krashen’s Monitor Model is considered one of the most complete rebuttals offered (but see Gregg, 1984). In an attempt to overcome the problems of finding operational definitions for concepts used to describe and explain the SLA process, McLaughlin went on to argue (1990) that the distinction between conscious and unconscious should be abandoned in favour of clearly-defined empirical concepts.  McLaughlin replaces the conscious/unconscious dichotomy with the distinction between controlled and automatic processing. Controlled processing requires attention, and humans’ capacity for it is limited; automatic processing does not require attention, and takes up little or no processing capacity.  So, McLaughlin argues, the L2 learner begins the process of acquisition of a particular aspect of the L2 by relying heavily on controlled processing; then, through practice, the learner’s use of that aspect of the L2 becomes automatic.

McLaughlin uses the twin concepts of Automaticity and Restructuring to describe the cognitive processes involved in SLA. Automaticity occurs when an associative connection is established between a certain kind of input and some output pattern.   Many typical exchanges of greetings illustrate this:

Speaker 1: Morning.

Speaker 2: Morning. How are you?

Speaker 1: Fine, and you?

Speaker 2: Fine.

Since humans have a limited capacity for processing information, automatic routines free up more time for processing new information. The more information that can be handled automatically, the more attentional resources are freed up for new information.  Learning takes place by the transfer of information to long-term memory and is regulated by controlled processes which lay down the stepping stones for automatic processing.

The second concept, restructuring, refers to qualitative changes in the learner’s interlanguage as they move from stage to stage, not to the simple addition of new structural elements.  These restructuring changes are, according to McLaughlin, often reflected in “U-shaped behaviour”, which refers to three stages of linguistic use:

  • Stage 1: correct utterance,
  • Stage 2: deviant utterance,
  • Stage 3: correct usage.

In a study of French L1 speakers learning English, Lightbown (1983) found that, when acquiring the English “ing” form, her subjects passed through the three stages of U-shaped behaviour.  Lightbown argued that as the learners, who initially were only presented with the present progressive, took on new information – the present simple – they had to adjust their ideas about the “ing” form.  For a while they were confused and the use of “ing” became less frequent and less correct. Below is a diagram showing the same process for past tense forms:


McLaughlin suggested getting rid of the unconscious / conscious distinction because it wasn’t properly defined by Krashen, but in doing so he threw the baby out with the bathwater. Furthermore, we have to ask to what extent the terms “controlled processing” and “automatic processing” are any better; after all, measuring the length of time necessary to perform a given task is a weak type of measure, and one that does little to solve the problem it raises.

Still, the “U-shaped” nature of staged development has been influential in successive attempts to explain interlanguage development, and we may note that McLaughlin was, with Bialystok, among the first scholars to apply general cognitive psychological concepts of computer-based information-processing models to SLA research.  Chomsky’s Minimalist Program confirms his commitment to the view that cognition consists in carrying out computations over mental representations.  Those adopting a connectionist view, though taking a different view of the mind and how it works, also use the same metaphor.  Indeed, the basic notion of “input – processing – output” has become an almost unchallenged account of how we think about and react to the world around us.  While in my opinion the metaphor can be extremely useful, it is worth making the obvious point that we are not computers.  One may well sympathise with Foucault and others who warn us of the blinding power of such metaphors.

Schmidt’s noticing hypothesis

Rather than accept McLaughlin’s advice to abandon the search for a definition of “consciousness”, Schmidt attempts to do away with its “terminological vagueness” by examining it in detail. His work has proved enormously influential, but I think there are serious problems with the “Noticing Hypothesis”, and that it has been widely misinterpreted in order to justify types of explicit instruction that are not actually supported by a more considered view of the evidence. I’ll deal with this in Part 4.

See Bibliography in Header for all references 

SLA Part 2: Cognitive Theories  


The paradigm shift from behaviourism to “mentalism”, caused by the publication of Chomsky’s Syntactic Structures in 1957 and subsequent confrontations with Skinner, meant that language learning was now seen as a psychological process going on in “the mind”, a revived construct which had been proscribed by the behaviourists. SLA scholars turned their attention away from teaching and towards the mental process of learning an L2. We may note that they usually ignored Gregg’s perfectly justifiable demand for a property theory; they worked in different, limited domains; their explanations used confusing, sometimes contradictory constructs; their hypotheses were often difficult, sometimes impossible, to test; their studies got support from inconclusive, insufficient, and flawed research; none of the theories provided a full or satisfactory explanation of SLA; and there was little consensus among researchers about fundamental questions of domain or research methodology. While, 60 years later, I think it’s fair to say that there is still no full or satisfactory explanation of SLA, I think it’s also fair to say that a great deal of progress has been made, and that there are now robust, reliable research findings which, if given the attention they deserve and acted on, would transform ELT practice.

Error Analysis

We may begin with Pit Corder, who, in 1967, argued that errors were neither random nor best explained in terms of interference from the learner’s L1; errors were indications of learners’ attempts to figure out an underlying rule-governed system.  Corder distinguished between errors and mistakes: mistakes are slips of the tongue and not systematic, whereas errors are indications of an as yet non-native-like, but nevertheless, systematic, rule-based grammar.  It’s easy to see such a distinction as reflecting Chomsky’s distinction between performance and competence, and to interpret Corder’s interest in errors as an interest in the development of a learner’s grammar.

But error analysis, by concentrating exclusively on errors, failed to capture the full picture of a learner’s linguistic behaviour. Schachter (1974), in a study of the compositions of Persian, Arabic, Chinese and Japanese learners of English focusing on their use of relative clauses, found that the Persian and Arabic speakers had a far greater number of errors, but that the Chinese and Japanese students produced only half as many relative clauses as did the Persian and Arabic students. Schachter then looked at the students’ L1 and found that Persian and Arabic relative clauses are similar to English in that the relative clause is placed after the noun it modifies, whereas in Chinese and Japanese the relative clause comes before the noun.  She concluded that Chinese and Japanese speakers of English use relative clauses cautiously but accurately because of the distance between the way their L1 and the L2 (English) form relative clauses.  While Schachter’s main aim was to challenge the strong claims of the Contrastive Analysis Hypothesis, her study drew attention to the fact that one needed to look at what learners get right as well as what they get wrong.


Error analysis had a pedagogical goal: by identifying, classifying, and quantifying errors, remedial work could be planned, based on the kind and frequency of the error.  Nevertheless, the seeds of a powerful SLA theory, covering a much wider domain, were planted.  Although Corder focused on teaching methodology, the long-term effect of error analysis was to shift SLA research away from teaching and towards  studying how L2 learners formulate and internalise a grammar of the L2, on the basis of exposure to the language and some kind of internal processing.


The Morpheme Order Studies

While not in a strict chronological sense, the next development in SLA theory was provoked by the morpheme order studies.  Dulay and Burt (1975) claimed that fewer than 5% of errors were due to native language interference, that errors were, as Corder suggested, in some sense systematic, and that there was something akin to a Language Acquisition Device at work not just in first language acquisition, but also in SLA.  The morpheme studies of Brown in L1 (1973) led to studies in L2 by Dulay & Burt (1973, 1974a, 1974b, 1975), and Bailey, Madden & Krashen (1974), all of which suggested that there was a natural order in the acquisition of English morphemes, regardless of L1. This became known as the L1 = L2 Hypothesis, and further studies (by Ravem (1974), Cazden, Cancino, Rosansky & Schumann (1975), Hakuta (1976), and Wode (1978), cited in Larsen-Freeman and Long, 1991) all pointed to systematic staged development in SLA.

Some of these studies, particularly those of Dulay and Burt, and of Bailey, Madden and Krashen, were soon challenged.  Among the objections to the findings were:

  • The Bilingual Syntax Measure was claimed to have skewed results – it was suggested that any group of learners taking the test would have produced similar results.
  • The category “non-Spanish” was too wide.
  • Morphemes of different meanings were categorised together, e.g., the English article system.
  • Accuracy orders do not necessarily reflect developmental sequences. The total picture of a learner’s use of a form was not taken into account.
  • The type of data elicited was “forced”.

After the original studies, over fifty new L2 morpheme studies were carried out, many using more sophisticated data collection and analysis procedures (including an analysis of the subjects’ performance in supplying morphemes in non-obligatory, as well as obligatory, contexts), and the results of these studies went some way to restoring confidence in the earlier findings (Larsen-Freeman and Long, 1991: 91).


For all its imperfections, this research marked a decisive turning-point in the development of SLA theory, because it focused attention on learners’ staged development in the L2.  Even so, it was a modest start: the morpheme studies left most questions unanswered, since even if English morphemes are acquired in a predictable order, this doesn’t mean that all acquisition takes place in a predictable order. As Gregg (1984) pointed out, the morpheme studies lacked any explanation of why this “natural order” was systematic.







Base Camp: Interlanguages Identified 

The emerging cognitive paradigm of language learning perhaps received its full expression in Selinker’s (1972) paper which argues that L2 learners develop their own autonomous mental grammar (interlanguage (IL) grammar) with its own internal organising principles.

Question forms were the first stage of this interlanguage to be identified. In a study of six Spanish L1 students over a 10-month period, Cazden, Cancino, Rosansky and Schumann (1975) found that the participants produced interrogative forms in a predictable sequence:

  1. Rising intonation (e.g., He works today?),
  2. Uninverted WH (e.g., What he (is) saying?),
  3. “Overinversion” (e.g., Do you know where is it?),
  4. Differentiation (e.g., Does she like where she lives?).

Negation was the next example, reported in Larsen-Freeman and Long (1991: 94), where learners from a variety of different L1 backgrounds were seen to go through the same four stages in acquiring English negation:

  1. External (e.g., No this one./No you playing here),
  2. Internal, pre-verbal (e.g., Juana no/don’t have job),
  3. Auxiliary + negative (e.g., I can’t play the guitar),
  4. Analysed don’t (e.g., She doesn’t drink alcohol.)


Attention is now being overtly focused on the phenomena of staged development and systematicity in L2 learning; the work is attempting to explain the process of SLA, and to suggest what mechanisms are involved.  To the extent that such studies can be taken as support for the view that interlanguage development is at least in part explained by language universals, they can be seen as related to UG theory, but we must stress the distance between the two types of analysis. Interlanguage is not seen in terms of principles and parameters; it is concerned, among other things, with surface grammar, with processing, with skills acquisition.

It’s important to note that “incompleteness” was a much commented-on phenomenon among researchers at this time: most L2 learners’ ultimate level of L2 attainment falls well short of native-like competence.  Here again, the difference between L1 acquisition and SLA, and the resultant differences in the appropriate approaches to research and explanation in the two fields, are clear.  Selinker and Lamendella (1978) suggested that both internal factors (e.g., age, lack of desire to acculturate) and external factors (e.g., communicative pressure, lack of input and opportunities to practice, lack of negative feedback) are at work, but the precise role these factors play and how they interact was not explained.

So, there are signs of progress. As work progresses, problems of defining key terms and constructs (particularly the implicit/explicit, procedural/declarative, automatic/controlled dichotomies) are worked on; and, as more stages in interlanguage development are identified (questions, negation, word order, embedded clauses and pronouns being the most important areas (see Braidi, 1999)), things get at once clearer and more complex: the dynamic nature of SLA means that differentiating between different stages is difficult, the stages overlap, and there are variations within stages, as McLaughlin’s theory (see Part 3) suggests.


References: See Bibliography in Header

SLA Part 1: Contrastive Analysis

This is Question 2 of the 5 that I think teacher trainers should answer. Here, I look at Contrastive Analysis. I take the terms “learning an L2” and “SLA” to refer to the same thing.

Contrastive Analysis concentrated on the role of the “native language”, and suggested that language transfer was the key to explaining SLA.  The Contrastive Analysis Hypothesis (CAH) was founded on

  • structural linguistics, which was a corpus-based descriptive approach, providing detailed linguistic descriptions of a particular language, and
  • behavioural psychology, which held that learning was establishing a set of habits.

Lado (1957), following the behaviourist argument, assumed that learning a language was like learning anything else, and that, in line with this general learning theory, learning task A will affect the subsequent learning of task B.  Consequently, SLA is crucially affected by learning of the L1. If acquisition of the L1 involved the formation of a set of habits, then the same process must also be involved in SLA, with the difference that some of the habits appropriate to the L2 will already have been acquired, while other habits will need to be modified, and still others will have to be learned from scratch. Lado went on to suggest that there were two types of language transfer: positive transfer (facilitation) and negative transfer (interference).  There were, in turn, two types of interference: retroactive inhibition, where learning acts back on previously learned matter (language loss), and proactive inhibition, where a series of responses already learned tend to appear in situations where a new series of responses is needed.

To summarise, the CAH claimed that language learning is habit formation and SLA involves establishing a new set of habits.  By considering the main differences between L1 and L2, one can anticipate the errors learners will make when learning an L2: errors indicate differences and these differences have to be learned.


The CAH was immediately challenged by evidence from studies which showed that errors occurred when not predicted by contrastive analysis, and did not occur when predicted.  Initial studies led to subsequent work, and lots more counterevidence. For example, Zobl (1980) found that while English-speaking learners of French negatively transferred English postverbal pronoun placement to produce ungrammatical utterances such as Le chien a mangé les (correct form: Le chien les a mangés), French-speaking learners of English did not make the converse error of transferring French preverbal object pronoun placement into English. This is a case of a one-way learning difficulty.  Furthermore, not all areas of similarity between an L1 and an L2 lead to positive transfer.  Odlin (1989), for example, reported that although Spanish has a copula verb similar to English be in sentences like That’s very simple, or The picture’s very dark, Spanish-speaking learners of L2 English usually omit the copula in early stages of acquisition, saying That very simple, and The picture very dark.

More systematic classification of learners’ errors suggested that only a small percentage of them could be attributed to contrasting properties between L1 and L2.  Lococo (1975), for example, found that in the corpus she examined, only 25% of errors resulted from L1/L2 contrast, and Dulay and Burt’s study (1975) claimed that only 5% of errors were thus accounted for. The Dulay and Burt study was subsequently seriously questioned (see Ellis, 1993: 45), but later morpheme studies did much to restore the credibility of the underlying argument.


Contrastive analysis has a lot to recommend it.  As a theory of SLA, the following points can be made about the CAH:

  • It is a coherent and cohesive consequence of a general theory of learning.
  • It embraces a well-developed theory of languages. Just as learning is seen in behaviourist terms, so languages are seen from a well-defined structuralist viewpoint: languages are studied in the true Baconian, “botanist” tradition, and it is the careful description and analysis of their differences which is the researcher’s main concern.
  • It occupies a limited domain, dealing almost exclusively with the phenomenon of transfer of properties of the L1 grammar into the L2 grammar.
  • It is a testable hypothesis: empirical evidence can support or challenge it and research studies can be replicated. The research methods can be scrutinised and improved.
  • It is extremely economical in its use of key terms and constructs.

It may also be noted that there are crystal-clear pedagogical implications. Contrastive Analysis indicates what particular habits have to be learned, and pedagogical practice – the audio-lingual method (speech is primary, and is learned through drills and practice) – fits perfectly with the theory of SLA.  One would venture to say that this was no coincidence, that the agenda of the early SLA researchers was clearly focused on pedagogical concerns.

The fundamental difficulty of the theory lies in its underlying behaviouristic theory of learning, according to which all learning is repetition, a question of habit-formation.  Such a view of learning adopts an empiricist / positivist epistemology which denies the validity of the mind as a construct, and the possibility of causal explanations. Following the disastrous trajectory of the logical positivists in the 1930s, behaviourism and positivism were almost universally rejected, although, more recently, connectionist and emergentist approaches to learning seem to herald a return to behaviourism. Of course, a sleight of hand is required in order to allow for the use of theoretical constructs and for the inference to a causal explanation of L2 learning.

As we’ll see, the shift away from behaviourism meant saying farewell to a very comfortable state of affairs in ELT. Before the shift, learning a second language was explained in terms of a general learning theory, and there was no doubt as to the practical applications of that theory: you learn the L2 in the same way as you learned the L1, and in the same way as you learn anything else, by forming stimulus-response behaviour patterns.

It is instructive to see what happened to the CAH.  While the strong claims of the CAH have been refuted by research findings, there has rarely been any doubt that the L1 does indeed affect SLA.  Later studies concentrated on when and how the L1 influenced SLA.

Regarding “When”, Wode (1978) suggested that it is the similarities, not the differences, between L1 and L2 which cause the biggest problems, and Zobl (1982) proposed that “markedness” constrains L1 transfer.  Zobl argued that linguistically unmarked L1 features will transfer, but linguistically marked features will not, where markedness is measured in terms of infrequency or departure from something basic or typical in a language.

Regarding “How”, Zobl identified two patterns of L1 influence on SLA: (a) the pace at which a developmental stage is traversed (where the L1 can inhibit or accelerate the process), and (b) the number of developmental structures in a stage.  Larsen-Freeman and Long, in their discussion of markedness conclude:

When L1 transfer occurs, it generally does so in harmony with developmental processes, modifying learners’ encounters with interlanguage sequences rather than altering them in fundamental ways. (Larsen-Freeman and Long, 1991: 106)

Today, L1 transfer can be seen as playing an important part in all of the most interesting current views of SLA, including Nick Ellis’ and Mike Long’s agreement that one of the biggest tasks facing adult L2 learners is that of “re-setting the dial”. We’ll come to that.

References can be found by clicking on the “Bibliography for Theory Construction in SLA” menu in the Header.

Coursebooks and the commodification of ELT

(Note: This is a copy of a post from my CriticELT blog. I think it’s relevant to the subject of teacher trainers because, with the notable exception of Scott Thornbury and Luke Meddings, leading teacher trainers, and TD SIGs, work on the assumption that modern coursebooks are an essential tool for current ELT practice.)


Wilkins (1976) distinguished between two types of syllabus:

  1. a ‘synthetic’ syllabus: items of language are presented one by one in a linear sequence to the learner. The learner is expected to build up, or ‘synthesize’, the knowledge incrementally;
  2. an ‘analytic’ syllabus: the learner does the ‘analysis’, i.e. ‘works out’ the system, through engagement with natural language data.

Coursebooks embody a synthetic approach to syllabus design. Coursebook writers take the target language (the L2) as the object of instruction, and they divide the language up into bits of one kind or another – words, collocations, grammar rules, sentence patterns, notions and functions, for example – which are presented and practiced in a sequence. The criteria for sequencing can be things like valency, criticality, frequency, or saliency, but the most common criterion is ‘level of difficulty’, which is intuitively defined by the writers themselves.

The approach is thus based on taking incremental steps towards proficiency; “items”, “entities”, sliced up bits of the target language are somehow accumulated through a process of presentation, practice and re-cycling, and communicative competence is the result.


Different coursebooks claim to use different types of syllabus (grammatical, lexical, or notional-functional, for example), deal with different topics, adopt different styles, and so on; but, in the end, the vast majority of them use synthetic syllabuses with the same features described above, and they all give pride of place to explicit teaching and learning. The syllabus is delivered by the teacher, who first presents the bits of the L2 chosen by the coursebook writers (in written and spoken texts, grammar boxes, vocabulary lists, diagrams, pictures, and so on), and leads students through a series of activities aimed at practicing the language, like drills, written exercises, discussions, games, tasks and practice of the four skills.

Among the coursebooks currently on sale from UK and US publishers, and used around the world, are the following:

Headway; English File; Network; Cutting Edge; Language Leader; English in Common; Speakout; Touchstone; Interchange; Mosaic; Inside Out; Outcomes.

Each of these titles consists of a series of five or six books aimed at different levels, from beginner to advanced, and offers a Student’s Book, a Teacher’s Book and a Workbook, plus other materials such as video and on-line resources. Each Student’s Book at each level is divided into a number of units, and each unit consists of a number of activities which teachers lead students through. The Student’s Book is designed to be used systematically from start to finish – not just dipped into wherever the teacher fancies. The different activities are designed to be done one after the other; so that Activity 1 leads into Activity 2, and so on. Two examples follow.

In New Headway, Pre-Intermediate, Unit 3, we see this progression of activities:

  1. Grammar (Past tense) leads into ( ->)
  2. Reading Text (Travel) ->
  3. Listening (based on reading text) ->
  4. Reading (Travel) ->
  5. Grammar (Past tense) ->
  6. Pronunciation ->
  7. Listening (based on Pron. activity) ->
  8. Discussing Grammar ->
  9. Speaking (A game & News items) ->
  10. Listening & Speaking (News) ->
  11. Dictation (from listening) ->
  12. Project (News story) ->
  13. Reading and Speaking (About the news) ->
  14. Vocabulary (Adverbs) ->
  15. Listening (Adverbs) ->
  16. Grammar (Word order) ->
  17. Everyday English (Time expressions)

And if we look at Outcomes Intermediate, Unit 2, we see this:

  1. Vocab. (feelings) ->
  2. Grammar (be, feel, look, seem, sound + adj.) ->
  3. Listening (How do they feel?) ->
  4. Developing Conversations (Response expressions) ->
  5. Speaking (Talking about problems) ->
  6. Pronunciation (Rising & falling stress) ->
  7. Conversation Practice (Good / bad news) ->
  8. Speaking (Physical greetings) ->
  9. Reading (The man who hugged) ->
  10. Vocabulary (Adj. Collocations) ->
  11. Grammar (ing and ed adjs.) ->
  12. Speaking (based on reading text) ->
  13. Grammar (Present tenses) ->
  14. Listening (Shopping) ->
  15. Grammar (Present cont.) ->
  16. Developing conversations (Excuses) ->
  17. Speaking (Ideas of heaven and hell).

All the other coursebooks mentioned are similar in that they consist of a number of units, each of them containing activities involving the presentation and practice of target versions of L2 structures, vocabulary, collocations, functions, etc., using the 4 skills. All of them assume that the teacher will lead students through each unit and do the succession of activities in the order that they’re set out. And all of them wrongly assume that if learners are exposed to selected bits of the L2 in this way, one bit at a time in a pre-determined sequence, then, after enough practice, the new bits, one by one, in the same sequence, will become part of the learners’ growing L2 competence. This false assumption flows from a skill-based view of second-language acquisition, which sees language learning as the same as learning any other skill, such as driving a car or playing the piano.

Skill-based theories of SLA

The most well-known of these theories is John Anderson’s (1983) ‘Adaptive Control of Thought’ model, which makes a distinction between declarative knowledge – conscious knowledge of facts – and procedural knowledge – unconscious knowledge of how an activity is done. When applied to second language learning, the model suggests that learners are first presented with information about the L2 (declarative knowledge) and then, via practice, this is converted into unconscious knowledge of how to use the L2 (procedural knowledge). The learner moves from controlled to automatic processing, and through intensive linguistically focused rehearsal, achieves increasingly faster access to, and more fluent control over, the L2 (see DeKeyser, 2007, for example).

The fact that nearly everybody successfully learns at least one language as a child without starting with declarative knowledge, and that millions of people learn additional languages without studying them (migrant workers, for example), might make one doubt that learning a language is the same as learning a skill such as driving a car. Furthermore, the phenomenon of L1 transfer doesn’t fit well with a skill-based approach, and neither do putative critical periods for language learning. But the main reason for rejecting such an approach is that it contradicts SLA research findings related to interlanguage development.

Firstly, it doesn’t make sense to present grammatical constructions one by one in isolation because most of them are inextricably inter-related. As Long (2015) says:

Producing English sentences with target-like negation, for example, requires control of word order, tense, and auxiliaries, in addition to knowing where the negator is placed. Learners cannot produce even simple utterances like “John didn’t buy the car” accurately without all of those. It is not surprising, therefore, that Interlanguage development of individual structures has very rarely been found to be sudden, categorical, or linear, with learners achieving native-like ability with structures one at a time, while making no progress with others. Interlanguage development just does not work like that. Accuracy in a given grammatical domain typically progresses in a zigzag fashion, with backsliding, occasional U-shaped behavior, over-suppliance and under-suppliance of target forms, flooding and bleeding of a grammatical domain (Huebner 1983), and considerable synchronic variation, volatility (Long 2003a), and diachronic variation.


Secondly, research has shown that L2 learners follow their own developmental route, a series of interlocking linguistic systems called “interlanguages”. Myles (2013) states that the findings on the route of interlanguage (IL) development are among the most well-documented findings of SLA research of the past few decades. She asserts that the route is “highly systematic” and that it “remains largely independent of both the learner’s mother tongue and the context of learning (e.g. whether instructed in a classroom or acquired naturally by exposure)”. The claim that instruction can influence the rate but not the route of IL development is probably the most widely accepted claim among SLA scholars today.

Selinker (1972) introduced the construct of interlanguages to explain learners’ transitional versions of the L2. Studies show that interlanguages exhibit common patterns and features, and that learners pass through well-attested developmental sequences on their way to different end-state proficiency levels. Examples of such sequences are found in morpheme studies; the four-stage sequence for ESL negation; the six-stage sequence for English relative clauses; and the sequence of question formation in German (see Hong and Tarone, 2016, for a review). Regardless of the order or manner in which target-language structures are presented in coursebooks, learners analyse input and create their own interim grammars, slowly mastering the L2 in roughly the same manner and order. The acquisition sequences displayed in interlanguage development don’t reflect the sequences found in any of the coursebooks mentioned; on the contrary, they prove to be impervious to coursebooks, as they are to different classroom methodologies, or even whether learners attend classroom-based courses or not.

Note that interlanguage development refers not just to grammar; pronunciation, vocabulary, formulaic chunks, collocations and sentence patterns are all part of the development process. To take just one example, U-shaped learning curves can be observed in learning the lexicon. Learners have to master the idiosyncratic nature of words, not just their canonical meaning. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. Only by passing through a period of incorrectness, in which the word is used in a variety of ways, can they climb back up the U-shaped curve.

Interlanguage development takes place in line with what Corder (1967) referred to as the internal “learner syllabus”, not the external syllabus embodied in coursebooks. Students don’t learn different bits of the L2 when and how a coursebook says that they should, but only when they are developmentally ready to do so. As Pienemann (1987) demonstrates, learnability (i.e., what learners can process at any one time) determines teachability (i.e., what can be taught at any one time). Coursebooks flout the learnability and teachability conditions; they don’t respect the learner’s internal syllabus.

False Assumptions made by Coursebooks

To summarise the above, we may list the 3 false assumptions made by coursebooks.

Assumption 1: In SLA, declarative knowledge converts to procedural knowledge. Wrong! No such simple conversion occurs. Knowing that the past tense of “has” is “had”, and then doing some controlled practice, does not lead to fluent and correct use of “had” in real-time communication.

Assumption 2: SLA is a process of mastering and accumulating structural items, one by one. Wrong! All the items are inextricably inter-related. As Long (2015, 67) says:

The assumption that learners can move from zero knowledge to mastery of negation, the present tense, subject-verb agreement, conditionals, relative clauses, or whatever, one at a time, and move on to the next item in the list, is a fantasy.

Assumption 3: Learners learn what they’re taught when they’re taught it. Wrong – as every teacher knows! Pienemann (1987) has demonstrated that teachability is constrained by learnability.

Objections to Coursebooks

1. Using a coursebook means that most classroom time is devoted to the teacher talking about the L2. Communicative competence is better developed when most classroom time is devoted to students talking in the L2 about matters that are relevant to their needs.

2. Presenting and practicing a pre-set series of linguistic forms (pronunciation, grammar, notions, functions, lexical items, collocations, etc.) simply doesn’t work – it contradicts the robust results of SLA research into how people learn an L2. Even if a form coincidentally happens to be learnable (by some students in a class), and so teachable, at the time it is presented, teaching via PPP doesn’t ensure that it will be learned.

3. The approach is counterproductive: both teachers and students feel frustrated by the constant mismatch between teaching and learning.

4. The cutting up of language into manageable pieces (or “McNuggets” as Thornbury calls them) usually results in impoverished input and output opportunities.

5. Results are poor. It’s hard to get reliable data on this, but evidence strongly suggests that most students who do coursebook-driven courses do not achieve the level of proficiency they expected.

6. Both the content and methodology of the course are externally pre-determined and imposed. This point will be developed below.

7. Coursebooks pervade the ELT industry and stunt the growth of innovation and teacher training. The publishing companies that produce coursebooks also produce exams, teacher training courses and everything else connected to ELT. Publishing companies spend tens of millions of dollars on marketing, aimed at persuading stakeholders that coursebooks represent the best practical way to manage ELT.  In the powerful British ELT establishment, key players like the British Council and Cambridge Assessment have huge influence on teacher training and language testing and they all accept the coursebook as central to ELT practice. TESOL and IATEFL, bodies that are supposed to represent teachers’ interests, have also succumbed to the influence of the big publishers, as their annual conferences make clear. So the coursebook rules, at the expense of teachers, of good educational practice, and of language learners.

8. Coursebooks represent the commodification of ELT. Grammar, vocabulary, lexical chunks, discourse, the whole messy chaotic stuff of language is neatly packaged into items, granules, chunks, served up in sanitised short texts and summarised in lists and tables. Communicative competence itself, as Leung (cited in Thornbury 2014) points out, is turned into “inert and decomposed knowledge”, and language teaching is increasingly prepackaged and delivered as if it were a standardised, marketable product. ELT becomes just another market transaction; in this case between de-skilled teachers, who pass on a set of standardised, testable knowledge and skills to learners, who have been reconfigured as consumers.

Brian Tomlinson’s Reviews

I’ve written a post on Brian Tomlinson’s reviews of coursebooks, but here let me just mention what he and co-author Masuhara said in their most recent review of coursebooks, including Headway and Outcomes. Tomlinson and Masuhara found that none of the coursebooks was likely to be effective in facilitating long-term acquisition. They described the texts found in the books as short, contrived, inauthentic, mundane, decontextualised, unappealing, uninteresting, dull. They described the activities as unchallenging, unimaginative, unstimulating, mechanical, superficial.

Referring to the New Headway and Outcomes Intermediate Students books, they say, “the focus is on explicit learning of language rather than engagement”. Students are led through a course which consists largely of teachers presenting and practicing bits of the language in such a way that only shallow processing is required, and, as a result, only short term memory is engaged. There are very few opportunities for cognitive engagement; most of the time, teachers talk about the language, and students are asked to read or listen to short, artificial, unchallenging texts devised to illustrate language points. When they are not being told about this or that aspect of the language, students are being led through a succession of frequently mechanical linguistic decoding and encoding activities which are unlikely to have any permanent effects on interlanguage development.

In their concluding remarks, the authors say that the explanation for these unsatisfactory results is simple: publishers’ interests prevailed. Publishers like profit; they’re risk averse and have no interest in any radical reform of a model that has endured for over thirty years. They choose to give priority to face validity and the achievement of “instant progress”, rather than to helping learners towards the eventual achievement of communicative competence.

An Alternative: The Analytic or Process Syllabus

An analytic syllabus rejects the method of cutting up a language into manageable pieces, and instead organises the syllabus according to the needs of the learners and the kinds of language performance that are necessary to meet those needs. “Analytic” refers not to what the syllabus designer does, but to what learners are invited to do. Grammar isn’t “taught” as such; rather learners are provided with opportunities to engage in meaningful communication on the assumption that they will slowly analyse and induce language rules, by exposure to the language and by the teacher providing scaffolding, feedback, and information about the language.

Breen’s (1987) distinction between product and process syllabuses contrasts the focus on content and the pre-specification of linguistic or skill objectives with a “natural growth” approach which aims to expose the learners to real-life communication without any pre-selection or arrangement of items. Figure 1, below, summarises the differences.

A process approach focuses on how the language is to be learned. There is no pre-selection or arrangement of items; the syllabus is negotiated between learners and teacher as joint decision makers, and emphasises the process of learning rather than the subject matter. No coursebook is used. The teacher implements the evolving syllabus in consultation with the students who participate in decision-making about course objectives, content, activities and assessment.


Hugh Dellar has made a number of attempts to defend coursebooks, and here are some examples of what he’s said:

  • “Attempts to talk about coursebook use as one unified thing that we all understand and recognise are incredibly myopic. Coursebooks differ greatly in terms of the way they frame the world and in terms of the questions and positions they expect or allow students to take towards these representations. …. So hopefully it’s clear that far from being one homogenous unified mass of media, coursebooks are wildly heterogeneous in both their world views and their presentations of language.”
  • “Teachers mediate coursebooks”.
  • “The kind of broad brush smearing of coursebooks you’re engaging in does those teachers a profound disservice as it’s essentially denying the possibility of them still being excellent practitioners. I’d also suggest that grammar DOES still seem to be the primary – though not the only – thing that the vast majority of teachers around the world expect and demand from material, whether you like it or not (and I don’t, personally, but there you go. We live in an imperfect world). To pretend this isn’t the case or to denigrate all those who believe this is wipe out a huge swathe of the teaching profession and preach mainly to the converted.”
  • Teachers in very poor parts of the world would just love to have coursebooks.
  • Coursebooks are based on the presentation and practice of discrete bits of grammar because that’s what teachers want.
  • Coursebooks help teachers do their jobs.
  • Coursebooks save time on lesson preparation.
  • Coursebooks meet student and parental expectations.

These remarks are echoed by others (e.g. Harmer, Scrivener, Prodromou, Ur, Lansford, Walter), and can be summed up by the following:

  1. Coursebooks are not all the same.
  2. Teachers adapt, modify and supplement them.
  3. They’re convenient.
  4. They give continuity and direction to a language course.

I accept that some coursebooks don’t follow the synthetic syllabus I describe, but these are the exceptions. All the coursebooks I list at the start of this article, including Dellar’s, and I’d say those that make up 90% of the total sales of coursebooks worldwide, use a synthetic syllabus and make the 3 assumptions I describe. All the stuff about coursebooks differing greatly “in terms of the way they frame the world and in terms of the questions and positions they expect or allow students to take towards these representations” has absolutely no relevance to the arguments made against them.

As for teachers adapting, modifying and supplementing coursebooks, the question is to what extent they do so. If they do so to a great extent, then the coursebook no longer serves as the syllabus, which rather contradicts the main point of having a coursebook, and one wonders how they can justify getting their students to buy the book if it’s only used, let’s say, 30% of the time. If they only modify and supplement to a small extent, then the coursebook drives the course, learners are led through a pre-determined series of steps, and my argument applies. The most important thing to note is that what teachers actually do is ameliorate coursebooks; they make them less terrible, more bearable, in dozens of different clever and inventive ways. But this, of course, is no argument in favour of the coursebook itself; indeed, to the extent that students learn, it will be more despite than because of the damn coursebook.

Which brings us to the claim that the coursebook is convenient, time-saving, etc. Even if that’s true (and it won’t be if you spend lots of time adapting, modifying and supplementing, i.e. ameliorating, it), the trouble is that it doesn’t work: students don’t learn what they’re taught. And that applies to the other arguments used to defend coursebooks, such as that parents expect their kids to use them, that they give direction to the course, and so on: such arguments simply ignore the evidence that students do not, indeed cannot, learn in the way assumed by a coursebook.

Thus, the points above fail to address the main criticisms levelled against coursebooks, which are that they fly in the face of robust research findings and that they deprive teachers and learners of control of the learning process, leading to a lose-lose classroom environment. In order to reply to these arguments, those wishing to defend coursebooks must first confront the three false assumptions on which coursebook use is based (i.e. they must confront the evidence of how SLA actually happens) and they must then argue the case for dictating what is learned. That coursebooks are the dream of teachers working in Ethiopia; that coursebooks are cherished by millions of teachers who just really love them; that the Headway team have succeeded in keeping their products fresh and lively; that Outcomes includes recordings of people who don’t have RP accents; that coursebooks are mediated by teachers; that coursebooks are here to stay, so get real and get used to it; none of these statements does anything to answer the case against them, and none carries any weight for those who wish to base their teaching practice on critical thinking and rational argument. No matter how “different” coursebooks are, or how flexibly they can be used, coursebooks rely on false assumptions about L2 learning, and impose a syllabus on learners who are largely excluded from decisions about what and how they learn.

Managing a process syllabus is no more difficult than mastering the complexities of a modern coursebook. All you need to get started is a materials bank and a crystal-clear explanation of roles and procedures. Part 2 of Breen (1987) provides a framework; the collection of articles edited by Breen and Littlejohn (2000) has at least 5 really helpful “road maps”; Meddings and Thornbury (2009) give a detailed account of their approach in their excellent book; and I outline a process syllabus on my blog. As befits an approach based on libertarian, co-operative educational principles, a process syllabus is best seen in local rather than global settings. If the managers of local ELT centres have the will to break the grip of the coursebook, they only have to make a small initial investment in local training and materials, and then support teachers in their efforts to involve their students in the new venture. I dare to say that such efforts will transform the learning experience of everybody involved.


Coursebooks oblige teachers to work within a framework where students are presented with and then practice dislocated bits of English in a sequence which is pre-determined and externally imposed on them by coursebook writers. Most teachers have little say in the syllabus design which shapes their work, and their students have even less say in what and how they’re taught. Furthermore, results of coursebook-based teaching are bad; most learners don’t reach the level they aim for, and most don’t reach the level of proficiency the coursebook promises (English Proficiency Index, 2015). At the same time, alternatives to coursebook-driven ELT which are much more attuned to what we know about psycholinguistic, cognitive, and socio-educational principles for good language teaching don’t get the exposure or the fair critical evaluation that they deserve.

Despite flying in the face of what we know about L2 learning, despite denying teachers and learners a decision-making voice, and despite poor results, the coursebook dominates current ELT practice to an alarming extent. The main pillars of the ELT establishment, from teacher organisations like TESOL and IATEFL, through bodies like the British Council, examination boards like Cambridge English Language Assessment and TEFL, to the teacher training certification bodies like Cambridge and Trinity, all support the use of coursebooks.

The increasing domination of coursebooks in a global ELT industry worth close to $200 billion (Pearson, 2016) means that they’re not just a symptom but a major cause of the current lose-lose situation we find ourselves in, where both teachers and learners are restrained and restricted by the demonstrably faulty methodological principles which coursebooks embody. I think we have a responsibility to raise awareness of the damage that coursebooks are doing, and to fight against the suffocating effects of continued coursebook consumption.


Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Breen, M.P. (1987) Contemporary Paradigms in Syllabus Design. Part I. Language Teaching, 20, pp 81-92.

Breen, M.P. (1987) Contemporary Paradigms in Syllabus Design. Part II. Language Teaching, 20 (3).

Breen, M.P. and Littlejohn, A. (2000) Classroom Decision Making: Negotiation and Process Syllabuses in Practice. Cambridge: CUP.

English Proficiency Index (2015) Accessed from  9th November, 2015

Hong, Z. and Tarone, E. (Eds.) (2016) Interlanguage Forty years later. Amsterdam, Benjamins.

Long, M.H. (2011) “Language Teaching”. In Doughty, C. and Long, M.  Handbook of Language Teaching. NY Routledge.

Long, M.H. (2015) SLA and Task Based Language Teaching. N.Y., Routledge.

Long, M.H. & Crookes, G. (1993). Units of analysis in syllabus design: the case for the task. In G. Crookes & S.M. Gass (Eds.). Tasks in a Pedagogical Context. Cleveland, UK: Multilingual Matters. 9-44.

Meddings, L. and Thornbury, S. (2009) Teaching Unplugged. Delta.

Mitchell, R. and Myles, F. (2004)  Second Language Learning Theories.  London: Arnold.

Myles, F. (2013): Theoretical approaches to second language acquisition research. In Herschensohn, J. & Young-Scholten, M. (Eds.) The Cambridge Handbook of Second Language Acquisition. CUP

Ortega, L. (2009) Sequences and Processes in Language Learning. In Long and Doughty Handbook of Language Teaching. Oxford, Wiley.

Pearson (2016) GSE  Global Report Retrieved from 5/12/2016.

Pienemann, M. (1987) Psychological constraints on the teachability of languages. In C. Pfaff (Ed.) First and Second Language Acquisition Processes. Rowley, MA: Newbury House. 143-168.

Rea-Dickins, P. M. (2001) Mirror, mirror on the wall: identifying processes of classroom assessment. Language Testing 18 (4), p. 429 – 462.

Selinker, L. (1972) Interlanguage. International Review of Applied Linguistics 10, 209-231.

Statista (2015) Publisher sales of ELT books in the United Kingdom from 2009 to 2013. Accessed from 9th November, 2015.

Thornbury, S. (2014) Who ordered the McNuggets? Accessed from 9th November, 2015.

Walkley, A. and Dellar, H. (2015) Outcomes: Intermediate. National Geographic Learning.

Wilkins, D. (1976) Notional Syllabuses: A Taxonomy and its Relevance to Foreign Language Curriculum Development. London: Oxford University Press.