SLA Part 8: Emergentism

Up to now, I’ve been reviewing theories of SLA that try to explain the psychological processes involved in learning an L2: what goes on between our ears, you could say. In a recent tweet, David Deubelbeiss, in reference to my review, debunked what he referred to as “goofy black box ideas like a special lang. processing device in the brain”. This was enough to provoke a rude comment from me, but of course, David is simply expressing the increasingly widely held view that Chomsky’s time is up and it’s about time we left behind all this “rubbish” about black boxes. There is undoubtedly something unsatisfactory about an appeal to a black box, but, to be fair, Chomsky’s UG theory is, first and foremost, a theory of language, and the claim that the language knowledge we have is partly innate, that we’re “hard wired” for language, is an example of what philosophers call “inference to the best explanation”. In other words, the LAD is a “logical” response to the poverty of the stimulus conundrum: given the knowledge that very young children have of language, and the limitations of the information they get from the environment, the best explanation is that they were born with some boot-strapping device, and we’ll call that the LAD. Furthermore, the cognitive theories I’ve looked at are actually attempts to describe, however indirectly, the black box and what’s going on inside it, and I don’t think it’s entirely fair to call them all “goofy”.

Nick Ellis: Emergentism 

One alternative to the innatist approach to SLA is emergentism, an umbrella term referring to a fast-growing range of usage-based theories which adopt “connectionist” and associative learning views based on the premise that language emerges from communicative use. A leading spokesman for emergentism is Nick Ellis (not to be confused with Rod Ellis, who also writes about SLA). In his article “Frequency Effects in Language Processing” (part of a special issue of Studies in Second Language Acquisition, 2002, Vol. 24, 2, devoted to emergentism) Ellis argues that language processing is “intimately tuned to input frequency”, and expounds a usage-based theory which holds that “acquisition of language is exemplar based”. The paper is a real tour de force and I strongly recommend it. In fact, Nick Ellis writes beautifully; all his papers are master classes in how to write coherently and cohesively about complex issues, and how to present a case forcefully.

The power law of practice is taken by Ellis as the underpinning for his frequency-based account, and then, through an impressive review of literature on phonology and phonotactics, reading and spelling, lexis, morphosyntax, formulaic language production, language comprehension, grammaticality, and syntax, Ellis argues that “a huge collection of memories of previously experienced utterances”, rather than knowledge of abstract rules, is what underlies the fluent use of language. In short, emergentists take language learning to be “the gradual strengthening of associations between co-occurring elements of the language”, and they see fluent language performance as “the exploitation of this probabilistic knowledge” (Ellis, 2002: 173).
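The power law of practice that Ellis builds on is standardly expressed as performance time falling off as a power function of the number of practice trials. The sketch below is only an illustration of that functional form; the constants are hypothetical, not fitted to any data Ellis reports.

```python
# A minimal sketch of the power law of practice: performance time
# falls off as a power function of the number of practice trials.
# The constants a, b and c are illustrative values, not fitted to data.

def practice_time(n_trials, a=0.4, b=2.0, c=0.5):
    """Predicted response time after n_trials of practice.

    a = asymptotic (floor) time, b = initial slowdown,
    c = learning-rate exponent (all hypothetical).
    """
    return a + b * n_trials ** (-c)

times = [practice_time(n) for n in (1, 10, 100, 1000)]
# Early practice yields big gains; later practice yields ever-smaller ones,
# which is why frequency of exposure matters so much on this account.
```

The characteristic shape, steep improvement early on and diminishing returns later, is what makes frequency-in-the-input such an attractive explanatory variable for emergentists.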

Ellis often repeats his commitment to a Saussurean view, which sees “the linguistic sign” as a set of mappings between phonological forms and communicative intentions. He claims that “simple associative learning mechanisms operating in and across the human systems for perception, motor-action and cognition, as they are exposed to language data as part of a communicatively-rich human social environment by an organism eager to exploit the functionality of language are what drives the emergence of complex language representations”.
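The core emergentist idea, that association strengths between co-occurring elements build up gradually from usage and that fluent performance exploits the resulting probabilistic knowledge, can be caricatured in a few lines. This is my own toy construction, not Ellis’s actual model, and the utterances are invented.

```python
# A toy illustration (my construction, not Ellis's model) of how
# association strengths between co-occurring words might build up
# from raw usage frequency.
from collections import defaultdict

utterances = [
    "strong tea", "strong tea", "strong coffee",
    "powerful engine", "powerful engine", "powerful tea",
]

counts = defaultdict(int)
for utt in utterances:
    words = utt.split()
    for w1, w2 in zip(words, words[1:]):   # adjacent co-occurring pairs
        counts[(w1, w2)] += 1              # each exposure strengthens the link

def association(w1, w2):
    """Relative strength of the w1 -> w2 association: P(w2 | w1)."""
    total = sum(c for (first, _), c in counts.items() if first == w1)
    return counts[(w1, w2)] / total if total else 0.0

# Frequency alone makes "strong tea" a stronger pairing than
# "powerful tea" -- no abstract rule is consulted anywhere.
```

Nothing in the sketch appeals to rules or innate knowledge; the “grammar”, such as it is, just is the table of weighted associations. That is exactly the feature Gregg attacks later in this post.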

Seidenberg and MacDonald: Emergentism

Another example of emergentist views is Seidenberg and MacDonald’s 1999 paper, which puts forward a similar “probabilistic constraints approach” to language acquisition. They explain that instead of equating knowing a language with knowing a grammar, emergentists adopt the functionalist assumption that language knowledge is “something that develops in the course of learning how to perform the primary communicative tasks of comprehension and production” (Seidenberg and MacDonald, 1999: 571). This knowledge is viewed as a neural network that maps between forms and meanings, and further levels of linguistic representation, such as syntax and morphology, are said to emerge in the course of learning tasks.

An alternative to “competence” is also offered by Seidenberg and MacDonald, who argue that the competence-performance distinction excludes information about the statistical and probabilistic aspects of language, even though these aspects play an important role in acquisition. The alternative is to characterize a performance system which handles all and only those structures that people actually use: performance constraints are embodied in the system responsible for producing and comprehending utterances, not extrinsic to it.

Elizabeth Bates and associates

As a final example of emergentism, Bates et al. (1998) attempt to translate innateness claims into empiricist statements. They argue that innateness is often used as a logically inevitable fall-back explanation.

In the absence of a better theory, innateness is often confused with

  1. domain specificity (Outcome X is so peculiar that it must be innate);
  2. species specificity (we are the only species who do X, so X must lie in the human genome);
  3. localization (Outcome X is mediated by a particular part of the brain, so X must be innate); and
  4. learnability (we cannot figure out how X could be learned, so X must be innate) (Bates et al., 1998: 590).

Instead of this unsatisfactory “explanation”, Bates et al. believe that an empirically-based theory of interaction, a theory that will explain the process by which nature and nurture, genes and the environment, interact without recourse to innate knowledge, is “around the corner”. Reviewing a taxonomy proposed by Elman et al. to identify different types of innateness and their location in the brain, Bates et al. say:

If the notion of a language instinct means anything at all, it must refer to a claim about cortical microcircuitry, because this is (to the best of our knowledge) the only way that detailed information can be laid out in the brain (Bates et al., 1998: 594).


Emergentism claims that complex systems exhibit ‘higher-level’ properties that are neither explainable from, nor predictable from, ‘lower-level’ physical properties, which puts emergentists in a bit of a jam if they want to remain faithful to the empiricist doctrine and deny any kind of contribution from innate sources of knowledge. This is the big problem for emergentists: how to explain complex representational systems. As I mentioned in Part 6, the only way they can do it is to take a radically different, sub-atomic view of the components of language (that’s my paraphrase, which O’Grady might not agree with). Attempting to do without the concept of innate knowledge (and even of the mind) forces you into this extreme view, which, in my opinion, the appeal to functionalism doesn’t actually account for.

Gregg (2003), in his discussion of emergentism in SLA, notes that empiricist emergentism (which excludes the work of O’Grady) wants to do away with innate, domain-specific representational systems, and replace them with “an ability to do distributional analyses and to remember the products of the analyses” (Gregg, 2003: 55). Given this agenda, it’s surprising, says Gregg, that Ellis seems to accept the validity of the linguist’s account of grammatical structure. Surely this is contradictory. As to the explanation of the language learning process, it is, as Ellis agrees, based on associative learning, and rests on advances in IT that have produced models of associative learning processes in the form of connectionist networks.

The severe limitations of connectionist models are highlighted by Gregg, who goes to the trouble of examining the Ellis and Schmidt model (see Gregg, 2003: 58-66) in order to emphasise just how little the model has learned and how much is left unexplained. The sheer implausibility of the enterprise strikes me as forcefully as it seems to strike Gregg. How can emergentists seriously propose that the complexity of language emerges from simple cognitive processes being exposed to frequently co-occurring items in the environment? How can “simple associative learning mechanisms operating in and across the human systems for perception, motor-action and cognition” explain our language knowledge?

Gregg wrote his article in 2003; since then, Ellis has written a great deal more about emergentism (see his personal website, where many of the articles can be downloaded; I’d particularly recommend his 2015 article “Implicit AND Explicit Language Learning: Their dynamic interface and complexity”, available for free download). But despite lots of powerful argument, Ellis can’t point to much advance in the ability of connectionist models to do what children do, namely learn the complexities of a natural language.

The Poverty of the Stimulus – again!

At the root of the problem of any empiricist account is the poverty of the stimulus argument. By adopting an associative learning model and an empiricist epistemology (where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations), emergentists have a very difficult job explaining how children come to have the linguistic knowledge they do. How can general conceptual representations acting on stimuli from the environment explain the representational system of language that children demonstrate?

Gregg summarises Laurence and Margolis’ (2001: 221) “lucid formulation” of the poverty of the stimulus argument:

  1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.
  2. The correct set of principles need not be (and typically is not) in any pre-theoretic sense simpler or more natural than the alternatives.
  3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.
  4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.
  5. Children do reliably arrive at the correct grammar for their language.
  6. Therefore children are not empiricist learners. (Gregg, 2003: 48)

Combining observed frequency effects with the power law of practice, and thus explaining acquisition order by appealing to frequency in the input, doesn’t go very far in explaining the acquisition process itself. What role do frequency effects have? How do they interact with other aspects of the SLA process? In other words, we need to know how frequency effects fit into a theory of SLA, because frequency and the power law of practice in themselves don’t provide a sufficient theoretical framework, and neither does connectionism. As Gregg points out, “connectionism itself is not a theory; it is a method, and one that in principle is neutral as to the kind of theory to which it is applied” (Gregg, 2003: 55).

My view is that emergentism stands or falls on connectionist models and that so far the results are disappointing. A theory that will explain the process by which nature and nurture, genes and the environment, interact without recourse to innate knowledge, remains “around the corner”. It will be fantastic if Nick Ellis and all those working on emergentism turn out to be right, and I’ll enthusiastically join in the celebrations, partly because if they’re right, then language learning will be shown to be an essentially implicit process, a process that gets little help from teaching based on using coursebooks to implement a grammar-based synthetic syllabus through PPP. I’ll discuss this a bit more in the final episode, Part 9, coming soon.



Bates, E., Elman, J., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1998).  Innateness and emergentism. In W. Bechtel & G. Graham (Eds.), A companion to cognitive science (pp. 590-601). Oxford: Basil Blackwell.

Ellis, N. (2002) Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 2, 143-188.

Eubank, L. and Gregg, K. R. (2002) News Flash – Hume Still Dead. Studies in Second Language Acquisition, 24, 2, 237-248.

Gregg, K. R. (2003) The state of emergentism in SLA. Second Language Research, 19, 2, 95-128.

Seidenberg, M. and MacDonald, M. (1999) A probabilistic constraints approach to language acquisition and processing. Cognitive Science, 23, 4, 569-588.


SLA Part 7: Two final processing models

Susanne Carroll

I’ll start by noting Carroll’s objection to Schmidt’s and Gass’s theories. She argues that if input refers to observable sensory stimuli in the environment, then it can’t play any significant role in L2 learning, because the stuff of acquisition – phonemes, syllables, morphemes, nouns, verbs, cases, etc. – consists of mental constructs that exist in the mind and not in the external environment. As Gregg has repeatedly said, “You can’t notice grammar!” Carroll (2001) says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

In Carroll’s theory, learners don’t attend to things in the input as such, they respond to speech-signals by attempting to parse the signals, and failures to do so trigger attention to parts of the signal. Basing herself partly on Fodor’s work, Carroll (2017) argues that we have to make a distinction between types of input.

On the one hand, we need to talk about INPUT-TO-LANGUAGE-PROCESSORS, e.g., bits of the speech signal that are fed into language processors and which will be analysable if the current state of the grammar permits it. On the other hand, we need a distinct notion of INPUT-TO-THE-LANGUAGE-ACQUISITION-MECHANISMS, which will be whatever it is that those mechanisms need to create a novel representation. For most of the learning problems that we are interested in, the input-to-the-language acquisition-mechanisms will not be coming directly from the environment.  

The Autonomous Induction Model 

Carroll uses Jackendoff’s (1987) modularity model and Holland, Holyoak, Nisbett, and Thagard’s (1986) induction model to build her own Autonomous Induction model. She sees our linguistic faculty as a chain of representations acting on different levels: the lowest level interacts with physical stimuli, the highest with conceptual representations. At the lowest level of representation, the integrative processor combines smaller representations into larger units, while the correspondence processor is responsible for moving representations from one level to the next. Once representations are formed, they are categorized and combined according to UG-based or long-term memory-based rules. During successful parsing, rules are activated in each processor to categorize and combine representations; failures occur when the rules are inadequate or missing. In that case, the rule that comes closest to successfully parsing the specific unit is selected and undergoes the most economical, incremental revision. This process is repeated until parsing succeeds, or is at least passable, at the given level.

So Carroll’s explanation of how stimuli from the environment end up as linguistic knowledge is that two different types of processing are involved: processing for parsing and processing for acquisition. When the parsers fail, the acquisitional mechanisms are triggered (a view, as I’ve already suggested, which aligns with the notion of incomprehensible input).
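Carroll’s failure-driven sequence can be caricatured in a few lines of code. This is my schematic reading, not her formalism: the “grammar” here is just a set of licensed units, and “acquisition” simply creates representations for whatever the parser couldn’t handle.

```python
# A deliberately crude sketch (my reading of Carroll, not her formalism):
# parsing is attempted with the current grammar, and only a parse
# failure hands the signal over to the acquisition mechanisms.

def parse(signal, grammar):
    """A parse 'succeeds' only if every unit is already licensed."""
    return all(unit in grammar for unit in signal.split())

def acquire(signal, grammar):
    """Triggered by failure: create novel representations for the
    unanalysable parts of the signal (grossly simplified)."""
    return grammar | set(signal.split())

grammar = {"the", "dog", "barks"}
for signal in ["the dog barks", "the dog bites"]:
    if not parse(signal, grammar):       # failure, not "noticing", is the trigger
        grammar = acquire(signal, grammar)
# After the failed parse of "the dog bites", "bites" has been acquired,
# and a second attempt at the same signal would succeed.
```

The point the sketch is meant to bring out is the ordering: comprehension doesn’t feed acquisition; rather, failed comprehension does.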

For my purposes, what’s important in Carroll’s account is that speech signal processing doesn’t involve noticing – it’s a subconscious process where learners detect, encode and respond to linguistic sounds. Furthermore, Carroll argues that once the representation enters the interlanguage, learners don’t always notice their own processing of segments and the internal organization of their own conceptual representations; the processing of forms and meanings is often not noticed.

Carroll sees intake as a subset of stimuli from the environment, while Gass defines intake as a set of processed structures waiting to be incorporated into IL grammar. It seems to me that Carroll provides a better description of input, and she is surely right to say that cognitive comparison of representations (wherever it takes place) is largely automatic and subconscious. Awareness, Carroll concludes, is something that occurs, if at all, only after the fact; and that’s a conclusion which just about all the theories I’ve looked at come to.

Towell and Hawkins’ Model of SLA

Towell and Hawkins’ (1994) model begins with UG, which sets the framework within which linguistic forms in the L1 and L2 are related. Learners of an L2 after the age of seven have only partial access to UG, namely UG principles; they will transfer parameter settings from their L1, and where such settings conflict with L2 data, they may construct rules to mimic the surface properties of the L2. The second internal source is thus the first language: learners may transfer a parameter setting, or UG may make possible a kind of mimicking.

The adoption of the “partial access to UG” hypothesis leads Towell and Hawkins to assume that there are two different sorts of knowledge involved in interlanguage development:  linguistic competence (derived from UG and L1 transfer) and learned linguistic knowledge (derived from explicit instruction and negative feedback).

To explain the way in which interlanguage develops when simple triggering of parameters doesn’t happen, Towell and Hawkins use Levelt’s model of language production (1989)  to introduce the distinction between procedural (subconscious, automatic) and declarative (conscious) knowledge, and then Anderson’s ACT* (Adaptive Control of Thought) model (1983), to explain how the declarative knowledge gets processed (see below).

The information processing mechanisms condition the way in which input  provides data for hypotheses, the way in which hypotheses must be turned into productions for fluent use, and the final output of productions. (Towell and Hawkins, 1994: 248)

The full model can be summarised as follows:

Input and output pass through short-term memory, which determines the information available to long-term memories, and is used to pass information between the two types of long-term memory proposed: the declarative memory and the procedural memory.

Short-term memory consists of the set of nodes activated in memory at the same time, and allows certain operations to be performed on relatively small amounts of information for a given time. The processes are either controlled (the subject is required to pay attention to the process while it is happening) or automatic. Automatic processes are inflexible and take a long time to set up. Once processes have been automatised, the limited capacity can be used for new tasks.

All knowledge initially goes into declarative memory; the internally derived hypotheses offer substantive suggestions for the core of linguistic knowledge and those parameters common to both L1 and L2.  The other areas of language are worked out by the interaction of data with the internally derived hypotheses.

The model suggests four learning routes:

Route one:

confirmation by external data of an internal hypothesis leading to the creation of a production to be stored in procedural memory first in associative form (i.e. under attentional control) and then in autonomous form for rapid use via the short term memory. (Towell and Hawkins, 1994: 250)

Route two:

initial storage of a form-function pair in declarative memory as an unanalysed whole.  If it cannot be analysed by the learner’s grammar but can be remembered for use in a given context, it may be shifted to procedural memory at the associative level.  It may be re-called into declarative knowledge where it may be re-examined, and if it is now analysable, it may be converted to another level of mental organisation before being passed back to the procedural level. (Towell and Hawkins, 1994: 250-251)

Route three

concerns explicit rules, like verb paradigms, vocabulary lists, lists of prepositions.  This knowledge can only be recalled in the form in which it was learned, and can be used to revise and correct output. (Towell and Hawkins, 1994: 251)

Route four

concerns strategies, which facilitate the proceduralisation of mechanisms for faster processing of input and greater fluency.  These strategies do not interact with internal hypotheses. (Towell and Hawkins, 1994: 251)

Hypotheses derived from UG either directly or via L1 are available as declarative knowledge, i.e. hypotheses which are tested via controlled processing where learners pay attention to what they are receiving and producing.  If the hypotheses are confirmed, Towell and Hawkins say they “can be quickly launched on the staged progression described by Anderson (1983, 1985).”


UG is used to explain transfer, staged development and cross-learner systematicity. UG prevents the learner from entertaining “wild” hypotheses about the L2, and allows the learner to “learn” a series of structures by perceiving that a certain relationship between the L1 and L2 exists. Towell and Hawkins’ “partial access” view of UG and SLA is reflected in their belief that there is a lack of positive evidence available to L2 learners to enable them to reset the parameters already set in the L1, and that “the older you are at first exposure to an L2, the more incomplete your grammar will be”.

I’m not convinced! The part played by declarative knowledge seems particularly odd. How does knowledge of UG principles form part of declarative knowledge? And the ACT model looks to me like an awkward bolt-on. The distinction between declarative and procedural knowledge leaves unanswered the question of how information is stored in declarative and procedural forms, and there’s no explanation of how the externally-provided data interact with the internally-derived hypotheses.

More generally, this is a very complex model which pays scant regard to the Occam’s Razor criterion. There’s a profusion of terms and entities postulated by the theory – principles and parameters, declarative memory and production memory, procedural and declarative knowledge, associative and automatic procedural knowledge, linguistic competence and linguistic knowledge, mimicking, the use of a language module, a conceptualiser, and a formulator – which means that only the accumulation of research results from testing would make a proper evaluation possible. This hasn’t happened.

I’ve outlined it here, first because it was once quite influential; second because it’s another example of an attempt to explain how input leads to L2 knowledge by passing through a series of mental processing routines, and third because it gives me the chance to discuss Anderson’s ACT model.

Anderson’s ACT model

When applied to second language learning, the ACT model suggests that learners are first presented with information about the L2 (declarative knowledge) and then, via practice, this is converted into unconscious knowledge of how to use the L2 (procedural knowledge). The learner moves from controlled to automatic processing and, through intensive linguistically-focused rehearsal, achieves increasingly fast access to, and more fluent control over, the L2 (see DeKeyser, 2007, for example).

The fact that nearly everybody successfully learns at least one language as a child without starting with declarative knowledge, and that millions of people learn additional languages without studying them (migrant workers, for example), are good reasons to doubt that learning a language is the same as learning a skill such as driving a car. Furthermore, the phenomenon of L1 transfer doesn’t fit well with a skills-based approach, and neither do putative sensitive periods (critical periods) for language learning. But the main reason for rejecting such an approach is that it contradicts all the SLA research findings related to interlanguage development which we’ve been examining in this review of SLA theories.

Firstly, as has been made clear, it doesn’t make sense to present grammatical constructions one by one in isolation, because most of them are inextricably inter-related. As Long (2015) says:

Producing English sentences with target-like negation, for example, requires control of word order, tense, and auxiliaries, in addition to knowing where the negator is placed. Learners cannot produce even simple utterances like “John didn’t buy the car” accurately without all of those. It is not surprising, therefore, that interlanguage development of individual structures has very rarely been found to be sudden, categorical, or linear, with learners achieving native-like ability with structures one at a time, while making no progress with others. Interlanguage development just does not work like that.

Secondly, as we have seen, research has shown that L2 learners follow their own developmental route, a series of interlocking linguistic systems called “interlanguages”. Myles (2013) states that the route of interlanguage (IL) development is one of the best documented findings of SLA research of the past few decades. She asserts that the route is “highly systematic” and that it “remains largely independent of both the learner’s mother tongue and the context of learning (e.g. whether instructed in a classroom or acquired naturally by exposure)”. The claim that instruction can influence the rate but not the route of IL development is probably the most widely accepted claim among SLA scholars today.

Pienemann comments:

Fifteen years later, Anderson appears to have revised his position. He states: “With very little and often no deliberate instruction, children by the time they reach age 10 have accomplished implicitly what generations of Ph.D. linguists have not accomplished explicitly. They have internalised all the major rules of a language” (Anderson, 1995: 364). In other words, Anderson no longer sees language acquisition as an instance of the conversion of declarative into procedural knowledge.

In  addition, it is well-documented that procedural knowledge does not have to progress through a declarative phase.  In fact, human participants in experiments on non-conscious learning were not only unaware of the rules they applied, they were not even aware that they had acquired any knowledge  (Pienemann, 1998: 41).

Next up: emergentism. And that will be the last part of the review.



Anderson, J. (1983) The Architecture of Cognition. Cambridge, MA: Harvard University Press.

Carroll, S. (2017) Exposure and input in bilingual development. Bilingualism: Language and Cognition, 20, 1, 16-31.

Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Krashen, S. (1985) The Input Hypothesis: Issues and Implications. New York: Longman.

Levelt, W. (1989) Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Long, M. (2015) SLA and TBLT. Wiley.

Towell, R. and Hawkins, R. (1994) Approaches to second language acquisition. Clevedon: Multilingual Matters.


SLA Part 6: Processing Input

Corder’s (1967) paper is often given as the starting point for SLA discussion of input. It included the famous claim:

The simple fact of presenting a certain linguistic form to a learner in the classroom does not necessarily qualify it for the status of input, for the reason that input is “what goes in” not what is available for going in, and we may reasonably suppose that it is the learner who controls this input, or more properly his intake (p. 165).

Corder here suggests that SLA is a process of learner-controlled development of interlanguages, “interlanguages” referring to learner grammars, their evolving knowledge of the language. This marks a shift in the way SLA researchers perceived input. No longer a strictly external phenomenon, input is now the interface between external stimuli and learners’ internal systems. Input is potential intake, and intake is what learners use for IL development; but it remains unclear what mechanisms and sub-processes are responsible for the input-to-intake conversion. We can start with Krashen.


Krashen’s Input Model 

Here, comprehensible input is the same as intake. It contains mostly language the learner already knows, but also unknown elements, including some that correspond to the next immediate step along the interlanguage development continuum. This comprehensible input has to get through the affective filter and is then processed by a special language processor which Krashen says is the same as Chomsky’s LAD. Thanks to this processor, some of the new elements in the input are subconsciously acquired and become part of the learner’s interlanguage. A completely different part of the mind processes a different kind of knowledge, which is learned by paying conscious attention to what teachers and books and people tell the learner about the language; this conscious knowledge can be used to monitor and change output. Just by the way, Hulstijn (2013) points out that nearly 30 years after Krashen made his much-criticised acquisition / learning distinction, cognitive neuro-scientists now agree that declarative, factual knowledge (Krashen’s ‘learned knowledge’) is stored in the medial temporal lobe (in particular in the hippocampus), whereas procedural, relatively unconscious knowledge (Krashen’s ‘acquired knowledge’) is stored and processed in various (mainly frontal) regions of the cortex.

We’ve already seen a number of objections to Krashen’s Theory as a theory, but the important thing here is to see how he relies on the LAD (plus a monitor) to explain how we learn an L2. The theory thus leans heavily on Chomsky’s explanation of L1 acquisition and says that L2 acquisition is more or less the same – all we need to learn a language is comprehensible input, because we’re hard wired with a device that allows us to make enough sense of enough of the input to slowly work out the system for ourselves.

Black boxes: the Processors 

All theories of SLA – even usage-based theories – assume that there are some parts of the mind (or brain, for the strict empiricists) involved in processing stimuli from the environment. The LAD is simply one attempt to describe what the processor does, namely provide rules for making sense of the input. The rules, which Chomsky describes in successive formulations of UG (best understood, I think, in terms of the principles and parameters model), help young children to map form to meaning. O’Grady gives the example of the rules which help the child make sense of information about the type of meaning most often associated with particular word classes.

“For example, the acquisition device might “tell” children that words referring to concrete things must be nouns. So language learners would know right away that words like dog, boy, house, and tree belong to that word class. This might just be enough to get started. Once children knew what some nouns looked like, they could start noticing other things on their own – like the fact that items in the noun class can occur with locator words like this and that, that they can take the plural ending, that they can be used as subjects and direct objects, that they are usually stressed, and so on.

Nouns with locator words: That dog looks tired. This house is ours.

Nouns with the plural ending: Cats make me sneeze. I like cookies.

Nouns used as subject or direct object: Dogs chase cats. A man painted our house.  

Information of this sort can then be used to deal with words like idea and attitude, which cannot be classified on the basis of their meaning. (They are nouns, but they don’t refer to concrete things.) Sooner or later a child will hear these words used with this or that, or with a plural, or in a subject position. If she’s learned that these are the signs of nounhood, it’ll be easy to recognize nouns that don’t refer to concrete things. If all of this is on the right track, then the procedure for identifying words belonging to the noun class would go something like this. (Similar procedures exist for verbs, adjectives, and other categories.)

What the acquisition device tells the child: If a word refers to a concrete object, it’s a noun.

What the child then notices: Noun words can also occur with this and that; they can be pluralized; they can be used as subjects and direct objects.

What the child can then do: Identify less typical nouns (idea, attitude, etc.) based on how they are used in sentences.

This whole process is sometimes called bootstrapping. The basic idea is that the acquisition device gives the child a little bit of information to get started (e.g., a language must distinguish between nouns and verbs; if a word refers to a concrete object, it’s a noun) and then leaves her to pull herself up the rest of the way by these bootstraps.”
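Just to make the logic of the procedure concrete, here is a toy sketch of the bootstrapping idea in a few lines of Python. This is entirely my own illustration, not anything O’Grady proposes; every name, cue label and data item in it is invented for the purpose.

```python
# Toy sketch of the bootstrapping procedure (illustrative only; the
# cue labels and mini-corpus are my own invention, not O'Grady's).

# Step 1: the "acquisition device" seeds the noun class with words
# that refer to concrete objects.
concrete_words = {"dog", "boy", "house", "tree"}
nouns = set(concrete_words)

# A tiny corpus of (word, distributional cues) observations.
observations = [
    ("dog", {"after_this_that", "subject"}),
    ("house", {"pluralized", "direct_object"}),
    ("idea", {"after_this_that", "pluralized"}),   # abstract: no concrete seed
    ("attitude", {"subject"}),
    ("run", {"follows_subject"}),                  # no noun-like cues
]

# Step 2: from the seeded nouns, learn which cues signal nounhood.
noun_cues = set()
for word, cues in observations:
    if word in nouns:
        noun_cues |= cues

# Step 3: classify remaining words (including abstract ones) by those cues.
for word, cues in observations:
    if cues & noun_cues:
        nouns.add(word)

print(sorted(nouns))  # 'idea' and 'attitude' are now classified as nouns
```

The point of the sketch is just that a very small innate seed (step 1) plus distributional learning (steps 2 and 3) is enough, in principle, to pull the child up by her bootstraps.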

Of course, the rules that govern how the words are put together are also needed. O’Grady calls this “a blueprint”.  In a sentence like Jean helped Roger the three words combine – but how?

“Does the verb combine directly with the two nouns?

Or does it combine first with its subject, forming a larger building block that then combines with the direct object?

Or does it perhaps combine first with its direct object, creating a building block that then combines with the subject?

How could a child possibly figure out which of these design options is right? For that matter, how could an adult? Once again, the acquisition device must come to the rescue by providing the following vital bits of information:

  • Words are grouped into pairs.
  • Subjects (doers) are higher than direct objects (undergoers).

With this information in hand, it’s easy for children to build sentences with the right design” (O’Grady, How Children Learn Language).

So that’s one view of the “black box”, the language processor. It is, in the view of many scholars working on SLA, the best explanation so far of how children acquire linguistic knowledge, and of how they know things about the language which are not present in the input – it answers the poverty of the stimulus question. The LAD offers an innate system of grammatical categories and principles which define language, confine how language can vary and change, and explain how children learn language so successfully. And, with a few additional assumptions, it can explain SLA, and why most people find it so challenging, too.

VanPatten’s Input Processing Theory

VanPatten sees things slightly differently. His Input Processing (IP) theory is concerned with how learners derive intake from input, where intake is defined as the linguistic data actually processed from the input and held in working memory for further processing.

As such, IP attempts to explain how learners get form from input and how they parse sentences during the act of comprehension while their primary attention is on meaning. VanPatten’s model consists of a set of principles that interact in working memory, and takes account of the fact that working memory has very limited processing capacity. Content lexical items are searched out first since words are the principal source of referential meaning. When content lexical items and a grammatical form both encode the same meaning and when both are present in an utterance, learners attend to the lexical item, not the grammatical form. Here are VanPatten’s Principles of Input Processing:

P1. Learners process input for meaning before they process it for form.

P1a. Learners process content words in the input first.

P1b. Learners prefer processing lexical items to grammatical items (e.g., morphology) for the same semantic information.

P1c. Learners prefer processing “more meaningful” morphology before “less” or “nonmeaningful” morphology.

P2. For learners to process form that is not meaningful, they must be able to process informational or communicative content at no (or little) cost to attention.

P3. Learners possess a default strategy that assigns the role of agent (or subject) to the first noun (phrase) they encounter in a sentence/utterance. This is called the first-noun strategy.

P3a. The first-noun strategy may be overridden by lexical semantics and event probabilities.

P3b. Learners will adopt other processing strategies for grammatical role assignment only after their developing system has incorporated other cues (e.g., case marking, acoustic stress).

P4. Learners process elements in sentence/utterance initial position first.

P4a. Learners process elements in final position before elements in medial position.

Perhaps the most important construct in the IP model is “communicative value”: the more communicative value a form has, the more likely it is to get processed and made available in the intake data for acquisition; it’s thus the forms with little or no communicative value which are least likely to get processed and, without help, may never get acquired. Notice that this account, like Pienemann’s (discussed in Part 5) and O’Grady’s (see below), explains input processing in terms of rational decisions taken on the basis of making the best use of relatively scarce processing resources.

I’m zooming through these theories without doing any of them real justice, and I apologise to all the scholars whose work is getting such brief treatment, but I hope that both a picture of the various architectures proposed, and a sense of how SLA theories have progressed, can be got from all this. Before we go on, I can’t resist quoting what VanPatten says at the end of one of his books:

1. If you teach communicatively, you’d better have a working definition of communication. My argument for this is that you cannot evaluate what is communicative and what is appropriate for the classroom unless you have such a definition.

2. Language is too abstract and complex to teach and learn explicitly. That is, language must be handled in the classroom differently from other subject matter (e.g., history, science, sociology) if the goal is communicative ability. This has profound consequences for how we organize language-teaching materials and approach the classroom.

3. Acquisition is severely constrained by internal (and external) factors. Many teachers labor under the old present + practice + test model. But the research is clear on how acquisition happens. So, understanding something about acquisition pushes the teacher to question the prevailing model of language instruction.

4. Instructors and materials should provide student learners with level-appropriate input and interaction. This principle falls out of the previous one. Since the role of input often gets lip service in language teaching, I hope to give the reader some ideas about moving input from “technique” to the center of the curriculum.

5. Tasks (and not Exercises or Activities) should form the backbone of the curriculum. Again, language teaching is dominated by the present + practice + test model. One reason is that teachers do not understand what their options are, what is truly “communicative” in terms of activities in class, and how to alternatively assess. So, this principle is crucial for teachers to move toward contemporary language instruction.

6. A focus on form should be input-oriented and meaning-based. Teachers are overly preoccupied with teaching and testing grammar. So are textbooks. Students are thus overly preoccupied with the learning of grammar.

O’Grady (How Children Learn Language is the best book you’ll ever read on the subject) offers a different view. He proposes a ‘general nativist’ theory of first and second language acquisition, which describes a modular acquisition device that does not include Universal Grammar. O’Grady sees his work as coming under the emergentist rubric, but obviously, since he sees the acquisition device as a modular part of the mind, he’s a long way from the real empiricists in the emergentist camp. Interestingly for us, O’Grady accepts that there are sensitive periods involved in language learning, and that the problems adults face in L2 acquisition can be explained by the fact that adults have only partial access to the (non-UG) L1 acquisition device.

O’Grady describes a different kind of processor, doing more general things, but it’s still a language processor, and it’s still working not just on segments of the speech stream and on words, but on syntax – thus still seeing language as an abstract system governed by rules of syntax. When it comes to the more empiricist type of emergentist – Bates and MacWhinney’s Competition Model, for example – the talk is of a very general kind of processor doing the work, and this processor works almost exclusively on words and their meanings. Which brings us to the rub, so to speak.

As O’Grady argues so forcefully, the real disagreement between nativists and those emergentists who, unlike O’Grady, adopt a more or less empiricist epistemology, is that they can’t agree on what syntactic categories and structures are like. The dispute over how input gets processed is really a dispute about the nature of language. If you see language as a highly complex formal system best described by abstract rules that have no counterparts in other areas of cognition (O’Grady gives the requirement that sentences have a binary branching syntactic structure as one example of such a “rule”), then you see the processor, the acquisition device, as designed specifically for language. But if you see language in terms of its communicative function, then, since communication involves different types of considerations (O’Grady gives the examples of new versus old information, point of view, the status of speaker and addressee, and the situation), you’ll see the processor as a multipurpose acquisition device working on very simple data. In my opinion, such a view actually fails to explain either language or the acquisition process, but we’ll come to that. For now, I’m trying to sketch theories of SLA in such a way that we may draw teaching implications from them.

Just to remind you, my argument is that language is best seen as a formal system of representations and that we learn it in a different way to the way we learn other things. We learn language implicitly, subconsciously, but as adults learning an L2, our access to the key processors is limited, so we need to supplement this learning with a bit of attention to some “fragile” (non salient, for example) elements. Which gets us nicely back to the main narrative.

Swain’s (1985) famous study of French immersion programmes led to her claim that comprehensible input alone can allow learners to reach high levels of comprehension, but their proficiency and accuracy in production will lag behind, even after years of exposure. Further studies gave more support to this view, and to the opinion that comprehensible input is the necessary but not sufficient condition for proficiency in an L2. Swain’s argument was that we must give more attention to output, but what took greater hold was the view that we need to “notice” formal features of the input.

Schmidt’s Noticing  (again)

In Part 4, I discussed Schmidt’s view. Here’s the diagram:

As we saw, Schmidt completely rejects Krashen’s model, and insists that it’s ‘noticing’, not the unconscious workings of the LAD, that drives interlanguage development. I outlined my objections to even the modified 2001 version of Schmidt’s noticing construct in Part 4, so let’s focus on the main one here: the construct doesn’t clearly indicate the roles of conscious and subconscious, or explicit and implicit, learning. In the case of children learning their L1, the processing of input is mostly a subconscious affair, whether or not UG has anything to do with it. For those over the age of 16 learning an L2, according to Krashen, it’s also mostly a subconscious process, although even Krashen admits that some conscious hard work at learning helps to speed up the process and to reach a higher level of proficiency. But it’s not clear, at least to me, what Schmidt means by noticing, and to what extent he sees SLA as involving conscious learning. I think his 2001 paper concedes that implicit learning is still the main driver of interlanguage development, and I think that’s what Long, for example, takes Schmidt to mean.

Gass: An Integrated view of SLA  

Gass (1997), influenced by Schmidt, offers a more complete picture of what happens to input. She says it goes through stages of apperceived input, comprehended input, intake, integration, and output, thus subdividing Krashen’s comprehensible input into three stages: apperceived input, comprehended input, and intake. I don’t quite get “apperceived input”; Gass says it’s the result of attention, in a similar sense to Tomlin and Villa’s (1994) notion of orientation, and Schmidt says it’s the same as his noticing, which doesn’t help me much. In any case, once the intake has been worked on in working memory, Gass stresses the importance of negotiated interaction during input processing and eventual acquisition. Here, she adopts Long’s highly influential construct of negotiation for meaning, which refers to what learners do when there’s a failure in communicative interaction. As a result of this negotiation, learners get more usable input, they give attention (of some sort) to problematic features in the L2, and they make mental comparisons between their IL and the L2. Gass says that negotiated interaction enhances the input in three ways:

  1. it’s made more comprehensible;
  2. problematic forms that impede comprehension are highlighted and have to be processed for communication to succeed;
  3. through negotiation, learners receive both positive and negative feedback juxtaposed immediately to the problematic form, and this close proximity facilitates hypothesis-testing and revision (Doughty, 2001).

Many scholars have commented that these effects should be regarded as facilitators of learning, not mechanisms for learning, and I have to say that in general I find the Gass model a rather unsatisfactory compilation of bits. Still, it’s part of the story, and it’s certainly a well-considered, thorough attempt to explain how input gets processed.
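Gass’s sequence of stages can be caricatured as a pipeline of filters, each stage passing on a subset of what it receives. The sketch below is my own illustration, not Gass’s model; the predicates and the salience numbers are invented stand-ins.

```python
# Caricature of Gass's stages as a pipeline of filters (my own
# illustration; predicates and salience values are invented).

def gass_pipeline(input_items, apperceive, comprehend, take_in, integrate):
    """Each stage passes on a subset of what the previous stage produced."""
    apperceived = [x for x in input_items if apperceive(x)]
    comprehended = [x for x in apperceived if comprehend(x)]
    intake = [x for x in comprehended if take_in(x)]
    integrated = [x for x in intake if integrate(x)]
    return integrated  # what ends up in the developing system

# Toy run: items are (form, salience) pairs; thresholds are arbitrary.
items = [("went", 0.9), ("-ed", 0.3), ("third-person -s", 0.1)]
result = gass_pipeline(
    items,
    apperceive=lambda x: x[1] > 0.05,  # attended to at all
    comprehend=lambda x: x[1] > 0.2,   # meaning worked out
    take_in=lambda x: x[1] > 0.25,     # held in working memory
    integrate=lambda x: x[1] > 0.5,    # restructures the interlanguage
)
print(result)  # only the most salient form survives every stage
```

The caricature at least makes one thing vivid: on Gass’s account, very little of the raw input survives all the way to integration.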

We still have to look at the theories of Towell and Hawkins, Susanne Carroll, Jan Hulstijn, and then Bates & MacWhinney and Nick Ellis.  The models reviewed so far agree on the need for comprehensible input; learners decode enough of the input to make some kind of  conceptual representation, which can then be compared with linguistic structures which already form part of the interlanguage.  As is so often the case with theories of learning (Darwin comes to mind), it’s the bits that don’t fit, or that can’t be parsed, that  cause a “mental jolt in processing”, as Sun (2008) calls it. It’s the incomprehensibility of the input that triggers learning, as I’m sure Schmidt would agree.



 Corder, S. P. (1967). The significance of learners’ errors. IRAL, 5, 161-170.

Faerch, C., & Kasper, G. (1980). Processing and strategies in foreign language learning and communication. The Interlanguage Studies Bulletin—Utrecht, 5, 47-118.

Gass, S. M. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Lawrence Erlbaum.

Hulstijn, J. (2013). Is the Second Language Acquisition discipline disintegrating? Language Teaching, 46(4), 511-517.

Krashen, S. D. (1982). Principles and practice in second language acquisition. Oxford, UK: Pergamon.

Krashen, S. D. (1985). The input hypothesis: Issues and implications. London: Longman.


O’Grady, W. (2012). How Children Learn Language. Cambridge: Cambridge University Press.

Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3-32). Cambridge UK: Cambridge University Press.

Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. M. Gass & C. G. Madden (Eds.), Input in second language acquisition (pp. 235-253). Rowley, MA: Newbury House.

VanPatten, B. (2017). While we’re on the topic…. BVP on Language, Language Acquisition, and Classroom Practice. Alexandria, VA: The American Council on the Teaching of Foreign Languages.

VanPatten, B. (2003). From Input to Output: A Teacher’s Guide to Second Language Acquisition. New York: McGraw-Hill.

VanPatten, B. (1996). Input Processing and Grammar Instruction: Theory and Research. Norwood, NJ: Ablex.

SLA Part 5: Pienemann’s Processability Theory

This theory started out as the Multidimensional Model, which came from work done by the ZISA group mainly at the University of Hamburg in the late seventies.  One of the first findings of the group was that all the children and adult learners of German as a second language in the study adhered to a five-stage developmental sequence.

Stage X – Canonical order (SVO)

die kinder spielen mim ball   the children play with the ball

Stage X + 1 – Adverb preposing (ADV)

da kinder spielen   there children play

Stage X + 2 – Verb separation (SEP)

alle kinder muss die pause machen  all children must the break have

Stage X + 3 – Inversion (INV)

dann hat sie wieder die knoch gebringt  then has she again the bone brought

Stage X + 4 – Verb-end (V-END)

er sagte, dass er nach hause kommt  he said that he home comes

Learners didn’t abandon one interlanguage rule for the next as they progressed; they added new ones while retaining the old, and thus the presence of one rule implies the presence of earlier rules.
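This implicational claim – that a learner showing a later rule must also show all the earlier ones – is easy to state as a check over data. The sketch below is purely my own illustration (the stage labels follow the text; the learner profiles are invented).

```python
# Toy check of the ZISA implicational claim: the rules a learner shows
# should form an unbroken prefix of the developmental sequence
# (illustrative only; learner data invented).

STAGES = ["SVO", "ADV", "SEP", "INV", "V-END"]

def is_implicational(rules_present):
    """True if every stage up to the learner's highest rule is present."""
    highest = max((STAGES.index(r) for r in rules_present), default=-1)
    return all(STAGES[i] in rules_present for i in range(highest + 1))

print(is_implicational({"SVO", "ADV", "SEP"}))  # Stage X+2 learner: consistent
print(is_implicational({"SVO", "INV"}))         # skipped stages: violates the claim
```

A learner profile that fails this check – one who shows inversion without verb separation, say – would be counter-evidence to the ZISA findings.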

The explanation offered for this developmental sequence is that each stage reflects the learner’s use of three speech-processing strategies. Clahsen and Pienemann argue that processing is “constrained” by the strategies and development consists of the gradual removal of these constraints, or the “shedding of the strategies”, which allows the processing of progressively more complex structures. The strategies are:

(i) The Canonical Order Strategy.  The construction of sentences at Stage X obeys simple canonical order that is generally assumed to be “actor – action – acted upon.”  This is a pre-linguistic phase of acquisition where learners build sentences according to meaning, not on the basis of any grammatical knowledge.

(ii) The Initialisation-Finalisation Strategy. Stage X+1 occurs when learners notice discrepancies between their rule and input.  But the areas of input where discrepancies are noticed are constrained by perceptual saliency – it is easier to notice differences at the beginnings or the ends of sentences since these are more salient than the middle of sentences. As a result, elements at the initial and final positions may be moved around, while leaving the canonical order undisturbed.

Stage X+2 also involves this strategy, but verb separation is considered more difficult than adverb fronting, because the former requires not just movement to the end position but also disruption of a continuous constituent: the verb + particle, infinitive, or participle.

Stage X+3 is even more complex, since it involves both disruption and movement of an internal element to a non-salient position, and so requires the learner to abandon salience and recognise different grammatical categories.

(iii) The Subordinate Clause Strategy.  This is used in Stage X+4 and requires the most advanced processing skills, because the learner has to produce a hierarchical structure, which involves identifying sub-strings within a string and moving elements out of those sub-strings into other positions.

These constraints on interlanguage development are argued to be universal; they include all developmental stages, not just word order, and they apply to all second languages, not just German.

The ZISA model also proposed a variational dimension to SLA – hence the name “Multidimensional”.  While the developmental sequence of SLA is fixed by universal processing constraints, individual learners follow different routes in SLA, depending primarily on whether they adopt a predominantly “standard” orientation, favouring accuracy, or a predominantly “simplifying” one, favouring communicative effectiveness.

Processability Theory

Pienemann’s next development (1998) was to expand the Multidimensional Model into a Processability Theory, which predicts which grammatical structures an L2 learner can process at a given level of development.

This capacity to predict which formal hypotheses are processable at which point in development provides the basis for a uniform explanatory framework which can account for a diverse range of phenomena related to language development (Pienemann, 1998: xv).

The important thing about this theory is that while Pienemann describes the same route as other scholars have done for interlanguage development, in addition now he is offering an explanation for why interlanguage grammars develop in the way they do. His theory proposes that

for linguistic hypotheses to transform into executable procedural knowledge the processor needs to have the capacity of processing those hypotheses (Pienemann, 1998: 4).

Pienemann, in other words, argues that there will be certain linguistic hypotheses that, at a particular stage of development, the L2 learner cannot access because he or she doesn’t have the necessary processing resources available. At any stage of development, the learner can produce and comprehend only those L2 linguistic forms which the current state of the language processor can handle.

The processing resources that have to be acquired by the L2 learner will, according to Processability Theory, be acquired in the following sequence:

  1. lemma access,
  2. the category procedure,
  3. the phrasal procedure,
  4. the S-procedure,
  5. the subordinate clause procedure – if applicable. (Pienemann, 1998: 7)

The theory states that each procedure is a necessary prerequisite for the following procedure, and that

the hierarchy will be cut off in the learner grammar at the point of the missing processing procedures and the rest of the hierarchy will be replaced by a direct mapping of conceptual structures onto surface form (Pienemann, 1998: 7).

The SLA process can therefore be seen as one in which the L2 learner entertains hypotheses about the L2 grammar and that this “hypothesis space” is determined by the processability hierarchy.
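The “cut-off” idea is mechanical enough to sketch in code: each procedure presupposes the previous one, so the hierarchy is truncated at the first procedure the learner has not yet acquired. This is my own toy rendering, not Pienemann’s formalism, and the learner data is invented.

```python
# Toy sketch of the processability hierarchy and its cut-off
# (illustrative only; learner data invented).

HIERARCHY = [
    "lemma access",
    "category procedure",
    "phrasal procedure",
    "S-procedure",
    "subordinate clause procedure",
]

def processable(acquired):
    """Return the procedures actually usable: the hierarchy is cut off
    at the first procedure the learner has not yet acquired."""
    available = []
    for proc in HIERARCHY:
        if proc not in acquired:
            break  # the rest is replaced by direct conceptual-to-surface mapping
        available.append(proc)
    return available

# A learner with the first three procedures, plus knowledge of the last
# one that is unusable because the S-procedure is missing:
print(processable({"lemma access", "category procedure",
                   "phrasal procedure", "subordinate clause procedure"}))
```

Note how the subordinate clause procedure is present in the learner’s knowledge but unusable: without the prerequisite S-procedure, it falls outside hypothesis space.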


In this account of the SLA process, the mechanism at work is an information processing device, which is constrained by limitations in its ability to process input. The device adds new rules while retaining the old ones, and as the limiting “speech-processing strategies” which constrain processing are removed, this allows the processing of progressively more complex structures.

What is most impressive about the theory (that it provides an explanation for the interlanguage development route) is also most problematic, since the theory takes it as self-evident that our cognition works in the way the model suggests. We are told that people see things in a canonical order of “actor – action – acted upon”, that people prefer continuous to discontinuous entities, that the beginnings and ends of sentences are more salient than the middles, and so on, without being offered much justification for such a view, beyond general assumptions about what is easy and difficult to process. As Towell and Hawkins say of the Multidimensional Model:

They require us to take on faith assumptions about the nature of perception. The perceptual constructs are essentially mysterious, and what is more, any number of new ones may be invented in an unconstrained way (Towell and Hawkins, 1994: 50).

This criticism isn’t actually as damning as it might appear – there are obviously good reasons to suppose that simple things will be more easily processed than complex ones, there is a mountain of evidence from L1 acquisition studies to support some of the claims, and, of course, whatever new assumptions “may be invented” can be dealt with if and when they appear. As Pienemann makes clear, the assumptions he makes are common to most cognitive models, and, importantly, they result in making predictions that are highly falsifiable.

Apart from some vagueness about precisely how the processing mechanism works, and about exactly what constitutes the acquisition of each level, the theory has little to say about transfer, and it deals with a limited domain, restricting itself to an account of speech production and avoiding any discussion of linguistic theory.

In brief, the two main strengths of this theory are that it provides not just a description but an explanation of interlanguage development, and that it is testable. The explanation is taken from experimental psycholinguistics, not from the data, and is thus able to make wide, strong predictions and to apply to all future data. The predictions the theory makes are widely applicable and, to some extent, testable: if we can find an L2 learner who has skipped a stage in the developmental sequence, then we will have found empirical evidence that challenges the theory. Since the theory also claims that the constraints on processability are not affected by context, even classroom instruction should not be able to change or reduce these stages.

The Teachability Hypothesis 

Which brings us to the most important implication of Pienemann’s theory: the Teachability Hypothesis. First proposed in 1984, this predicts that items can only be successfully taught when learners are at the right stage of interlanguage development to learn them. Note immediately that neither Pienemann nor anybody else is claiming to know anything but the outlines of the interlanguage development route. We don’t have any route map, and even if we did, and even if we could identify the point where each of our students was on the map (i.e., where he or she was on his or her interlanguage trajectory), this wouldn’t mean that explicit teaching of any particular grammar point or lexical chunk, for example, would lead to procedural knowledge of it. No; what Pienemann’s work does is give further support to the view that interlanguage development is a cognitive process involving slow, dynamic reformulation, constrained by processing limitations.

Whether Pienemann’s theory gives a good explanation of SLA is open to question, to be settled by an appeal to empirical research and more critical interrogation of the constructs. But there’s no question that Pienemann’s research adds significantly to the evidence for the claim that SLA is a process whose route is unaffected by teaching. In order to respect our students’ interlanguage development, we must teach in such a way that they are given the maximum opportunities to work things out for themselves, and avoid the mistake of trying to teach them things they’re not ready, or motivated, to learn.


For a good discussion of Pienemann’s theory, see the peer commentaries in the first issue of Bilingualism: Language and Cognition (Vol. 1, Number 1, 1998), entirely devoted to Processability Theory.

SLA Part 4: Schmidt’s Noticing Hypothesis

(Note: Following a few emails I’ve received, I should make it clear that unless referring to UG, I use the word “grammar” in the sense that linguists use it; viz., “knowledge of a language”.)

Schmidt, undeterred by McLaughlin’s warning to stay clear of attempts to define “consciousness”, tries to do away with its “terminological vagueness” by examining three senses of the term:

  1. consciousness as awareness,
  2. consciousness as intention,
  3. consciousness as knowledge.

1 Consciousness as awareness

Schmidt distinguishes between three levels of awareness: Perception, Noticing and Understanding. The second level, Noticing, is the key to Schmidt’s eventual hypothesis. Noticing is focal awareness.

When reading, for example, we are normally aware of (notice) the content of what we are reading, rather than the syntactic peculiarities of the writer’s style, the style of type in which the text is set, music playing on a radio in the next room, or background noise outside a window.  However, we still perceive these competing stimuli and may pay attention to them if we choose (Schmidt, 1990: 132).

Noticing refers to a private experience, but it can be operationally defined as “availability for verbal report”, and these reports can be used to both verify and falsify claims concerning the role of noticing in cognition.

2 Consciousness as intention

This sense distinguishes between awareness and intentional behaviour. “He did it consciously”, in this second sense, means “He did it intentionally.” Intentional learning is not the same as noticing.

3 Consciousness as knowledge

Schmidt suggests that 6 different contrasts (C) need to be distinguished:

C1: Unconscious learning refers to unawareness of having learned something.

C2: Conscious learning refers to noticing, and unconscious learning to picking up stretches of speech without noticing them. Schmidt calls this the “subliminal” learning question: is it possible to learn aspects of a second language that are not consciously noticed?

C3: Conscious learning refers to intention and effort.  This is the incidental learning question: if noticing is required, must learners consciously pay attention?

C4: Conscious learning is understanding principles of the language, and unconscious learning is the induction of such principles.  This is the implicit learning question: can second language learners acquire rules without any conscious understanding of them?

C5: Conscious learning is a deliberate plan involving study and other intentional learning strategies, unconscious learning is an unintended by-product of communicative interaction.

C6: Conscious learning allows the learner to say what they appear to “know”.

Addressing C2, Schmidt points to disagreement on a definition of intake. While Krashen seems to equate intake with comprehensible input, Corder distinguishes between what is available for going in and what actually goes in, but neither Krashen nor Corder explains what part of input functions as intake for the learning of form. Schmidt also notes the distinction Slobin (1985) and Chaudron (1985) make between preliminary intake (the processes used to convert input into stored data that can later be used to construct language) and final intake (the processes used to organise stored data into linguistic systems).

Schmidt proposes that all this confusion is resolved by defining intake as:

that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently.  If noticed, it becomes intake (Schmidt, 1990: 139).

The implication of this is that:

subliminal language learning is impossible, and that noticing is the necessary and sufficient condition for converting input into intake (Schmidt, 1990:  130).

The only study mentioned by Schmidt in support of his hypothesis is Schmidt and Frota (1986), which examined Schmidt’s own attempts to learn Portuguese and found that his notes matched his output quite closely. Schmidt himself admits that the study does not show that noticing is sufficient for learning, or that noticing is necessary for intake. Nevertheless, Schmidt does not base his case on this study alone; there is, he claims, evidence from a wider source:

… the primary evidence for the claim that noticing is a necessary condition for storage comes from studies in which the focus of attention is experimentally controlled. The basic finding, that memory requires attention and awareness, was established at the very beginning of research within the information processing model (Schmidt, 1990: 141).

Addressing C3, the issue of incidental learning versus paying attention, Schmidt acknowledges that the claim that conscious attention is necessary for SLA runs counter to both Chomsky’s rejection of any role for conscious attention or choice in L1 learning, and the arguments made by Krashen, Pienemann and others for the existence of a natural order or a developmental sequence in SLA.  Schmidt says that Chomsky’s arguments do not necessarily apply to SLA, and that

natural orders and acquisition sequences do not pose a serious challenge to my claim of the importance of noticing in language learning, …they constrain but do not eliminate the possibility of a role for selective, voluntary attention (Schmidt, 1990: 142).

Schmidt accepts that “language learners are not free to notice whatever they want” (Schmidt, 1990: 144), but, having discussed a number of factors that might influence noticing, such as expectations, frequency, perceptual salience, skill level, and task demands, concludes that

those who notice most, learn most, and it may be that those who notice most are those who pay attention most.  (Schmidt, 1990: 144)

As for C4, the issue of implicit learning versus learning based on understanding, Schmidt judges the question of implicit second language learning to be the most difficult “because it cannot be separated from questions concerning the plausibility of linguistic theories” (Schmidt, 1990: 149). But Schmidt rejects the “null hypothesis” which claims that, as he puts it, “understanding is epiphenomenal to learning, or that most second language learning is implicit” (Schmidt, 1990: 149).


Schmidt’s hypothesis caused an immediate stir within the academic community and quickly became widely accepted. It caused Mike Long to re-write his Interaction Hypothesis and has been used by many scholars as the basis for studies of SLA. More importantly for my thesis, “noticing” is increasingly being used by teacher trainers, often with scant understanding of it, to justify concentrating on explicit grammar teaching.

I have the following criticisms to make of Schmidt’s noticing hypothesis.

1. Empirical support for the Noticing Hypothesis is weak

In response to a series of criticisms of his original 1990 paper, Schmidt’s 2001 paper gives various sources of evidence of noticing, all of which have been subsequently challenged:

a) Schmidt says learner production is a source of evidence, but no clear method for identifying what has been noticed is given.

b) Likewise, learner reports in diaries. Schmidt cites Schmidt and Frota (1986), and Warden, Lapkin, Swain and Hart (1995), but, as Schmidt himself points out, diaries span months, while cognitive processing of L2 input takes place in seconds. Furthermore, as Schmidt admits, keeping a diary requires not just noticing but reflexive self-awareness.

c) Think-aloud protocols. Schmidt agrees with the objection made to such protocols that studies based on them cannot assume that the protocols include everything that is noticed. Schmidt cites Leow (1997) and Jourdenais, Ota, Stauffer, Boyson, and Doughty (1995), who used think-aloud protocols in focus-on-form instruction, and concludes that such experiments cannot identify all the examples of target features that were noticed.

d) Learner reports in a CALL context (Chapelle, 1998) and programs that track the interface between user and program – recording mouse clicks and eye movements (Crosby, 1998). Again, Schmidt concedes that it is still not possible to identify with any certainty what has been noticed.

e) Schmidt claims that the noticing hypothesis could be falsified by demonstrating the existence of subliminal learning either by showing positive priming of unattended and unnoticed novel stimuli or by showing learning in dual task studies in which central processing capacity is exhausted by the primary task. The problem in this case is that in positive priming studies one can never really be sure that subjects did not allocate any attention to what they could not later report, and similarly, in dual task experiments one cannot be sure that no attention is devoted to the secondary task. Jacoby, Lindsay, & Toth (1996, cited in Schmidt, 2001: 28) argue that the way to demonstrate true non-attentional learning is to use the logic of opposition, to arrange experiments where unconscious processes oppose the aims of conscious processes.

f) Merikle and Cheesman distinguish between the objective and subjective thresholds of perception. The clearest evidence that something has exceeded the subjective threshold and been consciously perceived or noticed is a concurrent verbal report, since nothing can be verbally reported other than the current contents of awareness. Schmidt argues that this is the best test of noticing, and that after-the-fact recall is also good evidence that something was noticed, provided that prior knowledge and guessing can be controlled. For example, if beginner level students of Spanish are presented with a series of Spanish utterances containing unfamiliar verb forms, are forced to recall immediately afterwards the forms that occurred in each utterance, and can do so, that is good evidence that they did notice them. On the other hand, it is not safe to assume that failure to do so means that they did not notice. It seems that it is easier to confirm that a particular form has not been noticed than that it has: failure to achieve above-chance performance in a forced-choice recognition test is a much better indication that the subjective threshold has not been exceeded and that noticing did not take place.
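Schmidt’s point about chance performance can be made concrete. Here is a minimal sketch, in Python, of the one-tailed binomial test a researcher might use to decide whether a learner’s score on a two-alternative forced-choice recognition task exceeds what guessing alone would produce; the numbers are purely illustrative and come from no cited study.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the probability of scoring
    k or more correct out of n trials by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative case: a learner picks the right verb form on 18 of 24
# two-alternative forced-choice trials. Chance performance is p = .5.
p_value = binomial_tail(18, 24)
above_chance = p_value < 0.05  # conventional significance threshold

print(round(p_value, 4), above_chance)  # → 0.0113 True
```

With 18 correct out of 24, the probability of doing that well by guessing is about 1%, so recognition is above chance; a score of 14 out of 24 (p ≈ .27) would not clear the threshold, which is why failure on such a test is taken as good evidence that noticing did not occur.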

g) Truscott (1998) points out that the reviews by Brewer (1974) and Dawson and Schell (1987), cited by Schmidt (1990), dealt with simple conditioning experiments and that, therefore, inferences regarding learning an L2 were not legitimate. Brewer specifically notes that his conclusions do not apply to the acquisition of syntax, which probably occurs “in a relatively unconscious, automatic fashion” (p. 29). Truscott further points out that while most current research on unconscious learning is plagued by continuing controversy, “one can safely conclude that the evidence does not show that awareness of the information to be acquired is necessary for learning” (p. 108).

h) Altman (1990) gathered data in a similar way to Schmidt and Frota (1986) in studying her learning of Hebrew over a five-year period. Altman found that while half her verbalisations of Hebrew verbs could be traced to diary entries of noticing, it was not possible to identify the source of the other half, and they may have become intake subconsciously.

i) Alanen’s (1992) study of Finnish L2 learning found no significant statistical difference between an enhanced input condition group and the control group.

j) Robinson’s (1997) study found mixed results for noticing under implicit, incidental, rule-search and instructed conditions.

Furthermore, studies of ‘noticing’ have been criticised for serious methodological problems:

i) The studies are not comparable due to variations in focus and in the conditions operationalized.

ii) The level of noticing in the studies may have been affected by uncontrolled variables, which casts doubt on the reliability of the findings.

iii) Cross (2002) notes that “only Schmidt and Frota’s (1986) and Altman’s (1990) research considers how noticing target structures positively relates to their production as verbal output (in a communicative sense), which seems to be the true test of whether noticing has an effect on second language acquisition. A dilemma associated with this is that, as Fotos (1993) states, there is a gap of indeterminate length between what is noticed and when it appears as output, which makes data collection, analysis and correlation problematic.”

iv) Ahn (2014) points to a number of problems that have been identified in eye-tracking studies, especially those using heat map analyses. (See Ahn (2014) for the references that follow.) First, heat maps are only “exploratory” (p. 239), and they cannot provide temporal information on eye movement, such as regression duration, “the duration of the fixations when the reader returns to the lookzone” (Simard & Foucambert, 2013, p. 213), which might tempt researchers to rush into a conclusion that favors their own predictions. Second, as Godfroid et al. (2013) accurately noted, the heat map analyses in Smith (2012) could not control the confounding effects of “word length, word frequency, and predictability, among other factors” (p. 490), which might have yielded considerable confounding effects as well. As we can infer from the analyses shown in Smith (2012), the utmost need in the field at present is for specific guidelines for using eye-tracking methodology to conduct research focusing on L2 phenomena (Spinner, Gass, & Behney, 2013). Because little guidance is available, the use of eye tracking is often at risk of misleading researchers into making unreliable interpretations of their results.


2. The construct of “noticing” is not clearly defined. Thus, it’s not clear what exactly it refers to and, as has already been suggested above, there’s no way of ascertaining when it is, and when it isn’t, being used by L2 learners.

Recall that in his original 1990 paper, Schmidt claimed that “intake” was the sub-set of  input which is noticed, and that the parts of input that aren’t noticed are lost. Thus, Schmidt’s Noticing Hypothesis, in its 1990 version, claims that noticing is the necessary condition for learning an L2. Noticing is said to be the first stage of the process of converting input into implicit knowledge. It takes place in short-term memory (where, according to the original claim, the noticed ‘feature’ is compared to features produced as output) and it is triggered by these factors: instruction, perceptual salience, frequency, skill level, task demands, and comparing.

But what is it? It’s “focused attention”, and, Schmidt argues, attention research supports the claim that consciousness in the form of attention is necessary for learning. Truscott (1998) points out that such claims are “difficult to evaluate and interpret”. He cites a number of scholars and studies to support the view that the notion of attention is “very confused”, and that it’s “very difficult to say exactly what attention is and to determine when it is or is not allocated to a given task. Its relation to the notoriously confused notion of consciousness is no less problematic”. He concludes (1998, p. 107): “The essential point is that current research and theory on attention, awareness and learning are not clear enough to support any strong claims about relations among the three.”

In an attempt to clarify matters and answer his critics, Schmidt re-formulated his Noticing Hypothesis in 2001. A number of concessions are made, resulting in a much weaker version of the hypothesis. To minimise confusion, Schmidt says he will use ‘noticing’ as a technical term equivalent to what Gass (1988) calls “apperception”, what Tomlin and Villa (1994) call “detection within selective attention”, and what Robinson (1995) calls “detection plus rehearsal in short term memory”. So now, what is noticed are “elements of the surface structure of utterances in the input, instances of language” and not “rules or principles of which such instances may be exemplars”. Noticing does not refer to comparisons across instances or to reflecting on what has been noticed.

In a further concession, in the section “Can there be learning without attention?”, Schmidt admits there can, with the L1 as a source that helps learners of an L2 being an obvious example. Schmidt says that it’s “clear that successful second language learning goes beyond what is present in input”. Schmidt presents evidence which, he admits, “appears to falsify the claim that attention is necessary for any learning whatsoever”, and this prompts him to propose the weaker version of the Noticing Hypothesis, namely “the more noticing, the more learning”.

There are a number of problems with this reformulation.

Gass: Apperception

As was mentioned, Schmidt (2001) says that he is using ‘noticing’ as a technical term equivalent to Gass’ apperception. True to dictionary definitions of apperception, Gass defines apperception as “the process of understanding by which newly observed qualities of an object are initially related to past experiences”. The light goes on, the learner realises that something new needs to be learned. It’s “an internal cognitive act in which a linguistic form is related to some bit of existing knowledge (or gap in knowledge)”. It shines a spotlight on the identified form and prepares it for further analysis. This seems to clash with Schmidt’s insistence that noticing does not refer to comparisons across instances or to reflecting on what has been noticed, and in any case, Gass provides no clear explanation of how the subsequent stages of her model convert apperceptions into implicit knowledge of the L2 grammar.

Tomlin and Villa: Detection

Schmidt says that ‘noticing’ is also equivalent to what Tomlin and Villa (1994) call “detection within selective attention.” But is it? Surely Tomlin and Villa’s main concern is detection that does not require awareness. According to Tomlin and Villa, the three components of attention are alertness, orientation, and detection, but only detection is essential for further processing and awareness plays no important role in L2 learning.

Carroll: input doesn’t contain mental constructs; therefore they can’t be noticed

As Gregg commented when I discussed Schmidt’s hypothesis in my earlier blog: “You can’t notice grammar!” Schmidt’s 2010 paper attempts to deal with Suzanne Carroll’s objection by first succinctly summarising Carroll’s view that attention to input plays little role in L2 learning because most of what constitutes linguistic knowledge is not in the input to begin with. She argues that Krashen, Schmidt and Gass all see “input” as observable sensory stimuli in the environment from which forms can be noticed,

whereas in reality the stuff of acquisition (phonemes, syllables, morphemes, nouns, verbs, cases, etc.) consists of mental constructs that exist in the mind and not in the environment at all. If not present in the external environment, there is no possibility of noticing them (Carroll, 2001, p.47).

Schmidt’s answer is:

In general, ideas about attention, noticing, and understanding are more compatible with instance-based, construction-based and usage-based theories (Bley-Vroman, 2009; Bybee & Eddington, 2006; Goldberg, 1995) than with generative theories.

It seems that Schmidt, in an attempt to save his hypothesis, is prepared to ditch what Carroll refers to as “100 years of linguistic research, which demonstrates that linguistic cognition is structure dependent”, and to adopt the connectionist view that linguistic knowledge is encoded as activated neural nets, and that it is linked to acoustic events by no more than association.

I think it’s worth quoting a bit more from Carroll’s impressive 2001 book. Commenting on all those who start with input, she says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

Learners do not attend to things in the input as such, they respond to speech-signals by attempting to parse the signals, and failures to do so trigger attention to parts of the signal.  Thus, it is possible to have speech-signal processing without attention-as-noticing or attention-as-awareness. Learners may unconsciously and without awareness detect, encode and respond to linguistic sounds; learners don’t always notice their own processing of segments and the internal organization of their own conceptual representations; the processing of forms and meanings are often not noticed; and attention is thus the result of processing not a prerequisite for processing.


In brief:

1. In his 2010 paper, Schmidt confirms the concessions made in 2001, which amount to saying that ‘noticing’ is not needed for all L2 learning, but that the more you notice the more you learn. He also confirms that noticing does not refer to reflecting on what is noticed.

2. The Noticing Hypothesis even in its weaker version doesn’t clearly describe the construct of ‘noticing’.

3. The empirical support claimed for the Noticing Hypothesis is not as strong as Schmidt (2010) claims.

4. A theory of SLA based on noticing a succession of forms faces the impassable obstacle that, as Schmidt seemed to finally admit, you can’t ‘notice’ rules, or principles of grammar.

5. “Noticing the gap” is not sanctioned by Schmidt’s amended Noticing Hypothesis.

6. The way that so many writers and ELT trainers use “noticing” to justify all kinds of explicit grammar and vocabulary teaching demonstrates that Schmidt’s Noticing Hypothesis is widely misunderstood and misused.



Ahn, J. I. (2014) Attention, Awareness, and Noticing in SLA: A Methodological Review.  MSU Working Papers in SLS, Vol. 5.

Carroll, S. (2001) Input and Evidence. Amsterdam; Benjamins.

Corder, P. (1967) The significance of learners’ errors. International Review of Applied Linguistics, 5, 161-169

Cross, J. (2002) ‘Noticing’ in SLA: Is it a valid concept? Downloaded from

Ellis, N. (1998) Emergentism, Connectionism and Language Learning. Language Learning 48:4,  pp. 631–664.

O’Grady, W. (2005) How Children learn language. CUP.

Schmidt, R. W. (1990) The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. and Frota, S. N. (1986) Developing basic conversational ability in a second language: a case study of an adult learner of Portuguese. In Day, R. R. (ed.), Talking to learn: Conversation in second language acquisition. Rowley, MA: Newbury House.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J.W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Truscott, J. (1998) Noticing in second language acquisition: a critical review. Second Language Research, 14 (2), 103–135.

SLA: Behaviourism and Mentalism

The Shift From a Behaviourist to a Cognitivist View of SLA

Before proceeding with the review of SLA, I need to recap the story so far, in order to highlight the difference between two contradictory epistemologies. I do so for two reasons. Firstly, we are seeing a return to behaviourism in the guise of increasingly popular, and increasingly misinterpreted, usage-based theories of language learning such as emergentism. The epistemological underpinnings of these theories are rarely mentioned, particularly by ELT teacher trainers who either clumsily endorse them or airily dismiss them. Secondly, it gives me an opportunity to restate the implications of the shift to a more cognitive view of the SLA process.


Behaviourism has much in common with logical positivism, the most spectacularly misguided movement in the history of philosophy. Chasing the chimera of absolute truth, the logical positivists, most famously those in the Vienna Circle formed in the early 1920s, set out to clean up language and put science on a sure empirical footing. The mad venture was all over before the second world war broke out, but not so behaviourism, which slightly preceded it with the 1913 work of the pioneering American psychologist John B. Watson, and went on to outlive it when B. F. Skinner took over after WW2.

Watson, influenced by the work of Pavlov (1897) and Bekhterev (1896) on conditioning of animals, but also, later, by the works of Mach (1924) and Carnap (1927) from the Vienna School, attempted to make psychological research “scientific” by using only objective procedures, such as laboratory experiments which were designed to establish statistically significant results. Watson formulated a stimulus-response theory of psychology according to which all complex forms of behaviour are explained in terms of simple muscular and glandular elements that can be observed and measured.  No mental “reasoning”, no speculation about the workings of any “mind”, were allowed. Thousands of researchers adopted this methodology, and from the end of the first world war until the 1950s an enormous amount of research on learning in animals and in humans was conducted under this strict empiricist regime.

In 1950 behaviourism could justly claim to have achieved paradigm status, and at that moment B.F. Skinner became its new champion.  Skinner’s contribution to behaviourism was to challenge the stimulus-response idea at the heart of Watson’s work and replace it by a type of psychological conditioning known as reinforcement (see Skinner, 1957, and Toates and Slack, 1990).  Important as this modification was, it is Skinner’s insistence on a strict empiricist epistemology, and his claim that language is learned in just the same way as any other complex skill is learned, by social interaction, that is important here.

The strictly empiricist epistemology of behaviourism outlaws any talk of mental structure or of internal mental states. While it’s perfectly OK to talk about these things in everyday parlance, they have no place in scientific discourse. Strictly speaking – which is how scientists, including psychologists, should speak – there is no such thing as the mind, and there is no sense (sic) in talking about feelings or any other stuff that can’t be observed by appeal to the senses. Behaviourism sees psychology as the science of behaviour, not the science of mind. Behaviour can be described and explained without any ultimate reference to mental events or to any internal psychological processes. The sources of behaviour are external (in the environment), not internal (in the mind). If mental terms or concepts are used to describe behaviour, they must be replaced by behavioural terms or paraphrased into behavioural concepts.

Behaviour is all there is: humans and animals are organisms that can be observed doing things, and the things they do are explained in terms of responses to their environment, which also explains all types of learning.  Learning a language is like learning anything else – it’s the result of repeated responses to stimuli.  There are no innate rules by which organisms learn, which is to say that organisms learn without being innately or pre-experientially provided with explicit procedures by which to learn. Before organisms interact with the environment they know nothing – by definition. Learning doesn’t consist of rule-governed behaviour; learning is what organisms do in response to stimuli. An organism learns from what it does, from its successes and mistakes, as it were.
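The habit-formation picture sketched above was later given a simple mathematical form in associative learning theory. The following sketch uses the Rescorla–Wagner rule, a post-Skinnerian (1972) formalisation offered here purely as an illustration of how associative strength is supposed to grow with reinforced exposure, not as Skinner’s own model:

```python
def rescorla_wagner(trials: int, rate: float = 0.3, lam: float = 1.0):
    """Associative strength V over repeated reinforced pairings.
    Each trial nudges V toward the asymptote lam by a fraction `rate`
    of the remaining prediction error (lam - V)."""
    v = 0.0
    history = []
    for _ in range(trials):
        v += rate * (lam - v)   # learning is driven by the remaining error
        history.append(v)
    return history

curve = rescorla_wagner(10)
print([round(v, 3) for v in curve])
```

Strength rises steeply at first and then flattens as it nears the asymptote: the classic negatively accelerated learning curve that behaviourists took to describe all learning, language included.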

The minimalist elegance of such a stark view is impressive, even attractive – especially if you’re sick of trying to make sense of Freud, Jung, or Adler, perhaps – but it makes explaining unobservable phenomena, whatever they happen to be, problematic, to say the least. Still, for American scholars immersed in the field of foreign language learning in the post-WW2 era, a field not exactly renowned for its contributions to philosophy or scientific method, behaviourism had a lot going for it: an easily-grasped theory with crystal clear pedagogic implications. The behaviourists’ opposition to the Chomskyan threat was entirely understandable but, historically at least, we may note that their case collapsed like a house of cards. Casti (1989) points out that nowhere in 20th-century science was a Kuhnian paradigm shift more completely and swiftly brought about than by Chomsky in linguistics.

In his 1957 Verbal Behaviour, Skinner put forward his view that language learning is a process of habit formation involving associations between an environmental stimulus and a particular automatic response, produced through repetition with the help of reinforcement. This view of learning was challenged by Chomsky’s (1959) review of Skinner’s Verbal Behaviour, where he argued that language learning is quite different from other types of learning and cannot be explained in terms of habit-formation. Chomsky’s revolutionary argument, begun in Syntactic Structures (1957) and subsequently developed in Aspects of the Theory of Syntax (1965) and Knowledge of Language (1986), was that all human beings are born with an innate grammar – a fixed set of mental rules that enables children to create and utter sentences they have never heard before. Chomsky asserted that language learning was a uniquely human capacity, a result of Homo sapiens’ possession of what he at first referred to as a Language Acquisition Device. Chomsky developed his theory and later claimed that language consists of a set of abstract principles that characterise the core grammars of all natural languages, and that the task of learning one’s L1 is thus simplified, since one has an innate mechanism that constrains possible grammar formation. Children do not have to learn those features of the particular language to which they are exposed that are universal, because they know them already. The job of the linguist was to describe this generative, or universal, grammar as rigorously as possible.

So the lines are clearly drawn. For Skinner, language learning is a behavioural phenomenon, for Chomsky, it’s a mental phenomenon. For Skinner, verbal behaviour is the source of learning; for Chomsky it’s the manifestation of what had been learned. For Skinner, talk of innate knowledge is little short of gibberish; for Chomsky it’s the best explanation he can come up with for the knowledge children have of language.


In SLA Part 1, I described how, under the sway of a behaviourist paradigm, researchers in SLA viewed the learner’s L1 as a source of interference, resulting in errors. In SLA Part 2, I described how, under the new influence of a mentalist paradigm, researchers came to view learners as drawing on their innate language learning capacity to construct their own distinct linguistic system, or interlanguage. The view of learning an L2 changed from one of accumulating new habits while trying to avoid mistakes (which only entrench bad past habits), to one of a cognitive process, where errors are evidence of the learner’s ‘creative construction’ of the L2. Research into learner errors and into the learning of specific grammatical features gave clear evidence to support the mentalist view. The research showed that all learners, irrespective of their L1, seemed to make the same errors, which in turn supported the view that learners were testing hypotheses about the target language on the basis of their limited experience, and making appropriate adjustments to their developing interlanguage system. Far from being evidence of non-learning, errors were thus clear signs of interlanguage development.

Furthermore, and very importantly in terms of its pedagogic implications, interlanguage development, seen as a kind of built-in syllabus, could be observed following the same route, regardless of differences in the L1 or of the linguistic environment. It was becoming clear that (leaving aside the question of maturational constraints for a moment) learning an L2 involved moving along a universal route which was unaffected by the L1, or by the learning environment – classroom, workplace, home, wherever. Just as importantly, the research showed that L2 learning is not a matter of successively accumulating parts of the language one bit after the other. Rather, SLA is a dynamic process involving the gradual development of a complex system. Learners can sometimes take several months to fully acquire a particular feature, and the learning process is anything but linear: it involves slowly and unsystematically moving through a series of transitional stages, including zigzags, u-shaped patterns, stalls, and plateaus, as learners’ interlanguages are constantly adjusted, reformulated, and rebuilt in such a way that they gradually approximate more closely to the target language model.

A picture is thus emerging of SLA as a learning process with two important characteristics.

  1. Knowledge of the L2 develops along a route which is impervious to instruction, and
  2. it develops in a dynamic, nonlinear way, where lots of different parts of the developing system are being worked on at the same time.

As we continue the review, we’ll look at declarative and procedural knowledge, explicit and implicit knowledge, and explicit and implicit learning, and this will indicate the third important characteristic of the SLA process:

3. Implicit learning is the default mechanism for learning an L2.

We’ll then be in a stronger position to argue that teacher trainers who advise their trainees to devote the majority of classroom time to the explicit teaching of a sequence of formal elements of the L2 are grooming those trainees for failure.


For References See “Bibliography ..” in Header 

SLA Part 3: From Krashen to Schmidt

Developing a transition theory

What is the process of SLA? How do people get from no knowledge of the L2 to some level of proficiency? Before going on, I should make it clear that I’m only looking at psycholinguistic theories, thus ignoring important social aspects of L2 learning and, even within the realm of cognition, leaving out such factors as aptitude and motivation.

 Krashen’s Monitor Model

Krashen’s (1977a, 1977b, 1978, 1981, 1982, 1985) Monitor Model  came hard on the heels of Corder’s work, and contains the following five hypotheses:

The Acquisition-Learning Hypothesis.

Adults have two ways of developing L2 competence:

  1. via acquisition, that is, picking up a language naturally, more or less like children do their L1, by using language for communication. This is a subconscious process and the resulting acquired competence is also subconscious.
  2. via language learning, which is a conscious process and results in formal knowledge of the language.

For Krashen, the two knowledge systems are separate. “Acquired” knowledge is what explains communicative competence. Knowledge gained through “learning” can’t be internalised, and thus serves only the very minor role of acting as a monitor of the acquired system, checking the correctness of utterances against the formal knowledge stored therein.

 The Natural Order Hypothesis

The rules of language are acquired in a predictable way, some rules coming early and others late. The order is not determined solely by formal simplicity, and it is independent of the order in which rules are taught in language classes.

The Monitor Hypothesis

The learned system has only one, limited, function: to act as a Monitor.  Further, the Monitor cannot be used unless three conditions are met:

  1. Enough time. “In order to think about and use conscious rules effectively, a second language performer needs to have sufficient time” (Krashen, 1982:12).
  2. Focus on form. “The performer must also be focused on form, or thinking about correctness” (Krashen, 1982: 12).
  3. Knowledge of the rule.

The Input Hypothesis

Second languages are acquired by understanding language that contains structure “a bit beyond our current level of competence” (i + 1), i.e. by receiving “comprehensible input”. “When the input is understood and there is enough of it, i + 1 will be provided automatically. Production ability emerges. It is not taught directly” (Krashen, 1982: 21-22).

The Affective Filter Hypothesis

The Affective Filter is “that part of the internal processing system that subconsciously screens incoming language based on … the learner’s motives, needs, attitudes, and emotional states” (Dulay, Burt, and Krashen, 1982: 46). If the Affective Filter is high (because of lack of motivation, or dislike of the L2 culture, or feelings of inadequacy, for example), input is prevented from passing through and hence there is no acquisition.  The Affective Filter is responsible for individual variation in SLA (it is not something children use) and explains why some learners never acquire full competence.


The biggest problem with Krashen’s account is that there is no way of testing the Acquisition-Learning Hypothesis: we are given no evidence to support the claim that two distinct systems exist, nor any means of determining whether they are, or are not, separate.  Similarly, there is no way of testing the Monitor Hypothesis: with no way to determine whether the Monitor is in operation or not, it is impossible to assess the validity of its extremely strong claims. The Input Hypothesis is equally mysterious and incapable of being tested: the levels of knowledge are nowhere defined, so it is impossible to know whether i + 1 is present in input, and, if it is, whether or not the learner moves on to the next level as a result.  Thus, the first three hypotheses make up a circular and vacuous argument: the Monitor accounts for discrepancies in the natural order, the learning-acquisition distinction justifies the use of the Monitor, and so on.

Further, the model lacks explanatory adequacy. At the heart of the model is the Acquisition-Learning Hypothesis, which simply states that L2 competence is picked up through comprehensible input in a staged, systematic way, without giving any explanation of the process by which comprehensible input leads to acquisition.  Similarly, we are given no account of how the Affective Filter works, of how input is filtered out by an unmotivated learner.

Finally, Krashen’s use of key terms, such as “acquisition” and “learning”, and “subconscious” and “conscious”, is vague, confusing, and not always consistent.

In summary, while the model is broad in scope and intuitively appealing, Krashen’s key terms are ill-defined and circular, so that the set of hypotheses is incoherent. The lack of empirical content in the five hypotheses means that there is no means of testing them.  As a theory it has such serious faults that it is not really a theory at all.

And yet, Krashen’s work has had an enormous influence, and in my opinion, rightly so. While the acquisition/learning distinction is badly defined, it is nevertheless absolutely crucial to current attempts to explain SLA; all the subsequent work on implicit and explicit learning, knowledge, and instruction starts here, as does the work on interlanguage development. Since the questions of conscious and unconscious learning, and of interlanguage development, are the two with the biggest teaching implications, and since I think Krashen was basically right about both issues, I personally see Krashen’s work as of enormous and enduring importance.

Processing Approaches

A) McLaughlin: Automaticity and Restructuring

McLaughlin’s (1987) review of Krashen’s Monitor Model is considered one of the most complete rebuttals offered (but see Gregg, 1984). In an attempt to overcome the problems of finding operational definitions for concepts used to describe and explain the SLA process, McLaughlin went on to argue (1990) that the distinction between conscious and unconscious should be abandoned in favour of clearly-defined empirical concepts.  McLaughlin replaces the conscious/unconscious dichotomy with the distinction between controlled and automatic processing. Controlled processing requires attention, and humans’ capacity for it is limited; automatic processing does not require attention, and takes up little or no processing capacity.  So, McLaughlin argues, the L2 learner begins the process of acquiring a particular aspect of the L2 by relying heavily on controlled processing; then, through practice, the learner’s use of that aspect of the L2 becomes automatic.

McLaughlin uses the twin concepts of Automaticity and Restructuring to describe the cognitive processes involved in SLA. Automaticity occurs when an associative connection develops between a certain kind of input and some output pattern.  Many typical greetings exchanges illustrate this:

Speaker 1: Morning.

Speaker 2: Morning. How are you?

Speaker 1: Fine, and you?

Speaker 2: Fine.

Since humans have a limited capacity for processing information, automatic routines free up more time for processing new information. The more information that can be handled automatically, the more attentional resources are freed up for new information.  Learning takes place by the transfer of information to long-term memory and is regulated by controlled processes which lay down the stepping stones for automatic processing.

The second concept, restructuring, refers to qualitative changes in the learner’s interlanguage as they move from stage to stage, not to the simple addition of new structural elements.  These restructuring changes are, according to McLaughlin, often reflected in “U-shaped behaviour”, which refers to three stages of linguistic use:

  • Stage 1: correct utterance,
  • Stage 2: deviant utterance,
  • Stage 3: correct utterance.

In a study of French L1 speakers learning English, Lightbown (1983) found that, when acquiring the English “ing” form, her subjects passed through the three stages of U-shaped behaviour.  Lightbown argued that as the learners, who initially were only presented with the present progressive, took on new information – the present simple – they had to adjust their ideas about the “ing” form.  For a while they were confused, and the use of “ing” became less frequent and less correct. Below is a diagram showing the same process for past tense forms:


McLaughlin suggested getting rid of the conscious/unconscious distinction because it wasn’t properly defined by Krashen, but in doing so he threw the baby out with the bathwater. Furthermore, we have to ask to what extent the terms “controlled processing” and “automatic processing” are any better; after all, measuring the length of time needed to perform a given task is a weak type of measure, and one that does little to solve the problem it raises.

Still, the “U-shaped” nature of staged development has been influential in successive attempts to explain interlanguage development, and we may note that McLaughlin was, with Bialystok, among the first scholars to apply the general concepts of computer-based information-processing models from cognitive psychology to SLA research.  Chomsky’s Minimalist Program confirms his commitment to the view that cognition consists in carrying out computations over mental representations.  Those adopting a connectionist view, though taking a different view of the mind and how it works, also use the same metaphor.  Indeed, the basic notion of “input – processing – output” has become an almost unchallenged account of how we think about and react to the world around us.  While in my opinion the metaphor can be extremely useful, it is worth making the obvious point that we are not computers.  One may well sympathise with Foucault and others who warn us of the blinding power of such metaphors.

Schmidt’s noticing hypothesis

Rather than accept McLaughlin’s advice to abandon the search for a definition of “consciousness”, Schmidt attempts to do away with its “terminological vagueness” by examining it in detail. His work has proved enormously influential, but I think there are serious problems with the “Noticing Hypothesis”, and that it has been widely misinterpreted in order to justify types of explicit instruction that are not actually supported by a more considered view of the evidence. I’ll deal with this in Part 4.

See Bibliography in Header for all references