The Enigma of the Interface

Trying to understand the process of SLA, some scholars have concentrated on the psychological aspects of learning. What goes on in the mind? How does a learner go from not knowing to knowing an L2? In this post, I discuss attempts to explain the psychological process of learning an L2 and the enigma of the interface between conscious and unconscious learning. Bill VanPatten said in his plenary at the BAAL 2018 conference that language teaching can only be effective if it comes from an understanding of how people learn languages. In my opinion, the question of the interface between implicit and explicit learning is vital to such an understanding; it has a direct bearing on the syllabuses, materials and assessment tools used in ELT programmes. If, as I’ll suggest, learning an L2 is predominantly an unconscious process which happens mostly when learners are focused on meaning, then current ELT syllabuses, materials and tests, which reject this conclusion, are bound to hinder rather than help efficacious teaching.  

Declarative and Procedural Knowledge

One way of attempting to explain L2 learning from a psychological perspective is to make a distinction between declarative and procedural knowledge. The argument goes as follows. Unlike the knowledge required to know about geography or human anatomy, for example, knowing how to use an L2 for communication relies on unconscious procedural knowledge: knowledge of how to use the language for communicative purposes. Conscious declarative knowledge about the language plays a very minor role. For all a learner’s declarative knowledge, if they lack the (procedural) knowledge needed to use the language in real time communicative events, they can do little with their declarative knowledge, except pass exams which test it. (This typical item from an English exam: “John ____ to Paris yesterday. A) goes; B) went; C) has gone” tests declarative knowledge in much the same way as this item from a geography exam: “Paris is the capital of A) Belgium; B) France; C) Spain”.) Stories abound of primary and secondary school students in English speaking countries who were taught French for many years, passed successive exams in French based on declarative knowledge about the language, but failed to put their knowledge to effective use when they visited Paris. Their declarative knowledge – their knowledge about French – didn’t help them much when it came to using French to get things done in France. They lacked procedural knowledge.

This is, of course, a theory, a tentative explanation of (among other things) the curious phenomenon of people who know a lot about an L2 but can’t put it to practical use, and it uses the theoretical constructs of declarative and procedural knowledge to provide the explanation. But, of course, these theoretical constructs need fleshing out. Which brings us to Krashen’s Monitor Model. Krashen argues that we learn an L2 in much the same way as we learn our L1s as infants, and he relies on Chomsky’s Universal Grammar (UG) theory to explain early language learning. Chomsky’s UG theory is difficult for non-specialists to understand, but in the simplest terms, UG theory claims that all languages share universal features, and they vary in terms of certain parameters. Grammaticality judgement experiments involving very large numbers of children over a period of more than forty years demonstrate that children know a great deal more about the languages they use than can be explained by looking at the language they have been exposed to (the Poverty of the Stimulus (PoS) argument). Chomsky argues that the best explanation for children’s extraordinary ability to use language by the time they’re 12 years old, and for the profound, intricate knowledge that underlies that ability, is that humans are hard-wired for language learning: it’s part of human nature. Innate knowledge about the general structure of language allows young children to “bootstrap” language encountered in the environment. They respond to input not as “tabulae rasae”, but rather as humans prepared for the job: input triggers the setting of parameters on the innate knowledge they already have of the deep structure of languages. I remain unconvinced by attempts in the last sixty years – chaos theories, connectionist models, emergentist theories – to answer the PoS argument, or to provide a better explanation than Chomsky’s, but let’s see.

Back to Krashen’s Monitor Model. Adults learn an L2 in much the same way as children learn languages, by exposure to “comprehensible input”, language that they are exposed to which they can broadly understand, even if there are unknown elements in it. They “acquire” language unconsciously, and what they learn consciously is metalinguistic knowledge – knowledge about the language – which is of extremely limited use. It’s perfectly possible to do without this metalinguistic knowledge, as the millions of people who settle in foreign countries and learn the new language without it attest. Here’s the model:

Given its insistence on the very limited role played by conscious learning, Krashen’s theory is an example of the “No interface” view – there is a clear difference between implicit (unconscious) learning and explicit (conscious) learning. These types of learning go on in different parts of the mind; they hardly affect each other; and implicit learning is what matters. Krashen’s theory was heavily criticised, notably by Gregg (1984) and McLaughlin (1987), who both highlighted the weak constructs in the model, leading to circularity. Furthermore, most scholars considered that, whatever its merits, the model was too dismissive of the role explicit learning plays in L2 learning.

Next comes VanPatten’s Input Processing (IP) theory, which attempts to explain how learners turn input into intake by parsing input during the act of comprehension while their primary attention is on meaning. VanPatten’s model consists of a set of principles (see my blog post on Processing Input for a list of these principles) which interact in working memory, taking account of the fact that working memory has very limited processing capacity. Content lexical items are searched out first, since words are the principal source of referential meaning. When content lexical items and a grammatical form both encode the same meaning and when both are present in an utterance, learners attend to the lexical item, not the grammatical form. Perhaps the most important construct in the IP model is “Communicative value”: the more a form has communicative value, the more likely it is to get processed and made available in the intake data for acquisition, and it’s thus the forms with no or little communicative value which are least likely to get processed and, without help, may never get acquired. In this theory, the processing is mostly unconscious, but explicit attention to some aspects of the L2 is seen as helpful, so, IMO, the IP theory belongs in the Very Weak Interface camp.

William O’Grady proposes a ‘general nativist’ theory of first and second language acquisition, which describes a modular acquisition device that does not include Universal Grammar. O’Grady says his work forms part of the emergentist rubric, but since he sees the acquisition device as a modular part of mind, he’s a long way from the real empiricists in the emergentist camp. Interestingly, O’Grady accepts that there are sensitive periods involved in language learning, and that problems adults face in L2 acquisition can be explained by the fact that adults have only partial access to the (non-UG) L1 acquisition device. O’Grady describes a different kind of processor, doing more general things, but it’s still a language processor and it’s still working not just on segments of voice streams, and words, but on syntax, and thus O’Grady conforms to the view that language is an abstract system governed by rules of syntax. Given O’Grady’s “partial access” view, I think his theory also belongs in the Very Weak Interface camp.

Swain’s (1985) famous study of French immersion programmes led to her claim that comprehensible input alone can allow learners to reach high levels of comprehension, but their proficiency and accuracy in production will lag behind, even after years of exposure. Further studies gave more support to this view, and to the opinion that comprehensible input is the necessary but not sufficient condition for proficiency in an L2. Swain’s argument is that we must give more attention to output.

And now we come to Schmidt’s view that in order to learn an L2, we need to “notice” formal features of the input. This enormously influential – and enormously misinterpreted – hypothesis lies at the heart of what the heading of this post calls “the enigma of the interface”.

First, we have to appreciate that Schmidt uses the word “noticing” in a very special, technical way – it’s a theoretical construct not to be confused with the dictionary definition. “Noticing” has subsequently been used, citing Schmidt, in ways that Schmidt roundly rejected. See my post on Schmidt for a brief outline of how he came to form the construct and note Truscott’s (2015) remarks:

Perhaps more disturbing are efforts to use noticing as a theoretical foundation for grammar instruction in general, without concern for whether any given grammar point is or is not a legitimate object of noticing for Schmidt (e.g. R. Ellis, 1993, 1994, 1995; Long & Robinson, 1998; Nassaji & Fotos, 2004). A genuine application of Schmidt’s concept of noticing would have to take this issue seriously. Given the mismatch between the typically broad aims of grammar instruction and the relatively narrow scope of Schmidt’s noticing, it is perhaps not surprising that pedagogical approaches tend to be applications of noticing in name only.

 When others use the term ‘noticing’ and cite Schmidt as its source, they are claiming – whether the claim is explicit or not – that their use rests on this work, when in fact it typically does not. The result is a widespread belief that research and pedagogy in the area of L2 learning are now being guided by a firmly established concept, rooted in extensive review and analysis of research and theory in psychology. But this appearance is an illusion. The reality, whatever one thinks of Schmidt’s noticing, is that most of the relevant work is guided by nothing more than a loose, intuitive notion that consciousness is somehow important.

To the issue, then. Schmidt asks: Is it possible to learn formal aspects of a second language that are not consciously noticed? His answer, at least in his (1990) original version of the hypothesis, is “No”. Schmidt points to disagreement on a definition of “intake”. While Krashen seems to equate intake with comprehensible input, Corder distinguishes between what is available for going in and what actually goes in, but neither Krashen nor Corder explains what part of input functions as intake for the learning of form. Schmidt also notes the distinction Slobin (1985) and Chaudron (1985) make between preliminary intake (the processes used to convert input into stored data that can later be used to construct language), and final intake (the processes used to organise stored data into linguistic systems). Schmidt proposes that all this confusion is resolved by defining intake as:

that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently. If noticed, it becomes intake (Schmidt, 1990: 139).


subliminal language learning is impossible, and … noticing is the necessary and sufficient condition for converting input into intake (Schmidt, 1990: 130).

I hesitantly give Rod Ellis’ model of Schmidt’s view (hesitantly because it’s been challenged by many):

This is, obviously, a Strong Interface view, the complete opposite of Krashen’s: conscious knowledge of the grammar of the L2 is a necessary and sufficient condition for L2 learning.

I’ve written various posts on Schmidt’s work (see, particularly, a series of 3 posts on “Encounters with Noticing”), so here I’ll give only a very quick summary of objections to it.

Schmidt’s original hypothesis claimed that input can’t get processed without being “noticed”, and therefore all L2 learning is conscious. This claim is either trivially true, by adopting circular definitions of ‘conscious’ and ‘learning’, or obviously false. The claim that L2 learning is a process that starts with input going through a necessary stage in short-term memory where “language features” are “noticed” is untenable. It amounts to the claim that all language features in the L2 shuffle through short-term memory and if unnoticed have to re-present themselves. But how can language features present themselves, even once? As Gregg said in a comment on this blog:

Noticing is a perceptual act; you can’t perceive what is not in the senses, so far as I know. Connections, relations, categories, meanings, essences, rules, principles, laws, etc. are not in the senses.

Schmidt can’t expect us to accept that our knowledge of language is the result of “noticing” things from the environment that are presented to the senses because, quite simply, aspects of a language’s grammar, for example, are not there to be “noticed”. Furthermore, Ellis’s figure suggests that the three constructs on the top row (“noticing”, “comparing” and “integrating”) are what turn input into output and explain IL development. But where is the noticing supposed to take place according to the figure? And what is “short/medium-term memory”?

In his 2010 paper, Schmidt confirms the concessions made in 2001, which amount to saying that ‘noticing’ is not needed for all L2 learning, but that the more you notice the more you learn. He also confirms that noticing does not refer to “noticing the gap”. However, the hypothesis remains unsatisfactory, for the following reasons:

1. The Noticing Hypothesis even in its weaker version doesn’t clearly describe the construct of ‘noticing’. There is no way to discern what is and isn’t “noticed”.

2. The empirical support claimed for the Noticing Hypothesis is not as strong as Schmidt (2010) claims.

3. A theory of SLA based on noticing a succession of forms faces the impassable obstacle that, as Schmidt (2010) seemed to finally admit, you can’t “notice” rules, or principles of grammar.

Gass: An Integrated Theory of SLA

Gass (1997), influenced by Schmidt, offers a more complete picture of what happens to input. She says it goes through stages of apperceived input, comprehended input, intake, integration, and output, thus subdividing Krashen’s comprehensible input into three stages: apperceived input, comprehended input, and intake. I don’t quite get “apperceived input”; Gass says it’s the result of attention, in a sense similar to Tomlin and Villa’s (1994) notion of orientation, and Schmidt says it’s the same as his noticing, which doesn’t help me much. In any case, once the intake has been worked on in working memory, Gass stresses the importance of negotiated interaction during input processing and eventual acquisition. I find the Gass model a rather unsatisfactory compilation of bits, but it suggests that L2 learning is predominantly a process of implicit learning and so takes a Weak interface stance.

Skills-based Theory

Skills-based theory also supports the Strong Interface view. It is usually based on John Anderson’s (1983) ‘Adaptive Control of Thought’ model (a general learning theory, not a theory of SLA), which makes the distinction described above between declarative knowledge – conscious knowledge of facts; and procedural knowledge – unconscious knowledge of how an activity is done. When applied to instructed second language learning, the model suggests that learners should first be presented with information about the L2 (declarative knowledge) and then they should practice using the information in various controlled and then more loosely controlled ways so that what they have consciously learned about the language is converted into unconscious knowledge of how to use the L2 (procedural knowledge). The learner moves from controlled to automatic processing, and through intensive linguistically focused rehearsal, achieves increasingly faster access to, and more fluent control over the L2 (see DeKeyser, 2007, for example).

The fact that nearly everybody successfully learns at least one language as a child without starting with declarative knowledge, and that millions of people learn additional languages without studying them (migrant workers, for example), challenges the claim that learning a language needs to begin with the imparting of declarative knowledge. Furthermore, the phenomenon of L1 transfer doesn’t fit well with a skills-based approach, and neither do putative critical periods for language learning. But the main reason for rejecting such an approach is that it contradicts SLA research findings related to interlanguage development.

Selinker (1972) introduced the construct of interlanguages to explain learners’ transitional versions of the L2. Studies show that interlanguages exhibit common patterns and features, and that learners pass through well-attested developmental sequences on their way to different end-state proficiency levels. Examples of such sequences are found in morpheme studies; the four-stage sequence for ESL negation; the six-stage sequence for English relative clauses; and the sequence of question formation in German (see Hong and Tarone, 2016, for a review). Regardless of the order or manner in which target-language structures are presented in coursebooks, learners analyse input and create their own interim grammars, slowly mastering the L2 in roughly the same manner and order. Interlanguage (IL) development of individual structures has very rarely been found to be linear. Accuracy in a given grammatical domain typically progresses in a zigzag fashion, with backsliding, occasional U-shaped behavior, over-suppliance and under-suppliance of target forms, flooding and bleeding of a grammatical domain (Huebner 1983), and considerable synchronic variation, volatility (Long 2003a), and diachronic variation. So the assumption that learners can move from zero knowledge to mastery of formal parts of the L2 one at a time and move on to the next item on a list is a fantasy. Explicit instruction in a particular structure can produce measurable learning. However, studies that have shown this have usually devoted far more extensive periods of time to intensive practice of the targeted feature than is available in a typical course. Also, the few studies that have followed students who receive such instruction over time (e.g., Lightbown 1983) have found that once the pedagogic focus shifts to new linguistic targets, learners revert to an earlier stage on the normal path to acquisition of the structure they had supposedly mastered in isolation and “ahead of schedule.”

Note that interlanguage development refers not just to grammar; pronunciation, vocabulary, formulaic chunks, collocations and sentence patterns are all part of the development process. To take just one example, U-shaped learning curves can be observed in learning the lexicon. Learners have to master the idiosyncratic nature of words, not just their canonical meaning. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. Only by passing through a period of incorrectness, in which the lexicon is used in a variety of ways, can they climb back up the U-shaped curve.

Interlanguage development takes place in line with what Corder (1967) referred to as the internal “learner syllabus”. Students don’t learn different bits of the L2 when and how a teacher might decide to deal with them, but only when they are developmentally ready to do so. As Pienemann demonstrates (e.g., Pienemann, 1987), learnability (i.e., what learners can process at any one time) determines teachability (i.e., what can be taught at any one time).


Emergentism is an umbrella term referring to a range of usage-based theories which are fast becoming the new paradigm for psycholinguistic research. The return to this more “empiricist” view involves a discussion of the philosophy of science which I won’t go into here, although I discuss it at length in my book on Theory Construction and SLA (Jordan, 2004). It’s complicated! Anyway, “connectionist” and associative learning views are based on the premise that language emerges from communicative use, and that the process of L2 learning does not require resorting to any putative “black box” in the mind to explain it. A leading spokesman for emergentism is Nick Ellis (e.g., 1998, 2002, 2019), who argues that language processing is “intimately tuned to input frequency”. This leads him to develop a usage-based theory which holds that “acquisition of language is exemplar based”.

The power law of practice is taken by Ellis as the underpinning for his frequency-based account. Ellis argues that “a huge collection of memories of previously experienced utterances”, rather than knowledge of abstract rules, is what underlies the fluent use of language. In short, emergentists take language learning to be “the gradual strengthening of associations between co-occurring elements of the language”, and they see fluent language performance as “the exploitation of this probabilistic knowledge” (Ellis, 2002: 173).

Ellis is committed to a Saussurean view, which sees “the linguistic sign” as a set of mappings between phonological forms and communicative intentions. He claims that

simple associative learning mechanisms operating in and across the human systems for perception, motor-action and cognition, as they are exposed to language data as part of a communicatively-rich human social environment by an organism eager to exploit the functionality of language are what drives the emergence of complex language representations.

My personal view, following Gregg (2003), is that combining observed frequency effects with the power law of practice, and thus explaining acquisition order by appealing to frequency in the input, doesn’t go very far in explaining the acquisition process itself. What role do frequency effects have? How do they interact with other aspects of the SLA process? In other words, we need to know how frequency effects fit into a theory of SLA, because frequency and the power law of practice in themselves don’t provide a sufficient theoretical framework, and neither does connectionism. As Gregg points out, “connectionism itself is not a theory; it is a method, and one that in principle is neutral as to the kind of theory to which it is applied” (Gregg, 2003: 55). Emergentism stands or falls on connectionist models and so far the results are disappointing. A theory that will explain the process by which nature and nurture, genes and the environment, interact without recourse to innate knowledge, remains “around the corner”, as Ellis admits.

So where do we put emergentist theories when it comes to the interface riddle? Without doubt, they belong in the Very Weak Interface camp. They argue that language learning is an essentially implicit process, and that the role of explicit learning is a minor one. For example, Nick Ellis uses a weak version of Schmidt’s Noticing hypothesis to argue that in Instructed SLA, drawing students’ attention to certain non-salient or infrequent parts of the input can “reset the dial” (the dial set by their L1s) and thus enable further implicit learning. I note that Mike Long agreed with this view, to which, in later work, he contributed. Mike’s untimely passing earlier this year prevented us from resolving our differences.

Carroll: Autonomous Induction

Finally we come to those who challenge the basis of the input -> processing -> output model, and in particular the Noticing hypothesis. Truscott and Sharwood Smith (2004) propose a MOGUL framework, and Carroll (2001) offers her Autonomous Induction theory. Both rely heavily on Jackendoff’s (1992) Representational Modularity Theory (see my post on Jackendoff’s place in Carroll’s theory). A few words on Carroll’s work.

Carroll challenges the basis of Krashen’s and subsequent scholars’ theories. She sees input as physical stimuli, and intake as a subset of these stimuli.

The view that input is comprehended speech is mistaken. Comprehending speech happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards! (Carroll, 2001, p. 78).

So, says Carroll, language learning requires the transformation of environmental stimuli into mental representations, and it’s these mental representations which must be the starting point for language learning. In order to understand speech, for example, properties of the acoustic signal have to be converted to intake; in other words, the auditory stimulus has to be converted into a mental representation.

“Intake from the speech signal is not input to learning mechanisms, rather it is input to speech parsers. … Parsers encode the signal in various representational formats” (Carroll, 2001, p. 10).

Carroll gives a detailed examination of what happens to environmental stimuli by appeal to Jackendoff’s theory, which I discuss in a post on his contribution to Carroll’s theory. I also discuss Carroll’s theory in a number of posts (use the Search box). Suffice it to say here that Carroll’s and Truscott & Sharwood Smith’s theories both agree, in their different ways, that L2 learning is predominantly a matter of implicit learning, and that they belong in the Very Weak Interface camp.


Either Krashen or Schmidt is right; at least one of them is wrong. Either Krashen or Nick Ellis is right; ditto. Either N. Ellis or Carroll is right; ditto. And on and on. However, when it comes to deciding on the interface enigma, only Schmidt and skills-based theory take a Strong Interface view. What we can conclude is that whether they base themselves on some version of innate knowledge at work in language learning, or they rely on simpler, more general learning mechanisms working on input from the environment, SLA scholars agree that learning an L2 depends mostly on implicit learning. The implication is that ELT based on following a synthetic syllabus, and thus giving prime place to teaching explicit knowledge about the language, leads to inefficacious classroom practices.


Changing Tack

Anderson’s new blog post The myth of a theory-practice gap in education makes the following argument:

1. When researchers and academics talk about a theory-practice gap in education, including language teaching, what they are usually referring to is a gap between their beliefs concerning how teachers should teach, and how teachers actually do teach.

2. Research on teacher cognition has established beyond reasonable doubt (e.g., Borg, 2006; Woods, 1996) that all teachers also have theories, either explicit, espousable ones, or the implicit “theories in use” that govern our actions (Argyris & Schön, 1974).

3. Thus, the notion of a theory-practice gap is a myth. There is no theory-practice gap, just a gap between the beliefs of practitioners in two very different communities of practice: academics and teachers.

This all looks very obvious, but what’s the point of it? Well, the point seems to be to reassure teachers that they shouldn’t worry if academics’ theories challenge their teaching practices. If this is the point, then I suggest that it’s wrong and, in any case, Anderson’s remarks do nothing to address the ongoing issue of what teachers can learn from SLA research. Anderson says that researchers and academics are usually referring to a gap between their beliefs concerning how teachers should teach, and how teachers actually do teach, but that’s not actually true; researchers say little about any such gap. To “prove” that the notion of a theory-practice gap is a myth by making it about teaching practice and then pointing to teachers’ own theories is an empty bit of rhetoric which fails to address what is, in fact, the very important gap between what academics know about language learning and what teachers know. Current ELT practice is dominated by the use of coursebooks which implement a synthetic syllabus where the L2 is treated as an object of study, and where the focus is on learning about the language. Such practices contradict the wide consensus among academics that learning an L2 is predominantly a matter of implicit learning; learning by doing, learning by engaging in meaningful use of the language. The clear implication is that current coursebook-driven ELT is inefficacious and that analytical syllabuses, such as those used in TBLT, Dogme and certain types of CLIL are more efficacious.

This is the issue that Anderson ignores. I have argued in many posts on this blog that ELT has become commodified because of the enormous commercial interests involved. These commercial interests, I suggest, explain why the gap between what most teachers know about the way people learn an L2 and what academics know is so wide. Teacher educators (who are often coursebook writers) have vested interests in coursebooks, teacher education programmes like CELTA and high stakes exams like IELTS. They are naturally biased against research findings which challenge the foundations of their approach to ELT, and they use the disagreements among academics to minimise the importance of research findings. Yet, despite all their disagreements, academics studying SLA agree on the fundamental principles which underlie L2 learning. While academics would not dare to tell teachers how they should implement a syllabus or make ongoing pedagogic decisions during a lesson, the core findings of SLA research which explain the very special process of learning an L2 can – and IMO should – be taken into account when designing courses, materials, and tests in ELT.

Anderson has previously published a number of articles which cite SLA research in attempts to defend current ELT practice. One such article is Anderson (2016) “Why practice makes perfect sense: the past, present and potential future of the PPP paradigm in language teacher education”. I replied to his article with a post, “Does PPP really make perfect sense?” Anderson here changes tack in his ongoing role as defender of the faith. His new message to teachers is: Never mind about the academics’ theories, use your own and just keep reciting this empty syllogism: academics have theories about education, teachers have theories about education, thus, the notion of a theory-practice gap is a myth.