Tom the Teacher Part 5

Stencil: Hi Tom. Your friendly DoS just checking in. We have a good Skype connection, for once.

Tom: Hi Stencil.

Stencil: Who’s in the picture behind you?

Tom: Rosa Luxemburg.

Stencil: One of your lefty heroes?

Tom: She was shot and thrown into a Berlin canal in 1919 for saying the wrong thing.

Stencil: Wow. What did she say?

Tom: She said “Today we can seriously set about destroying capitalism once and for all. Nay, more; not merely are we today in a position to perform this task, nor merely is its performance a duty toward the proletariat, but our solution offers the only means of saving human society from destruction”.

Stencil: What? Well, anyway, how did the Zoom session go?

Tom: OK.  

Stencil: Did everybody turn up?

Tom: At one point there were eight students, plus kids, servants, cats and dogs.

Stencil: Not bad. First impressions?

Tom: Cats really don’t give a shit, do they?

Stencil: I mean the lesson. How did it go? Did you manage to get thru the whole Unit?

Tom: Unit?

Stencil: Unit 4, Watchwords; Not Quite Intermediate, you know. We’re nearly half way thru the course now, Tom, and the second test looms.  

Tom: Well actually, the dog ate my copy of the book yesterday.

Stencil: You have a dog?

Tom: The neighbour’s dog.

Stencil: Jeez. So what did you do?

Tom: We just sort of shot the breeze, you know. Living in lockdown, new stuff on Netflix, …

Stencil: You shot the breeze? You’re supposed to do reported speech in this unit.

Tom: Sorry?

Stencil: Reported speech, Tom, for pity’s sake. A key component in the course. I mean, you know how many of our students want to get jobs in admin.

Tom: He told me that if I turned left, right, then left again I would find the coffee machine?  

Stencil: That’s not in the text. It’s ‘Hi! My name is Boonsri. I’m from Bangkok. She told me that her name was Boonsri and that she came from Bangkok’. Important to positively bias female gender, you know, and foreign places.    

Tom:  Right.

Stencil: Any technical problems?

Tom: We didn’t really get the hang of the mute button, so there was a lot of kind of extraneous stuff. I heard somebody say “¡Joder! ¿Dónde está mi cripy, tío?” (“Fuck! Where’s my weed, man?”)

Stencil: Great! That’s a really good opportunity for some negotiation of meaning work, Tom. No problem with a bit of L1 in the mix, right? Step in when that happens, reformulate, recast, clarification routines, you know?

Tom: Yeah. I think she found her stash in the sofa.

Stencil: Right. Did the video recording from Unit 4 go OK?

Tom: I tried to play it but the share screen showed rather personal bits of my emails.

Stencil:  And?

Tom: Montse asked “What does ‘cottaging’ mean?”

Stencil: Great! Perfect opportunity to use the 3-minute ‘Expand vocabulary and collocations’ slot. Charming cottage; typical semi-detached house, nice flat, neat loft, yeah? And how did the break out groups go?

Tom: I’m not sure. What are break out groups?

Stencil: Oh come on Tom. Break out groups. Group work. After the Marker Sentence ‘She told me that her name was Boonsri’, you put them in groups, they ask “What’s your name? Where are you from?” and then report their replies to each other. It’s all in the Teacher’s book, you just have to adapt it to the online context, man. Even allow a bit of free practice while you take notes, right?

Tom: Right.

Stencil: Well anyway, we’re all meeting up for a teachers’ ‘How’s it going?’ webinar tomorrow, 3 am Eastern States time, and we can take this further. We want to examine incorporating non-native teacher awareness and social distancing into the supplementary online materials. OK?

Tom: Sorry, Stencil, my Mum’s on the phone.

Stencil: Tom? Can you hear me, Tom?


Last September, Neil McMillan and his lovely wife and daughter spent the weekend with us at our house. The house is described by Dellar as “a huge, decadent, sprawling mansion bought by Geoff’s billionaire wife for a song from gullible locals, full of rare paintings by Spanish artists like Rubens and Caravaggio, looked after by scores of starving slaves smuggled in from Brixton, where many of my friends live, yeah?”. It is, in fact, a restored ‘casa rustica’, which offers guests a comfortable bed and a shower down the corridor. Neil gets a bedroom with an en suite bathroom, but that’s because he sweats a lot.

Anyway, during the weekend, Neil and I had the chance to talk into the wee hours (as he insists on calling them) about my favourite subject: the life of the profane. The profane are outsiders: those who, whatever tribe, clan, religious grouping or nation they’re supposed to be part of, just don’t “get” what belonging means. One important part of their dislocation is their relationship with the inanimate world. Somehow, not getting the nuances of social norms, not respecting them because they don’t make sense (a far cry from those dedicated rebels who early on reject them) has a profound effect on how they walk through life. Existential literature concentrates on “existence before essence” – we make ourselves up as we go along – but it doesn’t pay enough attention to existence when it comes to dealing with the “things” that we interact with. I’ve always been fascinated by the way some people waltz through life, effortlessly engaging with the inanimate world: walking down stairs hardly bothering with the banisters; nonchalantly catching the piece of toast that pops up from the toaster; pushing, never pulling, the doors that need pushing; stepping adroitly onto buses and out of taxis; slotting their credit cards the right way up into the right slot; pressing the right buttons in lifts; and all that and all that. Generally, they stroll along unaware of obstacles: they automatically turn the right key in the right way in the right lock, so to speak.

Compare that to the life of the profane: those whose lives are marked by exactly the opposite experience of daily life. It’s not just a question of being clumsy, it’s that the inanimate world seems to conspire against them. An extreme example is Inspector Jacques Clouseau, he of the Pink Panther films. When Clouseau walks down the stairs, his sleeve gets caught in the banister; when he tries to catch the toast, he misses; when he uses a cigarette lighter he sets himself on fire; when he pushes the door, it smacks him in the face, and on and on. He turns the wrong key in the wrong way in the wrong lock. The inanimate world is out to get him: he’s the constant ‘fall guy’, the victim, the unfairly done to, one might say.

Another good example is Benny Profane, the hero of Thomas Pynchon’s novel “V”. He’s not called “Profane” for nothing (Pynchon is never in want of a name for his characters): he’s called Profane because he’s not on the inside, he’s not in the know, he’s his own hopeless, honest self, not finely-tuned enough to the way society works. So he’s the perfect vehicle to walk through Pynchon’s marvellous novel; who better to stumble through everything that happens, an innocent non-protagonist if ever there was one. And an essential feature of his character is his constant bumping up against the inanimate world as if it were hostile, though no silly conspiracy theories are ever invoked. The inanimate world is constantly waiting to play trivial or life-threatening tricks on him: lamp posts are badly placed; well-made beds collapse; phones don’t work; buses aren’t there when they should be; street signs point the wrong way; numbers on houses are out of synch. A great scene in “V” is when Benny, standing in an empty street, annoyed at something, kicks the wheel of a car. “I’ll pay for that”, he says to himself.

Profanity is described in dictionaries as ‘bad language’, but its etymology goes back to ‘lack of respect for things that are held to be sacred’. And there’s the clue. Profanity, the thing that Neil and I wanted to discuss that night, is better described as dislocation, an inability to “get” what this respect for sacred things is all about. Never articulated, it stems from an inability to come to terms with the way things are. Why does the social world we live in pretend to respect so many things that it so obviously flouts? Why is our society so hypocritical? Why does a third of the world’s population live in such horrendous conditions? Why … well, you get the idea – although Inspector Clouseau and Benny don’t.

Of course, in any political sense, the vast majority of the world’s population is profane – outside the fold – and that, no doubt, should be the focus of our attention. In psychological terms, Neil’s heroes – Foucault, Derrida, and Lacan particularly – insist on profanity (the rejection of respect) when examining how individuals experience their lives emotionally and intellectually. Lacan returns to Freud, but famously does something which strikes me as similar to what Marx did to Hegel. (I know a bit about what Marx did to Hegel, but I know as much about Lacan as Neil has forgotten while eating a deep-fried Mars bar, so it’s probably all bollocks, and I hope he’ll reply.) Lacan’s Mirror stage claims that the ego is an object rather than a subject: the ego is not the seat of a free, true “I” determining its own fate; rather, it’s neurotic to its very core. The ego is something exterior – crystallized as “the desire of the Other”. Amen to that.

I take this to be one of many theories of alienation – which have in common that we are, as it were, beside ourselves, lacking authenticity. My favourite attempt among philosophers to “explain” this has always been Camus’; the least philosophically sophisticated, the most appealing somehow (a bit like Krashen’s theory of SLA, maybe!). Alienation is our biggest problem, and to get over it, we need to live in societies described best by anarchists, which means we need a revolution which overturns the State.

Meanwhile, what about the particular manifestation of alienation that Neil and I were talking about, that profane, awkward, annoying bumping up against the inanimate world? How can we negotiate the inanimate world more smoothly? How can we avoid so many infuriating encounters with the stuff around us? How can we avoid our sleeves getting snagged on banisters? How can we nonchalantly walk through those revolving doors? How can we turn the right key the right way in the right lock?  Only revolution will do it. We can’t be who we want to be while capitalism rules us. But maybe we can learn from Eastern stuff – Zen and all that. The Beatles’ “Let it be” is probably the most stupid song ever sung, but Zen and Taoist texts are full of good advice.  I think of things like “Saunter along and stop worrying”… “If the mind troubles the mind how can you avoid a great confusion”, which I’m sure are misquotes. They suggest that we can alter our behaviour, put the right key in the right door because we don’t care or something. And maybe, just maybe, challenge Lacan’s view of us.    

I hope my chum Neil will respond.       

Carroll’s AIT: Part 5

I’m aware that I haven’t done a good job of describing Carroll’s AIT. Last week, I bought a Kindle version of Sharwood Smith and Truscott’s (2014) The Multilingual Mind (currently on offer for 19 euros – hurry, hurry), which presents their MOGUL (Modular On Line Growth and Use of Language) theory, and I’m very impressed with its coherence, cohesion and readability. It relies partly on Jackendoff, and briefly describes Carroll’s AIT much more succinctly than I’ve managed. I highly recommend the book.

I’ll continue examining bits of AIT and its implications, before trying to make sense of the whole thing and reflecting on some of the disagreements among those working in the field of SLA. In this post, I’ll look at Carroll’s AIT in order to question how many SLA theories use the constructs of input, intake, and noticing.

Recall that Jackendoff’s system comprises two types of processor:

  • integrative processors, which build complex structures from the input they receive, and
  • interface processors, which relate the workings of adjacent modules in a chain.

The chain consists of three main links: phonological, syntactic, and conceptual/semantic structure, each level having an integrative processor connected to the adjacent level by means of an interface processor.

Carroll takes Jackendoff’s theory and argues that input must be seen in conjunction with a theory of language processing: input is the representation that is received by one processor in the chain.


Thus, Carroll argues, the view of ‘input from outside’ is mistaken: input is a multiple phenomenon where each processor has its own input, which is why Carroll uses the term ‘environmental stimuli’ for what is standardly called ‘input’. Stimuli only become input as the result of processing, and learning is a function not of the input itself, but rather of what the system does with the stimuli. In order to explain SLA, we must explain this system. Carroll’s criticism is that the construct ‘input’ is used in many theories of SLA as a cover term which hides an extremely complex process, beginning with the processing of acoustic (and visual) events as detected by the learner’s sensory processing mechanisms.
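Carroll’s point that ‘input’ is relative to a processor can be made concrete with a deliberately crude sketch in Python. The three functions below are hypothetical stand-ins for Jackendoff’s phonological, syntactic and conceptual levels, the representations they build are toys, and the chain runs strictly in one direction, which simplifies Jackendoff’s architecture. What the sketch shows is that only the first processor receives anything like the environmental stimulus; every later processor’s input is the output of the processor before it.

```python
# Toy illustration: each processor's "input" is whatever the previous one outputs.
# The processors are hypothetical stand-ins, not models of real linguistic levels.

def phonological(stimulus):
    """Segment a raw 'acoustic event' (here just a string) into word forms."""
    return stimulus.lower().split()


def syntactic(words):
    """Build a crude structure: last word as head, the others as dependents."""
    return {"head": words[-1], "dependents": words[:-1]}


def conceptual(structure):
    """Map the structure onto a predicate-like meaning representation."""
    return f"{structure['head']}({', '.join(structure['dependents'])})"


CHAIN = [
    ("phonological", phonological),
    ("syntactic", syntactic),
    ("conceptual", conceptual),
]


def run_chain(stimulus):
    """Pass a stimulus along the chain, recording what each processor received."""
    inputs = {}
    representation = stimulus
    for name, processor in CHAIN:
        inputs[name] = representation                # what this processor receives
        representation = processor(representation)  # its output feeds the next one
    return inputs, representation


inputs, meaning = run_chain("Dogs bark")
for level, received in inputs.items():
    print(f"{level} input: {received!r}")
print(f"final output: {meaning!r}")
```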

Carroll’s view is summarised by Sharwood Smith and Truscott (2014, p. 212) as follows:

The learner initially parses an L2 using L1 parsing procedures and when this inevitably leads to failure, acquisition mechanisms are triggered and i-learning begins. New parsing procedures for L2 are created and compete with L1 procedures and only win out when their activation threshold has become sufficiently low. These new inferential procedures, adapted from proposals by Holland et al. (1986), are created within the constraints imposed by the particular level at which failure has taken place. This means that a failure to parse in PS [Phonological Structures], for example, will trigger i-learning that works with PS representations that are currently active in the parse and it does so entirely in terms of innately given PS representations and constraints, hence the ‘autonomous’ characterisation of AIT (Holland et al. 1986, Carroll 2001: 241–2).

Of course, the same process described for PS is applied to each level of processing.
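The core of the mechanism summarised above (parse with existing procedures, treat failure as the trigger for i-learning, let new and old procedures compete) can be stylised as a toy. This is not Carroll’s formalism or Holland et al.’s induction model: the `Procedure` class, the single level of processing, the starting thresholds and the halving of a threshold after each successful parse are all illustrative assumptions. ‘Winning out’ is modelled simply as being the first procedure activated, which happens once the new procedure’s threshold drops below the entrenched L1 procedure’s.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    """A hypothetical parsing procedure at one level of processing (e.g. PS)."""
    name: str
    known: set        # stand-in for whatever this procedure can successfully parse
    threshold: float  # activation threshold: the lower it is, the sooner it fires

    def parse(self, stimulus):
        """Return a representation of the stimulus, or None if the parse fails."""
        return f"{self.name}:{stimulus}" if stimulus in self.known else None


def process(stimulus, procedures, rate=0.5):
    """Let procedures compete for a stimulus, lowest threshold first.

    Returns (name of the procedure activated first, representation or None).
    A successful parse lowers that procedure's threshold; if every procedure
    fails, i-learning adds a new L2 procedure with a high threshold.
    """
    ranked = sorted(procedures, key=lambda p: p.threshold)
    first = ranked[0].name if ranked else None
    for proc in ranked:
        representation = proc.parse(stimulus)
        if representation is not None:
            proc.threshold *= rate  # each successful use makes activation easier
            return first, representation
    # Parse failure triggers i-learning: a new procedure enters the competition.
    procedures.append(Procedure("L2", {stimulus}, threshold=1.0))
    return first, None


# An entrenched L1 procedure meets an L2 stimulus it cannot parse.
procs = [Procedure("L1", {"casa", "perro"}, threshold=0.2)]
for exposure in range(1, 6):
    print(exposure, process("dog", procs))
```

With these arbitrary numbers, the L1 procedure is activated first for the first four exposures, and the new procedure only wins out on the fifth.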

Both parsing, which takes place in working memory, and i-learning, which draws on long-term memory, depend, to some extent, on innate knowledge of the sort described by Jackendoff, where the lexicon plays a key role, incorporating the rules of grammar and encoding relationships among words and among grammatical patterns.

Jackendoff’s theory supposes that just about everything going on in the language faculty happens unconsciously – it’s all implicit learning – and Carroll’s use of Holland et al.’s theory of induction is similarly based on implicit learning.     

So what does all this say about other processing theories of SLA?

Here’s Krashen’s model:

Any comprehensible input that passes through the affective filter gets processed by UG and becomes acquired knowledge. Learnt knowledge acts as a monitor. Despite its tremendous appeal, the model is unsatisfactory because none of these stages is described by clear constructs, and the theory is hopelessly circular. McLaughlin (1987) and Gregg (1984) provide the best critiques of this model.

Then we have Schmidt’s Noticing Hypothesis:

Here again, input is never carefully defined – it’s just the language that the learner hears or reads. What’s important, for Schmidt, is “noticing”. This is a gateway to “intake”, defined as that part of the input consciously registered and worked on by the learner in “short/medium-term memory” (I take this to be working memory), and which then gets integrated into long-term memory, where it develops the interlanguage of the learner. So noticing is the necessary and sufficient condition for L2 learning.

I’ve done several posts on Schmidt’s Noticing Hypothesis (search for them in the Search bar on the right), so here let me just say that it’s now generally accepted that ‘noticing’, in the sense of conscious attention to form, is not a necessary condition for learning an L2: the hypothesis is false. The amended, much weaker version, namely that “the more you notice the more you learn”, is a way of rescuing the hypothesis, and has been, in my opinion, too quickly accepted by SLA scholars. I’m personally not convinced that even this weak version can be accepted; it surely needs careful refinement. In any case, Schmidt’s model makes the same mistake as Krashen’s: in starting with an undefined construct of input, it puts the cart before the horse. (Note that this has nothing to do with Scott Thornbury’s judgement on UG, as discussed in an earlier post.) As Carroll says, we must start with stimuli, not input, and then explain how those stimuli are processed.

Finally, there’s Gass’s (1997) model, which offers a more complete picture of what happens to ‘input’.

Gass says that input goes through the stages of apperceived input, comprehended input, intake, integration, and output, thus subdividing Krashen’s comprehensible input into three stages: apperceived input, comprehended input, and intake. Gass stresses the importance of negotiated interaction in facilitating the progress from apperceived input to comprehended input, adopting Long’s construct of negotiation for meaning, which refers to what learners do when there’s a failure in communicative interaction. As a result of this negotiation, learners get more “usable input”, they give “attention” (of some sort) to problematic features in the L2, and they make mental comparisons between their IL and the L2, which lead to refinement of their current interlanguage.

But still, what is ‘apperceived input’? Gass says it’s the result of ‘attention’, akin to Tomlin and Villa’s (1994) construct of ‘orientation’; and Schmidt says it’s the same as his construct of ‘noticing’. So is it a conscious process, then, taking place in working memory? Just to finish the story, Long, in Part 4 of this exploration of AIT, says this:

Genuinely communicative L2 interaction provides opportunities for learners focused on meaning to pick up a new language incidentally, as an unintended by-product of doing something else — communicating through the new language — and often also implicitly, i.e., without awareness. Interacting in the L2 while focused on what is said, learners sometimes perceive new forms or form-meaning-function associations consciously — a process referred to as noticing (Schmidt, 1990, 2010). On other occasions, their focus on communication and attention to the task at hand is such that they will perceive new items in the input unconsciously — a process known as detection (Tomlin & Villa, 1994). Detection is especially important, for as Whong, Gil, & Marsden (2014) point out, implicit learning and (barring some sort of consciousness-raising event) the end-product, implicit knowledge, is what is required for real-time listening and speaking.

Long makes a distinction between ‘implicit’ and ‘incidental’ learning. ‘Implicit’ means unconscious, while ‘incidental’, I think, means conscious, and refers to Schmidt’s ‘noticing’. Long then says that in ‘implicit’ learning “learners perceive new items in the input unconsciously”, through “a process known as detection (Tomlin & Villa, 1994)”, and that detection is “especially important”, because “implicit knowledge is what is required for real-time listening and speaking”.

So here we have yet another construct: ‘detection’. ‘Detection’ is the final part of Tomlin and Villa’s (1994) three-part process of ‘attention’. Note first that they claim that ‘awareness’ (defined as “the subjective experience of any cognitive or external stimulus” (p. 194), and the crucial part of Schmidt’s ‘noticing’ construct) can be dissociated from attention, and that awareness is not required for attention. With regard to attention, three functions are involved: alertness, orientation, and detection.

Alertness = an overall, general readiness to deal with incoming stimuli or data.

Orientation = the attentional process responsible for directing attentional resources to some type or class of sensory information to the exclusion of others. When attention is directed to a particular piece of information, detection is facilitated.

Detection = “the cognitive registration of sensory stimuli” and is “the process that selects, or engages, a particular and specific bit of information” (Tomlin & Villa, 1994, p. 192). Detection is responsible for intake of L2 input: detected information gets further processing.

Gass claims that “apperceived input” is conscious, the same as Tomlin and Villa’s ‘orientation’, but is Gass’s second stage, ‘comprehended input’, the same as Tomlin and Villa’s ‘detection’? Well, perhaps. ‘Comprehended input’ is “potential intake”: it’s information which has the possibility of being matched against existing stored knowledge, ready for the later stage of ‘integration’, where the new information can be used for confirmation or reformulation of existing hypotheses. However, if detection is unconscious, then comprehended input is also unconscious; yet Gass insists that comprehended input is partly the result of negotiation of meaning, which, Long insists, involves not just detection but also noticing.

This breaking down of the construct of attention into more precise parts is supposed to refine Schmidt’s work. Schmidt starts with the problem of conscious versus unconscious learning, and breaks ‘consciousness’ down into three parts: consciousness as awareness; consciousness as intention; and consciousness as knowledge. As to awareness, Schmidt distinguishes between three levels: Perception, Noticing and Understanding, and the second level, ‘noticing’, is the key to Schmidt’s eventual hypothesis. Noticing is focal awareness. To the problem of how ‘input’ becomes ‘intake’, Schmidt’s answer is crystal clear, at least in its initial formulation: ‘intake’ is “that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently. If noticed, it becomes intake” (Schmidt, 1990, p. 139). And the hypothesis that ‘noticing’ drives L2 learning is plain wrong.

Tomlin and Villa want to recast Schmidt’s construct of noticing as “detection within selective attention”.

Acquisition requires detection, but such detection does not require awareness. Awareness plays a potential support role for detection, helping to set up the circumstances for detection, but it does not directly lead to detection itself. In the same vein, the notion of attention to form articulated by VanPatten (1989, 1990, in press) seems very much akin to the notion of orientation in the attentional literature; that is, the learner may bias attentional resources to linguistic form, increasing the likelihood of detecting formal distinctions but perhaps at the cost of failing to detect other components of input utterances. Finally, input enhancement in instruction, the bringing to awareness of critical form distinctions, may represent one way of heightening the chances of detection. Meta-descriptions of linguistic form may help orient the learner to salient formal distinctions. Input flooding may increase the chances of detection by increasing the opportunities for it (Tomlin and Villa, 1994, p. 199).

Well, as far as forming a coherent part of a theory of SLA is concerned, I don’t think Tomlin and Villa’s treatment of attention stands up to scrutiny, for all sorts of reasons, many of them teased out by Carroll. Nevertheless, the motivation for this detailed attempt to understand attention, apart from carrying on the work of refining processing theory, is clearly revealed in the above quote: what’s being proposed is that L2 learning is mostly implicit, but that this implicit learning needs to be supplemented by occasional, crucial, conscious attention to form, which triggers ‘orientation’ and enables ‘detection’. An obvious payoff is more efficacious teaching! And that, I think, is at the heart of Mike Long’s view – and of Nick Ellis’s, too.

But it doesn’t do what Carroll (2001, p. 39) insists a theory of SLA should do, namely provide:

  1. a theory of linguistic knowledge;
  2. a theory of knowledge reconstruction;
  3. a theory of linguistic processing;
  4. a theory of learning.

When I look (yet again!) at Chapter Three of Long’s (2015) book SLA & TBLT, I find that his eloquently described “Cognitive-Interactionist Theory of SLA” relies on carefully selected “Problems and Explanations”. Its prime concern is “Instructed SLA”, and it revolves around the problem of why most adult L2 learning is “largely unsuccessful”. It’s not an attempt to construct a full theory of SLA, and I’m quite sure that Long knew exactly what he was doing when he confined himself to articulating his four problems and eight explanations. Maybe this also explains Mike’s comment in the recent post here:

I side with Nick Ellis and the UB (not UG) hordes. Since learning a new language is far too large and too complex a task to be handled explicitly, and although it requires more input and time, implicit learning remains the default learning mechanism for adults.

This looks to me like a good indication of the way things are going.

“Do you fancy a bit more wine?” my wife asks, proffering a bottle of chilled 2019 Viña Esmeralda (a cheeky, very fruity wine; always get the most recent year). “Is the Pope a Catholic?” says I, wishing I had Neil McMillan’s ready ability to come up with a more amusing quote from Pynchon.      


Carroll, S. (1999) Putting ‘input’ in its proper place. Second Language Research 15(4), 337–388.

Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Gass, S. (1997) Input, Interaction and the Second Language Learner. Mahwah, NJ: Lawrence Erlbaum Associates.

Gregg, K. R. (1984) Krashen’s monitor and Occam’s razor. Applied Linguistics 5, 79-100.

Krashen, S. (1985) The Input Hypothesis: Issues and Implications. New York: Longman.

Long, M. H. (2015). Second language acquisition and Task-Based Language Teaching. Oxford: Wiley-Blackwell.

McLaughlin, B. (1987) Theories of Second Language Learning. London: Edward Arnold.

Schmidt, R. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129-58.

Schmidt, R. (2001) Attention. In Robinson, P. (ed.) Cognition and Second Language Instruction. Cambridge: Cambridge University Press, 3-32.

Sharwood Smith, M., & Truscott, J. (2014). The Multilingual Mind: A Modular Processing Perspective. Cambridge: Cambridge University Press.

Tomlin, R., & Villa, H. (1994). Attention in cognitive science and second language acquisition. Studies in Second Language Acquisition, 16, 183-203.

Scott Thornbury on UG and UB Theories of SLA

This is a parenthesis. I’d planned to devote Part 4 of ‘Carroll’s AIT’ to assessing how successfully it bridges the gap between nativist and emergentist views of SLA, but I’ll pause here to answer Scott Thornbury’s question on Twitter (18th May), which amounts to asking why a theory of SLA has to start by describing what it wants to explain.

Really! I mean, you’d have thought a bright, well-read chap like Scott already knew, right? But he doesn’t, and the problem is that he’s a very influential voice in ELT, with a potentially damaging influence on his legions of followers, some of whom might not detect when – just now and then – he talks baloney. Chomsky’s work seems to be a particular blind spot for Scott; he has a very bad record indeed when it comes to writing and talking about Universal Grammar. I’m told by my friend Neil that he’s not much better when it comes to appreciating Lacan’s paradigm-shifting critique of Breton’s Surrealist Manifestos, either (see McMillan, Mirroring Despair Among the ELT Ungodly, in press). Nemo Sine Vitiis Est, eh what.

The Question

In Part 3, summarising the story so far, I said “A theory of SLA must start from a theory of grammar”. Soon afterwards, Scott Thornbury tweeted from that highest, silliest horse of his, which, thankfully, he reserves for his discussions of Chomsky:

“A theory of SLA must start from a theory of grammar.” Why? Who said? I’d argue the reverse ‘A theory of grammar must start from a theory of [S/F]LA’. Grammar is the way it is because of the way languages are learned and used.

Other gems in his tweets included:

The theory comes later. Induction, dear boy. Empirical linguistics, by another name.

Chomsky’s error was to start with a theory of grammar and then extrapolate a theory of language acquisition to fit. Cart before horse.

“The theory of UG…is an innate property of the human mind. In principle, we should be able to account for it in terms of human biology.” In principle, maybe. In practice, we can’t. That’s cart-before-horsism.

‘UG a powerful theory?’ Based on made-up and often improbable data? Incapable of explaining variability & change? Creationism is a powerful theory, too – if you ignore the fossil evidence.

Luckily, I was sitting in a comfortable chair, joyfully making my way through a crate of chilled, cheeky young Penedès rosé wines (delivered to me by an illegal supplier using drones “lent” to him by Catalan extremists intent on breaking the Madrid-inspired lockdown), when I read these extraordinary remarks. Otherwise, I might have stopped breathing. Anyway, let’s try to answer Scott’s tweets.

White (1996: 85) points out:

A theory of language acquisition depends on a theory of language. We cannot decide how something is acquired without having an idea of what that something is.

Carroll (2001) and Gregg (1993) agree: a theory of SLA has to consist of two parts:

1) the explanandum: a property theory which describes WHAT is learned. It describes what the learner knows; what makes up a learner’s mental grammar. It consists of various classifications of the components of the language, the L2, and how they work together.

2) the explanans: a transition theory which explains HOW that knowledge is learned.

Property Theories

So WHAT is human language? I’ve dealt with this in a previous post, so suffice it to say here that it’s a system of form-meaning mappings, where meaningless elementary structures of language combine to make meaning: the sounds of a language combine to make words, and the words combine to make sentences. There are various property theories, but let’s focus on just two: UG and Construction Grammar, the second being the description of language offered by emergentists, who take a usage-based (UB) approach to a theory of SLA.


Chomsky’s model of language distinguishes between competence and performance; between the description of underlying knowledge, and the use of language. Chomsky refers to the underlying knowledge of language which is acquired as “I-Language”, and distinguishes it from “E-Language”, which is everyday speech – performance data of the sort you get from a corpus of oral texts.

“I-Language” obeys rules of Universal Grammar, among which are structure dependency, C-command and government theory, and binding theory. These are among the principles of UG, and they operate with certain open parameters which are fixed as the result of input to the learner. As the parameters are fixed, the core grammar is established. The principles are universal properties of syntax which constrain learners’ grammars, while parameters account for cross-linguistic syntactic variation, and parameter setting leads to the construction of a core grammar where all relevant UG principles are instantiated.

UB Construction Grammar

The basic units of language representation are Constructions, which are form-meaning mappings. They are symbolic: their defining properties of morphological, syntactic, and lexical form are associated with particular semantic, pragmatic, and discourse functions. Constructions comprise concrete and particular items (as in words and idioms), more abstract classes of items (as in word classes and abstract constructions), or complex combinations of concrete and abstract pieces of language (as in mixed constructions). Constructions may be simultaneously represented and stored in multiple forms, at various levels of abstraction (e.g., concrete item: table+s = tables and [Noun] + (morpheme +s) = plural things). Linguistic constructions (such as the caused motion construction, X causes Y to move Z path/loc [Subj V Obj Obl]) can thus be meaningful linguistic symbols in their own right, existing independently of particular verbs. Nevertheless, constructions and the particular verb tokens that occupy them resonate together, and grammar and lexis are inseparable (Ellis and Cadierno, 2009).

Transition theories  

So HOW do we learn an L2?


UG offers no L2 transition theory, “y punto”, as they say in Spanish. UG says that all human beings are born with an innate grammar – a fixed set of mental rules that enables children to create and utter sentences they have never heard before. Thus, language learning is facilitated by innate knowledge of a set of abstract principles that characterise the core grammars of all natural languages. This knowledge constrains possible grammar formation in such a way that children do not have to learn the universal features of the particular language they are exposed to, because they know them already. This “bootstrapping” device, sometimes referred to as the “Language Acquisition Device” (LAD), is the best explanation of how children know so much more about their L1 than can be got from the language they are exposed to. The ‘poverty of the stimulus argument’ is summed up by White:

Despite the fact that certain properties of language are not explicit in the input, native speakers end up with a complex grammar that goes far beyond the input, resulting in knowledge of grammaticality, ungrammaticality, ambiguity, paraphrase relations, and various subtle and complex phenomena, suggesting that universal principles must mediate acquisition and shape knowledge of language (White 1989: 37).

Note that this refers to L1 acquisition. Those who take a UG view of SLA concentrate on the re-setting of parameters to partly explain how the L2 is learnt.  

Just by the way, I suppose I should deal with Scott’s accusations that UG is “based on made-up and often improbable data” and “incapable of explaining variability & change”. First, the data pertaining to UG are contained in more than 60 years of research studies, hundreds of thousands of them, whose results scholars (including UB theorists such as Nick Ellis, Tomasello and Larsen-Freeman) acknowledge have contributed more to the advancement of science than the work motivated by any other linguist in history. Second, it is hardly surprising that UG is incapable of explaining variability and change, since it doesn’t attempt to, any more than Darwin’s theory attempts to explain tsunamis. To coin a phrase: it’s a question of domains, dear boy.


The usage-based theory of language learning is based on associative learning: “acquisition of language is exemplar based” (Ellis, 2002: 143). “A huge collection of memories of previously experienced utterances” underlies the fluent use of language. Thus, language learning is seen as “the gradual strengthening of associations between co-occurring elements of the language”, and fluent language performance as “the exploitation of this probabilistic knowledge” (Ellis, 2002: 173).
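To make “the gradual strengthening of associations between co-occurring elements” a bit more concrete, here is a toy Python sketch. It is my own illustration, not Ellis’s model: it assumes that an ‘association’ is simply a count of how often one word immediately follows another, and that ‘probabilistic knowledge’ is the conditional probability that results. The names `learn_associations` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def learn_associations(utterances):
    """Count adjacent word pairs; every exposure strengthens an association by 1."""
    strengths = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        for current, following in zip(words, words[1:]):
            strengths[current][following] += 1
    return strengths


def predict_next(strengths, word):
    """Return the most strongly associated next word and its conditional probability."""
    followers = strengths.get(word.lower())
    if not followers:
        return None, 0.0  # no stored exemplars, so nothing to go on
    best, count = followers.most_common(1)[0]
    return best, count / sum(followers.values())


corpus = [
    "she told me that her name was Boonsri",
    "he told me that he came from Bangkok",
    "she told him a story",
]
strengths = learn_associations(corpus)
print(predict_next(strengths, "told"))
print(predict_next(strengths, "zoop"))
```

Here “told” is followed by “me” twice and by “him” once, so the sketch predicts “me” with a probability of 2/3, while a word it has never met gets no prediction at all. That second case is exactly where the rule-versus-association debate bites.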

Note that this is part of a general learning theory. Those who take a UB view of SLA see it as affected by L1 learning.  


To paraphrase Gregg (2003), nativist (UG) theories posit an innate representational system specific to the language faculty, and non-associative mechanisms, as well as associative ones, for bringing that system to bear on input to create an L2 grammar. UB theories deny both the innateness of linguistic representations and any domain-specific language learning mechanisms. For UB, input from the environment, plus elementary processes of association, are enough to explain SLA.

Clearing up the muddle about UG

Gregg (2003) discusses four “red herrings” used by those arguing against UG. I’ll paraphrase two of them, because they address Scott’s remarks.

The Argument from Vacuity

The argument is: calling a property ‘innate’ does not solve anything: it simply calls a halt to investigation of the property.

First, innate properties are generally assumed in science – circulation of the blood is one such property. As for language, the jury is still out. Thus it is question-begging to argue that calling UG innate prevents us from investigating how language is learned.

Second, criticism of the ‘innateness hypothesis’ often rests on a caricature of the argument from the Poverty of the Stimulus (POS), viz., ‘Property P cannot be learned; therefore it is innate’. But in fact the POS argument is more nuanced:

1)         An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.

2)         The correct set of principles need not be (and typically is not) in any pretheoretic sense simpler or more natural than the alternatives.

3)         The data that would be needed for choosing among these sets of principles are in many cases not the sort of data that are available to an empiricist learner.

4)         So if children were empiricist learners, they could not reliably arrive at the correct grammar for their language.

5)         Children do reliably arrive at the correct grammar for their language.

6)         Therefore, children are not empiricist learners.
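Steps 4 to 6 have the form of modus tollens, so the argument is formally valid and the real dispute can only be about the premises. A minimal Lean 4 rendering, in which the two proposition names are my own labels rather than Gregg’s:

```lean
-- Hypothetical propositional encoding of steps 4-6 of the POS argument.
-- `EmpiricistLearner`: children are empiricist learners.
-- `ReliablyCorrect`: children reliably arrive at the correct grammar.
theorem pos_argument (EmpiricistLearner ReliablyCorrect : Prop)
    (step4 : EmpiricistLearner → ¬ ReliablyCorrect)
    (step5 : ReliablyCorrect) :
    ¬ EmpiricistLearner :=
  fun h => step4 h step5
```

Because the inference is valid, rejecting step 6 means rejecting step 4 (or the premises 1 to 3 that support it), which is precisely the burden Gregg places on emergentists.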

Emergentists need to show that the POS argument for language acquisition is false, by showing that empiricist learning suffices for language acquisition. In other words, they need to show that the stimuli are not impoverished, that the environment is indeed rich enough, and rich in the right ways, to bring about the emergence of linguistic competence (Gregg, 2003, p. 101).

The Argument from Neuroscience

As MacWhinney puts it, ‘Nativism is wrong because it makes untestable assumptions about genetics and unreasonable assumptions about the hard-coding of complex formal rules in neural tissue’ (MacWhinney, 2000: 728, cited in Gregg, 2003, p. 103).

As Gregg says, the claim that there is an innate UG is a claim about the mind, not about the brain. If brain science could show that it is impossible to instantiate UG in a brain, the claim of an innate UG would clearly fail, but brain science has not shown any such impossibility; indeed, brain science has not yet been able to show how any cognitive capacity is instantiated. Thus, pace Scott Thornbury, to date, neuroscience cannot support any emergentist claim about the development of interlanguages. Furthermore, Scott seems to think that neurological explanations are somehow more ‘real’ or ‘basic’ than cognitive explanations; which is, in Gregg’s opinion (2003, p. 104) “a serious mistake”.

It is not simply that the current state of the art does not yet permit us to propose theories of language acquisition at the neural level; it is rather that the neural level is likely not ever to be the appropriate level at which to account for cognitive capacities like language, any more than the physical level is the appropriate level at which to account for the phenomena of economics…. There is no reason whatever for thinking that the neural level is now, or ever will be, the level where the best explanation of language competence or language acquisition is to be found. In short, whether there is a UG, and if there is, whether it is innate, are definitely open questions; but they cannot be answered in the negative merely by appealing to neural implausibility.

UB Theories

Scott has expressed his enthusiasm for UB theories for some time now, but he has done little to support this enthusiasm with rational argument. I’ve commented on the limitations of his grasp of the issues in other posts (see, for example, Thornbury on Performance, and Thornbury Part 1), so it’s enough to say here that he fails to address adequately the criticisms of UB theories made by many scholars, including, for example, Carroll and Gregg. The basic problems that UB theories face are these:

Property theory: UB theory suggests that SLA is explained by the learner’s ability to do distributional analyses and to remember the products of the analyses. So why do UB theorists accept the validity of the linguist’s account of grammatical structure? And which bits do they accept? Ellis accepts NOUN, PHRASE STRUCTURE and STRUCTURE-DEPENDENCE, for example. As Gregg comments, “Presumably the linguist’s descriptions simply serve to indicate what statistical associations are relevant in a given language, hence what sorts of things need to be shown to ‘emerge’”.

Transition theory: Language is acquired through associative learning, through what Nick Ellis calls ‘learners’ lifetime analysis of the distributional characteristics of the input’ and the ‘piecemeal learning of thousands of constructions and the frequency-biased abstraction of regularities’. To borrow from Scott’s screaming protests about claims for UG: “Where’s the evidence?” Well, the only evidence comes from the models of associative learning processes provided by connectionist networks. But, as Gregg (2003) so persuasively demonstrates, these connectionist models provide very little evidence to support the emergentist transition theory. Scott has made no attempt to reply to Gregg’s criticisms. Let me give just one part of Gregg’s argument, the part that deals with the connectionist claim that their models are ‘neurally inspired’.

The use of the term ‘neural network’ to denote connectionist models is perhaps the most successful case of false advertising since the term ‘pro-life’. Virtually no modeller actually makes any specific claims about analogies between the model and the brain, and for good reason: As Marinov says, ‘Connectionist systems, … have contributed essentially no insight into how knowledge is represented in the brain’ (Marinov, 1993: 256). Christiansen and Chater, who are themselves connectionists, put it more strongly: ‘But connectionist nets are not realistic models of the brain . . ., either at the level of individual processing unit, which drastically oversimplifies and knowingly falsifies many features of real neurons, or in terms of network structure, which typically bears no relation to brain architecture’ (1999: 419). In particular, it should be noted that backpropagation, which is the learning algorithm almost universally used in connectionist models of language acquisition, is also universally recognized to be a biological impossibility; no brain process known to science corresponds to backpropagation (Smolensky, 1988; Clark, 1993; Stich, 1994; Marcus, 1998b).

I challenge Scott to answer the arguments so clearly laid out in Gregg’s (2003) article.

Finally, I can’t resist a quote from Eubank and Gregg’s (2002) article.

“And of course it is precisely because rules have a deductive structure that one can have instantaneous learning, without the trial and error involved in connectionist learning. With the English past tense rule, one can instantly determine the past tense form of “zoop” without any prior experience of that verb, let alone of “zooped” (unlike, say, Ellis & Schmidt’s model, which could only approach the “correct” plural form for the test item, and only after repeated exposures to the singular form followed by repeated exposures to the plural form, along with back-propagated comparisons). If all we know is that John zoops wugs, then we know instantaneously that John zoops, that he might have zooped yesterday and may zoop tomorrow, that he is a wug-zooper who engages in wug-zooping, that whereas John zoops, two wug-zoopers zoop, that if he’s a Canadian wug-zooper he’s either a Canadian or a zooper of Canadian wugs (or both), etc. We know all this without learning it, without even knowing what “wug” and “zoop” mean. A frequency / regularity account would need to appeal to a whole congeries of associations, between a large number of pairs like “rum-runner/runs rum, piano-tuner/tunes pianos, …” but not like “harbor-master/masters harbors, pot-boiler/boils pots, kingfisher/fishes kings, …”, or a roughly equal number of pairs like “purple people-eater” meaning purple people or purple eater, etc.”

Follow that!


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Ellis, N. (2002) Frequency effects in language processing. Studies in Second Language Acquisition 24, 2.

Eubank, L., & Gregg, K. (2002) News flash—Hume still dead. Studies in Second Language Acquisition 24, 2, 237-247.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Gregg, K. R. (2003) The state of emergentism in second language acquisition. Second Language Research 19, 2, 95–128.

Mike Long: Reply to Carroll’s comments on the Interaction Hypothesis

I’m very grateful to Mike Long for taking the time to write a quick response to Carroll’s comments on his Interaction Hypothesis, which I quoted in the latest post on Carroll’s Autonomous Induction Theory. His email to me is reproduced below, with his permission.

Important issues are raised – the roles of noticing versus “detection”, the reach of negative feedback, and, most importantly, perhaps, the statement “I side with Nick Ellis and the UB (not UG) hordes” – which I’ll try to address, once I’ve stopped sobbing, in Part 4.

Hi, Geoff,

Thank you for the valuable work you do with your blog, and just as important, the fact that it is also usually funny (in a good way).

It has been nice to see Susanne Carroll’s work getting an airing of late. I sent a several-page comment on your Part 3 via the Comment form yesterday, but it apparently disappeared into the ether. As I have a day job, and it’s term paper reading week, what follows is a quick and dirty rehash.

Much of the critique the pair of you leveled against the Interaction Hypothesis (IH) focused on one dimension only: negotiation for meaning and the negative feedback (NF) it produces. But the IH and negotiation for meaning are a whole lot broader than that.

Genuinely communicative L2 interaction provides opportunities for learners focused on meaning to pick up a new language incidentally, as an unintended by-product of doing something else — communicating through the new language — and often also implicitly, i.e., without awareness. Interacting in the L2 while focused on what is said, learners sometimes perceive new forms or form-meaning-function associations consciously — a process referred to as noticing (Schmidt, 1990, 2010). On other occasions, their focus on communication and attention to the task at hand is such that they will perceive new items in the input unconsciously — a process known as detection (Tomlin & Villa, 1994). Detection is especially important, for as Whong, Gil, & Marsden (2014) point out, implicit learning and (barring some sort of consciousness-raising event) its end-product, implicit knowledge, are what is required for real-time listening and speaking.

While communicating through the L2, a learner’s attention is likely to be drawn to problematic items by added salience resulting from typical characteristics of meaning-focused exchanges. For instance, NS or more proficient NNS interlocutors will consciously or unconsciously highlight important items, e.g., by pausing briefly before and/or after them, adding stress, repeating them, providing synonyms and informal definitions, moving them to more salient initial or final positions in an utterance through left-dislocation or decomposition, and through comprehension checks, confirmation checks and clarification requests, all triggering a momentary switch of the learner’s focal attention from meaning to linguistic form. In addition, NF — mostly implicit, mostly recasts — can have the same effect, while simultaneously providing whatever positive evidence is needed, whether a missing item or a model of more target-like usage. The same incidental learning process operates when learners read a book or a newspaper, listen to a radio interview or watch a movie. However, whereas those activities involve attempting to understand and learn from static spoken or written input intended for NSs and over which they have no control, face-to-face L2 interaction is dynamic, offering opportunities to negotiate for meaning. The negotiation work increases the likelihood that salience will be added, attention drawn to items uniquely problematic for them, and communicative trouble repaired.

You are both skeptical about learners’ ability to compare representations stored in long-term memory with the positive evidence contained in recasts, in order to “notice the gap”. I would be, too. But the comparison involves short-term, or working, memory (WM), not long-term memory. And why would that be inconceivable? Evidence that it is not only possible but happens all the time (in L1 and L2) is abundant. For instance, what someone says or writes frequently primes, or triggers, use of the same lexis and syntax in that person’s own, or a listener’s, immediately following speech or writing (see, e.g., Doughty, 2001; McDonough, 2006; McDonough & Mackey, 2008).

Then, think of the immediate recall sub-tasks common in language aptitude measures, such as LLAMA D and the n-back task in Hi-Lab. Essentially, learners hear/see a short string of sounds or letters and either have to say which ones in a new sequence they heard or read, or repeat them a few seconds later (n-back is a bit more complex). The same basic idea is employed in countless word-recognition post-tests in incidental learning studies. Everyone can do those tasks — some better than others, which is why they are used as a measure of language aptitude – and I reckon they tap roughly the same ability as that used in learning from recasts. The fact that the learner’s original utterance and a recast it triggers are both meaningful, unlike strings of random letters, sounds or words, can be predicted to make it even easier to hold and compare in short-term memory.

Recasts have seven additional qualities (Long, 1996) that make cognitive comparison and learning even more feasible. They convey needed information about the target language (i) in context, (ii) when listeners and speakers share a joint attentional focus, (iii) when the learner is invested in the exchange, (iv) and so is probably motivated and (v) attending. (vi) The fact that learners already have prior comprehension of at least part of the message the recast contains, because the reformulation they hear is of what they just tried to say, frees up attentional resources and facilitates form-function mapping. Indirect support for this idea may lie in the findings of a study of the value of content familiarity. Finally, and crucially, (vii) the contingency of recasts on deviant output means that incorrect and correct utterances are juxtaposed, allowing learners briefly to hold and compare the two versions in working memory (WM). Not convinced? Then consider the fact that statistical meta-analyses (e.g., Goo, 2019; Li, 2010; Loewen & Sato, 2018) have shown that recasts result in measurable learning, with some evidence that they do so better than models and non-contingent speech, on salient targets, at least (Long, Inagaki, & Ortega, 1998).

And, again, it’s not just NF. Negotiation for meaning involves higher-than-usual frequencies of semantically contingent speech, including repetitions, reformulations and expansions, sometimes functioning simultaneously as recasts, but more generally (something usually dear to UG-ers’ hearts) as (mostly) comprehensible, so processable, positive evidence usable for learning. Problematic forms are recycled, increasing their salience and the likelihood that they will be perceived by the learner. The positive evidence, moreover, is elaborated, not simplified (except for lower mean length of utterance), so it retains the items to which learners need to be exposed if acquisition is to occur.

I side with Nick Ellis and the UB (not UG) hordes. Learning a new language is far too large and too complex a task to be handled explicitly, so, although it requires more input and time, implicit learning remains the default learning mechanism for adults:

“Even though many of us go to great lengths to engage in explicit language learning, the bulk of language acquisition is implicit learning from usage. Most knowledge is tacit knowledge; most learning is implicit; the vast majority of our cognitive processing is unconscious” (Ellis & Wulff, 2015, p. 8; and see Ellis & Wulff, 2019).

Sure, there are limitations. Williams (2009) notes, for example, that the scope of implicit learning may not extend to phenomena, such as anaphora, that involve non-adjacent items (Hawkins was arrested, and so were several members of his gang), which may no longer be learnable that way. And I have often pointed to evidence that the capacity for implicit learning, especially instance learning, weakens (but does not disappear) around age 12 (e.g., Long, 2015, 2017). But those are topics for another time, which, you will by now be relieved to hear, I don’t have now.

According to the Tallying Hypothesis (Ellis, 2002), ensuring that learners’ attention is drawn to learning targets that way, especially to abstract, perceptually non-salient items, can modify entrenched automatic L1 processing routines, thereby altering the way subsequent L2 input is processed implicitly. An initial representation is established in long-term memory and functions as a selective cue priming the learner to attend to and perceive additional instances when processing implicitly. Ellis identifies what he calls “the general principle of explicit learning in SLA: changing the cues that learners focus on in their language processing changes what their implicit learning processes tune” (Ellis, 2005, p. 327). Research is currently under way to determine whether it is possible to achieve the same results by unobtrusive, less interventionist means: enhanced incidental learning (Long, 2015, pp. 30-62, 2017, 2020).


Doughty, C. (2001). Cognitive underpinnings of focus on form. In Robinson, P. (ed.), Cognition and Second Language Instruction (pp. 206-257). Cambridge: Cambridge University Press.

Ellis, N. C. (2005). At the interface: Dynamic interactions of explicit and implicit language knowledge. Studies in Second Language Acquisition 27, 2, 305-352.

Ellis, N. C. (2006). Selective attention and transfer phenomena in L2 acquisition: contingency, cue competition, salience, interference, overshadowing, blocking, and perceptual learning. Applied Linguistics 27, 2, 164-194.

Ellis, N. C. & Wulff, S. (2015). Usage-based approaches to second language acquisition. In VanPatten, B., & Williams, J. (Eds.), Theories in second language acquisition (pp. 75-93). New York: Routledge.

Ellis, N. C. & Wulff, S. (2019). Cognitive approaches to L2 acquisition. In Schwieter, J. W., & Benati, A. (Eds.), The Cambridge Handbook of Language Learning (pp. 41-61). Cambridge: Cambridge University Press.

Goo, J. M. (2019). Interaction in L2 learning. In Schwieter, J. W., & Benati, A. (Eds.), The Cambridge handbook of language learning (pp. 233-257). Cambridge: Cambridge University Press.

Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning 60, 2, 309-365.

Loewen, S. & Sato, M. (2018). Interaction and instructed second language acquisition. Language Teaching 51, 3, 285-329.

Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In Ritchie, W. C., & Bhatia, T. K. (eds.), Handbook of second language acquisition (pp. 413-68). New York: Academic Press.

Long, M. H. (2015). Second language acquisition and Task-Based Language Teaching. Oxford: Wiley-Blackwell.

Long, M. H. (2017). Instructed second language acquisition (ISLA): Geopolitics, methodological issues, and some major research questions. Instructed Second Language Acquisition 1, 1, 7-44.

Long, M. H. (2020). Optimal input for language learning: genuine, simplified, elaborated, or modified elaborated? Language Teaching 53, 2, 169-182.

Long, M. H., Inagaki, S., & Ortega, L. (1998). The role of implicit negative feedback in SLA: Models and recasts in Japanese and Spanish. Modern Language Journal 82, 3, 357-371.

McDonough, K. (2006). Interaction and syntactic priming: English L2 speakers’ production of dative constructions. Studies in Second Language Acquisition, 28, 179-207.

McDonough, K., & Mackey, A. (2008). Syntactic priming and ESL question development. Studies in Second Language Acquisition 30, 1, 31-47.

Saxton, M. (1997). The contrast theory of negative input. Journal of Child Language 24, 139-161.

Whong, M., Gil, H.-G. and Marsden, E. (2014) Beyond paradigm: the ‘what’ and the ‘how’ of classroom research. Second Language Research 30, 4, 551-568.

Susanne Carroll’s AIT: Part 3

In this third part of my exploration of Carroll’s Autonomous Induction Theory (AIT), I’ll look at “categorization” and feedback. In what follows I try to speak for Carroll and I apologise for the awful liberties I’ve taken with her texts.  All the quotes come from Carroll (2002), unless otherwise cited.


A theory of SLA must start from a theory of grammar. When we look at the grammars of natural languages, we note that they differ in their repertoires of categories: words are divided into different segments, and sentences comprise different classes of words and phrases. But how? As a basic example, a noun is not reducible to the symbol ‘N’: word classes consist of sound-meaning correspondences, so a noun is a union of phonetic features, phonological structure, morphosyntactic features, morphological structure, semantic features and conceptual structure. As Jackendoff says, words are correspondences connecting levels of representation.


UG provides the representational primitives of each autonomous level of representation, and UG provides the operations which the parsers can perform. In other words, UG severely constrains the ways that the categories at different levels of representation are unified and project into hierarchical structure.


A theory of i-learning explains what happens when a parser fails, and a new constituent or a new procedure must be learned.

In the case of category i-learning, UG provides a basic repertoire of features in each autonomous representational system. Features will combine to form complex units at a given level: phonetic features combine to form segments (a timing unit of speech), morphosyntactic features combine to form morphemes (the basic unit of the morphosyntax), and primitive semantic features combine to form complex concepts like Agent, Male, Cause, Consequence, and so on.

But UG is not the whole story: the acquisition of basic units within an integrative processor will reflect various constraints on feature unification within the limits defined by “unification grammars”.

Some of these constraints will presumably also be attributable to UG. What these restrictions actually consist of, however, is an empirical question and our understanding of such issues has come, and will continue to come, largely from cross-linguistic and typological grammatical research.

Having constructed representations, learners then have to identify them as instances of a category. So SLA consists of learning the categories and the correspondence rules which apply to a specific L2. UG provides some correspondence rules which, in first language acquisition, are used by infants to learn the language specific mappings needed for rapid processing of the particularities of the L1 phonology and morphosyntax. These are carried over into SLA, as are all L1 correspondence rules, which leads to transfer problems.

AIT is embedded in a theory of the functional architecture of the language faculty and linked to theories of parsing and production. Autonomous representational systems of language work with constrained processing modules in working memory. When parsing fails, acquisition mechanisms try to fix the problem. A correspondence failure can only be fixed by a change to a correspondence rule, and an integration failure can only be fixed by a change to an integration procedure.
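The claim that each kind of parsing failure has exactly one kind of fix can be pictured as a dispatch on the failure type. The Python below is my own toy sketch, not Carroll’s formalism: the names `Failure`, `Grammar` and `repair`, and the idea of storing rules in plain dictionaries keyed by strings, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Failure(Enum):
    CORRESPONDENCE = auto()  # no rule maps a representation at one level onto the next
    INTEGRATION = auto()     # units at one level cannot be combined into a structure


@dataclass
class Grammar:
    # Hypothetical stores: rule or procedure name -> what it maps to / does.
    correspondence_rules: dict = field(default_factory=dict)
    integration_procedures: dict = field(default_factory=dict)


def repair(grammar, failure, key, fix):
    """Change only the component that can resolve this kind of failure."""
    if failure is Failure.CORRESPONDENCE:
        grammar.correspondence_rules[key] = fix
    elif failure is Failure.INTEGRATION:
        grammar.integration_procedures[key] = fix
    else:
        raise ValueError(f"unknown failure type: {failure}")
    return grammar


grammar = Grammar()
# A correspondence failure: no rule yet links a final /-z/ to plural.
repair(grammar, Failure.CORRESPONDENCE, "phon:/-z/", "morphosyntax:PLURAL")
print(grammar.correspondence_rules)    # {'phon:/-z/': 'morphosyntax:PLURAL'}
print(grammar.integration_procedures)  # {}
```

The design point the sketch is meant to carry is the restriction itself: a correspondence failure never touches the integration procedures, and vice versa.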

Very importantly, evidence for acquisition comes in the form of mental representations, not from the speech stream, except in the case of i-learning of acoustic patterns of the phonetics of the L2. Carroll explains:

In this respect, this theory differs radically from the Competition Model and from all theories which eschew structural representations in favour of direct mappings between the sound stream and conceptual representations. If correct, it means that simply noting the presence or absence of strings in the speech stream is not going to tell us directly what is in the input to the learning mechanisms.


The place of lexis needs special mention. Following Jackendoff, lexical items have correspondence rules linking the phonological, morphosyntactic and conceptual representations of words.
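A Jackendoff-style lexical entry can be pictured as a record that ties one representation at each level together, so that a representation at any one level gives access to the others. The sketch below is a hypothetical encoding of my own: the class `LexicalEntry`, the function `correspond` and the rough transcriptions are illustrative assumptions, and real conceptual structure is reduced to a text label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """A word as a correspondence linking three levels of representation (toy encoding)."""
    phonological: str           # rough phonemic transcription
    morphosyntactic: frozenset  # bundle of morphosyntactic features
    conceptual: str             # label standing in for a conceptual structure


def correspond(lexicon, level, value):
    """Given a representation at one level, return the entries it corresponds to."""
    return [entry for entry in lexicon if getattr(entry, level) == value]


lexicon = [
    LexicalEntry("/teɪbl/", frozenset({"N", "SG"}), "TABLE"),
    LexicalEntry("/teɪblz/", frozenset({"N", "PL"}), "PLURAL(TABLE)"),
]

# Bottom-up: from a phonological form to the morphosyntax and concept linked to it.
match = correspond(lexicon, "phonological", "/teɪblz/")[0]
print(sorted(match.morphosyntactic), match.conceptual)  # ['N', 'PL'] PLURAL(TABLE)
```

If Carroll is right that lexical entries are a locus of transfer, one way to picture it is an L2 stimulus being matched, at first, against entries like these that were built for the L1.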

Since the contents of lexical entries in SLA are known to be a locus of transfer, the contents of lexical entries will constitute a major source of information for the learning mechanisms in dealing with stimuli in the L2.

Carroll says (2001, p. 84) “In SLA, the major “bootstrapping” procedure may well be lexical transfer”. She says this in the context of arguing for the limited effects of UG on SLA, and I wish she’d said more.


So, a theory of SLA must start with a theory of linguistic knowledge, of mental grammars. Then, it has to explain how a mental grammar is restructured. After that, a theory of linguistic processing must explain how input gets into the system, thereby creating novel learning problems, and finally, a theory of learning must show how novel information can be created to resolve learning problems. I’ve covered all this, however badly, but more remains.


On page 31 of Input and Evidence we get a re-formulation of Carroll’s research questions:

I have to say that I see little of relevance in the next 300 pages, but the last three chapters do have a shot at answering them. I don’t think she does a good job of it, but that’s for Part 4. If you’re already exhausted, think how I feel about the task of telling you about it.

We must return again to Carroll’s most central claims (IMHO) that ‘input’ and ‘intake’ are badly defined theoretical constructs which make a bad starting point for any theory of SLA, and that the consequent constructs ‘L1 transfer’, ‘noticing’, ‘negotiation of meaning’ and ‘output’ are similarly unsatisfactory components of a theory of SLA. The starting point should be stimuli from the environment, not linguistic input (whatever that is), and we must then explain how these stimuli get represented and successfully transformed into developing interlanguages. This demands not just a property theory to describe what is being developed, but a much better model of the learning mechanisms and the reasoning involved than is presently on offer.

A taster

Long’s Interaction Hypothesis states that the role of feedback is to draw the learner’s attention to mismatches between a stimulus and the learner’s output, and that learners can learn a grammar on the basis of the “negotiation of meaning.” But what is meant by these terms? For Carroll, “input” means stimulus, and “output” means what the learner actually says, so the claim is that the learner can compare a representation of their speech to a representation of the speech signal. Why should this help the learner in learning properties of the morphosyntax or vocabulary, since the learner’s problems may be problems of incorrect phonological or morphosyntactic structure? To restructure the mental grammar on the basis of feedback, the learner must be able to construct a representation at the relevant level and compare their output, at the right level, to that.

It would appear then, … that the Interaction Hypothesis presupposes that the learner can compare representations of her speech and some previously heard utterance at the right level of analysis. But this strikes me as highly implausible cognitively speaking. Why should we suppose that learners store in longterm memory their analyses of stimuli at all levels of analysis? Why should we assume that they are storing in longterm memory all levels of analysis of their own speech?… Certainly nothing in current processing theory would lead us to suppose that humans do such things. On the contrary, all the evidence suggests that intermediate levels of the analysis of sentences are fleeting, and dependent on the demands of working memory, which is concerned only with constructing a representation of the sort required for the next level of processing up or down. Intermediate levels of analysis of sentences normally never become part of longterm memory. Therefore, it seems reasonable to suppose that the learner has no stored representations at the intermediate levels of analysis either of her own speech or of any stimulus heard before the current “parse moment.” Consequently, he cannot compare his output (at the right level of analysis) to the stimulus in any interesting sense… Moreover, given the limitations of working memory, negotiations in a conversation cannot literally help the learner to re-parse a given stimulus heard several moments previously. Why not? Because the original stimulus will no longer be in a learner’s working memory by the time the negotiations have occurred. It will have been replaced by the consequences of parsing the last utterance from the NS in the negotiation. I conclude that there is no reason to believe that the negotiation of meaning assists learners in computing an input-output comparison at the right level of representation for grammatical restructuring to occur (Carroll, 2001, p. 291).

Preposterous, right, Mike?

Fun will finally ensue when, in Part 5, I get together with a bunch of well-oiled chums in a video conference session to defend Carroll’s insistence on a property theory and a language faculty against the usage based (UB) hordes. Neil McMillan (whose Lacan in Lockdown: reflections from a small patio is eagerly awaited by binocular-wielding graffiti fans in Barcelona); Kevin Gregg (train schedule permitting); and Mike (‘pass the bottle’) Long are among the many who probably won’t take part.


Carroll, S. (2001) Input and Evidence. Amsterdam, Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Carroll’s AIT Theory: Part 2

This is the second part of my exploration of Susanne Carroll’s theory of SLA. Carroll’s work is important, IMHO, because it questions many of the constructs used by SLA theorists, including ‘comprehensible input’, ‘processing’, ‘i +1’, ‘noticing’, ‘noticing the gap’, ‘L1 transfer’, ‘chunk learning’, and many others. By examining Carroll’s work, I think we can throw light on all these constructs and come to a better understanding of how people learn an L2.

In Part One, I looked at Carroll’s adoption of Jackendoff’s Representational Modularity (RM) theory; a theory of modular mind where each module contains levels of representation organised in chains going from the lowest to the highest. The “lowest” representations are stimuli and the “highest” are conceptual structures. This is Jackendoff’s hypothesis of levels.

Selinker, Kim and Bandi-Rao (2004, p. 82) summarise RM thus:  

The language faculty consists of auditory input, motor output to vocal tract, phonetic, phonological, syntactic components and conceptual structure, and correspondence rules, various processors linking/regulating one autonomous representational type to another. These processors, domain specific modules, all function automatically and unconsciously, with the levels of modularity forming a structural hierarchy representationally mediated in both top-down and bottom-up trajectories.

And Carroll (2002) says:

What is unique to Jackendoff’s model is that it makes explicit that the processors which link the levels of grammatical representation are also a set of modular processors which map representations of one level onto a representation at another level. These processors basically consist of rules with an ‘X is equivalent to Y’ type format. There is a set of processors for mapping ‘upwards’ and a distinct set of processors for mapping ‘downwards’.

Bottom-up correspondence processors

a.         Transduction of sound wave into acoustic information

b.         Mapping of available acoustic information into phonological format.

c.         Mapping of available phonological structure into morphosyntactic format.

d.         Mapping of available syntactic structure into conceptual format.

Top-down correspondence processors

a.         Mapping of available syntactic structure into phonological format.

b.         Mapping of available conceptual structure into morphosyntactic format.

Integrative processors

a.         Integration of newly available phonological information into unified phonological structure.

b.         Integration of newly available morphosyntactic information into unified morphosyntactic structure.

c.         Integration of newly available conceptual information into unified conceptual structure.

(Jackendoff 1987, p. 102, cited in Carroll, 2002, p. 16).  
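For readers who find it easier to think in code, here’s a toy sketch of the bottom-up half of the list above. Each correspondence processor is modelled as a function that maps a representation at one level onto a representation at the next level up, and parsing breaks down at the first level where no mapping is available. This is my own illustration, not Jackendoff’s or Carroll’s formalism: the tables PHONOLOGY and SYNTAX, the function names and the toy “signal” (syllables separated by |) are all hypothetical, and real representations are structured objects, not strings.

```python
# Toy sketch only: NOT Jackendoff's or Carroll's formalism.
# Each "correspondence processor" maps a representation at one level onto
# a representation at the next level up; None means "no mapping available".
# PHONOLOGY, SYNTAX and every function name here are hypothetical.
from typing import Any, Optional

# Hypothetical learner knowledge at two of the levels.
PHONOLOGY = {"ðə": "the", "kæts": "cats", "sliːp": "sleep"}
SYNTAX = {"the": "Det", "cats": "N", "sleep": "V"}


def transduce(signal: str) -> Optional[list]:
    """(a) Sound wave -> acoustic chunks (here: just split on '|')."""
    return signal.split("|")


def acoustic_to_phonology(chunks: list) -> Optional[list]:
    """(b) Acoustic chunks -> phonological words."""
    words = [PHONOLOGY.get(c) for c in chunks]
    return None if None in words else words


def phonology_to_syntax(words: list) -> Optional[list]:
    """(c) Phonological words -> (word, category) pairs."""
    cats = [SYNTAX.get(w) for w in words]
    return None if None in cats else list(zip(words, cats))


def syntax_to_concept(units: list) -> Optional[dict]:
    """(d) Morphosyntactic structure -> a crude conceptual structure."""
    if [cat for _, cat in units] == ["Det", "N", "V"]:
        return {"event": units[2][0], "agent": units[1][0]}
    return None


BOTTOM_UP = [
    ("acoustic", transduce),
    ("phonological", acoustic_to_phonology),
    ("morphosyntactic", phonology_to_syntax),
    ("conceptual", syntax_to_concept),
]


def parse(signal: str) -> tuple:
    """Run the chain; return (level where parsing broke down, or None; result)."""
    rep: Any = signal
    for level, processor in BOTTOM_UP:
        rep = processor(rep)
        if rep is None:
            return level, None
    return None, rep


print(parse("ðə|kæts|sliːp"))  # (None, {'event': 'sleep', 'agent': 'cats'})
print(parse("ðə|dɒgz|sliːp"))  # ('phonological', None)
```

The point of the toy is the one Carroll takes from Jackendoff: there’s no single “language box”, just a chain of autonomous levels, and a breakdown can happen at any one of them.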


The second main component of Carroll’s AIT is induction. Induction is a form of reasoning which goes from the particular to the general. The famous example (given in Philosophy for Idiots, Thribb, 17, cited in Dellar, passim) is of swans. You define a swan and then search lakes to see what colour particular examples of it are. All the swans you see in the first lake are white, and so are those in the second lake. Everywhere you look, they’re all white, so you conclude that “All swans are white”. That’s induction. Hume (see Neil McMillan (unpublished) The influence of Famous Scottish Drunkards on Lacard’s psychosis; a bipolar review) famously showed that induction can’t be logically justified: no inference from the particular to the general is deductively valid. No matter how many white swans you observe, you’ll never know that they’re ALL white, that there isn’t a non-white swan lurking somewhere, so far unobserved. Likewise, you can’t logically infer, from the fact that the sun has so far always risen in the East, that it will rise in the East tomorrow. Popper “solved” this conundrum by saying that we’ll never know the truth about any general theory or generalisation, so we just have to accept theories “tentatively”, testing them in attempts not to prove them (impossible) but, rather, to falsify them. If they withstand these tests, we accept the theory, tentatively, as “true”.

The assumption of all SLA “cognitive processing” transition theories is that the development of interlanguages depends on the gradual reformulation of the learner’s mental conceptualisations of the L2 grammar. These reformulations can be seen as following the path suggested by Popper to get to reliable knowledge:

P1 -> TT¹ -> EE -> P2 -> TT², etc.

P = problem

TT = tentative theory

EE = error elimination, i.e. testing TT against empirical evidence which conflicts with it

You start with a problem and you leap to a tentative theory (TT) and then you test it, trying to falsify it with empirical evidence. If you find such contradictory evidence, you have a problem, and you re-formulate the theory (TT²) which tries to deal with the problem, and you then test again, and round we go again, slowly improving the theory. Popper is talking about hypothesis testing and theory construction in the hard sciences (particularly physics), and while it’s a long way from describing what scientists actually do, it’s even further away from describing what L2 learners do in developing interlanguages. Nevertheless, it’s common to hear people describing SLA as hypothesis formation and hypothesis testing.
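A toy sketch may make the schema concrete. Tentative theories are tried in turn, error elimination (EE) means hunting for a counterexample, and a counterexample is the new problem (P2) that prompts the next tentative theory. It’s my own illustration, not anything in Popper or Carroll; the swan data (including the black swan on lake 3), the conjectures and the function names are made up.

```python
# Toy sketch of Popper's P1 -> TT -> EE -> P2 cycle (my illustration only).
# The swan data, the conjectures and the function names are hypothetical.


def find_counterexample(theory, observations):
    """Error elimination (EE): return the first observation the theory gets wrong."""
    for obs in observations:
        if not theory(obs):
            return obs
    return None


def popper_cycle(conjectures, observations):
    """Test tentative theories in order; return the first that survives, plus a log."""
    history = []
    for name, theory in conjectures:
        counterexample = find_counterexample(theory, observations)
        history.append((name, counterexample))
        if counterexample is None:
            # Survives the tests so far: accepted tentatively, never proven.
            return name, history
        # Otherwise the counterexample is a new problem (P2): try the next TT.
    return None, history


swans = [
    {"lake": 1, "colour": "white"},
    {"lake": 2, "colour": "white"},
    {"lake": 3, "colour": "black"},  # the swan that was "lurking somewhere"
]

conjectures = [
    ("TT1: all swans are white", lambda s: s["colour"] == "white"),
    ("TT2: all swans are white or black", lambda s: s["colour"] in {"white", "black"}),
]

survivor, history = popper_cycle(conjectures, swans)
print(survivor)  # TT2: all swans are white or black
```

Notice that nothing in the loop ever proves TT²; it only fails to refute it with the evidence to hand, which is exactly Popper’s point.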

We could, I suppose, see TT¹ as the learner’s initial interlanguage theory. Then, at any given point in its trajectory, the theory gets challenged by evidence that doesn’t fit (perhaps went when goed is expected, for example) and the problem is resolved by a new, more sophisticated theory, TT². But it doesn’t work – interlanguage development is not a matter of hypothesis formation and testing in Popper’s sense, and I agree with Carroll that it’s “a misleading metaphor”. In her view, SLA is a process of “learning new categories of the grammar, new structural arrangements in on-line parses and hence new parsing procedures and new productive schemata” (Carroll, 2001, p. 32). Still, Hume’s problem of induction remains – the inductions that learners are said to make aren’t strictly logical. (“Just saying” (McMillan, ibid)).

So anyway, Carroll wants to see SLA development (partly) as a process of induction. The most respectable theory of induction is inference to the best explanation, also known as abduction, and I think Lipton (1991) provides the best account, although Gregg (1993) does a pretty good job of it in a couple of pages (adeptly including a concise account of Hempel’s D-N model, by the way). Carroll, however, ducks the issues and follows Holland et al. (1986), who define induction as a set of procedures which lead to the creation and/or refinement of the constructs which form mental models (MMs). Mental models are “temporary and changing complex conceptual representations of specific situations”. Carroll gives the example of a Canadian’s MM of a breakfast event, versus, say, the very different Japanese MM of a breakfast event. MMs are domains of knowledge, schemata, if you like, and Carroll makes lots of use of MMs which I’m going to skip over. She then goes into considerable detail about categorising MMs, and then proceeds to “Condition-action rules” which govern induction. These are competition rules which share ideas with abduction inasmuch as they say “When confronted with competing solutions to a problem, choose the most likely, the best ‘fit’ ”.

Carroll (2001, p. 170) finally (sic) defines induction as a process

leading to revision of representations so that they are consistent with information currently represented in working memory. Its defining property is that it is rooted in stimuli made available to the organism through the perceptual system, coupled with input from Long Term Memory and current computations. … the results of i-learning depend upon the contents of symbolic representations.


Carroll’s theory of learning rests on i-learning (as opposed to ‘I-language’ in Chomsky’s sense, which has very little to do with it, and one can only wish she’d chosen some other term, rather like Long’s unhappy choice of “Focus on FormS”). I-learning depends on the contents of symbolic representations being computed by the learning system.

At the level of phonetic learning, i-learning will depend on the content of phonetic representations, to be defined in terms of acoustic properties. At the level of phonological representation, i-learning will depend on the content of phonological representations, to be defined in terms of prosodic categories, and featural specification of segments. At the level of morphosyntactic learning, i-learning will depend upon the content of morphosyntactic representations. And so on.

So, it seems, i-learning goes on autonomously within all the parts of Jackendoff’s theory of modularity, not just in the conceptual representational system. (I take it that this is where Carroll’s ‘competition’ comes in – analysing a novel form involves competition among various information sources from different levels.) Anyway, the key point is that i-learning is triggered by the failure of current representations to “fit” current models in conjunction with specific environmental stimuli.

More light

I usually don’t comment on my choice of images, but the above image shows Goethe on his deathbed. His wonderful dying words were, according to his doctor, Carl Vogel, “Mehr Licht!” And I can’t help sharing this anecdote. My first seminar, in my first term of my first year at LSE, was presided over by Imre Lakatos, one of the finest scholars I’ve ever met, and later a friend who committed perjury in court to help me avoid being found guilty of a criminal charge. I read a paper about German developments in science, in which I mentioned Goethe, whose name I pronounced ‘Go eth’. Lakatos was drinking a coffee at the moment when I said “Go eth” and reacted very violently. He spat the coffee out, all over the alarmed students sitting round the table in his study, jumped to his feet, and shouted hysterically: “I fail to understand how anybody who’s been accepted into this university can so hopelessly mispronounce the name of Germany’s most famous poet!”

I use Goethe’s dying words here to refer to Carroll’s 2002 paper, which really does throw more light on her difficult-to-follow 2001 work.

In her (2002) account of I-learning, Carroll argues that researching the nature of induction in language acquisition requires the notion of a UG, which describes the properties of grammatical knowledge shared by all human languages. The psycholinguistic processes which result in this knowledge are constrained by UG – which, she insists, doesn’t mean that “UG is thereby operating on-line in any fashion or indeed is stored anywhere to be consulted, as one might infer from much generative SLA research” (Carroll, 2002, p. 11).

Carroll goes on to say that a speaker’s I-language consists of a particular combination of universal and acquired contents, so that a theory of SLA must explain not only what is universal in our mental grammars, but also what is different both among speakers of the same E-language and among the various E-languages of the world.

In order to have a term to cover a UG-compatible theory of acquisition, as well as to make an explicit connection to I-language, I suggest we call such a theory of acquisition a theory of i(nductive)-learning, specifically the Autonomous Induction Theory (Carroll, 2002, p.12).

In other words, while Chomsky is concerned with explaining I-language, Carroll is concerned with explaining the much wider construct of I-learning; she wants to integrate a theory of linguistic competence with theories of performance. So, it goes like this:

The perception of speech, the recognition of words, and the parsing of sentences in the L2 require the application of unit detection and structure-building procedures. When those procedures are in place, speech processing is performed satisfactorily. But when the procedures are not available (e.g., to the beginning L2 learner), speech processing will fail, forcing the learner to fall back on inferences from the context, stored knowledge, etc. But, of course, beginners have very few such resources to draw on, and so interpretation of the stimulus will fail, which is when i-learning mechanisms will be activated.

When speech detection, word recognition, or sentence parsing fail,… only the i-learning mechanisms can fix the problem. They go into action automatically and unconsciously (Carroll, 2002, p. 13).

To start with then, the learner hears the speech stream as little more than noise. Comprehension depends on their learning the right cues to syllable edges and the segments which comprise the syllables. Only once these cues to the identification of phonological units have been learned can word learning begin. After that, form-extraction processes which map some unit or other of the phonology onto a morphosyntactic word will allow the learner to hear a form in the speech stream, but still without necessarily knowing what it means. After that, when learners can identify words and know what they mean, they might still lack the parsing procedures needed to use morphosyntactic cues to arrive at the correct sentence structure and hence arrive at the correct sentence meaning. Either they fail to arrive at any interpretation, or they arrive at the wrong one – their semantic representation isn’t the same as what was intended by the speaker. Finally, their i-learning allows them to get the right meaning – the parsers can now do their job satisfactorily.

Recall what was said in Part 1: Krashen got it backwards! This is the real thrust of Carroll’s argument: input must be seen not as linguistic stuff coming straight from the environment, but rather as stuff that results from processes going on in the mind which call on innate knowledge. Furthermore: YOU CAN’T NOTICE GRAMMAR!

So there you have it. Except that, really, that’s nowhere near “it”. Carroll admits that her theory doesn’t explain what the acquisition mechanism does when parsing breaks down. She asks:

How does the mechanism move beyond that point of parse, and what are the constraints on the solution the learner alights on? Why do the acquisition mechanisms which attempt to restructure the existing parsing procedures and correspondence rules to deal with a current parse problem often fail?

The answers lie partly in Carroll’s investigation of “Categories and categorization” and partly in the roles of feedback and correction. In an early reformulation of her research questions in Input and Evidence, Carroll emphasises the importance of feedback and correction to her work, which points to her important contributions to examining the empirical evidence found in the SLA literature, and also highlights some of the ways in which this evidence has been (mis)used. All this will be discussed in Part 3, where I’ll also look at what Carroll’s AIT has to say about explicit and implicit learning, and about what some of today’s gurus in ELT might learn from Carroll’s work.

This is a blog post, not an academic text. I’m exploring Carroll’s work, and I’ve no doubt made huge mistakes in describing and interpreting it. I await correction. But I hope it will provoke discussion among the many ELT folk who enjoy shooting the breeze about important questions which have a big impact (or should I say ‘impact big time’) on how we organise and implement teaching programmes.


Carroll, S. (2001) Input and Evidence. Amsterdam, Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Lipton, P. (1991) Inference to the Best Explanation. London: Routledge.

Popper, K. R. (1972) Objective Knowledge. Oxford: Oxford University Press.

Selinker, L., Kim, D. and Bandi-Rao, S. (2004) Linguistic structure with processing in second language research: is a ‘unified theory’ possible? Second Language Research 20, 1, 77–94.

The place of Jackendoff’s Representational Modularity Theory in Carroll’s Autonomous Induction Theory


Jackendoff’s Representational Modularity Theory (Jackendoff, 1992) is a key component in Susanne Carroll’s Autonomous Induction Theory, as described in her book Input and Evidence (2001). Carroll’s book is too often neglected in the SLA literature, and I think that’s partly because it’s very demanding. Carroll goes into so much depth about how we learn; she covers so much ground in so much methodical detail; she’s so careful, so thorough, so aware of the complexities, that if you start reading her book without some previous understanding of linguistics, the philosophy of mind, and the history of SLA theories, you’ll find it very tough going. Even with some such understanding of these matters, I myself find the book extremely challenging. Furthermore, the text is dense and often, in my opinion, over-elaborate; you have to be prepared to read the text slowly, and at the same time keep on reading while not at all sure where the argument’s going, in order to “get” what she’s saying.

One criterion for judging theories of SLA is an appeal to Occam’s Razor: ceteris paribus (all other things being equal), the theory with the simplest formula, and the fewest number of basic types of entity postulated, is to be preferred for reasons of economy. Carroll’s theory scores badly here: it’s complicated! Her use of Jackendoff’s theory and of Holland et al.’s induction theory means that her theory of SLA relies on a variety of formulae and entities, and thus it’s not “economical”. On the other hand, it’s one of the most complete theories of SLA on offer.

Over the years, I’ve spent weeks reading Carroll’s Input and Evidence, and now, while reading it yet again in “lockdown”, I’m only just starting to feel comfortable turning from one page to the next. But it’s worth it: it’s a classic; one of the best books on SLA ever, IMHO, and I hope to persuade you of its worth in what follows. I’m going to present the Autonomous Induction Theory (AIT) in an exploratory way, bit by bit, and I hope we’ll end up, eventually, with some clear account of AIT and what it has to say about second language learning, and its implications for teaching.

To the issues, then.

UG versus UB

In the current debate between Chomsky’s UG theory and more recent Usage-based (UB) theories of language and language learning, most of those engaged in the debate see the two theories as mutually contradictory: one is right and the other is wrong. One says language is an abstract system of form-meaning mappings governed by a grammar (in Chomsky’s case a deep grammar common to all natural languages as described in the Principles and Parameters version of UG), and this knowledge is learned with the help of innate properties of the mind. The other says language should be described in terms of its communicative function; as Saussure put it “linguistic signs arise from the dynamic interactions of thought and sound – from patterns of usage”. The signs are form-meaning mappings; we amass a huge collection of them through usage; and we process them by using relatively simple, probabilistic algorithms based on frequency.

O’Grady (2005) has this to say:

The dispute over the nature of the acquisition device is really part of a much deeper disagreement over the nature of language itself. On the one hand, there are linguists who see language as a highly complex formal system that is best described by abstract rules that have no counterparts in other areas of cognition. (The requirement that sentences have a binary branching syntactic structure is one example of such a “rule.”) Not surprisingly, there is a strong tendency for these researchers to favor the view that the acquisition device is designed specifically for language. On the other hand, there are many linguists who think that language has to be understood in terms of its communicative function. According to these researchers, strategies that facilitate communication – not abstract formal rules – determine how language works. Because communication involves many different types of considerations (new versus old information, point of view, the status of speaker and addressee, the situation), this perspective tends to be associated with a bias toward a multipurpose acquisition device.

Susanne Carroll tries to take both views into account.

Property Theories and Transition Theories of SLA

Carroll agrees with Gregg (1993) that any theory of SLA has to consist of two parts:

1) a property theory which describes WHAT is learned,

2) a transition theory which explains HOW that knowledge is learned.

As regards the property theory, it’s a theory of knowledge of language, describing the mental representations that make up a learner’s grammar – which consists of various classifications of all the components of language and how they work together. What is it that is represented in the learner’s knowledge of the L2? Chomsky’s UG theory is an example; Construction Grammar is another; the Competition Model of Bates & MacWhinney (1989, cited in Carroll, 2001) is another; while general knowledge representations, rules of discourse, Gricean maxims, etc. are, I suppose, also candidates.

Transition theories of SLA explain how these knowledge states change over time. The changes in the learner’s knowledge, generally seen as progress towards a more complete knowledge of the target language, need to be explained by appeal to a causal mechanism by which one knowledge state develops into another.

Many of the most influential cognitive processing theories of SLA (Chaudron, 1985; Krashen, 1982; Sharwood Smith, 1986; Gass, 1997; Towell & Hawkins, 1994, cited in Carroll, 2001) concentrate on a transition theory. They explain the process of L2 learning in terms of the development of interlanguages, while largely ignoring the property theory, which they sometimes, and usually vaguely, assume is dealt with by UG. New UB theories (e.g. Ellis, 2019; Tomasello, 2003) reject Chomsky’s UG property theory and rely on what Chomsky regards as performance data for a description of the language in terms of a Construction Grammar. More importantly, perhaps, their ‘transition theory’ makes a minimal appeal to the workings of the mind; they’re at pains to use quite simple general learning mechanisms to explain how “associative” learning, acting on input from the environment, explains language learning.

Mentalist Theories

Carroll bases her approach on the view that humans have a unique, innate capacity for language, and that language learning goes on in a modular mind. Here, I’ll leave discussions about the philosophy of mind to one side, but suffice it to say for now that ‘mind’ is a theoretical construct referring to a human being’s world of thought, feeling, attitude, belief and imagination. When we talk about the mind, we’re not talking about a physical part of the body (the brain), and when we talk about a modular mind, we’re not talking about well-located, separate parts of the brain.

Carroll rejects Fodor’s (1983) claim that the language faculty comprises a single language module in the mind’s architecture, and she sees Chomsky’s LAD as an inadequate description of the language faculty. Rather than accept that language learning is crucially explained by the workings of a “black box”, Carroll explores the mechanisms of mind more closely, and, following Jackendoff, suggests that the language faculty operates at different levels, and is made up of a chain of mental representations, with the lowest level interacting with physical stimuli, and the highest level interacting with conceptual representations. Processing goes on at each level of representation, and a detailed description of these representations explains how input is processed for parsing.

Carroll further distinguishes between processing for parsing and processing for learning, such that, in speech, for example, when the parsers fail to get the message, the learning mechanisms take over. Successful parsing means that the processors currently at the learner’s disposal are able to use existing rules which categorize and combine representations to understand the speech signal. When the rules are inadequate or missing, parsing breaks down; and in order to deal with this breakdown, the known rule that helps most in parsing the problematic item of input is selected and subsequently adapted or refined until parsing succeeds at that given level. As Sun (2008) summarises “This procedure explains the process of acquisition, where the exact trigger for acquisition is parsing failure resulting from incomprehensible input”.
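Here’s a toy sketch of the parse-fail-then-learn procedure Sun describes. It’s a big simplification and entirely my own: the “parser” only looks words up in a lexicon, and “the known rule that helps most” is approximated by the known word sharing the longest ending with the problem word, whose category is then copied over (adapted) to the new item. The lexicon, the suffix heuristic and all function names are hypothetical; nothing here is Carroll’s actual mechanism.

```python
# Toy sketch (hypothetical throughout): learning is triggered only when
# parsing fails, and works by adapting the known entry that best "fits"
# the problem item. The suffix heuristic stands in for "helps most".


def parse(words, lexicon):
    """Return (categories, None) on success, or (None, first_problem_word)."""
    cats = []
    for w in words:
        if w not in lexicon:
            return None, w
        cats.append(lexicon[w])
    return cats, None


def shared_suffix(a, b):
    """Length of the longest common ending of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def i_learn(problem, lexicon):
    """Select the best-fitting known entry and adapt it to the problem item."""
    best = max(lexicon, key=lambda known: shared_suffix(known, problem))
    lexicon[problem] = lexicon[best]  # the new entry persists in the lexicon
    return best


def comprehend(words, lexicon, max_repairs=5):
    """Parse; on failure, learn and re-parse, up to max_repairs times."""
    for _ in range(max_repairs + 1):
        cats, problem = parse(words, lexicon)
        if problem is None:
            return cats
        i_learn(problem, lexicon)
    return None


lexicon = {"she": "Pro", "jumped": "V-past", "home": "N"}
print(comprehend(["she", "walked", "home"], lexicon))  # ['Pro', 'V-past', 'N']
```

Note that the learning function is never called while parsing succeeds; that is the one feature of Carroll’s proposal the toy is meant to capture.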

Scholars from Krashen to Gass take ‘input’ and ‘intake’ as the first two necessary steps in the SLA process (Gass’s model suggests that input passes through the stages of “apperceived” and “comprehended” input before becoming ‘intake’), and ‘intake’ is regarded as the set of processed structures waiting to be incorporated into interlanguage grammar. The widely accepted view that in order for input to become intake it has to be ‘noticed’, as described by Schmidt in his influential 1990 paper, has since, as the result of criticism (see, for example, Truscott, 1998), been seriously modified so that it now approximates Gass’s ‘apperception’ (see Schmidt 2001, 2010), but it’s still widely seen as an important part of the SLA process.

Processing Theories of SLA

Carroll, on the other hand, sees input as physical stimuli, and intake as a subset of these stimuli.

The view that input is comprehended speech is mistaken. Comprehending speech ..happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards! (Carroll, 2001, p. 78).

Referring not just to Krashen, but to all those who use the constructs ‘input’, ‘intake’ and ‘noticing’, Gregg (in a comment on one of my blog posts) makes the obvious but crucial point: “You can’t notice grammar”! Grammar consists of things like nouns and verbs, which are, quite simply, not empirically observable things existing “out there” in the environment, waiting for alert, on-their-toes learners to notice them.

So, says Carroll, language learning requires the transformation of environmental stimuli into mental representations, and it’s these mental representations which must be the starting point for language learning. In order to understand speech, for example, properties of the acoustic signal have to be converted to intake; in other words, the auditory stimulus has to be converted into a mental representation. “Intake from the speech signal is not input to learning mechanisms, rather it is input to speech parsers. … Parsers encode the signal in various representational formats” (Carroll, 2001, p. 10).



Jackendoff’s Representational Modularity

We now need to look at Jackendoff’s (1992) Representational Modularity. Jackendoff presents a theory of mind which contrasts with Fodor’s modular theory (where the language faculty constitutes a single module which processes already formed linguistic representations) by proposing that particular types of representation are sets belonging to different modules. The language faculty has several autonomous representational systems and information flows in limited ways from a conceptual system into the grammar via correspondence rules which connect the autonomous representational systems (Carroll, 2001, p. 121).

Jackendoff’s model has various cognitive faculties, each associated with a chain of levels of representation. The stimuli are the “lowest” level of representation, and “conceptual structures” are the “highest”. The chains intersect at various points allowing information encoded in one chain to influence the information encoded in another. This amounts to Jackendoff’s hypothesis of levels.




Jackendoff proposes that, in regard to language learning, the mind has three representational modules: phonology, syntax, and semantics, and that it also has interface modules which, by defining correspondence rules between representational formats, allow them to pass information along from the lowest to the highest level. This is important for Carroll, because, as we’ll see, the different modules are autonomous and so there must be a translation processor for each set of correspondence rules linking one autonomous representation type to another.

What Carroll wants from Jackendoff is “a clear picture of the functional architecture of the mind” (Carroll, 2001, p. 126), on which to build her induction model. In Part 2, I’ll deal with the Induction bit, but we must finish Part 1 by looking at other parts of Jackendoff’s work.

The Architecture of the Language Faculty

In The Architecture of the Language Faculty, Jackendoff argues for the central part played in language by the lexicon. The lexicon is not part of one of his representational modules, but rather the central component of the interface between them. Lexical items include phonological, syntactic, and semantic content, and thus any lexical item is a set of three structures linked by correspondence rules. Furthermore, since lexical items are part of this general interface, there is no need to restrict them to word-sized elements: they can be affixes, single words, compound words, or even whole constructions, including MWUs, idioms, and so on. As Stephenson (1997) says: “Simply put, the claim is that what we call the lexicon is not a distinct entity but rather a subset of the interface relations between the three grammatical subsystems. … Jackendoff’s proposal thus has the potential to provide a uniform characterization of morphological, lexical, and phrase-level knowledge and processes, within a highly lexicalized framework.”
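To make the “lexical item as interface” idea concrete, here’s a toy sketch in which each lexical item is simply a triple of phonological, syntactic and semantic structures, and hearing the phonology activates the other two (the “partial structures” Jackendoff mentions). The entries (an affix, a word and an idiom), the class and the function name are hypothetical, and real structures are far richer than these strings.

```python
# Toy sketch: a lexical item as three linked structures. All entries and
# names are hypothetical; real structures are far richer than strings.
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class LexicalItem:
    phonology: str
    syntax: str
    semantics: str


# Items need not be word-sized: an affix, a word and an idiom.
LEXICON = [
    LexicalItem("/-z/", "Af[plural]", "PLURAL"),
    LexicalItem("/kæt/", "N", "CAT"),
    LexicalItem("/kɪk ðə bʌkɪt/", "VP", "DIE"),
]


def activate_by_phonology(form: str) -> Optional[dict]:
    """Hearing a form activates the whole item, i.e. all three structures."""
    for item in LEXICON:
        if item.phonology == form:
            return asdict(item)
    return None


print(activate_by_phonology("/kɪk ðə bʌkɪt/"))
# {'phonology': '/kɪk ðə bʌkɪt/', 'syntax': 'VP', 'semantics': 'DIE'}
```

Because the idiom is stored as one item, recognising its phonology hands over its syntax and meaning in one go, with no separate “grammar” step.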

To bring this home, I offer two presentations by Jackendoff. In the first presentation, Jackendoff argues that lexis only – “linear grammar” – paved the way for modern languages. It’s eloquent, to say the very least.

The main argument is, of course, the importance of the lexicon, but I think this diagram is particularly interesting.

Never mind the details; just note that comprehending starts with perceptual stimuli and goes through various levels of representation from lowest to highest, while speaking starts with responding to stimuli actively and goes in the opposite direction.

In the second presentation, Jackendoff talks about mentalism and formalism. Please skip to Minute 49.

There’s a handout for this which I recommend you download and follow.

In this presentation Jackendoff argues that we should abandon the assumption made by generative grammar that lexicon and grammar are fundamentally different kinds of mental representations. If the lexicon gets progressively more and more rule-like, and you erase the line between words and rules, then you slide down a slippery slope which ends up with HPSG (Head-driven phrase structure grammar), Cognitive Grammar, and Construction Grammar, which, he says, is “not so bad”.

So, we may well ask, is Jackendoff a convert to UB theories? How can he be, if he bases his theory of Representational Modularity on the assumption of our possession of a modular mind? How can all this ‘mental representation’ stuff be reconciled with an empiricist view like N. Ellis’ which wants to explain language learning almost exclusively in terms of input from the environment? Part of the answer is, surely, that UB theory has a lot more mental stuff going on than it cares to recognise, but, in any case, I hope we can explore this further in Part 2, and I’d be very pleased if it leads to a lively discussion.

To summarise then, Jackendoff (2000) replaces Chomsky’s generative grammar with the view that syntax is only one of several generative components. Lexical items are not, pace Chomsky, inserted into initial syntactic derivations and then interpreted through processes of derivation. Rather, speech signals are processed by the auditory-to-phonology interface module to create a phonological representation. After that, the phonology-to-syntax interface creates a syntactic structure, which is then, aided by the syntax-to-semantics interface module, converted into a propositional structure, i.e. meaning. This is why, when a lexical item becomes activated, it activates not only its phonology but also its syntax and semantics, and thus “establishes partial structures in those domains” (Jackendoff, 2000: 25). The same process, in reverse, takes place in language production.

What does Susanne Carroll make of it all? Can you make do with Netflix till the next exciting episode comes along?

Well, well. I hope you find this half as interesting as I do. Onward through the fog.


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Ellis, N. C. (2019) Essentials of a theory of language cognition. Modern Language Journal, 103.

Fodor, J. (1987) The Modularity of Mind. Cambridge, MA: MIT Press.

Gregg, K. R. (1993) Taking explanation seriously. Applied Linguistics, 14, 3.

Jackendoff, R. S. (1992) Languages of the Mind. Cambridge, MA: MIT Press.

O’Grady, W. (2005) How Children Learn Language. Cambridge, UK: Cambridge University Press.

Schmidt, R. (1990) The role of consciousness in second language learning. Applied Linguistics, 11, 129–58.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and Second Language Instruction (pp. 3–32). Cambridge: Cambridge University Press.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J. W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Stevenson, S. (1997) A review of The Architecture of the Language Faculty. Computational Linguistics, 24, 4.

Sun, Y. A. (2008) Input processing in second language acquisition: A discussion of four input processing models. Working Papers in TESOL & Applied Linguistics, 8, 1.

Tomasello, M. (2003) Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.

Truscott, J. (1998) Noticing in second language acquisition: a critical review. Second Language Research, 14(2), 103–135.


In a recent blog post I said:

“UB theories are increasingly fashionable, but I’m still not impressed with Construction Grammar, or with the claim that language learning can be explained by appeal to noticing regularities in the input”.

Scott Thornbury replied:

“Read Taylor’s The Mental Corpus (2012) and then say that!”.  

Well, I’ve had a second go at reading Taylor’s book, and here’s my reply, based partly on a review by Pieter Seuren.

Taylor’s thesis is that knowledge of a language can be conceptualized in terms of the metaphor of a “mental corpus”. Language knowledge is not knowledge of grammar rules, but rather “accumulated memories of previously encountered utterances and the generalizations which arise from them.” On Taylor’s account, everybody remembers everything in the language that they have ever heard or read. That’s to say, they remember everything they’ve ever encountered linguistically, including phonetics, the context of utterances, and precise semantic, morphological and syntactic form. Readers may well think that this has much in common with Hoey’s (2006) theory, and that’s not where the similarities end: like Hoey, Taylor offers no explanation of how people draw on this “literally fabulous” memory. Taylor says nothing about the formula of analysis in memory; nothing about the internal structure of that memory; nothing about how speakers actually draw on it; nothing about the kind of memory involved; “in short, nothing at all”.

Seuren argues that while there’s no doubt that speakers often fall back on chunks drawn holistically from memory, they also use rules. Thus, criticism of Chomsky’s UG is no argument against the viability of any research programme involving the notion of rules. As he puts it:

Taylor never considers the possibility of different models of algorithmically organized grammar. One obvious possibility is a grammar that converts semantic representations into well-formed surface structures, as was proposed during the 1970s in Generative Semantics. One specific instantiation of such a model is proposed in my book Semantic Syntax (Blackwell, Oxford 1996), … This model is totally non-Chomskyan, yet algorithmic and thus rule-based and completely formalized. But Taylor does not even consider the possibility of such a model.

Without endorsing Seuren’s model of grammar, or indeed his view of language, I think he makes a good point. He concludes:

Apart from the inherent incoherence of Taylor’s ‘language-as-corpus’ view, the book’s main fault is a conspicuous absence of actual arguments: it’s all rhetoric, easily shown up to be empty when one applies ordinary standards of sound reasoning. In this respect, it represents a retrograde development in linguistics, after the enormous methodological and empirical gains of the past half century.

In a tweet, Scott Thornbury points to Martin Hilbert’s (2014) more favourable review of Taylor’s book, but neither he nor I have a copy of it, so we’ll have to wait while Scott gets hold of it.

Meanwhile, let’s return to the usage-based (UB) theory claim that language learning can be explained by appeal to noticing regularities in the input, and that Construction Grammar is a good way of describing the regularities that are noticed in this way.

Dellar & Walkley, and Selivan, the chief proponents of “The Lexical Approach”, can hardly claim to be the brains of the UB outfit, since they all misrepresent UB theory to the point of travesty. But there are, of course, better attempts to describe and explain UB theory, most notably by Nick Ellis (see, for example, Ellis, 2019). On this account, language can be described in terms of constructions (see Wulff & Ellis, 2018), and language acquisition can be explained by simple learning mechanisms, which boil down to detecting patterns in input: when exposed to language input, learners notice frequencies and discover language patterns. As Gregg (2003) points out, this amounts to the claim that language learning is associative learning.

When Ellis, for instance, speaks of ‘learners’ lifetime analysis of the distributional characteristics of the input’ or the ‘piecemeal learning of thousands of constructions and the frequency-biased abstraction of regularities’, he’s talking of association in the standard empiricist sense.

Here we have to pause and look at empiricism, and its counterpart rationalism.

Empiricists claim that sense experience is the ultimate source of all our concepts and knowledge. There’s no such thing as “mind”; we’re born with a “tabula rasa”, our brain an empty vessel that gets filled with our experiences, and so our knowledge is a posteriori, dependent wholly upon our history of sense experience. Skinner’s version of Behaviourism serves as a model. Language learning, like all learning, is a matter of associating one thing with another, that is, of habit formation.

Compare this to Chomsky’s view that what language learners get from their experiences of the world can’t explain their knowledge of their language: a better explanation is that learners have an innate knowledge of a universal grammar which captures the common deep structure of all natural languages. A set of innate capacities or dispositions enables and determines their language development. In my opinion, there’s no need to go back to the historical debate between rationalists like Descartes and empiricists like Locke: indeed, I think that these comparisons are often misleading, usually because those who use them to argue for UB theories give a very distorted description of Descartes, and fail to appreciate the full implications of adopting an empiricist view. What’s important is that the empiricism adopted by Nick Ellis, Tomasello and others today is a less strict version than the original: talk of mind and reason is not proscribed, although for them, the simpler the mechanisms employed to explain learning, the better.

Chomsky is the main target. Motivated by the desire to get rid of any “black box”, and of any appeal to inference to the best explanation when confronted by arguments about the poverty of the stimulus, UB theorists appeal to frequency, Zipfian distribution, power laws, and other flimsy bits and pieces in order to replace the Chomskyan view. On that view, language competence is knowledge of a language system which enables speakers to produce and understand an infinite number of sentences in their language, and to distinguish grammatical sentences from ungrammatical ones, while language learning goes on in the mind, which is equipped with a special language learning module to help interpret the stream of input from the environment. Such a view led to theories of SLA which see the L2 learning process as crucially a psycholinguistic process involving the development of an interlanguage, where L2 learners gradually approximate to the way native speakers use the target language.

We return to Nick Ellis’s view. Language is a collection of utterances whose regularities are explained by Construction Grammar, and language learning is based on associative learning, the frequency-biased abstraction of regularities. I’ve already expressed the view that Construction Grammar seems to me little more than a difficult-to-grasp taxonomy, an a posteriori attempt to classify bits of attested language use collected from corpora, while the explanation of how we learn this grammar relies on associative learning processes which do nothing to adequately explain SLA or what children know about their L1. Here’s a bit more, based on the work of Kevin Gregg, whose view of theory construction in SLA is more eloquently stated and more carefully argued than that of any scholar I’ve ever read.

N. Ellis claims that language emerges from relatively simple developmental processes being exposed to a massive and complex environment. Gregg (2003) uses the example of the concept SUBJECT to challenge Ellis’ claim.

The concept enters into various causal relations that determine various outcomes in various languages: the form of relative clauses in English, the assignment of reference in Japanese anaphoric sentences, agreement markers on verbs, the existence of expletives in some languages, the form of the verb in others, the possibility of certain null arguments in still others and so on.

Ellis claims that the concept SUBJECT emerges; it’s the effect of environmental influences that act by forming associations in the speaker’s mind such that the speaker comes to have the relevant concepts as specified by the linguist’s description. But how can the environment provide the necessary information, in all languages, for all learners to acquire the concept?  What sort of environmental information could be instructive in the right ways, and how does this information act associatively?

Gregg comments:

Frankly, I do not think the emergentist has the ghost of a chance of showing this, but what I think hardly matters. The point is that so far as I can tell, no emergentist has tried. Please note that connectionist simulations, even if they were successful in generalizing beyond their training sets, are beside the point here. It is not enough to show that a connectionist model could learn such and such: In order to underwrite an emergentist claim about language learning, it has to be shown that the model uses information analogous to information that is to be found in the actual environment of a human learner. Emergentists have been slow, to say the least, to meet this challenge.

Amen to that.

So, the choice is yours. If you choose to accept Dellar’s account of language and language learning, then you base your teaching on the worst “principles” of language and language learning in print. If you choose to follow Nick Ellis’ account, then you’ll probably have to pass on trying to figure out Construction Grammar, and on explaining not just how children come to know things about their L1 for which the environment provides no input, but also how associative learning explains the adult L2 learning trajectories reported in hundreds of studies over the last 50 years. If you choose to accept one or another cognitive, psycholinguistic theory of SLA which sees L2 learning as a process of developing interlanguages, then you are left with the problem of providing what Gregg refers to as the property theory of SLA: in what does the capacity to use an L2 consist? What are the properties of the language which is learned in this way? Chomsky’s explanation of language and language learning might well be wrong, but it’s still the best description of language competence on offer (language, quite simply, is not exclusively a tool for social interaction), and it’s still the best explanation of what children know about language and how they come to know it.


Ellis, N. (2019) Essentials of a Theory of Language Cognition. The Modern Language Journal, 103 (Supplement 2019).

Seuren, P. (1996) Semantic Syntax. Oxford: Blackwell.

Wulff, S. and Ellis, N. (2018) Usage-based approaches to second language acquisition.

A Review of “Teaching Lexically” (2016) by H. Dellar and A. Walkley

(Note: I’ve moved this post from its old place in my “stuff” to here because the old blog is getting increasingly difficult to access and to edit.)

Teaching Lexically is divided into three sections.

Part A. 

We begin with “The 6 principles of how people learn languages”:

“Essentially, to learn any given item of language, people need to carry out the following stages:

  • Understand the meaning of the item.
  • Hear/see an example of the item in context.
  • Approximate the sounds of the item.
  • Pay attention to the item and notice its features.
  • Do something with the item – use it in some way.
  • Repeat these steps over time, when encountering the item again in other contexts” 

These “principles” are repeated in slightly amplified form at the end of Part A, and they inform the “sets of principles” for each of the chapters in Part B.

Next, we are told about “Principles of why people learn languages”.

These “principles” are taken en bloc from the Common European Framework of Reference for Languages. The authors argue that teachers should recognise that

“for what is probably the majority of learners, class time is basically all they may have spare for language study. [This] … emphasises how vital it is that what happens in class meets the main linguistic wants and needs of learners, chiefly:

  • To be able to do things with their language.
  • To be able to chat to others.
  • To learn to understand others cultures better”.   

We then move to language itself.

Two Views of Language

1. Grammar + words + skills

This is the “wrong” view, which, according to Dellar and Walkley, most people in ELT hold. It says that

language can be reduced to a list of grammar structures that you can drop single words into.

The implications of this view are:

  1. Grammar is the most important area of language. …The examples used to illustrate grammar are relatively unimportant. …It doesn’t matter if an example used to illustrate a rule could not easily (or ever) be used in daily life.
  2. If words are to fit in the slots provided by grammar, it follows that learning lists of single words is all that is required, and that any word can effectively be used if it fits a particular slot.
  3. Naturalness, or the probable usage of vocabulary, is regarded as an irrelevance; students just need to grasp core meanings.
  4. Synonyms are seen as being more or less interchangeable, with only subtle shades of meaning distinguishing them.
  5. Grammar is acquired in a particular order – the so-called “building blocks” approach where students are introduced to “basic structures”, before moving to “more advanced ones”.
  6. Where there is a deficit in fluency or writing or reading, this may be connected to a lack of appropriate skills. These skills are seen as existing independently of language.

2. From words with words to grammar

This is the “right” view, and is based on the principle that “communication almost always depends more on vocabulary than on grammar”. The authors illustrate this view by taking the sentence

I’ve been wanting to see that film for ages.

They argue that “Saying want see film is more likely to achieve the intended communicative message than only using what can be regarded as the grammar and function words I’ve been –ing to that for.”

The authors go on to say that in daily life the language we use is far more restricted than the infinite variety of word combinations allowed by rules of grammar. In fact, we habitually use the same chunks of language, rather than constructing novel phrases from an underlying knowledge of “grammar + single words”. This leads the authors to argue the case for a lexical approach to teaching and to state their agreement with Lewis’ (1993) view that

 teaching should be centred around collocation and chunks, alongside large amount of input from texts.  

They go on:

From this input a grasp of grammar ‘rules’ and correct usage would emerge. 

The authors cite Hoey’s Lexical Priming (2005) as giving theoretical support for this view of language. They explain Hoey’s view by describing the example Hoey gives of the two words “result” and “consequence”. While these two words are apparently synonymous, they function in quite different ways, as can be seen in statistics from corpora which show when and how they are used. Dellar and Walkley continue:

Hoey argues that these statistical differences must come about because, when we first encounter these words (he calls such encounters ‘primings’) our brains somehow subconsciously record some or all of this kind of information about the way the words are used. Our next encounter may reaffirm – or possibly contradict – this initial priming, as will the next encounter, and the one after that – and so on. 

The authors go on to explain how Hoey uses “evidence from psycholinguistic studies” to support the claim that we remember words not as individual units, but rather, in pairings and in groups, which allows for quicker and more accurate processing. Thus,

 spoken fluency, the speed at which we read and the ease and accuracy with which we listen may all develop as a result of language users being familiar with groupings of words.

A lexical view of teaching

Dellar & Walkley urge teachers to

think of whole phrases, sentences or even ‘texts’ that students might want to say when attempting a particular task or conversation. … At least some of those lexical items are learnable, and some of that learning could be done with the assistance of materials before students try to have particular kinds of communication.

It seems that the biggest problem of teaching lexically is that it’s difficult for teachers to come up, in real time, with the right kind of lexical input and the right kind of questions to help students notice lexical chunks, collocations, etc. The practicalities of teaching lexically are discussed under the heading “Pragmatism in a grammar-dominated world”, where teachers are advised to work with the coursebooks they’ve got and approach coursebook materials in a different way, focusing on the vocabulary and finding better ways of exploiting it.

The rest of Part A is devoted to a lexical view of vocabulary (units of meaning, collocation, co-text, genre and register, lexical sets, antonyms, word form, pragmatic meanings and synonyms are discussed), a lexical view of grammar (including “words define grammar” and “grammar is all around”), and a lexical view of skills.

Part A ends with “A practical pedagogy for teaching and learning”, which stresses the need to consider “Naturalness, priming and non-native speakers”, and ends with “The Process”, which repeats the 6 stages introduced at the start, noting that noticing and repetition are the two stages that the lexical teacher should place the most emphasis on.

Part B offers 100 worksheets for teachers to work through. Each page shares the same format: Principle; Practising the Principle; Applying the Principle. In many of the worksheets, it’s hard to find the “principle”, and in most worksheets “applying the principle” involves looking for chances to teach vocabulary, particularly lexical chunks. Here’s an example:

 Worksheet 2: Choosing words to teach.

Principle: prioritise the teaching of more frequent words.

Practising the principle involves deciding which words in a box (government / apple, for example) are more frequent and looking at the online Macmillan Dictionary or the British National Corpus to check.

Applying the Principle involves choosing 10 words from “a word list of a unit or a vocabulary exercise that you are going to teach”, putting the words in order of frequency, checking your ideas, challenging an interested colleague with queries about frequency and “keeping a record of who wins!”

The worksheets cover teaching vocabulary lexically, teaching grammar lexically, teaching the 4 skills lexically, and recycling and revising. Many of them involve looking at the coursebook which readers are presumed to be using in their teaching, and finding ways to adapt the content to a more lexical approach to teaching. In the words of the authors,

the book is less about recipes and activities for lessons, and more about training for preparing lexical lessons with whatever materials you are using.       


Part C (10 pages long) looks at materials, teaching courses other than general English, and teacher training.


Language Learning

Let’s start with Dellar and Walkley’s account of language learning. More than 50 years of research into second language learning is “neatly summarised” by listing the 6 steps putatively involved in learning “any given item of language”. You (1) understand the meaning, (2) hear/see an example in context, (3) approximate the sound, (4) pay attention to the item and notice its features, (5) do something with it, that is, use it in some way, and (6) then repeat these steps over time. We’re not told what an “item” of language refers to, but we may be sure that there are tens, if not hundreds, of thousands of such items, and we are asked to believe that they’re all learned, one by one, following the same 6-step process.

Bachman (1990) provides an alternative account, according to which people learn languages by developing a complex set of competencies, as outlined in the figure below.

There remains the question of how these competencies are developed. We can compare Dellar and Walkley’s 6-step account with that offered by theories of interlanguage development (see Tarone, 2001, for a review). Language learning is, in this view, gradual, incremental and slow, sometimes taking years to accomplish. Development of the L2 involves all sorts of learning going on at the same time as learners use a variety of strategies to develop the different types of competencies shown in Bachman’s model, confronting problems of comprehension, pronunciation, grammar, lexis, idioms, fluency, appropriacy, and so on along the way. The concurrent development of the many competencies Bachman refers to exhibits plateaus, occasional movement away from, not toward, the L2, and U-shaped or zigzag trajectories rather than smooth, linear contours.  This applies not only to learning grammar, but also to lexis, and to that in-between area of malleable lexical chunks as described by Pawley and Syder.

As for lexis, explanations of SLA based on interlanguage development assert that learners have to master not just the canonical meaning of words, but also their idiosyncratic nature and their collocates. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. By passing through a period of incorrectness, in which the lexicon is used in a variety of ways, they climb back up the U-shaped curve. Carlucci and Case (2011) give the example of the noun ‘shop.’ Learners may first encounter the word in a sentence such as “I bought this wine at the duty free shop”. Then, they experiment with deviant utterances such as “I am going to the supermarket shop,” correctly associating the word ‘shop’ with a place where they can purchase goods, but getting its use wrong. By making these incorrect utterances, the learner comes to distinguish what is appropriate from what is not, because “at each stage of the learning process, the learner outputs a corresponding hypothesis based on the evidence available so far” (Carlucci and Case, 2011).

Dellar and Walkley’s “Six Step” account of language learning is neither well explained nor complete. These are not, I suggest, very robust principles on which to build. The principles of why people learn are similarly flimsy. To say that people learn languages “to be able to do things with their language; to be able to chat to others; and to learn to understand others cultures better” is to say very little indeed.

Two Views of Language

Dellar & Walkley give one of the most preposterous misrepresentations of how most teachers see English grammar that I’ve ever seen in print. Recall that they describe this popular view of language as “grammar + words”, such that language can be reduced to a list of grammar structures that you can drop single words into.

In fact, grammar models of the English language, such as that found in Quirk (1985), or Swan (2001), and used in coursebooks such as Headway or English File, describe the structure of English in terms of grammar, the lexicon and phonology. These descriptions have almost nothing in common with the description given on page 9 of Teaching Lexically, which is subsequently referred to dozens of times throughout the book as if it were an accurate summary, rather than a biased straw man used to promote their own view of language. The one sentence description, and the 6 simplistic assumptions that are said to flow from it, completely fail to fairly represent grammar models of the English language.

The second view of language, the right one according to the authors, is “language = from words + words to grammar”. Given that this is the most important, most distinctive feature of the whole approach to teaching lexically, you’d expect a detailed description and a careful critical evaluation of their preferred view of language. But no; what is offered is a poorly articulated, inadequate summary, mixed up with one-sided arguments for teaching lexically. It’s based on Hoey’s (2005) view that the best model of language structure is the word, along with its collocational and colligational properties, so that collocation and “nesting” (words join with other primed words to form sequences) are linked to contexts and co-texts, and grammar is replaced by a network of chunks of words. On this view, there are no rules of grammar; there’s no English outside a description of the patterns we observe among those who use it. There is no right or wrong in language. It makes little sense to talk of something being ungrammatical.

This is surely a step too far; surely we need to describe language not just in terms of the performed but also in terms of the possible. Hoey argues that we should look only at attested behaviour and abandon descriptions of syntax, but, while nobody these days denies the importance of lexical chunks, very few would want to ignore the rules which guide the construction of novel, well-formed sentences. After all, pace Hoey, people speaking English (including learners of English as an L2) invent millions of novel utterances every day. They do so by making use of, among other things, grammatical knowledge.

The fact that the book devotes some attention to teaching grammar indicates that the authors recognise the existence and importance of grammar, which in turn indicates that there are limits to their adherence to Hoey’s model. But nothing is said in the book to clarify these limits. Given that Dellar and Walkley repeatedly stress that their different view of language is what drives their approach to teaching, their failure to offer any coherent account of their own view of language is telling. We’re left with the impression that the authors are enthusiastic purveyors of a view which they don’t fully understand and are unable to adequately describe or explain.

Teaching Lexically

1. Teaching Lexically concentrates very largely on “doing things to learners” (Breen, 1987): it’s probably the most teacher-centred book on ELT I’ve ever read. There’s no mention in the book of including students in decisions affecting what and how things are to be learned: teachers make all the decisions. They work with a pre-confected product or synthetic syllabus, usually defined by a coursebook, and they plan and execute lessons on the basis of adapting the syllabus or coursebook to a lexical approach. Students are expected to learn what is taught in the order that it’s taught, the teacher deciding the “items”, the sequence of presentation of these “items”, the recycling, the revision, and the assessment.

2. There’s a narrowly focused, almost obsessive concentration on teaching as many lexical chunks as possible. The need to teach as much vocabulary as possible pervades the book. The chapters in Part B on teaching speaking, reading, listening and writing are driven by the same over-arching aim: look for new ways to teach more lexis, or to re-introduce lexis that has already been presented.

3. Education is seen as primarily concerned with the transmission of information. This view runs counter to the principles of learner-centred teaching, as argued by educators such as John Dewey, Sebastian Faure, Paulo Freire, Ivan Illich, and Paul Goodman, and supported in the ELT field by all progressive educators who reject the view of education as the transmission of information, and, instead, see the student as a learner whose needs and opinions have to be continuously taken into account. For just one opinion, see Weimer (2002), who argues for the need to bring about changes in the balance of power; changes in the function of course content; changes in the role of the teacher; changes in who is responsible for learning; and changes in the purpose and process of evaluation.

4. The book takes an extreme interventionist position on ELT. Teaching Lexically involves dividing the language into items, presenting them to learners via various types of carefully selected texts, and practising them intensively, using pattern drills, exercises and all the other means outlined in the book, including comprehension checks, error corrections and so on, before moving on to the next set of items. As such, it mostly replicates the grammar-based PPP method it so stridently criticises. Furthermore, it sees translation into the L1 as the best way of dealing with meaning, because it wants to get quickly on to the most important part of the process, namely memorising bits of lexis with their collocates and even co-text. Compare this to an approach that sees the negotiation of meaning as a key aspect of language teaching, where the lesson is conducted almost entirely in English and the L1 is used sparingly, where students have chosen for themselves some of the topics that they deal with, where they contribute some of their own texts, where most of classroom time is given over to activities in which the language is used communicatively and spontaneously, and where the teacher reacts to linguistic problems as they arise, thus respecting the learners’ ‘internal syllabus’.

Teaching Lexically sees explicit learning and explicit teaching as paramount, and it assumes that explicit knowledge, otherwise called declarative knowledge, can be converted into implicit (or procedural) knowledge through practice. These assumptions, like the assumptions that students will learn what they’re taught in the order they’re taught it, clash with SLA research findings. As Long says: “implicit and explicit learning, memory and knowledge are separate processes and systems, their end products stored in different areas of the brain” (Long, 2015, p. 44).  To assume, as Dellar and Walkley do, that the best way to teach English as an L2 is to devote the majority of classroom time to the explicit teaching and practice of pre-selected bits of the language is to fly in the face of SLA research.

Children learn languages in an implicit way – they are not consciously aware of most of what they learn about language. As for adults, all the research in SLA indicates that implicit learning is still the default learning mechanism. This suggests that teachers should devote most of the time in class to giving students comprehensible input and opportunities to communicate among themselves and with the teacher.

Nevertheless, adult L2 learners are what Long calls partially “disabled” language learners, for whom some classes of linguistic features are “fragile”. The implication is that, unless helped by some explicit instruction, they are unlikely to notice these fragile (non-salient) features, and thus will not progress beyond a certain limited stage of proficiency. The question is: what kind of explicit teaching helps learners progress along their trajectory towards communicative competence? And here we arrive at lexical chunks.

Teaching Lexical Chunks

One of the most difficult parts of English for non-native speakers to learn is collocation. As Long (2015, pp. 307–316) points out in his section on lexical chunks, while children learn collocations implicitly, “collocation errors persist, even among near-native L2 speakers resident in the target language environment for decades.” Long cites Boers’s work, which suggests a number of reasons why L2 collocations constitute such a major learning problem, including L1 interference, the semantic vagueness of many collocations, the fact that the collocates of some words vary, and the fact that some collocations look deceptively similar.

The size and scope of the collocation problem can be appreciated by considering findings on the lesser task of word learning. Long cites work by Nation (2006) and Nation and Chung (2009), who have calculated that learners require knowledge of between 6,000 and 7,000 word families for adequate comprehension of speech, and 9,000 for reading. Intentional vocabulary learning has been shown to be more effective than incidental learning in the short term, but, the authors conclude, “there is nowhere near enough time to handle so many items in class that way”. The conclusion is that massive amounts of extensive reading outside class, scaffolded by teachers, is the best solution.

As for lexical chunks, there are very large numbers of such items, probably hundreds of thousands of them. As Swan (2006) points out, “memorising 10 lexical chunks a day, a learner would take nearly 30 years to achieve a good command of 100,000 of them”. So how does one select which chunks to teach explicitly, and how does one teach them? The most sensible course of action would seem to be to base selection on frequency, but there are problems with such a simple criterion, not least the needs of the particular students in the classroom. Although Dellar and Walkley acknowledge the criterion of frequency, Teaching Lexically discusses it very little, and offers little clear or helpful advice about which lexical chunks to select for explicit teaching (see the worksheet cited at the start of this review). The general line seems to be: work with the material you have, and look for the lexical chunks that occur in the texts, or that are related to the words in the texts. This is clearly not a satisfactory criterion for selection.

The other important question to which Teaching Lexically gives no well-considered answer is: how best to facilitate the learning of lexical chunks? Dellar and Walkley could start by explaining how their endorsement of Hoey’s theory of language learning, and Hoey’s “100% endorsement” of Krashen’s Natural Approach, fit with their own view that explicit instruction in lexical chunks should be the most important part of classroom-based instruction. The claim that they are just speeding up the natural, unconscious process doesn’t bear examination, because it conflates two completely different systems of learning. Dellar and Walkley take what is called a “strong interface” position, whereas Krashen and Hoey take the opposite view. Dellar and Walkley make conscious noticing the main plank of their teaching approach, which contradicts Hoey’s claim that lexical priming is a subconscious process.

Next, Dellar and Walkley make no mention of the fact that learning lexical chunks is one of the most challenging aspects of learning English as an L2 for adult learners. Nor do they discuss the questions about the teachability of lexical chunks raised by scholars such as Boers, who confesses that he doesn’t know how to solve the problems these scholars have identified in teaching lexical chunks. The authors of Teaching Lexically blithely assume that drawing attention to features of language (by underlining them, mentioning them and so on) and making students aware of collocations, co-text, colligations, antonyms, etc. (by giving students repeated exposure to carefully chosen written and spoken texts, and by using drills, concept questions, input flood, bottom-up comprehension questions and so on) will allow the explicit knowledge taught to become fully proceduralised. Quite apart from the question of how many chunks a teacher is expected to treat so exhaustively, there are good reasons to question the assumption that such instruction will have the desired result.

In a section of his book on TBLT, Long (2015) discusses his fifth methodological principle: “Encourage inductive ‘chunk’ learning”. Note that Long discusses 10 methodological principles, and sees teaching lexical chunks as an important but minor part of the teacher’s job. The most important conclusion that Long comes to is that there is, as yet, no satisfactory answer to “the $64,000 dollar question: how best to facilitate chunk learning”. Long’s discussion of explicit approaches to teaching collocations includes the following points:

  • Trying to teach thousands of chunks is out of the question.
  • Drawing learners’ attention to formulaic strings does not necessarily lead to memory traces usable in subsequent receptive L2 use, and in any case there are far too many to deal with in that way.
  • Getting learners to look at corpora and identify chunks has failed to produce measurable advantages.
  • Activities to get learners to concentrate on collocations on their own have had poor results.
  • Grouping collocations thematically increases the learning load (decreasing transfer to long-term memory), and so does presenting groups that share synonymous collocates, such as make and do.
  • Exposure to input floods where collocations are frequently repeated has poor results.
  • Commercially published ELT materials designed to teach collocations have varying results. For example, exercises in which verbs in one column are to be matched with nouns in another inevitably produce some erroneous pairings, which, even when corrective feedback is available, can be expected to leave unhelpful memory traces.
  • Encouraging inductive chunk learning is clearly well motivated, but it is far less clear how best to realise it in practice, i.e., which pedagogical procedures to call upon.


Teaching Lexically is based on a poorly articulated view of the English language and on a flimsy account of second language learning. It claims that language is best seen as lexically driven; that a grasp of grammar ‘rules’ and correct usage will emerge from studying lexical chunks; that spoken fluency, reading speed, and the ease and accuracy with which we listen will all develop as a result of language users being familiar with groupings of words; and that, therefore, the teaching of lexical chunks should be the most important part of a classroom teacher’s job. These claims often rest on mere assertion, and involve straw man fallacies, cherry-picking of research findings, and the neglect of counter-evidence. The case made for this view of teaching is, in my opinion, entirely unconvincing. The concentration on just one small part of what is involved in language teaching, and the lack of any well-considered discussion of the problems associated with teaching lexical chunks, are serious flaws in the book’s treatment of an interesting topic.


Bachman, L. (1990) Fundamental Considerations in Language Testing. Oxford University Press.

Breen, M. (1987) Contemporary Paradigms in Syllabus Design, Parts 1 and 2. Language Teaching, 20(2) and 20(3).

Carlucci, L. and Case, J. (2013) On the Necessity of U-Shaped Learning. Topics in Cognitive Science.

Hoey, M. (2005) Lexical Priming. Routledge.

Long, M. (2015) Second Language Acquisition and Task-Based Language Teaching. Wiley.

Swan, M. (2006) Chunks in the classroom: let’s not go overboard. The Teacher Trainer, 20(3).

Tarone, E. (2001) Interlanguage. In R. Mesthrie (Ed.), Concise Encyclopedia of Sociolinguistics (pp. 475–481). Oxford: Elsevier Science.

Weimer, M. (2002) Learner-Centered Teaching. Retrieved from  3/09/2016