Last September, Neil McMillan and his lovely wife and daughter spent the weekend with us at our house. The house is described by Dellar as “a huge, decadent, sprawling mansion bought by Geoff’s billionaire wife for a song from gullible locals, full of rare paintings by Spanish artists like Rubens and Caravaggio, looked after by scores of starving slaves smuggled in from Brixton, where many of my friends live, yeah?”. It is, in fact, a restored ‘casa rustica’, which offers guests a comfortable bed and a shower down the corridor. Neil gets a bedroom with an en suite bathroom, but that’s because he sweats a lot.

Anyway, during the weekend, Neil and I had the chance to talk into the wee hours (as he insists on calling them) about my favourite subject: the life of the profane. The profane are outsiders: those who, whatever tribe, clan, religious grouping or nation they’re supposed to be part of, just don’t “get” what belonging means. One important part of their dislocation is their relationship with the inanimate world. Somehow, not getting the nuances of social norms, not respecting them because they don’t make sense (a far cry from those dedicated rebels who early on reject them) has a profound effect on how they walk through life. Existential literature concentrates on “being before essence” – we make ourselves up as we go along – but it doesn’t pay enough attention to existence when it comes to dealing with the “things” that we interact with.  I’ve always been fascinated by the way some people waltz through life, effortlessly engaging with the inanimate world; walking down stairs hardly bothering with the banisters; nonchalantly catching the piece of toast that pops up from the toaster; pushing, never pulling, the doors that need pushing; stepping adroitly onto buses and out of taxis;  slotting their credit cards the right way up into the right slot; pressing the right buttons in lifts; and all that and all that. Generally, they stroll along unaware of obstacles: they automatically turn the right key in the right way in the right lock, so to speak.

Compare that to the life of the profane: those whose lives are marked by exactly the opposite experience of daily life. It’s not just a question of being clumsy, it’s that the inanimate world seems to conspire against them. An extreme example is Inspector Jacques Clouseau, he of the Pink Panther films. When Clouseau walks down the stairs, his sleeve gets caught in the banister; when he tries to catch the toast, he misses; when he uses a cigarette lighter he sets himself on fire; when he pushes the door, it smacks him in the face, and on and on. He turns the wrong key in the wrong way in the wrong lock. The inanimate world is out to get him: he’s the constant ‘fall guy’, the victim, the unfairly done to, one might say.

Another good example is Benny Profane, the hero of Thomas Pynchon’s novel “V”. He’s not called “Profane” for nothing (Pynchon is never in want of a name for his characters): he’s called Profane because he’s not on the inside, he’s not in the know, he’s his own hopeless, honest self, not finely tuned enough to the way society works. So he’s the perfect vehicle to walk through Pynchon’s marvellous novel; who better to stumble through everything that happens, an innocent non-protagonist if ever there was one. And an essential feature of his character is his constant bumping up against the inanimate world as if it were hostile, though no silly conspiracy theories are ever invoked. The inanimate world is constantly waiting to play trivial or life-threatening tricks on him: lamp posts are badly placed, well-made beds collapse, phones don’t work, buses aren’t there when they should be, street signs point the wrong way, numbers on houses are out of synch. A great scene in “V” is when Benny, standing in an empty street, annoyed at something, kicks the wheel of a car. “I’ll pay for that”, he says to himself.

Profanity is described in dictionaries as ‘bad language’, but its etymology goes back to ‘lack of respect for things that are held to be sacred’. And there’s the clue. Profanity, the thing that Neil and I wanted to discuss that night, is better described as dislocation, an inability to “get” what this respect for sacred things is all about. Never articulated, it stems from an inability to come to terms with the way things are. Why does the social world we live in pretend to respect so many things that it so obviously flouts? Why is our society so horrendously hypocritical? Why does a third of the world’s population live in such horrendous conditions? Why … well, you get the idea – although Inspector Clouseau and Benny don’t.

Of course, in any political sense, the vast majority of the world’s population is profane –  outside the fold – and that, no doubt, should be the focus of our attention. In psychological terms, Neil’s heroes – Foucault, Derrida, and Lacan particularly – insist on profanity (the rejection of respect) when examining how individuals experience their lives emotionally and intellectually. Lacan returns to Freud, but famously does something which strikes me as similar to what Marx did to Hegel. (I know a bit about what Marx did to Hegel, but I know as much about Lacan as Neil has forgotten while eating a deep fried Mars bar, so it’s probably all bollocks, and I hope he’ll reply.) Lacan’s Mirror stage claims that the ego is an object rather than a subject: the ego is not the seat of a free, true “I” determining its own fate, rather it’s neurotic to its very core. The ego is something exterior – crystallized as “the desire of the Other”. Amen to that.  

I take this to be one of many theories of alienation – which have in common the idea that we are, as it were, beside ourselves, lacking authenticity. My favourite attempt among philosophers to “explain” this has always been Camus’; the least philosophically sophisticated, the most appealing somehow (a bit like Krashen’s theory of SLA, maybe!). Alienation is our biggest problem, and to get over it, we need to live in societies best described by anarchists, which means we need a revolution which overturns the State.

Meanwhile, what about the particular manifestation of alienation that Neil and I were talking about, that profane, awkward, annoying bumping up against the inanimate world? How can we negotiate the inanimate world more smoothly? How can we avoid so many infuriating encounters with the stuff around us? How can we avoid our sleeves getting snagged on banisters? How can we nonchalantly walk through those revolving doors? How can we turn the right key the right way in the right lock?  Only revolution will do it. We can’t be who we want to be while capitalism rules us. But maybe we can learn from Eastern stuff – Zen and all that. The Beatles’ “Let it be” is probably the most stupid song ever sung, but Zen and Taoist texts are full of good advice.  I think of things like “Saunter along and stop worrying”… “If the mind troubles the mind how can you avoid a great confusion”, which I’m sure are misquotes. They suggest that we can alter our behaviour, put the right key in the right door because we don’t care or something. And maybe, just maybe, challenge Lacan’s view of us.    

I hope my chum Neil will respond.       

Carroll’s AIT: Part 5

I’m aware that I haven’t done a good job of describing Carroll’s AIT. Last week, I bought a Kindle version of Sharwood Smith and Truscott’s (2014) The Multilingual Mind (currently on offer for 19 euros – hurry, hurry), which presents their MOGUL (Modular On-line Growth and Use of Language) theory, and I’m very impressed with its coherence, cohesion and readability. It relies partly on Jackendoff, and describes Carroll’s AIT much more succinctly than I’ve managed. I highly recommend the book.

I’ll continue examining bits of AIT and its implications, before trying to make sense of the whole thing and reflecting on some of the disagreements among those working in the field of SLA. In this post, I’ll look at Carroll’s AIT in order to question the use by many SLA theories of the constructs of input, intake, and noticing.

Recall that Jackendoff’s system comprises two types of processor:

  • integrative processors, which build complex structures from the input they receive, and
  • interface processors, which relate the workings of adjacent modules in a chain.

The chain consists of three main links: phonological, syntactic, and conceptual/semantic structure, each level having an integrative processor connected to the adjacent level by means of an interface processor.

Carroll takes Jackendoff’s theory and argues that input must be seen in conjunction with a theory of language processing: input is the representation that is received by one processor in the chain.
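The point can be sketched in code. This is purely my own toy illustration, not Carroll’s or Jackendoff’s formalism; every function name and representation below is invented:

```python
# Toy sketch of a processing chain in which each processor's output is
# the *input* to the next. There is no single 'input from outside':
# only the first link ever sees the raw stimulus.

def phonological_processor(stimulus):
    # Builds a phonological representation from the acoustic stimulus.
    return {"level": "phonological", "content": stimulus.split()}

def syntactic_processor(phon_rep):
    # Its input is the phonological representation, not the raw stimulus.
    return {"level": "syntactic", "content": list(enumerate(phon_rep["content"]))}

def conceptual_processor(syn_rep):
    # Its input is the syntactic representation.
    return {"level": "conceptual", "content": len(syn_rep["content"])}

def process(stimulus):
    # Each link in the chain receives its own kind of 'input'.
    return conceptual_processor(syntactic_processor(phonological_processor(stimulus)))
```

On this picture, asking what ‘the input’ is only makes sense relative to a particular processor in the chain.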


Thus, Carroll argues, the view of ‘input from outside’ is mistaken: input is a multiple phenomenon where each processor has its own input, which is why Carroll refers to ‘environmental stimuli’ to denote the standard way in which ‘input’ is seen. Stimuli only become input as the result of processing, and learning is a function not of the input itself, but rather of what the system does with the stimuli. In order to explain SLA, we must explain this system. Carroll’s criticism is that the construct ‘input’ is used in many theories of SLA as a cover term which hides an extremely complex process, beginning with the processing of acoustic (and visual) events as detected by the learner’s sensory processing mechanisms.

Carroll’s view is summarised by Sharwood Smith and Truscott (2014, p. 212) as follows:

The learner initially parses an L2 using L1 parsing procedures and when this inevitably leads to failure, acquisition mechanisms are triggered and i-learning begins. New parsing procedures for L2 are created and compete with L1 procedures and only win out when their activation threshold has become sufficiently low. These new inferential procedures, adapted from proposals by Holland et al. (1986), are created within the constraints imposed by the particular level at which failure has taken place. This means that a failure to parse in PS [Phonological Structures], for example, will trigger i-learning that works with PS representations that are currently active in the parse and it does so entirely in terms of innately given PS representations and constraints, hence the ‘autonomous’ characterisation of AIT (Holland et al. 1986, Carroll 2001: 241–2).

Of course, the same process described for PS is applied to each level of processing.
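The failure-driven logic in the quoted passage can be caricatured in a few lines. This is my own drastic simplification, not Carroll’s mechanism: here ‘failure’ simply means no existing procedure applies, and ‘winning out’ is modelled as having the lowest activation threshold among applicable procedures.

```python
class Procedure:
    def __init__(self, name, applies_to, threshold):
        self.name = name
        self.applies_to = applies_to  # predicate over incoming forms
        self.threshold = threshold    # lower = more easily activated

def parse(form, procedures):
    """Use the most easily activated applicable procedure; on failure,
    trigger i-learning: create a new procedure for this form."""
    candidates = [p for p in procedures if p.applies_to(form)]
    if candidates:
        winner = min(candidates, key=lambda p: p.threshold)
        winner.threshold = max(0, winner.threshold - 1)  # use lowers the threshold
        return winner.name
    # Parse failure: an L2 procedure is created, initially hard to activate.
    procedures.append(Procedure("L2-proc-for-" + form,
                                lambda f, form=form: f == form, 10))
    return None

procedures = [Procedure("L1-proc", lambda f: f == "L1-form", 2)]
parse("L2-form", procedures)        # fails; i-learning creates an L2 procedure
for _ in range(9):
    parse("L2-form", procedures)    # repeated activation lowers its threshold
```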

Both parsing, which takes place in working memory, and i-learning, which draws on long-term memory, depend, to some extent, on innate knowledge of the sort described by Jackendoff, where the lexicon plays a key role, incorporating the rules of grammar and encoding relationships among words and among grammatical patterns.

Jackendoff’s theory supposes that just about everything going on in the language faculty happens unconsciously – it’s all implicit learning – and Carroll’s use of Holland et al.’s theory of induction is similarly based on implicit learning.     

So what does all this say about other processing theories of SLA?

Here’s Krashen’s model:

Any comprehensible input that passes through the affective filter gets processed by UG and becomes acquired knowledge. Learnt knowledge acts as a monitor. Despite its tremendous appeal, the model is unsatisfactory: none of these stages is described by clear constructs, and the theory is hopelessly circular. McLaughlin (1978) and Gregg (1984) provide the best critiques of this model.

Then we have Schmidt’s Noticing Hypothesis

Here again, input is never carefully defined – it’s just the language that the learner hears or reads. What’s important, for Schmidt, is “noticing”. This is the gateway to “intake”, defined as that part of the input consciously registered and worked on by the learner in “short/medium-term memory” (I take this to be working memory), which then gets integrated into long-term memory, where it develops the learner’s interlanguage. So noticing is the necessary and sufficient condition for L2 learning.

I’ve done several posts on Schmidt’s Noticing Hypothesis (search for them in the Search bar on the right), so here let me just say that it’s now generally accepted that ‘noticing’, in the sense of conscious attention to form, is not a necessary condition for learning an L2: the hypothesis is false. The amended, much weaker version, namely that “the more you notice the more you learn”, is a way of rescuing the hypothesis, and has been, in my opinion, too quickly accepted by SLA scholars. I’m personally not convinced that even this weak version can be accepted; it needs careful refinement, surely. In any case, Schmidt’s model makes the same mistake as Krashen’s: in starting with an undefined construct of input, it puts the cart before the horse. (Note that this has nothing to do with Scott Thornbury’s judgement on UG, as discussed in an earlier post.) As Carroll says, we must start with stimuli, not input, and then explain how those stimuli are processed.

Finally, there’s Gass’s Model (1997), which offers a more complete picture of what happens to ‘input’.

Gass says that input goes through stages of apperceived input, comprehended input, intake, integration, and output, thus subdividing Krashen’s comprehensible input into three stages: apperceived input, comprehended input, and intake. Gass stresses the importance of negotiated interaction in facilitating the progress from apperceived input to comprehended input, adopting Long’s construct of negotiation for meaning, which refers to what learners do when there’s a failure in communicative interaction. As a result of this negotiation, learners get more “usable input”, they give “attention” (of some sort) to problematic features in the L2, and they make mental comparisons between their IL and the L2 which lead to refinement of their current interlanguage.
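Gass’s stages can be pictured as a pipeline in which each stage passes on only a subset of what it receives. The sketch below is my own schematic rendering; the set arguments stand in for processes (attention, comprehension, IL comparison) that Gass describes, and are not constructs of hers:

```python
def gass_pipeline(ambient_language, attended, understood, matched):
    # apperceived input: what the learner's attention picks out
    apperceived = [x for x in ambient_language if x in attended]
    # comprehended input: the apperceived subset actually understood
    comprehended = [x for x in apperceived if x in understood]
    # intake: comprehended input matched against the current interlanguage
    intake = [x for x in comprehended if x in matched]
    return intake  # candidates for integration, and eventually output

forms = ["-ed", "-ing", "-s", "do-support"]
intake = gass_pipeline(forms, attended={"-ed", "-ing", "-s"},
                       understood={"-ed", "-s"}, matched={"-ed"})
```

The shrinking lists are the point: on this view, most ambient language never makes it past apperception.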

But still, what is ‘apperceived input’? Gass says it’s the result of ‘attention’, akin to Tomlin and Villa’s (1994) construct of ‘orientation’; and Schmidt says it’s the same as his construct of ‘noticing’. So is it a conscious process, then, taking place in working memory? Just to finish the story, Long, in Part 4 of this exploration of AIT, says this:

Genuinely communicative L2 interaction provides opportunities for learners focused on meaning to pick up a new language incidentally, as an unintended by-product of doing something else — communicating through the new language — and often also implicitly, i.e., without awareness. Interacting in the L2 while focused on what is said, learners sometimes perceive new forms or form-meaning-function associations consciously — a process referred to as noticing (Schmidt, 1990, 2010). On other occasions, their focus on communication and attention to the task at hand is such that they will perceive new items in the input unconsciously — a process known as detection (Tomlin & Villa, 1994). Detection is especially important, for as Whong, Gil, & Marsden (2014) point out, implicit learning and (barring some sort of consciousness-raising event) the end-product, implicit knowledge, is what is required for real-time listening and speaking.

Long makes a distinction between ‘implicit’ and ‘incidental’ learning. ‘Implicit’ means unconscious, while ‘incidental’, I think, means conscious, and refers to Schmidt’s ‘noticing’. Long then says that ‘implicit’ learning, whereby “learners perceive new items in the input unconsciously” via “a process known as detection (Tomlin & Villa, 1994)”, is “especially important”, because “implicit knowledge is what is required for real-time listening and speaking”.

So here we have yet another construct: ‘detection’. ‘Detection’ is the final part of Tomlin and Villa’s (1994) three-part process of ‘attention’. Note first that they claim that ‘awareness’ (defined as “the subjective experience of any cognitive or external stimulus” (p. 194), and the crucial part of Schmidt’s ‘noticing’ construct) can be dissociated from attention, and that awareness is not required for attention. With regard to attention, three functions are involved: alertness, orientation, and detection.

Alertness = an overall, general readiness to deal with incoming stimuli or data.

Orientation = the attentional process responsible for directing attentional resources to some type or class of sensory information at the exclusion of others. When attention is directed to a particular piece of information, detection is facilitated.

Detection = “the cognitive registration of sensory stimuli” and “the process that selects, or engages, a particular and specific bit of information” (Tomlin & Villa, 1994, p. 192). Detection is responsible for intake of L2 input: detected information gets further processing.
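Put crudely (this is my own toy model, not Tomlin and Villa’s formalism), the three functions nest like this – and note that nothing in the sketch corresponds to awareness:

```python
def attend(stimuli, alert, oriented_class):
    # alertness: an overall, general readiness to deal with incoming stimuli
    if not alert:
        return []
    # orientation: attentional resources directed at one class of
    # information, at the exclusion of others
    oriented = [s for s in stimuli if s["kind"] == oriented_class]
    # detection: cognitive registration of specific items; only detected
    # items go forward for further processing (i.e. become intake)
    return [s["item"] for s in oriented]

detected = attend([{"kind": "form", "item": "-ed"},
                   {"kind": "meaning", "item": "pastness"}],
                  alert=True, oriented_class="form")
```

Orienting to form increases the chance of detecting the form item, at the cost of the meaning item going undetected, which is just the trade-off Tomlin and Villa describe.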

Gass claims that “apperceived input” is conscious, the same as Tomlin and Villa’s ‘orientation’, but is Gass’s third stage, ‘comprehended input’, the same as Tomlin and Villa’s ‘detection’? Well, perhaps. ‘Comprehended input’ is “potential intake” – it’s information which has the possibility of being matched against existing stored knowledge, ready for the next stage, ‘integration’, where the new information can be used for confirmation or reformulation of existing hypotheses. However, if detection is unconscious, then comprehended input is also unconscious; but Gass insists that comprehended input is partly the result of negotiation of meaning, which, Long insists, involves not just detection but also noticing.

This breaking down of the construct of attention into more precise parts is supposed to refine Schmidt’s work. Schmidt starts with the problem of conscious versus unconscious learning, and breaks ‘consciousness’ down into three parts: consciousness as awareness; consciousness as intention; and consciousness as knowledge. As to awareness, Schmidt distinguishes between three levels: Perception, Noticing and Understanding, and the second level, ‘noticing’, is the key to Schmidt’s eventual hypothesis. Noticing is focal awareness. Trying to solve the problem of how ‘input’ becomes ‘intake’, Schmidt’s answer is crystal clear, at least in its initial formulation: ‘intake’ is “that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently. If noticed, it becomes intake” (Schmidt, 1990: 139). And the hypothesis that ‘noticing’ drives L2 learning is plain wrong.

Tomlin and Villa want to recast Schmidt’s construct of noticing as “detection within selective attention”.

Acquisition requires detection, but such detection does not require awareness. Awareness plays a potential support role for detection, helping to set up the circumstances for detection, but it does not directly lead to detection itself. In the same vein, the notion of attention to form articulated by VanPatten (1989, 1990, in press) seems very much akin to the notion of orientation in the attentional literature; that is, the learner may bias attentional resources to linguistic form, increasing the likelihood of detecting formal distinctions but perhaps at the cost of failing to detect other components of input utterances. Finally, input enhancement in instruction, the bringing to awareness of critical form distinctions, may represent one way of heightening the chances of detection. Meta-descriptions of linguistic form may help orient the learner to salient formal distinctions. Input flooding may increase the chances of detection by increasing the opportunities for it (Tomlin and Villa, 1994, p. 199).

Well, as far as forming part of a coherent theory of SLA is concerned, I don’t think Tomlin and Villa’s treatment of attention stands up to scrutiny, for all sorts of reasons, many of them teased out by Carroll. Nevertheless, the motivation for this detailed attempt to understand attention, apart from carrying on the work of refining processing theory, is clearly revealed in the above quote: what’s being proposed is that L2 learning is mostly implicit, but that this implicit learning needs to be supplemented by occasional, crucial, conscious attention to form, which triggers ‘orientation’ and enables ‘detection’. An obvious payoff is the improved efficacy of teaching! And that, I think, is at the heart of Mike Long’s view – and of Nick Ellis’, too.

But it doesn’t do what Carroll (2001, p. 39) insists a theory of SLA should do, namely give

  1. a theory of linguistic knowledge;
  2. a theory of knowledge reconstruction;
  3. a theory of linguistic processing;
  4. a theory of learning.  

When I look (yet again!) at Chapter Three of Long’s (2015) book SLA & TBLT, I find that his eloquently described “Cognitive-Interactionist Theory of SLA” relies on carefully selected “Problems and Explanations”. Its prime concern is “Instructed SLA”, and it revolves around the problem of why most adult L2 learning is “largely unsuccessful”. It’s not an attempt to construct a full theory of SLA, and I’m quite sure that Long knew exactly what he was doing when he confined himself to articulating his four problems and eight explanations. Maybe this also explains Mike’s comment in the recent post here:

I side with Nick Ellis and the UB (not UG) hordes. Since learning a new language is far too large and too complex a task to be handled explicitly, and although it requires more input and time, implicit learning remains the default learning mechanism for adults.

This looks to me like a good indication of the way things are going.

“Do you fancy a bit more wine?” my wife asks, proffering a bottle of chilled 2019 Viña Esmeralda (a cheeky, very fruity wine; always get the most recent year). “Is the Pope a Catholic?” says I, wishing I had Neil McMillan’s ready ability to come up with a more amusing quote from Pynchon.      


Carroll, S. (1997) Putting ‘input’ in its proper place. Second Language Research, 15(4), 337–388.

Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Gass, S. (1997) Input, Interaction and the Second Language Learner. Mahwah, NJ: Lawrence Erlbaum Associates.

Gregg, K. R. (1984) Krashen’s monitor and Occam’s razor. Applied Linguistics 5, 79-100.

Krashen, S. (1985) The Input Hypothesis: Issues and Implications. New York: Longman.

Long, M. H. (2015). Second language acquisition and Task-Based Language Teaching. Oxford: Wiley-Blackwell.

McLaughlin, B. (1987) Theories of Second Language Learning.  London: Edward Arnold.

Schmidt, R. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129-58.

Schmidt, R. (2001) Attention.  In Robinson, P. (ed.) Cognition and Second Language Instruction.  Cambridge: Cambridge University Press, 3-32.

Sharwood Smith, M., & Truscott, J. (2014) The Multilingual Mind: A Modular Processing Perspective. Cambridge: Cambridge University Press.

Tomlin, R., & Villa, H. (1994). Attention in cognitive science and second language acquisition. Studies in Second Language Acquisition, 16, 183-203.

Scott Thornbury on UG and UB Theories of SLA

This is a parenthesis. I’d planned to devote Part 4 of ‘Carroll’s AIT’ to assessing how successfully it bridges the gap between nativist and emergentist views of SLA, but I pause so as to answer Scott Thornbury’s question on Twitter (18th May), which amounts to asking why a theory of SLA has to start by describing what it wants to explain.

Really! I mean, you’d have thought a bright, well-read chap like Scott already knew, right? But he doesn’t, and the problem is that he’s a very influential voice in ELT, likely to have a potentially damaging influence on his legions of followers, some of whom might not detect when – just now and then – he talks baloney. Chomsky’s work seems to be a particular blind spot for Scott; he has a very bad record indeed when it comes to writing and talking about Universal Grammar. I’m told by my friend Neil that he’s not much better when it comes to appreciating Lacan’s paradigm shifting critique of Breton’s Surrealist Manifestos, either (see McMillan Mirroring Despair Among the ELT Ungodly, in press). Nemo Sine Vitiis Est, eh what.

The Question

In Part 3, resuming the story so far, I said “A theory of SLA must start from a theory of grammar”. Soon afterwards, Scott Thornbury tweeted, on that highest, silliest horse of his, which, thankfully, he reserves for his discussion of Chomsky:

“A theory of SLA must start from a theory of grammar.” Why? Who said? I’d argue the reverse ‘A theory of grammar must start from a theory of [S/F]LA’. Grammar is the way it is because of the way languages are learned and used.

Other gems in his tweets included:

The theory comes later. Induction, dear boy. Empirical linguistics, by another name.

Chomsky’s error was to start with a theory of grammar and then extrapolate a theory of language acquisition to fit. Cart before horse.

“The theory of UG…is an innate property of the human mind. In principle, we should be able to account for it in terms of human biology.” In principle, maybe. In practice, we can’t. That’s cart-before-horsism.

‘UG a powerful theory?’ Based on made-up and often improbable data? Incapable of explaining variability & change? Creationism is a powerful theory, too – if you ignore the fossil evidence.

Luckily, I was sitting in a comfortable chair, joyfully making my way through a crate of chilled, cheeky young Penedes rosé wines (delivered to me by an illegal supplier using drones “lent” to him by Catalan extremists intent on breaking the Madrid inspired lockdown), when I read these extraordinary remarks. Otherwise, I might have stopped breathing. Anyway, let’s try to answer Scott’s tweets.

White (1996: 85) points out:

A theory of language acquisition depends on a theory of language. We cannot decide how something is acquired without having an idea of what that something is.

Carroll (2001) and Gregg (1993) agree: a theory of SLA has to consist of two parts:

1) the explanandum: a property theory which describes WHAT is learned – what the learner knows, what makes up a learner’s mental grammar. It consists of various classifications of the components of the language, the L2, and how they work together.

2) the explanans: a transition theory which explains HOW that knowledge is learned.

Property Theories

So WHAT is human language? I’ve dealt with this in a previous post, so suffice it to say here that it’s a system of form-meaning mappings, where meaningless elementary structures of language combine to make meaning; the sounds of a language combine to make words, and the words combine to make sentences. There are various property theories, but let’s focus on just two: UG and Construction Grammar, the second being the description of language offered by emergentists, who take a Usage-based (UB) approach to a theory of SLA.


Chomsky’s model of language distinguishes between competence and performance; between the description of underlying knowledge, and the use of language. Chomsky refers to the underlying knowledge of language which is acquired as “I-Language”, and distinguishes it from “E-Language”, which is everyday speech – performance data of the sort you get from a corpus of oral texts.

“I-Language” obeys the rules of Universal Grammar, among which are structure dependency, C-command and government theory, and binding theory. These principles of UG operate with certain open parameters which are fixed as the result of input to the learner. As the parameters are fixed, the core grammar is established. The principles are universal properties of syntax which constrain learners’ grammars, while parameters account for cross-linguistic syntactic variation, and parameter setting leads to the construction of a core grammar where all relevant UG principles are instantiated.

UB Construction Grammar

The basic units of language representation are Constructions, which are form-meaning mappings. They are symbolic: their defining properties of morphological, syntactic, and lexical form are associated with particular semantic, pragmatic, and discourse functions. Constructions comprise concrete and particular items (as in words and idioms), more abstract classes of items (as in word classes and abstract constructions), or complex combinations of concrete and abstract pieces of language (as in mixed constructions). Constructions may be simultaneously represented and stored in multiple forms, at various levels of abstraction (e.g., concrete item: table+s = tables and [Noun] + (morpheme +s) = plural things). Linguistic constructions (such as the caused motion construction, X causes Y to move Z path/loc [Subj V Obj Obl]) can thus be meaningful linguistic symbols in their own right, existing independently of particular verbs. Nevertheless, constructions and the particular verb tokens that occupy them resonate together, and grammar and lexis are inseparable (Ellis and Cadierno, 2009).
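The claim that constructions are stored simultaneously at several levels of abstraction can be rendered as a simple data structure. This is my own illustrative encoding, using the examples from the paragraph above:

```python
# Each construction is a form-meaning pairing; the same fact (e.g. plural
# 'tables') may be represented concretely and schematically at once.
constructions = [
    {"form": "tables", "meaning": "more than one table",
     "level": "concrete item"},
    {"form": "[Noun]+s", "meaning": "plural things",
     "level": "abstract schema"},
    {"form": "[Subj V Obj Obl]",
     "meaning": "X causes Y to move Z path/loc",
     "level": "abstract construction"},
]

def lookup(form):
    # Grammar and lexis live in the same inventory: retrieval is the same
    # operation whether the 'form' is a word or an abstract schema.
    return [c for c in constructions if c["form"] == form]
```

The single flat inventory is the design point: there is no separate ‘grammar module’ and ‘lexicon’ to consult.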

Transition theories  

So HOW do we learn an L2?


UG offers no L2 transition theory, “y punto” (“and that’s that”), as they say in Spanish. UG says that all human beings are born with an innate grammar – a fixed set of mental rules that enables children to create and utter sentences they have never heard before. Thus, language learning is facilitated by innate knowledge of a set of abstract principles that characterise the core grammars of all natural languages. This knowledge constrains possible grammar formation in such a way that children do not have to learn those features of the particular language to which they are exposed that are universal, because they know them already. This “boot-strapping” device, sometimes referred to as the “Language Acquisition Device” (LAD), is the best explanation of how children know so much more about their L1 than can be got from the language they are exposed to. The ‘poverty of the stimulus’ argument is summed up by White:

Despite the fact that certain properties of language are not explicit in the input, native speakers end up with a complex grammar that goes far beyond the input, resulting in knowledge of grammaticality, ungrammaticality, ambiguity, paraphrase relations, and various subtle and complex phenomena, suggesting that universal principles must mediate acquisition and shape knowledge of language (White 1989: 37).

Note that this refers to L1 acquisition. Those who take a UG view of SLA concentrate on the re-setting of parameters to partly explain how the L2 is learnt.  

Just by the way, I suppose I should deal with Scott’s accusations that UG is “based on made-up and often improbable data” and is “incapable of explaining variability & change”. First, the data pertaining to UG are contained in more than 60 years of research – hundreds of thousands of studies – whose results scholars (including UB theorists such as Nick Ellis, Tomasello and Larsen-Freeman) acknowledge as having contributed more to the advancement of the field than the work inspired by any other linguist in history. Second, that UG is incapable of explaining variability and change is hardly surprising, since it doesn’t attempt to, any more than Darwin’s theory attempts to explain tsunamis. To coin a phrase: It’s a question of domains, dear boy.


The usage-based theory of language learning is based on associative learning – “acquisition of language is exemplar based” (Ellis, 2002: 143). “A huge collection of memories of previously experienced utterances” underlies the fluent use of language. Thus, language learning is seen as “the gradual strengthening of associations between co-occurring elements of the language”, and fluent language performance as “the exploitation of this probabilistic knowledge” (Ellis, 2002: 173).

Note that this is part of a general learning theory. Those who take a UB view of SLA see it as affected by L1 learning.  
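The “gradual strengthening of associations between co-occurring elements” and the “exploitation of this probabilistic knowledge” quoted above can be caricatured in a few lines. This is my own toy bigram model, far cruder than anything Ellis proposes:

```python
from collections import Counter

associations = Counter()

def experience(utterance):
    # Each experienced utterance strengthens the links between its
    # co-occurring (here: adjacent) elements.
    words = utterance.split()
    for pair in zip(words, words[1:]):
        associations[pair] += 1

def predict_next(word):
    # Fluent performance as exploitation of probabilistic knowledge:
    # pick the most strongly associated continuation.
    options = {b: n for (a, b), n in associations.items() if a == word}
    return max(options, key=options.get) if options else None

for _ in range(3):
    experience("the cat sat")
experience("the dog sat")
```

Nothing in the sketch is language-specific, which is exactly the UB point: the learning mechanism is general, and the ‘grammar’ is just accumulated, graded statistics over experienced usage.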


To paraphrase Gregg (2003), nativist (UG) theories posit an innate representational system specific to the language faculty, and non-associative mechanisms, as well as associative ones, for bringing that system to bear on input to create an L2 grammar. UB theories deny both the innateness of linguistic representations and any domain-specific language learning mechanisms. For UB, input from the environment, plus elementary processes of association, are enough to explain SLA.

Clearing up the muddle about UG

Gregg (2003) discusses four “red herrings” used by those arguing against UG. I’ll paraphrase two of them, because they address Scott’s remarks.

The Argument from Vacuity

The argument is: calling a property ‘innate’ does not solve anything: it simply calls a halt to investigation of the property.

First, innate properties are generally assumed in science  – circulation of the blood is one such property. As for language, the jury is still out. Thus it is question-begging to argue that calling UG innate prevents us from investigating how language is learned.

Second, criticism of the ‘innateness hypothesis’ often rests on a caricature of the argument from the Poverty of the Stimulus (POS), viz., ‘Property P cannot be learned; therefore it is innate’. But in fact the POS argument is more nuanced:

1)         An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.

2)         The correct set of principles need not be (and typically is not) in any pretheoretic sense simpler or more natural than the alternatives.

3)         The data that would be needed for choosing  among these sets of principles are in many cases not the sort of data that are available to an empiricist learner.

4)         So if children were empiricist learners, they could not reliably arrive at the correct grammar for their language.

5)         Children do reliably arrive at the correct grammar for their language.

6)         Therefore, children are not empiricist learners.

Emergentists need to show that the POS argument for language acquisition is false, by showing that empiricist learning suffices for language acquisition. In other words, they need to show that the stimuli are not impoverished, that the environment is indeed rich enough, and rich in the right ways, to bring about the emergence of linguistic competence (Gregg, 2003, p. 101).

The Argument from Neuroscience

As MacWhinney puts it, ‘Nativism is wrong because it makes untestable assumptions about genetics and unreasonable assumptions about the hard-coding of complex formal rules in neural tissue’ (2000: 728) (Gregg, 2003, p.103).

As Gregg says, the claim that there is an innate UG is a claim about the mind, not about the brain. If brain science could show that it is impossible to instantiate UG in a brain, the claim of an innate UG would clearly fail, but brain science has not shown any such impossibility; indeed, brain science has not yet been able to show how any cognitive capacity is instantiated. Thus, pace Scott Thornbury, to date, neuroscience cannot support any emergentist claim about the development of interlanguages. Furthermore, Scott seems to think that neurological explanations are somehow more ‘real’ or ‘basic’ than cognitive explanations; which is, in Gregg’s opinion (2003, p. 104) “a serious mistake”.

It is not simply that the current state of the art does not yet permit us to propose theories of language acquisition at the neural level; it is rather that the neural level is likely not ever to be the appropriate level at which to account for cognitive capacities like language, any more than the physical level is the appropriate level at which to account for the phenomena of economics… There is no reason whatever for thinking that the neural level is now, or ever will be, the level where the best explanation of language competence or language acquisition is to be found. In short, whether there is a UG, and if there is, whether it is innate, are definitely open questions; but they cannot be answered in the negative merely by appealing to neural implausibility.

UB Theories

Scott has expressed his enthusiasm for UB theories for some time now, but he has done little to support this enthusiasm with rational argument. I’ve commented on the limitations of his grasp of the issues in other posts (see, for example, Thornbury on Performance, and Thornbury Part 1), so it’s enough to say here that he fails to adequately address the criticisms of UB theories made by many scholars, including Carroll and Gregg. The basic problems that UB theories face are these:

Property theory: UB theory suggests that SLA is explained by the learner’s ability to do distributional analyses and to remember the products of the analyses. So why do UB theorists accept the validity of the linguist’s account of grammatical structure? And which bits do they accept? Ellis accepts NOUN, PHRASE STRUCTURE and STRUCTURE-DEPENDENCE, for example. As Gregg comments, “Presumably the linguist’s descriptions simply serve to indicate what statistical associations are relevant in a given language, hence what sorts of things need to be shown to ‘emerge’”.

Transition Theory: Language is acquired through associative learning – through what Nick Ellis calls ‘learners’ lifetime analysis of the distributional characteristics of the input’ and the ‘piecemeal learning of thousands of constructions and the frequency-biased abstraction of regularities’. To borrow from Scott’s screaming protests about claims for UG: “Where’s the evidence?” Well, the only evidence is the models of associative learning processes provided by connectionist networks. But, as Gregg (2003) so persuasively demonstrates, these connectionist models provide very little evidence to support the emergentist transition theory. Scott has made no attempt to reply to Gregg’s criticisms. Let me give just one part of Gregg’s argument, the part that deals with the connectionist claim that their models are ‘neurally inspired’.

The use of the term ‘neural network’ to denote connectionist models is perhaps the most successful case of false advertising since the term ‘pro-life’. Virtually no modeller actually makes any specific claims about analogies between the model and the brain, and for good reason: As Marinov says, ‘Connectionist systems … have contributed essentially no insight into how knowledge is represented in the brain’ (Marinov, 1993: 256). Christiansen and Chater, who are themselves connectionists, put it more strongly: ‘But connectionist nets are not realistic models of the brain . . ., either at the level of individual processing unit, which drastically oversimplifies and knowingly falsifies many features of real neurons, or in terms of network structure, which typically bears no relation to brain architecture’ (1999: 419). In particular, it should be noted that backpropagation, which is the learning algorithm almost universally used in connectionist models of language acquisition, is also universally recognized to be a biological impossibility; no brain process known to science corresponds to backpropagation (Smolensky, 1988; Clark, 1993; Stich, 1994; Marcus, 1998b).

I challenge Scott to answer the arguments so clearly laid out in Gregg’s (2003) article.

Finally, I can’t resist a quote from Eubank and Gregg’s (2002) article.

“And of course it is precisely because rules have a deductive structure that one can have instantaneous learning, without the trial  and error involved in connectionist learning.  With the English past tense rule, one can instantly determine the past tense form of “zoop” without any prior experience of that verb, let alone of “zooped” (unlike, say,  Ellis & Schmidt’s model, which could only approach the “correct” plural form for the test item, and only after repeated exposures to the singular form followed by repeated exposures to the plural form, along with back-propagated comparisons).  If all we know is that John zoops wugs, then we know instantaneously that John zoops, that he might have zooped yesterday and may zoop tomorrow, that he is a wug-zooper who engages in wug-zooping, that whereas John zoops, two wug-zoopers zoop, that if he’s a Canadian wug-zooper he’s either a Canadian or a zooper of Canadian wugs (or both), etc.  We know all this without learning it, without even knowing what “wug” and “zoop” mean.  A frequency / regularity account would need to appeal to a whole congeries of associations, between a large number of pairs like “rum-runner/runs rum, piano-tuner/tunes pianos, …”  but not like “harbor-master/masters harbors, pot-boiler/boils pots, kingfisher/fishes kings, …”, or a roughly equal number of pairs like “purple people-eater” meaning purple people or purple eater, etc.”

Follow that!
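The contrast Eubank and Gregg draw can be put as a toy program. This is my own sketch, not their model, nor Ellis and Schmidt’s actual network; the verb “zoop” is theirs, but the two learner implementations are invented for illustration. A rule with a deductive structure generalises to a novel verb instantly; a purely exemplar-based learner has nothing to say until it has accumulated exposures.

```python
# Toy contrast: deductive rule vs. frequency-based association.

def rule_past(verb):
    """A rule with deductive structure: applies to 'zoop' on its first
    encounter, with no prior experience of the verb whatsoever."""
    return verb + "ed"

class ExemplarLearner:
    """A purely associative learner: can only strengthen form-form
    associations through repeated exposure."""
    def __init__(self):
        self.memory = {}          # verb -> {past_form: association strength}

    def expose(self, verb, past_form):
        forms = self.memory.setdefault(verb, {})
        forms[past_form] = forms.get(past_form, 0) + 1

    def past(self, verb):
        forms = self.memory.get(verb)
        if not forms:
            return None           # never encountered: no basis for a response
        return max(forms, key=forms.get)

learner = ExemplarLearner()
print(rule_past("zoop"))      # 'zooped' - instantaneous, zero exposures
print(learner.past("zoop"))   # None - the exemplar learner is silent
learner.expose("zoop", "zooped")
print(learner.past("zoop"))   # 'zooped' - but only after experience
```

The point of the sketch is exactly Eubank and Gregg’s: the rule-based learner gets “zooped” for free, while the associative learner can only approach it through exposure.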


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Ellis, N. (2002) Frequency effects in language processing. SSLA, 24, 2.

Eubank, L., & Gregg, K. (2002) News flash – Hume still dead. Studies in Second Language Acquisition, 24, 2, 237-247.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Gregg, K. R. (2003) The state of emergentism in second language acquisition. Second Language Research, 19, 2, 95–128.

Mike Long: Reply to Carroll’s comments on the Interaction Hypothesis

I’m very grateful to Mike Long for taking the time to write a quick response to Carroll’s comments on his Interaction Hypothesis, which I quoted in the latest post on Carroll’s Autonomous Induction Theory. His email to me is reproduced below, with his permission.

Important issues are raised – the roles of noticing versus “detection”, the reach of negative feedback, and, most importantly, perhaps, the statement “I side with Nick Ellis and the UB (not UG) hordes” – which I’ll try to address, once I’ve stopped sobbing, in Part 4.

Hi, Geoff,

Thank you for the valuable work you do with your blog, and just as important, the fact that it is also usually funny (in a good way).

It has been nice to see Susanne Carroll’s work getting an airing of late. I sent a several-page comment on your Part 3 via the Comment form yesterday, but it apparently disappeared into the ether. As I have a day job, and it’s term-paper reading week, what follows is a quick and dirty rehash.

Much of the critique the pair of you leveled against the Interaction Hypothesis (IH) focused on one dimension only: negotiation for meaning and the negative feedback (NF) it produces. But the IH and negotiation for meaning are a whole lot broader than that.

Genuinely communicative L2 interaction provides opportunities for learners focused on meaning to pick up a new language incidentally, as an unintended by-product of doing something else — communicating through the new language — and often also implicitly, i.e., without awareness. Interacting in the L2 while focused on what is said, learners sometimes perceive new forms or form-meaning-function associations consciously — a process referred to as noticing (Schmidt, 1990, 2010). On other occasions, their focus on communication and attention to the task at hand is such that they will perceive new items in the input unconsciously — a process known as detection (Tomlin & Villa, 1994). Detection is especially important, for as Whong, Gil, & Marsden (2014) point out, implicit learning and (barring some sort of consciousness-raising event) its end-product, implicit knowledge, are what is required for real-time listening and speaking.

While communicating through the L2, a learner’s attention is likely to be drawn to problematic items by added salience resulting from typical characteristics of meaning-focused exchanges. For instance, NS or more proficient NNS interlocutors will consciously or unconsciously highlight important items, e.g., by pausing briefly before and/or after them, adding stress, repeating them, providing synonyms and informal definitions, moving them to more salient initial or final positions in an utterance through left-dislocation or decomposition, and through comprehension checks, confirmation checks and clarification requests, all triggering a momentary switch of the learner’s focal attention from meaning to linguistic form. In addition, NF — mostly implicit, mostly recasts — can have the same effect, while simultaneously providing whatever positive evidence is needed, whether a missing item or a model of more target-like usage. The same incidental learning process operates when learners read a book or a newspaper, listen to a radio interview or watch a movie. However, whereas those activities involve attempting to understand and learn from static spoken or written input intended for NSs and over which they have no control, face-to-face L2 interaction is dynamic, offering opportunities to negotiate for meaning. The negotiation work increases the likelihood that salience will be added, attention drawn to items uniquely problematic for them, and communicative trouble repaired.

You are both skeptical about learners’ ability to compare representations stored in long-term memory with the positive evidence contained in recasts, in order to “notice the gap”. I would be, too. But the comparison involves short-term, or working, memory (WM), not long-term memory. And why would that be inconceivable? Evidence that it is not only possible, but happens all the time (in L1 and L2) is abundant. For instance, what someone says or writes frequently primes, or triggers, use of the same lexis and syntax in that person’s own, or a listener’s, immediately following speech or writing (see, e.g., Doughty, 2001; McDonough, 2006; McDonough & Mackey, 2008).

Then, think of the immediate recall sub-tasks common in language aptitude measures, such as LLAMA D and the n-back task in Hi-Lab. Essentially, learners hear/see a short string of sounds or letters and either have to say which ones in a new sequence they heard or read, or repeat them a few seconds later (n-back is a bit more complex). The same basic idea is employed in countless word-recognition post-tests in incidental learning studies. Everyone can do those tasks — some better than others, which is why they are used as a measure of language aptitude – and I reckon they tap roughly the same ability as that used in learning from recasts. The fact that the learner’s original utterance and a recast it triggers are both meaningful, unlike strings of random letters, sounds or words, can be predicted to make it even easier to hold and compare in short-term memory.

Recasts have seven additional qualities (Long, 1996) that make cognitive comparison and learning even more feasible. They convey needed information about the target language (i) in context, (ii) when listeners and speakers share a joint attentional focus, (iii) when the learner is vested in the exchange, (iv) and so is probably motivated and (v) attending. (vi) The fact that learners already have prior comprehension of at least part of the message the recast contains, because the reformulation they hear is of what they just tried to say, frees up attentional resources and facilitates form-function mapping. Indirect support for this idea may lie in the findings of a study of the value of content familiarity. Finally, and crucially, (vii) the contingency of recasts on deviant output means that incorrect and correct utterances are juxtaposed, allowing learners briefly to hold and compare the two versions in working memory (WM). Not convinced? Then consider the fact that statistical meta-analyses (e.g., Goo, 2019; Li, 2010; Loewen & Sato, 2018) have shown that recasts result in measurable learning, with some evidence that they do so better than models and non-contingent speech, on salient targets, at least (Long, Inagaki, & Ortega, 1998).

And, again, it’s not just NF. Negotiation for meaning involves higher than usual frequencies of semantically contingent speech, including repetitions, reformulations and expansions, sometimes functioning simultaneously as recasts, but more generally (something usually dear to UG-ers’ hearts) as (mostly) comprehensible, so processable, positive evidence usable for learning. Problematic forms are recycled, increasing their salience and the likelihood that they will be perceived by the learner. The positive evidence, moreover, is elaborated, not simplified (except for lower mean length of utterance), so retains the items to which learners need to be exposed if acquisition is to occur.

I side with Nick Ellis and the UB (not UG) hordes. Since learning a new language is far too large and too complex a task to be handled explicitly, implicit learning, although it requires more input and time, remains the default learning mechanism for adults:

“Even though many of us go to great lengths to engage in explicit language learning, the bulk of language acquisition is implicit learning from usage. Most knowledge is tacit knowledge; most learning is implicit; the vast majority of our cognitive processing is unconscious” (Ellis & Wulff, 2015, p. 8; and see Ellis & Wulff, 2019).

Sure, there are limitations. Williams (2009) notes, for example, that the scope of implicit learning may not extend to phenomena, such as anaphora, that involve non-adjacent items (Hawkins was arrested, and so were several members of his gang), which may no longer be learnable that way. And I have often pointed to evidence that the capacity for implicit learning, especially instance learning, weakens (not disappears) around age 12 (e.g., Long, 2015, 2017). But those are for another time, which, you will by now be relieved to hear, I don’t have now.

According to the Tallying Hypothesis (Ellis, 2002), ensuring that learners’ attention is drawn to learning targets that way, especially to abstract, perceptually non-salient items, can modify entrenched automatic L1 processing routines, thereby altering the way subsequent L2 input is processed implicitly. An initial representation is established in long-term memory and functions as a selective cue priming the learner to attend to and perceive additional instances when processing implicitly. Ellis identifies what he calls “the general principle of explicit learning in SLA: changing the cues that learners focus on in their language processing changes what their implicit learning processes tune” (Ellis 2005, p. 327). Research is currently under way to determine whether it is possible to achieve the same results by unobtrusive, less interventionist means: enhanced incidental learning (Long, 2015, pp. 30-62, 2017, 2020).


Doughty, C. (2001a). Cognitive underpinnings of focus on form. In Robinson, P. (ed.), Cognition and Second Language Instruction (pp. 206-257). Cambridge: Cambridge University Press.

Ellis, N. C. (2005). At the interface: Dynamic interactions of explicit and implicit language knowledge. Studies in Second Language Acquisition 27, 2, 305-352.

Ellis, N. C. (2006) Selective attention and transfer phenomena in L2 acquisition: contingency, cue competition, salience, interference, overshadowing, blocking, and perceptual learning. Applied Linguistics 27, 2,164-194.

Ellis, N. C. & Wulff, S. (2015). Usage-based approaches to second language acquisition. In VanPatten, B., & Williams, J. (Eds.), Theories in second language acquisition (pp. 75-93). New York: Routledge.

Ellis, N. C. & Wulff, S. (2019). Cognitive approaches to L2 acquisition. In Schwieter, J. W., & Benati, A. (Eds.), The Cambridge Handbook of Language Learning (pp. 41-61). Cambridge: Cambridge University Press.

Goo, J. M. (2019). Interaction in L2 learning. In Schwieter, J. W., & Benati, A. (Eds.), The Cambridge handbook of language learning (pp. 233-257). Cambridge: Cambridge University Press.

Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning 60, 2, 309-365.

Loewen, S. & Sato, M. (2018). Interaction and instructed second language acquisition. Language Teaching 51, 3, 285-329.

Long, M. H. (2015). Second language acquisition and Task-Based Language Teaching. Oxford: Wiley-Blackwell.

Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In Ritchie, W. C., & Bhatia, T. K. (Eds.), Handbook of second language acquisition (pp. 413-68). New York: Academic Press.


Long, M. H. (2017). Instructed second language acquisition (ISLA): Geopolitics, methodological issues, and some major research questions. Instructed Second Language Acquisition 1, 1, 7-44.

Long, M. H. (2020). Optimal input for language learning: genuine, simplified, elaborated, or modified elaborated? Language Teaching 53, 2, 169-182.

Long, M. H., Inagaki, S., & Ortega, L. (1998). The role of implicit negative feedback in SLA: Models and recasts in Japanese and Spanish. Modern Language Journal 82, 3, 357-371.

McDonough, K. (2006). Interaction and syntactic priming: English L2 speakers’ production of dative constructions. Studies in Second Language Acquisition, 28, 179-207.

McDonough, K., & Mackey, A. (2008). Syntactic priming and ESL question development. Studies in Second Language Acquisition 30, 1, 31-47.

Saxton, M. (1997). The contrast theory of negative input. Journal of Child Language 24, 139-161.

Whong, M., Gil, H.-G. and Marsden, E. (2014) Beyond paradigm: the ‘what’ and the ‘how’ of classroom research. Second Language Research 30, 4, 551-568.

Susanne Carroll’s AIT: Part 3

In this third part of my exploration of Carroll’s Autonomous Induction Theory (AIT), I’ll look at “categorization” and feedback. In what follows I try to speak for Carroll and I apologise for the awful liberties I’ve taken with her texts.  All the quotes come from Carroll (2002), unless otherwise cited.


A theory of SLA must start from a theory of grammar. When we look at the grammars of natural languages, we note that they differ in their repertoires of categories: words are divided into different segments, and sentences comprise different classes of words and phrases. But how? As a basic example, a noun is not reducible to the symbol ‘N’: word classes consist of sound-meaning correspondences, so a noun is a union of phonetic features, phonological structure, morphosyntactic features, morphological structure, semantic features and conceptual structure. As Jackendoff says, words are correspondences connecting levels of representation.


UG provides the representational primitives of each autonomous level of representation, and UG provides the operations which the parsers can perform. In other words, UG severely constrains the ways that the categories at different levels of representation are unified and project into hierarchical structure.


A theory of i-learning explains what happens when a parser fails, and a new constituent or a new procedure must be learned.

In the case of category i-learning, UG provides a basic repertoire of features in each autonomous representational system. Features will combine to form complex units at a given level: phonetic features combine to form segments (a timing unit of speech), morphosyntactic features combine to form morphemes (the basic unit of the morphosyntax), and primitive semantic features combine to form complex concepts like Agent, Male, Cause, Consequence, and so on.

But UG is not the whole story: the acquisition of basic units within an integrative processor will reflect various constraints on feature unification within the limits defined by “unification grammars”.

Some of these constraints will presumably also be attributable to UG. What these restrictions actually consist of, however, is an empirical question and our understanding of such issues has come, and will continue to come, largely from cross-linguistic and typological grammatical research.

Having constructed representations, learners then have to identify them as instances of a category. So SLA consists of learning the categories and the correspondence rules which apply to a specific L2. UG provides some correspondence rules which, in first language acquisition, are used by infants to learn the language specific mappings needed for rapid processing of the particularities of the L1 phonology and morphosyntax. These are carried over into SLA, as are all L1 correspondence rules, which leads to transfer problems.
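As a toy illustration of this last point (my own, not Carroll’s formalism), think of correspondence rules as mappings carried over wholesale from the L1, producing transfer until i-learning adds an L2-specific rule. The segment /θ/ and a hypothetical L1 that lacks it are made-up examples:

```python
# Hypothetical L1 with no /θ/: its correspondence rules map the unfamiliar
# segment onto the nearest native one, producing a classic transfer error.
l1_rules = {"θ": "t"}

def map_segment(segment, rules):
    # Apply a matching correspondence rule; pass the segment through otherwise.
    return rules.get(segment, segment)

print(map_segment("θ", l1_rules))    # 't' - the L1 rule carried into the L2

# i-learning: when parsing fails, a correspondence failure is fixed by a
# change to a correspondence rule (here, adding an L2-specific mapping).
l2_rules = dict(l1_rules)
l2_rules["θ"] = "θ"
print(map_segment("θ", l2_rules))    # 'θ' - target-like after restructuring
```

The sketch also shows why, on this view, transfer is the default: until the rule set changes, the L1 mappings simply keep applying.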

AIT is embedded in a theory of the functional architecture of the language faculty and linked to theories of parsing and production. Autonomous representational systems of language work with  constrained processing modules in working memory. When parsing fails, acquisition mechanisms try to fix the problem. A correspondence failure can only be fixed by a change to a correspondence rule, and an integration problem can only be changed by a change to an integration procedure.

Very importantly, evidence for acquisition comes in the form of mental representations, not from the speech stream, except in the case of i-learning of acoustic patterns of the phonetics of the L2. Carroll explains:

In this respect, this theory differs radically from the Competition Model and from all theories which eschew structural representations in favour of direct mappings between the sound stream and conceptual representations. If correct, it means that simply noting the presence or absence of strings in the speech stream is not going to tell us directly what is in the input to the learning mechanisms.


The place of lexis needs special mention. Following Jackendoff, lexical items have correspondence rules linking the phonological, morphosyntactic and conceptual representations of words.

Since the contents of lexical entries in SLA are known to be a locus of transfer, the contents of lexical entries will constitute a major source of information for the learning mechanisms in dealing with stimuli in the L2.

Carroll says (2001, p. 84) “In SLA, the major “bootstrapping” procedure may well be lexical transfer”. She says this in the context of arguing for the limited effects of UG on SLA, and I wish she’d said more.


So, a theory of SLA must start with a theory of linguistic knowledge, of mental grammars. Then, it has to explain how a mental grammar is restructured. After that, a theory of linguistic processing must explain how input gets into the system, thereby creating novel learning problems, and finally, a theory of learning must show how novel information can be created to resolve learning problems. I’ve covered all this, however badly, but more remains.


On page 31 of Input and Evidence we get a re-formulation of Carroll’s research questions.

I have to say that I see little of relevance in the next 300 pages, but the last three chapters do have a shot at answering them. I don’t think she does a good job of it, but that’s for Part 4. If you’re already exhausted, think how I feel about the task of telling you about it.

We must return again to Carroll’s most central claims (IMHO) that ‘input’ and ‘intake’ are badly defined theoretical constructs which make a bad starting point for any theory of SLA, and that consequent talk of ‘L1 transfer’, ‘noticing’, ‘negotiation of meaning’ and ‘output’ is similarly unsatisfactory in a theory of SLA. The starting point should be stimuli from the environment, not linguistic input (whatever that is), and we must then explain how these stimuli get represented and successfully transformed into developing interlanguages. This demands not just a property theory to describe what is being developed, but a much better model of the learning mechanisms and the reasoning involved than is presently on offer.

A taster

Long’s Interaction Hypothesis states that the role of feedback is to draw the learner’s attention to mismatches between a stimulus and the learner’s output, and that they can learn a grammar on the basis of the “negotiation of meaning.” But what is meant by these terms? For Carroll, “input” means stimulus, and “output” means what the learner actually says, so the claim is that the learner can compare a representation of their speech to a representation of the speech signal. Why should this help the learner in learning properties of the morphosyntax or vocabulary, since the learner’s problems may be problems of incorrect phonological or morphosyntactic structure? To restructure the mental grammar on the basis of feedback, the learner must be able to construct a representation at the relevant level and compare their output — at the right level — to that.

It would appear then,…  that the Interaction Hypothesis presupposes that the learner can compare representations of her speech and some previously heard utterance at the right level of analysis. But this strikes me as highly implausible cognitively speaking. Why should we suppose that learners store in longterm memory their analyses of stimuli at all levels of analysis? Why should we assume that they are storing in longterm memory all levels of analysis of their own speech?… Certainly nothing in current processing theory would lead us to suppose that humans do such things. On the contrary, all the evidence suggests that intermediate levels of the analysis of sentences are fleeting, and dependent on the demands of working memory, which is concerned only with constructing a representation of the sort required for the next level of processing up or down. Intermediate levels of analysis of sentences normally never become part of longterm memory. Therefore, it seems reasonable to suppose that the learner has no stored representations at the intermediate levels of analysis either of her own speech or of any stimulus heard before the current “parse moment.” Consequently, he cannot compare his output (at the right level of analysis) to the stimulus in any interesting sense… Moreover, given the limitations of working memory, negotiations in a conversation cannot literally help the learner to re-parse a given stimulus heard several moments previously. Why not? Because the original stimulus will no longer be in a learner’s working memory by the time the negotiations have occurred. It will have been replaced by the consequences of parsing the last utterance from the NS in the negotiation. I conclude that there is no reason to believe that the negotiation of meaning assists learners in computing an input-output comparison at the right level of representation for grammatical restructuring to occur (Carroll, 2001, p. 291).

Preposterous, right, Mike?

Fun will finally ensue when, in Part 5, I get together with a bunch of well oiled chums in a video conference session to defend Carroll’s insistence on a property theory and a language faculty against the usage based (UB) hordes. Neil McMillan (whose Lacan in Lockdown: reflections from a small patio is eagerly awaited by binocular-wielding graffiti fans in Barcelona); Kevin Gregg (train schedule permitting); and Mike (‘pass the bottle’) Long are among the many who probably won’t take part.


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Carroll’s AIT Theory: Part 2

This is the second part of my exploration of Susanne Carroll’s theory of SLA. Carroll’s work is important, IMHO, because it questions many of the constructs used by SLA theorists, including ‘comprehensible input’, ‘processing’, ‘i +1’, ‘noticing’, ‘noticing the gap’, ‘L1 transfer’, ‘chunk learning’, and many others. By examining Carroll’s work, I think we can throw light on all these constructs and come to a better understanding of how people learn an L2.

In Part One, I looked at Carroll’s adoption of Jackendoff’s Representational Modularity (RM) theory; a theory of modular mind where each module contains levels of representation organised in chains going from the lowest to the highest. The “lowest” representations are stimuli and the “highest” are conceptual structures. This leads to the hypothesis of levels.

Selinker, Kim and Bandi-Rao (2004, p. 82) summarise RM thus:  

The language faculty consists of auditory input, motor output to vocal tract, phonetic, phonological, syntactic components and conceptual structure, and correspondence rules, various processors linking/regulating one autonomous representational type to another. These processors, domain specific modules, all function automatically and unconsciously, with the levels of modularity forming a structural hierarchy representationally mediated in both top-down and bottom-up trajectories.

And Carroll (2002) says:

What is unique to Jackendoff’s model is that it makes explicit that the processors which link the levels of grammatical representation are also a set of modular processors which map representations of one level onto a representation at another level. These processors basically consist of rules with an ‘X is equivalent to Y’ type format. There is a set of processors for mapping ‘upwards’ and a distinct set of processors for mapping ‘downwards’.

Bottom-up correspondence processors

a.         Transduction of sound wave into acoustic information

b.         Mapping of available acoustic information into phonological format.

c.         Mapping of available phonological structure into morphosyntactic format.

d.         Mapping of available syntactic structure into conceptual format.

Top-down correspondence processors

a.         Mapping of available syntactic structure into phonological format.

b.         Mapping of available conceptual structure into morphosyntactic format.

Integrative processors

a.         Integration of newly available phonological information into unified phonological structure.

b.         Integration of newly available morphosyntactic information into unified morphosyntactic structure.

c.         Integration of newly available conceptual information into unified conceptual structure.

(Jackendoff 1987, p. 102, cited in Carroll, 2002, p. 16).  
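Read as a processing architecture, the bottom-up chain is a pipeline in which each processor maps a representation at one level onto a representation at the next. Just to make that structure concrete, here’s a toy Python sketch – the function names and data structures are invented for illustration, and are in no way part of Jackendoff’s or Carroll’s formalism:

```python
# Toy sketch of Jackendoff's bottom-up correspondence chain.
# Each "processor" maps a representation at one level onto the next.
# All names and data structures here are invented for illustration.

def transduce(sound_wave):
    """(a) Sound wave -> acoustic information."""
    return {"acoustic": sound_wave}

def to_phonological(rep):
    """(b) Acoustic information -> phonological format."""
    return {"phonological": rep["acoustic"].split()}

def to_morphosyntactic(rep):
    """(c) Phonological structure -> morphosyntactic format."""
    return {"morphosyntax": [(w, "WORD") for w in rep["phonological"]]}

def to_conceptual(rep):
    """(d) Syntactic structure -> conceptual format."""
    return {"concept": " ".join(w for w, _ in rep["morphosyntax"])}

def bottom_up(stimulus):
    # The chain: each level's output is the next level's input.
    rep = transduce(stimulus)
    for processor in (to_phonological, to_morphosyntactic, to_conceptual):
        rep = processor(rep)
    return rep

bottom_up("the cat sat")  # maps the stimulus up through the chain
```

The point of the caricature is only the shape of the thing: representations are passed along a fixed chain of autonomous mappings, and the top-down processors would run the same machinery in the other direction.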


The second main component of Carroll’s AIT is induction. Induction is a form of reasoning which involves going from the particular to the general. The famous example (given in Philosophy for Idiots, Thribb, 17, cited in Dellar, passim) is of swans. You define a swan and then search lakes to see what colour particular examples of it are. All the swans you see in the first lake are white, and so are those in the second lake. Everywhere you look, they’re all white, so you conclude that “All swans are white”. That’s induction. Hume (see Neil McMillan (unpublished) The influence of Famous Scottish Drunkards on Lacan’s psychosis; a bipolar review) famously showed that induction is illogical – no inference from the particular to the general is justified. No matter how many white swans you observe, you’ll never know that they’re ALL white, that there isn’t a non-white swan lurking somewhere, so far unobserved. Likewise, you can’t logically induce that, because the sun has so far always risen in the East, it will rise in the East tomorrow. Popper “solved” this conundrum by saying that we’ll never know the truth about any general theory or generalisation, so we just have to accept theories “tentatively”, testing them in attempts not to prove them (impossible) but, rather, to falsify them. If they withstand these tests, we accept the theory, tentatively, as “true”.

The assumption of all SLA “cognitive processing” transition theories is that the development of interlanguages depends on the gradual reformulation of the learner’s mental conceptualisations of the L2 grammar. These reformulations can be seen as following the path suggested by Popper to get to reliable knowledge:

P1 -> TT1 -> EE -> P2 -> TT2, etc.

P = problem

TT = tentative theory

EE = testing for empirical evidence which conflicts with TT   

You start with a problem, you leap to a tentative theory (TT1), and then you test it, trying to falsify it with empirical evidence. If you find such contradictory evidence, you have a new problem, and you re-formulate the theory (TT2) to deal with it, and you then test again, and round we go again, slowly improving the theory. Popper is talking about hypothesis testing and theory construction in the hard sciences (particularly physics), and while it’s a long way from describing what scientists actually do, it’s even further away from describing what L2 learners do in developing interlanguages. Nevertheless, it’s common to hear people describing SLA as hypothesis formation and hypothesis testing.
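The cyclical structure of Popper’s schema can be made explicit in a toy sketch. All the names here are invented for illustration; this is the schema caricatured, not a serious model of science (or of learners):

```python
# Toy sketch of Popper's P1 -> TT1 -> EE -> P2 -> TT2 ... cycle.
# A 'theory' is a predicate over observations; 'evidence' is a list
# of observations. All names are invented for illustration.

def popper_cycle(initial_theory, revise, evidence, max_rounds=10):
    theory = initial_theory  # TT1: a tentative conjecture
    for _ in range(max_rounds):
        # EE: search for an observation that falsifies the theory
        counterexamples = [e for e in evidence if not theory(e)]
        if not counterexamples:
            # Survives testing: accepted tentatively, never proven
            return theory
        # P2: the counterexample is the new problem; revise to TT2
        theory = revise(theory, counterexamples[0])
    return theory

# Toy example: "all swans are white" meets a black swan, and the
# theory is patched to accommodate it.
def all_white(swan):
    return swan == "white"

def revise(theory, counterexample):
    return lambda swan: theory(swan) or swan == counterexample

t = popper_cycle(all_white, revise, ["white", "white", "black"])
```

Note that the “revision” here is the crudest possible patch – which is, in a way, the point: nothing in the cycle itself tells you what a good reformulation looks like, and that gap is exactly where Carroll’s objection to the hypothesis-testing metaphor bites.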

We could, I suppose, see TT1 as the learner’s initial interlanguage theory. Then, at any given point in its trajectory, the theory gets challenged by evidence that doesn’t fit (perhaps went where goed is expected, for example) and the problem is resolved by a new, more sophisticated theory, TT2. But it doesn’t work – interlanguage development is not a matter of hypothesis formation and testing in Popper’s sense, and I agree with Carroll that it’s “a misleading metaphor”. In her view, SLA is a process of “learning new categories of the grammar, new structural arrangements in on-line parses and hence new parsing procedures and new productive schemata” (Carroll, 2001, p. 32). Still, Hume’s problem of underdetermination remains – the inductions that learners are said to make aren’t strictly logical. (“Just saying” (McMillan, ibid).)

So anyway, Carroll wants to see SLA development (partly) as a process of induction. The most respectable theory of induction is inference to the best explanation, also known as abduction, and I think Lipton (1991) provides the best account, although Gregg (1993) does a pretty good job of it in a couple of pages (adeptly including a concise account of Hempel’s D-N model, by the way). Carroll, however, ducks the issues and follows Holland et al. (1986), who define induction as a set of procedures which lead to the creation and/or refinement of the constructs which form mental models (MMs). Mental models are “temporary and changing complex conceptual representations of specific situations”. Carroll gives the example of a Canadian’s MM of a breakfast event versus, say, the very different MM a Japanese speaker has of the same event. MMs are domains of knowledge – schemata, if you like – and Carroll makes lots of use of them, which I’m going to skip over. She then goes into considerable detail about categorising MMs, and then proceeds to the “condition-action rules” which govern induction. These are competition rules which share ideas with abduction, in as much as they say: “When confronted with competing solutions to a problem, choose the most likely, the best ‘fit’”.

 Carroll (2001, p. 170) finally (sic) defines induction as a process

leading to revision of representations so that they are consistent with information currently represented in working memory. Its defining property is that it is rooted in stimuli made available to the organism through the perceptual system, coupled with input from Long Term Memory and current computations. … the results of i-learning depend upon the contents of symbolic representations.


Carroll’s theory of learning rests on i-learning (as opposed to ‘I-language’ in Chomsky’s sense, which has very little to do with it, and one can only wish she’d chosen some other term, rather like Long’s unhappy choice of “Focus on FormS”). I-learning depends on the contents of symbolic representations being computed by the learning system.

At the level of phonetic learning, i-learning will depend on the content of phonetic representations, to be defined in terms of acoustic properties. At the level of phonological representation, i-learning will depend on the content of phonological representations, to be defined in terms of prosodic categories and featural specification of segments. At the level of morphosyntactic learning, i-learning will depend upon the content of morphosyntactic representations. And so on.

So, it seems, i-learning goes on autonomously within all the parts of Jackendoff’s theory of modularity, not just in the conceptual representational system. (I take it that this is where Carroll’s ‘competition’ comes in – analysing a novel form involves competition among various information sources from different levels.) Anyway, the key point is that i-learning is triggered by the failure of current representations to “fit” current models in conjunction with specific environmental stimuli.

More light

I usually don’t comment on my choice of images, but the above image shows Goethe on his death bed. His wonderful dying words were, according to his doctor, Carl Vogel, “Mehr Licht!” And I can’t help sharing this anecdote. In my first seminar, in my first term of my first year at LSE, I read a paper at a session presided over by Imre Lakatos, one of the finest scholars I’ve ever met, and later a friend who committed perjury in court to help me avoid being found guilty of a criminal charge. The paper was about German developments in science, and I mentioned Goethe, whose name I pronounced ‘Go eth’. Lakatos was drinking a coffee at the moment I said “Go eth” and reacted very violently. He spat the coffee out, all over the alarmed students sitting round the table in his study, jumped to his feet, and shouted hysterically: “I fail to understand how anybody who’s been accepted into this university can so hopelessly mispronounce the name of Germany’s most famous poet!”

I use Goethe’s dying words here to refer to Carroll’s 2002 paper, which really does throw more light on her difficult-to-follow 2001 work.

In her (2002) account of I-learning, Carroll argues that researching the nature of induction in language acquisition requires the notion of a UG, which describes the properties of grammatical knowledge shared by all human languages. The psycholinguistic processes which result in this knowledge are constrained by UG – which, she insists, doesn’t mean that “UG is thereby operating on-line in any fashion or indeed is stored anywhere to be consulted, as one might infer from much generative SLA research” (Carroll, 2002, p. 11).

Carroll goes on to say that a speaker’s I-language consists of a particular combination of universal and acquired contents, so that a theory of SLA must explain not only what is universal in our mental grammars, but also what is different both among speakers of the same E-language and among the various E-languages of the world.

In order to have a term to cover a UG-compatible theory of acquisition, as well as to make an explicit connection to I-language, I suggest we call such a theory of acquisition a theory of i(nductive)-learning, specifically the Autonomous Induction Theory (Carroll, 2002, p.12).

In other words, while Chomsky is concerned with explaining I-language, Carroll is concerned with explaining the much wider construct of I-learning; she wants to integrate a theory of linguistic competence with theories of performance. So, it goes like this:

The perception of speech, the recognition of words, and the parsing of sentences in the L2 require the application of unit detection and structure-building procedures. When those procedures are in place, speech processing is performed satisfactorily. But when the procedures are not available (e.g., to the beginning L2 learner), speech processing will fail, forcing the learner to fall back on inferences from the context, stored knowledge, etc. But, of course, beginners have very few such resources to draw on, and so interpretation of the stimulus will fail, which is when i-learning mechanisms will be activated.

When speech detection, word recognition, or sentence parsing fail, … only the i-learning mechanisms can fix the problem. They go into action automatically and unconsciously (Carroll, 2002, p. 13).

To start with, then, the learner hears the speech stream as little more than noise. Comprehension depends on their learning the right cues to syllable edges and to the segments which comprise the syllables. Only once these cues to the identification of phonological units have been learned can word learning begin. After that, form-extraction processes which map some unit or other of the phonology onto a morphosyntactic word will allow the learner to hear a form in the speech stream, but still without necessarily knowing what it means. Even when learners can identify words and know what they mean, they might still lack the parsing procedures needed to use morphosyntactic cues to arrive at the correct sentence structure, and hence the correct sentence meaning. Either they fail to arrive at any interpretation, or they arrive at the wrong one – their semantic representation isn’t the same as what was intended by the speaker. Finally, their i-learning allows them to get the right meaning – the parsers can now do their job satisfactorily.

Recall what was said in Part 1: Krashen got it backwards! This is the real thrust of Carroll’s argument: input must be seen not as linguistic stuff coming straight from the environment, but rather as stuff that results from processes going on in the mind which call on innate knowledge. Furthermore: YOU CAN’T NOTICE GRAMMAR!

So there you have it. Except that, really, that’s nowhere near “it”. Carroll admits that her theory doesn’t explain what the acquisition mechanism does when parsing breaks down. She asks:

How does the mechanism move beyond that point of parse, and what are the constraints on the solution the learner alights on? Why do the acquisition mechanisms which attempt to restructure the existing parsing procedures and correspondence rules to deal with a current parse problem often fail?

The answers lie partly in Carroll’s investigation of “Categories and categorization” and partly in the roles of feedback and correction. In an early reformulation of her research questions in Input and Evidence, Carroll emphasises the importance of feedback and correction to her work, which points to her important contributions to examining the empirical evidence found in the SLA literature, and also highlights some of the ways in which this evidence has been (mis)used. All this will be discussed in Part 3, where I’ll also look at what Carroll’s AIT has to say about explicit and implicit learning, and about what some of today’s gurus in ELT might learn from Carroll’s work.

This is a blog post, not an academic text. I’m exploring Carroll’s work, and I’ve no doubt made huge mistakes in describing and interpreting it. I await correction. But I hope it will provoke discussion among the many ELT folk who enjoy shooting the breeze about important questions which have a big impact (or should I say ‘impact big time’) on how we organise and implement teaching programmes.


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Lipton, P. (1991) Inference to the Best Explanation. London: Routledge.

Popper, K. R. (1972)  Objective Knowledge.  Oxford: Oxford University Press.

Selinker, L., Kim, D. and Bandi-Rao, S. (2004) Linguistic structure with processing in second language research: is a ‘unified theory’ possible? Second Language Research 20, 1, 77–94.

The place of Jackendoff’s Representational Modularity Theory in Carroll’s Autonomous Induction Theory


Jackendoff’s Representational Modularity Theory (Jackendoff, 1992) is a key component in Susanne Carroll’s Autonomous Induction Theory, as described in her book Input and Evidence (2001). Carroll’s book is too often neglected in the SLA literature, and I think that’s partly because it’s very demanding. Carroll goes into so much depth about how we learn; she covers so much ground in so much methodical detail; she’s so careful, so thorough, so aware of the complexities, that if you start reading her book without some previous understanding of linguistics, the philosophy of mind, and the history of SLA theories, you’ll find it very tough going. Even with some understanding of these matters, I myself find the book extremely challenging. Furthermore, the text is dense and often, in my opinion, over-elaborate; you have to be prepared to read slowly, and at the same time keep reading while not at all sure where the argument’s going, in order to “get” what she’s saying.

One criterion for judging theories of SLA is an appeal to Occam’s Razor: ceteris paribus (all other things being equal), the theory with the simplest formulae, and the smallest number of basic types of entity postulated, is to be preferred for reasons of economy. Carroll’s theory scores badly here: it’s complicated! Her use of Jackendoff’s theory, and of the Induction Theory of Holland et al., means that her theory of SLA counts on a variety of formulae and entities, and thus it’s not “economical”. On the other hand, it’s one of the most complete theories of SLA on offer.

Over the years, I’ve spent weeks reading Carroll’s Input and Evidence, and now, while reading it yet again in “lockdown”, I’m only just starting to feel comfortable turning from one page to the next. But it’s worth it: it’s a classic; one of the best books on SLA ever, IMHO, and I hope to persuade you of its worth in what follows. I’m going to present The Autonomous Induction Theory (AIT) in an exploratory way, bit by bit, and I hope we’ll end up, eventually, with some clear account of AIT and what it has to say about second language learning, and its implications for teaching.

UG versus UB

In the current debate between Chomsky’s UG theory and more recent Usage-based (UB) theories of language and language learning, most of those engaged in the debate see the two theories as mutually contradictory: one is right and the other is wrong. One says language is an abstract system of form-meaning mappings governed by a grammar (in Chomsky’s case a deep grammar common to all natural languages as described in the Principles and Parameters version of UG), and this knowledge is learned with the help of innate properties of the mind. The other says language should be described in terms of its communicative function; as Saussure put it “linguistic signs arise from the dynamic interactions of thought and sound – from patterns of usage”. The signs are form-meaning mappings; we amass a huge collection of them through usage; and we process them by using relatively simple, probabilistic algorithms based on frequency.

O’Grady (2005) has this to say:

The dispute over the nature of the acquisition device is really part of a much deeper disagreement over the nature of language itself. On the one hand, there are linguists who see language as a highly complex formal system that is best described by abstract rules that have no counterparts in other areas of cognition. (The requirement that sentences have a binary branching syntactic structure is one example of such a “rule.”) Not surprisingly, there is a strong tendency for these researchers to favor the view that the acquisition device is designed specifically for language. On the other hand, there are many linguists who think that language has to be understood in terms of its communicative function. According to these researchers, strategies that facilitate communication – not abstract formal rules – determine how language works. Because communication involves many different types of considerations (new versus old information, point of view, the status of speaker and addressee, the situation), this perspective tends to be associated with a bias toward a multipurpose acquisition device.

Susanne Carroll tries to take both views into account.

Property Theories and Transition Theories of SLA

Carroll agrees with Gregg (1993) that any theory of SLA has to consist of two parts:

1) a property theory which describes WHAT is learned,

2) a transition theory which explains HOW that knowledge is learned.

As regards the property theory, it’s a theory of knowledge of language, describing the mental representations that make up a learner’s grammar – which consists of various classifications of all the components of language and how they work together. What is it that is represented in the learner’s knowledge of the L2? Chomsky’s UG theory is an example; Construction Grammar is another; the Competition Model of Bates & MacWhinney (1989, cited in Carroll, 2001) is another; while general knowledge representations, forms of rules of discourse, Gricean maxims, etc., are, I suppose, also candidates.

Transition theories of SLA explain how these knowledge states change over time. The changes in the learner’s knowledge, generally seen as progress towards a more complete knowledge of the target language, need to be explained by appeal to a causal mechanism by which one knowledge state develops into another.

Many of the most influential cognitive processing theories of SLA (Chaudron, 1985; Krashen, 1982; Sharwood Smith, 1986; Gass, 1997; Towell & Hawkins, 1994, cited in Carroll, 2001) concentrate on a transition theory. They explain the process of L2 learning in terms of the development of interlanguages, while largely ignoring the property theory, which they sometimes, and usually vaguely, assume is dealt with by UG. New UB theories (e.g. Ellis, 2019; Tomasello, 2003) reject Chomsky’s UG property theory and rely on what Chomsky regards as performance data for a description of the language in terms of a Construction Grammar. More importantly, perhaps, their ‘transition theory’ makes a minimal appeal to the workings of the mind; they’re at pains to use quite simple general learning mechanisms to explain how “associative” learning, acting on input from the environment, explains language learning.

Mentalist Theories

Carroll bases her approach on the view that humans have a unique, innate capacity for language, and that language learning goes on in a modular mind. Here, I’ll leave discussions about the philosophy of mind to one side, but suffice it to say for now that ‘mind’ is a theoretical construct referring to a human being’s world of thought, feeling, attitude, belief and imagination. When we talk about the mind, we’re not talking about a physical part of the body (the brain), and when we talk about a modular mind, we’re not talking about well-located, separate parts of the brain.

Carroll rejects Fodor’s (1983) claim that the language faculty comprises a single language module in the mind’s architecture, and she sees Chomsky’s LAD as an inadequate description of the language faculty. Rather than accept that language learning is crucially explained by the workings of a “black box”, Carroll explores the mechanisms of mind more closely, and, following Jackendoff, suggests that the language faculty operates at different levels, and is made up of a chain of mental representations, with the lowest level interacting with physical stimuli, and the highest level interacting with conceptual representations. Processing goes on at each level of representation, and a detailed description of these representations explains how input is processed for parsing.

Carroll further distinguishes between processing for parsing and processing for learning, such that, in speech, for example, when the parsers fail to get the message, the learning mechanisms take over. Successful parsing means that the processors currently at the learner’s disposal are able to use existing rules which categorize and combine representations to understand the speech signal. When the rules are inadequate or missing, parsing breaks down; and in order to deal with this breakdown, the known rule that helps most in parsing the problematic item of input is selected and subsequently adapted or refined until parsing succeeds at that level. As Sun (2008) summarises: “This procedure explains the process of acquisition, where the exact trigger for acquisition is parsing failure resulting from incomprehensible input”.
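The control flow Carroll describes – parse with the current procedures, and on failure hand over to the learning mechanisms, which adapt the rules until parsing succeeds – can be caricatured in a toy sketch. Everything in it is invented for illustration (Carroll’s actual mechanisms are, of course, far richer than a word list):

```python
# Toy sketch: parsing failure as the trigger for acquisition.
# Here the "grammar" is just a set of known forms, and "learning"
# adds the missing form. All names are invented for illustration.

def parse(utterance, grammar):
    """Succeeds only if every form in the utterance is covered."""
    unknown = [w for w in utterance.split() if w not in grammar]
    return (len(unknown) == 0), unknown

def learn_from_failure(utterance, grammar):
    """Processing for learning: triggered only when parsing breaks down."""
    ok, unknown = parse(utterance, grammar)
    while not ok:
        # The learning mechanism adapts/extends the rules that best
        # fit the problematic input (here, trivially, add the form).
        grammar.add(unknown[0])
        ok, unknown = parse(utterance, grammar)
    return grammar

g = {"the", "cat"}
g = learn_from_failure("the cat sat", g)  # parse fails on "sat", so learning fires
```

The sketch makes the key point visible: nothing is learned while the parse succeeds; acquisition is driven entirely by the failure of the current procedures on the current stimulus.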

Scholars from Krashen to Gass take ‘input’ and ‘intake’ as the first two necessary steps in the SLA process (Gass’s model suggests that input passes through the stages of “apperceived” and “comprehended” input before becoming ‘intake’), and ‘intake’ is regarded as the set of processed structures waiting to be incorporated into the interlanguage grammar. The widely accepted view that in order for input to become intake it has to be ‘noticed’, as described by Schmidt in his influential 1990 paper, has since, as a result of criticism (see, for example, Truscott, 1998), been seriously modified, so that it now approximates to Gass’s ‘apperception’ (see Schmidt 2001, 2010), but it’s still widely seen as an important part of the SLA process.

Processing Theories of SLA

Carroll, on the other hand, sees input as physical stimuli, and intake as a subset of these stimuli.

The view that input is comprehended speech is mistaken. Comprehending speech … happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards! (Carroll, 2001, p. 78).

Referring not just to Krashen, but to all those who use the constructs ‘input’, ‘intake’ and ‘noticing’, Gregg (in a comment on one of my blog posts) makes the obvious but crucial point: “You can’t notice grammar”! Grammar consists of things like nouns and verbs, which are, quite simply, not empirically observable things existing “out there” in the environment, waiting for alert, on-their-toes learners to notice them.

So, says Carroll, language learning requires the transformation of environmental stimuli into mental representations, and it’s these mental representations which must be the starting point for language learning. In order to understand speech, for example, properties of the acoustic signal have to be converted to intake; in other words, the auditory stimulus has to be converted into a mental representation. “Intake from the speech signal is not input to learning mechanisms, rather it is input to speech parsers. … Parsers encode the signal in various representational formats” (Carroll, 2001, p. 10).



Jackendoff’s Representational Modularity

We now need to look at Jackendoff’s (1992) Representational Modularity. Jackendoff presents a theory of mind which contrasts with Fodor’s modular theory (where the language faculty constitutes a single module which processes already formed linguistic representations) by proposing that particular types of representation are sets belonging to different modules. The language faculty has several autonomous representational systems, and information flows in limited ways from a conceptual system into the grammar via correspondence rules which connect the autonomous representational systems (Carroll, 2001, p. 121).

Jackendoff’s model has various cognitive faculties, each associated with a chain of levels of representation. The stimuli are the “lowest” level of representation, and “conceptual structures” are the “highest”. The chains intersect at various points allowing information encoded in one chain to influence the information encoded in another. This amounts to Jackendoff’s hypothesis of levels.


Here’s a partial model


Jackendoff proposes that, in regard to language learning, the mind has three representational modules: phonology, syntax, and semantics, and that it also has interface modules which, by defining correspondence rules between representational formats, allow them to pass information along from the lowest to the highest level. This is important for Carroll because, as we’ll see, the different modules are autonomous, and so there must be a translation processor for each set of correspondence rules linking one autonomous representation type to another.

What Carroll wants from Jackendoff is “a clear picture of the functional architecture of the mind” (Carroll, 2001, p. 126), on which to build her induction model. In Part 2, I’ll deal with the Induction bit, but we must finish Part 1 by looking at other parts of Jackendoff’s work.

The Architecture of the Language Faculty

In The Architecture of the Language Faculty, Jackendoff argues for the central part played in language by the lexicon. The lexicon is not part of one of his representational modules, but rather the central component of the interface between them. Lexical items include phonological, syntactic, and semantic content, and thus any lexical item is a set of three structures linked by correspondence rules. Furthermore, since lexical items are part of this general interface, there is no need to restrict them to word-sized elements – they can be affixes, single words, compound words, or even whole constructions, including MWUs, idioms, and so on. As Stevenson (1997) says: Simply put, the claim is that what we call the lexicon is not a distinct entity but rather a subset of the interface relations between the three grammatical subsystems. … Jackendoff’s proposal thus has the potential to provide a uniform characterization of morphological, lexical, and phrase-level knowledge and processes, within a highly lexicalized framework.

To bring this home, I offer two presentations by Jackendoff. In the first, Jackendoff argues that lexis alone – “linear grammar” – paved the way for modern languages. It’s eloquent, to say the very least.

The main argument is, of course, the importance of the lexicon, but I think this diagram is particularly interesting.

Never mind the details; the point is that comprehending starts with perceptual stimuli and goes through various levels of representation from lowest to highest, while speaking goes in the opposite direction.

In the second presentation, Jackendoff talks about mentalism and formalism. Please skip to Minute 49.

There’s a handout for this which I recommend you download and then follow. Click here.

In this presentation Jackendoff argues that we should abandon the assumption made by generative grammar that lexicon and grammar are fundamentally different kinds of mental representations. If the lexicon gets progressively more and more rule-like, and you erase the line between words and rules, then you slide down a slippery slope which ends up with HPSG (Head-driven phrase structure grammar), Cognitive Grammar, and Construction Grammar, which, he says, is “not so bad”.

So, we may well ask, is Jackendoff a convert to UB theories? How can he be, if he bases his theory of Representational Modularity on the assumption of our possession of a modular mind? How can all this ‘mental representation’ stuff be reconciled with an empiricist view like N. Ellis’, which wants to explain language learning almost exclusively in terms of input from the environment? Part of the answer is, surely, that UB theory has a lot more mental stuff going on than it cares to recognise, but, in any case, I hope we can explore this further in Part 2, and I’d be very pleased if it leads to a lively discussion.

To summarise then, Jackendoff (2000) replaces Chomsky’s generative grammar with the view that syntax is only one of several generative components. Lexical items are not, pace Chomsky, inserted into initial syntactic derivations, and then interpreted through processes of derivations, but rather, speech signals are processed by the auditory-to-phonology interface module to create a phonological representation. After that, the phonology-to-syntax interface creates a syntactic structure, which is then, aided by the syntax-to-semantics interface module, converted into a propositional structure, i.e. meaning. Which is why, when a lexical item becomes activated, it not only activates its phonology, but it also activates its syntax and semantics and thus “establishes partial structures in those domains” (Jackendoff, 2000: 25). The same but reversed process takes place in language production.

What does Susanne Carroll make of it all? See my posts “Carroll’s Induction Theory Parts 2 and 3” for more (use the Search Bar on the right).


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Ellis, N. C. (2019). Essentials of a theory of language cognition. Modern Language Journal, 103.

Fodor, J. (1983) The Modularity of Mind. Cambridge, MA: MIT Press.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Jackendoff, R. S. (1992) Languages of the Mind. Cambridge, MA: MIT Press.

O’Grady, W. (2005)  How Children Learn Language. Cambridge, UK: Cambridge University Press

Schmidt, R. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129–58.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J.W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Stevenson, S. (1997) A Review of The Architecture of the Language Faculty. Computational Linguistics, 24, 4.

Sun, Y.A. (2008) Input Processing in Second Language Acquisition: A Discussion of Four Input Processing Models. Working Papers in TESOL & Applied Linguistics, Vol. 8, No. 1.

Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.

Truscott, J. (1998) Noticing in second language acquisition: a critical review. Second Language Research 14, 2, 103–135.