Susanne Carroll’s AIT: Part 3

In this third part of my exploration of Carroll’s Autonomous Induction Theory (AIT), I’ll look at “categorization” and feedback. In what follows I try to speak for Carroll and I apologise for the awful liberties I’ve taken with her texts.  All the quotes come from Carroll (2002), unless otherwise cited.


A theory of SLA must start from a theory of grammar. When we look at the grammars of natural languages, we note that they differ in their repertoires of categories: words are divided into different segments, and sentences comprise different classes of words and phrases. But how? As a basic example, a noun is not reducible to the symbol ‘N’: word classes consist of sound-meaning correspondences, so a noun is a union of phonetic features, phonological structure, morphosyntactic features, morphological structure, semantic features and conceptual structure. As Jackendoff says, words are correspondences connecting levels of representation.


UG provides the representational primitives of each autonomous level of representation, and UG provides the operations which the parsers can perform. In other words, UG severely constrains the ways that the categories at different levels of representation are unified and project into hierarchical structure.


A theory of i-learning explains what happens when a parser fails, and a new constituent or a new procedure must be learned.

In the case of category i-learning, UG provides a basic repertoire of features in each autonomous representational system. Features will combine to form complex units at a given level: phonetic features combine to form segments (a timing unit of speech), morphosyntactic features combine to form morphemes (the basic unit of the morphosyntax), and primitive semantic features combine to form complex concepts like Agent, Male, Cause, Consequence, and so on.

But UG is not the whole story: the acquisition of basic units within an integrative processor will reflect various constraints on feature unification within the limits defined by “unification grammars”.

Some of these constraints will presumably also be attributable to UG. What these restrictions actually consist of, however, is an empirical question and our understanding of such issues has come, and will continue to come, largely from cross-linguistic and typological grammatical research.

Having constructed representations, learners then have to identify them as instances of a category. So SLA consists of learning the categories and the correspondence rules which apply to a specific L2. UG provides some correspondence rules which, in first language acquisition, are used by infants to learn the language specific mappings needed for rapid processing of the particularities of the L1 phonology and morphosyntax. These are carried over into SLA, as are all L1 correspondence rules, which leads to transfer problems.

AIT is embedded in a theory of the functional architecture of the language faculty and linked to theories of parsing and production. Autonomous representational systems of language work with constrained processing modules in working memory. When parsing fails, acquisition mechanisms try to fix the problem. A correspondence failure can only be fixed by a change to a correspondence rule, and an integration problem can only be fixed by a change to an integration procedure.
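For those who like to see things spelled out, here's a toy sketch in Python of that last point. It's my own cartoon, not Carroll's formalism: the names Grammar, ParseFailure and repair are invented, and the "repair" is just a placeholder that records which kind of thing got changed.

```python
from dataclasses import dataclass, field


@dataclass
class Grammar:
    # Hypothetical stores: rules linking one level of representation to
    # another, and procedures that build structure within a single level.
    correspondence_rules: dict = field(default_factory=dict)
    integration_procedures: dict = field(default_factory=dict)


@dataclass
class ParseFailure:
    kind: str   # "correspondence" or "integration"
    level: str  # e.g. "phonology", "morphosyntax"
    item: str   # the representation the parser could not handle


def repair(grammar: Grammar, failure: ParseFailure) -> str:
    """Route a failure to the only component that can fix it."""
    key = (failure.level, failure.item)
    if failure.kind == "correspondence":
        grammar.correspondence_rules[key] = "revised"
        return "correspondence rule changed"
    if failure.kind == "integration":
        grammar.integration_procedures[key] = "revised"
        return "integration procedure changed"
    raise ValueError(f"unknown failure kind: {failure.kind}")


g = Grammar()
result = repair(g, ParseFailure("integration", "morphosyntax", "goed"))
```

The only point of the sketch is the routing: an integration failure never touches the correspondence rules, and vice versa.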

Very importantly, evidence for acquisition comes in the form of mental representations, not from the speech stream, except in the case of i-learning of acoustic patterns of the phonetics of the L2. Carroll explains:

In this respect, this theory differs radically from the Competition Model and from all theories which eschew structural representations in favour of direct mappings between the sound stream and conceptual representations. If correct, it means that simply noting the presence or absence of strings in the speech stream is not going to tell us directly what is in the input to the learning mechanisms.


The place of lexis needs special mention. Following Jackendoff, lexical items have correspondence rules, linking phonological, morphosyntactic and conceptual representations of words.

Since the contents of lexical entries in SLA are known to be a locus of transfer, the contents of lexical entries will constitute a major source of information for the learning mechanisms in dealing with stimuli in the L2.

Carroll says (2001, p. 84) “In SLA, the major “bootstrapping” procedure may well be lexical transfer”. She says this in the context of arguing for the limited effects of UG on SLA, and I wish she’d said more.


So, a theory of SLA must start with a theory of linguistic knowledge, of mental grammars. Then, it has to explain how a mental grammar is restructured. After that, a theory of linguistic processing must explain how input gets into the system, thereby creating novel learning problems, and finally, a theory of learning must show how novel information can be created to resolve learning problems. I’ve covered all this, however badly, but more remains.


On page 31 of Input and Evidence we get a re-formulation of Carroll’s research questions:

I have to say that I see little of relevance in the next 300 pages, but the last three chapters do have a shot at answering them. I don’t think she does a good job of it, but that’s for Part 4. If you’re already exhausted, think how I feel about the task of telling you about it.

We must return again to Carroll’s most central claims (IMHO) that ‘input’ and ‘intake’ are badly defined theoretical constructs which make a bad starting point for any theory of SLA, and that consequent talk of ‘L1 transfer’, ‘noticing’, ‘negotiation of meaning’ and ‘output’ are similarly unsatisfactory components of a theory of SLA. The starting point should be stimuli from the environment, not linguistic input (whatever that is), and we must then explain how these stimuli get represented and successfully transformed into developing interlanguages. This demands not just a property theory to describe what is being developed, but a much better model of the learning mechanisms and the reasoning involved than is presently on offer.

A taster

Long’s Interaction Hypothesis states that the role of feedback is to draw the learner’s attention to mismatches between a stimulus and the learner’s output, and that learners can learn a grammar on the basis of the “negotiation of meaning.” But what is meant by these terms? For Carroll, “input” means stimulus, and “output” means what the learner actually says, so the claim is that the learner can compare a representation of their speech to a representation of the speech signal. Why should this help the learner in learning properties of the morphosyntax or vocabulary, since the learner’s problems may be problems of incorrect phonological or morphosyntactic structure? To restructure the mental grammar on the basis of feedback, the learner must be able to construct a representation at the relevant level and compare their output, at the right level, to that.

It would appear then,… that the Interaction Hypothesis presupposes that the learner can compare representations of her speech and some previously heard utterance at the right level of analysis. But this strikes me as highly implausible cognitively speaking. Why should we suppose that learners store in longterm memory their analyses of stimuli at all levels of analysis? Why should we assume that they are storing in longterm memory all levels of analysis of their own speech?… Certainly nothing in current processing theory would lead us to suppose that humans do such things. On the contrary, all the evidence suggests that intermediate levels of the analysis of sentences are fleeting, and dependent on the demands of working memory, which is concerned only with constructing a representation of the sort required for the next level of processing up or down. Intermediate levels of analysis of sentences normally never become part of longterm memory. Therefore, it seems reasonable to suppose that the learner has no stored representations at the intermediate levels of analysis either of her own speech or of any stimulus heard before the current “parse moment.” Consequently, he cannot compare his output (at the right level of analysis) to the stimulus in any interesting sense….. Moreover, given the limitations of working memory, negotiations in a conversation cannot literally help the learner to re-parse a given stimulus heard several moments previously. Why not? Because the original stimulus will no longer be in a learner’s working memory by the time the negotiations have occurred. It will have been replaced by the consequences of parsing the last utterance from the NS in the negotiation. I conclude that there is no reason to believe that the negotiation of meaning assists learners in computing an input-output comparison at the right level of representation for grammatical restructuring to occur (Carroll, 2001, p. 291).

Preposterous, right, Mike?

Fun will finally ensue when, in Part 5, I get together with a bunch of well oiled chums in a video conference session to defend Carroll’s insistence on a property theory and a language faculty against the usage based (UB) hordes. Neil McMillan (whose Lacan in Lockdown: reflections from a small patio is eagerly awaited by binocular-wielding graffiti fans in Barcelona); Kevin Gregg (train schedule permitting); and Mike (‘pass the bottle’) Long are among the many who probably won’t take part.


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Carroll’s AIT Theory: Part 2

This is the second part of my exploration of Susanne Carroll’s theory of SLA. Carroll’s work is important, IMHO, because it questions many of the constructs used by SLA theorists, including ‘comprehensible input’, ‘processing’, ‘i+1’, ‘noticing’, ‘noticing the gap’, ‘L1 transfer’, ‘chunk learning’, and many others. By examining Carroll’s work, I think we can throw light on all these constructs and come to a better understanding of how people learn an L2.

In Part One, I looked at Carroll’s adoption of Jackendoff’s Representational Modularity (RM) theory; a theory of modular mind where each module contains levels of representation organised in chains going from the lowest to the highest. The “lowest” representations are stimuli and the “highest” are conceptual structures. This leads to the hypothesis of levels:

Selinker, Kim and Bandi-Rao (2004, p. 82) summarise RM thus:  

The language faculty consists of auditory input, motor output to vocal tract, phonetic, phonological, syntactic components and conceptual structure, and correspondence rules, various processors linking/regulating one autonomous representational type to another. These processors, domain specific modules, all function automatically and unconsciously, with the levels of modularity forming a structural hierarchy representationally mediated in both top-down and bottom-up trajectories.

And Carroll (2002) says:

What is unique to Jackendoff’s model is that it makes explicit that the processors which link the levels of grammatical representation are also a set of modular processors which map representations of one level onto a representation at another level. These processors basically consist of rules with an ‘X is equivalent to Y’ type format. There is a set of processors for mapping ‘upwards’ and a distinct set of processors for mapping ‘downwards’.

Bottom-up correspondence processors

a.         Transduction of sound wave into acoustic information

b.         Mapping of available acoustic information into phonological format.

c.         Mapping of available phonological structure into morphosyntactic format.

d.         Mapping of available syntactic structure into conceptual format.

Top-down correspondence processors

a.         Mapping of available syntactic structure into phonological format.

b.         Mapping of available conceptual structure into morphosyntactic format.

Integrative processors

a.         Integration of newly available phonological information into unified phonological structure.

b.         Integration of newly available morphosyntactic information into unified morphosyntactic structure.

c.         Integration of newly available conceptual information into unified conceptual structure.

(Jackendoff 1987, p. 102, cited in Carroll, 2002, p. 16).  
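If it helps, the bottom-up chain can be pictured as a pipeline where each processor either hands a representation up to the next level or has nothing to offer. What follows is a minimal sketch of that picture only; the lookup tables, the labels like "phon:/kat/", and the function names are all made up for illustration, and real processors obviously aren't dictionaries.

```python
from typing import Callable, Optional

# Each processor maps a representation at one level onto the next level up,
# or returns None when it has no rule for what it is given.
Processor = Callable[[str], Optional[str]]


def make_processor(rules: dict) -> Processor:
    return lambda rep: rules.get(rep)


# Toy stand-ins for bottom-up processors (a) to (d) in the list above.
BOTTOM_UP = [
    ("acoustic", make_processor({"soundwave": "acoustic:kat"})),
    ("phonological", make_processor({"acoustic:kat": "phon:/kat/"})),
    ("morphosyntactic", make_processor({"phon:/kat/": "N[cat]"})),
    ("conceptual", make_processor({"N[cat]": "CAT"})),
]


def parse_bottom_up(stimulus: str, chain=BOTTOM_UP):
    """Run the chain; report the level at which the mapping stalls, if any."""
    rep = stimulus
    for level, processor in chain:
        out = processor(rep)
        if out is None:
            return rep, level  # no rule available at this level
        rep = out
    return rep, None


print(parse_bottom_up("soundwave"))  # ('CAT', None)
print(parse_bottom_up("noise"))      # ('noise', 'acoustic')
```

The second call shows the case that matters for learning: the parse stalls at a particular level, and that level is where something new has to be acquired.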


The second main component of Carroll’s AIT is induction. Induction is a form of reasoning which involves going from the particular to the general. The famous example (given in Philosophy for Idiots, Thribb, 17, cited in Dellar, passim) is of swans. You define a swan and then search lakes looking to see what colour particular examples of it are. All the swans you see in the first lake are white, and so are those in the second lake. Everywhere you look, they’re all white, so you conclude that “All swans are white”. That’s induction. Hume (see Neil McMillan (unpublished) The influence of Famous Scottish Drunkards on Lacard’s psychosis; a bipolar review) famously showed that induction is illogical – no inference from the particular to the general is justified. No matter how many white swans you observe, you’ll never know that they’re ALL white, that there isn’t a non-white swan lurking somewhere, so far unobserved. Likewise, you can’t logically induce that because the sun has so far always risen in the East that it will rise in the East tomorrow. Popper “solved” this conundrum by saying that we’ll never know the truth about any general theory or generalisation, so we just have to accept theories “tentatively”, testing them in attempts not to prove them (impossible) but, rather, to falsify them. If they withstand these tests, we accept the theory, tentatively, as “true”.    

The assumption of all SLA “cognitive processing” transition theories is that the development of interlanguages depends on the gradual reformulation of the learner’s mental conceptualisations of the L2 grammar. These reformulations can be seen as following the path suggested by Popper to get to reliable knowledge:

P1 -> TT¹ -> EE -> P2 -> TT², etc.

P = problem

TT = tentative theory

EE = testing for empirical evidence which conflicts with TT   

You start with a problem and you leap to a tentative theory (TT) and then you test it, trying to falsify it with empirical evidence. If you find such contradictory evidence, you have a problem, and you re-formulate the theory (TT²) which tries to deal with the problem, and you then test again, and round we go again, slowly improving the theory. Popper is talking about hypothesis testing and theory construction in the hard sciences (particularly physics), and while it’s a long way from describing what scientists actually do, it’s even further away from describing what L2 learners do in developing interlanguages. Nevertheless, it’s common to hear people describing SLA as hypothesis formation and hypothesis testing.
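For what it’s worth, the schema can be written as a simple loop, using the swans again. This is a sketch under heavy assumptions: the "theory" is just a set of colours swans are allowed to have, and the revision step (widen the theory to cover the awkward swan) is my crude placeholder, not anything Popper proposed.

```python
def all_swans_are(colours: frozenset):
    """Build a tentative theory: 'all swans have one of these colours'."""
    return lambda swan_colour: swan_colour in colours


def popper_cycle(observations):
    accepted = frozenset({"white"})          # TT1: "All swans are white"
    theories = [accepted]
    for colour in observations:              # EE: look for conflicting evidence
        theory = all_swans_are(accepted)
        if not theory(colour):               # falsified, so we have a new problem
            accepted = accepted | {colour}   # next tentative theory
            theories.append(accepted)
    return theories


theories = popper_cycle(["white", "white", "black", "white"])
print(len(theories))  # 2
```

Any number of white swans leaves the first theory standing, tentatively; one black swan forces a second.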

We could, I suppose, see the TT¹ as the learner’s initial interlanguage theory. Then, at any given point in its trajectory, the theory gets challenged by evidence that doesn’t fit (perhaps went when goed is expected, for example) and the problem is resolved by a new, more sophisticated theory, the TT². But it doesn’t work – interlanguage development is not a matter of hypothesis formation and testing in Popper’s sense, and I agree with Carroll that it’s “a misleading metaphor”. In her view, SLA is a process of “learning new categories of the grammar, new structural arrangements in on-line parses and hence new parsing procedures and new productive schemata” (Carroll, 2001, p. 32). Still, Hume’s problem of underdetermination remains – the inductions that learners are said to make aren’t strictly logical. (“Just saying” (McMillan, ibid)).

So anyway, Carroll wants to see SLA development (partly) as a process of induction. The most respectable theory of induction is inference to the best explanation, also known as abduction, and I think Lipton (1991) provides the best account, although Gregg (1993) does a pretty good job of it in a couple of pages (adeptly including a concise account of Hempel’s D-N model, by the way). Carroll, however, ducks the issues and follows Holland et al. (1986), who define induction as a set of procedures which lead to the creation and/or refinement of the constructs which form mental models (MMs). Mental models are “temporary and changing complex conceptual representations of specific situations”. Carroll gives the example of a Canadian’s MM of a breakfast event, versus, say, the very different one of a Japanese MM breakfast event. MMs are domains of knowledge, schemata, if you like, and Carroll makes lots of use of MMs which I’m going to skip over. She then goes into considerable detail about categorising MMs, and then proceeds to “Condition-action rules” which govern induction. These are competition rules which share ideas from abduction inasmuch as they say “When confronted with competing solutions to a problem, choose the most likely, the best ‘fit’ ”.

Carroll (2001, p. 170) finally (sic) defines induction as a process

leading to revision of representations so that they are consistent with information currently represented in working memory. Its defining property is that it is rooted in stimuli made available to the organism through the perceptual system, coupled with input from Long Term Memory and current computations. … the results of i-learning depend upon the contents of symbolic representations.


Carroll’s theory of learning rests on i-learning (as opposed to ‘I-language’ in Chomsky’s sense, which has very little to do with it, and one can only wish she’d chosen some other term, rather like Long’s unhappy choice of “Focus on FormS”). I-learning depends on the contents of symbolic representations being computed by the learning system.

At the level of phonetic learning, i-learning will depend on the content of phonetic representations, to be defined in terms of acoustic properties. At the level of phonological representation, i-learning will depend on the content of phonological representations, to be defined in terms of prosodic categories, and featural specification of segments. At the level of morphosyntactic learning, i-learning will depend upon the content of morphosyntactic representations. And so on.

So, it seems, i-learning goes on autonomously within all the parts of Jackendoff’s theory of modularity, not just in the conceptual representational system. (I take it that this is where Carroll’s ‘competition’ comes in – analysing a novel form involves competition among various information sources from different levels.) Anyway, the key point is that i-learning is triggered by the failure of current representations to “fit” current models in conjunction with specific environmental stimuli.

More light

I usually don’t comment on my choice of images, but the above image shows Goethe on his death bed. His wonderful dying words were, according to his doctor, Carl Vogel, “Mehr Licht!” And I can’t help sharing this anecdote. In my first seminar, in my first term of my first year at LSE, I read a paper presided over by Imre Lakatos, one of the finest scholars I’ve ever met, and later a friend who committed perjury in court to help me avoid being found guilty of a criminal charge. The paper was about German developments in science, and I mentioned Goethe, whose name I pronounced ‘Go eth’. Lakatos was drinking a coffee at the moment when I said “Go eth” and reacted very violently. He spat the coffee out, all over the alarmed students sitting round the table in his study, jumped to his feet, and shouted hysterically: “I fail to understand how anybody who’s been accepted into this university can so hopelessly mispronounce the name of Germany’s most famous poet!”

I use Goethe’s dying words here to refer to Carroll’s 2002 paper, which really does throw more light on her difficult-to-follow 2001 work.

In her (2002) account of I-learning, Carroll argues that researching the nature of induction in language acquisition requires the notion of a UG, which describes the properties of grammatical knowledge shared by all human languages. The psycholinguistic processes which result in this knowledge are constrained by UG – which, she insists, doesn’t mean that “UG is thereby operating on-line in any fashion or indeed is stored anywhere to be consulted, as one might infer from much generative SLA research” (Carroll, 2002, p. 11).

Carroll goes on to say that a speaker’s I-language consists of a particular combination of universal and acquired contents, so that a theory of SLA must explain not only what is universal in our mental grammars, but also what is different both among speakers of the same E-language and among the various E-languages of the world.

In order to have a term to cover a UG-compatible theory of acquisition, as well as to make an explicit connection to I-language, I suggest we call such a theory of acquisition a theory of i(nductive)-learning, specifically the Autonomous Induction Theory (Carroll, 2002, p. 12).

In other words, while Chomsky is concerned with explaining I-language, Carroll is concerned with explaining the much wider construct of I-learning; she wants to integrate a theory of linguistic competence with theories of performance. So, it goes like this:

The perception of speech, the recognition of words, and the parsing of sentences in the L2 require the application of unit detection and structure-building procedures. When those procedures are in place, speech processing is performed satisfactorily. But when the procedures are not available (e.g., to the beginning L2 learner), speech processing will fail, forcing the learner to fall back on inferences from the context, stored knowledge, etc. But, of course, beginners have very few such resources to draw on, and so interpretation of the stimulus will fail, which is when i-learning mechanisms will be activated.

When speech detection, word recognition, or sentence parsing fail,… only the i-learning mechanisms can fix the problem. They go into action automatically and unconsciously (Carroll, 2002, p. 13).

To start with then, the learner hears the speech stream as little more than noise. Comprehension depends on their learning the right cues to syllable edges and the segments which comprise the syllables. Only once these cues to the identification of phonological units have been learned can word learning begin. After that, form-extraction processes which map some unit or other of the phonology onto a morphosyntactic word will allow the learner to hear a form in the speech stream, but still without necessarily knowing what it means. After that, when learners can identify words and know what they mean, they might still lack the parsing procedures needed to use morphosyntactic cues to arrive at the correct sentence structure and hence arrive at the correct sentence meaning. Either they fail to arrive at any interpretation, or they arrive at the wrong one – their semantic representation isn’t the same as what was intended by the speaker. Finally, their i-learning allows them to get the right meaning – the parsers can now do their job satisfactorily.

Recall what was said in Part 1: Krashen got it backwards! This is the real thrust of Carroll’s argument: input must be seen not as linguistic stuff coming straight from the environment, but rather as stuff that results from processes going on in the mind which call on innate knowledge. Furthermore: YOU CAN’T NOTICE GRAMMAR!

So there you have it. Except that, really, that’s nowhere near “it”. Carroll admits that her theory doesn’t explain what the acquisition mechanism does when parsing breaks down. She asks:

How does the mechanism move beyond that point of parse, and what are the constraints on the solution the learner alights on? Why do the acquisition mechanisms which attempt to restructure the existing parsing procedures and correspondence rules to deal with a current parse problem often fail?

The answers partly lie in Carroll’s investigation of “Categories and categorization” and partly in the roles of feedback and correction. In an early reformulation of her research questions in Input and Evidence, Carroll emphasises the importance of feedback and correction to her work, which points to her important contributions to examining the empirical evidence found in the SLA literature, and also highlights some of the ways in which this evidence has been (mis)used. All this will be discussed in Part 3, where I’ll also look at what Carroll’s AIT has to say about explicit and implicit learning, and about what some of today’s gurus in ELT might learn from Carroll’s work.

This is a blog post, not an academic text. I’m exploring Carroll’s work, and I’ve no doubt made huge mistakes in describing and interpreting it. I await correction. But I hope it will provoke discussion among the many ELT folk who enjoy shooting the breeze about important questions which have a big impact (or should I say ‘impact big time’) on how we organise and implement teaching programmes.


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Carroll, S. (2002) I-learning. EUROSLA Yearbook 2, 7–28.

Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Lipton, P. (1991) Inference to the Best Explanation. London: Routledge.

Popper, K. R. (1972) Objective Knowledge. Oxford: Oxford University Press.

Selinker, L., Kim, D. and Bandi-Rao, S. (2004) Linguistic structure with processing in second language research: is a ‘unified theory’ possible? Second Language Research 20, 1, 77–94.

The place of Jackendoff’s Representational Modularity Theory in Carroll’s Autonomous Induction Theory


Jackendoff’s Representational Modularity Theory (Jackendoff, 1992) is a key component in Susanne Carroll’s Autonomous Induction Theory, as described in her book Input and Evidence (2001). Carroll’s book is too often neglected in the SLA literature, and I think that’s partly because it’s very demanding. Carroll goes into so much depth about how we learn; she covers so much ground in so much methodical detail; she’s so careful, so thorough, so aware of the complexities, that if you start reading her book without some previous understanding of linguistics, the philosophy of mind, and the history of SLA theories, you’ll find it very tough going. Even with some such understanding of these matters, I myself find the book extremely challenging. Furthermore, the text is dense and often, in my opinion, over elaborate; you have to be prepared to read the text slowly, and at the same time keep on reading while not at all sure where the argument’s going, in order to “get” what she’s saying.

One criterion for judging theories of SLA is an appeal to Occam’s Razor: ceteris paribus (all other things being equal), the theory with the simplest formula, and the fewest number of basic types of entity postulated, is to be preferred for reasons of economy. Carroll’s theory scores badly here: it’s complicated! Her use of Jackendoff’s theory, and of the Induction Theory of Holland, means that her theory of SLA counts on a variety of formulae and entities, and thus it’s not “economical”. On the other hand, it’s one of the most complete theories of SLA on offer.

Over the years, I’ve spent weeks reading Carroll’s Input and Evidence, and now, while reading it yet again in “lockdown”, I’m only just starting to feel comfortable turning from one page to the next. But it’s worth it: it’s a classic; one of the best books on SLA ever, IMHO, and I hope to persuade you of its worth in what follows. I’m going to present The Autonomous Induction Theory (AIT) in an exploratory way, bit by bit, and I hope we’ll end up, eventually, with some clear account of AIT and what it has to say about second language learning, and its implications for teaching.

To the issues, then.

UG versus UB

In the current debate between Chomsky’s UG theory and more recent Usage-based (UB) theories of language and language learning, most of those engaged in the debate see the two theories as mutually contradictory: one is right and the other is wrong. One says language is an abstract system of form-meaning mappings governed by a grammar (in Chomsky’s case a deep grammar common to all natural languages as described in the Principles and Parameters version of UG), and this knowledge is learned with the help of innate properties of the mind. The other says language should be described in terms of its communicative function; as Saussure put it “linguistic signs arise from the dynamic interactions of thought and sound – from patterns of usage”. The signs are form-meaning mappings; we amass a huge collection of them through usage; and we process them by using relatively simple, probabilistic algorithms based on frequency.

O’Grady (2005) has this to say:

The dispute over the nature of the acquisition device is really part of a much deeper disagreement over the nature of language itself. On the one hand, there are linguists who see language as a highly complex formal system that is best described by abstract rules that have no counterparts in other areas of cognition. (The requirement that sentences have a binary branching syntactic structure is one example of such a “rule.”) Not surprisingly, there is a strong tendency for these researchers to favor the view that the acquisition device is designed specifically for language. On the other hand, there are many linguists who think that language has to be understood in terms of its communicative function. According to these researchers, strategies that facilitate communication – not abstract formal rules – determine how language works. Because communication involves many different types of considerations (new versus old information, point of view, the status of speaker and addressee, the situation), this perspective tends to be associated with a bias toward a multipurpose acquisition device.

Susanne Carroll tries to take both views into account.

Property Theories and Transition Theories of SLA

Carroll agrees with Gregg (1993) that any theory of SLA has to consist of two parts:

1) a property theory which describes WHAT is learned,

2) a transition theory which explains HOW that knowledge is learned.

As regards the property theory, it’s a theory of knowledge of language, describing the mental representations that make up a learner’s grammar – which consists of various classifications of all the components of language and how they work together. What is it that is represented in the learner’s knowledge of the L2? Chomsky’s UG theory is an example; Construction Grammar is another; the Competition Model of Bates & MacWhinney (1989, cited in Carroll, 2001) is another; while general knowledge representations, forms of rules of discourse, Gricean maxims, etc. are, I suppose, also candidates.

Transition theories of SLA explain how these knowledge states change over time. The changes in the learner’s knowledge, generally seen as progress towards a more complete knowledge of the target language, need to be explained by appeal to a causal mechanism by which one knowledge state develops into another.

Many of the most influential cognitive processing theories of SLA (Chaudron, 1985; Krashen, 1982; Sharwood Smith, 1986; Gass, 1997; Towell & Hawkins, 1994, cited in Carroll, 2001) concentrate on a transition theory. They explain the process of L2 learning in terms of the development of interlanguages, while largely ignoring the property theory, which they sometimes, and usually vaguely, assume is dealt with by UG. New UB theories (e.g. Ellis, 2019; Tomasello, 2003) reject Chomsky’s UG property theory and rely on what Chomsky regards as performance data for a description of the language in terms of a Construction Grammar. More importantly, perhaps, their ‘transition theory’ makes a minimal appeal to the workings of the mind; they’re at pains to use quite simple general learning mechanisms to explain how “associative” learning, acting on input from the environment, explains language learning.

Mentalist Theories

Carroll bases her approach on the view that humans have a unique, innate capacity for language, and that language learning goes on in a modular mind. Here, I’ll leave discussions about the philosophy of mind to one side, but suffice it to say for now that ‘mind’ is a theoretical construct referring to a human being’s world of thought, feeling, attitude, belief and imagination. When we talk about the mind, we’re not talking about a physical part of the body (the brain), and when we talk about a modular mind, we’re not talking about well-located, separate parts of the brain.

Carroll rejects Fodor’s (1983) claim that the language faculty comprises a single language module in the mind’s architecture, and she sees Chomsky’s LAD as an inadequate description of the language faculty. Rather than accept that language learning is crucially explained by the workings of a “black box”, Carroll explores the mechanisms of mind more closely, and, following Jackendoff, suggests that the language faculty operates at different levels, and is made up of a chain of mental representations, with the lowest level interacting with physical stimuli, and the highest level interacting with conceptual representations. Processing goes on at each level of representation, and a detailed description of these representations explains how input is processed for parsing.

Carroll further distinguishes between processing for parsing and processing for learning, such that, in speech, for example, when the parsers fail to get the message, the learning mechanisms take over. Successful parsing means that the processors currently at the learner’s disposal are able to use existing rules which categorize and combine representations to understand the speech signal. When the rules are inadequate or missing, parsing breaks down; and in order to deal with this breakdown, the known rule that helps most in parsing the problematic item of input is selected and subsequently adapted or refined until parsing succeeds at that given level. As Sun (2008) summarises, “This procedure explains the process of acquisition, where the exact trigger for acquisition is parsing failure resulting from incomprehensible input”.
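For readers who like to see a mechanism spelled out, here is a deliberately crude sketch of the "parse first, learn only on failure" idea. It is my own toy, not Carroll's model: the `ToyParser` class, the word-to-category rule store, and the "longest shared ending" measure for choosing the most helpful known rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


def shared_suffix(a: str, b: str) -> int:
    """Length of the longest common ending of two words (toy similarity measure)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


@dataclass
class ToyParser:
    # Hypothetical rule store: each known word maps to the category assigned to it.
    rules: dict = field(default_factory=dict)

    def parse(self, words):
        """Return a category for every word, or None if any word has no rule."""
        if all(w in self.rules for w in words):
            return [self.rules[w] for w in words]
        return None

    def learn(self, words):
        """Extend the most similar existing rule to cover each unparsed word."""
        for w in words:
            if w not in self.rules:
                best = max(self.rules, key=lambda known: shared_suffix(known, w), default=None)
                self.rules[w] = self.rules[best] if best is not None else "UNKNOWN"

    def process(self, words):
        """Parse; only if parsing fails, learn and try again. Returns (analysis, learned?)."""
        analysis = self.parse(words)
        if analysis is not None:
            return analysis, False
        self.learn(words)
        return self.parse(words), True


parser = ToyParser({"the": "Det", "dog": "N", "walked": "V"})
first = parser.process(["the", "dog", "walked"])   # parses with existing rules
second = parser.process(["the", "dog", "jumped"])  # fails, so learning is triggered
```

The only point of the toy is the control flow: nothing is learned while parsing succeeds, and learning starts from whatever existing rule gets closest to the problematic item.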

Scholars from Krashen to Gass take ‘input’ and ‘intake’ as the first two necessary steps in the SLA process (Gass’s model suggests that input passes through the stages of “apperceived” and “comprehended” input before becoming ‘intake’), and ‘intake’ is regarded as the set of processed structures waiting to be incorporated into interlanguage grammar. The widely accepted view that in order for input to become intake it has to be ‘noticed’, as described by Schmidt in his influential 1990 paper, has since, as the result of criticism (see, for example, Truscott, 1998), been seriously modified so that it now approximates to Gass’s ‘apperception’ (see Schmidt 2001, 2010), but it’s still widely seen as an important part of the SLA process.

Processing Theories of SLA

Carroll, on the other hand, sees input as physical stimuli, and intake as a subset of these stimuli.

The view that input is comprehended speech is mistaken. Comprehending speech … happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards! (Carroll, 2001, p. 78).

Referring not just to Krashen, but to all those who use the constructs ‘input’, ‘intake’ and ‘noticing’, Gregg (in a comment on one of my blog posts) makes the obvious, but crucial point: “You can’t notice grammar”! Grammar consists of things like nouns and verbs, which are, quite simply, not empirically observable things existing “out there” in the environment, waiting for alert, on-their-toes learners to notice them.

So, says Carroll, language learning requires the transformation of environmental stimuli into mental representations, and it’s these mental representations which must be the starting point for language learning. In order to understand speech, for example, properties of the acoustic signal have to be converted to intake; in other words, the auditory stimulus has to be converted into a mental representation. “Intake from the speech signal is not input to learning mechanisms, rather it is input to speech parsers. … Parsers encode the signal in various representational formats” (Carroll, 2001, p. 10).


Sorry for the poor quality of the scan.

Jackendoff’s Representational Modularity

We now need to look at Jackendoff’s (1992) Representational Modularity. Jackendoff presents a theory of mind which contrasts with Fodor’s modular theory (where the language faculty constitutes a single module which processes already formed linguistic representations) by proposing that particular types of representation are sets belonging to different modules. The language faculty has several autonomous representational systems and information flows in limited ways from a conceptual system into the grammar via correspondence rules which connect the autonomous representational systems (Carroll, 2001, p. 121).

Jackendoff’s model has various cognitive faculties, each associated with a chain of levels of representation. The stimuli are the “lowest” level of representation, and “conceptual structures” are the “highest”. The chains intersect at various points allowing information encoded in one chain to influence the information encoded in another. This amounts to Jackendoff’s hypothesis of levels.


Here’s a partial model


Jackendoff  proposes that, in regard to language learning, the mind has three representational modules: phonology, syntax, and semantics, and that it also has interface modules which, by defining correspondence rules between representational formats, allow them to pass information along from the lowest to the highest level. This is important for Carroll, because, as we’ll see, the different modules are autonomous and so there must be a translation processor for each set of correspondence rules linking one autonomous representation type to another.

What Carroll wants from Jackendoff is “a clear picture of the functional architecture of the mind” (Carroll, 2001, p. 126), on which to build her induction model. In Part 2, I’ll deal with the Induction bit, but we must finish Part 1 by looking at other parts of Jackendoff’s work.

The Architecture of the Language Faculty

In The Architecture of the Language Faculty, Jackendoff argues for the central part played in language by the lexicon. The lexicon is not part of one of his representational modules, but rather the central component of the interface between them. Lexical items include phonological, syntactic, and semantic content, and thus any lexical item is a set of three structures linked by correspondence rules. Furthermore, since lexical items are part of this general interface, there is no need to restrict them to word-sized elements – they can be affixes, single words, compound words, or even whole constructions, including MWUs, idioms, and so on. As Stevenson (1997) says: Simply put, the claim is that what we call the lexicon is not a distinct entity but rather a subset of the interface relations between the three grammatical subsystems. … Jackendoff’s proposal thus has the potential to provide a uniform characterization of morphological, lexical, and phrase-level knowledge and processes, within a highly lexicalized framework.
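To make the "set of three structures" idea concrete, here is a small sketch of my own. Nothing in it comes from Jackendoff's formalism: the `LexicalItem` class, the `activate` function, the three sample entries and the plain-text stand-ins for phonological, syntactic and conceptual structure are all hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalItem:
    # Each field is a plain-text stand-in for a structure at one level of representation.
    phonology: str
    syntax: str
    semantics: str


# Items are not restricted to word size: an affix, a word and an idiom get the same format.
LEXICON = [
    LexicalItem(phonology="-s", syntax="plural affix on N", semantics="PLURAL"),
    LexicalItem(phonology="dog", syntax="N", semantics="DOG"),
    LexicalItem(phonology="kick the bucket", syntax="VP", semantics="DIE"),
]


def activate(phonology: str):
    """Activating an item by its sound also makes its syntax and semantics available."""
    for item in LEXICON:
        if item.phonology == phonology:
            return {"syntax": item.syntax, "semantics": item.semantics}
    return None


idiom = activate("kick the bucket")
affix = activate("-s")
```

The design choice worth noticing is that the idiom is stored exactly like the single word: one entry, three linked structures, with no separate "grammar" needed to hold it.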

To bring this home, I offer two presentations by Jackendoff. In the first presentation, Jackendoff  argues that lexis only – “linear grammar” – paved the way for modern languages. It’s eloquent, to say the very least.

The main argument is, of course, the importance of the lexicon, but I think this diagram is particularly interesting.

Never mind the details; just note that comprehending starts with perceptual stimuli and goes through various levels of representation from lowest to highest, while speaking starts with responding to stimuli actively and goes in the opposite direction.

In the second presentation, Jackendoff talks about mentalism and formalism. Please skip to Minute 49.

There’s a handout for this which I recommend you download and then follow. Click here.

In this presentation Jackendoff argues that we should abandon the assumption made by generative grammar that lexicon and grammar are fundamentally different kinds of mental representations. If the lexicon gets progressively more and more rule-like, and you erase the line between words and rules, then you slide down a slippery slope which ends up with HPSG (Head-driven phrase structure grammar), Cognitive Grammar, and Construction Grammar, which, he says, is “not so bad”.

So, we may well ask, is Jackendoff  a convert to UB theories? How can he be, if he bases his theory of Representational Modularity on the assumption of our possession of a modular mind? How can all this ‘mental representation’ stuff be reconciled with an empiricist view like N. Ellis’ which wants to explain language learning almost exclusively in terms of input from the environment? Part of the answer is, surely, that UB theory has a lot more mental stuff going on than it cares to recognise, but, in any case, I hope we can explore this further in Part 2, and I’d be very pleased if it leads to a lively discussion.  

To summarise then, Jackendoff (2000) replaces Chomsky’s generative grammar with the view that syntax is only one of several generative components. Lexical items are not, pace Chomsky, inserted into initial syntactic derivations, and then interpreted through processes of derivations, but rather, speech signals are processed by the auditory-to-phonology interface module to create a phonological representation. After that, the phonology-to-syntax interface creates a syntactic structure, which is then, aided by the syntax-to-semantics interface module, converted into a propositional structure, i.e. meaning. Which is why, when a lexical item becomes activated, it not only activates its phonology, but it also activates its syntax and semantics and thus “establishes partial structures in those domains” (Jackendoff, 2000: 25). The same but reversed process takes place in language production.

What does Susanne Carroll make of it all? Can you make do with Netflix till the next exciting episode comes along?

Well, well. I hope you find this half as interesting as I do. Onward through the fog.


Carroll, S. (2001) Input and Evidence. Amsterdam, Benjamins.

Ellis, N. C. (2019). Essentials of a theory of language cognition. Modern Language Journal, 103.

Fodor, J. (1983) The Modularity of Mind. Cambridge, MA, MIT Press.

Gregg, K.R. (1993) Taking Explanation Seriously. Applied Linguistics, 14, 3.

Jackendoff, R.S. (1992) Languages of the Mind. Cambridge, MA: MIT Press.

O’Grady, W. (2005)  How Children Learn Language. Cambridge, UK: Cambridge University Press

Schmidt, R. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129–58.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J.W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Stevenson, S. (1997) A Review of The Architecture of the Language Faculty. Computational Linguistics, 24, 4.

Sun, Y.A. (2008) Input Processing in Second Language Acquisition: A Discussion of Four Input Processing Models. Working Papers in TESOL & Applied Linguistics, Vol. 8, No. 1.

Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.

Truscott, J. (1998) Noticing in second language acquisition: a critical review. Second Language Research, 14 (2), 103–135.


In a recent blog post I said:

“UB theories are increasingly fashionable, but I’m still not impressed with Construction Grammar, or with the claim that language learning can be explained by appeal to noticing regularities in the input”.

Scott Thornbury replied:

“Read Taylor’s The Mental Corpus (2012) and then say that!”.  

Well, I’ve had a second go at reading Taylor’s book, and here’s my reply, based partly on a review by Pieter Seuren.

Taylor’s thesis is that knowledge of a language can be conceptualized in terms of the metaphor of a “mental corpus”. Language knowledge is not knowledge of grammar rules, but rather “accumulated memories of previously encountered utterances and the generalizations which arise from them.” Everybody remembers everything in the language that they have ever heard or read. That’s to say, they remember everything they’ve ever encountered linguistically, including phonetics, the context of utterances, and precise semantic, morphological and syntactic form. Readers may well think that this has much in common with Hoey’s (2006) theory, and that’s not where the similarities end: like Hoey, Taylor offers no explanation of how people draw on this “literally fabulous” memory. Taylor says nothing about the formula of analysis in memory; nothing about the internal structure of that memory; nothing about how speakers actually draw on it; nothing about the kind of memory involved; “in short, nothing at all”.

Seuren argues that while there’s no doubt that speakers often fall back on chunks drawn holistically from memory, they also use rules. Thus, criticism of Chomsky’s UG is no argument against the viability of any research programme involving the notion of rules.

Taylor never considers the possibility of different models of algorithmically organized grammar. One obvious possibility is a grammar that converts semantic representations into well-formed surface structures, as was proposed during the 1970s in Generative Semantics. One specific instantiation of such a model is proposed in my book Semantic Syntax (Blackwell, Oxford 1996), … This model is totally non-Chomskyan, yet algorithmic and thus rule-based and completely formalized. But Taylor does not even consider the possibility of such a model.

Without endorsing Seuren’s model of grammar, or indeed his view of language, I think he makes a good point. He concludes:

Apart from the inherent incoherence of Taylor’s ‘language-as-corpus’ view, the book’s main fault is a conspicuous absence of actual arguments: it’s all rhetoric, easily shown up to be empty when one applies ordinary standards of sound reasoning. In this respect, it represents a retrograde development in linguistics, after the enormous methodological and empirical gains of the past half century.

In a tweet, Scott Thornbury points to Martin Hilbert’s (2014) more favourable review of Taylor’s book, but neither he nor I have a copy of it, so we’ll have to wait while Scott gets hold of it.

Meanwhile, let’s return to the usage-based (UB) theory claim that language learning can be explained by appeal to noticing regularities in the input, and that Construction Grammar is a good way of describing the regularities that are noticed in this way.

Dellar & Walkley, and Selivan, the chief proponents of “The Lexical Approach”, can hardly claim to be the brains of the UB outfit, since they all misrepresent UB theory to the point of travesty. But there are, of course, better attempts to describe and explain UB theory, most notably by Nick Ellis (see, for example, Ellis, 2019). Language can be described in terms of constructions (see Wulff & Ellis, 2018), and language acquisition can be explained by simple learning mechanisms, which boil down to detecting patterns in input: when exposed to language input, learners notice frequencies and discover language patterns. As Gregg (2003) points out, this amounts to the claim that language learning is associative learning.

When Ellis, for instance, speaks of ‘learners’ lifetime analysis of the distributional characteristics of the input’ or the ‘piecemeal learning of thousands of constructions and the frequency-biased abstraction  of regularities’, he’s talking of association in the standard empiricist sense.

Here we have to pause and look at empiricism, and its counterpart rationalism.

Empiricists claim that sense experience is the ultimate source of all our concepts and knowledge. There’s no such thing as “mind”; we’re born with a “tabula rasa”, our brain an empty vessel that gets filled with our experiences and so our knowledge is a posteriori, dependent wholly upon our history of sense experience. Skinner’s version of Behaviourism serves as a model. Language learning, like all learning, is a matter of associating one thing with another, with habit formation.      

Compare this to Chomsky’s view that what language learners get from their experiences of the world can’t explain their knowledge of their language: a better explanation is that learners have an innate knowledge of a universal grammar which captures the common deep structure of all natural languages. A set of innate capacities or dispositions enables and determines their language development. In my opinion, there’s no need to go back to the historical debate between rationalists like Descartes and empiricists like Locke: indeed, I think that these comparisons are often misleading, usually because those that use them to argue for UB theories give a very distorted description of Descartes, and fail to appreciate the full implications of adopting an empiricist view. What’s important is that the empiricism adopted by Nick Ellis, Tomasello and others today is a less strict version than the original: talk of mind and reason is not proscribed, although for them, the simpler the mechanisms employed to explain learning, the better.

Chomsky is the main target; motivated by the desire to get rid of any “black box” and any appeal to inference to the best explanation when confronted by arguments about the poverty of the stimulus, the UB theorists appeal to frequency, Zipfian distribution, power laws, and other flimsy bits and pieces in order to replace the view that language competence is knowledge of a language system which enables speakers to produce and understand an infinite number of sentences in their language, and to distinguish grammatical sentences from ungrammatical sentences; while language learning goes on in the mind, equipped with a special language learning module to help interpret the stream of input from the environment. Such a view led to theories of SLA which see the L2 learning process as crucially a psycholinguistic process involving the development of an interlanguage, where L2 learners gradually approximate to the way native speakers use the target language.

We return to Nick Ellis’s view. Language is a collection of utterances whose regularities are explained by Construction Grammar, and language learning is based on associative learning – the frequency-biased abstraction of regularities. I’ve already expressed the view that Construction Grammar seems to me little more than a difficult-to-grasp taxonomy, an a posteriori attempt to classify bits of attested language use collected from corpora; while the explanation of how we learn this grammar relies on associative learning processes which do nothing to adequately explain SLA or what children know about their L1. Here’s a bit more, based on the work of Kevin Gregg, whose view of theory construction in SLA is more eloquently stated and more carefully argued than that of any scholar I’ve ever read.

N. Ellis claims that language emerges from relatively simple developmental processes being exposed to a massive and complex environment. Gregg (2003) uses the example of the concept SUBJECT to challenge Ellis’ claim.

 The concept enters into various causal relations that determine various outcomes in various  languages: the form of relative clauses in English, the  assignment  of reference in Japanese anaphoric  sentences, agreement  markers on verbs, the existence of expletives in  some languages, the form of the verb in others, the possibility of certain null arguments in still others and so on.

Ellis claims that the concept SUBJECT emerges; it’s the effect of environmental influences that act by forming associations in the speaker’s mind such that the speaker comes to have the relevant concepts as specified by the linguist’s description. But how can the environment provide the necessary information, in all languages, for all learners to acquire the concept?  What sort of environmental information could be instructive in the right ways, and how does this information act associatively?

Gregg comments:

Frankly, I do not think the emergentist has the ghost of a chance of showing this, but what I think hardly matters. The point is that so far as I can tell, no emergentist has tried. Please note that connectionist simulations, even if they were successful in generalizing beyond their training sets, are beside the point here. It is not enough to show that a connectionist model could learn such and such: In order to underwrite an emergentist claim about language learning, it has to be shown that the model uses information analogous to information that is to be found in the actual environment of a human learner. Emergentists have been slow, to say the least, to meet this challenge.

Amen to that.

So, the choice is yours. If you choose to accept Dellar’s account of language and language learning, then you base your teaching on the worst “principles” of language and language learning in print. If you choose to follow Nick Ellis’ account, then you’ll probably have to pass on trying to figure out Construction Grammar, or explaining not just what children know about their L1 with zero input from the environment, but also how associative learning explains adult L2 learning trajectories as reported in hundreds of studies over the last 50 years. If you choose to accept one or another cognitive, psycholinguistic theory of SLA which sees L2 learning as a process of developing interlanguages, then you are left with the problem of providing what Gregg refers to as the property theory of SLA: In what does the capacity to use an L2 consist? What are the properties of the language which is learned in this way? Chomsky’s explanation of language and language learning might well be wrong, but it’s still the best description of language competence on offer (language, quite simply, is not exclusively a tool for social interaction), and it’s still the best explanation of what children know about language and how they come to know it.


Ellis, N. (2019) Essentials of a Theory of Language Cognition. The Modern Language Journal, 103 (Supplement 2019).

Seuren, P. (1996) Semantic Syntax. Oxford: Blackwell.

Wulff, S. and Ellis, N. (2018) Usage-based approaches to second language acquisition.

A Review of “Teaching Lexically” (2016) by H. Dellar and A. Walkley

(Note: I’ve moved this post from its old place in my “stuff” to here because the old blog is getting increasingly difficult to access and to edit.)

Teaching Lexically is divided into three sections.

Part A. 

We begin with The 6 principles of how people learn languages:

“Essentially, to learn any given item of language, people need to carry out the following stages:

  • Understand the meaning of the item.
  • Hear/see an example of the item in context.
  • Approximate the sounds of the item.
  • Pay attention to the item and notice its features.
  • Do something with the item – use it in some way.
  • Repeat these steps over time, when encountering the item again in other contexts” 

These “principles” are repeated in slightly amplified form at the end of Part A, and they inform the “sets of principles” for each of the chapters in Part B.

Next, we are told about Principles of why people learn languages

These “principles” are taken en bloc from the Common European Framework of Reference for languages.  The authors argue that teachers should recognise that

“for what is probably the majority of learners, class time is basically all they may have spare for language study. [This] … “emphasises how vital it is that what happens in class meets the main linguistic wants and needs of learners, chiefly:

  • To be able to do things with their language.
  • To be able to chat to others.
  • To learn to understand other cultures better”.

We then move to language itself.

Two Views of Language

1. Grammar + words + skills

This is the “wrong” view, which, according to Dellar and Walkley, most people in ELT hold. It says that

language can be reduced to a list of grammar structures that you can drop single words into.

The implications of this view are:

  1. Grammar is the most important area of language. …The examples used to illustrate grammar are relatively unimportant. …It doesn’t matter if an example used to illustrate a rule could not easily (or ever) be used in daily life.
  2. If words are to fit in the slots provided by grammar, it follows that learning lists of single words is all that is required, and that any word can effectively be used if it fits a particular slot.
  3. Naturalness, or the probable usage of vocabulary, is regarded as an irrelevance; students just need to grasp core meanings.
  4. Synonyms are seen as being more or less interchangeable, with only subtle shades of meaning distinguishing them.
  5. Grammar is acquired in a particular order – the so-called “building blocks” approach where students are introduced to “basic structures”, before moving to “more advanced ones”.
  6. Where there is a deficit in fluency or writing or reading, this may be connected to a lack of appropriate skills. These skills are seen as existing independently of language.

2. From words with words to grammar

This is the “right” view, and is based on the principle that “communication almost always depends more on vocabulary than on grammar”. The authors illustrate this view by taking the sentence

I’ve been wanting to see that film for ages.

They argue that “Saying want see film is more likely to achieve the intended communicative message than only using what can be regarded as the grammar and function words I’ve been –ing to that for.”

The authors go on to say that in daily life the language we use is far more restricted than the infinite variety of word combinations allowed by rules of grammar. In fact, we habitually use the same chunks of language, rather than constructing novel phrases from an underlying knowledge of “grammar + single words”.  This leads the authors to argue the case for a lexical approach to teaching  and to state their agreement with Lewis’ (1993) view that

 teaching should be centred around collocation and chunks, alongside large amounts of input from texts.

They go on:

From this input a grasp of grammar ‘rules’ and correct usage would emerge. 

The authors cite Hoey’s Lexical Priming (2005) as giving theoretical support for this view of language. They explain Hoey’s view by describing the example Hoey gives of the two words “result” and “consequence”. While these two words are apparently synonymous, they function in quite different ways, as can be seen in statistics from corpora which show when and how they are used. Dellar and Walkley continue:

Hoey argues that these statistical differences must come about because, when we first encounter these words (he calls such encounters ‘primings’) our brains somehow subconsciously record some or all of this kind of information about the way the words are used. Our next encounter may reaffirm – or possibly contradict – this initial priming, as will the next encounter, and the one after that – and so on. 
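Stripped down to a mechanism, the claim is that each encounter updates a running record of how a word is used. Here is a minimal sketch of that bookkeeping, under heavy assumptions of my own: the function name `record_primings` is hypothetical, the four sentences are invented (they are not corpus data), and "information about the way the words are used" is reduced to nothing more than the word immediately to the left.

```python
from collections import Counter, defaultdict


def record_primings(sentences, targets=("result", "consequence")):
    """Count, for each target word, which word appears immediately before it."""
    primings = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word in targets and i > 0:
                primings[word][words[i - 1]] += 1  # each encounter reinforces a pairing
    return primings


# Invented examples, for illustration only.
encounters = [
    "As a result of the delay we left",
    "The result was clear",
    "A direct consequence of the cuts",
    "As a result the match was cancelled",
]

primings = record_primings(encounters)
strongest = primings["result"].most_common(1)[0]
```

Even in this toy, the two near-synonyms end up with different profiles after a handful of encounters; whether anything like this counting goes on in a learner's head is, of course, exactly what is in dispute.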

The authors go on to explain how Hoey uses “evidence from psycholinguistic studies” to support the claim that we remember words not as individual units, but rather, in pairings and in groups, which allows for quicker and more accurate processing. Thus,

 spoken fluency, the speed at which we read and the ease and accuracy with which we listen may all develop as a result of language users being familiar with groupings of words.

A lexical view of teaching

Dellar & Walkley urge teachers to

think of whole phrases, sentences or even ‘texts’ that students might want to say when attempting a particular task or conversation. ….. At least some of those lexical items are learnable, and some of that learning could be done with the assistance of materials before students try to have particular kinds of communication.

It seems that the biggest problem of teaching lexically is that it’s difficult for teachers to come up, in real time, with the right kind of lexical input and the right kind of questions to help students notice lexical chunks, collocations, etc.. The practicalities of teaching lexically are discussed under the heading “Pragmatism in a grammar-dominated world”, where teachers are advised to work with the coursebooks they’ve got and approach coursebook materials in a different way, focusing on the vocabulary and finding better ways of exploiting it.

The rest of Part A is devoted to a lexical view of vocabulary (units of meaning, collocation, co-text, genre and register, lexical sets, antonyms, word form, pragmatic meanings and synonyms are discussed), a lexical view of grammar (including “words define grammar” and “grammar is all around”), and a lexical view of skills.

Part A ends with “A practical pedagogy for teaching and learning”, which stresses the need to consider “Naturalness, priming and non-native speakers”, and ends with “The Process”, which repeats the 6 stages introduced at the start, noting that noticing and repetition are the two stages that the lexical teacher should place the most emphasis on.

Part B offers 100 worksheets for teachers to work through. Each page shares the same format: Principle; Practising the Principle; Applying the Principle. In many of the worksheets, it’s hard to find the “principle”, and in most worksheets “applying the principle” involves looking for chances to teach vocabulary, particularly lexical chunks. Here’s an example:

 Worksheet 2: Choosing words to teach.

Principle: prioritise the teaching of more frequent words.

Practising the Principle involves deciding which words in a box (government / apple, for example) are more frequent and looking at the online Macmillan Dictionary or the British Corpus to check.

Applying the Principle involves choosing 10 words from “a word list of a unit or a vocabulary exercise that you are going to teach”, putting the words in order of frequency, checking your ideas, challenging an interested colleague with queries about frequency and “keeping a record of who wins!”

The worksheets cover teaching vocabulary lexically, teaching grammar lexically, teaching the 4 skills lexically, and recycling and revising. Many of them involve looking at the coursebook which readers are presumed to be using in their teaching, and finding ways to adapt the content to a more lexical approach to teaching. In the words of the authors,

the book is less about recipes and activities for lessons, and more about training for preparing lexical lessons with whatever materials you are using.       


Part C (10 pages long) looks at materials, teaching courses other than general English, and teacher training.


Language Learning

Let’s start with Dellar and Walkley’s account of language learning. More than 50 years of research into second language learning is “neatly summarised” by listing the 6 steps putatively involved in learning “any given item of language”. You (1) understand the meaning, (2) hear/see an example in context, (3) approximate the sound, (4) pay attention to the item and notice its features, (5) do something with it – use it in some way, and (6) then repeat these steps over time. We’re not told what an “item” of language refers to, but we may be sure that there are tens, if not hundreds of thousands of such items, and we are asked to believe that they’re all learned, one by one, following the same 6-step process.

Bachman (1990) provides an alternative account, according to which  people learn languages by developing a complex set of competencies, as outlined in the figure below.

There remains the question of how these competencies are developed. We can compare Dellar and Walkley’s 6-step account with that offered by theories of interlanguage development (see Tarone, 2001, for a review). Language learning is, in this view, gradual, incremental and slow, sometimes taking years to accomplish. Development of the L2 involves all sorts of learning going on at the same time as learners use a variety of strategies to develop the different types of competencies shown in Bachman’s model, confronting problems of comprehension, pronunciation, grammar, lexis, idioms, fluency, appropriacy, and so on along the way. The concurrent development of the many competencies Bachman refers to exhibits plateaus, occasional movement away from, not toward, the L2, and U-shaped or zigzag trajectories rather than smooth, linear contours.  This applies not only to learning grammar, but also to lexis, and to that in-between area of malleable lexical chunks as described by Pawley and Syder.

As for lexis, explanations of SLA based on interlanguage development assert that learners have to master not just the canonical meaning of words, but also their idiosyncratic nature and their collocates. When learners encounter a word in a correct context, the word is not simply added to a static cognitive pile of vocabulary items. Instead, they experiment with the word, sometimes using it incorrectly, thus establishing where it works and where it doesn’t. By passing through a period of incorrectness, in which the word is used in a variety of ways, they climb back up the U-shaped curve. Carlucci and Case (2013) give the example of the noun ‘shop’. Learners may first encounter the word in a sentence such as “I bought this wine at the duty free shop”. Then, they experiment with deviant utterances such as “I am going to the supermarket shop”, correctly associating the word ‘shop’ with a place where they can purchase goods, but getting the usage wrong. By making these incorrect utterances, the learner distinguishes between what is appropriate and what is not, because “at each stage of the learning process, the learner outputs a corresponding hypothesis based on the evidence available so far” (Carlucci and Case, 2013).

Dellar and Walkley’s “Six Step” account of language learning is neither well explained nor complete. These are not, I suggest, very robust principles on which to build. The principles of why people learn are similarly flimsy. To say that people learn languages “to be able to do things with their language; to be able to chat to others; and to learn to understand other cultures better” is to say very little indeed.

Two Views of Language

Dellar & Walkley give one of the most preposterous misrepresentations of how most teachers see English grammar that I’ve ever seen in print. Recall that they describe this popular view of language as “grammar + words”, such that language can be reduced to a list of grammar structures that you can drop single words into.

In fact, grammar models of the English language, such as that found in Quirk (1985), or Swan (2001), and used in coursebooks such as Headway or English File, describe the structure of English in terms of grammar, the lexicon and phonology. These descriptions have almost nothing in common with the description given on page 9 of Teaching Lexically, which is subsequently referred to dozens of times throughout the book as if it were an accurate summary, rather than a biased straw man used to promote their own view of language. The one sentence description, and the 6 simplistic assumptions that are said to flow from it, completely fail to fairly represent grammar models of the English language.

The second view of language, the right one according to the authors, is “language = from words + words to grammar”. Given that this is the most important, the most distinguishing, feature of the whole approach to teaching lexically, you’d expect a detailed description and a careful critical evaluation of their preferred view of language. But no; what is offered is a poorly articulated, inadequate summary, mixed up with one-sided arguments for teaching lexically. It’s based on Hoey’s (2005) view that the best model of language structure is the word, along with its collocational and colligational properties, so that collocation and “nesting” (words join with other primed words to form sequences) are linked to contexts and co-texts, and grammar is replaced by a network of chunks of words. There are no rules of grammar; there’s no English outside a description of the patterns we observe among those who use it. There is no right or wrong in language. It makes little sense to talk of something being ungrammatical.

This is surely a step too far; surely we need to describe language not just in terms of the performed but also in terms of the possible. Hoey argues that we should look only at attested behaviour and abandon descriptions of syntax, but, while nobody these days denies the importance of lexical chunks, very few would want to ignore the rules which guide the construction of novel, well formed sentences. After all, pace Hoey, people speaking English (including learners of English as an L2) invent millions of novel utterances every day.  They do so by making use of, among other things, grammatical knowledge.

The fact that the book devotes some attention to teaching grammar indicates that the authors recognise the existence and importance of grammar, which in turn indicates that there are limits to their adherence to Hoey’s model. But nothing is said in the book to clarify these limits. Given that Dellar and Walkley repeatedly stress that their different view of language is what drives their approach to teaching, their failure to offer any coherent account of their own view of language is telling. We’re left with the impression that the authors are enthusiastic purveyors of a view which they don’t fully understand and are unable to adequately describe or explain.

Teaching Lexically

1. Teaching Lexically concentrates very largely on “doing things to learners” (Breen, 1987): it’s probably the most teacher-centred book on ELT I’ve ever read. There’s no mention in the book of including students in decisions affecting what and how things are to be learned: teachers make all the decisions. They work with a pre-confected product or synthetic syllabus, usually defined by a coursebook, and they plan and execute lessons on the basis of adapting the syllabus or coursebook to a lexical approach. Students are expected to learn what is taught in the order that it’s taught, the teacher deciding the “items”, the sequence of presentation of these “items”, the recycling, the revision, and the assessment.

2.  There’s a narrowly focused, almost obsessive concentration on teaching as many lexical chunks as possible. The need to teach as much vocabulary as possible pervades the book. The chapters in Part B on teaching speaking, reading, listening and writing are driven by the same over-arching aim: look for new ways to teach more lexis, or to re-introduce lexis that has already been presented.

3. Education is seen as primarily concerned with the transmission of information. This view runs counter to the principles of learner-centred teaching, as argued by educators such as John Dewey, Sébastien Faure, Paulo Freire, Ivan Illich, and Paul Goodman, and supported in the ELT field by all progressive educators who reject the view of education as the transmission of information, and, instead, see the student as a learner whose needs and opinions have to be continuously taken into account. For just one opinion, see Weimer (2002), who argues for the need to bring about changes in the balance of power; changes in the function of course content; changes in the role of the teacher; changes in who is responsible for learning; and changes in the purpose and process of evaluation.

4. The book takes an extreme interventionist position on ELT. Teaching Lexically involves dividing the language into items, presenting them to learners via various types of carefully-selected texts, and practising them intensively, using pattern drills, exercises and all the other means outlined in the book, including comprehension checks, error corrections and so on, before moving on to the next set of items. As such, it mostly replicates the grammar-based PPP method it so stridently criticises. Furthermore, it sees translation into the L1 as the best way of dealing with meaning, because it wants to get quickly on to the most important part of the process, namely memorising bits of lexis with their collocates and even co-text. Compare this to an approach that sees the negotiation of meaning as a key aspect of language teaching, where the lesson is conducted almost entirely in English and the L1 is used sparingly, where students have chosen for themselves some of the topics that they deal with, where they contribute some of their own texts, where most classroom time is given over to activities in which the language is used communicatively and spontaneously, and where the teacher reacts to linguistic problems as they arise, thus respecting the learners’ ‘internal syllabus’.

Teaching Lexically sees explicit learning and explicit teaching as paramount, and it assumes that explicit knowledge, otherwise called declarative knowledge, can be converted into implicit (or procedural) knowledge through practice. These assumptions, like the assumptions that students will learn what they’re taught in the order they’re taught it, clash with SLA research findings. As Long says: “implicit and explicit learning, memory and knowledge are separate processes and systems, their end products stored in different areas of the brain” (Long, 2015, p. 44).  To assume, as Dellar and Walkley do, that the best way to teach English as an L2 is to devote the majority of classroom time to the explicit teaching and practice of pre-selected bits of the language is to fly in the face of SLA research.

Children learn languages in an implicit way – they are not consciously aware of most of what they learn about language. As for adults, all the research in SLA indicates that implicit learning is still the default learning mechanism. This suggests that teachers should devote most of the time in class to giving students comprehensible input and opportunities to communicate among themselves and with the teacher.

Nevertheless, adult L2 learners are what Long calls partially “disabled” language learners, for whom some classes of linguistic features are “fragile”. The implication is that, unless helped by some explicit instruction, they are unlikely to notice these fragile (non-salient) features, and will thus not progress beyond a certain, limited, stage of proficiency. The question is: What kind of explicit teaching helps learners progress in their trajectory towards communicative competence? And here we arrive at lexical chunks.

Teaching Lexical Chunks

One of the most difficult parts of English for non-native speakers to learn is collocation. As Long (2015, pages 307 to 316) points out in his section on lexical chunks, while children learn collocations implicitly, “collocation errors persist, even among near-native L2 speakers resident in the target language environment for decades.” Long cites Boers’ work, which suggests a number of reasons why L2 collocations constitute such a major learning problem, including L1 interference, the semantic vagueness of many collocations, the fact that collocates for some words vary, and the fact that some collocations look deceptively similar.

The size and scope of the collocations problem can be appreciated by considering findings on the lesser task of word learning. Long cites work by Nation (2006) and Nation and Chung (2009), who have calculated that learners require knowledge of between 6000 and 7000 word families for adequate comprehension of speech and 9000 for reading. Intentional vocabulary learning has been shown to be more effective than incidental learning in the short term, but, the authors conclude, “there is nowhere near enough time to handle so many items in class that way”. The conclusion is that massive amounts of extensive reading outside class, but scaffolded by teachers, is the best solution.

As for lexical chunks, there are very large numbers of such items, probably hundreds of thousands of them. As Swan (2006) points out, “memorising 10 lexical chunks a day, a learner would take nearly 30 years to achieve a good command of 10,000 of them”. So how does one select which chunks to explicitly teach, and how does one teach them? The most sensible course of action would seem to be to base selection on frequency, but there are problems with such a simple criterion, not the least being the needs of the set of students in the classroom. Although Dellar and Walkley acknowledge the criterion of frequency, Teaching Lexically gives very little discussion of it, and there is very little clear or helpful advice offered about what lexical chunks to select for explicit teaching (see the worksheet cited at the start of this review). The general line seems to be: work with the material you have, and look for the lexical chunks that occur in the texts, or that are related to the words in the texts. This is clearly not a satisfactory criterion for selection.
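The arithmetic behind Swan’s point can be checked with a rough sketch. The assumptions here are mine, not Swan’s: a steady 10 chunks a day, every day of the year, with nothing forgotten. The function name `years_to_learn` is purely illustrative.

```python
def years_to_learn(total_chunks, chunks_per_day=10, days_per_year=365):
    """Years needed at a constant daily rate, assuming nothing is forgotten."""
    return total_chunks / (chunks_per_day * days_per_year)


# Compare two target sizes at the same rate of 10 chunks a day.
for target in (10_000, 100_000):
    print(f"{target:>7} chunks: about {years_to_learn(target):.1f} years")
```

On these assumptions, 10,000 chunks would take under three years, while a figure of “nearly 30 years” corresponds to a target in the region of 100,000 chunks, which is closer to the “hundreds of thousands” mentioned above. Either way, the classroom time needed for explicit treatment of so many items is clearly out of reach.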

The other important question that Teaching Lexically does not give any well considered answer to  is: how best to facilitate the learning of lexical chunks?  Dellar and Walkley could start by addressing the problem of how their endorsement of Hoey’s theory of language learning, and Hoey’s “100% endorsement” of Krashen’s Natural Approach, fit with their own view that explicit instruction in lexical chunks should be the most important part of classroom based instruction. The claim that they are just speeding up the natural, unconscious process doesn’t bear examination because two completely different systems of learning are being conflated. Dellar and Walkley take what’s called a “strong interface” position, whereas Krashen and Hoey take the opposite view. Dellar and Walkley make conscious noticing the main plank in their teaching approach, which contradicts Hoey’s claim that lexical priming is a subconscious process.

Next, Dellar and Walkley make no mention of the fact that learning lexical chunks is one of the most challenging aspects of learning English as an L2 for adult learners.  Neither do they discuss the questions related to the teachability of lexical chunks that have been raised by scholars like Boers (who confesses that he doesn’t know the answer to the problems they have identified about how to teach lexical chunks). The authors of Teaching Lexically blithely assume that drawing attention to features of language (by underlining them, mentioning them and so on), and making students aware of collocations, co-text, colligations, antonyms, etc., (by giving students (repeated) exposure to carefully-chosen written and spoken texts, using drills, concept questions, input flood, bottom-up comprehension questions, and so on) will allow the explicit knowledge taught to become fully proceduralised.  Quite apart from the question of how many chunks a teacher is expected to treat so exhaustively, there are good reasons to question the assumption that such instruction will have the desired result.

In a section of his book on TBLT, Long (2015) discusses his 5th methodological principle: “Encourage inductive ‘chunk’ learning”. Note that Long discusses 10 methodological principles, and sees teaching lexical chunks as an important but minor part of the teacher’s job. The most important conclusion that Long comes to is that there is, as yet, no satisfactory answer to “the $64,000 dollar question: how best to facilitate chunk learning”. Long’s discussion of explicit approaches to teaching collocations includes the following points:

  • Trying to teach thousands of chunks is out of the question.
  • Drawing learners’ attention to formulaic strings does not necessarily lead to memory traces usable in subsequent receptive L2 use, and in any case there are far too many to deal with in that way.
  • Getting learners to look at corpora and identify chunks has failed to produce measurable advantages.
  • Activities to get learners to concentrate on collocations on their own have had poor results.
  • Grouping collocations thematically increases the learning load (decreasing transfer to long term memory) and so does presentation of groups which share synonymous collocates, such as make and do.
  • Exposure to input floods where collocations are frequently repeated has poor results.
  • Commercially published ELT materials designed to teach collocations have varying results. For example, when lists of verbs in one column are to be matched with nouns in another, this inevitably produces some erroneous groupings that, even when corrective feedback is available, can be expected to leave unhelpful memory traces.
  • It is clear that encouraging inductive chunk learning is well motivated, but it is equally unclear how best to realise it in practice, i.e., which pedagogical procedures to call upon.


Teaching Lexically is based on a poorly articulated view of the English language and on a flimsy account of second language learning. It claims that language is best seen as lexically driven, that a grasp of grammar ‘rules’ and correct usage will emerge from studying lexical chunks, that spoken fluency, the speed at which we read, and the ease and accuracy with which we listen will all develop as a result of language users being familiar with groupings of words, and that therefore, the teaching of lexical chunks should be the most important part of a classroom teacher’s job. These claims often rely on mere assertions, and include straw man fallacies, cherry picking the evidence of research findings and ignoring counter evidence. The case made for this view of teaching is, in my opinion, entirely unconvincing. The concentration on just one small part of what’s involved in language teaching, and the lack of any well considered discussion of the problems associated with teaching lexical chunks, are serious flaws in the book’s treatment of an interesting topic.


Bachman, L. (1990) Fundamental Considerations in Language Testing. Oxford University Press.

Breen, M. (1987) Contemporary Paradigms in Syllabus Design, Parts 1 and 2. Language Teaching 20 (02) and 20 (03).

Carlucci, L. and Case, J.  (2013)  On the Necessity of U-Shaped Learning. Topics.

Hoey, M. (2005) Lexical Priming. Routledge.

Long, M. (2015) Second Language Acquisition and Task Based Language Teaching. Wiley.

Swan, M. (2006) Chunks in the classroom: let’s not go overboard. The Teacher Trainer, 20/3.

Tarone, E. (2001), Interlanguage. In R. Mesthrie (Ed.). Concise Encyclopedia of Sociolinguistics. (pp. 475–481) Oxford: Elsevier Science.

Weimer, M. (2002) Learner-Centered Teaching. Retrieved from  3/09/2016

Anybody seen a pineapple?

Usage-based (UB) theories see language as a structured inventory of conventionalized form-meaning mappings, called constructions. Thus, the first thing one needs to get a handle on is Construction Grammar, which is summarised in Wulff & Ellis (2018). I’ve just been reading Smiskova-Gustafsson’s (2013) doctoral thesis and her brief summary of Nick Ellis’ UB theory reminded me of why I find it so weird. According to N. Ellis, detecting patterns from frequency of forms in input is the way people learn language: when exposed to language input, learners notice frequencies and discover language patterns. Those advocating Construction Grammar insist that the regularities that learners observe in the input emerge from complex interactions of a multitude of variables over time, and that, therefore, the regularities in language we call grammar are not rule-based; rather, they emerge as patterns from the repeated use of symbolic form-meaning mappings by speakers of the language. “Therefore, grammar is not a set of creative rules but a set of regularities that emerge from usage” (Hopper, 1998, cited in Smiskova-Gustafsson, 2013). Emergent structures are nested; consequently, any utterance consists of a number of overlapping constructions (Ellis & Cadierno, 2009, cited in Smiskova-Gustafsson, 2013). Linguistic categories are also emergent: they emerge bottom-up and thus not all linguistic structures fall easily into prescribed categories. In other words, some linguistic structures are prototypical, while others fit their category less well.

Examining some of the abstract constructions developed by UB scholars, Smiskova-Gustafsson (2013) notes that frequency of forms interacts with psycholinguistic factors, most importantly, prototypicality of meaning. She gives the example of the verb-argument construction “V Obj Oblpath/loc”, or VOL, an abstract construction that enables syntactic creativity by abstracting common patterns from lexically specific exemplars such as put it on the table:

The exemplar itself is a highly frequent instantiation of the VOL construction, and the verb put that it contains is prototypical in meaning. This means that put is the verb most characteristic of the VOL construction and so the one most frequently used. Other verbs in VOL are used less; the type/token distribution of all verbs used in the VOL construction is Zipfian (i.e., the verb put is the one most frequently used, about twice as frequently as the next verb). Such prototypes are crucial in establishing the initial form/meaning mapping – in this case, the phrase put it on the table, meaning caused motion (X causes Y to move to a location). Repeated exposure to other instantiations of the VOL construction will gradually lead to generalization and the establishment of the abstract productive construction (Smiskova-Gustafsson, 2013, p. 18).

Got it? If you find that taster a rather abstract and abstruse way to try to explain how input can of itself contain all the information learners need to learn English as an L2 (for example), then try reading Wulff & Ellis (2018), or the Approaches book in the graphic above. But what about chunks? From a UB perspective, chunks are “conventionalized form-meaning mappings, the result of repeated use of certain linguistic units, which then give rise to emergent patterns in language at all levels of organization and specificity” (Smiskova-Gustafsson, 2013, p. 21). Chunks go from word sequences that are semantically opaque (spill the beans) or structurally irregular (by and large) to everyday usage such as in my opinion, or nice to see you. And here’s the rub: as Smiskova-Gustafsson (2013, p. 21) points out, “if we take a usage-based perspective, where all units of language are conventionalized, identifying chunks would become pointless, since we could say that all language is in fact a chunk”. The natural, seamless flow of native-like language use can thus be seen as “formulaicity all the way down” (Wray, 2012, p. 245, cited in Smiskova-Gustafsson, 2013, p. 22): language consists of almost endless overlapping and nesting of chunks, as in this example:

In winter Hammerfest is a thirty-hour ride by bus from Oslo, though why anyone would want to go there in winter is a question worth considering.

thirty hour ride by bus from

[thirty hour [[ride][ by bus]] from]]

chunks: thirty hour ride, ride by bus, by bus, by bus from, etc.

though why anyone would want to go there

[though [why] anyone would] [want to] go] there]

chunks: though why, why anyone would, why anyone would want to, want to go, etc. (Smiskova-Gustafsson, 2013, p. 11).
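To get a feel for how quickly overlapping sequences multiply, here is a minimal sketch that lists every contiguous word sequence of 2 to 4 words in the first fragment above. It is only an illustration: the function name and the length limits are my own assumptions, and listing sequences is not the same as identifying chunks, which would require evidence of conventionalized use (corpus frequency, for instance).

```python
def contiguous_sequences(words, min_len=2, max_len=4):
    """Return every contiguous run of min_len..max_len words, in order of start position."""
    sequences = []
    for start in range(len(words)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end <= len(words):
                sequences.append(" ".join(words[start:end]))
    return sequences


fragment = "thirty hour ride by bus from".split()
candidates = contiguous_sequences(fragment)

# Candidate sequences only; nothing here decides which ones are real chunks.
for candidate in candidates:
    print(candidate)
```

A six-word fragment already yields 12 overlapping candidates, among them thirty hour ride, ride by bus, by bus and by bus from, all of which appear in the list of chunks given above.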

Since learners of English as an L2 tend to use English in terms of grammar and individual words, and often combine words in awkward ways, their lack of the ability to produce “natural, seamless flows of native-like language use” must be because they don’t have the necessary procedural knowledge of the Construction Grammar which underpins the  “conventionalized English ways” of expressing any particular concept.

The question is, of course, Is this a good way to see language and language learning? If it is, then how do teachers of English as an L2 help their students develop proficiency? How do they teach students English, if it amounts to no more – and no less! – than procedural knowledge of Construction Grammar, the pre-requisite for the proficient use of tens of thousands of overlapping and nested chunks? To be thorough, if teachers accepted the UB approach, then instead of following the confused and contradictory advice offered by Dellar & Walkley or by Selivan, they would first have to understand Construction Grammar, then understand UB theories of language learning, and then articulate methodological principles and pedagogical practices for implementing a principled lexical approach.  Were teachers to attempt this, I suggest that they’d find Construction Grammar more difficult and less useful than the grammar described in Swan’s Practical English Usage; Ellis’ UB theory more difficult and less useful than the theories described in Mitchell & Myles (2019) Second Language Learning Theories; and accounts of methodological principles and pedagogical practices found in Teaching Lexically or Lexical Grammar less convincing than the account of them found in Chapter 3 of Long’s (2015) SLA & TBLT.

UB theories are increasingly fashionable, but I’m still not impressed with Construction Grammar, or with the claim that language learning can be explained by appeal to noticing regularities in the input. As to the latter, I recommend Gregg’s (2003) article; seventeen years later, I’ve still not read a convincing reply to it. Anyway, I think it’s fair to say that there’s no consensus among SLA scholars on the question of whether language learning is done on the basis of input exposure and experience or with the help of learners’ innate knowledge, and it’s still not clear whether grammatical learning is usage-based or universal grammar-based.

Meanwhile, it seems sensible for teachers to continue to regard English as a language with grammar rules that can help students make well-formed (often novel) utterances, and to help their students by giving them maximum opportunities to use English for their own relevant, communicative purposes, while encouraging inductive learning of chunks. Likewise, it seems foolish to accept the counsel of teacher trainers who misrepresent the complexities of a UB approach and who recommend that teachers focus on the impossible task of explicitly teaching lexical chunks.


Dellar, H. and Walkley, A. (2016) Teaching Lexically. Delta.

Gregg, K. R. (2003) The State of Emergentism in SLA. Second Language Research, 19,2, 95-128.

Selivan, L. (2018) Lexical Grammar. CUP.

Smiskova-Gustafsson, H. (2013). Chunks in L2 development: a usage-based perspective. Groningen: s.n.

Wulff, S. and Ellis, N. (2018) Usage-based approaches to second language acquisition. Downloadable here:


Alternative Proposal for IATEFL Global Get-Together 2020

IATEFL’s proposal for a global get-together is a disappointing, lack-lustre programme that perfectly reflects its status as the stuck-in-the-mud, unimaginative voice of current, commercially-driven ELT practice. The perfect example of this lamentable state of affairs is that Catherine Walter, one of the most reactionary voices in ELT over the last four decades and President of IATEFL in 1993, is asked to address the most crucial issue currently facing us, namely, how to adapt classroom teaching to distance learning. The blurb for her presentation Losing Our Bells and Whistles: Will asynchronous teacher education return? suggests that she’ll do nothing more than warn teachers of the perils of cutting edge innovation. “Keep it simple!”, she’ll say. “Don’t try any clever synchronous stuff – it always goes wrong!”. That’s it: that’s IATEFL for you.

As for the rest of the programme, what can we find that might possibly drag us away from Netflix? The President’s address? Tell me a President’s address that you remember anything about! Will poor David Crystal, dragged out yet again, this time to promote the new edition of his Big Book do more than entertain? I doubt it. How about somebody selling a commercial Business English test? Definitely not. And advice on how to be mindful, or eulogies to young learners as global citizens? Useless pap is my guess. The only things that might be interesting are the local reports, but they’re not properly situated or focused.

Here’s my suggestion.

Re-examining Principles of ELT

All sessions last 2 hours. They’re round table discussions with a Moderator. Each speaker has 10 minutes. Questions are sent in to the organisers.

Session 1: How do people learn an L2? : S. Gass, N. Ellis, M. Pienemann, S. Carroll, K. Gregg

The main debate these days is between emergentists (we learn from input from the environment) and nativists (we learn with help from innate hard wiring). Where are we now? What principles can we agree on which will underline our work as teachers?

Session 2: Teaching implications of SLA Findings: L. Ortega, A. Benati, M. Long, H. Marsden, H. ZhaoHong, H. Nassaji

Recent research findings have challenged previously accepted meta-analyses. Where are we now? Most importantly: can we agree on the relative importance of explicit and implicit teaching?

Session 3: Syllabus design: R. Ellis, M. Swan, M. Long, S. Thornbury, C. Doughty

The big debate today is between synthetic syllabuses, as implemented in General English coursebooks, and analytic syllabuses, like Long’s TBLT and Thornbury & Meddings’ Dogme. This is probably the second most important question of them all. We need to clarify all the “false” alternatives and agree on principles for syllabus design and materials production.

Session 4: Distance Learning: G. Motteram, G. Dudeney, C. Chapelle.

Tech experts present their platforms and respond to questions sent in by participants prior to the conference.

Session 5: ELT as a profession: TEFL Workers Union, N. McMillan, S. Millin, S. Brown, R. Bard.

The most important question of them all. How do we improve our lot? How do we organise?  Ideally, we should produce a Manifesto.

Session 6: Hope For the Future: T. Hampson, M. Griffin, J. Mackay, K. Linkova, L. Havaran

This is my own, very personal choice of teachers, new and old, whose voices need to be heard.

A 2-day programme, properly organised, would allow the invited speakers to briefly state their cases and for discussion to ensue. I think the success of the event would depend on careful monitoring and follow up. The organisers would have to edit the material and then get back to contributors to help compile really solid take away stuff. Ideally, we’d have Summary Statements on each of the 6 issues and the beginnings of a network.

I’m confident that I could organise such an event if

  1. Neil McMillan did it with me (I haven’t even mentioned this to him yet!)
  2. We had a small group of helpers, and
  3. we had some cash.

I invite comments.

Words from the Wise

Here are some quotes from ELT experts who currently inform teachers. Who said them?

  • J. Harmer
  • A. Holliday
  • A. Maley
  • P. Ur
  • D. Larsen-Freeman
  • S. Carroll
  • L. Selivan
  • J. Anderson
  • S. Richardson
  • H. Dellar

(Note: There’s one “rogue statement” in there which I profoundly agree with.) 

1. It’s time to shift metaphors. Let’s sanitise the language. Join with me; make a pledge never to use “input” and “output” again.

2. Instead of the big top down grammar, which we just drop words into as Chomsky suggested, it’s thinking about the individual words that drive our communication and the grammatical patterns which often attach themselves to those particular words.

3. Teaching may be a visceral art, but unless it is informed by ideas it is considerably less than it might be.

4. It’s essentially racist to imagine a group here and a group there who are essentially different to each other.

5. Chunks …. are stored in the brain as single units. .. However, this does not completely negate the role of generative grammar. Knowledge of grammar rules is still important to fine-tune chunks so that they fit new contexts.

6. We have no evidence that PPP is less effective than other approaches.

7. We should not expect research to have any necessary or close link with the activity of teaching.

8. There is no evidence that TBLT works.

9. You can’t notice grammar. ..  the stuff of acquisition (phonemes, syllables, morphemes, nouns, verbs, cases, etc.) consists of mental constructs that exist in the mind and not in the environment at all. If not present in the external environment, there is no possibility of noticing them.

10. Most SLA researchers assume that native speakers make the best teachers.. and view the L1 as “an obstacle”.










1. D. Larsen-Freeman; 2. H. Dellar; 3. J. Harmer; 4. A. Holliday; 5. L. Selivan; 6. J. Anderson; 7. A. Maley; 8. P. Ur; 9. S. Carroll; 10. S. Richardson

A Reply to Dellar on the difference between his “Lexical Approach” and TBLT

Dellar has a new video on YouTube explaining the difference between his “Lexical Approach” and Communicative Language Teaching (CLT).

Dellar’s Lexical Approach is distinguished by its special “approach to language”. While most ELT approaches wrongly see language as “grammar and single words”, his approach sees language as more “patterned and formulaic” than is commonly assumed, where collocations, chunks, “fixed and semi-fixed expressions, discoursal patterns that are predictable and repeatable” should be the focus of teaching. That’s it – that’s the special approach which Dellar claims teachers need training in, so as to think about language in “a more sophisticated, nuanced way”.

On the other hand, CLT is “primarily to do with classroom methodology” – “interaction is both the means and the ultimate goal of study (sic)”.  Dellar has no objections to communicative activities, but he thinks teachers can do them better by adopting his more sophisticated approach to language, because it better equips them to provide students with “the actual language that they need in order to carry out communicative tasks”. Thus, TBLT and Dogme could be improved by doing what he does – “predict the language students need to perform these tasks”.

On Language

Dellar fails here, as he does elsewhere, to give any coherent description of this special view of the English language. I’ve discussed Dellar’s view of language in a separate post, so suffice it to say here that Dellar & Walkley’s Teaching Lexically gives one of the most absurd misrepresentations of pedagogical grammars (“grammar plus words”) ever published, and follows it with an incoherent account of the important role that collocations and lexical chunks play in understanding the English language. Dellar’s various attempts to describe his special approach to the English language – in this video, in Teaching Lexically, in his podcasts, videos and conference presentations (note particularly his contorted versions of a “bottom-up” grammar) – are an unscholarly sham.

On Teaching 

Dellar, as we’ve seen, says that teachers following TBLT and Dogme syllabuses would benefit if they shared his more sophisticated, more nuanced understanding of English, because this would allow them to predict the language which students need to perform tasks. But how would it do that? What guidance does Dellar give teachers to inform their “predictions”? Given his focus on lexical chunks, and given that proficient English speakers know tens of thousands of lexical chunks, how does Dellar suppose that teachers, once trained in his approach to language, will select the chunks that their students need? What criteria will they use to narrow down the many thousands of candidate chunks to a manageable number? Dellar has never offered any coherent criteria or principles for making such a selection, nor has he shown any critical acumen in assessing the enormous problems involved in selecting and teaching lexical chunks. For example, what principles or criteria inform the selection of chunks to be found in Dellar’s One Minute videos? A recent article in Applied Research on Language Learning lists the most frequent idioms used in contemporary American English, in the academic, fiction, spoken, newspaper, & magazine genres. Not one of Dellar’s over 200 selected chunks (which include the gems “It does my head in”, and “budge up”) is mentioned in the lists. So if teachers ever make the mad decision to base their teaching on presenting and practicing chunks, how will they “predict the language their students need”? Throw darts at a board full of “Hugh’s Favorite Chunks”? No, of course not – all they have to do is leave it to Dellar, and use the Outcomes series of coursebooks.

I wonder if “Help! Get me out of here!” appears in any of them.

Arguing about mansplaining on Twitter

This Tweet appeared recently. You can see my comment below it, and the 21st comment after that, which got over 200 “Likes”. 

The Reaction

Dozens of tweets followed my “What nonsense!” tweet. Some, from men, were crass and insulting (You’re shit. Shut the fuck up moron), while women preferred joshing and taking the mickey. Just about everybody agreed I was mansplaining. For example, a tweeter called M commented:

This really is quite meta: a historical reference to mansplaining met by the the most peak of mansplainers ever imaginable.

While Raw posted this

I wrote more than 30 replies in 3 hours; a few were angry; many were ill-considered; and many had mistakes (in one, I referred to Eleanor Marx as Karl Marx’s sister, for example). So I’m not pretending that I put my case coherently and cohesively, and I’m not complaining about the reactions, either. I just want to state my case calmly here and make a couple of comments.

Louise Raw’s view 

From the tweet, I judge Raw’s view to rest on the special status of Eleanor Marx. She was a Marxist scholar; she’d spent years working with Marx; she was chosen by Marx to carry on his work; and Marx entrusted her with the job of publishing the English version of Capital. In a famous quote (Florence, 1975) Karl Marx said “Tussy [Eleanor] is me”. Thus, it is extremely unlikely that the man who stood up at the end of her lecture and told her what Marx really meant knew better than she did what Marx meant. So the man is guilty  of mansplaining.   

My View

Mansplaining is when a man explains something to a woman in a condescending or patronising way. If the man explained what Marx meant to Eleanor in a condescending or patronising way, then he was mansplaining. But if the man offered an interpretation of some aspect of Marx’s work which contradicted Eleanor’s account, without stooping to condescension or patronisation, then he wasn’t mansplaining. The fact that he was talking to Eleanor Marx doesn’t mean that his remarks were necessarily condescending or patronising – or even wrong. Louise Raw gives no information about the man’s intervention, and without a reliable account of what the man said and the way he said it, we can’t be sure he was mansplaining. Saying that the man “told her what Marx had really meant” could be seen as implying that he was being patronising & condescending, but Raw’s a historian – she should have supported her assertion of historical mansplaining with a reliable account of the man’s words and actions at the 1893 lecture given by Eleanor Marx in Aberdeen.

The False Claim: You say that the man knew better than Eleanor what Marx meant. Ergo: you’re anti-feminist.  

Over 50 tweets had the same theme: I was called a “sexist”, “misogynist”, “old white man”, “woman hater” who “despised feminists”. The tweet from Audrey shown above says this:

So random man knew better what Karl meant than Karl’s own daughter who worked with him…. Were you related to that man by any chance? 

Audrey puts words in my mouth and attributes completely false views to me. In no tweet did I say, or imply, that the man (now “random man” and perhaps my relative) knew better than Eleanor Marx what Marx meant, or question Eleanor’s expert status. But, never mind; the twisting of my words became an established “fact” from then on. Dozens of tweets supposed that I had indeed said that random man knew better, and, on that basis, accused me of bias and sexism.  Today, this was posted:  

That quote is not what I said; I don’t know where bb davey got it from; but there it is again: the false assumption that I had suggested that “random man” knew better.  

A bit later Audrey says:  

Yeah… How dare we thinking that a woman who is also his daughter would know better the subject she was working on?? We’re so silly Louise…. Aren’t we?

I didn’t criticise anybody for thinking that Eleanor knew more than “random man”, but again, never mind; it sounds good and was the cue for merry “I’m in the kitchen, where Geoff thinks I belong” exchanges among some women tweeters, which, stupidly, rattled me enough to call them “dummies”.

Ad Hominems or Gratuitous Insults?

Louise Raw’s tweets contained these remarks: 

  • (Geoff) is an heroic leftie whose politics are beyond question. As we know, loathing feminists & insulting women is no bar to this. 
  • (Geoff) doesn’t realise Eleanor Marx was one of the feminists he despises.
  • (Geoff) sees no irony in getting furious with having HIS knowledge challenged whilst saying it was fine for a random dude to challenge ELEANOR’S
  • (Eleanor) was absolutely Marx’s literary collaborator, as everyone acknowledges- apart from Geoff!
  •  (Geoff) came to us, calling us dummies and idiotic feminists, but is The Real Victim Here? 
  • Geoff likes to insult women whilst accusing US of ad hominems.

The Appeal to authority 

Louise Raw says this:  Karl Marx: ‘Eleanor IS ME’. He meant politically. So the mansplainer was doing the closest thing he could, after Karl’s death, to correct Marx himself on Marxism.

The actual quotation is “Jenny is most like me, but Tussy (Eleanor) is me” (Florence, 1975, p. 57). I think it’s fair to say he meant politically, but to suggest that Eleanor Marx was the voice of Marx himself is surely taking things too far. Marx trusted his daughter to faithfully interpret his work, but that doesn’t mean she always did so. Marx died in 1883, and in 1884 Eleanor, along with other members of the Social Democratic Federation (SDF), including William Morris and Ernest Bax, left the SDF and formed the Socialist League. It’s a moot point what Karl Marx might have advised. And it’s not clear whether father and daughter were entirely in agreement about sexual politics and the wrongs of the bourgeois family. In any case, while it’s perfectly reasonable to claim that Eleanor was a reliable source of information about Marx’s work – especially the later work, including Capital – that doesn’t mean she had – or should have had – the final word on all the myriad controversies and disagreements that raged in the 1890s about what Marx really meant, or that in her lecture that day she didn’t say anything that might be seen as offending the Marxist canon.

Critical Thinking

Here’s a tweet from Bygone (sic) Jim:  

The pompous adage in the first sentence is followed by a completely unsupported criticism in the second. But my argument during the exchange, and now, more calmly here, is based on the first principle of critical thinking: Question everything: examine the logic of any assertion and ask for evidence; don’t believe what you’re told. In a polite exchange with Sue Lyon-Jones, she agrees that mansplaining is when a man tells a woman something she already knows in a way that is patronising and dismissive, and she thinks that Louise Raw’s tweet demonstrates that the man was guilty of it. Where’s the evidence? I ask. She replies: 

That is how I read it. As a woman, it rings a fairly loud bell for me.

I replied, a bit hysterically:

But you don’t know what he said! ….. WE NEED TO KNOW WHAT HAPPENED!

And there’s the rub. I don’t think we should take Louise Raw’s word for it that the man was “obviously” mansplaining, and I think she failed as a historian to give the evidence that would have allowed us to judge for ourselves. Either Raw doesn’t have the evidence, in which case she shouldn’t have made the accusation, or she has it, and for some reason decided not to give it in a follow-up to her original tweet.



Florence, R. (1975) Marx’s Daughters. Dial Press.