Teacher Trainers and Educators in ELT

This blog is dedicated to improving the quality of Second Language Teacher Education (SLTE)

The Teacher Trainers and Educators 

The most influential ELT teacher trainers and educators are those who publish “How to teach” books and articles, have online blogs and a big presence on social media, give presentations at ELT conferences, and travel around the world giving workshops and teacher training & development courses. Many of the best known and highest paid teacher educators are also the authors of coursebooks. Apart from the top “influencers”, there are tens of thousands of teacher trainers worldwide who deliver pre-service courses such as CELTA, the Trinity Cert TESOL, or an MA in TESOL, and thousands working with practicing teachers in courses such as DELTA and MA programmes. Special Interest Groups in TESOL and IATEFL also have considerable influence.

What’s the problem? 

Most current SLTE pays too little attention to the question “What are we doing?”, and the follow-up question “Is what we’re doing effective?”. The assumption that students will learn what they’re taught is left unchallenged, and those delivering SLTE concentrate either on coping with the trials and tribulations of being a language teacher (keeping fresh, avoiding burn-out, growing professionally and personally) or on improving classroom practice. As to the latter, they look at new ways to present grammar structures and vocabulary, better ways to check comprehension of what’s been presented, more imaginative ways to use the whiteboard to summarise it, more engaging activities to practice it, and the use of technology to enhance it all, or do it online.  A good example of this is Adrian Underhill and Jim Scrivener’s “Demand High” project, which leaves unquestioned the well-established framework for ELT and concentrates on doing the same things better. In all this, those responsible for SLTE simply assume that current ELT practice efficiently facilitates language learning.  But does it? Does the present model of ELT actually deliver the goods, and is making small, incremental changes to it the best way to bring about improvements? To put it another way, is current ELT practice efficacious, and is current SLTE leading to significant improvement? Are teachers making the most effective use of their time? Are they maximising their students’ chances of reaching their goals?

As Bill VanPatten argued in his plenary at the BAAL 2018 conference, language teaching can only be effective if it comes from an understanding of how people learn languages. In 1967, Pit Corder was the first to suggest that the only way to make progress in language teaching is to start from knowledge about how people actually learn languages. Then, in 1972, Larry Selinker suggested that instruction on formal properties of language has a negligible impact (if any) on real development in the learner. Next, in 1983, Mike Long again raised the issue of whether instruction on formal properties of language makes a difference in acquisition. Since these important publications, hundreds of empirical studies have been published on everything from the effects of instruction to the effects of error correction and feedback. This research has in turn resulted in meta-analyses and overviews that can be used to measure the impact of instruction on SLA. All the research indicates that the current, deeply entrenched approach to ELT, where most classroom time is dedicated to explicit instruction, vastly over-estimates the efficacy of such instruction.

So in order to answer the question “Is what we’re doing effective?”, we need to periodically re-visit questions about how people learn languages. Most teachers are aware that we learn our first language(s) unconsciously and that explicit learning about the language plays a minor role, but they don’t know much about how people learn an L2. In particular, few teachers know that the consensus among SLA scholars is that implicit learning through using the target language for relevant, communicative purposes is far more important than explicit instruction about the language. Here are just four examples from the literature:

1. Doughty (2003) concludes her chapter on instructed SLA by saying:

In sum, the findings of a pervasive implicit mode of learning, and the limited role of explicit learning in improving performance in complex control tasks, point to a default mode for SLA that is fundamentally implicit, and to the need to avoid declarative knowledge when designing L2 pedagogical procedures.

2. Nick Ellis (2005) says:

the bulk of language acquisition is implicit learning from usage. Most knowledge is tacit knowledge; most learning is implicit; the vast majority of our cognitive processing is unconscious.

3. Whong, Gil and Marsden’s (2014) review of a wide body of studies in SLA concludes:

“Implicit learning is more basic and more important than explicit learning, and superior. Access to implicit knowledge is automatic and fast, and is what underlies listening comprehension, spontaneous speech, and fluency. It is the result of deeper processing and is more durable as a result, and it obviates the need for explicit knowledge, freeing up attentional resources for a speaker to focus on message content”.

4. Han and Nassaji (2018) review 35 years of instructed SLA research, and, citing the latest meta-analysis, they say:

On the relative effectiveness of explicit vs. implicit instruction, Kang et al. reported no significant difference in short-term effects but a significant difference in longer-term effects with implicit instruction outperforming explicit instruction.

Despite lots of other disagreements among themselves, the vast majority of SLA scholars agree on this crucial matter. The evidence from research into instructed SLA gives massive support to the claim that concentrating on activities which help implicit knowledge (by developing the learners’ ability to make meaning in the L2, through exposure to comprehensible input, participation in discourse, and implicit or explicit feedback) leads to far greater gains in interlanguage development than concentrating on the presentation and practice of pre-selected bits and pieces of language.

One of the reasons why so many teachers are unaware of the crucial importance of implicit learning is that so few of those responsible for SLTE talk about it. Teacher trainers and educators don’t tell pre-service or practicing teachers about the research findings on interlanguage development, or that language learning is not a matter of assimilating knowledge bit by bit; or that the characteristics of working memory constrain rote learning; or that by varying different factors in tasks we can significantly affect the outcomes. And there’s a great deal more we know about language learning that those responsible for SLTE don’t pass on to teachers, even though it has important implications for everything in ELT: from syllabus design to the use of the whiteboard, from methodological principles to the use of IT, and from materials design to assessment.

We know that in the not so distant past, generations of school children studied foreign languages for seven or eight years, and the vast majority of them left school unable to maintain an elementary conversational exchange in the L2. Things have improved only to the extent that teachers have been informed about what we know about language learning, encouraged to evaluate it critically, and supported in experimenting with different ways of engaging their students in communicative activities. To the extent that teachers continue to spend most of the time talking to their students about the language, those improvements have been minimal. So why is all this knowledge not properly disseminated?

Most teacher trainers and educators, including Penny Ur (see below), say that, whatever its faults, coursebook-driven ELT is practical, and that alternatives such as TBLT are not. Ur actually goes as far as to say that there’s no research evidence to support the view that TBLT is a viable alternative to coursebooks. Such an assertion is contradicted by the evidence. In a recent statistical meta-analysis by Bryfonski & McKay (2017) of 52 evaluations of program-level implementations of TBLT in real classroom settings, “results revealed an overall positive and strong effect (d = 0.93) for TBLT implementation on a variety of learning outcomes” in a variety of settings, including parts of the Middle East and East Asia, where many have flatly stated that TBLT could never work for “cultural” reasons, and “three-hours-a-week” primary and secondary foreign language settings, where the same opinion is widely voiced. So there are alternatives to the coursebook approach, but teacher trainers too often dismiss them out of hand, or simply ignore them.

How many SLTE courses today include a sizeable component devoted to the subject of language learning, where different theories are properly discussed so as to reveal the methodological principles that inform teaching practice? Or, more bluntly: how many such courses give serious attention to examining the complex nature of language learning, which is likely to lead teachers to seriously question the efficacy of basing teaching on the presentation and practice of a succession of bits of language? Current SLTE doesn’t encourage teachers to take a critical view of what they’re doing, or to base their teaching on what we know about how people learn an L2. Too many teacher trainers and educators base their approach to ELT on personal experience, and on the prevalent “received wisdom” about what and how to teach. For thirty years now, ELT orthodoxy has required teachers to use a coursebook to guide students through a “General English” course which implements a grammar-based, synthetic syllabus through a PPP methodology. During these courses, a great deal of time is taken up by the teacher talking about the language, and much of the rest of the time is devoted to activities which are supposed to develop “the 4 skills”, often in isolation. There is good reason to think that this is a hopelessly inefficient way to teach English as an L2, and yet, it goes virtually unchallenged.


The published work of most of the influential teacher educators demonstrates a poor grasp of what’s involved in language learning, and little appetite to discuss it. Penny Ur is a good example. In her books on how to teach English as an L2, Ur spends very little time discussing the question of how people learn an L2, or encouraging teachers to critically evaluate the theoretical assumptions which underpin her practical teaching tips. The latest edition of Ur’s widely recommended A Course in Language Teaching includes a new sub-section where precisely half a page is devoted to theories of SLA. For the rest of the 300 pages, Ur expects readers to take her word for it when she says, as if she knew, that the findings of applied linguistics research have very limited relevance to teachers’ jobs. Nowhere in any of her books, articles or presentations does Ur attempt to seriously describe and evaluate evidence and arguments from academics whose work challenges her approach, and nowhere does she encourage teachers to do so. How can we expect teachers to be well-informed, critically acute professionals in the world of education if their training is restricted to instruction in classroom skills, and their on-going professional development gives them no opportunities to consider theories of language, theories of language learning, and theories of teaching and education? Teaching English as an L2 is more art than science; there’s no “best way”, no “magic bullet”, no “one size fits all”. But while there’s still so much more to discover, we now know enough about the psychological process of language learning to know that some types of teaching are very unlikely to help, and that other types are more likely to do so. Teacher educators have a duty to know about this stuff and to discuss it with their trainees.

Scholarly Criticism? Where?  

Reading the published work of leading teacher educators in ELT is a depressing affair; few texts used for the purpose of teacher education in school or adult education demonstrate such poor scholarship as that found in Harmer’s The Practice of English Language Teaching, Ur’s A Course in Language Teaching, or Dellar and Walkley’s Teaching Lexically, for example. Why are these books so widely recommended? Where is the critical evaluation of them? Why does nobody complain about the poor argumentation and the lack of attention to research findings which affect ELT? Alas, these books typify the generally “practical” nature of SLTE, and the field’s reluctance to engage in any kind of critical reflection on theory and practice. Go through the recommended reading for most SLTE courses and you’ll find few texts informed by scholarly criticism. Look at the content of SLTE courses and you’ll be hard pushed to find one which includes a component devoted to a critical evaluation of research findings on language learning and ELT classroom practice.

There is a general “craft” culture in ELT which rather frowns on scholarship and seeks to promote the view that teachers have little to learn from academics. Those who deliver SLTE are, in my opinion, partly responsible for this culture. While it’s unreasonable to expect all teachers to be well informed about research findings regarding language learning, syllabus design, assessment, and so on, it is surely entirely reasonable to expect teacher trainers and educators to be so. I suggest that teacher educators have a duty to lead discussions, informed by relevant scholarly texts, which question common sense assumptions about the English language, how people learn languages, how languages are taught, and the aims of education. Furthermore, they should do far more to encourage their trainees to constantly challenge received opinion and orthodox ELT practices. This, surely, is the best way to help teachers enjoy their jobs, be more effective, and identify the weaknesses of current ELT practice.

My intention in this blog is to point out the weaknesses I see in the works of some influential ELT teacher trainers and educators, and invite them to respond. They may, of course, respond anywhere they like, in any way they like, but the easier it is for all of us to read what they say and join in the conversation, the better. I hope this will raise awareness of the huge problem currently facing ELT: it is in the hands of those who have more interest in the commercialisation and commodification of education than in improving the real efficacy of ELT. Teacher trainers and educators do little to halt this slide, or to defend the core principles of liberal education which Long so succinctly discusses in Chapter 4 of his book SLA and Task-Based Language Teaching.

The Questions

I invite teacher trainers and educators to answer the following questions:

1 What is your view of the English language? How do you transmit this view to teachers?

2 How do you think people learn an L2? How do you explain language learning to teachers?

3 What types of syllabus do you discuss with teachers? Which type do you recommend to them?

4 What materials do you recommend?

5 What methodological principles do you discuss with teachers? Which do you recommend to them?


Bryfonski, L., & McKay, T. H. (2017). TBLT implementation and evaluation: A meta-analysis. Language Teaching Research.

Dellar, H. and Walkley, A. (2016) Teaching Lexically. Delta Publishing.

Doughty, C. (2003) Instructed SLA. In Doughty, C. and Long, M. (eds.) Handbook of SLA, pp. 256-310. New York, Blackwell.

Long, M. (2015) Second Language Acquisition and Task-Based Language Teaching. Oxford, Wiley.

Ur, P. A Course in Language Teaching. Cambridge, CUP.

Whong, M., Gil, K.H. and Marsden, H., (2014). Beyond paradigm: The ‘what’ and the ‘how’ of classroom research. Second Language Research, 30(4), pp.551-568.

Han, Z. and Nassaji, H. (2018) Introduction: A snapshot of thirty-five years of instructed second language acquisition. Language Teaching Research, in press.

Thesis on Current ELT, Part Two


In Part One, I suggested that coursebook-driven ELT is a prime example of the commodification of education. Here, in Part Two, I focus on the Common European Framework of Reference (CEFR) and high stakes tests. The global adoption of coursebook-driven ELT is illustrated by the increasing use of the CEFR, which informs not just coursebooks, but the high stakes tests which loom large in the background. I rely mostly on the work of Glenn Fulcher and on Jordan & Long (2022).

18.1. As Fulcher (2010) argues, citing Bonnet (2007), the CEFR is increasingly being used to promote a move towards “a common educational policy in language learning, teaching and assessment, both at the EU level and beyond”. The rapid spread of the use of the CEFR across Europe and other parts of the world is due to the ease with which it can be used in standards-based assessment. As a policy tool for harmonization, the CEFR is manipulated by “juggernaut-like centralizing institutions”, which are using it to define required levels of achievement for school pupils as well as adult language learners worldwide.

The indiscriminate exportation of the CEFR for use in standards-based education and assessment in non-European contexts, such as Hong Kong and Taiwan, shows that it is being increasingly used as an instrument of power (Davies 2008: 438).

18.2. Fulcher (2008: 170) nails the problem of the CEFR. It requires a few seconds’ close reading, if you’ll forgive me, to appreciate its full import.

It is a short step for policy makers, from ‘the standard required for level X’ to ‘level X is the standard required for….’ (emphasis added).

Fulcher (ibid.) comments: “This illegitimate leap of reasoning is politically attractive, but hardly ever made explicit or supported by research. For this step to take place, a framework has to undergo a process of reification, a process defined as “the propensity to convert an abstract concept into a hard entity” (Gould 1996: 27)”.

18.3. The CEFR scale descriptors are based entirely on intuitive teacher judgments rather than on samples of performance. The scales have no empirical basis or any basis in theory, or in SLA research. They’re “Frankenstein scales”, as Fulcher calls them. We can’t reasonably expect the CEFR scale to relate to any specific communicative context, or even to provide a measure of any particular communicative language ability. To quote Fulcher (2010) again:

Most importantly, we cannot make the assumption that abilities do develop in the way implied by the hierarchical structure of the scales. The scaling methodology assumes that all descriptors define a statistically unidimensional scale, but it has long been known that the assumed linearity of such scales does not equate to how learners actually acquire language or communicative abilities (Fulcher 1996b, Hulstijn 2007, Meisel 1980). Statistical and psychological unidimensionality are not equivalent, as we have long been aware (Henning 1992). The pedagogic notion of “climbing the CEFR ladder” is therefore naïve in the extreme (Westhoff 2007: 678). Finally, post-hoc attempts to produce benchmark samples showing typical performance at levels inevitably fall prey to the same critique as similar ACTFL studies in the 1980s, that the system states purely analytic truths: “things are true by definition only” (Lantolf and Frawley 1985: 339), and these definitions are both circular and reductive (Fulcher 2008: 170-171). The reification of the CEFR is therefore not theoretically justified.

19.1. Current English language testing uses the CEFR scale in three types of test: first, placement tests, which assign students to a CEFR level, from A1 to C2, where an appropriate course of English, guided by an appropriate coursebook, awaits them; second, progress tests, which are used to decide if students are ready or not for their next course of English; and third, high-stakes-decision proficiency tests (a multi-billion-dollar commercial activity in its own right), which are used purportedly to determine students’ current proficiency level.

19.2. The key place of testing in the ELT industry is demonstrated not just by exam preparation materials which are a lucrative part of publishing companies’ business, but by the fact that most courses of English provided by schools and institutes at all three educational levels start and finish with a test.

Perhaps the best illustration of how language testing forms part of the ELT “hydra” is the Pearson Global Scale of English (GSE), which allows for much more finely grained measurement than that attempted in the CEFR. In the Pearson scale, there are 2,000 can-do descriptors called “Learning Objectives”; over 450 “Grammar Objectives”; 39,000 “Vocabulary items”; and 80,000 “Collocations”, all tagged to nine different levels of proficiency (Pearson, 2019).  Pearson’s GSE comprises four distinct parts, which together create what they proudly describe as “an overall English learning ecosystem” (Pearson, 2019, p.2.). The parts are: 

• The scale itself – a granular, precise scale of proficiency aligned to the CEFR.

• GSE Learning Objectives – over 1,800 “can-do” statements that provide context for teachers and learners across reading, writing, speaking and listening.

• Course Materials – digital and printed materials, most importantly, series of General English coursebooks.

• Assessments – Placement, Progress and Pearson Test of English Academic tests.

As Jordan & Long (2022) comment:

Pearson say that while their GSE “reinforces” the CEFR as a tool for standards-based assessment, it goes much further, providing the definitive, all-inclusive package for learning English, including placement, progress and proficiency tests, syllabi and materials for each of the nine levels, and a complete range of teacher training and development materials. In this way the language learning process is finally and definitively reified: the abstract concepts of “granular descriptors” are converted into real entities, and it is assumed that learners move unidimensionally along a line from 10 to 90, making steady, linear progress along a list of can-do statements laid out in an easy-to-difficult sequence, leading inexorably, triumphantly, to the ability to use the L2 successfully for whatever communicative purpose you care to mention. It is the marketing division’s dream, and it shows just how far the commodification of ELT has already come.

19.3. The power of high stakes tests is exemplified by the work of the Cambridge Assessment Group. It has three major exam boards: Cambridge Assessment English, Cambridge Assessment International Education, and Oxford Cambridge and RSA Examinations. (Note that all these companies are owned by the University of Cambridge and are registered as charities, exempt from taxes!) The group are responsible for the Cambridge B2 (formerly the First Certificate Exam) and Cambridge C1 (formerly the Cambridge Advanced Exam), and also, along with their partners, for the IELTS exams, used globally as a university entrance test (the Academic module), an entrance test to many professions and job opportunities, and as a test for those wishing to migrate to an English-speaking country (the General English module).

In 2018, the Cambridge Assessment Group designed and delivered assessments to more than 8 million learners in over 170 countries, employed nearly 3,000 people in more than 40 locations around the world and generated revenue of over £382 million (tax free). More than 25,000 organizations accept Cambridge English exams as proof of English language ability, including top US and Canadian institutions, all universities in Australia, New Zealand and in the UK, immigration authorities across the English-speaking world, and multinational companies including Adidas, BP, Ernst & Young, Hewlett-Packard, Johnson & Johnson, and Microsoft. The Cambridge English exams can be taken at over 2,800 authorized exam centers, and there are 50,000 preparation centers worldwide where candidates can prepare for the exams. The impact of the Cambridge Assessment Group’s tests on millions of individual lives can be life-changing, and the scale of their activities means that they have global political, social, economic, and ethical consequences, suggesting to many that an independent body is needed to regulate them.

19.4. As indicated above, “proficiency” in the high stakes tests is an epiphenomenon – a secondary effect or by-product of the thing itself. Overall “proficiency” is divided into levels on a proficiency rating scale, determined by groups of people who write proficiency level descriptors and decide that there are X levels on the particular scale they develop. In fact, only zero and near-native proficiency levels are truly measurable. We know this from the results of countless empirical SLA studies that have tried to identify the advanced learner, a task which requires distinguishing near-native speakers from true native speakers. Results of these studies consistently show such distinctions are possible provided measures are sufficiently sensitive (Hyltenstam, 2016), and they demonstrate that any other distinctions along proficiency scales are unreliable.

19.5. Beyond the proficiency scale descriptors, there are numerous problems in the tests that elicit language samples on which scores and ratings are based. For example, proficiency tests typically employ speaking prompts and reading texts which purport to have been “leveled,” i.e., judged to aim at the level concerned. This is nonsense. Apart from highly specialized material, all prompts and all texts can be responded to or read at some level; the amount of information conveyed or understood will simply vary as a function of language ability. Moreover, proficiency scales offer little in the way of diagnostic information which could indicate to teachers and learners what they would need to do to improve their scores and ratings.

19.6. There is little evidence that proficiency ratings are predictive of success in any language use domain. Even if a test taker can succeed in the testing context, there is no way to tell whether this means the person will succeed outside that context, for example in using language for professional purposes.

19.7. The administration and management of high stakes tests raises the issue of discrimination based on economic inequality. The test fees are high and vary significantly – in the IELTS tests, fees vary from the equivalent of approximately US$150 in Egypt to double that in China, a difference explained more by Chinese students’ desire to study abroad than by any international differences in administration or management costs. Such are the expenses involved in taking these tests that they evidently discriminate against those with lower economic means and make it impossible for some people to take the test multiple times in order to achieve the required score. W.S. Pearson (2019) also points out that the owners of IELTS produce and promote commercial IELTS preparation content, which takes the form of printed and on-line materials and teacher-led courses. These make further financial demands on the test-takers, and while some free online preparation materials are made available on the IELTS website, full access to the materials costs approximately US$52, and is free only for candidates who do the test or a preparation course with the British Council. Likewise, details of the criteria used to assess the IELTS writing test are only freely available to British Council candidates; all other candidates are charged approximately US$55 for this important information. Finally, it should be noted that it is common, for those who can afford it, to take the IELTS multiple times in an attempt to improve their scores, and that the score obtained in an IELTS test is only valid for two years.   

19.8. The simplicity and efficiency with which high stakes test scores can be processed strengthens the perception that the scores are used blindly by the gatekeepers of university entrance. If an overseas student does not achieve the required score, their application for admission to the university is normally turned down. Even more questionable is the use of the test by employers to assess prospective employees’ ability to function in the workplace, despite the fact that, in most cases, none of the test tasks closely corresponds with what an employee is expected to do in the job. Worst of all, band scores in the test are used by some national governments as benchmarks for migration: it is quite simply immoral to use a score on these tests to deny a person’s application for immigration.

Those who seek to study at universities abroad or to work for a number of large multinational companies, or to migrate, are forced to engage with these tests on the terms set by the test owners, conferring on the owners considerable global power and influence; and they suffer dire consequences if they fail to achieve the required mark in tests which, in a great many cases, are not fit for purpose.

I’ll give a full list of references at the end of the thesis.

2022: A Personal View of The Ups and Downs in ELT


It’s been another bad year for ELT. You win some, you lose some, and if you’re fighting the hydra of the ELT establishment, you generally lose. A quick look at the money made by the owners and main stakeholders of a 200 billion dollar industry is enough to appreciate the power that vested interests have to ensure that coursebook-driven ELT keeps on going, despite its abysmal failure to deliver the promised goods. What chance have those who promote radical alternatives like Dogme or TBLT got?

After all, those of us fighting for change in ELT share the obstacles facing any group that fights for radical change – we fight against a well-entrenched establishment that defends and promotes the interests of a ruling class bent on accruing wealth. And today, the hopes of success are surely worse than they’ve ever been – David has never been so puny, Goliath never so colossal. Our enemy is truly imperial. Above our bosses – those who run the ELT industry (stuffing their own pockets while pushing their employees towards, or further into, poverty) – are their bosses: a ruling class of plutocrats.  

Piketty’s acclaimed “Capital in the Twenty-First Century”, for all its analytical shortcomings, provides some good, quantifiable descriptors of 21st century wealth and describes how, after a short dip in the mid 20th century, wealth is increasingly concentrated in ever-fewer hands. The squeeze goes on, the gap between rich and poor widens, the middle classes collapse, the lives of the uber-wealthy serve as vulgar, unobtainable goals for the rest, who become spectators, as the Situationists called them in the sixties, of the fake accounts of the lives of celebrities. Neoliberal capitalism strides on. It drives not just coursebook-driven ELT, but also the commodification of education and the commodification of everyday life. As a necessary consequence, it also drives the destruction of the planet. Homo sapiens (“the wise human”!) has so lost its way that it has now turned its back on wisdom, on living a good life, on its own survival. We’re on our way to such catastrophic climate change that talking about the state of ELT – like talking about anything other than what’s happening to the planet – seems absurd. Hey ho. The band strikes up Monty Python’s “Always Look on the Bright Side of Life” and onward thru the fog we go.


1. The Book

For me, the highlight of the year was the publication of the book Mike Long and I had worked on for two years. A huge cloud hung over its publication because Mike Long wasn’t here to see it; but, thanks to Cathy Doughty’s efforts, it’s finally out there. Even with Mike’s clout behind it, we couldn’t find a big publisher willing to take it on, so we ended up with a publisher that has aimed the book exclusively at university libraries. If you haven’t got access to it, and you’d like to read it, get in touch with me and I’ll send you a PDF copy.

Our book starts from the premise that ELT practice should be informed by what we know about how people learn an L2. The first six chapters outline a view of SLA based on 60 years of SLA research. The book goes on to describe how current ELT largely ignores that view and thus leads to inefficacious practices: synthetic syllabuses, teacher training, classroom practice and high-stakes exams all focus on the English language as an object of study, rather than as something that is learned by doing. We argue that, driven by commercial interests which insist on packaging ELT into commodities for sale, ELT today betrays educational principles in general, and the special characteristics of learning an L2 in particular. We conclude with a chapter that describes and discusses promising alternatives. See here for a review.

It was an honour to collaborate with Mike on this book; it represents the most enjoyable and rewarding work in my professional life.

2. Lunch with Scott Thornbury

After forty years of reading Scott’s stuff, bumping into him at conferences, tangling with him on social media – his blog, mine, Twitter – we finally met for a lunch in Barcelona. I was so looking forward to our meeting that I drank too much wine too quickly (the story of my life, right there!), but anyway, I thoroughly enjoyed it. Scott hurried into the restaurant looking fit, vibrant, handsome, and then sat down in a way that all of us over 70 years old do: you take in the height of the chair, you poise yourself, you let go in an act of faith, and you let out a sigh of relief as your bum mercifully hits the middle of the chair: “Aghhh Umphhhh!”. Warm handshakes, “It’s been too long…”, all that, and off we went.

We went down memory lane and talked about our initiation into ELT. We both did International House courses in London. While I remember that course as one of the worst “educational” experiences of my life – to me, it felt like brain-washing  –  Scott remembers it as informative and inspiring. Well there you go. On then to our encounters with leading figures in ELT history. Chris Candlin (abrasive, radical, a dauntingly assertive pioneer); Henry Widdowson (a beautifully articulate, incisively critical conservative); Earl Stevick (the maestro, a truly lovely human being. We agreed that he was the most influential voice for humanistic teaching that we’d ever met); Jack Richards (multi-millionaire who paid a six-figure sum for a tiny bit of Ming pottery while shopping in Barcelona); David Nunan (charming, well-informed, workaholic, almost as rich and ambitious as Richards); Dick Schmidt (frighteningly brilliant; a bit like Mike Long: if you weren’t at your best, he’d shoot you down in flames); John Fanselow (another lovely human being, the very best, much neglected expert on class observation and, IMHO (with Scott a close second), the best plenary speaker in the history of international IATEFL and TESOL conferences); Mike Long (despite his reputation, enormously witty and fun to be with); and many others. What had we learnt? Me: respect for scientific enquiry and humanistic pedagogy are the recipes for good ELT practice. Scott: learn from the past, take gems from everywhere, be realistic and tolerant, base ELT on doing things in the L2.      

Then, as the pudding was served (flan de la casa, of course), we waded into the present malaise of current ELT. There were few things we disagreed about. Scott, after all, is the one who has most often and most eloquently skewered coursebook-driven ELT. He famously described current ELT as serving up “grammar McNuggets”, and he is, with co-author Luke Meddings, the driving force behind “Dogme”, a radical alternative to current ELT practices. While he’s pessimistic about any big changes in ELT happening soon, he works for change. Scott supports the “Hands Up” project, he engages with radicals, he’s a force for change.    

As we left the restaurant, my “unsteady gait” led Scott to gently express concern. “Are you OK?”, he asked. “Don’t worry”, I told him cheerfully, as I waved vaguely at passing taxis, “I’ll make it”. Slumped in a taxi, I reflected on Scott. What a man! He walks the tightrope between the establishment and its dissidents with remarkable grace and aplomb. He so often wobbles perilously up there on the high wire, his theoretically-opposed left and right arms outstretched, flapping precariously up and down, seeking balance, dangerously aware of contradictions. Will he fall? No he won’t! He’ll stay up there, charming us all, doing much more good than harm to the cause of efficacious ELT. Like so many, I’m a devoted fan of Scott Thornbury, and I remember my lunch with him very fondly.

3. Employment

Late this year, I took on a new job as supervisor for Ph.D. students at the University of Wales, Trinity Saint David. I was afraid that my age (78 years old) would disqualify me, but I’m very pleased to say that it didn’t. I get paid significantly more than Leicester University pays me, and I’ll be working with students engaged in what I regard as the only real intellectual challenge left among today’s higher education qualifications.

Doing a Ph.D. involves intellectual curiosity, digging (by copious reading), intellectual discipline to cope with the digging, stamina and motivation. Supervising Ph.D. students involves appreciating the task they’ve taken on, a real engagement with their topic, helping with the formulation and execution of their study, helping with the organization and presentation of the thesis, and giving them encouragement throughout. This is the kind of teaching I enjoy the most.

I got my love of studying (the necessary quality of an academic) from my school days. Aged 16, in the “Sixth Form”, I chose three A-Level subjects (economics, philosophy and European history), exam results in which would decide if I went to university. The way I was taught suddenly changed: teaching went from working through a coursebook to asking students to explore for themselves. We were given a topic (e.g., “Causes of the First World War”), a list of books, and we were required to produce an essay which was improved by successive drafts, helped by peer and teacher feedback. I took to this new way of learning like the proverbial duck takes to water.

At university, the same kind of teaching continued, but far less guidance was given. My tutor in my first year as an undergraduate at the LSE was Prof. De Smith, an expert in International Law who wrote constitutions for African countries recently freed from British rule. His efforts – hundreds of pages of closely-argued guidance on how to run a country – were usually tossed in the bin soon after the new governments got power. I made several unsuccessful attempts to see my tutor before I learned from conversations with post-grads in the Three Tuns bar that Prof. De Smith never talked to undergraduates. His room was on the top floor of the East Wing building, and whenever he heard a knock on his door, he scooted up a ladder and sat on the roof till he was sure the would-be intruder had gone away. In the entire year, I never spoke to Prof. De Smith, or heard from him, so I got on with my work without his help. At the end of the year, I got a letter from the uni. telling me I’d passed all my exams. In the same envelope was a letter from the great man himself. He congratulated me on my results and concluded “You have been a model tutee. If ever you need a reference, do not hesitate to contact me.”

That’s a radical version of learner-centred education, but it’s one I’ve always tilted towards. Teachers should get out of the way of their students’ learning trajectory, and nowhere is that imperative more true than in ELT.



1. Precarity

Most of the “downs” of 2022 relate to the increased precarity of teachers’ jobs. In all sectors of education – private & public, primary, secondary, tertiary – we’ve seen the erosion of decent contracts, pay and pension plans. The ELT sector is worse than most. It’s such a big profit-maker that it’s particularly riddled with government corruption, cronyism, and disgraceful exploitation of workers in unregulated private schools all over the world. IATEFL’s non-engagement in the fight for teachers’ rights is a disgrace. Despite attempts by Paul Walsh and others, IATEFL refuses to change its constitution, refuses to allow these matters to be discussed in plenaries at their conferences, and refuses to devote time or funds to fighting the most blatant examples of worker exploitation. Shame on the organization, and shame particularly on those who lead it and its Special Interest Groups.

2. Raciolinguistics and Translanguaging

This year has seen the continued promotion of raciolinguistics and translanguaging in ELT circles, perhaps influenced by the increasing number of articles on these topics appearing in academic journals. However much the discussion of these topics might help those involved in ELT think more carefully about racism and about the English language as a purveyor of imperialist ideologies, I suggest that

  1. they’re both blighted by their reliance on relativist epistemologies, and
  2. neither offers any clear progressive alternative to how ELT should be carried out.       

As to the first point, relativists argue that there’s no such thing as objective knowledge – everything we “know” is socially constructed, and there’s no reliable way to judge between rival explanations of phenomena. Anybody who takes this view seriously, or adopts it as an intellectual posture (“trying it on”, as Auden said), is beyond the realm of critical, rational discourse. It’s one thing to discuss epistemology philosophically; it’s quite another simply to adopt a relativist stance because it suits you. I like reading Derrida, Foucault, Baudrillard and others. I confess that they often lose me; there’s stuff I just don’t get, and much that I think is rubbish. But I don’t think it’s bullshit, which, in my opinion, is what people like Shawer, Guba and Lincoln write. They adopt relativism because it suits them, but they don’t know what they’re talking about. (Bullshit involves talking about things you don’t know much about as if you knew a lot more. Critical thinking (one more commodity these days) involves logical thinking, but after that, it’s largely about developing the ability to recognise bullshit.) Guba & Lincoln’s bullshit has led millions of gullible people with Masters degrees in TESOL to completely misunderstand scientific method. By extension, it’s led to the flimsy support offered for a lot of sociolinguistic research over the last 30 years.

Talking of bullshit, the worst example of it in 2022 is surely Gerald’s attempt to apply raciolinguistics to ELT in his frantically self-promoted book Antisocial Language Teaching. I’ve reviewed the book already, so let me here focus on its contribution to ELT practice. It’s pathetic. My on-line thesaurus suggests feeble, paltry, miserable, puny, useless as synonyms and I think they all apply. Gerald is one of those native speaker chancers who taught EFL for a few years abroad (S. Korea) and then came home (New York) to do a Ph.D. and join the crowded ranks of social media bullshitters. There’s no indication in anything that he’s published to suggest that Gerald has even the most elementary grasp of SLA, or of the development of ELT methodology, or syllabus design, or language assessment, or any damn thing related to the principles or practice of ELT. Gerald’s suggestions for improving things indicate a flimsy, superficial understanding of what ELT involves. More than that, they indicate that, really, he doesn’t give a flying fuck about ELT.    

Let’s go from the gutter to the dizzy realms of academia. Up here, where the air is thin, translanguaging is trending. García, Flores, Rosa and Li Wei are the translanguaging crusaders whose obscurantist prose is a give-away for the fact that they have nothing interesting to offer ELT practice (or anything else, for that matter). During 2022 I’ve written several posts on their stuff, so here it’s enough to say that seeing language as “a fluid, embodied social construct” contributes little to the task of bringing about new, innovative English language education. None of the Fantastic Four has come anywhere near to presenting a clear outline of how translanguaging – a fashionable, incoherent theoretical construct, handy these days for gaining quick promotion in academia – can encourage the radical change in ELT which is so urgently needed.

Onward thru the fog, then, and best wishes for 2023.             

SLB: Task-Based Language Teaching Course No. 4

Roll Up! Roll Up! Limited Places! Hurry! Hurry! Hurry!

The fourth run of our online TBLT course starts on January 23rd 2023 and subscription is now open. It’s a 100-hour, online tutored course aimed at 

  • classroom teachers
  • course designers,
  • teacher-trainers, 
  • directors of studies and 
  • materials writers.

The growing popularity of TBLT as an approach to language teaching is surely explained by increasing dissatisfaction among EFL professionals with current ELT practice. As convenient as coursebook-driven courses might be, they’re tedious, based on false assumptions about how people learn an L2, and they frequently fail to deliver the improvement that students hope for. In contrast, TBLT courses focus on meaning-making and engagement with real-world language needs: they give experienced teachers fresh opportunities to re-engage with their practice, they offer new teachers a more challenging, much more rewarding framework for their work, and they allow students to learn through scaffolded use of the language (learning by doing), which, as research evidence shows, is the best way to learn an L2.

The vibrancy of TBLT is evidenced by animated discussions on social media, by increasing presentations at conferences (including the biennial International Conference on TBLT), by the recently-formed International Association of TBLT (IATBLT), and by the wave of new publications, including thousands of journal articles, special issues in prominent journals, and the new journal specifically dedicated to the topic, TASK: Journal on Task-Based Language Teaching and Learning, the first volume of which appeared in 2021. Books which followed Long’s seminal (2015) SLA and TBLT include

  • Ahmadian, M. & García Mayo, M. P. (2017) Task-Based Language Teaching: Issues, Research and Practice,
  • Ellis, R., Skehan, P., Li, S., Shintani, N. & Lambert, C. (2019) Task-Based Language Teaching: Theory and Practice, and
  • Ahmadian, M. & Long, M. (2021) The Cambridge Handbook of Task-Based Language Teaching.

All of these will be dealt with in the course.

The Course

Our SLB course tries to “walk the talk” by working through a series of tasks relating to key aspects of TBLT, from needs analysis through syllabus and material design to classroom delivery and assessment. While we are influenced by Long’s particular version of TBLT, we also explore lighter, more feasible versions of TBLT which can be adopted by smaller schools or individual teachers working with groups with specific needs.

Neil McMillan (president of SLB) and Geoff Jordan (both experienced teachers with Ph.Ds) do most of the tutoring, but we are privileged to be assisted by the following experts:

Roger Gilabert: An expert on TBLT, Roger worked with Mike Long on several projects and has developed a TBLT course for Catalan journalists. His contributions to our three previous courses have been very highly rated by participants.

Marta González-Lloret: Marta did her PhD with Mike Long at the University of Hawai’i, is currently co-editor of the Benjamins book series Task-Based Language Teaching: Issues, Research and Practice, and is especially interested in using technology-mediated tasks.

Glenn Fulcher: Glenn is a renowned testing & assessment scholar. His (2015) Re-examining Language Testing: A Philosophical and Social Inquiry won the 2016 SAGE/ILTA Book Award jointly with Fulcher and Davidson’s (2012) The Routledge Handbook of Language Testing. He’ll help us with our discussion of task-based, criterion-referenced performance tests.

Peter Skehan: Peter is one of the most influential scholars in SLA, with a particular interest in TBLT. Peter was the inaugural recipient, along with Mike Long, of the IATBLT’s Distinguished Achievement Award, made in 2017 at the Barcelona conference. We will use recordings we made of discussions with Peter, where he helps us get to grips with a key part of TBLT: designing and sequencing pedagogic tasks. We hope Peter will also join us for a video-conference session during the course.

Ljiljana Havran: Ljiljana is an experienced teacher and teacher trainer who works in Belgrade. Her blog is one of the most read and respected in the ELT community. Ljiljana will share her experiences of designing and implementing a TBLT course for pilots and air traffic controllers.

Rose Bard: Rose works in Brazil and, like Ljiljana, has a blog which enjoys a wide audience. In this course, Rose will tell us how she uses Minecraft in her TBLT courses for young learners.

Mike Long: The course will also include exclusive recordings of Mike Long, who inspired Neil and me to design the course, and who contributed to its first three runs.

1. Modules

Now you can choose individual modules or the whole course. The whole course takes 100 hours and consists of five modules (see below). If you choose to do one or two individual modules, you’ll have the chance to do further modules in later courses to achieve complete certification.

2. TBLT & Technology

We will give more attention to the increasingly important influence of new technologies on the TBLT field. As the tasks people need to perform are increasingly mediated by technologies, so is TBLT itself, with consequences for how TBLT courses are designed and run.

3. A More Flexible Approach to TBLT

Thanks to the truly impressive work of the participants in the three previous courses, we’ve learned a lot about the problems of implementing a full version of Long’s TBLT, and we now better appreciate the need for a flexible case-by-case approach to the design and implementation of any TBLT project.

In the third course, we were very pleased to see how each participant slowly developed their own TBLT agenda, working on identifying their own target tasks, breaking these down into relevant pedagogic tasks, finding suitable materials, and bringing all this together using the most appropriate pedagogic procedures.

Another gratifying aspect of all the courses has been the way participants have learned from each other; most of the individual participant’s TBLT models contain common elements which have been forged from the forum discussions.

So in this course, we’ll make even more effort to ensure that each participant works in accord with their own teaching context, and at the same time contributes to the pooled knowledge and expertise of the group.

There are five modules:

  • Presenting TBLT
  • Designing a TBLT Needs Analysis
  • Designing a task-based pedagogic unit
  • Task-Based Materials
  • Facilitating and evaluating tasks

Each module contains:

  • Background reading.
  • A video presentation from the session tutor and/or guest tutors.
  • Interactive exercises to explore key concepts. 
  • An on-going forum discussion with the tutors, guest tutors and fellow course participants.
  • An extensive group videoconference session with the tutors and/or guest tutors. 
  • An assessed task (e.g. short essay, presentation, task analysis etc.). 

Sneak Preview

To get more information about the course, and try out a “taster” CLICK HERE

Materials for ELT – and Noticing

Further to discussions on Twitter with Matt Bury and Peter Fenton, here’s a summary of Mike Long’s view of elaborated and modified elaborated input, with some comments about noticing that just sort of happened. The main text is Long (2020), but I also refer to Long (2015) and to Jordan & Long (2022).  

Discussions of spoken and written input for language learning traditionally focus on the relative merits of authentic and linguistically simplified spoken and written texts. Long argues that elaborated input and, in particular, modified elaborated input are better options, especially when the input texts are part of tasks.

Genuine (authentic) input

Authentic texts are spoken or written records of real-world communication among native or non-native speakers, i.e. texts not spoken or written in conformity with any particular linguistic guidelines or vocabulary list.

Widdowson (1976) rightly problematized ‘authentic’, pointing out that a text may be genuine in the sense of not originally having been produced for language teaching, but its use in a coursebook or classroom lesson may not be authentic. If a teacher records and transcribes segments of a radio news broadcast or undergraduate economics lecture, and then, with key information bits deleted, presents written excerpts to students, whose job it is to fill in the missing words or phrases as they listen to the recording, then the classroom activities based upon them are not authentic.

An obvious problem with genuine texts for language teaching is that most are produced for native speakers. Parts will be linguistically too simple, and (more often) other parts too complex to be processed. Teachers waste a lot of limited classroom time trying to make institutionally mandated, inappropriately complex (simplified or genuine) dialogs or reading passages comprehensible for students who are simply not ready for them. The devices to which teachers resort to increase comprehensibility (schema building, vocabulary pre-teaching, visual aids, translation, grammatical explanations, vocabulary glossing, etc.) are precisely those that make classroom use of genuine (or any other) texts inauthentic.

Simplified input

Simplified texts typically take the form of graded readers and the dialogs and reading passages found in coursebooks. In many cases, simplified texts are themselves originals, written that way from the get-go, so strictly speaking, linguistically simple, not simplified. In either case, they are created using only those linguistic forms, verb tenses, grammatical structures, lexical items, and collocations thought appropriate for learners of a given L2 ‘proficiency level’. Learners are presented with examples of what Widdowson (1972) called target language usage (What am I wearing? You’re wearing a sweater), not use. The aim is to show the inner workings of the code, not how the language is used for communication.

I think this bit of Long (2020) is worth quoting:

Whereas real conversations are marked by open-endedness, implicitness, and intertextuality, there is a tendency for textbook writers to produce stand-alone dialogs (and reading passages) with a beginning, a middle, and an end, containing all the information needed for the inevitable comprehension questions, and no more. Did David take the pills? How many pills did he take? When did he take them? Did David exercise? And so on. (For detailed examples and discussion, see Long, 2015, pp. 169–204.) Little or nothing is left unstated, and allusions are rare. The Blands of Potters Bar greet one another, say something banal that includes several models of the “structure du jour”, and take their leave. Even in the hands of the most skillful materials writers, the end-product can be painful, and the resulting classroom interaction reminiscent of Beckett on a bad day (Dinsmore, 1985). The artificiality is increased by the fact that materials writers’ intuitions are the basis for the texts they write, and every study that has looked at the issue (e.g., Cathcart, 1989; Williams, 1988; Bartlett, 2005; Granena, 2008) has found materials written that way differ markedly from real speech, a problem exacerbated when unfamiliar specialist discourse domains are involved.

Constantly recycling the same grammatical patterns and a limited set of vocabulary items results in impoverished input, which is counter-productive from an acquisition perspective: acquisition potential is sacrificed for comprehensibility, excluding many new opportunities for learning. Comprehensibility is needed, but language acquisition from the input should not be sacrificed to it.

Below, Tables 1.1 and 1.2 juxtapose genuine and simplified versions of the same short text. Regarding the simplified text, Long comments:

in the simplified version, the series of short, choppy sentences creates an irritating, breathless, staccato effect, and can make processing for meaning harder. The intra-sentential linker, so, has been lost, and with it, the explicit marking (an example of redundancy) of the causal relationship between the driver fleeing the scene and the woman’s inability to provide the police with anything more than a rough description. This, too, can make comprehension harder, as the causality is now left to inference on the reader’s part. There is unnatural repetition of full noun phrases (driver … driver, woman … woman) instead of target-like pronominalization and anaphoric reference (driver … he, woman … she). Examples of genuine NS language use (catch a glimpse, fled the scene, provide a rough description, bolded in Table 1.1 to make tracking them easier) are lost, replaced by higher frequency, less informative, unnatural-sounding items with less precise meanings (saw for only a moment, immediately drove away fast, tell (the police) a little about him). Comprehensibility has been improved, no doubt, but of a text that has been bled of semantic detail and realistic models of target language use. Fortunately, genuine and simplified texts are not the only options.

Elaborated input

Long’s research into how native speakers (NS) modified the way they talked to non-native speakers (NNS) – so-called foreigner talk discourse  –  found that while NSs made some quantitative changes to the language they used, e.g., by employing shorter utterances and favoring yes/no or or-choice over wh questions, modifications of the Interactional Structure of Conversation were more pervasive and more important. In other words, comprehensibility was achieved not so much by simplifying the input as by changing the ways communicative talk was accomplished. Long says:

When NSs converse with NNS, they use a wide variety of devices for the purpose of input elaboration, including slower speech rate, relinquishing topic control, making new topics salient, preference for a here-and-now over a there-and-then orientation, decomposing complex topics into their component parts, eight types of repetition (exact and semantic, partial and complete, self and other), one-beat pauses before and/or after key information-bearing words, clarification requests, comprehension checks, confirmation checks, lexical switches, synonyms, antonyms, and informal definitions.

Table 1.1 Traffic accident sentences

Genuine: The only witness just caught a glimpse of the driver as he fled the scene, so she could only provide the police with a rough description.

Simplified: A woman was the only person who saw the accident. She saw the driver for only a moment. The driver did not stop. He immediately drove away fast. The woman could only tell the police a little about him.

Elaborated: The only person who saw the accident, the only witness, was a woman. She only caught a glimpse of the driver, just saw him for a moment, because he fled the scene, driving away fast without stopping, so she could only provide the police with a rough description of him, not an accurate one.

Modified elaborated: The only person who saw the accident, the only witness, was a woman. She only caught a glimpse of the driver, just saw him for a moment, because he fled the scene, driving away fast without stopping. As a result, she could only provide the police with a rough description of him, not an accurate one.

Table 1.2 Descriptive statistics for the traffic accident sentences

Modified elaborated: 56 words; 3 sentences; 18.67 words per sentence; 2.3 MVs* per sentence

*MVs, main verbs and modals.
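Incidentally, the word and sentence figures in Table 1.2 are easy to verify. Here’s a minimal Python sketch (mine, not Long’s) that computes word count, sentence count and mean sentence length for the modified elaborated version, assuming naive whitespace tokenization and punctuation-based sentence splitting:

```python
import re

def text_stats(text):
    """Naive descriptive statistics: word count, sentence count,
    and mean sentence length (words per sentence)."""
    words = text.split()
    # Split on sentence-final punctuation; crude, but fine for these examples.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(words), len(sentences), round(len(words) / len(sentences), 2)

modified_elaborated = (
    "The only person who saw the accident, the only witness, was a woman. "
    "She only caught a glimpse of the driver, just saw him for a moment, "
    "because he fled the scene, driving away fast without stopping. "
    "As a result, she could only provide the police with a rough description "
    "of him, not an accurate one."
)

print(text_stats(modified_elaborated))  # → (56, 3, 18.67)
```

Counting main verbs and modals (the MVs column) needs a human eye or a parser, but the same function run over the genuine and simplified versions gives a quick feel for how much elaboration inflates text length.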

Genuine: The only witness just caught a glimpse of the driver as he fled the scene, so she could only provide the police with a rough description.

Elaborated: The only person who saw the accident, the only witness, was a woman. She only caught a glimpse of the driver, just saw him for a moment, because he fled the scene, driving away fast without stopping, so she could only provide the police with a rough description of him, not an accurate one.

The total word count is twice that of the genuine version, and the syntactic complexity over twice that of the simplified version. The increase in word count is a direct result of various kinds of input redundancy, which, by definition, result from elaboration. For example, informal definitions have been added to facilitate understanding of the unknown lexical items and collocations; the items lost in the simplified version are now retained. Help with the meaning of ‘rough’ in a rough description is offered by the added contrast, not an accurate one. In most research comparing the comprehensibility of spoken and written texts, NNSs have been found to understand elaborated versions almost as well as simplified versions, and both significantly better than genuine versions.

The Yano et al. (1994), Oh (2001) and O’Donnell (2009) studies showed that elaborative modifications were as successful in improving comprehension as simplification, and did so without sacrificing unknown lexical items. The finding suggests that whereas simplified input tends to bleed semantic content, elaborated input does not. The findings are positive from an acquisition perspective, too, as they mean that NS models of target language use need not be sacrificed on the altar of comprehensible input.

Modified elaborated input

Despite its overall advantages, input elaboration also has some undesirable side-effects, in particular, excessive utterance or sentence length. When designing pedagogic materials, it is easy to provide teachers and learners with the advantages of elaborated input while eliminating the readability problem. In most cases, all that is required is to split unwieldy utterances or sentences into shorter ones and, where needed, add intra- or inter-sentential linking expressions to strengthen any vulnerable semantic relationships among the resulting parts:

The only person who saw the accident, the only witness, was a woman. She only caught a glimpse of the driver, just saw him for a moment, because he fled the scene, driving away fast without stopping. As a result, she could only provide the police with a rough description of him, not an accurate one.

All the important examples of native language use (witness, catch a glimpse, flee the scene, rough description, etc.), lost in the simplified version, are again retained, and the causal relationship between the driver fleeing and the witness only being able to provide a rough description is marked explicitly by addition of the inter-sentential connector as a result. Modified elaborated input exposes learners to nativelike L2 use, increases comprehensibility by retaining the redundancy and other features typical of elaboration, and restores normal sentence length and reasonable syntactic complexity. Just as training wheels are removed from bicycles as children learn to balance, or crutches are replaced by a walking cane as an athlete recovers from a broken leg, the cane subsequently discarded, so the devices employed to achieve elaboration are gradually removed as learners’ proficiency increases, eventually giving way to genuine texts for advanced students.

The crucial requirement is that the focus be on communication through the L2, not on the L2 code itself. The easiest way to create that condition, and thereby ensure that modified elaborated input is produced, is for materials writers and classroom teachers to recognize a variety of relevant sources of input. Those include teacher speech, input from classmates, and language use surrounding the performance of dynamic tasks. Then, just as caretakers effortlessly and instinctively modify the way they talk to infant children, so, after some simple training, materials writers and classroom teachers can produce appropriate modified elaborated input for L2 learners, including students learning subject matter, or learning to perform tasks, in the L2.

The Kicker

For most learners of an L2, a functional command of the L2 is more important than knowing about the L2 grammar. Learners depend primarily on their implicit knowledge – L2 knowledge they have but do not know they have. Implicit knowledge is the result of incidental L2 learning – picking up parts of a language without intending to, as a by-product of doing something else, such as reading a newspaper, watching TV, living overseas in an immersion setting, or working in the L2 on a problem-solving task.

Long recognizes that while it is important to increase classroom opportunities for incidental learning, there is insufficient time, and usually insufficient input, for classroom language learning by older children and adults to be accomplished purely implicitly – and far too many items for them all to be learned explicitly.

Simply exposing learners to comprehensible input and relying on inductive learning to do the rest is to assume, wrongly, that the power of incidental learning, especially instance learning, remains as strong in older learners as it is in young children.

Long has argued for decades that a “Focus on Form” approach (dealing with adult learners’ deficiencies by focusing on formal aspects of the language reactively, as they occur, through negative feedback) is better than a grammar-based diet of “Focus on Forms” (i.e., lessons in which the primary focus is on language as object). He now asks: “Can the same results be achieved through even less intrusive means, specifically through enhanced incidental learning?”

Enhanced incidental learning refers to an internal process in the mind of the learner. It differs from input enhancement, which relies on a third party adding things like bolding, underlining, or italicizing words in written input, or stress, pauses, and increased volume in speech. While input enhancement attempts to promote conscious “noticing” of the target items, enhanced incidental learning is intended to increase unconscious detection, and thereby, the efficiency of the incidental learning process, without necessarily raising learning to the level of conscious awareness at all.

Conscious noticing, says Long, may be optimal for non-salient linguistic targets, but “can perception at the level of unconscious detection work with adults for vocabulary and collocations, at least, and possibly for perceptually salient grammatical issues, like negation or adverb placement?”

For me, this is a crucial question, almost hidden in the text. I don’t agree that conscious noticing is “optimal” for learning anything about language, but here, Long is entertaining an idea that he and I often argued about till I fell off my bar chair, my final words usually being something to the effect that noticing is probably the worst construct ever in SLA.   

Long asks if it’s necessary, or just facilitative, for teachers

to help learners notice a new form and establish a first representation in long-term memory – not to teach it in the traditional understanding of the term, but so that the representation thereafter serves as a selective cue that primes the learner to attend to and perceive subsequent instances in the input during implicit processing?

There’s a marvelous doubt percolating there – is “noticing” as a theoretical construct of any use at all?! I was arguing amicably with Mike Long about this right up to when cancer so quickly snatched him from us. Noticing, Schmidt’s twice-amended, much-battered theoretical construct, is generally used, quite inappropriately and crassly in its dictionary sense, to justify explicit grammar teaching. The theoretical construct of noticing is more carefully used by those who suggest that SLA is a process involving “disabled” adult learners, who, while they learn an L2 pretty well in the normal implicit way, have trouble overcoming the influences of their L1. Nick Ellis says that adults learning English don’t “notice” non-salient, low frequency, etc., items of the L2, and they therefore resort to the norms of their L1, which leads to L1-induced “errors” in their use of the L2. What’s needed to overcome these errors, says Nick Ellis, is for learners of English as an L2 to “re-set the dial” (i.e., the dial set by their L1). Re-setting the dial is best done, says Nick Ellis, by Long’s “focus on form” – quick interventions during meaning-focused communication which focus on form in such a way that a first representation of the recalcitrant form is placed in long-term memory, after which the default implicit learning mechanism takes over again. To quote Nick Ellis: “The general principle of explicit learning in SLA is that changing the cues that learners focus on in their language processing changes what their implicit learning processes tune” (2005, p. 327). Mike Long, to my dismay, agreed with Nick Ellis. To borrow from Groucho Marx, I may be wonderful, but I think they’re wrong. I don’t think language learning is usage-based, I don’t think noticing is a useful construct, and I don’t think re-setting the dial is a good principle that usefully informs teaching.

Bimodal Input

Well, never mind. Whatever doubts I have about Long’s final views on SLA, he made a huge contribution to developing our knowledge of the field, and his suggestions for designing courses of English as an L2 remain among the best-informed, best-described and best-motivated of them all. Long insists that language learning is learning by doing and that therefore syllabuses should be analytic, not the awful synthetic bits of crap served up by coursebooks.

Long finishes his 2020 article by asking how incidental learning can be sped up and generally made more efficient. (See – in his heart, he really believes that we must leave behind all forms of instruction-based ELT!) He recommends the use of “bimodal input” – a growing and enormously important development in ELT materials. Input can be presented in oral and written form simultaneously, as when a learner reads a story while listening to a recording of it being read aloud by someone else, e.g., in an audiobook. Other options include a combination of oral and visual, as when someone watches a video; written and visual, e.g., a silent movie with subtitles; and (tri-modally) oral, written and visual, e.g., a movie with sound and subtitles. Bimodal presentation doesn’t require all the work involved in making elaborated texts – recordings can simply use slow pace, with salience added to specific vocabulary items or collocations through stress and one-beat pauses before and/or after key information-bearing items, with or without corresponding changes to the written version (italics, bolding, capitalization, color, etc.).

There remains the question of the content of the texts used. Use your own common sense, your own experience, your own judgement, your own political and cultural values, and your knowledge of the local context, to guide you. For additional help, talk to Tyson Seburn.   


Jordan, G. & Long, M. (2022) ELT Now and How It Could Be. Cambridge Scholars.

Long, M. (2020) Optimal input for language learning: Genuine, simplified, elaborated, or modified elaborated? Language Teaching, 53, 169–182.

Long, M. (2015) Second Language Acquisition and Task-Based Language Teaching. Wiley Blackwell.

Thesis on Current ELT Part One

1. Currently, over a billion people do courses in English as an L2. Most of them fail to reach their objectives.

2. They fail because ELT is a multi-billion dollar commercial enterprise, where the aim is to maximise profits rather than help people to use English for their own purposes and needs.

3. The ELT industry is an interlocking hydra composed of publishers, teacher trainers, course providers and examination boards.

4. All four heads of the ELT hydra focus on selling products: coursebooks and related materials, teacher training courses, courses of English, and exams.

5. Selling ELT products involves the reification of language learning. (Reification is changing abstract ideas into something real. For example, proficiency is changed into the CEFR scale. The abstract idea of language learning is changed into products for sale.)

6. The products of ELT are:

  • coursebooks and related materials;
  • training courses like CELTA and DELTA, and CPD courses offered by a host of teacher educators;
  • EFL/ESL courses like those offered by private outfits (International House, the British Council, Berlitz, etc.) and public schools across the world;
  • exams such as the IELTS, the Cambridge suite and TOEFL.

7. All these products suffer from the same weakness: they ignore the fact that learning an additional language (L2) is not the same as learning other subjects like geography or biology. They wrongly assume that knowing things about the L2 (e.g., in English, to form the 3rd person singular of the present tense of verbs, add an “s” to the infinitive) leads to the ability to use this knowledge for practical purposes.

8. Language is far too complex to be described by pedagogic grammars. As VanPatten says, what the textbook says about the present perfect is not a good description of what’s in the head of any learner.

9. Chomsky’s work is one attempt to describe linguistic knowledge (competence); Nick Ellis is working on another, usage-based, description. Both Chomsky’s and Nick Ellis’ attempts to describe linguistic knowledge indicate that it’s practically impossible to reduce it to something that can be taught to non-specialist linguists. Linguistics is a very specialized academic discipline that has almost nothing in common with ELT. English language teachers should appreciate that they’re not teaching linguistics – their job is to help their students use the L2. They are enabling agents, not content specialists.

10. Teaching students about the L2 doesn’t lead to an ability to use the L2, as shown by the experiences of the billions (sic) of students who, in the past hundred years, have been taught about an L2 and who ended up, after hundreds of hours of instruction, incapable of using it for any communicative purpose. Our knowledge about the languages we use is overwhelmingly unconscious: it is completely different to our knowledge of, for example, geography.

11. We learn an L2 by processing input. We get language input from the environment, from people who talk to us and those we listen to or read. We make sense of the input in our heads – in our brains, or minds, if the latter theoretical construct is allowed. All the processing that goes on in our brain / mind results in the development of what is referred to as “interlanguage” – a dynamic, non-linear representation of the L2 that gets increasingly closer to the way native speakers use it.

12. Regardless of their theoretical orientation (UG or usage-based), SLA scholars agree that interlanguage development goes through certain stages that are impervious to instruction. This leads to Pienemann’s teachability hypothesis, which says that you can’t teach students things about the language that they’re not ready to learn. Let me “up the ante”, as we poker players say, by suggesting that you can’t teach students anything worthwhile about how the L2 works through explicit instruction. I’m increasingly drawn to the view that teachers who dedicate classroom time to “transmitting” explicit knowledge about the L2 are wasting that time, which would be better spent by giving students opportunities to use the L2 for themselves. VanPatten says that most teachers’ explicit knowledge of English (or Spanish, or any other L2) isn’t worth trying to teach (i.e. pedagogic grammars are crap) and I take his point. We might see this as a recommendation that there be no – zero! – explicit teaching about the L2 in ELT syllabuses, and I confess that I incline to such a view. Mike Long and I had lots of discussions about this, and I think he was close to agreeing. It doesn’t rule out negative feedback, of course.

13. Regardless of their theoretical orientation (UG or usage-based), SLA scholars agree that the development of learners’ interlanguages depends on unconscious, internal mental processes. Given the right opportunities – rich input and communicative exchanges – learners will work out for themselves how the L2 (its grammar, lexis and pragmatics) works. Simply put: learning an L2 is best done by “doing it”. Give students tasks where they learn by doing.

14. Learning by doing is the best, most efficacious way of organizing a course in English as an L2.  

15. ELT teacher trainers and educators have a duty to inform teachers about what we know about how people learn an L2. With a few notable exceptions, they fail miserably in that duty. They are part of the problem: they resist any attempt to rescue ELT from its current reliance on coursebooks and its benchmarks of proficiency. That’s hardly surprising, because they’re embedded members of the status quo: they write coursebooks and materials, they design and teach courses for teachers, and they design and act as examiners for the high-stakes English proficiency exams.

16. Coursebook-driven ELT commodifies language learning. It pushes relentlessly towards the packaging and sale of products like coursebooks, teacher development courses, English courses, and exams, which are judged by commercial, not educational, standards. It rides roughshod over what we know about language learning. Even judged by the declining standards of education in general, ELT is rightly seen as a pariah, an inefficacious disgrace to research-driven education.

On Being a Troll

My only experience of being a troll was a few years ago on Twitter. I responded to a tweet by JPB Gerald, an inarticulate, wannabe wordsmith, who remarked that white people deserved to be left to rot in the misery of their own damned whiteness. I called his remark “crap” and he retorted by calling me a racist. Dozens of Gerald’s followers piled in. I angrily replied to some of them, and in a couple of days, having received hundreds (sic) of insulting tweets, I was quickly established as a racist troll. Racist because I, a privileged, white academic, had insulted a young, aspiring black academic, and a troll because I didn’t immediately apologise and shut up.  

Let’s be clear about what a troll is. A troll is a person who sets out to antagonize another by “deliberately posting inflammatory, irrelevant, or offensive comments or other disruptive content” (Merriam-Webster). There’s the added ingredient of hounding or persistence – the “stalker” dimension of trollism. If we can agree on this rough definition, perhaps we can agree that there are two classes of troll:

  • Willing trolls (those who set out to deliberately offend and pursue the offence) and
  • Unwitting trolls – those who insist that they’ve been wrongly maligned and are simply trying to defend themselves.

You won’t be surprised to hear that I see myself in the second category – an unwitting troll.

I was called a racist by Gerald, and I replied that the accusation was malignant and false. In various tweets and a blog post, I pointed out that I’d fought racism since I was old enough to perceive what racism was. When I was 20 years old, I risked my life fighting the South African apartheid government (I’m mentioned in ANC records). I was an active member of the Anti-Apartheid movement and organized AA conferences at 6 different universities. I worked with the Young Lords in Chicago and the Black Panthers in California, where I was shot at by police and went to comrades’ funerals. I organized a special team, all members of a squat in Chalk Farm, that went with first generation British Jamaican immigrants to get their rightful payments at Social Security offices. I gave night classes to British Jamaican kids in that Chalk Farm squat – many got O levels, six went to university. I helped organize the very first Notting Hill Carnival; I went to Wolverhampton and called out Enoch Powell at a public meeting (reported in the national press). I’ve spent my whole adult life calling out racists, and I’ve got scars to prove it. To be called a racist by the self-promoting, deluded dickhead who calls himself JPB Gerald, a privileged, jumped-up buffoon, who I bet my hat has never done anything to risk his precious black skin while he sits in his NY apartment browsing real estate brochures and sending out an endless stream of narcissistic, please-send-me-money-so-I-can-keep-on-writing-about-tv-shows-and-ridding-the-world-of-whiteness tweets, is a calumny that I’ve yet to get over. I don’t know why I found his racist slurs so deeply offensive, but I did – there are still days when I wake up thinking about it all.

All very personal, but I think it will resonate with lots of readers who have felt deeply affected by being called trolls of one sort or another, and by the sheer volume, the crushing weight, of the consequent clamour.

When the attacks, sparked by Gerald, began, I hit back, calling Gerald a bullshitter, which is exactly what he is, but as a result I got bombarded with so much shit that I closed down my Twitter account.

Gerald was ecstatic: “We’ve closed him down!” he tweeted triumphantly.

And that’s an increasingly prevalent part of what calling people trolls is all about. There are, of course, people who fit the dictionary definition. But there are many others who feel that they’ve been unfairly called trolls by those who use the label as an easy and powerful way of dealing with their critics.

Schmidt’s Noticing Hypothesis: A summary


I’ve done a few posts on Schmidt’s Noticing Hypothesis, but here’s what I hope is a simple summary which untangles some of the strands in these previous discussions.

As usual, my main gripe is with those teacher educators who, as members of the establishment with a vested interest in coursebook-driven ELT, continue to fail in their duty to properly inform teachers about how people learn English as an L2.


Two types of knowledge are involved in SLA, and the main difference between them is conscious awareness. Explicit L2 knowledge is knowledge which learners are aware of and which they can retrieve consciously from memory. It’s knowledge about language. In contrast, implicit L2 knowledge is knowledge of how to use language and it’s unconscious – learners don’t know that they know it, and they usually can’t verbalize it. (Ignoring subtle differences, I take the terms Declarative and Procedural knowledge to mean the same as explicit and implicit knowledge of the L2.)

To use explicit knowledge, learners have to retrieve it from memory, which is a slow, effortful process unsuitable for quick and fluent language production. In contrast, learners access implicit knowledge unconsciously, a very quick process allowing for unplanned, fluent language production.

Today, there’s a consensus among SLA scholars that the main way people learn an L2 is unconsciously, when engaged in using the L2 for genuine communicative purposes. Implicit learning is seen as the “default” mechanism: it leads to implicit knowledge, which is automatically and quickly retrieved – the basic components of fluency – and more lasting because of the deeper entrenchment which comes from repeated activation. The findings from over 50 years of studies point to a pervasive implicit mode of learning, and to the limited role of explicit learning. Nobody challenges the important role that explicit knowledge can play in instructed SLA, but what most scholars firmly reject is the erroneous view adopted by most teachers that declarative knowledge is a necessary first step in learning an L2.

Adopting the view that declarative knowledge is a necessary first step in learning an L2 leads to the methodological principle that the first step in a course of English as an L2 is to describe/demonstrate/explain the L2, after which students should be given some opportunities to practice it. This in turn requires chopping the L2 into “items” – like personal pronouns, the present tense, adjectives describing nouns, vowel sounds, and so on – and then dealing with them in a sequential order, in the way that coursebooks do.

Coursebook-driven ELT is demonstrably inefficacious: most of the hundreds of millions of people who do courses in English as an L2 fail to reach their objectives (see Jordan & Long, 2022, for examples and sources). The reason for its inefficaciousness is simple: coursebook-driven ELT leads to teachers talking for most of classroom time about the L2, and affords students too few opportunities and too little time to learn by doing.

Teacher trainers and educators must accept a lot of the responsibility for the current deplorable state of ELT. With a few notable exceptions, they consistently fail in their duty to inform teachers about reliable findings of SLA research. Teacher trainees are told next to nothing about interlanguage development – about how what you teach is constrained by what students are ready to learn – and even less about the roles of implicit and explicit knowledge, because the teacher trainers have a personal, vested interest in established practices – they write coursebook series, they write “How to Teach” books, they design, run and assess SLTE courses, they work as examiners for high-stakes proficiency exams, and they give the keynotes and plenaries at the yearly round of conferences.

To the issue, then.   

One part of SLA research that does get talked about by teacher educators is Schmidt’s Noticing Hypothesis, which, the educators say, “proves” that consciously ‘noticing’ formal features of L2 input is a necessary condition for learning. This, in turn, “proves” that basing ELT on using coursebooks where prime place is given to the explicit teaching of grammar points, collocations, pronunciation features, etc., is justified not just by convenience but by support from SLA research.

The Noticing Hypothesis

Schmidt’s extremely influential hypothesis was first formulated in 1990. Attempting to resolve the confusion he describes in talk about conscious and unconscious learning and previous attempts to define input and intake, he defines input as the sensory language stimuli the learner gets from the environment and intake as:

that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently.  If noticed, it becomes intake (Schmidt, 1990: 139).

The implication of this is that:

subliminal language learning is impossible, and that noticing is the necessary and sufficient condition for converting input into intake (Schmidt, 1990:  130).

The only study mentioned by Schmidt in support of his hypothesis is Schmidt and Frota (1986), which examined Schmidt’s own attempts to learn Portuguese and found that his notes matched his output quite closely. Schmidt himself admits that the study does not show that noticing is sufficient for learning, or that noticing is necessary for intake. Nevertheless, Schmidt does not base himself on this study alone; there is, he claims, evidence from a wider source:

… the primary evidence for the claim that noticing is a necessary condition for storage comes from studies in which the focus of attention is experimentally controlled. The basic finding, that memory requires attention and awareness, was established at the very beginning of research within the information processing model (Schmidt, 1990: 141).

In brief, Schmidt’s claim is that “intake” is the subset of input which is noticed, and that the parts of input that aren’t noticed are lost. Thus, Schmidt’s Noticing Hypothesis, in its 1990 version, claims that noticing is the necessary condition for learning an L2. ‘Noticing’ is the first stage of the process of converting input into both explicit and implicit knowledge. It takes place in short-term memory and it is triggered by the following factors: instruction, perceptual salience, frequency, skill level, task demands, and comparing.

Criticisms of Schmidt’s hypothesis:

1.  It fails to distinguish carefully enough between attention and awareness

In reply to Schmidt’s argument that attention research supports the claim that consciousness is necessary for learning, Truscott (1998) points out that such claims are “difficult to evaluate and interpret”. He cites a number of scholars and studies to support the view that the notion of attention is “very confused”, and that it’s “very difficult to say exactly what attention is and to determine when it is or is not allocated to a given task. Its relation to the notoriously confused notion of consciousness is no less problematic”. He concludes (1998, p. 107) “The essential point is that current  research and theory on attention, awareness and learning are not clear enough to  support any strong claims about relations among the three.”

2.  Empirical support for the Noticing Hypothesis is weak

  • Truscott (1998) points out that the reviews by Brewer (1974) and Dawson and Schell (1987), cited by Schmidt (1990), dealt with simple conditioning experiments and that, therefore, inferences regarding learning an L2 were not legitimate. Brewer specifically notes that his conclusions do not apply to the acquisition of syntax, which probably occurs ‘in a relatively unconscious, automatic fashion’ (p. 29).
  • Truscott further points out that while most current research on unconscious learning is plagued by continuing controversy, “one can safely conclude that the evidence does not show that awareness of the information to be acquired is necessary for learning” (p. 108).
  • Altman (1990) gathered data in a similar way to Schmidt and Frota (1986) in studying her learning of Hebrew over a five-year period. Altman found that while half her verbalisation of Hebrew verbs could be traced to diary entries of noticing, it was not possible to identify the source of the other half, and they may have become intake subconsciously.
  • Alanen’s (1992) study of Finnish L2 learning found no significant statistical difference between an enhanced input condition group and the control group.
  • Robinson’s (1997) study found mixed results for noticing under implicit, incidental, rule-search and instructed conditions.

3. Studies of ‘noticing’ suffer from serious methodological problems   

  • The subsequent studies are not comparable due to variations in focus and in the conditions operationalized.
  • The level of noticing in the studies may have been affected by uncontrolled variables, which casts doubt on the reliability of the findings.
  • Cross (2002) notes that “only Schmidt and Frota’s (1986) and Altman’s (1990) research considers how noticing target structures positively relates to their production as verbal output (in a communicative sense), which seems to be the true test of whether noticing has an effect on second language acquisition. A dilemma associated with this is that, as Fotos (1993) states, there is a gap of indeterminate length between what is noticed and when it appears as output, which makes data collection, analysis and correlation problematic.”
  • Ahn (2014) points to a number of problems that have been identified in eye-tracking studies, especially those using heat map analyses. (See Ahn (2014) for the references that follow.) Heat maps are only “exploratory” (p. 239), and they cannot provide temporal information on eye movement, such as regression duration, “the duration of the fixations when the reader returns to the lookzone” (Simard & Foucambert, 2013, p. 213), which might tempt researchers to rush into a conclusion that favors their own predictions. Second, as Godfroid et al. (2013) accurately noted, the heat map analyses in Smith (2012) could not control the confounding effects of “word length, word frequency, and predictability, among other factors” (p. 490). This might have yielded considerable confounding effects as well. As we can infer from the analyses shown in Smith (2012), currently the utmost need in the field is for our own specific guidelines for using eye-tracking methodology to conduct research focusing on L2 phenomena (Spinner, Gass, & Behney, 2013). Because little guidance is available, the use of eye tracking is often at risk of misleading researchers into making unreliable interpretations of their results.
  • Think aloud protocols are also questioned, since perhaps thinking aloud itself can affect learners’ cognitive processes.


In reply to the criticisms of his 1990 paper, Schmidt re-formulated his Noticing Hypothesis in 2001. He begins this paper by saying that to minimise confusion, he will use ‘noticing’ as a technical term equivalent to what Gass (1988) calls  “apperception”, what Tomlin and Villa (1994) call “detection within selective attention,” and what Robinson (1995) calls “detection plus rehearsal in short term memory.”  Crucially, what is noticed are now “elements of the surface structure of utterances in the input, instances of language” and not “rules or principles of which such instances may be exemplars”. Noticing does not refer to comparisons across instances or to reflecting on what has been noticed. This retreat recognizes the fact, pointed out most forcefully by Gregg (see below), that you can only notice what is in the environment, and abstract rules of grammar are evidently not in the environment, thus not part of the input.  

Furthermore, in the section “Can there be learning without attention?”, Schmidt admits there can, with the L1 as a source that helps learners of an L2 being an obvious example. Schmidt says that it’s “clear that successful second language learning goes beyond what is present in input”. Schmidt presents evidence which, he admits, “appears to falsify the claim that attention is necessary for any learning whatsoever”, and this prompts him to propose the weaker version of the Noticing Hypothesis, namely “the more noticing, the more learning”.

Problems remain.

Apperception and Detection

As was mentioned, Schmidt (2001) says that he is using ‘noticing’ as a technical term equivalent to Gass’ apperception. True to dictionary definitions of apperception, Gass defines apperception as “the process of understanding by which newly observed qualities of an object are initially related to past experiences”. The light goes on, the learner realises that something new needs to be learned. It’s “an internal cognitive act in which a linguistic form is related to some bit of existing knowledge (or gap in knowledge)”. It shines a spotlight on the identified form and prepares it for further analysis. This surely clashes with Schmidt’s insistence that noticing does not refer to comparisons across instances or to reflecting on what has been noticed, and in any case, it is not at all clear how the subsequent stages of Gass’ model convert apperceptions into implicit knowledge of the L2 grammar.

Schmidt says that ‘noticing’ is also equivalent to what Tomlin and Villa (1994) call “detection within selective attention.” But, I suggest, ‘noticing’ isn’t at all equivalent to what Tomlin and Villa are talking about, viz.: detection WITHOUT awareness. According to Tomlin and Villa, the three components of attention are alertness, orientation, and detection, but only detection is essential for further processing, and thus awareness plays no important role in L2 learning.

More Concessions

In the third, 2010, paper, Schmidt confirms the concessions which amount to saying that ‘noticing’ is not needed for all L2 learning and he also confirms that noticing does not refer to reflecting on what is noticed. In the 2010 paper we finally get a glimpse of an answer to Gregg’s crucial question about how we get from ‘noticing’ to the acquisition of linguistic competence, when Schmidt deals with Suzanne Carroll’s objection to his hypothesis. Schmidt succinctly summarises Carroll’s view that attention to input plays little role in L2 learning because most of what constitutes linguistic knowledge is not in the input to begin with. She argues that Krashen, Schmidt and Gass all see “input” as observable, sensory stimuli in the environment, but then proceed to claim that such stimuli allow formal aspects of the language to be noticed,

whereas in reality the stuff of acquisition (phonemes, syllables, morphemes, nouns, verbs, cases, etc.) consists of mental constructs that exist in the mind and not in the environment at all. If not present in the external environment, there is no possibility of noticing them.

Schmidt’s answer is:

In general, ideas about attention, noticing, and understanding are more compatible with instance-based, construction-based and usage-based theories (Bley-Vroman, 2009; Bybee & Eddington, 2006; Goldberg, 1995) than with generative theories.

Which is not much better than no answer at all. Carroll effectively answers Gregg’s question by saying that all those who start with input, following Krashen, get things backwards. Carroll (2001, p. 11) says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. … Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

Learners do not attend to things in the input as such, they respond to speech-signals by attempting to parse the signals, and failures to do so trigger attention to parts of the signal. Carroll’s assertion that it is possible to have speech-signal processing without attention-as-noticing or attention-as-awareness is persuasive. She argues that learners may unconsciously and without awareness detect, encode and respond to linguistic sounds; that learners don’t always notice their own processing of segments and the internal organization of their own conceptual representations; that the processing of forms and meanings are often not noticed; and that attention is the result of processing not a prerequisite for processing.

In brief:

  1. The Noticing Hypothesis even in its amended version does not clearly describe the construct of ‘noticing’.
  2. The empirical support claimed for the Noticing Hypothesis is not as strong as Schmidt (2010) claims.
  3. A theory of SLA based on noticing a succession of forms faces the impassable obstacle that, as Schmidt seemed to finally admit, you can’t notice rules or principles of grammar.
  4. “Noticing the gap” is not sanctioned by Schmidt’s amended Noticing Hypothesis.
  5. The way that so many writers and ELT trainers use “noticing” to justify all kinds of explicit grammar and vocabulary teaching demonstrates that Schmidt’s Noticing Hypothesis is widely misunderstood and misused.


Schmidt’s Noticing Hypothesis is probably the most widely-cited reference (usually only its 1990 formulation) used by teacher educators who continue to promote coursebook-driven ELT. “Noticing” has been reduced from a vexed theoretical construct to a term used in its ordinary dictionary sense, and widened to include “noticing the gap” (even though Schmidt himself ruled this out).

It’s time we recognized the severe limitations of Schmidt’s hypothesis, and the damage that its misrepresentation causes. We must move on, digest the implications of SLA research findings, and adopt analytic syllabuses such as Dogme and strong forms of TBLT, where ELT is based on implicit learning, learning by doing. In such syllabuses, the important but subsidiary role of explicit instruction, used reactively to provide feedback on students’ performance of meaning-focused tasks, needs further refinement.


Ahn, J.I. (2014) Attention, Awareness, and Noticing in SLA: A Methodological Review.  MSU Working Papers in SLS, Vol. 5.

Carroll, S. (2001) Input and Evidence: The Raw Material of Second Language Acquisition. Amsterdam, Benjamins.

Cross, J. (2002) ‘Noticing’ in SLA: Is it a valid concept? Downloaded from http://tesl-ej.org/ej23/a2.html

Gregg, K. (2003) The State of Emergentism in Second Language Acquisition. Second Language Research, 19, 95-128. See also Gregg’s comments on my blog posts on Schmidt and Carroll.

Jordan, G. & Long, M. (2022) English Language Teaching: Now and How it Could be. Cambridge Scholars.

Schmidt, R. W. (1990) The role of consciousness in second language learning. Applied Linguistics, 11, 129-158.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. and Frota, S. N. (1986) Developing basic conversational ability in a second language: a case study of an adult learner of Portuguese. In Day, R. R. (ed.), Talking to learn: conversation in second language acquisition. Rowley, MA: Newbury House.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J.W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Review of JPB Gerald (2022) “Antisocial Language Teaching”


The main problem with Gerald’s book is its lack of clarity: part autobiography, part political pamphlet, part college essay, part post-graduate academic assignment, its discussion of whiteness and language teaching lurches from one genre to another, and from one topic to another, rarely retaining its style or focus long enough to offer any clear descriptions or analyses of the motley matters it so unevenly and erratically tries to cover. There’s no clear history of colonialism or the slave trade offered here, no clear description or discussion of racism, or capitalism, or linguistics, or education, or anything else that might clarify what, precisely, whiteness is, or what specifically is wrong with current language teaching. Despite his claim to see things from a fresh, new “angle”, everything in the text that deals with whiteness has already been dealt with more eruditely, insightfully, and, above all, more clearly, by previous writers. As for language teaching, Gerald’s ignorant, disorganised and superficial discussion makes his publisher’s claim that the book offers “a vision for a more just version of ELT” comically preposterous.

In what follows, I quote from my e-book version of the book, so I apologise for not being able to give page numbers for the quotes.

The Writing

Part of the book’s lack of clarity is due to the poor writing, and a particular feature of this is that many sentences and paragraphs of the text dissolve into incoherence, as if they’ve suddenly fallen off a cliff. I’ll give three examples.

Example 1:

The Introduction begins with a discussion of what Tucker Carlson, a Fox News presenter, says about ‘antisocial thugs with no stake in society”. Following a summary of Carlson’s views, Gerald comments:

He [Carlson] and his writers are not espousing some fringe viewpoint but instead emphasizing a core tenet of his popular ideology, namely the fact that decentralized resistance and opposition to the hegemony of whiteness is anathema to what he refers to as ‘society’, and the common elision of Blackness and criminality as expressed via his use of the word ‘thug’ (Smiley & Fakunle, 2016). As odious as his ideas are to many who might be reading this work, Carlson is not speaking out of turn when compared to the epistemology and the ideology of the whiteness that retains a firm grip on the globe.

The first, typically bloated sentence can be paraphrased: Carlson believes that all resistance to white supremacy is an attempt to destroy American society, but the second sentence simply can’t be rescued – it’s gibberish.

Example 2:

Twenty or so pages later, we eventually arrive at this clumsy attempt to explain what the book’s about:   

Simply put, this book exists to make the case for why it is a moral imperative that ELT severs its ties to whiteness once and for all, and for the bright future that could follow if we ever manage to demolish this structure inside of which we are all trapped.

Well, simply put, this book argues that ELT must sever all ties with whiteness. But what are all the other words doing? The whole awful sentence is incoherent. (Note, by the way, that throughout the text, “Simply put” and “In short” often introduce long, rambling sentences which collapse under their own weight.)

Example 3:

In short, the concept of ‘society’, against which antisocial and other ‘disordered’ behavior is measured, is merely a mask for whiteness, and considering that the epistemology responsible for these diagnostic criteria is itself an exemplar of whiteness, it is difficult to trust whiteness as an objective judge of what is and is not antisocial.

Again, whatever the assertion that the white supremacist concept of society is a mask for whiteness against which antisocial and disordered behaviour are measured might mean, it certainly doesn’t warrant the rest of the sentence. What epistemology is he talking about?

The word ‘epistemology’ occurs sixteen times in the text, and seems to be used to mean something like “system of beliefs” or “ideology”. In the two quotes above, Gerald talks about “the epistemology and the ideology” of whiteness, and “the epistemology” that is “an exemplar of whiteness”. Elsewhere he talks about the “epistemological analysis” of axes of oppression; a “fuzzy epistemology” reliant on “race scholarship”; and his own trips down different “epistemological corridors”, to take just three more examples. Epistemology is, of course, the branch of philosophy that deals with theories of knowledge, and nowadays the big debate is between realists, who assume that there’s a world out there independent of our experiences of it, which can be more or less accurately observed and described, and relativists, who deny the realists’ claim. I suggest that Gerald’s use of the word ‘epistemology’ has little to do with this normal use, and, furthermore, that it’s just one example of his failure to clearly define a profusion of key terms and constructs, or to use them consistently. Gerald is like Lewis Carroll’s Humpty Dumpty: when he uses a word, it means anything he chooses it to mean.

The Content

Let’s turn to the content. After the prologue (which gives good warning that the book’s about JPB Gerald, really), the Introduction gives its first sketch of “society” as seen by white supremacists, and ends with a section on “Key Concepts”. These include


… the combination of racial discrimination and societal oppression. Anyone can experience the former, but only certain people can experience the combination of the two. For example, as a Black person, I could tell you I don’t want to have any white friends, and that would absolutely be discriminatory, but because I do not have the full power of society behind me, and because that would not materially impact the people I denied my friendship, it does not qualify.

and whiteness    

there is no functional difference between whiteness and white supremacy. Indeed, whiteness, as a concept, was created to justify colonialism and chattel slavery (Bonfiglio, 2002; Painter, 2011); there had to be a group exempt from these horrors, and as such, whiteness was codified. Whiteness was created to be supreme, as a protection from the oppression that others deserve because of the groups into which they have been placed.

Gerald proceeds to give an overview of the book. Part One deals with Disorder.  

In short, whiteness requires people to be categorized as either ordered or disordered so that it can function effectively and to support its aims of colonialist dominance and capitalism. Accordingly, whiteness uses language ideologies and language teaching to classify Blackness, dis/ability and unstandardized English as representations of pathology and disorder, and is thus able to justify its exploitation and oppression of members of these groups.

In Part 2, Gerald “demonstrates” how “the field of ELT and its adherence to whiteness” has led to “pervasive oppression”. To do so, he maps its “harmful habits” onto the official criteria for antisocial personality disorder, “not to stigmatize the disorder but to counterpathologize whiteness and the destruction it causes”.

Finally, Part 3 discusses how language teachers can “play a central role in the demolition of whiteness in our field and in our society”.

Part One

Part One begins with an attempt to define whiteness. In essence, Gerald sees it as “The Great Pyramid Scheme”.

When I thought of the best way to describe whiteness and the way it had been sold to me, despite rarely being named as such, I consulted the numerous metaphors that have been used in the literature, many of which remain accurate and resonant, many of which I will cite below. But, in my opinion, when searching for the best way to evoke the sheer confidence game at play, one that empowers a few while convincing the masses that their own power is waiting just around the corner so long as they convince everyone they know to also buy in, I could think only of the sad stories I’ve encountered of friends and acquaintances who were convinced to buy thousands of dollars of terrible products that they could never offload to others.

If I understand him, Gerald sees whiteness as the ultimate Ponzi scheme. He goes on:

Simply put, whiteness is perhaps the world’s greatest example of multilevel marketing, a massive pyramid scheme, but unlike the companies stealing from put-upon individuals and families, there is no single chief executive officer (CEO) laughing all the way to the bank. At this point, whiteness feeds upon all of us, including the people who bow before it, and it creates no victors, only a desperate battle to avoid losing.

For the next few pages, Gerald relies on Painter’s (2011) work, starting with his description of pre-industrial societies. Gerald comments:

There are the beginnings of a constructed hierarchy visible in this description, of course, but oppression based on group membership did not originate with the construction of whiteness – it simply had a different manifestation. Much later on, even after slavery was common in Europe, ‘Geography, not race, ruled, and potential white slaves, like vulnerable aliens everywhere, were nearby for the taking’ (Painter, 2011: 38). People with power have always exploited those without it, and it would be inaccurate to blame whiteness for what is clearly an upsettingly human tendency.

Yet again, for the umpteenth time, the paragraph starts with turgid platitudes and ends by falling off a cliff. In this instance, the question isn’t “What does it mean?”, but rather “Does it mean anything worth saying?”. “People with power have always exploited those without it”. Wow! Really? And “it would be inaccurate to blame whiteness for what is clearly an upsettingly human tendency”. No! You don’t say! All through the book, the reader is assaulted by tired, lazy, cliché-ridden bullshit. It clutters this guess-where-we’re-going narrative like stubborn mud on a badly-signalled track. Endless bits of bathos clog the narrative’s fragile wheels, which just can’t bear the strain of trying in vain to drive forward the heavy, jargon-ridden, one-subordinate-clause-too-many, turgid text.

But it gets worse, as Gerald sets off on a heated selective history of the slave trade, colonialism, eugenics and accounts of assorted atrocities. For example:

In what became the United States, Europeans brought disease alongside their ships, but smallpox didn’t succeed in eliminating everyone, so they were forced to remove them directly in order to control their land (Wolfe, 2006).

I presume he means “along with”, not “alongside”, but who knows. In any case, it’s not the most erudite description of colonialism you’ll ever read.

The last chapter of Part 1 has the heading “Language Teaching as an Instrument of Pathologization”, but it’s actually an attempt to answer the question “So what have we learned so far?”. Well, Whiteness is “a Pyramid Scheme”; it “Justifies Settler Colonialism and Racial Capitalism”; it “Created Blackness out of its Own Darkest Impulses”; it “Dis/abled Blackness to Ensure its Subjugation”; it “Uses Perceived Deficits in Ability, Intelligence and Language to Retain Power”; and it “Devalues Unstandardized English because it Devalues the Racialized”.

Part 2

On the publisher’s web page dedicated to the book, Gerald explains that in Part 2,  

As a rhetorical device, I use the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders to make the point that the way our field was built and is currently maintained could be classified as deeply disordered and only isn’t because of who currently benefits from the system as is; more specifically, I map the seven criteria of antisocial personality disorder onto the connection between whiteness, colonialism, capitalism, and ableism and how these and other -isms harm the vast majority of the students – and educators – in the field of language teaching.

There are thus seven chapters in Part 2. They deal with

  1. Failure to conform to social norms concerning lawful behaviour
  2. Deceitfulness, repeated lying, use of aliases, or conning others for pleasure or personal profit
  3. Impulsivity or failure to plan
  4. Irritability and aggressiveness, often with physical fights or assaults
  5. Reckless disregard for the safety of self or others
  6. Consistent irresponsibility, failure to sustain consistent work behavior, or honor monetary obligations
  7. Lack of remorse, being indifferent to or rationalizing having hurt, mistreated or stolen from another person

As indicated above, Gerald tries to map these seven criteria onto whiteness, etc., so as to highlight the harm done to nearly everybody in the field of language teaching. Personally, I don’t think this is a very successful rhetorical device; the only reason I can see for using these seven criteria to organize a discussion of how whiteness affects ELT is that, since Gerald himself has been diagnosed with a mental disorder, it gives him a more authoritative “voice”. Unfortunately, it doesn’t give the text clarity. Among the pages, there are interesting ideas trying to get out. Gerald has justified concerns about native speakerism, persistent language deficit views in education, ongoing linguistic imperialism, hopelessly unfit-for-purpose assessment procedures, and on and on. He also indicates where he stands on debates about multilingualism, additive bilingualism, translanguaging, and other issues. But his concerns and his viewpoint are expressed in a text which too often lacks both coherence and cohesion. Ideas trip over themselves; there’s too much verbose hyperbole and too little attention to developing an argument through the use of clearly-defined constructs and well-chosen cohesive devices.

Part 3

Part 3 has two chapters: the first discusses a teacher education course, and the second makes seven nebulous recommendations on how to improve ELT. Dealing with the second chapter first, it suggests that we call ourselves teachers of standardized English (TSE) instead of English language teachers, that teacher education includes a “deep engagement” with all the issues raised, that we use better materials (improved by a similar deep engagement with the book’s message), and a few more anodyne bits of fluff.

Chapter 1, the Ezel Project, describes a teacher training course, and it is, in my opinion, by far the best chapter in the book (I should say at once that I don’t like the style, but that doesn’t matter, because, for a while, the text is quite readable. Perhaps this chapter is adapted from Gerald’s doctoral thesis, and has thus benefited from his supervisor’s influence.) Gerald gives a detailed account of the design and implementation of his course, which aims to raise teacher awareness of the damage caused by continued racism and white supremacy, and follows it with interesting accounts of how some of the participants reacted.  


It seems clear to me that all English language teachers should accept the following tenets:

  • English acts as a lingua franca and as a powerful tool to protect and promote the interests of a capitalist class.
  • In the global ELT industry, teaching is informed by the monolingual fallacy, the native speaker fallacy and the subtractive fallacy (Phillipson, 2018).   
  • The ways in which English is privileged in education systems needs critical scrutiny, and policies that strengthen linguistic diversity are needed to counteract linguistic imperialism.
  • Racism permeates ELT. It results in expecting language-minoritized students to model their linguistic practices on inappropriate white speaker norms.
  • ELT practice must acknowledge bilinguals’ fluent languaging practices and legitimise hybrid language uses.
  • ELT must encourage practices which explore the full range of users’ repertoires in creative and transformative ways.
  • Subtractive approaches to language education and deficit language policies must be resisted.

From what I’ve read, I’d say that there are many articles – Ian Cushing’s work, for example, always has a good references section – that deal with these issues much better than Gerald’s book does.

As for Gerald’s assorted assertions about the global ELT industry, they demonstrate a poor grasp of the literature on language learning, syllabus design, pedagogic procedures, assessment, and language policy. The one reference Gerald makes to SLA research is telling. He says

the related field of second language acquisition expends considerable effort on boiling its namesake process down to formulas that hardly take the individuals and their identities involved into account, rendering any supposed struggles a more personal failing than they truly are.

Of course psycholinguistic studies of the SLA process include studies of the individual psychology of learners – their different feelings and responses to different types of tasks, their motivation, their perceptions of their L2 selves, and their anxieties, for example. To assert that SLA research makes a special effort to boil the process of L2 learning down to formulas is ridiculous and false. And note that we have yet another example of a sentence whose last clause renders it incoherent: it makes the nonsensical claim that by reducing the process of L2 learning to formulas, and paying little attention to the learners’ identities, SLA scholars make “any supposed struggles a more personal failing than they truly are”.

Radical action is needed to combat racism and white supremacy in ELT, but that action needs to be part of a critique of ELT which includes a clear, comprehensive, practical alternative. Gerald aligns himself with teacher educators like Vaz Bauler and academics like García who adopt a relativist epistemology (sic) and embrace an “anything goes” approach to teaching, where error correction and assessment are seen as “harmful”, and where the construct of “language” itself is illegitimate. For these sociolinguistic vanguardistas, bilingual students’ language practices must not be separated into home language and school language (there is no such thing as “distinct language systems”), the construct of transfer must be abandoned, and in its place we must put “a conceptualization of integration of language practices in the person of the learner” (García & Wei, 2014, p. 80). The value of these arguments is dubious enough when they’re propounded by their authors, and I certainly don’t trust Gerald’s garbled version of them.


García, O. & Wei, L. (2014). Translanguaging: Language, Bilingualism, and Education. Palgrave Macmillan.

Gerald, J. P. B. (2022). Antisocial Language Teaching: English and the Pervasive Pathology of Whiteness. Multilingual Matters.

Phillipson, R. (2018). Linguistic Imperialism. Routledge.

Empiricist Emergentism


Emergentism is an umbrella term referring to a fast-growing range of usage-based theories of SLA which adopt “connectionist” and associative learning views, based on the premise that language emerges from communicative use. Many proponents of emergentism, not least the imaginative Larsen-Freeman, like to begin by pointing to the omnipresence of complex systems which emerge from the interaction of simple entities, forces and events. Examples are:

The chemical combination of two substances produces, as is well known, a third substance with properties different from those of either of the two substances separately, or both of them taken together. Not a trace of the properties of hydrogen or oxygen is observable in those of their compound, water. (Mill 1842, cited in O’Grady, 2021).

Bee hives, with their carefully arranged rows of perfect hexagons, far from providing evidence of geometrical ability in bees actually provides evidence for emergence – The hexagonal shape maximizes the packing of the hive space and the volume of each cell and offers the most economical use of the wax resource… The bee doesn’t need to “know” anything about hexagons. (Elman, Bates, Johnson, Karmiloff-Smith Parisi & Plunkett, 1996, cited in O’Grady, 2021).

Larsen-Freeman’s own favorite is a murmuration of starlings, as in the photo, above. In her plenary at the IATEFL 2016 conference, the eminent scholar seemed almost to float away herself, up into the rafters of the great hall, as she explained:

Instead of thinking about reifying and classifying and reducing, let’s turn to the concept of emergence – a central theme in complexity theory. Emergence is the idea that in a complex system different components interact and give rise to another pattern at another level of complexity.

A flock of birds part when approached by a predator and then they re-group. A new level of complexity arises, emerges, out of the interaction of the parts.

All birds take off and land together. They stay together as a kind of superorganism. They take off, they separate, they land, as if one.

You see how that pattern emerges from the interaction of the parts?

Personally, I fail to grasp the force of this putative supporting evidence for emergentism, which strikes me as unconvincing, not to say ridiculous. I find the associated claim that complex systems exhibit ‘higher-level’ properties which are neither explainable nor predictable from ‘lower-level’ physical properties, but which nevertheless have causal and hence explanatory efficacy, slightly less ridiculous, but still unconvincing, and surely hard to square with empiricist principles. So, moving quickly on, let’s look at emergentist theories of language learning. Note that the discussion is mostly of Nick Ellis’ theory of emergentism, which he applies to SLA.

What Any Theory of SLA Must Explain

Kevin Gregg (1993, 1996, 2000, 2003) insists that any theory of SLA should do two things: (1) describe what knowledge is acquired (a property theory, describing what language consists of and how it’s organised), and (2) explain how that knowledge is acquired (a causal transition theory). Chomsky’s principles and parameters theory offers a very technical description of “Universal Grammar”, consisting of clear descriptions of the grammar principles which make up the basic grammar of all natural languages, and the parameters which apply to particular languages. It describes what Chomsky calls “linguistic competence”, and it has served as a fruitful property theory guiding research for more than 50 years. How is this knowledge acquired? Chomsky’s answer is contained in a transition theory that appeals to an innate representational system located in a module of the mind devoted to language, and to innate mechanisms which use that system to parse input from the environment, set parameters, and learn how the particular language works.

But UG has come under increasing criticism. Critics suggest that UG principles are too abstract, that Chomsky has more than once moved the goal posts, that the “Language Acquisition Device” is a biologically implausible “black box”, that the domain is too narrow, and that we now have better ways to explain the phenomena that UG theory tackles. Increasingly, emergentist theories are regarded as providing better explanations.

Emergentist theories

There is quite a collection of emergentist theories, but we can distinguish between emergentists who rely on associative learning, and those who believe that “achieving the explanatory goals of linguistics will require reference to more than just transitional probabilities” (O’Grady, 2008, p. 456). In this first post, I’ll concentrate on the first group, and refer mostly to the work of its leading figure, Nick Ellis. The reliance on associative learning leads to this group often being referred to as “empiricist emergentists”.

Empiricist emergentists insist that language learning can be satisfactorily explained by appeal to the rich input in the environment and simple learning processes based on frequency, without having to resort to abstract representations and an unobservable “Language Acquisition Device” in the mind.

Regarding the question of what knowledge is acquired, the emergentist case is summarised by Ellis & Wulff (2020, p. 64-65).

The basic units of language representation are constructions. Constructions are pairings of form and meaning or function. Words like squirrel are constructions: a form — that is, a particular sequence of letters or sounds — is conventionally associated with a meaning (in the case of squirrel, something like “agile, bushy-tailed, tree-dwelling rodent that feeds on nuts and seeds”).

In Construction Grammar, constructions are wide-ranging. Morphemes, idiomatic expressions, and even abstract syntactic frames are constructions:

sentences like Nick gave the squirrel a nut, Steffi gave Nick a hug, or Bill baked Jessica a cake all have a particular form (Subject-Verb-Object-Object) that, regardless of the specific words that realize its form, share at least one stable aspect of meaning: something is being transferred (nuts, hugs, and cakes).

Furthermore, some constructions have no meaning – they serve more functional purposes:

passive constructions, for example, serve to shift what is in attentional focus by defocusing the agent
of the action (compare an active sentence such as Bill baked Jessica a cake with its passive counterpart A cake was baked for Jessica).


constructions can be simultaneously represented and stored in multiple forms and at various levels of abstraction (table + s = tables; [Noun] + (morpheme -s) = “plural things”). Ultimately, constructions blur the traditional distinction between lexicon and grammar. A sentence is not viewed as the application of grammatical rules to put a number of words obtained from the lexicon in the right order; a sentence is instead seen as a combination of constructions, some of which are simple and concrete while others are quite complex and abstract. For example, What did Nick give the squirrel? comprises the following constructions:

• Nick, squirrel, give, what, do constructions
• VP, NP constructions
• Subject-Verb-Object-Object construction
• Subject-Auxiliary inversion construction

We can therefore see the language knowledge of an adult as a huge warehouse of constructions.
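The “warehouse of constructions” idea can be sketched as a simple data structure. The sketch below is my own illustrative toy, not Construction Grammar notation: concrete words and abstract syntactic frames sit side by side in a single inventory, each paired with a meaning or function, and a sentence is analysed as a combination of entries drawn from it.

```python
# A toy "warehouse of constructions": concrete word-level constructions and
# abstract syntactic frames are stored in one and the same inventory, each
# paired with a meaning or function. Entries and labels are illustrative only.
inventory = {
    "squirrel": "agile, bushy-tailed, tree-dwelling rodent",
    "give": "cause someone to receive something",
    "what": "requests the identity of a thing",
    "do": "auxiliary supporting question formation",
    "Subj-V-Obj-Obj": "something is being transferred",
    "Subj-Aux-inversion": "marks an interrogative",
}

# "What did Nick give the squirrel?" analysed as a combination of
# constructions, some simple and concrete, some complex and abstract:
analysis = ["what", "do", "give", "squirrel",
            "Subj-V-Obj-Obj", "Subj-Aux-inversion"]

# Every element of the analysis is licensed by an entry in the same inventory;
# there is no separate lexicon and grammar.
assert all(c in inventory for c in analysis)
```

The point of the single dictionary is precisely the blurring of the lexicon/grammar distinction: a word and an abstract frame are the same kind of object, a form-meaning pairing.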

As to language learning, it is not about learning abstract generalizations, but rather about inducing general associations from a huge collection of memories: specific, remembered linguistic experiences.

The learner’s brain engages simple learning mechanisms in distributional analyses of the exemplars of a given form-meaning pair that take various characteristics of the exemplar into consideration, including how frequent it is, what kind of words and phrases and larger contexts it occurs with, and so on (Ellis & Wulff, 2020, p. 66).

The “simple learning mechanisms” amount to associative learning. The constructions are learned through “the associative learning of cue-outcome contingencies” determined by factors relating to the form, the interpretation, the contingency of form and function; and learner attention. Language learning involves “the gradual strengthening of associations between co-occurring elements of the language”, and fluent language performance involves “the exploitation of this probabilistic knowledge” (Ellis, 2002, p. 173). Based on sufficiently frequent cues pairing two elements in the environment, the learner abstracts to a general association between the two elements.

Here’s how it works:

When a learner notices a word in the input for the first time, a memory is formed that binds its features into a unitary representation, such as the phonological sequence /wʌn/ or the orthographic sequence one. Alongside this representation, a so-called detector unit is added to the learner’s perceptual system. The job of the detector unit is to signal the word’s presence whenever its features are present in the input. Every detector unit has a set resting level of activation and some threshold level which, when exceeded, will cause the detector to fire. When the component features are present in the environment, they send activation to the detector that adds to its resting level, increasing it; if this increase is sufficient to bring the level above threshold, the detector fires. With each firing of the detector, the new resting level is slightly higher than the previous one—the detector is primed. This means it will need less activation from the environment in order to reach threshold and fire the next time. Priming events sum to lifespan-practice effects: features that occur frequently acquire chronically high resting levels. Their resting level of activation is heightened by the memory of repeated prior activations. Thus, our pattern-recognition units for higher-frequency words require less evidence from the sensory data before they reach the threshold necessary for firing. The same is true for the strength of the mappings from form to interpretation. Each time /wʌn/ is properly interpreted as one, the strength of this connection is incremented. Each time /wʌn/ signals won, this is tallied too, as are the less frequent occasions when it forewarns of wonderland. Thus, the strengths of form-meaning associations are summed over experience.
The resultant network of associations, a semantic network comprising the structured inventory of a speaker’s knowledge of language, is tuned such that the spread of activation upon hearing the formal cue /wʌn/ reflects prior probabilities of its different interpretations (Ellis & Wulff, 2020, p. 67).
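A toy simulation may help make the detector-unit account concrete. This is my own minimal sketch of the mechanism described in the quotation – resting levels, thresholds, priming, and tallied form-meaning mappings – not code from Ellis & Wulff, and all numerical values are invented.

```python
class Detector:
    """Toy word detector: fires when input activation clears its threshold,
    and each firing slightly raises its resting level (priming)."""
    def __init__(self, threshold=1.0, resting=0.0, prime=0.05):
        self.threshold, self.resting, self.prime = threshold, resting, prime

    def present(self, activation):
        """Present sensory evidence; return True if the detector fires."""
        fired = self.resting + activation >= self.threshold
        if fired:
            self.resting += self.prime  # primed: less evidence needed next time
        return fired

# Form-meaning mapping strengths for /wVn/ are just tallies over experience.
mappings = {"one": 0, "won": 0, "wonderland": 0}
for interpretation in ["one"] * 6 + ["won"] * 3 + ["wonderland"]:
    mappings[interpretation] += 1

d = Detector()
d.present(1.0)                    # enough evidence: the detector fires
needed = d.threshold - d.resting  # less input evidence needed next time
print(needed)
print(max(mappings, key=mappings.get))  # most frequent, hence most probable, reading
```

The two halves of the sketch correspond to the two claims in the quotation: frequent firing lowers the evidence a detector needs, and form-meaning strengths are simply summed over experience, so activation spreads first to the most frequent interpretation.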

The authors add that other additional factors need to be taken into account, and this one is particularly important:

… the relationship between frequency of usage and activation threshold is not linear but follows a curvilinear “power law of practice” whereby the effects of practice are greatest at early stages of learning, but eventually reach asymptote.
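The power law of practice can be illustrated numerically. In its conventional formulation, latency after n encounters is RT = a + b·n^(−c); the constants below are invented for illustration only.

```python
def reaction_time(n, a=300.0, b=700.0, c=0.5):
    """Power law of practice: processing latency (ms) after n encounters.
    a = asymptote, b = initial slowdown, c = learning rate - all illustrative."""
    return a + b * n ** -c

# The saving from one extra encounter is large early on and tiny later:
gain_early = reaction_time(1) - reaction_time(2)
gain_late = reaction_time(100) - reaction_time(101)
print(round(gain_early, 1), round(gain_late, 3))
```

The second encounter shaves off roughly two hundred milliseconds here, while the hundred-and-first shaves off a fraction of one: practice effects are greatest at early stages and then flatten towards asymptote.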

Evidence supporting this type of emergentist theory is said to be provided by IT models of associative learning processes in the form of connectionist networks. One example is Lewis & Elman’s (2001) demonstration that a Simple Recurrent Network (SRN) can, among other things, simulate the acquisition of agreement in English from data similar to the input available to children; another is the connectionist model reported in Ellis and Schmidt’s 1997 and 1998 papers.


There have been various criticisms of the empiricist version of emergentism as championed by Ellis, and IMHO, the articles by Eubank & Gregg (2002), and Gregg (2003) remain the most acute. I’ll use them as the basis for what follows.

Linguistic knowledge

Regarding their description of the linguistic knowledge acquired, Gregg (2003) points out that emergentists have yet to agree on any detailed description of linguistic knowledge, or even on whether such knowledge exists. The doubt about whether there’s any such thing as linguistic knowledge is raised by extreme empiricists, such as the logical positivists and behaviourists discussed in my last post, and also by the eliminativists involved in connectionist networks, who all insist that the only knowledge we have comes through the senses; representational knowledge of the sort required to explain linguistic competence is ruled out. Ellis and his colleagues don’t share the views of these extremists; they accept that linguistic representations – of some sort or other – are the basis of our language capacity, but they reject any innate representations, and therefore they need not just to describe the organisation of the representations, but also to explain how the representations are learned from input from the environment.

O’Grady (2011) agrees with Gregg about the lack of consensus among emergentists as to what form linguistic knowledge takes; some talk of local associations and memorized chunks (Ellis 2002), others of a construction grammar (Goldberg 1999, Tomasello 2003), and others of computational routines (O’Grady 2001, 2005). Added to a lack of consensus is a lack of clarity and completeness. O’Grady’s discussion of Lewis & Elman’s (2001) Simple Recurrent Network (SRN), mentioned above, explains how it was able to mimic some aspects of language acquisition in children, including the identification of category-like classes of words, the formation of patterns not observed in the input, retreat from overgeneralizations, and the mastery of subject-verb agreement. However, O’Grady goes on to say that it raises the question of why the particular statistical regularities exploited by the SRN are in the input in the first place.

In other words, why does language have the particular properties that it does? Why, for example, are there languages (such as English) in which verbs agree only with subjects, but no language in which verbs agree only with direct objects?

Networks provide no answer to this sort of question. In fact, if presented with data in which verbs agree with direct objects rather than subjects, an SRN would no doubt “learn” just this sort of pattern, even though it is not found in any known human language.

There is clearly something missing here. Humans don’t just learn language; they shape it. Moreover, these two facts are surely related in some fundamental way, which is why hypotheses about how linguistic systems are acquired need to be embedded within a more comprehensive theory of why those systems (and therefore the input) have the particular properties that they do. There is, simply put, a need for an emergentist theory of grammar. (O’Grady, 2011, p. 4).  

In conclusion, then, some leading emergentists themselves agree that emergentism has not, so far, offered any satisfactory description of the knowledge of the linguistic system that is required of a property theory. An unfinished construction grammar that is brought to bear on “a huge collection of memories: specific, remembered linguistic experiences” seems to be as far as they’ve got.

Associative learning

Whatever the limitations of the emergentists’ sketchy account of linguistic knowledge might be, their explanation of the process of language learning (which is, after all, their main focus) seems to have more to recommend it, not least its simplicity. In the case of empiricist emergentists, the explanation relies on associative learning: learners make use of simple cognitive mechanisms to implicitly recognise frequently-occurring associations among elements of language found in the input. To repeat what was said above, the theory states that constructions are learned through the associative learning of cue-outcome contingencies. Associations between co-occurring elements of language found in the input are gradually strengthened by successive encounters, and, based on sufficiently frequent cues pairing two elements, the learner abstracts to a general association between them. To this simplest of explanations, a few other elements are attached, not least the “power law of practice”. In his 2002 paper on frequency effects in language processing, Ellis cites Kirsner’s (1994) claim that the strong effects of word frequency on the speed and accuracy of lexical recognition are explained by the power law of learning,

which is generally used to describe the relationships between practice and performance in the acquisition of a wide range of cognitive skills. That is, the effects of practice are greatest at early stages of learning, but they eventually reach asymptote. We may not be counting the words as we listen or speak, but each time we process one there is a reduction in processing time that marks this practice increment, and thus the perceptual and motor systems become tuned by the experience of a particular language (Ellis, 2002, p. 152).

Eubank & Gregg (2002, p. 239) suggest that there are many aspects of language learning which the emergentist account can’t explain. For example:

Ellis aptly points to infants’ ability to do statistical analyses of syllable frequency (Saffran et al., 1996); but of course those infants haven’t learned that ability. What needs to be shown is how infants uniformly manage this task: why they focus on syllable frequency (instead of some other information available in exposure), and how they know what a syllable is in the first place, given crosslinguistic variation. Much the same is true for other areas of linguistic import, e.g. the demonstration by Marcus et al. (1999) that infants can infer rules. And of course work by Crain, Gordon, and others (Crain, 1991; Gordon, 1985) shows early grammatical knowledge, in cases where input frequency could not possibly be appealed to. All of which is to say, for starters, that such claims as that “learners need to have processed sufficient exemplars” (p. 40) are either outright false, or else true only vacuously (if “sufficient” is taken to range from as low a figure as 1).

Eubank & Gregg (2002, p. 240) also question emergentist use of key constructs. For example:

The Competition Model, for instance, relies heavily on the frequency (and reliability) of so-called “cues”. The problem is that it is nowhere explained just what a cue is, or what could be a cue; which is to say that the concept is totally vacuous (Gibson, 1992). In the absence of any principled characterization of the class of possible cues, an explanation of acquisition that appeals to cue-frequency is doomed to arbitrariness and circularity. (The same goes, of course, for such claims as Ellis’s [p. 54] that “the real stuff of language acquisition is the slow acquisition of form-function mappings,” in the absence of any criterion for what counts as a possible function and what counts as a possible form.)

In his (2003) article, Gregg has more to say about cues:   

The question then arises, What is a cue, that the environment could provide it? Ellis, for example, says, ‘in the input sentence “The boy loves the parrots”, the cues are: preverbal positioning (boy before loves), verb agreement morphology (loves agrees in number with boy rather than parrots), sentence initial positioning, and the use of the article the’ (1998: 653). In what sense are these ‘cues’ cues, and in what sense does the environment provide them? What the environment can provide, after all, is only perceptual information, for example, the sounds of the utterance and the order in which they are made. (Emphasis added.) So in order for ‘boy before loves’ to be a cue that subject comes before verb, the learner must already have the concepts subject and verb. But if subject is one of the learner’s concepts, on the emergentist view, he or she must have learned that; the concept subject must ‘emerge from learners’ lifetime analysis of the distributional characteristics of the language input’, as Ellis (2002a: 144) puts it (Gregg, 2003, p. 120).

Connectionist Models

Gregg (2003) goes to some length to critique the connectionist model reported in Ellis and Schmidt’s 1997 and 1998 papers. The model was made to investigate “adult acquisition of second language morphology using an artificial second language in which frequency and regularity were factorially combined” (1997, p. 149). The experiment was designed to test “whether human morphological abilities can be understood in terms of associative processes” (1997, p. 145) and to show that “a basic principle of learning, the power law of practice, also generates frequency by regularity interactions” (1998, p. 309). The authors claimed that the network learned both the singular and plural forms for 20 nonce nouns, and also learned the ‘regular’ or ‘default’ plural prefix. In subsequent publications, Ellis claimed that the model gives strong support to the notion that acquisition of morphology is a result of simple associative learning principles and that the power law applies to the acquisition of morphosyntax. Gregg’s (2003) paper does a thorough job of refuting these claims.

Gregg begins by pointing out that connectionism itself is not a theory, but rather a method, “which in principle is neutral as to the kind of theory to which it is applied”. He goes on to point out the severe limitations of the Ellis and Schmidt experiment. In fact, the network didn’t learn the 20 nouns, or the 11 prefixes; it merely learned to associate the nouns with the prefixes (and with the pictures) – it started with the 11 prefixes, and was trained such that only one prefix was reinforced for any given word. Furthermore, the model was slyly given innate knowledge!   

Although Ellis accepts that linguistic representations – of some sort or other – are the basis of our language capacity, he rejects the nativist view that the representations are innate, and therefore he needs to explain how the representations are acquired. In the Ellis & Schmidt model, the human subjects were given pictures and sounds to associate, and the network was given analogous input units to associate with output units. But, while the human participants in the experiment were shown two pictures and were left to infer plurality (rather than, say, duality or repetition or some other inappropriate concept), the network was given the concept of plurality free as one of the input nodes (and was given no other concept). (Emphasis added.) Gregg comments that while nativists who adopt a UG view of linguistic knowledge can easily claim that the concept of plurality is innate, Ellis cannot do so, and thus he must explain how the concept of plurality has been acquired, not just make it part of the model’s structure. So, says Gregg, the model is “fudging; innate knowledge has sneaked in the back door, as it were”. Gregg continues:

Not only that, but it seems safe to predict that the human subjects, having learned to associate the picture of an umbrella with the word ‘broil’, would also be able to go on to identify an actual umbrella as a ‘broil’, or a sculpture or a hologram of an umbrella as representations of a ‘broil’. In fact, no subject would infer that ‘broil’ means ‘picture of an umbrella’. And nor would any subject infer that ‘broil’ meant the one specific umbrella represented by the picture. But there is no reason whatever to think that the network can make similar inferences (Gregg, 2003, p. 114).

Emergentism and Instructed SLA

Ellis and others who are developing emergentist theories of SLA stress that, at least for monolingual adults, the process of SLA is significantly affected by the experience of learning one’s native language. Children learn their first language implicitly, through associative learning mechanisms acting on the input from the environment, and any subsequent learning of more languages is similar in this respect. However, monolingual adult L2 learners “suffer” from the successful early learning of their L1, because that success results in implicit input processing mechanisms being set for the L1, and the knock-on effect is that entrenched L1 processing habits work against them, leading them to apply those habits to an L2 where they do not apply. Ellis argues that the filtering of L2 input through L1-established attractors leads to adult learners failing to acquire certain parts of the L2, which are referred to as its “fragile” features (a term coined by Goldin-Meadow, 1982, 2003). Fragile features are non-salient – they pass unnoticed – and they are identified as being one or more of infrequent, irregular, non-syllabic, string-internal, semantically empty, and communicatively redundant.

Ellis (2017) (supported by Long, 2015) suggests that teachers should use explicit teaching to facilitate implicit learning, and that the principal aim of explicit teaching should be to help learners modify entrenched automatic L1 processing routines, so as to alter the way subsequent L2 input is processed implicitly. The teacher’s aim should be to help learners consciously pay attention to a new form, or form–meaning connection, and to hold it in short-term memory long enough for it to be processed and rehearsed, and for an initial representation to be stored in long-term memory. Nick Ellis (2017) calls this “re-setting the dial”: the new, better exemplar alters the way in which subsequent exemplars of the item in the input are handled by the default implicit learning process.

It’s interesting to see what Long (2015, p. 50) says in his major work on SLA and TBLT:

A plausible usage-based account of (L1 and L2) language acquisition (see, e.g., N.C. Ellis 2007a,b, 2008c, 2012; Goldberg & Casenhiser 2008; Robinson & Ellis 2008; Tomasello 2003), with implicit learning playing a major role, begins with initially chunk-learned constructions being acquired during receptive or productive communication, the greater processability of the more frequent ones suggesting a strong role for associative learning from usage. Based on their frequency in the constructions, exemplar-based regularities and prototypical morphological, syntactic, and other patterns – [Noun stem-PL], [Base verb form-Past], [Adj Noun], [Aux Adv Verb], and so on – are then induced and abstracted away from the original chunk-learned cases, forming the basis for attraction, i.e., recognition of the same rule-like patterns in new cases (feed-fed, lead-led, sink-sank-sunk, drink-drank-drunk, etc.), and for creative language use.

In sum, … while incidental and implicit learning remain the dominant, default processes, their reduced power in adults indicates an advantage, and possibly a necessity (still an open question), for facilitating intentional initial perception of new forms and form–meaning connections, with instruction (focus on form) important, among other reasons, for bringing new items to learners’ focal attention. Research may eventually show such “priming” of subsequent implicit processing of those forms in the input to be unnecessary. Even if that turns out to be the case, however, opportunities for intentional and explicit learning are likely to speed up acquisition and so become a legitimate component of a theory of ISLA, where efficiency, not necessity and sufficiency, is the criterion for inclusion.

It should be obvious from the discussion above that I’m persuaded by the criticisms of Eubank, Gregg, O’Grady (and many others!) to reject empiricist emergentism as a theory of SLA, and I confess to having felt surprised when I first read the quotation above. Never mind. What I think is interesting is that a different explanation of SLA – one which allows for innate knowledge, a “bootstrapping” view of the process of acquisition, and interlanguage development – has some important things in common with emergentism, and these can be incorporated into a theory of ISLA (Instructed Second Language Acquisition). Such a theory needs to look more carefully at the effects of different syllabuses, materials and teacher interventions on students’ learning in different environments, in order to assess their efficacy, but I’m sure it will begin with the commonly accepted view among SLA scholars that, regardless of context, implicit learning drives SLA, and that explicit instruction can best be seen as a way of speeding up this implicit learning.


At the root of the problem of any empiricist account is the poverty of the stimulus argument. Gregg (2003, p. 101) summarises Laurence and Margolis’ (2001: 221) “lucid formulation” of it:

1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.

2. The correct set of principles need not be (and typically is not) in any pre-theoretic sense simpler or more natural than the alternatives.

3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.

4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.

5. Children do reliably arrive at the correct grammar for their language.

6. Therefore children are not empiricist learners. 
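Steps 4 to 6 of the argument are an instance of modus tollens, so the inference itself can be checked mechanically. Here is the skeleton rendered in Lean, with propositional placeholders standing in for the substantive claims:

```lean
-- From step 4 (if children were empiricist learners, they could not reliably
-- arrive at the correct grammar) and step 5 (they do reliably arrive at it),
-- step 6 follows by modus tollens.
example (Empiricist Reliable : Prop)
    (step4 : Empiricist → ¬Reliable) (step5 : Reliable) : ¬Empiricist :=
  fun h => step4 h step5
```

The formal validity is, of course, the easy part; the philosophical weight of the argument rests on premises 1 to 3.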

By adopting an associative learning model and an empiricist epistemology (where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations), emergentists have a very difficult job explaining how children come to have the linguistic knowledge they do. How can general conceptual representations acting on stimuli from the environment explain the representational system of language that children demonstrate? I don’t think they can.

In the next post, I’ll discuss William O’Grady’s version of emergentism.  


Bates, E., Elman, J., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1998). Innateness and emergentism. In W. Bechtel & G. Graham (Eds.), A companion to cognitive science (pp. 590-601). Basil Blackwell.

Ellis, N. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143-188.

Ellis, N. (2015). Implicit AND explicit language learning: Their dynamic interface and complexity. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 3-23). John Benjamins.

Ellis, N., & Schmidt, R. (1997). Morphology and longer distance dependencies: Laboratory research illuminating the A in SLA. Studies in Second Language Acquisition, 19(2), 145-171.

Ellis, N., & Wulff, S. (2020). Usage-based approaches to L2 acquisition. In B. VanPatten, G. Keating, & S. Wulff (Eds.), Theories in second language acquisition: An introduction. Routledge.

Eubank, L., & Gregg, K. R. (2002). News flash – Hume still dead. Studies in Second Language Acquisition, 24(2), 237-248.

Gregg, K. R. (1993). Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics, 14(3), 276-294.

Gregg, K. R. (1996). The logical and developmental problems of second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of second language acquisition. Academic Press.

Gregg, K. R. (2000). A theory for every occasion: Postmodernism and SLA. Second Language Research, 16(4), 34-59.

Gregg, K. R. (2001). Learnability and SLA theory. In P. Robinson (Ed.), Cognition and second language instruction. CUP.

Gregg, K. R. (2003). The state of emergentism in second language acquisition. Second Language Research, 19(2), 95-128.

O’Grady, W., Lee, M., & Kwak, H. (2011). Emergentism and second language acquisition. In W. Ritchie & T. Bhatia (Eds.), Handbook of second language acquisition. Emerald Press.

O’Grady, W. (2011). Emergentism. In P. Hogan (Ed.), The Cambridge encyclopedia of language sciences. Cambridge University Press.

Seidenberg, M., & MacDonald, M. (1997). A probabilistic constraints approach to language acquisition and processing. Cognitive Science, 23(4), 569-588.

What Is Empiricism?


Emergentist theories of language learning are now so prevalent that their effects are being seen in the ELT world, where leading teacher educators refer to various emergentist constructs (e.g., priming, constructions, associative learning) and increasingly adopt what they take to be an emergentist view of L2 learning. Within emergentism, there is an interesting difference of opinion between those (the majority, probably) who follow the “input-based” or “empiricist” emergentist approach proposed by Nick Ellis, and those who support the “processor” approach of William O’Grady. In preparation for a revised post on emergentism, I here discuss empiricism.

Rationalism vs Empiricism

In Discourse on Method, Descartes (1969 [1637]) describes how he examined a piece of wax. It had a certain shape, colour, and dimension. It had no smell, it made a dull thud when struck against the wall, and it felt cold. When Descartes heated the wax, it started to melt, and everything his senses had told him about the wax turned to its opposite – the shape, colour and dimensions changed, it had a pungent odour, it made little sound and it felt hot.  How then, asked Descartes, do I know that it is still a piece of wax?  He adopted a totally sceptical approach, supposing that a demon was doing everything possible to delude him. Perhaps it wasn’t snowing outside,  perhaps it wasn’t cold, perhaps his name wasn’t Rene, perhaps it wasn’t Thursday.  Was there anything that could escape the Demon hypothesis?  Was there anything that Descartes could be sure he knew?  His famous conclusion was that the demon could not deny that he thought, that he asked the question “What can I know?”  Essentially, then, it was his capacity to think, to reason, that was the only reliable source of knowledge, and hence Descartes’ famous “Cogito ergo sum”, I think, therefore I am. Descartes based his philosophical system on the innate ability of the thinking mind to reflect on and understand our world.  We are, in Descartes’ opinion, unique in having the ability to reason, and it is this capacity to reason that allows us to understand the world. 

But equally important to the scientific revolution of the early 17th century was the empirical method championed by Francis Bacon.  In The Advancement of Learning (Bacon, 1974 [1605]), Bacon claimed that the crucial issue in philosophy was epistemological, i.e., reliable knowledge, and proposed that empirical observation and experiments should be recognised as the way to obtain such knowledge. (Note that empirical observation means observations of things in the world that we experience through our senses, not to be confused with the epistemological view adopted by empiricists – see below.) Bacon’s proposal is obviously at odds with Descartes’s argument: it claims that induction, not deduction, should guide our thinking. Bacon recommends a bottom-up approach to scientific investigation: carefully conducted empirical observations should be the firm base on which science is built. Scientists should dispassionately observe, measure, and take note, in such a way that, step by careful step, checking continuously along the way that the measurements are accurate and that no unwarranted assumptions have crept in, they accumulate such an uncontroversial mass of evidence that they cannot fail to draw the right conclusions from it. Thus they finally arrive at an explanatory theory of the phenomena being investigated whose truth is guaranteed by the careful steps that led to it.

In fact, if one actually stuck to such a strictly empirical programme, it would be impossible to arrive at any general theory, since there is no logical way to derive generalisations from facts (see Hume, below). Equally, it is impossible to develop a rationalist epistemology from Descartes’ “Cogito ergo sum”, since the existence of an external world does not follow. In both cases, compromises were needed, and, in fact, more “practical” inductive and deductive processes were both used in the development of scientific theories, although we can note the differences between the more conservative discoverers and the more radical inventors and “big theory” builders, throughout the development of modern science in general, and in the much more recent and restricted development of SLA theory in particular. Larsen-Freeman and Long (1991), for example, talk about two research traditions in SLA: “research then theory”, and “theory then research”, and these obviously correspond to the inductive and deductive approaches respectively.    

In linguistics, the division between “empiricist” and “rationalist” camps is noteworthy for its incompatibility. The empiricists, who held sway, at least in the USA, until the 1950s, and whose most influential member was Bloomfield, saw their job as field work: equipped with tape recorders and notebooks, the researcher recorded thousands of hours of actual speech in a variety of situations and collected samples of written text. The data was then analysed in order to identify the linguistic patterns of a particular speech community. The emphasis was very much on description and classification, and on highlighting the differences between languages. We might call this the botanical approach, and its essentially descriptive, static, “naming of parts” methodology depended for its theoretical underpinnings on the language learning explanation provided by the behaviourists.


Behaviourism was first developed in the early twentieth century by the American psychologist John B. Watson, who attempted to make psychological research “scientific” by using only objective procedures, such as laboratory experiments designed to establish statistically significant results. Watson (see Toates and Slack, 1990: 252-253) formulated a stimulus-response theory of psychology according to which all complex forms of behaviour are explained in terms of simple muscular and glandular elements that can be observed and measured. No mental “reasoning”, no speculation about the workings of any “mind”, was allowed. Thousands of researchers adopted this methodology, and from 1920 until the 1950s an enormous amount of research on learning in animals and in humans was conducted under this strict empiricist regime. By 1950 behaviourism could justly claim to have achieved paradigm status, and at that moment B.F. Skinner became its new champion. Skinner’s contribution to behaviourism was to challenge the stimulus-response idea at the heart of Watson’s work and replace it with operant conditioning, in which behaviour is shaped by reinforcement (see Skinner, 1957, and Toates and Slack, 1990: 268-278). Note the same insistence on a strict empiricist epistemology (no “reasoning”, no “mind”, no appeal to mental processes), and the claim that language is learned in just the same way as any other complex skill is learned – by social interaction.

In sharp contrast to the behaviourists and their rejection of “mentalistic” formulations is the approach to linguistics championed by Chomsky. Chomsky (in 1959 and subsequently) argued that the most important thing about languages was the similarities they share, not their differences. In order to study these similarities, Chomsky assumed the existence of unobservable mental structures and proposed a “nativist” theory to explain how humans acquire a certain type of knowledge. A top-down, rationalist, deductive approach is evident here.

The Empiricists

But let’s return to empiricism. Empiricism as a distinct movement in philosophy developed in the seventeenth and eighteenth centuries, its most influential proponents being Locke, Hume, and, later, Mill. In a much more radical, more epistemologically-formulated statement of Bacon’s views, the British empiricists argued that everything the mind knows comes through the senses. As Hume put it: “The mind has never anything present to it but the perceptions” (Hume, 1988 [1748]: 145). Starting from the premise that only “experience” (all that we perceive through our senses) can help us to judge the truth or falsity of factual sentences, Hume argued that reliable knowledge of things was obtained by observing the relevant quantitative, measurable data in a dispassionate way. This is familiar territory – Bacon again, we might say – but the argument continues in a way that has dire consequences for rationalism.

If, as Hume claims, knowledge rests entirely on observation, then there is no basis for our belief in natural laws: we believe in laws and regularities only because of repetition. For example, we believe the sun will rise tomorrow because it has repeatedly done so every 24 hours, but the belief is an unwarranted inductive inference. As Hume so brilliantly insisted, we can’t logically go from the particular to the general: it is an elementary, universally accepted tenet of formal logic that no amount of cumulative instances can justify a generalisation. No matter how many times the sun rises in the East, or thunder follows lightning, or swans appear white, we will never know that the sun rises in the East, or that thunder follows lightning, or that all swans are white. This is the famous “logical problem of induction”. To be clear, the empiricists don’t claim that we have empirical knowledge – they limit themselves to the claim that knowledge can only be gained, if at all, by experience. And if the rationalists are right to claim that experience cannot give us knowledge, the conclusion must be that we do not know at all. Hume’s position with regard to causal explanation is the same: such explanations can’t count as reliable knowledge; they are only presupposed to be true in virtue of a particular habit of our minds.

The positivists tried to solve the problem posed by Hume’s devastating critique.

Positivism

Positivism refers to a particularly radical form of empiricism. Comte invented the term, arguing that each branch of knowledge passes through “three different theoretical states: the theological or fictitious state; the metaphysical or abstract state; and, lastly, the scientific or positive state” (Comte, 1830, cited in Ryan, 1970: 36). At the theological stage, the will of God explains phenomena; at the metaphysical stage, phenomena are explained by appeal to abstract philosophical categories; at the scientific stage, any attempt at absolute explanation of causes is abandoned, and science limits itself to describing how observational phenomena are related. Mach, the Austrian philosopher and physicist, headed the second wave of positivism, which rooted out the “contradictory” religious elements in Comte’s work and took advantage of further progress in the hard sciences to insist on purging all metaphysics from the scientific method (see Passmore, 1968: 320-321).

The third wave of positivists, whose members were known as the Vienna Circle, included Schlick, Carnap, Gödel and others, with Russell, Whitehead and Wittgenstein as interested parties (see Hacking, 1983: 42-44). They developed a programme based on the argument that true science could only be achieved by:

  1. Completely abandoning metaphysical speculation and any form of theology. According to the positivists, such speculation proposed and attempted to solve only “pseudo-problems”, which lacked any meaning since they were not supported by observable, measurable, experimental data.
  2. Concentrating exclusively on the simple ordering of experimental data according to rules. Scientists should not speak of causes: there is no physical necessity forcing events to happen, and all we have in the world are regularities between types of events. There is no room in science for unobservable or theoretical entities.

The programme was a complete fiasco: none of its objectives was realised, and the movement disbanded in the 1930s. “Positivism” in general, and as expounded in the writings of the Vienna Circle in particular, is, in my opinion, a good example of philosophers stubbornly marching up a blind alley. It was a fundamentally mistaken project, as Popper (1959) demonstrated, and as Wittgenstein (1953) himself came to recognise. We may note that critics of psycholinguistic theories of SLA who label their opponents “positivists” are either ignorant of the history of positivism or making a straw-man case against what they consider to be a mistakenly “scientific” approach to research. We may also note that empiricism as an epistemological system, if taken to its extreme, leads to a dead end of radical scepticism and solipsism. So when looking at current discussions among scholars of SLA, it’s of the utmost importance to distinguish between a radical empiricist epistemology on the one hand, and an appeal to empirical evidence on the other.

The start of the psycholinguistic study of SLA

To conclude, we’ll look briefly at how behaviourism was superseded by Chomsky’s UG, thus ending – for a while anyway! – the hold that empiricism had enjoyed over theories of language learning.  

Chomsky’s Syntactic Structures (1957), followed by his 1959 review of Skinner’s Verbal Behavior (1957), marked the beginning of probably the fastest, most complete revolution that any science had seen since the 1930s. Before Chomsky, as indicated above, the field of linguistics was dominated by a Baconian, empiricist methodology: researchers saw their job almost exclusively as the collection of data. All languages were seen as composed of a set of meaningful sentences, each composed of a set of words, in turn composed of phonemes and morphemes. Each language also had a grammar which determined the ways in which words could be correctly combined to form sentences, and how those sentences were to be understood and pronounced. The best way to understand the more than 2,500 languages said to exist on earth was to collect and sort data about them, so that the patterns characterising the grammar of each language would eventually emerge, and so that interesting differences among languages, and even groups of languages, would emerge too.

Chomsky’s revolutionary argument (Chomsky, 1957, 1965, 1986) was that all human beings are born with innate knowledge of grammar – a fixed set of mental rules that enables young children to understand, relatively quickly, the language(s) they’re exposed to, and to create and utter sentences they’ve never heard before. Language consists of a set of abstract principles that characterise the core grammars of all natural languages, and learning a language is simplified by reliance on an innate mechanism that constrains possible grammar formation. Children don’t have to learn key, universal features of the particular language(s) to which they are exposed because they know them already. The job of the linguist was now to describe this generative, or universal, grammar as rigorously as possible.

The arguments for Universal Grammar (UG) start with the poverty of the stimulus argument: young children’s knowledge of their first language can’t be explained by appeal to the actual, attested language they are exposed to. On the basis of the input they get, young children produce language which is far more complex and rule-based than could be expected, and which is very similar to that of adult native speakers of the same language variety – all at an age when they have difficulty grasping abstract concepts. That their production is rule-based and not mere imitation, as the behaviourist view held, is shown by the fact that they frequently invent unique, well-formed utterances of their own. That they have an innate capacity to discern well-formed utterances is supported by a large body of evidence (see, for example, Cook & Newson, 1996).

I won’t continue a “defence of UG” here. Suffice it to say that Chomsky’s work inspired the development of a psycholinguistic approach which saw L2 learning as a process going on in the mind. Beginning with error analysis and the morpheme studies, this cognitive approach made uneven progress, but Selinker’s (1972) paper, which argued that L2 learners develop their own autonomous mental grammar (an interlanguage grammar) with its own internal organising principles, is an important landmark. I’ve done a series of posts on all this, Part 8 of which discusses emergentist theories. As indicated, I’m not happy with Part 8, and in the next post I’ll offer a revised version, where Nick Ellis’ “empiricist” emergentism and William O’Grady’s “mentalist” emergentism will be discussed.


References

Bacon, F. (1974 [1605]). The Advancement of Learning: New Atlantis (A. Johnston, Ed.). Clarendon Press.

Chomsky, N. (1957). Syntactic Structures. Mouton.

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Chomsky, N. (1976). Reflections on Language. Temple Smith.

Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Praeger.

Cook, V. J. & Newson, M. (1996). Chomsky’s Universal Grammar: An Introduction. Blackwell.

Descartes, R. (1969 [1637]). Discourse on Method. In Philosophical Works of Descartes, Volume 1 (E. Haldane & G. Ross, Trans.). Cambridge University Press.

Ellis, N. C. (2011). The emergence of language as a complex adaptive system. In J. Simpson (Ed.), The Routledge Handbook of Applied Linguistics (pp. 666-679). Routledge.

Hacking, I. (1983). Representing and Intervening. Cambridge University Press.

Hume, D. (1988 [1748]). An Enquiry Concerning Human Understanding. Prometheus Books.

Larsen-Freeman, D. & Long, M. H. (1991). An introduction to second language acquisition research. Longman.

Passmore, J. (1968). A Hundred Years of Philosophy (2nd ed.). Penguin.

Popper, K. R. (1959). The Logic of Scientific Discovery. Hutchinson.

Ryan, A. (1970). The Philosophy of the Social Sciences. Macmillan.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics, 10, 209-231.

Skinner, B. F. (1957). Verbal Behavior. Appleton-Century-Crofts.

Toates, F., & Slack, I. (1990). Behaviourism and its consequences. In I. Roth (Ed.), Introduction to Psychology. Psychology Press.

Wittgenstein, L. (1953). Philosophical Investigations (G. E. M. Anscombe, Trans.). Basil Blackwell.