The Work of Penny Ur

In the last thirty years, Penny Ur has published more than 30 books: coursebooks, workbooks, grammar practice books, skills practice books and “How to Teach” books. In all that time she has never wavered in her support of the same approach to ELT that she was taught when she did a PGCE at Cambridge University all those years ago.

What’s remarkable is that today, Ur continues to recommend, as keenly as she ever did, the same carefully controlled, anodyne routines that the PGCE course recommended way back then. According to this view, teachers, wherever they happen to be in the world, should use a coursebook produced in London to deliver a synthetic, grammar-based syllabus by working their way steadily through a succession of Units where “language items” are “presented, practiced and tested”, until they come to the bit of the book at the back with no writing on it, when they should stop.

Given that coursebooks have been adopted around the world as the preferred way of implementing ELT since the early 1990s, we may say that Ur’s faith in the coursebook-driven approach has been vindicated. Certainly, her tireless, consistent promotion of the same cause has won her a fair amount of success, fame and recognition, which includes being awarded an OBE (Officer of the Order of the British Empire) for services to English Language Teaching in 2013.

I happen to think that the approach championed by Ur is wrong, partly because it’s based on false assumptions about how people learn a foreign language (see this post for more about these false assumptions), and partly because it represents a stifling orthodoxy which has gone hand in hand with the commodification of ELT in particular and education in general. It fits perfectly with the general drive towards the implementation of ‘adaptive learning’ programmes which reduce education to the learning of discrete units of testable ‘knowledge’, delivered with minimum mediation by teachers. The result is the de-skilling of teachers, the reconfiguring of learners as consumers, and, as Scott Thornbury so memorably put it: Comfort. Complacency. Conformity. Professional atrophy. Institutional malaise. Student boredom. Slow death by mcnuggets.

Whichever side of the argument you’re on, you’ll surely agree that the powerfully entrenched, coursebook-driven model of ELT should at least be open to criticism, and that it’s a “good thing” for there to be open discussion about how best to help people learn English as an L2. Even if everybody were happy to implement the kind of ELT recommended by Penny Ur, in the name of professionalism teachers should at least know something about on-going research into the English language, how people learn English as an L2, and how various teaching programmes have been evaluated.

And here’s where we hit a problem, because Penny Ur, apart from staunchly defending coursebook-driven ELT, also promotes herself as a mediator between the academic world of applied linguistics and the classroom teacher; able, she claims, to reliably inform teachers about what’s going on in academia, despite the fact that she has no credentials for such a job. Ur has never published an article in an academic journal, she shows few signs of knowledge of the SLA literature, and she consistently dismisses significant research findings when they challenge her own approach to teaching. Ur tells readers of the UK Guardian newspaper, and of the ELT Gazette, and all those who attend her teacher training courses and conference presentations about what’s going on in applied linguistics research, while at the same time admitting that she misses a lot of what’s published, and breezily dismissing the inconvenient mountain of data which point to the fact that students don’t learn what they’re taught if they’re subjected to a synthetic, grammar-based syllabus. When asked, for example, “Why don’t you mention the research findings on interlanguage development?”, Ur replies “We have no conclusive proof” (Ur, 2017a), as if all the evidence that we do have counts for nothing.

Ur’s claim to be able to mediate between the world of academic research on the one hand, and the world of the classroom teacher on the other, is not just unwarranted, it’s also misleading and unfair, especially to novice teachers who assume that Ur knows what she’s talking about when she tells them, for example, that there is no evidence that TBLT works, or that Pienemann’s teachability hypothesis has only very doubtful implications for teaching. In her article in the Guardian (2012) Ur makes her disdain for most of what passes for academic work perfectly clear. First, Ur says, academics concentrate “almost exclusively on language acquisition”; second, the studies reported on “are selected for reasons that have nothing to do with their usefulness to the practitioner”; third, “topics that are difficult to research, though possibly more valuable for the teacher, tend to be neglected”; and finally:

researchers are not practitioners. Many have very limited or nonexistent teaching experience so their ideas on the pedagogical implications of their results may not be very practical and need to be treated with caution.

Notice that while Ur has no doubts about her own ability to speak on academic matters, she cautions against giving any credence to academics’ ideas on teaching.

In her books on how to teach English as a foreign language, Ur spends very little time discussing the question of how people learn an L2, or encouraging teachers to take part in a critical evaluation of theoretical assumptions underpinning her practical teaching tips. The updated edition of her widely recommended A Course in Language Teaching includes a new sub-section where precisely half a page is devoted to describing theories of SLA. For the rest of the 300 pages, Ur expects readers to take her word for it when she says, as if she knew, that the findings of applied linguistics research have very limited relevance to teachers’ jobs. Nowhere in any of her books or articles or presentations does Ur attempt to seriously describe and evaluate arguments and evidence from academics whose work challenges her approach, and nowhere does she encourage teachers to do so.

Ur’s work is evidence of the distinction Richards (2008) makes between two broad streams in teacher education: the first at the certificate level, where trainees receive instruction in classroom skills, and the other, ‘teacher development’, where teachers learn more about second language acquisition. How can we expect teachers to be well-informed, critically acute professionals in the world of education if their training is restricted to instruction in classroom skills, and their on-going professional development gives them no opportunities to consider theories of language, theories of language learning, and theories of teaching and education? Can we really afford to agree with Ur’s view that there’s nothing broken in teacher training in ELT?

Here are a few excerpts from Ur’s books and articles. Note that the first 4 quotes are from the 1991 edition, updated in 2009, of A Course in Language Teaching.

1. Ur, P. (1991, p. 10) 

In principle, the teaching processes of presenting, practising and testing correspond to strategies used by many good learners trying to acquire a foreign language on their own.    …………….

In the classroom it is the teacher’s job to promote these three learning practices by the use of appropriate teaching acts.  

Comment: Notice the careful hedging of the first claim (“In principle”, “strategies used by many good learners”) and the sweeping non-sequitur that follows. This is a good example of Ur’s argumentation.

2. Ur, P. (1991, p. 12) 

The learners need to take the material into short-term memory; to remember it, that is, until later in the lesson when you and they have an opportunity to do further work to consolidate learning.

Comment: The duration of short-term memory is between 15 and 30 seconds.

3. Ur, P. (1991, p. 14)

Note that some learners remember better if it is seen, others if it is heard, yet others if it is associated with physical movement (visual, audio and kinaesthetic input)…..

Comment: There is, of course, no evidence to support the theory of NLP or the notion of learner styles; it’s all been thoroughly debunked.

4. Ur, P. (1991, p. 26) RE a Spelling Activity. 

The students remarked afterwards that the activity had helped to fix the spellings in their minds and the teacher noticed that this was borne out by their subsequent performance in free writing.

Comment: Any doubts about the weight of this “evidence” will no doubt come from academics whose opinion can be safely ignored, since they know nothing about real classroom practice.

5. Ur, P. (2012)

Teaching grammar proactively through traditional focus on formS is effective.

Comment: No ifs, no buts, it’s effective. So there.

6. Ur, P. (2017b)

There is no evidence that TBLT works.  

Comment: There have been over 60 studies of TBLT published in academic journals in the last 15 years. The vast majority of them report an overall positive and strong effect for TBLT implementation on a variety of learning outcomes. Furthermore, both the quantitative and qualitative data show positive stakeholder perceptions towards TBLT programmes.

7. Ur, P. (quoted by Thornbury, 2017)

It’s certainly possible to write helpful and valid professional guidance for teachers with no research references whatsoever.

Comment: There you have it.


Richards, J. (2008) Second Language Teacher Education Today. RELC Journal, 39,2.

Ur, P. (1991) A Course in Language Teaching. CUP.

Ur, P. (2012) How useful is TESOL academic research? The Guardian.

Ur, P. (2017a) And What about the research?

Ur, P. (2017b) The Future of Professional Development. IATEFL Conference.

Thornbury, S. (2017) Writing methodology texts: bridging the research/practice gap. IATEFL Conference.

Encounters with Noticing Part 3

How do we help people learn an L2? A major finding of SLA research is that learners of an L2 cannot be taught what they’re not ready to learn, because they’re all at some particular point in the development of interlanguages which are impervious to instruction. This suggested to many that we should help learners along their trajectory by finding out what their needs in the L2 are and then engaging them in relevant communicative tasks. Some brief, carefully-measured attention, now and then, to relevant aspects of the grammar is seen as an important way to speed up the development.

Then, along comes Schmidt and suggests that consciously ‘noticing’ formal features of L2 input is a necessary condition for learning, and this is taken by proponents of synthetic syllabuses which deliver bits of grammar or an endless succession of lexical chunks to mean that lots of explicit grammar and/or vocabulary teaching will help learners to ‘notice’ and to ‘notice the gap’.

In Parts 1 and 2 I’ve voiced some reservations about Schmidt’s Noticing Hypothesis, and here I’ll try to round things off.

Where were we?

Recall that in his original 1990 paper, Schmidt claimed that “intake” was the sub-set of input which is noticed, and that the parts of input that aren’t noticed are lost. Thus, Schmidt’s Noticing Hypothesis, in its 1990 version, claims that noticing is the necessary condition for learning an L2.

‘Noticing’ is said to be the first stage of the process of converting input into implicit knowledge. It takes place in short-term memory (where, according to the original claim, the noticed ‘feature’ is compared to features produced as output) and it is triggered by these factors: instruction, perceptual salience, frequency, skill level, task demands, and comparing.

Criticisms of Schmidt’s hypothesis:

1.  It fails to distinguish carefully enough between attention and awareness

In reply to Schmidt’s argument that attention research supports the claim that consciousness is necessary for learning, Truscott (1998) points out that such claims are “difficult to evaluate and interpret”. He cites a number of scholars and studies to support the view that the notion of attention is “very confused”, and that it’s “very difficult to say exactly what attention is and to determine when it is or is not allocated to a given task. Its relation to the notoriously confused notion of consciousness is no less problematic”. He concludes (1998, p. 107): “The essential point is that current research and theory on attention, awareness and learning are not clear enough to support any strong claims about relations among the three.”

2.  Empirical support for the Noticing Hypothesis is weak

  • Truscott (1998) points out that the reviews by Brewer (1974) and Dawson and Schell (1987) (cited by Schmidt, 1990) dealt with simple conditioning experiments and that, therefore, inferences regarding learning an L2 were not legitimate. Brewer specifically notes that his conclusions do not apply to the acquisition of syntax, which probably occurs ‘in a relatively unconscious, automatic fashion’ (p. 29).
  • Truscott further points out that while most current research on unconscious learning is plagued by continuing controversy, “one can safely conclude that the evidence does not show that awareness of the information to be acquired is necessary for learning” (p. 108).
  • Altman (1990) gathered data in a similar way to Schmidt (1986) in studying her learning of Hebrew over a five-year period. Altman found that while half her verbalisation of Hebrew verbs could be traced to diary entries of noticing, it was not possible to identify the source of the other half and they may have become intake subconsciously.
  • Alanen’s (1992) study of Finnish L2 learning found no significant statistical difference between an enhanced input condition group and the control group.
  • Robinson’s (1997) study found mixed results for noticing under implicit, incidental, rule-search and instructed conditions.

3. Studies of ‘noticing’ suffer from serious methodological problems   

  • The studies are not comparable due to variations in focus and in the conditions operationalized.
  • The level of noticing in the studies may have been affected by variables, which casts doubt on the reliability of the findings.
  • Cross (2002) notes that “only Schmidt and Frota’s (1986) and Altman’s (1990) research considers how noticing target structures positively relates to their production as verbal output (in a communicative sense), which seems to be the true test of whether noticing has an effect on second language acquisition. A dilemma associated with this is that, as Fotos (1993) states, there is a gap of indeterminate length between what is noticed and when it appears as output, which makes data collection, analysis and correlation problematic.”
  • Ahn (2014) points to a number of problems that have been identified in eye-tracking studies, especially those using heat map analyses. (See Ahn (2014) for the references that follow.) Heat maps are only “exploratory” (p. 239), and they cannot provide temporal information on eye movement, such as regression duration, “the duration of the fixations when the reader returns to the lookzone” (Simard & Foucambert, 2013, p. 213), which might tempt researchers to rush into a conclusion that favors their own predictions. Second, as Godfroid et al. (2013) accurately noted, the heat map analyses in Smith (2012) could not control the confounding effects of “word length, word frequency, and predictability, among other factors” (p. 490). This might have yielded considerable confounding effects as well. As we can infer from the analyses shown in Smith (2012), currently the utmost need in the field is for our own specific guidelines for using eye-tracking methodology to conduct research focusing on L2 phenomena (Spinner, Gass, & Behney, 2013). Because little guidance is available, the use of eye tracking is often at risk of misleading researchers into making unreliable interpretations of their results.
  • Think aloud protocols are also questioned, since perhaps thinking aloud itself can affect learners’ cognitive processes.


Schmidt re-formulated his Noticing Hypothesis in 2001. He begins by saying that to minimise confusion, he will use ‘noticing’ as a technical term equivalent to what Gass (1988) calls “apperception”, what Tomlin and Villa (1994) call “detection within selective attention,” and what Robinson (1995) calls “detection plus rehearsal in short term memory.” What is noticed are now “elements of the surface structure of utterances in the input, instances of language” and not “rules or principles of which such instances may be exemplars”. Noticing does not refer to comparisons across instances or to reflecting on what has been noticed.

In the section “Can there be learning without attention?”, Schmidt admits there can, with the L1 as a source that helps learners of an L2 being an obvious example. Schmidt says that it’s “clear that successful second language learning goes beyond what is present in input”. Schmidt presents evidence which, he admits, “appears to falsify the claim that attention is necessary for any learning whatsoever”, and this prompts him to propose the weaker version of the Noticing Hypothesis, namely “the more noticing, the more learning”.


As was mentioned, Schmidt (2001) says that he is using ‘noticing’ as a technical term equivalent to Gass’ apperception. True to dictionary definitions of apperception, Gass defines apperception as “the process of understanding by which newly observed qualities of an object are initially related to past experiences”. The light goes on, the learner realises that something new needs to be learned. It’s “an internal cognitive act in which a linguistic form is related to some bit of existing knowledge (or gap in knowledge)”. It shines a spotlight on the identified form and prepares it for further analysis. To me, this clashes with Schmidt’s insistence that noticing does not refer to comparisons across instances or to reflecting on what has been noticed, and in any case, it is not at all clear to me how the subsequent stages of Gass’ model convert apperceptions into implicit knowledge of the L2 grammar.

Schmidt says that ‘noticing’ is also equivalent to what Tomlin and Villa (1994) call “detection within selective attention.” But it seems to me that ‘noticing’ isn’t at all equivalent to what Tomlin and Villa really wanted to talk about – detection that does not require awareness. According to Tomlin and Villa, the three components of attention are alertness, orientation, and detection, but only detection is essential for further processing and awareness plays no important role in L2 learning.

In the 2010 paper, Schmidt confirms the concessions which amount to saying that ‘noticing’ is not needed for all L2 learning, but that the more you notice the more you learn. He also confirms that noticing does not refer to reflecting on what is noticed.

You can’t notice grammar 

Finally, we get a glimpse of an answer to Gregg’s crucial question about how we get from ‘noticing’ to the acquisition of linguistic competence in Schmidt’s 2010 paper, where he deals with Suzanne Carroll’s objection to his hypothesis. Schmidt succinctly summarises Carroll’s view that attention to input plays little role in L2 learning because most of what constitutes linguistic knowledge is not in the input to begin with. She argues that Krashen, Schmidt and Gass all see “input” as observable sensory stimuli in the environment from which forms can be noticed,

whereas in reality the stuff of acquisition (phonemes, syllables, morphemes, nouns, verbs, cases, etc.) consists of mental constructs that exist in the mind and not in the environment at all. If not present in the external environment, there is no possibility of noticing them.

Schmidt’s answer is:

In general, ideas about attention, noticing, and understanding are more compatible with instance-based, construction-based and usage-based theories (Bley-Vroman, 2009; Bybee & Eddington, 2006; Goldberg, 1995) than with generative theories.

Which is not much better than no answer at all. Carroll effectively answers Gregg’s question by saying that all those who start with input, following Krashen, get things backwards. I offered this quote from Carroll (2001, p. 11) at the end of Part 2:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. …… Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

Learners do not attend to things in the input as such; they respond to speech-signals by attempting to parse the signals, and failures to do so trigger attention to parts of the signal. Carroll’s Autonomous Induction Theory is too complicated for me to offer a brief summary of, but in my opinion, Carroll’s assertions that it is possible to have speech-signal processing without attention-as-noticing or attention-as-awareness are persuasive. She argues that learners may unconsciously and without awareness detect, encode and respond to linguistic sounds; that learners don’t always notice their own processing of segments and the internal organization of their own conceptual representations; that the processing of forms and meanings is often not noticed; and that attention is the result of processing, not a prerequisite for processing.

In brief:

  1. The Noticing Hypothesis even in its amended version does not clearly describe the construct of ‘noticing’.
  2. The empirical support claimed for the Noticing Hypothesis is not as strong as Schmidt (2010) claims.
  3. A theory of SLA based on noticing a succession of forms faces the impassable obstacle that, as Schmidt seemed to finally admit, you can’t notice rules or principles of grammar.
  4. “Noticing the gap” is not sanctioned by Schmidt’s amended Noticing Hypothesis.
  5. The way that so many writers and ELT trainers use “noticing” to justify all kinds of explicit grammar and vocabulary teaching demonstrates that Schmidt’s Noticing Hypothesis is widely misunderstood and misused.

There we are then. My attempt to understand Schmidt’s propositions reminds me of Wittgenstein’s famous conclusion to his Tractatus:

My Propositions serve as elucidations in the following way: anyone who understands me eventually recognizes them as nonsensical, when he has used them — as steps — to climb beyond them.  (He must, so to speak, throw away the ladder after he has climbed up it.)


Ahn, J.I. (2014) Attention, Awareness, and Noticing in SLA: A Methodological Review. MSU Working Papers in SLS, Vol. 5.

Carroll, S. (2001) Input and Evidence: The Raw Material of Second Language Acquisition. Amsterdam, Benjamins.

Cross, J. (2002) ‘Noticing’ in SLA: Is it a valid concept? Downloaded from

Schmidt, R.W. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129–58.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. and Frota, S.N. (1986) Developing basic conversational ability in a second language: a case study of an adult learner of Portuguese. In Day, R.R., editor, Talking to learn: conversation in second language acquisition. Rowley, MA: Newbury.

Schmidt, R. (2010) Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J.W. Sew, T. Suthiwan, & I. Walker, Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.

Encounters with Noticing Part 2

Just to reassure those who might be unduly swayed by the likes of Penny Ur (and Scott Thornbury on a bad day) into thinking that they shouldn’t worry their heads with all this theoretical stuff (just trust your instincts and polish your presentation skills), my motivation for sniffing around this particular theoretical stuff is to check on the foundations of our teaching. It’s a terrible job, the pay’s lousy, but somebody’s got to do it, right? Somebody’s got to check, that is, to see whether ‘noticing’ justifies all the explicit teaching done in its name. I suspect that the influential teacher trainers who rely on ‘noticing’ to justify their encouragement of everything from teaching a grammar-based syllabus to teaching as many lexical chunks as you can cram into a 90-minute class are talking baloney, and it should be made clear that their advice gets no support from any good research. On the face of it ‘noticing’ encourages bad teaching practice, and so needs to be carefully examined.

So here we go with Part 2.  In the comments that followed Part 1, one particular problem was highlighted by Kevin Gregg, who said:

You can’t notice what is not in the input; and rules, for instance, or functions, are not in the input.

This prompted Thom to ask:

In what other way can anybody learn grammar if it is not by way of input?

Kevin’s on-going tussle with time (trains to catch, letters to write, shopping to do) prevented him from replying, so I’ll try.

Well, it depends where you’re coming from, as they say. Empiricists, or rather, “‘empiricist’ emergentists” as Gregg calls them, would say that input is the sufficient condition for learning an L2, and they’d probably caution against listening to any talk of mental grammars. Empiricists like Nick Ellis see all knowledge as coming from the information we get through our senses during our interaction with the environment, and with reference to language learning, the emergentists argue that we aren’t born with linguistic knowledge of any sort because we don’t need it. General learning devices (capable of making generalisations based on exemplars found in the input, for example) are all we need. In Nick Ellis’ words:

massively parallel systems of artificial neurons use simple learning processes to statistically abstract information from masses of input data. What evidence is there in the input stream from which simple learning mechanisms might abstract generalizations? The Saussurean linguistic sign as a set of mappings between phonological forms and conceptual meanings or communicative intentions gives a starting point. Learning to understand a language involves parsing the speech stream into chunks which reliably mark meaning. 

…  in the first instance, important aspects of language learning must concern the learning of phonological forms and the analysis of phonological sequences: the categorical units of speech perception, their particular sequences in particular words and their general sequential probabilities in the language…. 

In this view, phonology, lexis and syntax develop hierarchically by repeated cycles of differentiation and integration of chunks of sequences.

On the other hand, nativists like Kevin Gregg, especially those who accept Chomsky’s principles and parameters UG theory, point to the knowledge young children have of language to argue that SLA is the result of an innate representational system specific to the language faculty acting on input in such a way that an L2 grammar is created. We are born with knowledge of various linguistic rules, constraints and principles. In interaction with the environment, which exposes us to ‘primary linguistic data’, we acquire a new, expanded body of linguistic knowledge, namely, knowledge of a specific language like English. This final state of the language faculty constitutes our ‘linguistic competence’, essential, but not sufficient for our ability to speak and understand a language. Additional knowledge about actual language use is acquired through other general learning mechanisms.

Whatever view we take of the SLA process, the question of how it starts (input) is obviously critical, but re-visiting Schmidt’s Noticing Hypothesis has led me to appreciate that the question of how it ends up is equally important. What finally gets acquired? To answer this question we need what Gregg calls a “property” theory of SLA – a theory of language, or, more precisely, of linguistic knowledge of the L2. What is the knowledge that is acquired when someone learns a second language?  O’Grady (2005) notes that while the UG camp talk about problems sorting out categories and structures, the emergentists talk about sorting out words and their meanings, and this leads him to suggest that the disagreement about how we learn an L2 stems from a deeper disagreement about “the nature of language itself”. O’Grady (2005, p. 164) explains:

On the one hand, there are linguists who see language as a highly complex formal system that is best described by abstract rules that have no counterparts in other areas of cognition. …. Not surprisingly, there is a strong tendency for these researchers to favor the view that the acquisition device is designed specifically for language. On the other hand, there are many linguists who think that language has to be understood in terms of its communicative function. According to these researchers, strategies that facilitate communication – not abstract formal rules – determine how language works. Because communication involves many different types of considerations … this perspective tends to be associated with a bias toward a multipurpose acquisition device.

This excellent comment is echoed by Susanne Carroll (2001, p. 47), who distinguishes between

  • Classical structural theories of information processing which claim that mental processes are sensitive to structural distinctions encoded in mental representations. Input is a mental representation which has structure.
  • Classical connectionist approaches to linguistic cognition which deny the relevance of structural representations to linguistic cognition. For them, linguistic knowledge is encoded as activated neural nets and is only linked to acoustic events by association.

Carroll comments:

Anyone who is convinced that the last 100 years of linguistic research demonstrate that linguistic cognition is structure dependent — and not merely patterned— cannot adopt a classical connectionist approach to SLA.

O’Grady’s and Carroll’s remarks remind me that the majority of scholars who are currently looking closely at how input ends up as knowledge don’t articulate a coherent answer to the crucial question: “What is the linguistic knowledge that is acquired?”. Many years ago, I myself made some effort to kick this question into the long grass. Gregg’s repeated insistence on the need for a property theory of SLA which describes what is acquired prompted me to say in a book and in an article for Applied Linguistics that researchers could perfectly well get on with developing a theory of SLA without worrying about the damn property theory. In a short reply (I think he had a bus to catch that time), Gregg effortlessly dealt with my bleatings (the bus and, I like to think, our friendship saved me from the full Gregg treatment) and I’m now fully persuaded that he’s right to demand a property theory.

I think it’s the absence of a well-articulated property theory that makes it so difficult for Schmidt and others to explain how information from the environment ends up as linguistic knowledge of the L2. They accept that the knowledge acquired includes linguistic knowledge of, for example, the structure of an English verb phrase, and they insist that learning this knowledge depends on ‘noticing’ things in the input. But how, we must ask again, does ‘noticing’ audio stimuli from the environment lead to the acquisition of the linguistic knowledge demonstrated by proficient L2 users? Let’s take a quick look at the history of SLA research.

The shift from a behaviouristic to a mentalist view of language learning (sparked by Chomsky’s rebuttal of Skinner in 1959) prompted scholars in the field of psycholinguistics to see language learning as a process which goes on inside the brain and involves the workings of some kind of acquisition device. The, as yet unobservable, “black box” that we can refer to as an acquisition device is almost certainly not located in one particular part of the brain, might or might not be dedicated exclusively to language learning, might or might not make use of innate linguistic knowledge, but certainly does (somehow) enable us to receive, organise, store and retrieve, and manipulate ‘input’ so as to facilitate learning the L2.

And there it is: ‘input’. The Merriam-Webster dictionary says that the term was first used in 1953, in the context of computer design, to refer to data sent to a computer for processing. In the study of SLA, Corder (1967) was the first to suggest that we acquire the rules of language in a predictable way, and that the order is independent of the order in which rules are taught in language classes. This led Corder to suggest that there was a difference between input and intake.

The simple fact of presenting a certain linguistic form to a learner in the classroom does not necessarily qualify it for the status of input, for the reason that input is ‘what goes in’ not what is available for going in, and we may reasonably suppose that it is the learner who controls this input, or more properly his intake. This may well be determined by the characteristics of his language acquisition mechanism. (p. 165).

Here, input is what’s available, and intake is what the learner decides to take in. It’s not clear to me what either ‘input’ or ‘intake’ refers to, and anyway, as Schmidt (1990) points out, Corder contradicts himself by saying in the first sentence that the learner controls intake, and by then saying in the second sentence that his language acquisition mechanism does. More importantly for our hunt, Schmidt goes on to say that it’s not clear whether intake is the subset of input that makes it into short term memory, or whether it’s that part of input that has been sufficiently processed to now form part of the learner’s interlanguage system. The way Schmidt expresses this second point is instructive. Schmidt says that Corder’s treatment of intake does not make any clear distinction between that part of input used to comprehend messages and that part used “for the learning of form” (Schmidt, 1990, p. 139). Schmidt also endorses Slobin’s (1985) distinction between processes involved in converting input into stored data for the construction of language, and processes used to organise stored data into linguistic systems. Schmidt is obviously aware (sorry) of the problem of clearly identifying not just the level of conscious attention/awareness involved in noticing, but also the problems of clearly defining what is noticed and what (if any) processing goes on when learners notice whatever it is they notice.

Moving on to Krashen, his input hypothesis draws on the “natural order” of L2 acquisition that Corder drew attention to, and supposes that learners progress along a pre-determined learning trajectory which is impervious to instruction and controlled by a language acquisition device. Acquisition, Krashen says, is triggered by receiving L2 input that is one step beyond the learner’s current stage of linguistic competence. If a learner is at a stage ‘i’, then acquisition takes place when he/she is exposed to ‘Comprehensible Input’ which belongs to level ‘i + 1’. In Krashen’s model, learners only need comprehensible input and a low affective filter to acquire the L2, because once the i+1 input is received, Chomsky’s LAD does the rest. Almost needless to say, the trouble with Krashen’s input hypothesis is that he nowhere explains what comprehensible input consists of, or tells us how to recognise it.

Unsurprisingly, Schmidt’s not very impressed with Krashen’s badly-defined hypothesis, but it’s not just the lack of definition that Schmidt objects to; crucially, Schmidt insists that SLA is triggered by conscious attention. Krashen’s comprehensible input is, says Schmidt, much better seen as intake, itself defined as that part of the input which is ‘noticed’. Because what learners actually do is consciously attend to, notice, certain parts of the input, and the noticed parts become intake. Furthermore, since the parts of the input which aren’t ‘noticed’ are lost, it follows that noticing is the necessary condition for learning an L2. In his 1990 paper, at least, the claim is not, as so many now want to interpret the Noticing Hypothesis, “More noticing leads to more learning”, but rather, the much stronger claim “Learning can’t take place without noticing”.

In the next post, I intend to look at processing models and try to pin down Schmidt’s “technical” definition of ‘noticing’, which he says is “equivalent” to Gass’ ‘apperception’.  Hmmm. I’ll also look at Suzanne Carroll’s very different view of input. She says:

The view that input is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. …… Comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards!

To be continued.


Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.

Corder, P. (1967) The significance of learners’ errors. International Review of Applied Linguistics, 5, 161-169.

Ellis, N. (1998) Emergentism, Connectionism and Language Learning. Language Learning 48:4, pp. 631–664.

O’Grady, W. (2005) How Children Learn Language. CUP.


Encounters with Noticing Part 1

Note: This is an edited version of the original.

Dr. Conti claims that parallel texts are good because they encourage ‘noticing’. Dr. Conti explains:

According to Schmidt’s (1990) ‘Noticing hypothesis’ the learning of a foreign language grammar structure cannot occur unless the learner ‘notices’ the gap between the way that structure is used in the target language and his/her own L1. In my classroom experience I have witnessed many a time that Eureka moment when a student said, almost thinking aloud, “Oh, I get it! ‘I went’ in French is actually ‘I am gone’.” That would be an occurrence of ‘noticing’.

Well, but would it?  Surely Schmidt’s noticing hypothesis makes no such claim; surely noticing the gap is more a trigger for noticing than noticing itself, and anyway, surely it’s not a question of noticing the gap between the L1 and the L2 but between features in input and output? Isn’t it?

Truscott (1998) suggests that we ignore the “noticing the gap” claim altogether:

Proponents of noticing also give much attention to noticing the gap – learners’ awareness of a  mismatch between the input and their current interlanguage (see especially Schmidt and Frota, 1986). It is important to avoid confusion between this idea, which necessarily involves awareness, and the more general notion of a comparison between input and interlanguage. Theories of unconscious acquisition naturally hypothesize an unconscious comparison process. Thus, arguments that learners must compare input to their interlanguage grammar (e.g., Ellis, 1994b) are not arguments for noticing.    

Since Schmidt says that the conscious comparison of input and interlanguage triggers noticing, Truscott surely contradicts Schmidt by claiming that ‘noticing the gap’ is nothing to do with ‘noticing’, doesn’t he?  On the other hand, isn’t Truscott right to challenge the claim that the only way L2 learners make progress in interlanguage development is through consciously attending to new features of the L2 that are present in the input?

It’s important to try to clarify what Schmidt’s noticing hypothesis says, and then to evaluate its claims, because these days it’s being used to support all manner of explicit teaching practices. Whether it’s presenting the present perfect in a grammar box, or making the explicit teaching of lexical chunks the number one priority in teaching, or using a red pen to indicate errors in a composition, it’s all OK because the Noticing Hypothesis says that bringing things to learners’ attention is a good thing. Schmidt’s construct has been watered down so much that it now means no more than noticing in the everyday meaning of the word.

Schmidt (1990) says:

subliminal language learning is impossible …… noticing is the necessary and sufficient condition for converting input into intake (Schmidt, 1990:  130).

This seems to say that input can’t get processed without being noticed. If it does, then all second language learning is conscious, a claim which is either trivially true (by adopting some very weak definition of ‘conscious’ or ‘learning’), or obviously false.

Schmidt says that the term ‘unconscious’ is used in three distinct senses:

  1. to describe learning without ‘intention’,
  2. to describe learning without metalinguistic ‘understanding’,
  3. to describe learning without attention and ‘awareness’.

He goes on to assert that although L2 learning without intention or metalinguistic understanding is clearly possible, there can be no learning without attention, accompanied by the subjective experience of being aware of – that is of ‘noticing’ – aspects of the surface structure of the input. Intake is

that part of the input which the learner notices … whether the learner notices a form in linguistic input because he or she was deliberately attending to form, or purely inadvertently.  If noticed, it becomes intake (Schmidt, 1990: 139).

So it seems that you can notice things purely inadvertently, without paying attention, but with focal awareness. What can this mean?

In the 1990 article, Schmidt says that in order to learn something new about the target language,

  • we notice = we pay attention = we are aware = we are focally aware
  • we deliberately attend to form
  • we can notice purely inadvertently – we perceive competing stimuli and may notice them if we choose
  • our focus of attention is on surface structures in the input
  • storage without conscious awareness is impossible

The primary evidence for the claim that noticing is a necessary condition for storage comes from studies in which the focus of attention is experimentally controlled. The basic finding, that memory requires attention and awareness, was established at the very beginning of research within the information processing model.

The 3 main sources I refer to are: Schmidt and Frota (1986), Schmidt (1990), and Schmidt (2001).

Schmidt claims that ‘noticing’ can be operationally defined as “the availability for verbal report”, “subject to various conditions”.  He adds that these conditions are discussed at length in the verbal report literature,  but he does not discuss the issue of operationalisation further until 2001, and even there he fails to provide any reliable way of knowing if and when ‘noticing’ is being used.

In the 2001 article Schmidt says that ‘noticing’ is related to attention. Attention as a psychological construct refers to a variety of mechanisms or subsystems (including alertness, orientation, detection within selective attention, facilitation, and inhibition) which control information processing and behaviour when existing skills and routines are inadequate. Hence, learning is “largely, perhaps exclusively a side effect of attended processing” (Schmidt, 2001: 25). But what does “attended processing” refer to? Is it ‘noticing’? Is attention the same as awareness? Recall Truscott:

current research and theory on attention, awareness and learning are not clear enough to support any strong claims about relations among the three. … they do not offer any basis for strong claims of the sort embodied in the Noticing Hypothesis (Truscott, 1998, p. 106).

Start again. ‘Noticing’ is part of the first stage of the process of converting input into implicit knowledge. Learners notice language features in the input, absorb them into their short-term memories, and compare them to features produced as output. Noticing takes place inside short-term memory, triggered by different influences, namely instruction, perceptual salience, frequency, skill level, task demands, and comparing.

So second language acquisition is a process that starts with input going through a necessary stage in short-term memory where “language features” are noticed. Surely not! All language features in the L2 shuffle through short-term memory and if unnoticed have to re-present themselves? No! As Gregg said in a comment on this blog:

Noticing is a perceptual act; you can’t perceive what is not in the senses, so far as I know. Connections, relations, categories, meanings, essences, rules, principles, laws, etc. are not in the senses.

Schmidt can’t expect us to accept that our knowledge of language is the result of noticing things in the input.

And how had the Noticing Hypothesis come to be accepted as an explanation of how input becomes intake, prior to processing and availability for integration into a learner’s developing interlanguage system? I found R. Ellis’ diagram, which is reproduced all over the place:

It appears to suggest that the 3 constructs of ‘noticing’, comparing and integrating are what turn input into output and explain IL development. Can it really be making such a claim? Where’s the noticing supposed to take place according to the figure? And what  is short/medium-term memory? Anyway, as Cross (2002) points out, Ellis (1994, 1997), Lewis (1993), Skehan (1998), Gass (1988), Batstone (1994), Lynch (2001), Sharwood-Smith (1981), Rutherford (1987) and McLaughlin (1987) all agree that noticing a feature in the input is an essential first step in language processing.

Long (2015), in SLA and TBLT, says:

With Nick Ellis and others, what I claim is that explicit learning (not necessarily as a result of explicit instruction) involves a new form or form–meaning connection being held in short-term memory long enough for it to be processed, rehearsed, and an initial representation stored in long-term memory, thereafter altering the operation of the way additional exemplars of the item in the input are handled by the default implicit learning process. It is analogous to setting a radio dial to a new frequency. The listener has to pay close attention to the initial crackling reception. Once the radio is tuned to the new frequency, he or she can sit back, relax, and listen to the broadcast with minimal effort. Ellis identifies what he calls the general principle of explicit learning in SLA: “Changing the cues that learners focus on in their language processing changes what their implicit learning processes tune” (Ellis 2005, p. 327). The prognosis improves for both simple and complex grammatical features, including fragile features, and for acquisition in general, if adult learners’ attention is drawn to problems, so that they are noticed (Schmidt 1990 and elsewhere). This is the first of four or five main stages in the acquisition process (Chaudron 1985; Gass 1997), in which what is noticed is held and processed in short-term, or working, memory long enough for it to be compared with what is in storage in long-term memory, and, as a result, a sub-set of input becomes intake.

A couple of pages on:

Noticing in Schmidt’s sense, where the targets are the subject of focal attention, facilitates the acquisition of new items, especially non-salient ones, and as Schmidt maintains, and as demonstrated by 20 years of studies, from Schmidt and Frota (1986) to Mackey (2006), “more noticing leads to more learning” (Schmidt 1994, p. 18).

And then:  

Crucially, however, as claimed by Gass (1997), and as embodied in the tallying hypothesis (N.C. Ellis 2002a,b), once a new form or structure has been noticed and a first representation of it established in long-term memory, Gass’ lower-level automatic apperception, and Tomlin and Villa’s detection, can take over, with incidental and implicit learning as the default process.

Gass claims that apperception is “the process of understanding by which newly observed qualities of an object are related to past experiences”. It “serves as selective cueing for the very first step of converting input into intake”. It “relates to the potentiality of comprehension of input, but does not guarantee that it will result in intake”.

What does “apperception relates to the potentiality of comprehension of input” mean? Perhaps it relates to Long’s statement

whether detection without prior noticing is sufficient for adult learning of new L2 items is still unclear – perhaps one of the single most critically important issues, for both SLA theory and LT, awaiting resolution in the field.

Long goes on to say:

So the first representation in long-term memory primes the learner to unconsciously perceive subsequent instances in the input. The big question is of course whether noticing is necessary for any representation to be established in long-term memory: is consciously attending to and detecting a form or form-meaning connection in the input the necessary first stage in the process of acquiring some features and form–meaning connections?  Long calls this “perhaps one of the single most critically important issues, for both SLA theory, and language teaching, awaiting resolution in the field”. 

Rather than seeing ‘noticing’ as the necessary and sufficient condition of SLA, Long says that incidental and implicit learning are still the main ways adults learn an L2, and that while noticing might facilitate the acquisition of “new items”, it’s still an open question as to whether it’s a necessary condition for acquisition.


To be continued.


Cross, J. (2002) ‘Noticing’ in SLA: Is it a valid concept? Downloaded from

Ellis, R. (1997) SLA Research and Language Teaching. OUP

Krashen, S. (1994) The input hypothesis and its rivals. In N. Ellis (Ed.), Implicit and explicit learning of language, (pp. 45-77). London: Academic Press.

Long, M.H. (2015) Second Language Acquisition and Task-Based Language Teaching. Wiley.

Schmidt, R.W. (1990) The role of consciousness in second language learning. Applied Linguistics 11, 129–58.

Schmidt, R. (2001) Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp.3-32). Cambridge University Press.

Schmidt, R. and Frota, S.N. (1986) Developing basic conversational ability in a second language: a case study of an adult learner of Portuguese. In Day, R.R., editor, Talking to learn: conversation in second language acquisition. Rowley, MA: Newbury House.

Truscott, J. (1998) Noticing in second language acquisition: A critical review. Second Language Research 14, 103-135.