Empiricist Emergentism

Introduction

Emergentism is an umbrella term referring to a fast-growing range of usage-based theories of SLA which adopt “connectionist” and associative learning views, based on the premise that language emerges from communicative use. Many proponents of emergentism, not least the imaginative Larsen-Freeman, like to begin by pointing to the omnipresence of complex systems which emerge from the interaction of simple entities, forces and events. Examples are:

The chemical combination of two substances produces, as is well known, a third substance with properties different from those of either of the two substances separately, or both of them taken together. Not a trace of the properties of hydrogen or oxygen is observable in those of their compound, water. (Mill 1842, cited in O’Grady, 2021).

Bee hives, with their carefully arranged rows of perfect hexagons, far from providing evidence of geometrical ability in bees actually provides evidence for emergence – The hexagonal shape maximizes the packing of the hive space and the volume of each cell and offers the most economical use of the wax resource… The bee doesn’t need to “know” anything about hexagons. (Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996, cited in O’Grady, 2021).

Larsen-Freeman’s own favorite is a murmuration of starlings, as in the photo above. In her plenary at the IATEFL 2016 conference, the eminent scholar seemed almost to float away herself, up into the rafters of the great hall, as she explained:

Instead of thinking about reifying and classifying and reducing, let’s turn to the concept of emergence – a central theme in complexity theory. Emergence is the idea that in a complex system different components interact and give rise to another pattern at another level of complexity.

A flock of birds part when approached by a predator and then they re-group. A new level of complexity arises, emerges, out of the interaction of the parts.

All birds take off and land together. They stay together as a kind of superorganism. They take off, they separate, they land, as if one.

You see how that pattern emerges from the interaction of the parts?

Personally, I fail to grasp the force of this putative supporting evidence for emergentism, which strikes me as unconvincing, not to say ridiculous. I find the associated claim (that complex systems exhibit ‘higher-level’ properties which are neither explainable nor predictable from ‘lower-level’ physical properties, but which nevertheless have causal and hence explanatory efficacy) slightly less ridiculous, but still unconvincing, and surely hard to square with empiricist principles. So, moving quickly on, let’s look at emergentist theories of language learning. Note that the discussion is mostly of Nick Ellis’ theory of emergentism, which he applies to SLA.

What Any Theory of SLA Must Explain

Kevin Gregg (1993, 1996, 2000, 2003) insists that any theory of SLA should do two things: (1) describe what knowledge is acquired (a property theory describing what language consists of and how it’s organised), and (2) explain how that knowledge is acquired (a causal transition theory). Chomsky’s principles and parameters theory offers a very technical description of “Universal Grammar”, consisting of clear descriptions of grammar principles which make up the basic grammar of all natural languages, and the parameters which apply to particular languages. It describes what Chomsky calls “linguistic competence” and it has served as a fruitful property theory guiding research for more than 50 years. How is this knowledge acquired? Chomsky’s answer is contained in a transition theory that appeals to an innate representational system located in a module of the mind devoted to language, and by innate mechanisms which use that system to parse input from the environment, set parameters, and learn how the particular language works.

But UG has come under increasing criticism. Critics suggest that UG principles are too abstract, that Chomsky has more than once moved the goal posts, that the “Language Acquisition Device” is a biologically implausible “black box”, that the domain is too narrow, and that we now have better ways to explain the phenomena that UG theory tackles. Increasingly, emergentist theories are regarded as providing better explanations.

Emergentist theories

There is quite a collection of emergentist theories, but we can distinguish between emergentists who rely on associative learning, and those who believe that “achieving the explanatory goals of linguistics will require reference to more than just transitional probabilities” (O’Grady, 2008, p. 456). In this first post, I’ll concentrate on the first group, and refer mostly to the work of its leading figure, Nick Ellis. The reliance on associative learning leads to this group often being referred to as “empiricist emergentists”.

Empiricist emergentists insist that language learning can be satisfactorily explained by appeal to the rich input in the environment and simple learning processes based on frequency, without having to resort to abstract representations and an unobservable “Language Acquisition Device” in the mind.

Regarding the question of what knowledge is acquired, the emergentist case is summarised by Ellis & Wulff (2020, pp. 64-65):

The basic units of language representation are constructions. Constructions are pairings of form and meaning or function. Words like squirrel are constructions: a form — that is, a particular sequence of letters or sounds — is conventionally associated with a meaning (in the case of squirrel, something like “agile, bushy-tailed, tree-dwelling rodent that feeds on nuts and seeds”).

In Construction Grammar, constructions are wide-ranging. Morphemes, idiomatic expressions, and even abstract syntactic frames are constructions:

sentences like Nick gave the squirrel a nut, Steffi gave Nick a hug, or Bill baked Jessica a cake all have a particular form (Subject-Verb-Object-Object) that, regardless of the specific words that realize its form, share at least one stable aspect of meaning: something is being transferred (nuts, hugs, and cakes).

Furthermore, some constructions have no meaning – they serve more functional purposes:

passive constructions, for example, serve to shift what is in attentional focus by defocusing the agent of the action (compare an active sentence such as Bill baked Jessica a cake with its passive counterpart A cake was baked for Jessica).

Finally,

constructions can be simultaneously represented and stored in multiple forms and at various levels of abstraction: table + s = tables; [Noun] + (morpheme -s) = “plural things”. Ultimately, constructions blur the traditional distinction between lexicon and grammar. A sentence is not viewed as the application of grammatical rules to put a number of words obtained from the lexicon in the right order; a sentence is instead seen as a combination of constructions, some of which are simple and concrete while others are quite complex and abstract. For example, What did Nick give the squirrel? comprises the following constructions:


• Nick, squirrel, give, what, do constructions
• VP, NP constructions
• Subject-Verb-Object-Object construction
• Subject-Auxiliary inversion construction

We can therefore see the language knowledge of an adult as a huge warehouse of constructions.

As to language learning, it is not about learning abstract generalizations, but rather about inducing general associations from a huge collection of memories: specific, remembered linguistic experiences.

The learner’s brain engages simple learning mechanisms in distributional analyses of the exemplars of a given form-meaning pair that take various characteristics of the exemplar into consideration, including how frequent it is, what kind of words and phrases and larger contexts it occurs with, and so on (Ellis & Wulff, 2020, p. 66).

The “simple learning mechanisms” amount to associative learning. The constructions are learned through “the associative learning of cue-outcome contingencies”, determined by factors relating to the form, the interpretation, the contingency of form and function, and learner attention. Language learning involves “the gradual strengthening of associations between co-occurring elements of the language”, and fluent language performance involves “the exploitation of this probabilistic knowledge” (Ellis, 2002, p. 173). Based on sufficiently frequent cues pairing two elements in the environment, the learner abstracts to a general association between the two elements.
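Error-driven associative learning of this kind is standardly formalised along the lines of the Rescorla–Wagner model, which the associative-learning literature Ellis draws on also uses. The following is a minimal sketch under that assumption; the function, its parameters, and the toy cue/outcome names are mine, purely illustrative:

```python
def rescorla_wagner(associations, cues, outcome, rate=0.1, lam=1.0):
    """One learning trial: strengthen each present cue's link to the
    outcome in proportion to the prediction error (Rescorla-Wagner)."""
    # Total prediction for this outcome from all cues present on the trial
    prediction = sum(associations.get((c, outcome), 0.0) for c in cues)
    error = lam - prediction          # how surprising the outcome was
    for c in cues:
        key = (c, outcome)
        associations[key] = associations.get(key, 0.0) + rate * error
    return associations

# Repeated cue-outcome pairings strengthen the link toward the asymptote lam.
assoc = {}
for _ in range(50):
    rescorla_wagner(assoc, cues=["/wʌn/"], outcome="one")
```

The update strengthens links fastest when the outcome is surprising, and the gains shrink as the association approaches asymptote, which is the general shape of frequency effects the theory appeals to.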

Here’s how it works:

When a learner notices a word in the input for the first time, a memory is formed that binds its features into a unitary representation, such as the phonological sequence /wʌn/ or the orthographic sequence one. Alongside this representation, a so-called detector unit is added to the learner’s perceptual system. The job of the detector unit is to signal the word’s presence whenever its features are present in the input. Every detector unit has a set resting level of activation and some threshold level which, when exceeded, will cause the detector to fire. When the component features are present in the environment, they send activation to the detector that adds to its resting level, increasing it; if this increase is sufficient to bring the level above threshold, the detector fires. With each firing of the detector, the new resting level is slightly higher than the previous one—the detector is primed. This means it will need less activation from the environment in order to reach threshold and fire the next time. Priming events sum to lifespan-practice effects: features that occur frequently acquire chronically high resting levels. Their resting level of activation is heightened by the memory of repeated prior activations. Thus, our pattern-recognition units for higher-frequency words require less evidence from the sensory data before they reach the threshold necessary for firing. The same is true for the strength of the mappings from form to interpretation. Each time /wʌn/ is properly interpreted as one, the strength of this connection is incremented. Each time /wʌn/ signals won, this is tallied too, as are the less frequent occasions when it forewarns of wonderland. Thus, the strengths of form-meaning associations are summed over experience.
The resultant network of associations, a semantic network comprising the structured inventory of a speaker’s knowledge of language, is tuned such that the spread of activation upon hearing the formal cue /wʌn/ reflects prior probabilities of its different interpretations (Ellis & Wulff, 2020, p. 67).
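The detector-unit story in the passage above is mechanical enough to sketch directly. In this toy Python version (the class, the names, and the numbers are mine, purely illustrative), a detector fires when input activation lifts it over threshold, and each firing raises its resting level, so frequently encountered words come to need less evidence:

```python
class Detector:
    """Toy version of the 'detector unit' described above: it fires when
    input activation pushes it over threshold, and each firing primes it
    by raising its resting level."""

    def __init__(self, resting=0.0, threshold=1.0, priming=0.05):
        self.resting = resting        # chronic activation level
        self.threshold = threshold    # firing threshold
        self.priming = priming        # how much each firing raises the resting level

    def present(self, activation):
        """Present sensory evidence; return True if the detector fires."""
        fired = self.resting + activation > self.threshold
        if fired:
            self.resting += self.priming   # primed: easier to fire next time
        return fired

one = Detector()
one.present(1.5)      # strong evidence: the detector fires, and is now primed
```

After enough encounters the resting level sits so high that even weak, degraded input triggers recognition, which is the frequency effect the passage describes.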

The authors add that other additional factors need to be taken into account, and this one is particularly important:

… the relationship between frequency of usage and activation threshold is not linear but follows a curvilinear “power law of practice” whereby the effects of practice are greatest at early stages of learning, but eventually reach asymptote.
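The claim is concrete enough to state as a formula: performance (e.g., response time) improves as a power function of the amount of practice, commonly written RT = a + b * N^(-c), so early trials yield large gains and later ones almost none. A sketch with illustrative, unfitted parameter values:

```python
def power_law_rt(trials, a=200.0, b=800.0, c=0.5):
    """Response time after `trials` practice events: a power function that
    falls steeply early on and flattens toward the asymptote `a`.
    Parameter values are illustrative, not fitted to any dataset."""
    return a + b * trials ** (-c)

# The speed-up from one extra practice trial shrinks as practice accumulates.
gains = [power_law_rt(n) - power_law_rt(n + 1) for n in (1, 10, 100)]
```

The curvilinear shape is the whole point: an account that predicted equal gains from the first and the thousandth exposure would not fit the lexical-recognition data Ellis cites.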

Evidence supporting this type of emergentist theory is said to be provided by computational models of associative learning in the form of connectionist networks. One example is Lewis & Elman’s (2001) demonstration that a Simple Recurrent Network (SRN) can, among other things, simulate the acquisition of agreement in English from data similar to the input available to children; another is the connectionist model reported in Ellis and Schmidt’s 1997 and 1998 papers.
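For readers unfamiliar with SRNs, the architecture is simple: at each time step the previous hidden state is copied into “context” units that feed back into the hidden layer, giving the network a memory of the sequence so far. The sketch below shows the forward pass only (no training), with toy sizes that are mine, not Lewis & Elman’s:

```python
import numpy as np

rng = np.random.default_rng(0)

class SRN:
    """Minimal Elman-style Simple Recurrent Network (forward pass only)."""

    def __init__(self, n_in, n_hid, n_out):
        self.W_ih = rng.normal(0, 0.5, (n_hid, n_in))   # input -> hidden
        self.W_ch = rng.normal(0, 0.5, (n_hid, n_hid))  # context -> hidden
        self.W_ho = rng.normal(0, 0.5, (n_out, n_hid))  # hidden -> output
        self.context = np.zeros(n_hid)

    def step(self, x):
        h = np.tanh(self.W_ih @ x + self.W_ch @ self.context)
        self.context = h.copy()       # Elman's trick: copy hidden state back
        y = self.W_ho @ h
        e = np.exp(y - y.max())
        return e / e.sum()            # distribution over next-symbol predictions

net = SRN(n_in=4, n_hid=8, n_out=4)
for t in range(3):                    # feed a 3-step one-hot toy sequence
    probs = net.step(np.eye(4)[t])
```

Trained by backpropagation to predict the next symbol, such a network picks up exactly the transitional probabilities in its input, which is what makes O’Grady’s later objection (that it would just as happily “learn” patterns no human language exhibits) bite.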

Discussion

There have been various criticisms of the empiricist version of emergentism as championed by Ellis, and IMHO, the articles by Eubank & Gregg (2002), and Gregg (2003) remain the most acute. I’ll use them as the basis for what follows.

a) Linguistic knowledge

Regarding their description of the linguistic knowledge acquired, Gregg (2003) points out that emergentists are yet to agree on any detailed description of linguistic knowledge, or even whether such knowledge exists. The doubt about whether there’s any such thing as linguistic knowledge is raised by extreme empiricists, such as the logical positivists and behaviourists discussed in my last post, and also by the eliminativists involved in connectionist networks, all of whom insist that the only knowledge we have comes through the senses; representational knowledge of the sort required to explain linguistic competence is outlawed. Ellis and his colleagues don’t share the views of these extremists: they accept that linguistic representations – of some sort or other – are the basis of our language capacity, but they reject any innate representations, and therefore they need not just to describe the organisation of the representations, but also to explain how the representations are learned from input from the environment.

O’Grady (2011) agrees with Gregg about the lack of consensus among emergentists as to what form linguistic knowledge takes; some talk of local associations and memorized chunks (Ellis 2002), others of a construction grammar (Goldberg 1999, Tomasello 2003), and others of computational routines (O’Grady 2001, 2005). Added to a lack of consensus is a lack of clarity and completeness. O’Grady’s discussion of Lewis & Elman’s (2001) Simple Recurrent Network (SRN), mentioned above, explains how it was able to mimic some aspects of language acquisition in children, including the identification of category-like classes of words, the formation of patterns not observed in the input, retreat from overgeneralizations, and the mastery of subject-verb agreement. However, O’Grady goes on to say that it raises the question of why the particular statistical regularities exploited by the SRN are in the input in the first place.

In other words, why does language have the particular properties that it does? Why, for example, are there languages (such as English) in which verbs agree only with subjects, but no language in which verbs agree only with direct objects?

Networks provide no answer to this sort of question. In fact, if presented with data in which verbs agree with direct objects rather than subjects, an SRN would no doubt “learn” just this sort of pattern, even though it is not found in any known human language.

There is clearly something missing here. Humans don’t just learn language; they shape it. Moreover, these two facts are surely related in some fundamental way, which is why hypotheses about how linguistic systems are acquired need to be embedded within a more comprehensive theory of why those systems (and therefore the input) have the particular properties that they do. There is, simply put, a need for an emergentist theory of grammar. (O’Grady, 2011, p. 4).  

In conclusion, then, some leading emergentists themselves agree that emergentism has not, so far, offered any satisfactory description of the knowledge of the linguistic system that is required of a property theory. An unfinished construction grammar that is brought to bear on “a huge collection of memories, specific, remembered linguistic experiences”, seems to be as far as they’ve got.  

b) Associative learning

Whatever the limitations of the emergentists’ sketchy account of linguistic knowledge might be, their explanation of the process of language learning (which is, after all, their main focus) seems to have more to recommend it, not least its simplicity. In the case of empiricist emergentists, the explanation relies on associative learning: learners make use of simple cognitive mechanisms to implicitly recognise frequently occurring associations among elements of language found in the input. To repeat what was said above, the theory states that constructions are learned through the associative learning of cue-outcome contingencies. Associations between co-occurring elements of language found in the input are gradually strengthened by successive encounters, and, based on sufficiently frequent cues pairing these two elements, the learner abstracts to a general association between them. To this simplest of explanations, a few other elements are attached, not least the “power law of practice”. In his 2002 paper on frequency effects in language processing, Ellis cites Kirsner’s (1994) claim that the strong effects of word frequency on the speed and accuracy of lexical recognition are explained by the power law of learning,

which is generally used to describe the relationships between practice and performance in the acquisition of a wide range of cognitive skills. That is, the effects of practice are greatest at early stages of learning, but they eventually reach asymptote. We may not be counting the words as we listen or speak, but each time we process one there is a reduction in processing time that marks this practice increment, and thus the perceptual and motor systems become tuned by the experience of a particular language (Ellis, 2002, p. 152).

Eubank & Gregg (2002, p. 239) suggest that there are many areas of language learning which the emergentist explanation can’t explain. For example:

Ellis aptly points to infants’ ability to do statistical analyses of syllable frequency (Saffran et al., 1996); but of course those infants haven’t learned that ability. What needs to be shown is how infants uniformly manage this task: why they focus on syllable frequency (instead of some other information available in exposure), and how they know what a syllable is in the first place, given crosslinguistic variation. Much the same is true for other areas of linguistic import, e.g. the demonstration by Marcus et al. (1999) that infants can infer rules. And of course work by Crain, Gordon, and others (Crain, 1991; Gordon, 1985) shows early grammatical knowledge, in cases where input frequency could not possibly be appealed to. All of which is to say, for starters, that such claims as that “learners need to have processed sufficient exemplars” (p. 40) are either outright false, or else true only vacuously (if “sufficient” is taken to range from as low a figure as 1).

Eubank & Gregg (2002, p. 240) also question emergentist use of key constructs. For example:

The Competition Model, for instance, relies heavily on the frequency (and reliability) of so-called “cues”. The problem is that it is nowhere explained just what a cue is, or what could be a cue; which is to say that the concept is totally vacuous (Gibson, 1992). In the absence of any principled characterization of the class of possible cues, an explanation of acquisition that appeals to cue-frequency is doomed to arbitrariness and circularity. (The same goes, of course, for such claims as Ellis’s [p. 54] that “the real stuff of language acquisition is the slow acquisition of form-function mappings,” in the absence of any criterion for what counts as a possible function and what counts as a possible form.)

In his (2003) article, Gregg has more to say about cues:   

The question then arises, What is a cue, that the environment could provide it? Ellis, for example, says, ‘in the input sentence “The boy loves the parrots,” the cues are: preverbal positioning (boy before loves), verb agreement morphology (loves agrees in number with boy rather than parrots), sentence initial positioning and the use of the article the’ (1998: 653). In what sense are these ‘cues’ cues, and in what sense does the environment provide them? What the environment can provide, after all, is only perceptual information, for example, the sounds of the utterance and the order in which they are made. (Emphasis added.) So in order for ‘boy before loves’ to be a cue that subject comes before verb, the learner must already have the concepts subject and verb. But if subject is one of the learner’s concepts, on the emergentist view, he or she must have learned that; the concept subject must ‘emerge from learners’ lifetime analysis of the distributional characteristics of the language input’, as Ellis (2002a: 144) puts it (Gregg, 2003, p. 120).

Connectionist Models

Gregg (2003) goes to some length to critique the connectionist model reported in Ellis and Schmidt’s 1997 and 1998 papers. The model was built to investigate “adult acquisition of second language morphology using an artificial second language in which frequency and regularity were factorially combined” (1997, p. 149). The experiment was designed to test “whether human morphological abilities can be understood in terms of associative processes” (1997, p. 145) and to show that “a basic principle of learning, the power law of practice, also generates frequency by regularity interactions” (1998, p. 309). The authors claimed that the network learned both the singular and plural forms for 20 nonce nouns, and also learned the ‘regular’ or ‘default’ plural prefix. In subsequent publications, Ellis claimed that the model gives strong support to the notion that acquisition of morphology is a result of simple associative learning principles and that the power law applies to the acquisition of morphosyntax. Gregg’s (2003) paper does a thorough job of refuting these claims.

Gregg begins by pointing out that connectionism itself is not a theory, but rather a method, “which in principle is neutral as to the kind of theory to which it is applied”. He goes on to point out the severe limitations of the Ellis and Schmidt experiment. In fact, the network didn’t learn the 20 nouns, or the 11 prefixes; it merely learned to associate the nouns with the prefixes (and with the pictures) – it started with the 11 prefixes, and was trained such that only one prefix was reinforced for any given word. Furthermore, the model was slyly given innate knowledge!   

Although Ellis accepts that linguistic representations – of some sort or other – are the basis of our language capacity, he rejects the nativist view that the representations are innate, and therefore he needs to explain how the representations are acquired. In the Ellis & Schmidt model, the human subjects were given pictures and sounds to associate, and the network was given analogous input units to associate with output units. But, while the human participants in the experiment were shown two pictures and were left to infer plurality (rather than, say, duality or repetition or some other inappropriate concept), the network was given the concept of plurality free as one of the input nodes (and was given no other concept). (Emphasis added.) Gregg comments that while nativists who adopt a UG view of linguistic knowledge can easily claim that the concept of plurality is innate, Ellis cannot do so, and thus he must explain how the concept of plurality has been acquired, not just make it part of the model’s structure. So, says Gregg, the model is “fudging; innate knowledge has sneaked in the back door, as it were”. Gregg continues:

Not only that, but it seems safe to predict that the human subjects, having learned to associate the picture of an umbrella with the word ‘broil’, would also be able to go on to identify an actual umbrella as a ‘broil’, or a sculpture or a hologram of an umbrella as representations of a ‘broil’. In fact, no subject would infer that ‘broil’ means ‘picture of an umbrella’. And nor would any subject infer that ‘broil’ meant the one specific umbrella represented by the picture. But there is no reason whatever to think that the network can make similar inferences (Gregg, 2003, p. 114).

Emergentism and Instructed SLA

Ellis and others who are developing emergentist theories of SLA stress that, at least for monolingual adults, the process of SLA is significantly affected by the experience of learning one’s native language. Children learn their first language implicitly, through associative learning mechanisms acting on the input from the environment, and any subsequent learning of more languages is similar in this respect. However, monolingual adult L2 learners “suffer” from the successful early learning of their L1, because that success results in implicit input processing mechanisms being set for the L1, and the knock-on effect is that these entrenched L1 processing habits work against them, leading them to apply entrenched habits to an L2 where they do not apply. Ellis argues that the filtering of L2 input through L1-established attractors leads to adult learners failing to acquire certain parts of the L2, which are referred to as its “fragile” features (a term coined by Goldin-Meadow, 1982, 2003). Fragile features are non-salient – they pass unnoticed – and they are identified as being one or more of infrequent, irregular, non-syllabic, string-internal, semantically empty, and communicatively redundant.

Ellis (2017) (supported by Long, 2015), suggests that teachers should use explicit teaching to facilitate implicit learning, and that the principal aim of explicit teaching should be to help learners modify entrenched automatic L1 processing routines, so as to alter the way subsequent L2 input is processed implicitly. The teacher’s aim should be to help learners to consciously pay attention to a new form or form–meaning connection and to hold it in short-term memory long enough for it to be processed, rehearsed, and an initial representation stored in long-term memory. Nick Ellis (2017) calls this “re-setting the dial”: the new, better exemplar alters the way in which subsequent exemplars of the item in the input are handled by the default implicit learning process.

It’s interesting to see what Long (2015, p. 50) says in his major work on SLA and TBLT:

A plausible usage-based account of (L1 and L2) language acquisition (see, e.g., N.C. Ellis 2007a,b, 2008c, 2012; Goldberg & Casenhiser 2008; Robinson & Ellis 2008; Tomasello 2003), with implicit learning playing a major role, begins with initially chunk-learned constructions being acquired during receptive or productive communication, the greater processability of the more frequent ones suggesting a strong role for associative learning from usage. Based on their frequency in the constructions, exemplar-based regularities and prototypical morphological, syntactic, and other patterns – [Noun stem-PL], [Base verb form-Past], [Adj Noun], [Aux Adv Verb], and so on – are then induced and abstracted away from the original chunk-learned cases, forming the basis for attraction, i.e., recognition of the same rule-like patterns in new cases (feed-fed, lead-led, sink-sank-sunk, drink-drank-drunk, etc.), and for creative language use.

In sum, … while incidental and implicit learning remain the dominant, default processes, their reduced power in adults indicates an advantage, and possibly a necessity (still an open question), for facilitating intentional initial perception of new forms and form–meaning connections, with instruction (focus on form) important, among other reasons, for bringing new items to learners’ focal attention. Research may eventually show such “priming” of subsequent implicit processing of those forms in the input to be unnecessary. Even if that turns out to be the case, however, opportunities for intentional and explicit learning are likely to speed up acquisition and so becomes a legitimate component of a theory of ISLA, where efficiency, not necessity and sufficiency, is the criterion for inclusion.

It should be obvious from the earlier discussion above that I’m persuaded by the criticisms of Eubank, Gregg, O’Grady (and many others!) to reject empiricist emergentism as a theory of SLA, and I confess to having felt surprised when I first read the quotation above. Never mind. What I think is interesting is that a different explanation of SLA – one which allows for innate knowledge, a “bootstrapping” view of the process of acquisition, and interlanguage development – has some important things in common with emergentism, which can be incorporated into a theory of ISLA (Instructed Second Language Acquisition). Such a theory needs to look more carefully at the effects of different syllabuses, materials and teacher interventions on students’ learning in different environments, in order to assess their efficacy, but I’m sure it will begin with the commonly accepted view among SLA scholars that, regardless of context, implicit learning drives SLA, and that explicit instruction can best be seen as a way of speeding up this implicit learning.

Conclusion

At the root of the problem of any empiricist account is the poverty of the stimulus argument. Gregg (2003, p. 101) summarises Laurence and Margolis’ (2001: 221) “lucid formulation” of it:

1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.

2. The correct set of principles need not be (and typically is not) in any pre-theoretic sense simpler or more natural than the alternatives.

3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.

4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.

5. Children do reliably arrive at the correct grammar for their language.

6. Therefore children are not empiricist learners. 

By adopting an associative learning model and an empiricist epistemology (where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations), emergentists have a very difficult job explaining how children come to have the linguistic knowledge they do. How can general conceptual representations acting on stimuli from the environment explain the representational system of language that children demonstrate? I don’t think they can.

In the next post, I’ll discuss William O’Grady’s version of emergentism.  

References

Bates, E., Elman, J., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1998).  Innateness and emergentism. In W. Bechtel & G. Graham (Eds.), A companion to cognitive science (pp. 590-601).  Basil Blackwell.

Ellis, N. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143-188.

Ellis, N. (2015). Implicit AND explicit language learning: Their dynamic interface and complexity. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 3-23). Amsterdam: John Benjamins.

Ellis, N., & Schmidt, R. (1997). Morphology and longer distance dependencies: Laboratory research illuminating the A in SLA. Studies in Second Language Acquisition, 19(2), 145-171.

Ellis, N., & Wulff, S. (2020). Usage-based approaches to L2 acquisition. In B. VanPatten, G. Keating, & S. Wulff (Eds.), Theories in Second Language Acquisition: An Introduction. Routledge.

Eubank, L. and Gregg, K. R. (2002) News Flash – Hume Still Dead. Studies in Second Language Acquisition, 24, 2, 237-248.

Gregg, K. R. (1993). Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics 14, 3, 276-294.

Gregg, K. R. (1996). The logical and developmental problems of second language acquisition.  In Ritchie, W.C. and Bhatia, T.K. (eds.) Handbook of second language acquisition. Academic Press. 

Gregg, K. R. (2000). A theory for every occasion: postmodernism and SLA.  Second Language Research 16, 4, 34-59.

Gregg, K. R. (2001). Learnability and SLA theory. In Robinson, P. (Ed.) Cognition and Second Language Instruction.  CUP.

Gregg, K. R. (2003) The State of Emergentism in Second Language Acquisition.  Second Language Research, 19, 2, 95-128. 

O’Grady, W., Lee, M. & Kwak, H. (2011) Emergentism and Second Language Acquisition. In W. Ritchie & T. Bhatia (eds.), Handbook of Second Language Acquisition. Emerald Press.

O’Grady, W.(2011).  Emergentism. In Hogan, P. (ed). The Cambridge Encyclopedia of Language Sciences, Cambridge University Press.

Seidenberg, M., & MacDonald, M. (1997). A probabilistic constraints approach to language acquisition and processing. Cognitive Science, 23(4), 569–588.

What Is Empiricism?

Introduction

Emergentist theories of language learning are now so prevalent that their effects are being felt in the ELT world, where leading teacher educators refer to various emergentist constructs (e.g., priming, constructions, associative learning) and increasingly adopt what they take to be an emergentist view of L2 learning. Within emergentism, there is an interesting difference of opinion between those (probably the majority) who follow the “input-based” or “empiricist” emergentist approach proposed by Nick Ellis, and those who support the “processor” approach of William O’Grady. In preparation for a revised post on emergentism, I here discuss empiricism.

Rationalism vs Empiricism

Descartes (1969 [1637]) describes how he examined a piece of wax. It had a certain shape, colour and dimension. It had no smell, it made a dull thud when struck against the wall, and it felt cold. When Descartes heated the wax, it started to melt, and everything his senses had told him about the wax turned to its opposite – the shape, colour and dimensions changed, it gave off a pungent odour, it made little sound and it felt hot. How then, asked Descartes, do I know that it is still a piece of wax? He adopted a totally sceptical approach, supposing that a demon was doing everything possible to delude him. Perhaps it wasn’t snowing outside, perhaps it wasn’t cold, perhaps his name wasn’t René, perhaps it wasn’t Thursday. Was there anything that could escape the demon hypothesis? Was there anything that Descartes could be sure he knew? His famous conclusion was that the demon could not deny that he thought, that he asked the question “What can I know?” Essentially, then, his capacity to think, to reason, was the only reliable source of knowledge – hence Descartes’ famous “Cogito ergo sum”: I think, therefore I am. Descartes based his philosophical system on the innate ability of the thinking mind to reflect on and understand our world. We are, in Descartes’ opinion, unique in having the ability to reason, and it is this capacity to reason that allows us to understand the world.

But equally important to the scientific revolution of the early 17th century was the empirical method championed by Francis Bacon. In The Advancement of Learning (Bacon, 1974 [1605]), Bacon claimed that the crucial issue in philosophy was epistemological – the question of how to obtain reliable knowledge – and proposed that empirical observation and experiment should be recognised as the way to obtain it. (Note that empirical observation means observation of things in the world that we experience through our senses; it is not to be confused with the epistemological position adopted by the empiricists – see below.) Bacon’s proposal is obviously at odds with Descartes’ argument: it claims that induction, not deduction, should guide our thinking. Bacon recommends a bottom-up approach to scientific investigation: carefully conducted empirical observations should be the firm base on which science is built. Scientists should dispassionately observe, measure and take note, in such a way that, step by careful step, checking continuously along the way that the measurements are accurate and that no unwarranted assumptions have crept in, they accumulate such an uncontroversial mass of evidence that they cannot fail to draw the right conclusions from it. Thus they finally arrive at an explanatory theory of the phenomena being investigated, whose truth is guaranteed by the careful steps that led to it.

In fact, if one actually stuck to such a strictly empirical programme, it would be impossible to arrive at any general theory, since there is no logical way to derive generalisations from facts (see Hume, below). Equally, it is impossible to develop a rationalist epistemology from Descartes’ “Cogito ergo sum”, since the existence of an external world does not follow. In both cases, compromises were needed, and, in fact, more “practical” inductive and deductive processes were both used in the development of scientific theories, although we can note the differences between the more conservative discoverers and the more radical inventors and “big theory” builders, throughout the development of modern science in general, and in the much more recent and restricted development of SLA theory in particular. Larsen-Freeman and Long (1991), for example, talk about two research traditions in SLA: “research then theory”, and “theory then research”, and these obviously correspond to the inductive and deductive approaches respectively.    

In linguistics, the division between the “empiricist” and “rationalist” camps is noteworthy for its incompatibility. The empiricists, who held sway, at least in the USA, until the 1950s, and whose most influential member was Bloomfield, saw their job as field work: equipped with tape recorders and notebooks, researchers recorded thousands of hours of actual speech in a variety of situations and collected samples of written text. The data was then analysed in order to identify the linguistic patterns of a particular speech community. The emphasis was very much on description and classification, and on highlighting the differences between languages. We might call this the botanical approach; its essentially descriptive, static, “naming of parts” methodology depended for its theoretical underpinnings on the language learning explanation provided by the behaviourists.

Behaviourism

Behaviourism was first developed in the early twentieth century by the American psychologist John B. Watson, who attempted to make psychological research “scientific” by using only objective procedures, such as laboratory experiments designed to establish statistically significant results. Watson (see Toates and Slack, 1990: 252-253) formulated a stimulus-response theory of psychology according to which all complex forms of behaviour are explained in terms of simple muscular and glandular elements that can be observed and measured. No mental “reasoning”, no speculation about the workings of any “mind”, was allowed. Thousands of researchers adopted this methodology, and from 1920 until the 1950s an enormous amount of research on learning in animals and in humans was conducted under this strict empiricist regime. By 1950, behaviourism could justly claim to have achieved paradigm status, and at that moment B.F. Skinner became its new champion. Skinner’s contribution to behaviourism was to challenge the stimulus-response idea at the heart of Watson’s work and replace it with a type of psychological conditioning known as reinforcement (see Skinner, 1957, and Toates and Slack, 1990: 268-278). Note the same insistence on a strict empiricist epistemology (no “reasoning”, no “mind”, no appeal to mental processes), and the claim that language is learned in just the same way as any other complex skill – through social interaction.

In sharp contrast to the behaviourists and their rejection of “mentalistic” formulations is the approach to linguistics championed by Chomsky.  Chomsky (in 1959 and subsequently), argued that the most important thing about languages was the similarities they shared, what they have in common, not their differences. In order to study these similarities, Chomsky assumed the existence of unobservable mental structures and proposed a “nativist” theory to explain how humans acquire a certain type of knowledge.  A top-down, rationalist, deductive approach is evident here.        

The Empiricists

But let’s return to empiricism. In the seventeenth and eighteenth centuries, a new movement in philosophy, known as empiricism, developed, its most influential proponents being Locke, Hume and, later, Mill. In a much more radical, more epistemologically-formulated statement of Bacon’s views, the British empiricists argued that everything the mind knows comes through the senses. As Hume put it: “The mind has never anything present to it but the perceptions” (Hume, 1988 [1748]: 145). Starting from the premise that only “experience” (all that we perceive through our senses) can help us to judge the truth or falsity of factual sentences, Hume argued that reliable knowledge of things was obtained by observing the relevant quantitative, measurable data in a dispassionate way. This is familiar territory – Bacon again, we might say – but the argument continues in a way that has dire consequences for rationalism.

If, as Hume claims, knowledge rests entirely on observation, then there is no basis for our belief in natural laws: we believe in laws and regularities only because of repetition. For example, we believe the sun will rise tomorrow because it has repeatedly done so every 24 hours, but the belief is an unwarranted inductive inference. As Hume so brilliantly insisted, we can’t logically go from the particular to the general: it is an elementary, universally accepted tenet of formal logic that no amount of cumulative instances can justify a generalisation. No matter how many times the sun rises in the East, or thunder follows lightning, or swans appear white, we will never know that the sun rises in the East, or that thunder follows lightning, or that all swans are white. This is the famous “logical problem of induction”. To be clear, the empiricists don’t claim that we have empirical knowledge – they limit themselves to the claim that knowledge can only be gained, if at all, by experience. And if the rationalists are right to claim that experience cannot give us knowledge, the conclusion must be that we do not know anything at all. Hume’s position with regard to causal explanation is the same: such explanations can’t count as reliable knowledge; they are only presupposed to be true in virtue of a particular habit of our minds.

The positivists tried to answer Hume’s devastating critique.

Positivism

Positivism refers to a particularly radical form of empiricism. Comte invented the term, arguing that each branch of knowledge passes through “three different theoretical states: the theological or fictitious state; the metaphysical or abstract state; and, lastly, the scientific or positive state” (Comte, 1830, cited in Ryan, 1970: 36). At the theological stage, the will of God explains phenomena; at the metaphysical stage, phenomena are explained by appealing to abstract philosophical categories; and at the scientific stage, any attempt at absolute explanations of causes is abandoned. Science limits itself to how observational phenomena are related. Mach, the Austrian philosopher and physicist, headed the second wave, which rooted out the “contradictory” religious elements in Comte’s work, and took advantage of further progress in the hard sciences to insist on purging all metaphysics from the scientific method (see Passmore, 1968: 320-321).

The third wave of positivists, whose members were known as the Vienna Circle, included Schlick, Carnap, Gödel and others, and had Russell, Whitehead and Wittgenstein as interested parties (see Hacking, 1983: 42-44). They developed a programme based on the argument that true science could only be achieved by:

  1. Completely abandoning metaphysical speculation and any form of theology. According to the positivists, such speculation only proposed and attempted to solve “pseudo-problems” which lacked any meaning, since they were not supported by observable, measurable, experimental data.
  2. Concentrating exclusively on the simple ordering of experimental data according to rules. Scientists should not speak of causes: there is no physical necessity forcing events to happen, and all we have in the world are regularities between types of events. There is no room in science for unobservable or theoretical entities.

The programme was a complete fiasco: none of its objectives were realised, and the movement disbanded in the 1930s. “Positivism” in general, and as expounded in the writings of the Vienna Circle in particular, is, in my opinion, a good example of philosophers stubbornly marching up a blind alley. It was a fundamentally mistaken project, as Popper (1959) demonstrated, and as Wittgenstein (1953) himself came to recognise. We may note that critics of psycholinguistic theories of SLA who label their opponents “positivists” are either ignorant of the history of positivism or making a strawman case against what they consider to be a mistakenly “scientific” approach to research. We may also note that empiricism as an epistemological system, if taken to its extreme, leads to a dead end of radical scepticism and solipsism. Therefore, when looking at current discussions among scholars of SLA, it’s of the utmost importance to distinguish between a radical empiricist epistemology on the one hand, and an appeal to empirical evidence on the other.

The start of the psycholinguistic study of SLA

To conclude, we’ll look briefly at how behaviourism was superseded by Chomsky’s UG, thus ending – for a while anyway! – the hold that empiricism had enjoyed over theories of language learning.  

Chomsky’s Syntactic Structures (1957), followed by his 1959 review of Skinner’s Verbal Behavior (1957), marked the beginning of probably the fastest, biggest, most complete revolution in science that had been seen since the 1930s. Before Chomsky, as indicated above, the field of linguistics was dominated by a Baconian, empiricist methodology, where researchers saw their job almost exclusively as the collection of data. All languages were seen as composed of a set of meaningful sentences, each composed of a set of words, each in turn composed of phonemes and morphemes. Each language also had a grammar which determined the ways in which words could be correctly combined to form sentences, and how the sentences were to be understood and pronounced. The best way to understand the several thousand languages said to exist on earth was to collect and sort data about them, so that eventually the patterns characterising the grammar of each language would emerge, and then interesting differences among languages, and even groups of languages, would also emerge.

Chomsky’s revolutionary argument (Chomsky, 1957, 1965, 1986) was that all human beings are born with innate knowledge of grammar – a fixed set of mental rules that enables young children to relatively quickly understand the language(s) they’re exposed to and to create and utter sentences they’ve never heard before. Language consists of a set of abstract principles that characterise the core grammars of all natural languages, and learning language is simplified by reliance on an innate mechanism that constrains possible grammar formation. Children don’t have to learn key, universal features of the particular language(s) to which they are exposed because they know them already. The job of the linguist was now to describe this generative, or universal, grammar, as rigorously as possible.

The arguments for Universal Grammar (UG) start with the poverty of the stimulus argument: young children’s knowledge of their first language can’t be explained by appealing to the actual, attested language they are exposed to. On the basis of the input young children get, they produce language which is far more complex and rule-based than could be expected, and which is very similar to that of adult native speakers of the same language variety, at an age when they have difficulty grasping abstract concepts. That their production is rule-based and not mere imitation, as the behaviourist view held, is shown by the fact that they frequently invent unique, well-formed utterances of their own. That they have an innate capacity to discern well-formed utterances is supported by evidence from tens of thousands of studies (see, for example, Cook & Newson, 1996).

I won’t continue with a “defence of UG”. Suffice it to say that Chomsky’s work inspired the development of a psycholinguistic approach which saw L2 learning as a process going on in the mind. Beginning with error analysis and the morpheme studies, this cognitive approach made uneven progress, but Selinker’s (1972) paper, arguing that L2 learners develop their own autonomous mental grammar (an interlanguage grammar) with its own internal organising principles, is an important landmark. I’ve done a series of posts on all this, Part 8 of which discusses emergentist theories. As indicated, I’m not happy with Part 8, and in the next post I’ll offer a revised version, in which Nick Ellis’ “empiricist” emergentism and William O’Grady’s “mentalist” emergentism will be discussed.

References

Bacon, F. (1974 [1605]). The Advancement of Learning: New Atlantis. Ed. A. Johnston. Clarendon.

Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.

Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, N. (1976). Reflections on Language. Temple Smith.

Cook, V. J. & Newson, M. (1996). Chomsky’s Universal Grammar: An Introduction. Blackwell.

Descartes, R. (1969 [1637]). Discourse on Method. In Philosophical Works of Descartes, Volume 1. Trans. E. Haldane and G. Ross. Cambridge University Press.

Ellis, N. C. (2011). The emergence of language as a complex adaptive system. In J. Simpson (Ed.), Handbook of Applied Linguistics (pp. 666–79), Routledge.

Hacking, I. (1983). Representing and Intervening. Cambridge University Press.

Hume, D. (1988 [1748]). An Enquiry Concerning Human Understanding. Prometheus.

Larsen-Freeman, D. & Long, M. H. (1991). An introduction to second language acquisition research. Longman.

Popper, K. R. (1959). The Logic of Scientific Discovery.  Hutchinson.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10, 209-231.

Skinner, B. F. (1957). Verbal behavior.  Appleton-Century-Crofts.

Toates, F. and Slack, I. (1990). Behaviourism and its consequences.  In Roth, I. (ed.) Introduction to Psychology. Psychology Press. 

Wittgenstein, L. (1953) Philosophical Investigations. Translated by G.E.M. Anscombe. Basil Blackwell.

How difficult is reactive feedback?

Introduction

In my previous post about Dellar’s comments on Dogme, I suggested that there are two big differences between using the Dogme approach and using a coursebook. First, in Dogme lessons there’s no planned explicit teaching of grammar or pronunciation, and second, a lot more classroom time is spent on engaging students in spontaneous conversation and oral exchanges. These differences are the result of fundamental differences between two distinct approaches to ELT, which I’ve discussed in lots of posts on this blog. They turn on the use of synthetic and analytic syllabuses.

Coursebooks implement a synthetic syllabus, where the role of speaking activities is to consolidate the explicit teaching of pre-selected language “items” which are the focus of each unit of the coursebook. For example, if the second conditional and tourism are among the items covered in Unit 3, then one of the speaking activities in that unit might be to discuss how you’d react to missing your flight, or losing your passport, or having your credit card declined in a restaurant. The speaking activity is designed to practise the pre-taught “second conditional” and vocabulary about tourism. The speaking activity doesn’t last long, and the language that students are expected to use is as predictable as it is unlikely to occur – “If my credit card was/were declined, I’d pay cash”, for example, is an improbable student utterance.

In the post, I argued that Dellar’s description of “what teachers need to be doing” while students engage in the tasks that typify a Dogme lesson is based on a misunderstanding of the extended communicative tasks that lie at the heart of Dogme. The assumption (made by those who use an analytic syllabus) is that most of the learning goes on implicitly while students do the task, and thus the attention to explicit teaching that Dellar demands is largely unnecessary. However, I added that reactive feedback (what Long calls “focus on form”) given during the task, and a feedback session after the task, are important parts of the Dogme approach.

A Twitter Exchange

The post sparked comments on Twitter about giving reactive feedback which took me rather by surprise. Matt Bury tweeted:   

I think one criticism is valid: It puts more emphasis on dynamic scaffolding (highly demanding) & less on planned scaffolding (minimally demanding) so considering the high teaching workload that most ELTs work under…

He continued:

I mean that dynamic scaffolding (i.e. During the lesson, Dellar’s “big ask”) is more demanding than planned scaffolding (i.e. Coming into the classroom having thought through the scaffolding your students will need to successfully use the language).  Planned scaffolding also enables ELTs to re-use it (systematically recorded & adaptable resources) & plan more strategically across several lessons at a time, thereby increasing the quality of instruction.

And he concluded:

In the long run, planned scaffolding tends to be more efficient, more effective & less work than dynamic scaffolding.

Chris Jones later made a similar point. He tweeted:

Nothing against Dogme but you assume teachers at any level of experience and in any situation can easily tune in to what students say, pick up on errors, decide where to focus and then provide recasts. I don’t think that is easy or actually realistic in many cases.

Asked to clarify “tune in” Jones replied:

Yes, tune in to the language students are using in order to help them with it. Are they not coming to class for that? Even if this way were practical, you know that you need experience to have some idea of how to scaffold and what to recast and what to leave.

Scott Thornbury chipped in:

Fair point Chris. But you don’t get good at it if you don’t try it.

to which Sue Leather responded:

Yes, that’s true. But… though I’m very much a supporter of dogme, as a trainer I have found it a hard sell in certain (cultural) contexts. Perhaps when teachers don’t have full confidence in their own English and/or pedagogic skills….

My initial reaction to these comments was one of surprise. I had assumed that teachers could quite quickly, and without much difficulty, learn how to give reactive feedback to students while they’re engaged in tasks, and how to conduct a subsequent feedback session. Well, it seems I might have seriously underestimated the difficulties. Let me explain my view.

My view of reactive feedback

In order to use the Dogme approach, teachers need to understand the purpose of reactive feedback – or “focus on form”, as it is widely referred to (Long, 2015, pp. 27-28) – and to appreciate that it’s a radical alternative to “focus on forms”. Focus on forms makes the explicit teaching of a pre-selected series of linguistic items the main content of the syllabus. In contrast, Dogme adopts a “focus on form” approach, which involves the brief, reactive use of a variety of pedagogic procedures, ranging from recasts to the provision of simple grammar “rules” and explicit error correction, designed to draw learners’ attention, in context, to target items that are proving problematic. The objective of focus on form is to draw students’ attention to items which they might otherwise neither detect nor notice for a long time, thereby speeding up the learning process. Furthermore, following the research of Nick Ellis (2005, 2006) and colleagues who adopt an emergentist theory of SLA, focus on form can create and store a first impression, or “trace” as Ellis calls it, of the item in long-term memory, thereby increasing the likelihood that the item will subsequently be detected when examples are encountered during implicit input processing.

Focus on form is one of the principal ways that an analytic syllabus deals with formal aspects of the L2. An analytic syllabus expects learners to work out how the target language works for themselves, through exposure to input and through using the language to perform communicative tasks. There is no overt or covert linguistic syllabus; more attention is paid to message and pedagogy than to language. The assumption, supported by research findings in SLA, is that, much in the way children learn their L1, adults can best learn an L2 implicitly, through using it. Analytic syllabuses are implemented using spoken and written activities and texts, modified for L2 learners and chosen for their content, interest value and comprehensibility. Classroom language use is predominant, while grammar presentations and drills are seldom employed.

So, the first step for teachers wanting to adopt a Dogme approach is to share the view of most SLA scholars that language learning is predominantly a matter of learning by doing, of implicit learning, and that, therefore, reactive feedback should play an important, but relatively minor, role in L2 learning. Now, I appreciate that most second language teacher education (SLTE) today, particularly pre-service “training” courses like CELTA, fails to give proper attention to how people learn an L2, and that, as a result, most teachers are unaware of the prime importance of implicit learning, and of how the efficacy of explicit teaching is determined by learners’ readiness to learn, i.e., by the current state of their dynamic interlanguage trajectory. Most pre-service teachers are taught how to use coursebooks, and, as a result, they’re encouraged to assume, wrongly, that explaining bits of the L2 is a necessary prior step to practising them. The solution to this dire problem is obvious: a module devoted to how people learn an L2 should be a necessary part of SLTE.

Once teachers understand the psychological processes involved in L2 learning, and the role of reactive feedback, they need to get the hang of using it. As I said in the previous post, Dellar’s list of what teachers “need to be doing” while students carry out a Dogme task is based on ignorance of the SLA literature and on what he thinks teachers should do during the speaking activities found in coursebooks. In fact, teachers with an understanding of how people learn an L2, who consequently opt to use an analytic syllabus, including Dogme and many TBLT syllabuses, rarely find it difficult to get students speaking, to move from group to group listening in on what they’re saying, or to notice gaps in their language. And they don’t, pace Dellar, have to think about what they’re going to gap on the board, or what questions they’re going to ask about it, or how they’re going “to get the students to use some of that language”.

The Problem

Nevertheless, during the task, teachers have to use a variety of pedagogic procedures in reaction to breakdowns in communication and certain errors, and they have to lead feedback sessions afterwards.

So how difficult do teachers find these pedagogic procedures?

The problem here is that, as Chris Jones said in his tweets, there’s very little empirical evidence from studies of Dogme classes to help us answer that question. He’s quite right, but, in disagreement with Jones, I think that teachers implementing many TBLT syllabuses (including the types described by Long, Skehan, N. Ellis, R. Ellis, Robinson, and Dave and Jane Willis – see Ellis & Shintani (2016) for a review) engage in the same kind of reactive feedback and feedback sessions as Dogme teachers, and thus we do have considerable evidence from studies on the effects of these pedagogic interventions in TBLT. Nevertheless, we don’t have much evidence about how difficult teachers feel it is to do this kind of teaching.

I’ve taught English as an L2 for over thirty years, rarely used a coursebook, and I don’t remember ever thinking that giving reactive feedback or leading feedback sessions was any more difficult than other elements of the job. That’s mostly because I taught in contexts where I got a lot of support from bosses (who, in the 1980s and early 1990s, organised and/or paid for courses on using analytic syllabuses) and from colleagues who participated with me in ongoing CPD, where we honed the skills needed to do learner-centred teaching in which explicit (grammar) teaching took a back seat. Those years are long gone. As I said above, ELT is currently dominated by coursebooks, with the result that teachers don’t get the training, practice and support they need to switch to a different type of teaching.

The Answer   

I think the best way for teachers to learn how to incorporate reactive feedback and follow-up feedback sessions into their teaching is to read up on it (I recommend the section on corrective feedback in Ellis & Shintani’s (2016) Exploring Language Pedagogy through Second Language Acquisition Research as a good starting place), to watch experienced teachers at work, and then to get experience doing it, ideally with the support of colleagues. Of course it takes time and practice to get good at making brief interventions during a task, at taking notes of interesting bits of emerging language, and at using these notes to lead a follow-up feedback session. But from what Scott Thornbury tells me about the teachers who’ve done Dogme courses, and from what I know about the teachers who’ve done TBLT courses, they learn fast, they feel it’s worth the effort, and they feel that it helps them to become more effective teachers.

Context

Scott Thornbury makes a point (particularly when discussing ELT with me, I can assure you!) of emphasizing just how much context affects how we teach. He’s right to do so, but I think he’d agree with me that arguments suggesting that certain contexts are not “suitable” for Dogme or TBLT are mostly bogus. It is simply not true that Dogme or TBLT are not “appropriate” for certain cultural contexts, or for certain government regimes, or for big classes, or for certain types of learners – beginners, young learners, the elderly, etc. Many studies (see, e.g., the meta-analyses of Cobb (2010) and Bryfonski & McKay (2019)) show that TBLT gets excellent results in a very wide variety of contexts, and it seems to me reasonable to argue that Dogme and TBLT can be adapted to any context.

“Non-native” Teachers

Sue Leather’s suggestion that Dogme meets resistance among “teachers who don’t have full confidence in their own English and/or pedagogic skills” is, I think, extremely important. I should preface this discussion by making it clear that I condemn any discrimination against teachers of English as an L2 based on the fact that English is not their L1. There are countless great teachers of English as an L2 whose L1 is not English, and I support ongoing attempts to outlaw the practice, in any school or university, of demanding that teachers of English as an L2 have English as their L1. I’ll now discuss the problem of the many non-native speaker teachers whose lack of proficiency affects their teaching, summarising part of Chapter 10 of Jordan & Long (2022).

More than 90% of those currently teaching English as a foreign language are non-native English speakers (British Council, 2015). Most non-native English speaker teachers work in their own countries, where the government’s Ministry of Education produces a curriculum and stipulates the entry level qualifications required to work as an English teacher.

In China, for example, the Ministry of Education launched a nationwide BA program in TEFL in 2003, which became the recognized pre-service English teacher education program for those wishing to teach English as a foreign language in primary, secondary and tertiary education in China. Studies by Zhan (2008), Hu (2003, 2005) and Yan (2012) revealed that many of the student teachers had considerable difficulties in expressing themselves clearly and fluently in English, and that their lack of confidence in speaking English contributed significantly to the subsequent ‘mismatch’ between the objectives of the course and the ways the student teachers later did their jobs in their local contexts. The course’s promotion of communicative language teaching failed to change the type of teaching the student teachers subsequently carried out: in their classrooms, “the tyranny of the prescribed textbook” was still in evidence (Zhan, 2008, p. 62). The studies by Hu (2003) and Yan (2012) support the general view that, despite being told of the value of CLT, and despite stating in their answers to researchers’ questions that they firmly believed in the value of communicative activities, when the teachers’ classes were observed, it became obvious that their lessons were teacher-fronted and that the vast majority of the time was spent using a coursebook to instill knowledge about English grammar and vocabulary.

Similar results were found in studies carried out in other countries. Regarding the language problem, a 1994 study by Reves & Medgyes asked 216 native speaker and non-native speaker English teachers from 10 countries (Brazil, former Czechoslovakia, Hungary, Israel, Mexico, Nigeria, Russia, Sweden, former Yugoslavia, and Zimbabwe) about their experiences as teachers. The overwhelming majority of participants were non-native speakers of English, and in their responses, 84% of the non-native speaker subjects said that they had serious difficulties using English and that their teaching was adversely affected by these difficulties. Difficulties with vocabulary and fluency were most frequently mentioned, followed by pronunciation and listening comprehension. It’s ironic that the problem goes back to failures in their own teachers’ ability to implement a CLT approach.

Cultural factors play their part in how teachers go about their job, of course, but I reject the suggestion that Chinese teachers, for example, are so imbued with a Zen cultural heritage that they find it impossible to abandon centuries-old teaching practices. Appealing to cultural stereotypes in this way is surely offensive and ignores the real experiences of Chinese teachers, many of whom sincerely desire change. Likewise, in other parts of the world where a mismatch between pre-service English teaching courses and outcomes has been found, explanations which stress differences in cultures and in teachers’ subjective ‘knowledge bases’ fail to give enough attention to the constraints imposed by objective factors, including the fact that many teachers lack confidence in their English.

Just to complete the picture, we should appreciate that the state-run English teaching courses offered in China and elsewhere are based on interpretations of CLT which stem not so much from the ideas that emerged in the 1970s as from more recent ones, promoted by commercial ELT companies, which work to maximize profits and are therefore keen to package ELT into marketable products. The CELTA course is a good example: it is an easily marketable, highly profitable product in itself, and it involves the use of other, related, well-packaged, marketable products, including coursebooks and exams. It is only to be expected that SLTE courses designed by Cambridge English should encourage coursebook-driven ELT, and it is equally predictable that the British Council, with its own chain of English language schools and close ties with Cambridge English, should do the same. Likewise, when overseas ministries of education turn to Cambridge English and other such providers for help in introducing a communicative approach to ELT, it is to be expected that these providers recommend using their own products, coursebooks and tests among them. We could hardly expect them to encourage the implementation of Dogme or TBLT! Nor can we expect the Chinese Ministry of Education (or the Turkish, or Vietnamese, or Brazilian, etc., ministries) to appreciate the differences between “real” CLT and what the British Council, Cambridge University Press, and others say it is.

Conclusion

It’s been salutary for me to read the comments made on Twitter by Matt Bury, Chris Jones and others which highlight the difficulties of pedagogic procedures which I had assumed weren’t particularly challenging. There’s no doubt that we need more research investigating how teachers actually carry out reactive feedback and follow-up feedback sessions, and that more attention should be paid to Instructed Second Language Acquisition. We also need more discussion among teachers, and I think the Twitter exchanges show that, despite their limitations, they can be very useful in raising important concerns.

I end with Matt’s final comment:

 To be clear, I share Geoff Jordan’s criticisms of how ELT coursebooks are typically designed; I think the gap between Instructed SLA theory vs coursebook instruction couldn’t feasibly be bigger.

References

British Council (2015). The English Effect Report. Retrieved March 15, 2021 from https://www.britishcouncil.org/sites/default/files/english-effect-report-v2.pdf

Bryfonski, L., & McKay, T. H. (2019). TBLT implementation and evaluation: A meta-analysis. Language Teaching Research, 23(5), 603–632.

Cobb, M. (2010). Meta-analysis of the effectiveness of task-based interaction in form-focused instruction of adult learners in foreign and second language teaching. Doctoral dissertation. https://repository.usfca.edu/diss/389

Ellis, R. & Shintani, N. (2016). Exploring Language Pedagogy through Second Language Acquisition Research. Routledge.

Hu, G. (2003). English language teaching in China: Regional differences and contributing factors. Journal of Multilingual and Multicultural Development, 24(4), 290–318.

Hu, G. (2005). English language education in China: Policies, progress, and problems. Language Policy, 4, 5–24.

Jordan, G. & Long, M. (2022). ELT: Now and How It Could Be. Cambridge Scholars.

Long, M. (2015). Second Language Acquisition and Task-Based Language Teaching. Wiley.

Reves, T. & Medgyes, P. (1994). The non-native English speaking EFL/ESL teacher’s self-image: An international survey. System, 22(3), 353–367.

Yan, C. (2012). ‘We can only change in a small way’: A study of secondary English teachers’ implementation of curriculum reform in China. Journal of Educational Change, 13, 431–447.

Zhan, S. (2008). Changes to a Chinese pre‐service language teacher education program: Analysis, results and implications. Asia‐Pacific Journal of Teacher Education, 36(1), 53–70.

Is Dogme “really bloody difficult”?

I’ve just come across a video by Dmitriy Fedorov called Teaching Unplugged: Scott Thornbury versus Hugh Dellar, in which both Thornbury and Dellar talk about Dogme. Fedorov ends by siding with Dellar’s view, which is that teaching unplugged is “a huge ask” and “really bloody difficult”. The Dellar clip used in Fedorov’s video is from a 2017 presentation, Teaching House Presents – Hugh Dellar on Speaking, during which he offers a short rant against Dogme as evidence for his view that speaking activities need careful preparation, including anticipating what students are likely to say, so as to avoid being caught “on the spot”, unable to offer the required pedagogic support. I’ll argue that Dellar’s “evidence” puts the limitations of his own view of ELT on show, and unfairly dismisses Dogme.

Here’s a transcript:

In Dogme teaching you’re kind of working from what the students say.

Seems a lovely idea, but it’s really bloody difficult to do because what you need to be doing is

  1. getting the students speaking
  2. listening to them all as they’re speaking and wandering around cajoling those who aren’t speaking
  3. noticing gaps in their language
  4. thinking about how to say those things better in a more sophisticated way
  5. getting that language on the board while they’re still talking
  6. thinking about what you’re going to gap on the board, what questions you’re going to ask about it, how you’re going to get the students to use some of that language

And you’re going to have to do all of that on the spot. It’s a huge ask and it’s one of the reasons why Dogme doesn’t exist outside of Scott Thornbury’s head.   

Let’s start at the end. Dellar’s jokey remark that Dogme doesn’t exist outside Scott Thornbury’s head was made five years ago, by which time Dogme was already famous. Today, a Google search on “Dogme and language teaching” gives approximately 327,000 results in half a second. Thousands of articles, blog posts, podcasts, and discussion groups attest to the growing popularity of Dogme among language teachers around the world. Among the first fifty results of the Google search I did, I found these:

  • Nguyen & Bui Phu’s (2020) article “The Dogme Approach: A Radical Perspective in Second Language Teaching in the Post-Methods Era”, which gives an interesting discussion of Dogme,
  • Coşkun’s (2017) article “Dogme: What do teachers and students think?”, which presents the findings of a study carried out at a Turkish school exploring the reactions of EFL teachers and their students to three experimental Dogme ELT lessons prepared for the study. Coşkun’s study includes detailed accounts of the lessons, and it serves to highlight the poverty of Dellar’s description of Dogme as “kind of working from what the students say”.

It’s important to note that, in a 2020 interview, Thornbury said he thought it had been a mistake to make conversation one of the three pillars of Dogme: “What really should be said, is that Dogme is driven not by conversations, but by texts… texts meaning both written and spoken”. Meddings and Thornbury have made clear in a number of publications and interviews that the Dogme approach does not, pace Dellar, involve teachers strolling unprepared into class and asking students what they fancy talking about. It involves planning and extensive use of a wide variety of oral, written and multimodal texts, some created by the students and some provided by the school. It also includes a lot of attention to different kinds of feedback, including attention to vocab., lexical chunks, pronunciation and grammar.

Dellar’s dismissal of Dogme as too bloody difficult stems from viewing it through the distorting lens of his own approach to teaching. He thinks that if teachers don’t have a coursebook to lean on, a coursebook that organises speaking activities around pre-selected bits of the language, provides lead-ins and warm-ups and post-speaking follow-up and consolidation work, then they’ll have to do all this stuff “on the spot” – which is, he thinks, “a huge ask”. In their book “Teaching Lexically”, Dellar & Walkley recommend working with a coursebook – one of the Outcomes series, for example – which provides a syllabus made up of activities designed to teach pre-selected bits of language (“items” as they call them) in a pre-determined sequence. The teacher uses the coursebook to lead students through multiple activities in each lesson, few of them lasting for more than 20 minutes and even fewer giving students opportunities to talk to each other at any length. The English language is presented to learners, bit by bit, via various types of grammar and vocab. summary boxes, plus carefully-designed oral and written texts. Activities include studying these language summaries, comprehension checks, fill-the-gap, multiple choice and matching exercises, pattern drills, and carefully-monitored speaking activities. The special thing about Dellar & Walkley’s coursebooks is that they pay particular attention to lexis, collocations and lexical chunks, and the special thing about Dellar is that he’s particularly enthusiastic about explicitly teaching as many lexical chunks as possible. The upshot of this approach is that the majority of classroom time is devoted to explicit teaching, i.e., to the teacher telling the students about the target language.

Dellar treats education as the transmission of information, a traditional view which is challenged by the principles of learner-centred teaching, as argued by educators such as Paulo Freire, and supported in the ELT field by Thornbury, Meddings and other progressive educators. Compare this transmission-of-information view of education (the “banking” view, as Freire called it) to the Dogme approach, where education involves learning by doing. Dynamic interaction between the teacher and the students and the negotiation of meaning are key aspects of language teaching. Students often choose for themselves the topics that they deal with and they contribute and create their own texts; most of classroom time is given over to tasks which involve using the language communicatively and spontaneously; the teacher reacts to linguistic problems as they arise rather than introducing, explaining and practicing pre-selected bits of the language.

Dogme teachers reject the view that each lesson should specify in advance what items of the language will be taught, and they reject the view that some explanation of the new items is a necessary first step. Instead, they use a task -> feedback sequence, where working through multi-phased communicative tasks involves pair and group work which takes up at least half of classroom time. The unplanned language that emerges during the interaction among students and teachers as they work through tasks includes errors and communication breakdowns. Teachers use recasts and other kinds of brief, timely intervention to help students express themselves, and in the subsequent feedback session they provide lengthier, explicit information about the lexis, pronunciation and grammar issues which arose during the task performance.

An example of a Dogme lesson is given in Meddings & Thornbury (2009, p. 41):

Slices of life

  1. Teacher draws a pie chart on the board and splits it in three: like, don’t like, don’t mind.
  2. Students ask teacher about their likes / dislikes. Teacher replies and students put things into the three categories depending on the response.
  3. Students then work in pairs repeating the same activity with each other, while the teacher moves around from one pair to the next, helping students with their language.
  4. The whole class comes together and different students’ likes and dislikes are compared.
  5. Teacher gives language feedback.

Also, see Coşkun’s (2017) description of the three lessons involved in her study (the article is free to download).

If we look again at the list of all the things that Dellar thinks a teacher “needs to be doing” when “working from what the students say”, it clearly reflects his belief in the importance of explicit instruction; it indicates that he’s thinking about the sort of speaking activities you find in coursebooks like Outcomes; and it suggests that he has little grasp of what a Dogme approach entails. Why should it be so difficult to deal with students speaking? When the class is together – in Part 2 of the Slices of life task above, for example – students speak one at a time, and the teacher can deal quickly with language problems as they arise, through prompts and recasts, putting new vocabulary and short grammar notes on the board. During pair work – Part 3 of the example – the teacher moves from pair to pair, listening in, giving help with vocabulary and pronunciation, and making some quick comments on errors – through recasts, for example. The teacher takes notes of useful vocabulary and of pronunciation and grammar issues, and these can be written on the board while the students finish up their discussion by going back over the main points. When the whole class comes back together to compare their different likes and dislikes – Part 4 – the teacher reacts to what they say as in Part 2. In the final part of the lesson – Part 5 – the teacher goes through the points that have been highlighted during the session and makes a few final remarks.

A fast-growing body of literature contradicts Dellar’s insinuation that a Dogme approach makes unreasonable demands on teachers. The evidence shows that increasing numbers of teachers find the Dogme approach not just more stimulating and enjoyable than using a coursebook, but also less complicated and less stressful. When their students are engaged in a communicative task, Dogme teachers don’t report getting stressed out trying to think of “better”, “more sophisticated” ways of expressing what the students are saying, because they don’t share Dellar’s dedication to explicit teaching. While students work together in groups talking about a problem or topic they’ve been asked to discuss, Dogme teachers don’t wander around the classroom trying to think of the most appropriate language to fill the gaps they’ve noticed, or what gapped sentences they should write on the board, or what questions they should ask, or how they’re going to get the students to use the language they come up with. In other words, while the students are doing a task, Dogme teachers are not doing all the things that Dellar thinks they need to be doing.

The communicative tasks which make up a Dogme lesson don’t have the same aim as the speaking activities found in coursebooks. While the speaking activities in coursebooks are attempts to automate previously taught declarative knowledge, the communicative tasks that provide the backbone of Dogme teaching aim to give rise to unpredictable, spontaneous, emergent language which pushes the students’ developing interlanguage. Using their current language resources to carry out these tasks is how most of the learning happens; it’s the key to interlanguage development. It’s learning by doing – learning how to use the language by using texts and participating in authentic communicative exchanges, not by being told about it. This implicit learning leads directly to the procedural knowledge needed for listening comprehension, spontaneous speech, and fluency. So while Dellar’s question “How am I going to get the students to use this language?” is an important one for teachers using coursebooks, it’s a redundant question for Dogme teachers.

Still, we know that certain types of teacher intervention can speed up the rate of interlanguage development, and that it’s not enough just to get students talking about things in class. To do Dogme well, teachers need their bosses’ support: it’s not the teacher’s job to design and provide the curriculum. They need access to a materials bank which includes a variety of texts to provide rich input, and a variety of tasks suited to the varying needs and current proficiency levels of the students. They also need experience in scaffolding tasks and giving feedback, and that calls for some expert training, ongoing PD (including collaboration with colleagues), and lots of practice. But no teacher should be dissuaded from putting down the coursebook and trying Dogme just because Dellar thinks it’s all “a huge ask” and too bloody difficult.

References        

Coşkun, A. (2017). Dogme: What do teachers and students think? International Journal of Research Studies in Language Learning, 6(2), 33–44. http://doi.org/10.5861/ijrsll.20.

Dellar, H. & Walkley, A. (2016). Teaching Lexically. Delta.

Meddings, L. & Thornbury, S. (2009). Teaching Unplugged: Dogme in English Language Teaching. Delta.

Nguyen, N. Q., & Bui Phu, H. (2020). The Dogme Approach: A Radical Perspective in Second Language Teaching in the Post-Methods Era. Journal of Language and Education, 6(3), 173–184. https://doi.org/10.17323/jle.2020.10563

Thornbury, S. (2020). Interview – go to the Wikipedia page on Dogme and click on Footnote 8.