This is a parenthesis. I’d planned to devote Part 4 of ‘Carroll’s AIT’ to assessing how successfully it bridges the gap between nativist and emergentist views of SLA, but I pause so as to answer Scott Thornbury’s question on Twitter (18th May), which amounts to asking why a theory of SLA has to start by describing what it wants to explain.
Really! I mean, you’d have thought a bright, well-read chap like Scott already knew, right? But he doesn’t, and the problem is that he’s a very influential voice in ELT, likely to have a potentially damaging influence on his legions of followers, some of whom might not detect when – just now and then – he talks baloney. Chomsky’s work seems to be a particular blind spot for Scott; he has a very bad record indeed when it comes to writing and talking about Universal Grammar. I’m told by my friend Neil that he’s not much better when it comes to appreciating Lacan’s paradigm-shifting critique of Breton’s Surrealist Manifestos, either (see McMillan Mirroring Despair Among the ELT Ungodly, in press). Nemo Sine Vitiis Est, eh what.
In Part 3, resuming the story so far, I said “A theory of SLA must start from a theory of grammar”. Soon afterwards, Scott Thornbury tweeted, on that highest, silliest horse of his, which, thankfully, he reserves for his discussion of Chomsky:
“A theory of SLA must start from a theory of grammar.” Why? Who said? I’d argue the reverse ‘A theory of grammar must start from a theory of [S/F]LA’. Grammar is the way it is because of the way languages are learned and used.
Other gems in his tweets included:
The theory comes later. Induction, dear boy. Empirical linguistics, by another name.
Chomsky’s error was to start with a theory of grammar and then extrapolate a theory of language acquisition to fit. Cart before horse.
“The theory of UG…is an innate property of the human mind. In principle, we should be able to account for it in terms of human biology.” In principle, maybe. In practice, we can’t. That’s cart-before-horsism.
‘UG a powerful theory?’ Based on made-up and often improbable data? Incapable of explaining variability & change? Creationism is a powerful theory, too – if you ignore the fossil evidence.
Luckily, I was sitting in a comfortable chair, joyfully making my way through a crate of chilled, cheeky young Penedès rosé wines (delivered to me by an illegal supplier using drones “lent” to him by Catalan extremists intent on breaking the Madrid-inspired lockdown), when I read these extraordinary remarks. Otherwise, I might have stopped breathing. Anyway, let’s try to answer Scott’s tweets.
White (1996: 85) points out:
A theory of language acquisition depends on a theory of language. We cannot decide how something is acquired without having an idea of what that something is.
Carroll (2001) and Gregg (1993) agree: a theory of SLA has to consist of two parts:
1) the explanandum: a property theory which describes WHAT is learned. It describes what the learner knows; what makes up a learner’s mental grammar. It consists of various classifications of the components of the language, the L2, and how they work together.
2) the explanans: a transition theory which explains HOW that knowledge is learned.
So WHAT is human language? I’ve dealt with this in a previous post, so suffice it to say here that it’s a system of form-meaning mappings, where meaningless elementary structures of language combine to make meaning; the sounds of a language combine to make words, and the words combine to make sentences. There are various property theories, but let’s focus on just two: UG and Construction Grammar, the second being the description of language offered by emergentists, who take a Usage-based (UB) approach to a theory of SLA.
Chomsky’s model of language distinguishes between competence and performance; between the description of underlying knowledge, and the use of language. Chomsky refers to the underlying knowledge of language which is acquired as “I-Language”, and distinguishes it from “E-Language”, which is everyday speech – performance data of the sort you get from a corpus of oral texts.
“I-Language” obeys rules of Universal Grammar, among which are structure dependency, C-command and government theory, and binding theory. These are among the principles of UG, and they operate with certain open parameters which are fixed as the result of input to the learner. As the parameters are fixed, the core grammar is established. The principles are universal properties of syntax which constrain learners’ grammars, while parameters account for cross-linguistic syntactic variation, and parameter setting leads to the construction of a core grammar where all relevant UG principles are instantiated.
UB Construction Grammar
The basic units of language representation are Constructions, which are form-meaning mappings. They are symbolic: their defining properties of morphological, syntactic, and lexical form are associated with particular semantic, pragmatic, and discourse functions. Constructions comprise concrete and particular items (as in words and idioms), more abstract classes of items (as in word classes and abstract constructions), or complex combinations of concrete and abstract pieces of language (as mixed constructions). Constructions may be simultaneously represented and stored in multiple forms, at various levels of abstraction (e.g., concrete item: table+s = tables and [Noun] + (morpheme +s) = plural things). Linguistic constructions (such as the caused motion construction, X causes Y to move Z path/loc [Subj V Obj Obl]) can thus be meaningful linguistic symbols in their own right, existing independently of particular verbs. Nevertheless, constructions and the particular verb tokens that occupy them resonate together, and grammar and lexis are inseparable (Ellis and Cadierno, 2009).
So HOW do we learn an L2?
UG offers no L2 transition theory, “y punto”, as they say in Spanish. UG says that all human beings are born with an innate grammar – a fixed set of mental rules that enables children to create and utter sentences they have never heard before. Thus, language learning is facilitated by innate knowledge of a set of abstract principles that characterise the core grammars of all natural languages. This knowledge constrains possible grammar formation in such a way that children do not have to learn those features of the particular language to which they are exposed that are universal, because they know them already. This “boot-strapping” device, sometimes referred to as the “Language Acquisition Device” (LAD), is the best explanation of how children know so much more about their L1 than can be got from the language they are exposed to. The ‘poverty of the stimulus argument’ is summed up by White:
Despite the fact that certain properties of language are not explicit in the input, native speakers end up with a complex grammar that goes far beyond the input, resulting in knowledge of grammaticality, ungrammaticality, ambiguity, paraphrase relations, and various subtle and complex phenomena, suggesting that universal principles must mediate acquisition and shape knowledge of language (White 1989: 37).
Note that this refers to L1 acquisition. Those who take a UG view of SLA concentrate on the re-setting of parameters to partly explain how the L2 is learnt.
Just by the way, I suppose I should deal with Scott’s accusations that UG is “based on made-up and often improbable data” which is “incapable of explaining variability & change”. First, the data pertaining to UG are contained in more than 60 years of research studies, hundreds of thousands of them, the results of which scholars (including UB theorists such as Nick Ellis, Tomasello and Larsen-Freeman) acknowledge as having contributed more to the advancement of science than those motivated by any other linguist in history. Second, that UG is incapable of explaining variability and change is hardly surprising, since it doesn’t attempt to, any more than Darwin’s theory attempts to explain tsunamis. To coin a phrase: It’s a question of domains, dear boy.
The usage-based theory of language learning is based on associative learning: “acquisition of language is exemplar based” (Ellis, 2002: 143). “A huge collection of memories of previously experienced utterances” underlies the fluent use of language. Thus, language learning is seen as “the gradual strengthening of associations between co-occurring elements of the language”, and fluent language performance as “the exploitation of this probabilistic knowledge” (Ellis, 2002: 173).
Note that this is part of a general learning theory. Those who take a UB view of SLA see it as affected by L1 learning.
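To make “the gradual strengthening of associations between co-occurring elements” a little more concrete, here is a deliberately crude sketch in Python. It is my own toy illustration, not Ellis’s model and not a connectionist network: it assumes that “association” simply means counting which word follows which, and the function names and the four-sentence “input” are invented for the purpose.

```python
from collections import Counter, defaultdict


def learn_associations(utterances):
    """Strengthen word-to-next-word associations, one exposure at a time."""
    strength = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        for current, following in zip(words, words[1:]):
            # Every co-occurrence in the input adds a little weight.
            strength[current][following] += 1
    return strength


def next_word_probabilities(strength, word):
    """Turn raw association strengths into relative frequencies."""
    counts = strength.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # a word never met in the input predicts nothing
    return {w: c / total for w, c in counts.items()}


# Hypothetical input: four utterances the learner has been exposed to.
toy_input = [
    "the dog chased the cat",
    "the dog barked",
    "the dog slept",
    "the cat slept",
]

associations = learn_associations(toy_input)
probabilities = next_word_probabilities(associations, "the")
print(probabilities)
```

On this toy input, “dog” follows “the” three times and “cat” twice, so the learner’s “probabilistic knowledge” is just those relative frequencies. A word that never appeared in the input gets an empty prediction.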
To paraphrase Gregg (2003), nativist (UG) theories posit an innate representational system specific to the language faculty, and non-associative mechanisms, as well as associative ones, for bringing that system to bear on input to create an L2 grammar. UB theories deny both the innateness of linguistic representations and any domain-specific language learning mechanisms. For UB, input from the environment, plus elementary processes of association, are enough to explain SLA.
Clearing up the muddle about UG
Gregg (2003) discusses four “red herrings” used by those arguing against UG. I’ll paraphrase two of them, because they address Scott’s remarks.
The Argument from Vacuity
The argument is: calling a property ‘innate’ does not solve anything: it simply calls a halt to investigation of the property.
First, innate properties are generally assumed in science – circulation of the blood is one such property. As for language, the jury is still out. Thus it is question-begging to argue that calling UG innate prevents us from investigating how language is learned.
Second, criticism of the ‘innateness hypothesis’ often rests on a caricature of the argument from the Poverty of the Stimulus (POS), viz., ‘Property P cannot be learned; therefore it is innate’. But in fact the POS argument is more nuanced:
1) An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.
2) The correct set of principles need not be (and typically is not) in any pretheoretic sense simpler or more natural than the alternatives.
3) The data that would be needed for choosing among these sets of principles are in many cases not the sort of data that are available to an empiricist learner.
4) So if children were empiricist learners, they could not reliably arrive at the correct grammar for their language.
5) Children do reliably arrive at the correct grammar for their language.
6) Therefore, children are not empiricist learners.
Emergentists need to show that the POS argument for language acquisition is false, by showing that empiricist learning suffices for language acquisition. In other words, they need to show that the stimuli are not impoverished, that the environment is indeed rich enough, and rich in the right ways, to bring about the emergence of linguistic competence (Gregg, 2003, p. 101).
The Argument from Neuroscience
As MacWhinney puts it, ‘Nativism is wrong because it makes untestable assumptions about genetics and unreasonable assumptions about the hard-coding of complex formal rules in neural tissue’ (2000: 728, cited in Gregg, 2003, p. 103).
As Gregg says, the claim that there is an innate UG is a claim about the mind, not about the brain. If brain science could show that it is impossible to instantiate UG in a brain, the claim of an innate UG would clearly fail, but brain science has not shown any such impossibility; indeed, brain science has not yet been able to show how any cognitive capacity is instantiated. Thus, pace Scott Thornbury, to date, neuroscience cannot support any emergentist claim about the development of interlanguages. Furthermore, Scott seems to think that neurological explanations are somehow more ‘real’ or ‘basic’ than cognitive explanations; which is, in Gregg’s opinion (2003, p. 104) “a serious mistake”.
It is not simply that the current state of the art does not yet permit us to propose theories of language acquisition at the neural level; it is rather that the neural level is likely not ever to be the appropriate level at which to account for cognitive capacities like language, any more than the physical level is the appropriate level at which to account for the phenomena of economics…. There is no reason whatever for thinking that the neural level is now, or ever will be, the level where the best explanation of language competence or language acquisition is to be found. In short, whether there is a UG, and if there is, whether it is innate, are definitely open questions; but they cannot be answered in the negative merely by appealing to neural implausibility.
Scott has expressed his enthusiasm for UB theories for some time now, but he has done little to support this enthusiasm with rational argument. I’ve commented on the limitations of his grasp of the issues in other posts (see, for example, Thornbury on Performance, and Thornbury Part 1), so it’s enough to say here that he fails to adequately address the criticisms of UB theories made by many scholars, including Carroll and Gregg. The basic problems that UB theories face are these:
Property theory: UB theory suggests that SLA is explained by the learner’s ability to do distributional analyses and to remember the products of the analyses. So why do they accept the validity of the linguist’s account of grammatical structure? And what bits do they accept? Ellis accepts NOUN, PHRASE STRUCTURE and STRUCTURE-DEPENDENCE, for example. As Gregg comments, “Presumably the linguist’s descriptions simply serve to indicate what statistical associations are relevant in a given language, hence what sorts of things need to be shown to ‘emerge’”.
Transition Theory: Language is acquired through associative learning, through what Nick Ellis calls ‘learners’ lifetime analysis of the distributional characteristics of the input’ and the ‘piecemeal learning of thousands of constructions and the frequency-biased abstraction of regularities’. To borrow from Scott’s screaming protests about claims for UG: “Where’s the evidence?” Well, the only evidence is the models of associative learning processes provided by connectionist networks. But, as Gregg (2003) so persuasively demonstrates, these connectionist models provide very little evidence to support the emergentist transition theory. Scott has made no attempt to reply to Gregg’s criticisms. Let me just give one part of Gregg’s argument, the part that deals with the connectionist claim that their models are ‘neurally inspired’.
The use of the term ‘neural network’ to denote connectionist models is perhaps the most successful case of false advertising since the term ‘pro-life’. Virtually no modeller actually makes any specific claims about analogies between the model and the brain, and for good reason: As Marinov says, ‘Connectionist systems, … have contributed essentially no insight into how knowledge is represented in the brain’ (Marinov, 1993: 256). Christiansen and Chater, who are themselves connectionists, put it more strongly: ‘But connectionist nets are not realistic models of the brain . . ., either at the level of individual processing unit, which drastically oversimplifies and knowingly falsifies many features of real neurons, or in terms of network structure, which typically bears no relation to brain architecture’ (1999: 419). In particular, it should be noted that backpropagation, which is the learning algorithm almost universally used in connectionist models of language acquisition, is also universally recognized to be a biological impossibility; no brain process known to science corresponds to backpropagation (Smolensky, 1988; Clark, 1993; Stich, 1994; Marcus, 1998b).
I challenge Scott to answer the arguments so clearly laid out in Gregg’s (2003) article.
Finally, I can’t resist a quote from Eubank and Gregg’s (2002) article.
“And of course it is precisely because rules have a deductive structure that one can have instantaneous learning, without the trial and error involved in connectionist learning. With the English past tense rule, one can instantly determine the past tense form of “zoop” without any prior experience of that verb, let alone of “zooped” (unlike, say, Ellis & Schmidt’s model, which could only approach the “correct” plural form for the test item, and only after repeated exposures to the singular form followed by repeated exposures to the plural form, along with back-propagated comparisons). If all we know is that John zoops wugs, then we know instantaneously that John zoops, that he might have zooped yesterday and may zoop tomorrow, that he is a wug-zooper who engages in wug-zooping, that whereas John zoops, two wug-zoopers zoop, that if he’s a Canadian wug-zooper he’s either a Canadian or a zooper of Canadian wugs (or both), etc. We know all this without learning it, without even knowing what “wug” and “zoop” mean. A frequency / regularity account would need to appeal to a whole congeries of associations, between a large number of pairs like “rum-runner/runs rum, piano-tuner/tunes pianos, …” but not like “harbor-master/masters harbors, pot-boiler/boils pots, kingfisher/fishes kings, …”, or a roughly equal number of pairs like “purple people-eater” meaning purple people or purple eater, etc.”
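Eubank and Gregg’s point about instantaneous learning can be sketched in a few lines of Python. This is my own toy illustration under loud assumptions: the rule below handles spelling only (it ignores consonant doubling, y-to-ied and every irregular verb), and the “stored exemplars” dictionary is a deliberately crude stand-in for memory of previously experienced forms, not a fair model of any connectionist network. All the names are invented.

```python
def regular_past(verb):
    """Apply the regular English past-tense rule to any verb stem (spelling only)."""
    if verb.endswith("e"):
        return verb + "d"   # bake -> baked
    return verb + "ed"      # walk -> walked, zoop -> zooped


# Hypothetical store of past-tense forms the learner has actually met.
seen_past_forms = {"walk": "walked", "jump": "jumped"}

novel_verb = "zoop"

# The rule applies at once to a verb it has never encountered.
rule_answer = regular_past(novel_verb)

# Pure lookup of stored exemplars has nothing to say about a new verb.
lookup_answer = seen_past_forms.get(novel_verb)

print(rule_answer, lookup_answer)
```

The rule needs no prior exposure to “zoop”, which is exactly the property the quotation attributes to deductive structure; what a frequency-based learner would need in order to get to the same place is the question the quotation raises.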
Carroll, S. (2001) Input and Evidence. Amsterdam: Benjamins.
Carroll, S. (2002) I-learning. EUROSLA Yearbook, 2, 7–28.
Ellis, N. (2002) Frequency effects in language processing. Studies in Second Language Acquisition, 24(2).
Eubank, L., & Gregg, K. (2002) News flash – Hume still dead. Studies in Second Language Acquisition, 24(2), 237–247.
Gregg, K. R. (1993) Taking explanation seriously; or, let a couple of flowers bloom. Applied Linguistics, 14(3), 276–294.
Gregg, K. R. (2003) The state of emergentism in second language acquisition. Second Language Research, 19(2), 95–128.