Table 1

Outline structure and hyperlinks.

1 Second language learners have especial difficulty in learning morphosyntax


2 WHY? How small a thought it takes to fill a whole life.

3 Cognitive linguistics and construction grammar


4 LANGUAGE LEARNING involves generic cognitive and associative mechanisms

    4.1 Explicit learning

    4.2 Implicit learning

    4.3 Explicit AND implicit learning

5 The ASSOCIATIVE LEARNING of Linguistic Constructions

    5.1 Associative learning depends upon contingency

    5.2 Associative learning depends upon salience

    5.3 The routine determinants of construction learning

    5.4 Connectionist learning and the frequency by regularity interaction

    5.5 The Associative-Cognitive CREED

6 The linguistic cycle: Language change as a function of usage


    6.1 High frequency of use leads to chunking and formulaic patterns

    6.2 High frequency of use leads to shortening — Zipf’s law

    6.3 Zipf’s law particularly impacts grammatical functors

    6.4 High frequency of use also leads to homophony and ambiguity

    6.5 Grammaticalization

    6.6 The Linguistic Cycle as a Panchronic Principle

7 LANGUAGE MEETS LEARNING in conspiring to make morphology especially difficult to learn

    7.1 Morphology – low contingency

    7.2 Morphology – low salience

    7.3 Redundancy

    7.4 Enough, though there is more


    8.1 Blocking

    8.2 Experimental demonstrations of blocking

    8.3 Blocking and Language-Specific Transfer Effects

    8.4 Learned attention and transfer in L2 morphology


    9.1 Form-focused instruction (FFI)

    9.2 How FFI overcomes blocking: Process and processing analyses at the interface

    9.3 Explicit AND Implicit L2 morphosyntax acquisition


    10.1 Automatic online processing in listening and speaking

    10.2 Controlled conscious processing for written composition

    10.3 Morphemes are better processed with lemmas reliably conjugated in this form in the language

    10.4 Why Reliability, particularly?

    10.5 MEANING. A morpheme shall be known by the company it keeps.


1 Second language learners have especial difficulty in learning morphosyntax

Second language (L2) speakers have especial difficulty using morphosyntax and grammatical functors. L2 learning of grammatical morphology is slow, piecemeal, and often incomplete. Milestone descriptions of this phenomenon in the SLA literature include:

Schmidt’s () case study of Wes, a naturalistic L2 learner, who was described as very fluent, with high levels of strategic competence, but low levels of grammatical accuracy: “using 90% correct in obligatory contexts as the criterion for acquisition, none of the grammatical morphemes counted has changed from unacquired to acquired status over a five year period” (). Many adult EFL learners never acquire total control of L2 morphology, even after tens of years of English immersion (e.g., ; ).

The European Science Foundation crosslinguistic and longitudinal research project () examined how 40 adult learners picked up the language of their social environment (Dutch, English, French, German or Swedish) by everyday communication. Analysis of the interlanguage of these L2 learners resulted in its being described as the ‘Basic Variety’. All learners, independent of source language and target language, developed and used it, with about one-third of them fossilizing at this level: although they learned more words, they did not further complexify their utterances with respect to morphology or syntax. In this Basic Variety, most lexical items stem from the target language, but they are uninflected. “There is no functional morphology. By far most lexical items correspond to nouns, verbs and adverbs; closed-class items, in particular determiners, subordinating elements, and prepositions, are rare, if present at all… Note that there is no functional inflection whatsoever: no tense, no aspect, no mood, no agreement, no case marking, no gender assignment” ().

Input Processing theory () considers how learners process linguistic input as they try to comprehend it. The theory includes “The Primacy of Content Words Principle” that learners process content words in the input before they process other linguistic features, along with its corollary “Lexical Preference Principle” that learners will depend on lexical items to extract meaning as opposed to grammatical form.

The longitudinal study of ESL children by Jia and Fuse () demonstrated that the acquisition of a morpheme such as the third-person singular present-tense -s can take five years or more to go from zero to 80% provision in obligatory contexts.

Slabakova’s () comparison of L2 learners’ abilities in inflectional morphology, syntax, the syntax-semantics interface, the syntax-discourse interface, and the semantics-pragmatics interface led her to propose a “Bottleneck Hypothesis” whereby it is difficulties in acquiring functional morphology that limit the rate of L2 acquisition. This reflects, of the two alternatives considered by White (), “morphology-before-syntax” rather than “syntax-before-morphology”.

These L2 difficulties with morphosyntax can affect language change: “Languages are ‘streamlined’ when history leads them to be learned more as second languages than as first ones, which abbreviates some of the more difficult parts of their grammars” (). Around the world, English is now spoken more as an L2 than as an L1: Graddol () estimated there to be 375 million L1 speakers compared to 750 million EFL and 375 million ESL speakers. The preponderance of L2 learning of English changes the nature of its varieties. When Seidlhofer () catalogued such changes in English as a Lingua Franca, she observed first and foremost “‘dropping’ the third person present tense -s (as in “She look very sad”).” These large-scale descriptions have now been supported by detailed experimental evidence in support of the hypothesis that iterated imperfect learning leads to language simplification (; ).

Grammatical morphemes pervade language usage: each day of experience provides tens to thousands of receptive experiences of functional morphemes, and tens to thousands of contexts requiring their productive use, yet, as these studies show, L2 provision is far from consistent. Especial L2 difficulty in morphosyntax is ubiquitous (albeit to varying degrees) across languages, learners, and usage. It is so robust a finding that it warrants status as a Law of SLA.

A Law of SLA:    L2 learners have especial difficulty in learning morphosyntax.

2 Why? How small a thought it takes to fill a whole life

Why is it so? This question has filled much of my scholarly life. In what follows, I will describe a variety of studies and their conclusions. As you will see, I have at times found myself focusing upon ‘as small a thought’ as the English third person present tense -s. It serves as ‘a canary in a coal mine’: a tell-tale indicator that illuminates the flow of the English language and the fragility of morphology in second language acquisition wherever it sings or falls silent.

Some laws follow simple cause-effect rules, where the effect is the outcome of a necessary and sufficient cause, as with a binary switch or parameter-setting. I have not found L2 difficulties in acquiring morphosyntax to be of this type where the regularities are rule-driven: there are no simple mechanisms for top-down governance nor are there plausible innate linguistic universals with well-specified mechanisms that could coordinate such phenomena. Instead, L2 difficulty in morphosyntax emerges from the dynamics of the ecology of language usage (; ; , ; ; ). As complex adaptive systems, languages emerge, evolve, and change over time as they are honed by social discourse. They adapt to their speakers, to their contexts, to natural laws of cognition, learning, social interaction, cooperation, and communication in the ecology of language as a whole (; ; ).

In what follows, in trying to understand L2 difficulty with morphosyntax, we will engage with several of these domain-specific laws and see how each plays a part. The structure of the argument is shown in Table 1 where the headers serve as hyperlinks. In order to illustrate the big picture, each section is sketched in broad brushstrokes along with references to one or two articles giving further detail. I tend to cite myself because these articles mark milestones in my thinking, I know intimately how the relevant ~10,000 words of the argument go, and each of these papers cites >100 significant scholars who have seriously influenced me. I hope I have properly followed their leads. Ideas also adapt to their speakers, to their contexts, to natural laws of cognition, learning, social interaction, cooperation, and communication in the ecology of scholarship as a whole.

3 Cognitive linguistics and construction grammar

Learning a language involves the learning of its constructions – the symbolic form-meaning mappings that are conventionalized in its speech community. Constructions include morphemes – the smallest pairing of form and meaning in language – as well as words, phrases, and syntactic frames (; ). Simple morphemes such as the English noun plural –s morpheme are constructions in the same way as simple words like word, idioms like to break one’s word, and abstract syntactic frames like the Subject-Verb-Object-Object verb-argument construction (which signals that something is being transferred to someone, as realized in sentences as diverse as she gave him her word, the company sent the complainant a lengthy denial, etc.). Including abstract syntactic frames admits that not all constructions carry meaning in the traditional sense, but rather serve a functional or meaningful purpose, as with the passive construction which encourages a shift of attentional focus from the agent of the action to the patient undergoing the action (compare the passive A cake was baked for the birthday-child with its active counterpart They baked the birthday-child a cake). Ellis, Römer, and O’Donnell () present a monograph on the learning, processing, and lexical-syntactic-semantic interactions of verb-argument constructions in L1 and L2 speakers and large corpora of usage.

Constructions are stored simultaneously in multiple forms that differ in their level of complexity and abstraction. For instance, the word word and the plural –s morpheme are both simple constructions that are also stored as constituent parts of the more complex construction words [word + plural -s] as well as the completely schematic [Noun + plural -s]. While the vast majority of base words are free to stand on their own, as free morphemes, the smaller number of bound morphemes like the English third person present tense -s and the past-tense -ed can only appear attached to base words. Different levels of constructional abstraction/schematization are similarly evident as we go from the fully lexicalized formula F-word, through the strongly associated collocation four-letter word, the partially schematized slot-and-frame greeting pattern [Good + (time of day)] which renders lexicalized phrases like Good afternoon and Good evening, up to the completely schematic [Adjective + Noun Phrase] construction.

This wide definition of constructions blurs the traditional division between lexicon and grammar, or what generative approaches labeled words and rules: from a construction grammar perspective, a sentence is not the product of applying a rule that strings several words into a particular order, but the product of combining a number of constructions – some simple, some complex, some lexically specific, some abstract – in a particular way. A sentence like ‘What did the company send the complainant?’, for instance, potentially combines the following constructions:

  • complain (v) and -ant (suffix [= one who (v)erbs]) morphological constructions
  • company, complainant, send, what, did, the lexical constructions
  • what did, the company collocational constructions
  • VP, NP constructions
  • Subject–Verb–Object–Object construction
  • Subject–Auxiliary inversion question construction

Cognitive linguistics investigates the ways in which constructions overlap in making meaning (= conspire in a speaker’s memories of meaningful language usage) and the indivisible interplay of language function, structure, learning and usage. Robinson and Ellis () provided an early book-length edited introduction to Cognitive Linguistics and SLA; for more recent collections see Dabrowska and Divjak () and Trousdale and Hoffmann (). An adult’s knowledge of their language(s) is a huge collection of constructions (the ‘constructicon’), which vary in terms of complexity and abstraction. Constructions have emergent properties that specify if and how they can combine with other constructions; these properties are mostly semantically and/or functionally motivated such that constructions can only be combined if their meanings/functions are compatible, or at least can temporarily attain compatibility in a specific context or discourse situation (). Constructional compatibility is crucially solidified by the frequency with which they are used (and therefore, heard) together: the more often they co-occur, the more entrenched that particular arrangement becomes. Likewise, L2 learners will acquire constructions first in the contexts of the constructions that they most often co-occur with in the input before they gradually expand the repertoire of combinations to less frequent combinations and even acceptable novel combinations (; ). The system emerges from usage and its rich structure rationally reflects language usage. This is why language learning can be profitably understood as statistical learning, aka associative learning, or rational contingency learning ().

4 Language learning involves generic cognitive and associative mechanisms

Usage-based theories hold that domain-general cognitive mechanisms drive the learning of linguistic constructions and the emergence of generalizations (). Human cognition rests upon separable complementary systems for implicit and explicit learning (for reviews see ; ; ).

4.1 Explicit learning

Explicit learning is essential in consolidating new explicit memories, particularly episodic or declarative memories that allow us to learn a new vocabulary form and its referent or to recognize new constructions and the objects or events playing out in the concurrent perceptual/imaginal world and to bind their features cross-modally (think of the conscious processes involved in your initial fast-mapping of the concept of COVID-19, images of the spikes on the coronavirus, the spoken word-form as you heard the morning news, …). Ellis () analyzes the factors that engender Representation Quality as a result of Embodied, Enacted, Embedded, and Extended (4E) Cognition. Explicit learning requires attention, Schmidt’s () “noticing”, conscious processing of multimodal representations in working memory, and neural explicit memory systems involving the hippocampus – “explicit cognitive mediation” (, ).

4.2 Implicit learning

Contrast implicit learning, which is essential in tuning our knowledge to attain the competence, fluency, prediction, and idiomaticity of language expertise. Implicit learning occurs in various of our perceptual and motor systems for language – “the implicit ins and outs” (). Ellis () reviewed the evidence of frequency effects in the learning and processing of all levels of language representation: phonology and phonotactics, reading, spelling, lexis, morphosyntax, formulaic language, language comprehension, grammaticality, sentence production, and syntax. Given that we never consciously count our use of different linguistic constructions (I’ve said “a” n million times, “aardvark” thirty-seven times, “abandon all hope” three hundred and seven times, …), these effects of usage frequency upon language learning must reflect implicit (i.e., unconscious) learning. Implicit learning rationally tunes us to the likelihood of perceiving things in the world and the things that likely co-occur with them. It gives us rational cognition () – rational in the sense that it allows us to make predictions about what to expect next (“once upon a …”, “they all lived happily ever …”). Usage-based linguistics shows that language cognition rests on thousands of hours of implicit learning from usage from which emerge language-relevant representations and their associations.

4.3 Explicit AND implicit learning

Implicit and explicit knowledge are dissociable but cooperative. Both are necessary. Neither is sufficient. Without implicit learning there is no proficiency nor integration of a construction into the language system (). Without explicit noticing there is no chance (or at least significantly reduced chance) of further implicit learning (). Ellis (, ) reviews various psychological and neurobiological processes by which explicit knowledge of form-meaning associations impacts upon implicit language learning. The interface is dynamic: It happens transiently during conscious processing, but the influence upon implicit cognition endures thereafter. Explicit consolidation of a construction and its subsequent repeated usage/practice/processing/prediction results in better learning outcomes in terms of accuracy/entrenchment/automatization/fluency/breadth/depth/richness/precision/idiomaticity and nativelike selection in collocation and phraseology/proficiency/pragmatic competence.

5 The associative learning of linguistic constructions

Constructions are the symbolic form-meaning mappings that are conventionalized in a speech community. Learning constructions is the learning of these form-meaning associations. A century of research into associative learning has elucidated a number of laws that are as relevant to language learning in humans as they are to classical conditioning and reinforcement learning in animals.

5.1 Associative learning depends upon contingency

Learning associations between cues (forms) and outcomes (meanings) depends upon the contingency of the relationship. In classical conditioning it is the reliability of the bell as a predictor of food that determines the ease of acquisition of this association (). In language learning it is the reliability of the form as a predictor of an interpretation that determines its acquisition and processing (; ; ). Psychological investigations into human sensitivity to the contingency between cues and outcomes () demonstrate that when given sufficient exposure to a relationship, people’s judgments match the contingency specified by ΔP (the one-way dependency statistic, ) which measures the directional association between a cue and an outcome, as illustrated in Table 2.

Table 2

A contingency table showing the four possible combinations of events showing the presence or absence of a target Cue and an Outcome.



            Outcome    No outcome
Cue            a           b
No cue         c           d

a, b, c, d represent frequencies, so, for example, a is the frequency of conjunctions of the cue and the outcome, and c is the number of times the outcome occurred without the cue.

ΔP is the probability of the outcome given the cue, P(O|C), minus the probability of the outcome in the absence of the cue, P(O|¬C), calculated using this formula:

$$\Delta P = P(O \mid C) - P(O \mid \neg C) = \frac{a}{a+b} - \frac{c}{c+d}$$

When the outcome is just as likely when the cue is present as when it is not, there is no covariation between the two events and ΔP = 0. ΔP approaches 1.0 as the presence of the cue increases the likelihood of the outcome. A learnable cue is one where when the cue is there, the outcome is there, and when the cue is not there, neither is the outcome, i.e., where a and d are large and b and c are small.
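
As a minimal sketch of this statistic, the following Python function computes ΔP from the four cell frequencies a, b, c, d. The function name `delta_p` and all of the counts are hypothetical, chosen only to illustrate the cases just described; they are not data from any study.

```python
from fractions import Fraction


def delta_p(a, b, c, d):
    """Return ΔP = a/(a+b) - c/(c+d) for a 2x2 cue/outcome table.

    a: cue present, outcome present    b: cue present, outcome absent
    c: cue absent,  outcome present    d: cue absent,  outcome absent
    """
    if a + b == 0 or c + d == 0:
        # Both conditional probabilities need at least one observation.
        raise ValueError("ΔP needs at least one trial with and one without the cue")
    return Fraction(a, a + b) - Fraction(c, c + d)


# Hypothetical counts (assumptions for illustration only).
no_covariation = delta_p(a=20, b=20, c=20, d=20)  # outcome equally likely either way
reliable_cue = delta_p(a=38, b=2, c=2, d=38)      # a and d large, b and c small
perfect_cue = delta_p(a=40, b=0, c=0, d=40)       # cue and outcome always coincide

print(float(no_covariation), float(reliable_cue), float(perfect_cue))
# prints: 0.0 0.9 1.0
```

Exact fractions are used here only so that the boundary cases come out as exactly 0 and 1; ordinary floating-point division would serve equally well for real counts.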

There are rarely 1:1 mappings between constructional forms and their interpretations. The less reliably a form is associated with a function or interpretation, the more difficult learning becomes (). Cues with multiple interpretations are ambiguous and so hard to resolve; cue-outcome associations of high contingency are reliable and readily processed. Consider how, in the learning of the category of birds, while eyes and wings are equally frequently experienced features in the exemplars, it is wings which are distinctive in differentiating birds from other animals. Wings are important features to learning the category of birds because they are reliably associated with class membership while being absent from outsiders. Raw frequency of occurrence (a, b, c, d in Table 2 when considered independently) is therefore less informative than the contingency between cue and interpretation (a, b, c, d when considered in interaction as ΔP). But of course, one cannot assess contingency without first tallying the raw frequencies.
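
The bird example can be sketched in the same terms. The exemplar list below is invented purely for illustration (four birds, eleven sighted non-birds, one eyeless non-bird), and the helper names `tally` and `delta_p` are hypothetical. The sketch first tallies the raw frequencies a, b, c, d for each feature and only then derives the contingency.

```python
from collections import Counter


def tally(exemplars, cue):
    """Count the four cue/outcome combinations (a, b, c, d) for one feature."""
    counts = Counter()
    for features, is_bird in exemplars:
        counts[(cue in features, is_bird)] += 1
    a = counts[(True, True)]    # cue present, is a bird
    b = counts[(True, False)]   # cue present, not a bird
    c = counts[(False, True)]   # cue absent, is a bird
    d = counts[(False, False)]  # cue absent, not a bird
    return a, b, c, d


def delta_p(a, b, c, d):
    """ΔP = P(outcome | cue) - P(outcome | no cue)."""
    return a / (a + b) - c / (c + d)


# Hypothetical exemplars: (observed features, whether the animal is a bird).
exemplars = (
    [({"eyes", "wings"}, True)] * 4   # birds: eyes and wings equally frequent
    + [({"eyes"}, False)] * 11        # other animals: eyes but no wings
    + [(set(), False)] * 1            # an eyeless, wingless animal
)

for cue in ("eyes", "wings"):
    a, b, c, d = tally(exemplars, cue)
    print(cue, (a, b, c, d), round(delta_p(a, b, c, d), 2))
# prints:
# eyes (4, 11, 0, 1) 0.27
# wings (4, 0, 0, 12) 1.0
```

Both features occur with every bird in this toy sample, so their raw frequency among birds is identical; only the contingency separates them.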

Low contingency of association will prove to be central to the issue of why L2 learners have especial difficulty in learning morphosyntax: as I explain in section 6.4, it comes as a natural consequence of language change.

5.2 Associative learning depends upon salience

Learnability also depends on salience: less salient cues are less readily learned than highly salient ones (; ). Salience refers to the property of a stimulus to stand out from the rest. Salient items or features are more likely to be perceived, to be attended to, and more likely to enter into subsequent cognitive processing and learning. Salience can be independently determined by physics and the environment, and by our knowledge of the world.

  1. The physical world, our embodiment, and our sensory systems come together to cause certain sensations to be more intense (louder, brighter, heavier, etc.) than others.
  2. As we experience the world, we learn from it, and our resultant knowledge values some associations higher than others. These associations can make a stimulus cue “dear” (Hi Kim, Gabriel, Aspen, Tanner!). A loved one stands out from the crowd, as does a stimulus with weighty associations ($5.00 vs. $0.05, however similar the amount of pixels, characters, or ink in their sensation), or one which matches a motivational state (a meal when hungry but not when full). The units of perception are influenced by prior association (). Psychological salience is experience-dependent: hotdog, sushi, and 寿司 mean different things to people of different cultural and linguistic experience. This is why, contra sensation, the units of perception cannot simply be measured in physical terms. They are subjective. Hence Miller’s definition of the units of perception and short-term memory as “chunks”: “We are dealing here with a process of organizing or grouping the input into familiar units or chunks, and a great deal of learning has gone into the formation of these familiar units” ().

Rescorla and Wagner () presented a formal model of learning which expresses the capacity of any cue (Conditioned Stimulus, CS, for example a bell in Pavlovian conditioning) to become associated with an outcome (Unconditioned Stimulus, US, for example food in Pavlovian conditioning) on any given experience of their pairing. This formula summarized over eighty years of research in associative learning; it elegantly encapsulates the three factors of physical salience, psychological salience, and surprisal. The role of US surprise and of CS and US salience in the process of conditioning can be summarized as follows:

$$dV = ab(L - V)$$


The associative strength of the US to the CS is referred to by the letter V and the change in this strength which occurs on each trial of conditioning is called dV. On the right-hand side, a is the salience of the CS, b is the salience of the US, and L is the amount of processing given to a completely unpredicted, surprising, US. Thus both the salience of the cue (a) and the psychological importance of the outcome (b) are essential factors in any associative learning. As for (L − V), the more a CS is associated with a US, the less additional association the US can induce. As Beckett () put it: “habit is a great deadener”. Alternatively, with novel associations where V is close to zero, there is much surprisal, and consequently much learning: first impressions, first love, first time…
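
To make these dynamics concrete, here is a minimal simulation sketch. It assumes that the update takes the standard Rescorla-Wagner form dV = ab(L − V), with the symbols as defined above (note that a and b here are saliences, not the cell frequencies of Table 2). The function name and the particular salience values are hypothetical.

```python
def rescorla_wagner(a, b, L=1.0, trials=10, V=0.0):
    """Return the associative strength V after each trial.

    Assumed update rule: dV = a * b * (L - V), where a is CS salience,
    b is US salience, and L is the maximum strength the US can support.
    """
    history = []
    for _ in range(trials):
        dV = a * b * (L - V)  # learning is proportional to surprise (L - V)
        V += dV
        history.append(V)
    return history


# Hypothetical salience values (assumptions for illustration only).
salient = rescorla_wagner(a=0.5, b=1.0)  # a high-salience cue
faint = rescorla_wagner(a=0.1, b=1.0)    # a low-salience cue

print([round(v, 3) for v in salient[:3]])  # prints: [0.5, 0.75, 0.875]
print([round(v, 3) for v in faint[:3]])    # prints: [0.1, 0.19, 0.271]
```

Two properties of the rule show up in the output. Each successive increment is smaller than the last, because surprise shrinks as V approaches L, and the low-salience cue lags behind the high-salience cue at every trial.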

This is arguably the most influential formula in the history of learning theory. Physical salience, psychological salience, and surprisal interactively affect what we learn from our experiences of the world. I recognize that some of this content concerning classical conditioning and associative learning theory might seem foreign to some readers from a pure linguistics background, but to understand the A in SLA, there is value to be had in open-minded exploration of the parallels between cues and outcomes (the CSs and USs of learning theory) and form-meaning relations (constructions in theories of cognitive linguistics and construction grammar; signifiers and signifieds in Saussurian linguistics).

Low salience will prove to be central to the issue of why L2 learners have especial difficulty in learning morphosyntax: as I explain in sections 6.2 and 6.3, it comes as a natural consequence of language change.

5.3 The routine determinants of construction learning

Contingency, salience, and surprisal affect the learning of all types of association in all cognitive species. Linguistic constructions have particular content that modulates these factors, as has been extensively demonstrated, for example, in the learning of words. An early study of the psycholinguistic factors involved in foreign-language vocabulary learning by Ellis and Beaton () showed contributions of phonological regularity, semantic content, word class, imageability of concept, word frequency, meaningfulness, orthographic factors, word length, and familiarity of grapheme-to-phoneme mappings. Recent large-scale studies (; ; ) support the relevance of these factors and add others such as contextual diversity and contextual distinctiveness, age of acquisition, and neighborhood density. Phonological and orthographic factors and word-length contribute to the salience of the word form; while meaningfulness, concreteness, imageability (), grounding in perceptual symbol systems (), and contextual factors contribute to the salience of the function. These play out in each and every experience of their pairing.

Usage provides many cumulative experiences of related constructions, and these experiences conspire in the emergence of the language system. The frequency and variability of these experiences determine how well they are learned and entrenched as constructions and concepts (). There are well-established laws relating the effects of frequency upon acquisition (‘the power law of practice’, ), as well as the effects of time interval upon retention (‘the forgetting function’, ). Likewise, we understand how the balance of type- and token-frequency and the proportion of ‘friends: enemies’ affects the balance of item-learning and generalization: high token-frequency leads to item knowledge (which allows the survival of high-frequency ‘irregular’ items that go against the flow); high type-frequency leads to generalization and productivity (; ).

5.4 Connectionist learning and the frequency by regularity interaction

Many of these regularities have been modelled using connectionist models in which generalizations emerge from the conspiracy of usage events (; ; ; ; ). For example, Ellis and Schmidt () investigated adult learning of morphosyntax in a novel language where frequency and regularity were factorially combined. They were particularly interested in frequency by regularity interactions in the learning and processing of morphosyntax – effects which had become a crux issue in the debate between single-system connectionist accounts and dual-mechanism hybrid accounts which posit that regular inflections are computed by an affixation rule in a neurally based symbol-manipulating syntactic system, while irregular verbs are retrieved from an associative memory (). Ellis and Schmidt demonstrated frequency effects for both regular and irregular forms early in the acquisition process. However, as learning progressed, the frequency effect on regular items diminished whereas it remained for irregular items. Performance of a simple connectionist system, when trained on the same materials, showed a close correspondence to the human acquisition data. Ellis and Schmidt showed that the regularity by frequency interaction is a natural consequence of the power law of practice, and thus is entirely consistent with associative learning processes.
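
One way to see how a power law of practice could yield such an interaction is the toy calculation below. It is not the Ellis and Schmidt simulation and makes no claim about their materials. It assumes, for illustration only, that processing time follows T = A·N^(−b), that an irregular form is practised only through its own tokens, and that a regular form also benefits from every token of the shared regular pattern. All names and numbers are hypothetical.

```python
def response_time(practice, A=1000.0, b=0.5):
    """Power law of practice: time falls as a power function of practice."""
    return A * practice ** -b


# Hypothetical token counts (assumptions for illustration only).
PATTERN_TOKENS = 400  # tokens of the regular pattern, pooled over all regular items
LOW, HIGH = 4, 16     # tokens of a low-frequency and a high-frequency item

# Irregular items: practice comes only from the item's own tokens.
irregular_gap = response_time(LOW) - response_time(HIGH)

# Regular items: practice on the shared pattern is added to the item's own tokens.
regular_gap = (response_time(PATTERN_TOKENS + LOW)
               - response_time(PATTERN_TOKENS + HIGH))

print(f"frequency effect, irregular items: {irregular_gap:.1f}")
print(f"frequency effect, regular items:   {regular_gap:.1f}")
```

Under these assumptions the same difference in item frequency produces a large effect for irregular items (250.0 in these arbitrary units) and an effect of less than one unit for regular items, because the regular items already sit on the flat part of the practice curve.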

5.5 The associative-cognitive CREED

In sum, there are well-attested laws of associative learning and cognition which have reliable and robust consequences for the learning of lexical and morphosyntactic constructions. Ellis () gathered these ideas into a framework, the ‘Associative-Cognitive CREED’, which holds that language acquisition is Construction-based, Rational, Exemplar-driven, Emergent, and Dialectic. Subsequent research has demonstrated the relevance of this approach to the learning and processing of vocabulary, morphosyntax, multi-word sequences and formulaic language, and verb-argument and other syntactic constructions in L1 and L2 (; ; ; ). Just as these phenomena affect everyday psycholinguistic processing, so they affect the shape of language too (): “Language and cognition are like the shore-line and the sea” (). We turn next to these matters of language change.

6 The linguistic cycle: Language change as a function of usage

Languages change over time as their native speakers, sharing similar conventionalized codes, do their well-practised dialogic dance; practice has made perfect: They know the steps and transitions inside out, and they perform their discourse automatically, concentrating on meanings, while the associated forms play out naturally, unhindered by conscious control. The study of language change shows that, whatever the language, whoever its current speakers, languages change in a variety of law-like ways that are as regular as the laws of other ecologies and complex adaptive systems.

6.1 High frequency of use leads to chunking and formulaic patterns

The basic principles of skill-acquisition that apply to all kinds of motor activities and productions (like playing a musical instrument, a sport, cooking, or learning a new song) are that, through repetition, sequences of units that were previously independent come to be processed as a single unit or chunk (, ). “Words used together fuse together” () (after Hebb’s () research often summarized by the phrase “Cells that fire together, wire together”). These processes result in the memorized formulaic sequences, collocations, idioms, and phraseology that pervade language (, ; ; ). Repeated practice also leads to their proceduralization, automatization and fluency of production (; ; ).

6.2 High frequency of use leads to shortening — Zipf’s law

The more frequently speakers use a form or chunk, the more they abbreviate it: this is a law-like relationship across languages (; ). Zipf () summarized this in the principle of least effort – speakers want to minimize articulatory effort and hence encourage brevity and phonological reduction. They tend to choose the most frequent words, and the more they use them, the more automatization of production shortens them. Highly frequently used words become shorter and phonologically eroded.

6.3 Zipf’s law particularly impacts grammatical functors

Grammatical functors are the most frequent words of a language. In informal and rapid speech, this tendency to give short shrift to function words and bound morphemes, exploiting their frequency and predictability, deforms their phonetic structure and blurs the boundaries between these morphemes and the words that surround them. Of the strong syllables in a corpus examined by Cutler and Carter (), 86% occurred in open-class words and only 14% in closed-class words. The pattern was reversed for weak syllables, with 72% in closed-class words and 28% in open-class words. Thus grammatical function words and bound inflections tend to be short and low in stress, even in speech that is produced slowly and deliberately () or in speech directed to children (), with the result that these cues are difficult to perceive.

6.4 High frequency of use also leads to homophony and ambiguity

Ambiguity is a loss of communicative capacity that arises if individual sounds are linked to more than one meaning. The most frequent words of the language tend also to be the most ambiguous ones (). This is a natural result of shortening, where initially-distinct longer words erode down to crash into the same sound-form in a tightly-packed space of alternatives, resulting in various different meanings being hung onto the same short sound. Many of the most frequently used words of English are ambiguous in their homophony and polysemy (e.g., to, too, two; there, their, they’re; I, eye, aye). Morphemes too: the 3rd person present –s morpheme is a remnant of a much fuller verb inflectional system in Old English where all verb forms (all numbers and persons) took inflectional endings; we see remnants of this up through Early Modern English in expressions such as “what sayest thou?”. All these inflectional endings have been lost except the 3rd person –s (see, e.g., ). As noted in section 1, many varieties of English (including African American English and many world varieties of English) have already lost this –s as well ().

This pattern generalizes across languages: the greater the number of monosyllabic words in the lexicon of a language, the greater the degree of homophony ().

6.5 Grammaticalization

Words that frequently co-occur come to be cognitively processed as single chunks and then evolve into individual words. Frequent usage typically encourages four processes of change: Desemanticization – broadening or abstraction of meaning or content; Extension – use in new contexts; Decategorialization – loss of morphosyntactic properties; and Erosion – loss of phonetic substance (). Hopper and Traugott’s famous pattern for the cline of grammaticalization illustrates the various stages of the form:

content word → grammatical word → clitic → inflectional affix.

This is part of the very predictable “Linguistic Cycle” (; ), whereby the expression of a notion passes through the following stages:

discourse → syntax → morphology → morphophonemics → zero.

The extensive scholarship on language change and grammaticalization (e.g., ; ; ; ) deserves much more than the cursory treatment that is possible here, but I hope that for our present purposes of placing L2 difficulties with morphosyntax in ecological perspective, it suffices to recognize the universality of the Linguistic Cycle: “The mechanisms and principles involved in grammaticalization conform to a complex process of coding and organization of language which is universally applicable to describe the evolution of grammatical forms” (). These universal principles emerge from dynamic processes of cognition and diachrony: “For a theory of grammaticalization, it is both unjustified and impractical to maintain a distinction between synchrony and diachrony,” (), and of usage, discourse, and social-interaction: “Grammar is not absolutely formulated and abstractly represented, but always anchored in the specific form of an utterance… Its forms are not fixed templates, but are negotiable in face-to-face interaction in ways that reflect individual speakers’ past experience of these forms, and their assessment of the present context,” (). “Grammar is always emergent and never present” ().

6.6 The Linguistic Cycle as a Panchronic Principle

From these dynamic processes over all diachronic timescales and all synchronic states, there emerge what Saussure () termed Panchronic principles, generalizations of language that exist independently of time, of a given language, or of any concrete linguistic facts. There are indeed laws of language change: Like the laws of the natural sciences, they are emergent too.

7 Language meets learning in conspiring to make morphology especially difficult to learn

Morphosyntax pervades language and so these frequency-driven principles of language change (section 6) particularly impact morphological constructions, reducing their learnability following the routine determinants (section 5). Let me fill in a little further detail.

7.1 Morphology – low contingency

One factor determining the learning of construction form is contingency of association (section 5.1). Cue-outcome reliability can be reduced in two directions: forms can have multiple interpretations (polysemy and homophony) and interpretations can be realized by more than one form (synonymy). The same usage-phenomenon whereby frequently used words become shorter drives grammatical functors towards homophony, since different functions associated with forms that were originally distinct eventually merge into the same shortened form. An example is the [s] suffix in English: in modern English, it has come to encode a plural form (kids), it indicates possession (Tanner’s guitar), and it marks third person singular present (Aspen thinks); this is in addition to the many English words that just happen to end in -s. The [s] form is abundantly frequent in learners’ input, but not reliably associated with any/just one of these meanings/functions (increasing b in Table 2). Conversely, the plural, possessive, and third person singular constructions are all realized by more than one form: they are all variably expressed by the allomorphs [s], [z], and [ɨz]. If we evaluate just one of these, say [ɨz], as a cue for one particular outcome, say plurality, then it is clear that there are many instances of that outcome in the absence of the cue (c in Table 2). Thus, the low cue-interpretation contingency makes plural -s difficult to learn.
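The effect of the b and c cells on contingency can be made concrete with a small sketch. It assumes the conventional 2×2 layout (a = cue and outcome, b = cue without outcome, c = outcome without cue, d = neither) and uses ΔP, one common one-way contingency measure; the mapping onto Table 2 is an assumption, and all counts are invented purely for illustration.

```python
def delta_p(a, b, c, d):
    """One-way contingency: P(outcome | cue) - P(outcome | no cue).

    a: cue present, outcome present
    b: cue present, outcome absent
    c: cue absent,  outcome present
    d: cue absent,  outcome absent
    """
    return a / (a + b) - c / (c + d)


# Hypothetical counts, not corpus data.
# A cue that maps cleanly onto one interpretation.
reliable_cue = delta_p(a=80, b=20, c=10, d=90)

# The same form once it also serves other functions (homophony):
# the cue now often occurs without this outcome, so b grows.
homophonous_cue = delta_p(a=80, b=220, c=10, d=90)

# One allomorph among several: the outcome often occurs
# without this particular cue, so c grows.
allomorph_cue = delta_p(a=80, b=20, c=150, d=90)

print(round(reliable_cue, 3), round(homophonous_cue, 3), round(allomorph_cue, 3))
```

Under these made-up counts, inflating either b or c lowers ΔP well below that of the clean cue, which is the sense in which homophony and allomorphy each reduce cue-outcome reliability.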

This fact, that many high frequency grammatical constructions are highly ambiguous in their interpretations, poses a challenge to language learners (; ; ).

7.2 Morphology – low salience

Another factor determining the learning of construction form is salience (section 5.2). In his landmark study of first language acquisition, Brown () breaks down the measurement of perceptual salience, or “clarity of acoustical marking” (p. 343), into “such variables as amount of phonetic substance, stress level, usual serial position in a sentence, and so on” (p. 463). Prepositional phrases, temporal adverbs, and lexical linguistic cues are salient and stressed in the speech stream. Verb inflections are usually not. Many grammatical form-function relationships in English, like grammatical particles and inflections such as the third person singular -s, are of low salience in the language stream.

Because grammatical function words and bound inflections are short and unstressed, they are difficult to perceive from the input. When grammatical function words (by, for, no, you, etc.) are clipped out of connected speech and presented in isolation at background noise levels where their open-class equivalents (buy, four, know, ewe, etc.) are perceived 90 to 100% correctly, adult native speakers can recognize them only 40% to 50% of the time (). Clitics, accent-less words or particles that depend accentually on an adjacent accented word and form a prosodic unit together with it, are the extreme examples of this: the /s/ of ‘he’s’, /l/ of ‘I’ll’ and /v/ of ‘I’ve’ can never be pronounced in isolation.

These factors make grammatical functors extremely difficult to perceive from bottom-up auditory evidence alone. Fluent language processors can perceive these elements in continuous speech because their language knowledge provides top-down support. But this is exactly the knowledge that L2 learners lack: they haven’t had sufficient experience to develop a sufficiently schematized knowledge system (constructicon) that would offer the same levels of top-down support as in fluent L1 processing. Thus the low psychophysical salience of grammatical functors contributes to L2 learners’ difficulty in learning them ().

7.3 Redundancy

These effects of low salience and low contingency are compounded by redundancy. Grammatical morphemes often appear in redundant contexts where their interpretation is not essential for correct interpretation of the sentence (; ; ). Tense markers often appear in contexts where other cues have already established the temporal reference (e.g., “yesterday Gabriel walked…”, “Last winter, there was so much snow, Kim and the kids sculpted an ice-horse”), plural markers are accompanied by quantifiers or numerals (e.g., “3 kids…”), etc. Hence their neglect does not result in communicative breakdown; they carry little psychological importance of the outcome (term b in the Rescorla-Wagner equation); and the Basic Variety “satisfices” () for everyday communicative purposes.

In terms of linguistic constructions and the difficulty distance separating a novice L2/FL learner from an L1 speaker, it is in morphological constructions where the gap is largest. The high frequency of L1 morphology entails maximal L1 automaticity; the lack of salience and contingency of morphology makes it maximally difficult for an L2 learner to perceive and analyze in terms of its associated meanings and functions.

7.4 Enough, though there is more

The dynamics described in this section whereby universal laws of learning interact with panchronic universals of linguistic construction are sufficient to understand why morphosyntax is more difficult to acquire than open class words. These factors apply to any learner. But there is yet more that particularly prejudices the learning of morphosyntax in second language learners. The way we perceive the world depends upon our habits of attention: new cognition builds upon cumulative prior cognition. So, L2 learners have L1-biased learned attention and L1-tuned automatized processing of language that contribute to the blocking of morphosyntax being implicitly learned by adult naturalistic L2 learners whose attentional focus is on communication. This is the subject of section 8.

8 Learned attention, blocking, transfer

L2A is subject to attentional biases which result from L2 learners’ knowledge of a prior language and the routine ways they have come to attend and process it. Transfer effects are evidenced in all aspects (phonological, morphological, lexical, collocational, syntactic, and pragmatic) of language learning ().

8.1 Blocking

Ellis (, ) presents a theoretical analysis which attributes L2 difficulties in acquiring inflectional morphology to an effect of learned attention known as “blocking” (; ; ). Blocking is an associative learning phenomenon, occurring in animals and humans alike, that shifts learners’ attention to input as a result of prior experience (; ).

Knowing that a particular cue is associated with a particular outcome (such as past temporality) makes it harder to learn that another cue (e.g., L2 English -ed), subsequently paired with that same outcome, is also a good predictor of it. The prior association “blocks” further associations. ALL languages have lexical and phrasal means of expressing temporality (the equivalents of lexical adverbs (e.g., now, next, yesterday, tomorrow), prepositional phrases (in the morning, in the future), and calendric reference (May 12, Monday), etc.). So ANYONE with knowledge of ANY first language is aware that there are reliable and frequently used lexical cues to temporal reference (words like German gestern, French hier, Spanish ayer, English yesterday). Such are cues to look out for in an L2 because of their frequency, their reliability of interpretation, their simplicity, and their salience. In face-to-face “here and now” communication, there are also available other gestural and pragmatic means of expressing temporality (e.g., serialization: presenting events in their order of occurrence along with appropriate pointing or counting off of fingers). Learned attention theory holds that, once known, such cues block the acquisition of less salient and less reliable verb tense morphology from analysis of redundant utterances such as Yesterday I walked.
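Blocking falls out of error-driven associative learning, and a minimal simulation may help make the logic concrete. The sketch below assumes the standard Rescorla-Wagner update, dV = αβ(λ − ΣV), where α is cue salience and β outcome importance; the cue names, salience values, and trial counts are hypothetical choices for illustration, not parameters estimated from any study.

```python
def rescorla_wagner(trials, alpha, beta=0.5, lam=1.0, strengths=None):
    """Return associative strengths after a sequence of trials.

    trials:    list of tuples naming the cues present on each trial
               (the outcome is assumed to occur on every trial)
    alpha:     dict of cue saliences
    beta:      importance of the outcome
    lam:       maximum strength the outcome can support
    strengths: optional starting strengths (e.g., from earlier learning)
    """
    v = dict(strengths or {})
    for cues in trials:
        # Prediction error is shared by all cues present on the trial.
        error = lam - sum(v.get(c, 0.0) for c in cues)
        for c in cues:
            v[c] = v.get(c, 0.0) + alpha[c] * beta * error
    return v


# Hypothetical saliences: the adverb is more salient than the inflection.
alpha = {"adverb": 0.5, "verb_inflection": 0.2}
compound = [("adverb", "verb_inflection")] * 50

# Control: both cues are met together from the start.
control = rescorla_wagner(compound, alpha)

# Blocking: the adverb is learned alone first, then met in compound.
pretrained = rescorla_wagner([("adverb",)] * 50, alpha)
blocked = rescorla_wagner(compound, alpha, strengths=pretrained)

print(round(control["verb_inflection"], 3), round(blocked["verb_inflection"], 3))
```

In this toy run the inflection acquires appreciable strength only in the control condition. Once the adverb already predicts the outcome, there is almost no prediction error left to drive learning about the redundant inflection.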

8.2 Experimental demonstrations of blocking

A series of experimental investigations involving the learning of a small number of Latin expressions and their English translations have explored the basic mechanisms of learned attention in SLA. Ellis and Sagarra () illustrates the core design. There were three groups: Adverb Pretraining, Verb Pretraining, and Control. In Phase 1, Adverb Pretraining participants learned two adverbs and their temporal reference – hodie ‘today’ and heri ‘yesterday’; Verb Pretraining participants learned verbs (shown in either first, second, or third person) and their temporal reference – e.g., cogito (present) or cogitavisti (past); the Control group had no such pretraining. In Phase 2, all participants were shown sentences which appropriately combined an adverb and a verb (e.g., heri cogitavi, hodie cogitas, cras cogitabis) and learned whether these sentences referred to the past, the present, or the future. In Phase 3, the Reception test, all combinations of adverb and verb tense marking were presented individually and participants were asked to judge whether each sentence referred to the past, present, or future. The logic of the design was that in Phase 2 every utterance contained two temporal references – an adverb and a verb inflection. If participants paid equal attention to these two cues, then in Phase 3 their judgments should be equally affected by them. If, however, they paid more attention to adverb (/verb) cues, then their judgments would be swayed towards them in Phase 3.

The results showed that the three groups reacted to the cues in very different ways – the Adverb pretraining group followed the adverb cue, the Verb pretraining group tended to follow the verb cue, and the Control group lay in between. For example, multiple regression analyses were run, one for each group, where the dependent variable was the group mean temporal interpretation for each of the Phase 3 strings and the independent variables were the information conveyed by the adverbial and verbal inflection cues. In standardized β coefficients, these showed: Adverb Group Time = 0.99 Adverb – 0.01 Verb; Control Group Time = 0.93 Adverb + 0.17 Verb; Verb Group Time = 0.76 Adverb + 0.60 Verb.
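To show how standardized coefficients of this kind index relative cue reliance, here is a minimal sketch of a two-predictor standardized regression. It is not the original analysis: the coding (past = −1, present = 0, future = +1), the fully crossed 3×3 set of strings, and the two sets of "group mean judgments" are all hypothetical and chosen only to illustrate the logic.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_betas(x1, x2, y):
    """Standardized coefficients for y regressed on two predictors."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1 - r12 ** 2
    return (r1 - r2 * r12) / denom, (r2 - r1 * r12) / denom


# Hypothetical coding of temporal reference: past=-1, present=0, future=+1.
codes = [-1, 0, 1]
strings = list(product(codes, codes))  # every adverb x verb combination
adverb = [a for a, _ in strings]
verb = [v for _, v in strings]

# Invented group mean judgments: one group led by the adverb,
# one weighing both cues equally.
adverb_led = [0.9 * a + 0.1 * v for a, v in strings]
balanced = [0.5 * a + 0.5 * v for a, v in strings]

b_adv, b_verb = standardized_betas(adverb, verb, adverb_led)
e_adv, e_verb = standardized_betas(adverb, verb, balanced)
print(round(b_adv, 2), round(b_verb, 2), round(e_adv, 2), round(e_verb, 2))
```

A group whose judgments track the adverb yields a large adverb coefficient and a small verb coefficient, whereas equal attention yields equal coefficients; this is the pattern of contrast that the reported group equations express.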

This experiment demonstrated how short-term instructional manipulations could affect attention to language.

8.3 Blocking and language-specific transfer effects

Ellis and Sagarra () Experiment 2 and Ellis and Sagarra () Experiments 2 and 3 also illustrated long-term language transfer effects whereby the nature of learners’ first language (+/– verb tense morphology) biased the acquisition of morphological vs. lexical cues to temporal reference in the same subset of Latin. First language speakers of Chinese (no tense morphology) were less able than first language speakers of Spanish or Russian (rich morphology) to acquire inflectional cues from the same language experience where adverbial and verbal cues were equally available, with learned attention to tense morphology being in standardized β coefficients: Chinese (–0.02) < English (0.17) < Russian (0.22) < Spanish (0.41) (). These findings demonstrate long-term attention to language, a processing bias affecting subsequent cue learning that comes from a lifetime of prior L1 usage.

Ellis, Hafeez, et al. () replicated Ellis & Sagarra () in demonstrating short-term learned attention in the acquisition of temporal reference in L2 Latin in EFL learners, extending the investigation using eye-tracking indicators to determine the extent to which these biases are overt or covert. Eye-tracking measures showed that prior experience of particular cue dimensions affected what participants overtly focused upon during subsequent language processing, and how, in turn, this overt study resulted in covert attentional biases in comprehension and in productive knowledge. These learned attention effects have elements of both positive and negative transfer. Prior use of adverbial cues causes participants to pay more attention to adverbs – positive effects of entrenchment of the practiced cue. Additionally, increased sensitivity to adverb cues is accompanied by a reduced sensitivity to morphological cues – blocking. A meta-analysis of the combined results of Ellis and Sagarra (, ) demonstrated that the average effect size of entrenchment was large (+1.23) and that of blocking was moderate (–0.52).

While these learned attention demonstrations concern the first hour of learning Latin, Sagarra and Ellis () show the results of blocking over years of learning in intermediate and advanced learners of Spanish. 120 English (poor morphology) and Romanian (rich morphology) learners of Spanish (rich morphology) and 98 English, Romanian and Spanish monolinguals read sentences in L2 Spanish (or their L1 for the monolinguals) containing adverb-verb or verb-adverb congruencies/incongruencies. Eye-tracking data revealed significant effects for sensitivity (all participants were sensitive to tense incongruencies), cue location in the sentence (participants spent more time at their preferred cue), and L1 experience (morphologically rich L1 learners and monolinguals looked longer at verbs than morphologically poor L1 learners and monolinguals).

8.4 Learned attention and transfer in L2 morphology

Transfer phenomena pervade SLA (; ; ; ). As a result, second language learning is rarely entirely native-like, even if the learner is surrounded by ambient input. Since everything is filtered through the attentional lens of the L1, not all of the relevant input is in fact taken advantage of (hence Corder’s () classic distinction between input and intake).

In one of the most extensive investigations of transfer in L2 morphology, Murakami and Alexopoulou () examined the L2 acquisition order of six English grammatical morphemes by learners from seven L1 groups across five proficiency levels who provided approximately 10,000 written exam scripts from the Cambridge Learner Corpus. The study established clear L1 influence on the absolute accuracy of morphemes and their acquisition order, and showed that L1 influence is morpheme specific, with morphemes encoding language-specific concepts most vulnerable to L1 influence.

It is important to emphasize that the limitations of L2 learning described here do not license the conclusion that L2 learning is qualitatively different from L1 learning – second language learners employ the same statistical learning mechanisms that they employed when they acquired their first language. First language learners have learned to attend to their language environment in one particular way. L2 learners are tasked with reconfiguring the attentional biases of having acquired their first language ().

9 Form-focused instruction in L2A

9.1 Form-focused instruction (FFI)

The fact that L2 learners must learn to adjust their L1-shaped attention biases has consequences for effective L2 instruction. Schmidt’s () Noticing Hypothesis holds that conscious attention to linguistic forms in the input is an important precondition to L2 learning: “people learn about the things they attend to and do not learn much about the things they do not attend to” (). In order to successfully acquire specific aspects of their L2, learners must pay conscious and selective (i.e., focused) attention to the target structures (). This holds in particular for the morphosyntactic forms in the L2 that are redundant and/or lack perceptual salience. Form-focused Instruction (FFI) attempts to encourage noticing, drawing learners’ attention to linguistic forms that might otherwise be ignored (, , , ; ).

Norris and Ortega’s () milestone meta-analysis comparing the outcomes from studies that employed differing levels of explicitness of L2 input demonstrated that FFI results in substantial target-oriented L2 gains, that explicit types of instruction are more effective than implicit types, and that the effectiveness of L2 instruction is durable. More recent meta-analyses of effects of type of instruction by Spada and Tomita () and Goo, Granena, Yilmaz, and Novella () likewise report large advantages of explicit instruction in L2 acquisition. Variants of FFI vary in the degree and the manner in which they recruit learner consciousness and in the role of the learner’s metalinguistic awareness of the target forms (; ).

9.2 How FFI overcomes blocking: Process and processing analyses at the interface

Cintrón-Valentín and Ellis () used eye-tracking to investigate the attentional processes whereby different types of FFI overcome learned attention and blocking in learners’ online processing of L2 input. English and Chinese (no L1 verb-tense morphology) native speakers viewed Latin utterances combining lexical and morphological cues to temporality under control conditions (CC) and three types of explicit Focus on Form (FonF): verb grammar instruction (VG), verb salience with textual enhancement (VS), and verb pretraining (VP). All groups participated in three phases: exposure, comprehension test, and production test. VG participants viewed a short lesson on Latin tense morphology prior to exposure. VS participants saw the verb inflections highlighted in bold and red during exposure. VP participants had an additional introductory phase where they were presented with solitary verb forms and trained on their English translations. When the verb is presented on its own like this, rather than in potentially redundant combination with adverbial cues, there is less scope for blocking. CC participants were significantly more sensitive to the adverbs than verb morphology. Instructed participants showed greater sensitivity to morphological cues in comprehension and production. Eye-tracking revealed how FonF engages learners’ attention to cues which might otherwise be ignored during online processing and thus modulated long-term blocking of verb morphology.

Cintrón-Valentín and Ellis () extended these investigations of three aspects of salience: the physical form of language, learner associative experience, and instructional focus on form (FFI). Experiment 1 replicated the 2015 findings in Chinese native speakers and revealed the attentional processes whereby learners’ prior linguistic experience can shape their attentional focus toward cues in the input, and by which FFI helps learners overcome the long-term blocking of verbal morphological cues.

Experiment 2 additionally examined the role of modality of input presentation – aural or visual – in L1 English learners’ attentional focus on morphological cues and the effectiveness of different FFI manipulations. Again, participants learning under CC showed greater sensitivity toward the adverb than the verb cues. FFI was effective in increasing attention to verbal morphology, and learning morphological cues was considerably more difficult under aural than under visual presentation. The most effective FFI was grammar instruction. The effectiveness of morphological salience-raising varied across modality: VS was effective under visual exposure, but not under aural exposure.

This study showed that the visual modality of instruction was particularly effective in focusing learners’ attention upon low-salience forms by means of textual enhancement to facilitate the consciousness-raising that allows their initial apprehension and consolidation. There is now a wide body of research confirming these effects of multimodal input and the effectiveness of captioning (; ; ; ; ; ).

9.3 Explicit AND Implicit L2 morphosyntax acquisition

Such results demonstrate how salience in physical form, learner attention, and instructional focus all variously affect the success of L2 acquisition. Form-focused instruction recruits learners’ explicit, conscious processing capacities and allows them to consolidate unitized form-function bindings of novel L2 constructions. That’s a good and necessary start, but it’s only a precondition: there’s a lot more usage needed yet to achieve “nativelike selection and nativelike fluency” (). Once a construction has been explicitly represented, its use in subsequent implicit processing can update the statistical tallying of its frequency of usage and probabilities of form-function mapping. There needs to be lots of this.

10 Learning a particular morpheme

We have considered usage-based theories concerning how domain-general cognitive mechanisms drive the learning and generalization of linguistic constructions and how acquisition is modulated by factors affecting attention and memory, including exemplar type- and token-frequency, contingency of form-function mapping, salience of form and of function, and the proportion of friends: enemies in quasi-regular domains, etc. We have also seen that the acquisition of a morpheme such as the 3rd-person singular -s can take five years or more to go from zero to 80% provision in obligatory contexts for ESL children (). Five years of English usage involves many thousands of receptive experiences of high frequency functional morphemes, and many thousands of contexts requiring their productive use, yet provision is variable. This suggests that the system is learned incrementally, and that regularities/generalization/productivity emerge from the combined experience of usage.

But are all experiences of the morpheme equally potent or, for any given morpheme, is it the case that some exemplars are more easily recognized in the input and produced earlier in acquisition? If so, what are these exemplars that are more likely to be learned early and preferentially processed? And why these bellwethers? Are they special in their distributional statistics, for example, in terms of their frequency, or their form-function contingency, or their formulaicity? Are they special in their meanings or functions?

More specifically, in the five years during which L2 learners are learning to produce 3rd-person singular -s, do experiences of particular -s inflected verbs play a role in the acquisition of the system more than others? Likewise, for the even more extended period during which L2 learners are learning to produce regular past-tense -ed, are particular -ed inflected verbs more potent exemplars than others? Two recent research projects have investigated these questions separately (1) for online processing in listening and speaking, and (2) for controlled conscious processing for written composition.

10.1 Automatic online processing in listening and speaking

Guo and Ellis () investigated how statistical distributions at different linguistic levels – morphological and lexical (Experiments 1 and 2), and phrasal (Experiment 2) – contribute to the ease with which morphosyntax is processed and produced by second language learners. We analyzed Chinese ESL learners’ knowledge of four English inflectional morphemes: -ed, -ing, and 3rd-person -s on verbs, and plural -s on nouns. In Elicited Imitation Tasks, participants listened to length- and difficulty-matched sentences each containing one target morpheme and typed the whole sentence as accurately as they could after a short delay. Experiment 1 investigated lexical and morphemic levels, testing the hypotheses that a morpheme is expected to be more easily processed when it is 1) highly available (i.e., occurring in frequent word-forms), and 2) highly contingent (i.e., occurring in lemma words that are consistently conjugated in the form containing this morpheme). Thirty sentences were made for each morpheme, divided into three Availability-Reliability Distribution (ARD) groups on the basis of corpus analysis in the Corpus of Contemporary American English (COCA; ): 10 target words high in availability, 10 high in contingency, and 10 low in both contingency and availability. Responses were scored on whether the target morpheme was accurately reproduced given the provision of the correct lemma. Generalized linear mixed-effects logit models (GLMM) revealed significant effects of morpheme type, availability, and contingency on the accuracy of morpheme provision. There were no effects of lemma frequency – this is a common finding in psycholinguistic studies of morphosyntax processing and it carries important theoretical implications that argue against the idea that regular inflections are computed by procedural application of an affixation rule in a neurally based symbol-manipulating syntactic system.

Experiment 2 successfully replicated these results and extended the investigation to explore phrasal formulaicity by manipulating the frequency of the 4-word strings in which the morpheme was embedded. GLMMs replicated the effects of word-form availability and contingency and additionally revealed independent phrase-superiority effects where morphemes were better reproduced in phrasal contexts of higher string-frequency. These are the phrasal equivalents of word-superiority effects () whereby recognition of a letter is more accurate when it is part of a meaningful word than when it is alone. Taken together, these findings demonstrated that morpheme acquisition reflects the distributional properties of learners’ experience and the mappings therein between lexis, morphology, phraseology, and semantics. They support an emergentist view of the statistical symbolic learning of morphology where language acquisition involves the satisfaction of competing constraints across multiple grain-sizes of units.

10.2 Controlled conscious processing for written composition

Murakami and Ellis () investigated whether the accuracy of grammatical morphemes in second language (L2) learners’ writing is associated with these same types of usage-based distributional factors in the EF-Cambridge Open Language Database (EFCAMDAT; ), a partially error-tagged large-scale longitudinal learner corpus which includes approximately 1.2 million writings by 175,000 learners (147 million words) in a wide range of nationality groups.

Specifically, we examined whether the accuracy of L2 English inflectional morphemes is associated with the availability (i.e., token frequency) and contingency (i.e., token frequency relative to other forms with the same lemma) of the inflected word form, as well as the formulaicity of the context it occurs in (i.e., predictability of the form given the surrounding words). Data drawn from the learner corpus indicated that contingency is a robust predictor of accuracy of morpheme provision in written composition and that its relationship with accuracy does not necessarily lessen when learners’ proficiency rises. Contrary to online processing, availability and formulaicity were not generally identified as predictors of accuracy of morpheme production in writing.
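The two lexical-level measures defined above can be sketched directly from frequency counts. The following is a minimal illustration of those definitions only, not the procedure used in either study; the lemmas, word forms, and token frequencies are invented, and the function names are hypothetical.

```python
# Hypothetical token frequencies, keyed by (lemma, word form).
counts = {
    ("walk", "walk"): 500,
    ("walk", "walks"): 100,
    ("walk", "walked"): 300,
    ("walk", "walking"): 100,
    ("elapse", "elapse"): 5,
    ("elapse", "elapses"): 3,
    ("elapse", "elapsed"): 40,
    ("elapse", "elapsing"): 2,
}


def availability(lemma, form):
    """Token frequency of the inflected word form."""
    return counts[(lemma, form)]


def contingency(lemma, form):
    """Token frequency of the form relative to all forms of its lemma."""
    lemma_total = sum(freq for (lem, _), freq in counts.items() if lem == lemma)
    return counts[(lemma, form)] / lemma_total


for lemma, form in [("walk", "walked"), ("elapse", "elapsed")]:
    print(form, availability(lemma, form), round(contingency(lemma, form), 2))
```

With these invented counts the two measures dissociate: “walked” is far more available than “elapsed”, yet “elapsed” is the form its lemma takes most of the time, so it is the more contingent of the two.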

10.3 Morphemes are better processed with lemmas reliably conjugated in this form in the language

These two studies demonstrate that the ways various psycholinguistic factors determine morphosyntax accuracy can depend upon the task demands (automatic vs. controlled; reception vs. production), with memorized associative knowledge having more effect in online tasks. However, both studies suggest that morphemes are more easily processed when they occur with lemmas that are reliably conjugated in this form.

10.4 Why Reliability, particularly?

Why is reliability of association a more potent determinant of acquisition than availability? We can make sense of this from the three different perspectives adopted here: (1) learning theory (see sections 4 and 5), (2) cognitive linguistics (section 3), and (3) SLA theory (sections 8, 9, 10).

  1. Associative learning theory demonstrates that contingency of association trumps mere frequency (as described in section 5.1). In operationalizing reliability in the Guo and Ellis () and Murakami and Ellis () studies, we focused upon how likely it is that a linguistic cue (a morpheme) reliably co-occurs with another (a lemma). But morphemes and lemmas are more than mere forms, they are linguistic constructions with particular functions and meanings: they are symbolic.
  2. Cognitive linguistic theories of construction grammar (section 3) view lexical, morphological, and syntactic forms as symbolic form-function pairings and hold that we learn language from usage. When learners are processing usage, they are tallying the associations between forms, between interpretations, and between forms and their interpretations. Verbs have interpretations. Morphemes have interpretations. Verbs and morphemes can be more or less reliably associated. The matrix of association goes beyond mere forms; in full it involves:

    The original “thought-sound” () better illustrates and expresses the dynamics.
  3. Functional theories of SLA emphasize the interplay of form and meaning in acquisition. One much-researched example for morphology is the Aspect Hypothesis (AH) (; ; ). The AH builds on three main constructs: tense, grammatical aspect, and lexical aspect. Tense establishes the location of an event in time with respect to the moment of speech or some other reference point. Grammatical aspect allows for “ways of viewing the internal temporal constituency of a situation” (). For instance, in English, a contrast in grammatical aspect is found between simple past “Phoebe walked” and past progressive “Phoebe was walking”. In contrast, lexical aspect refers to semantic differences in verbs and their arguments (), such as whether a predicate has inherent duration like “walk”, “sleep”, and “kid”, is punctual like “recognize”, “break”, or “sigh”, or has elements of both duration and culmination like “walk a mile” and “paint a picture”. The AH predicts that “second language learners will initially be influenced by the inherent semantic aspect of verbs or predicates” (). “In its simplest form, the AH for SLA predicts that in the initial stages of the acquisition of tense-aspect morphology by adults, the acquisition of past morphology will be influenced by lexical aspectual categories. Namely, verbal morphology will be attracted to and will occur with predicates with similar semantics. Perfective past will occur with telic predicates (predicates with inherent endpoints); in contrast, imperfective will occur with unbounded predicates, and progressive will occur with ongoing activities” (). Bardovi-Harlig & Comajoan-Colomé conclude from their review of perhaps thirty different studies of the AH over the last twenty years that the AH accurately predicts the adult L2 acquisition of past morphology in a number of languages.

Guo and Ellis () and Murakami and Ellis () demonstrated effects of distributional learning – particularly the privileged processing of reliably associated lemma-morpheme pairings – on morphology acquisition. The question that naturally follows is why language is distributed this way. Why do particular lemmas appear in the language more reliably associated with particular morphemes? Cognitive linguistics more generally, and the AH in particular, suggest that for the case of tense-aspect morphology, there are semantic and functional motivations. Likewise, for noun number, we suspect that inherent number, pluralia tantum, and prototypically plural count nouns might lead the way. There is good reason and plenty of scope to study a broad range of morphology in a range of languages in this way.
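The contrast between reliability and mere frequency of association can be illustrated with ΔP, one standard measure of contingency in associative learning research: ΔP = P(outcome | cue) − P(outcome | no cue). The sketch below is illustrative only. The counts are hypothetical, the pairing of telic predicates with perfective past marking is chosen simply to echo the AH example, and nothing here implies that the studies cited above computed reliability in this way.

```python
def delta_p(a, b, c, d):
    """Contingency of an outcome on a cue, from a 2x2 table of counts.

    a: cue present, outcome present
    b: cue present, outcome absent
    c: cue absent, outcome present
    d: cue absent, outcome absent
    """
    return a / (a + b) - c / (c + d)


# Hypothetical counts: cue = telic predicate, outcome = perfective past marking.
# The cue is only moderately frequent, but the outcome depends on it.
reliable = delta_p(a=75, b=25, c=25, d=75)

# Hypothetical counts for a cue that co-occurs with the outcome far more often
# in absolute terms (200 joint tokens), yet predicts nothing: the outcome is
# just as likely without the cue as with it.
frequent_but_unreliable = delta_p(a=200, b=200, c=100, d=100)

print(reliable, frequent_but_unreliable)
```

The second cue has more joint occurrences with the outcome than the first, yet its ΔP is zero, whereas the first cue has a clearly positive ΔP. This is the sense in which contingency, rather than raw co-occurrence frequency, indexes how informative a cue is.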

10.5 MEANING: A morpheme shall be known by the company it keeps

A lesson from vocabulary learning (section 5.3) is that words with more meaningful, imageable, concrete, and salient referents are those that are more easily learned. A lesson from the cognitive grammar of verb-argument constructions (section 3) is that the exemplars that are closest to the construction meaning prototype, high in the Zipfian frequency profile and thus taking the lion’s share of experiences, and well-connected in the semantic network are more easily learned (; ; ). The results reviewed here in section 10 suggest a similar lesson concerning semantic motivations in the learning of morphosyntax.

Whereas lexical constructions get much of their meanings directly from being grounded in rich, imageable, multimodal experiences, this is less true for morphosyntax which has been subject to prior grammaticalization processes of Desemanticization and Extension (section 6.5). However, the unit of meaning in language goes well beyond individual words – as corpus linguistics has shown us, ultimately it lies in “the phrase, the whole phrase, and nothing but the phrase” (); (see further ; , ; ). Thus individual words derive further meaning from their collocations and contexts: “You shall know a word by the company it keeps” (). Elman (, ) reviews psycholinguistic research on the dynamics of sentence processing, and proposes that, rather than words having meaning, as in traditional views of the mental lexicon, they are better viewed as cues to meaning in the interactive conspiracy with the other available cues as they unfold in sentence processing: a word’s ‘meaning’ lies in the dynamic causal effects it can have on mental states. In the same way, it is from experience of the lemmas with which they have high contingency of association and their meaning correlates in phrasal contexts that morphemes derive their status as cues to meaning: Morphemes become known by the company they keep.

11 Conclusion

Everything discussed here warrants replication, clarification, and further investigation. Much of it is Anglocentric and needs extending cross-linguistically. Some details are possibly wrong, or wrong in their degree of generalization. There is plenty to do.

I began by framing this piece in terms of Language as a Complex Adaptive System, something the Five Graces group worked on together in 2009: “Cognition, consciousness, experience, embodiment, brain, self, human interaction, society, culture, and history are all inextricably intertwined in rich, complex, and dynamic ways in language. Everything is connected. Yet despite this complexity, despite its lack of overt government, instead of anarchy and chaos, there are patterns everywhere. Linguistic patterns are not preordained by God, genes, school curriculum, or other human policy. Instead, they are emergent—synchronic patterns of linguistic organization at numerous levels (phonology, lexis, syntax, semantics, pragmatics, discourse, genre, etc.), dynamic patterns of usage, diachronic patterns of language change (linguistic cycles of grammaticalization, pidginization, creolization, etc.), ontogenetic developmental patterns in child language acquisition, global geopolitical patterns of language growth and decline, dominance and loss, and so forth. We cannot understand these phenomena unless we understand their interplay” (). I have tried here to fill in some relevant detail regarding L2 morphology learning.

I recognize that these studies focus on but a grain of sand on the morphosyntactic shoreline where the seas of cognition and social usage meet the lands of linguistic structure in the languages of the world, now, past, future, synchronic, diachronic, and panchronic. In the header of section 2 of this paper, in its subtitle, I misquoted Wittgenstein () “How small a thought it takes to fill a whole life”. It seems that many facets of language usage, learning, structure, and change are coherently reflected in this -s. As I have explained, L2 learners of English are less likely to have spotted the missing morpheme than L1 speakers. Still, not much of any consequence could stem from the omission. I’m sure no one was prevented from getting at Wittgenstein’s original intent, one sublimely celebrated in Reich’s () Proverb.