1. Introduction

Recent years have seen renewed interest in language learning aptitude research in the context of additional or second-language (L2) learning. On the theoretical side, the role of working memory (WM) as a potential component of aptitude continues to be debated. Moreover, the proposal of distinct aptitudes for explicit and implicit learning is a current focus that is of immediate relevance to L2 researchers, given the recognition that both explicit and implicit knowledge and learning are implicated in the attainment of L2 proficiency. No study to date has brought together all of the above strands by including measures of aptitude for explicit and implicit learning as well as phonological and executive WM in a single research design in order to scrutinize the relationships between these variables and to probe their capacity to predict L2 proficiency, assessed in terms of learners’ grammar knowledge, reading, and listening skill. This is what the present study set out to do.

2. Background

2.1. Language learning aptitude and WM as predictors of L2 attainment

Language learning aptitude refers to a set of cognitive and perceptual abilities that facilitate fast and easy learning of new languages (). The classic model of aptitude () comprises phonetic coding ability, associative memory, and language-analytic ability (). While the predictive power of aptitude has been found to be superior to that of factors such as WM (), motivation () or anxiety (), meta-analytic research has shown that different components of aptitude differentially predict L2 skills such as listening, reading, speaking and grammar (). In particular, language-analytic ability strongly predicts L2 grammar learning and reading comprehension, and phonetic coding ability has been found to be a good predictor of vocabulary learning and general L2 proficiency ().

The distinction between aptitude for explicit learning and aptitude for implicit learning is relatively recent. The classic aptitude components of phonetic coding ability, language-analytic ability and associative memory are seen as representing aptitude for explicit learning (henceforth: explicit aptitude). Conversely, aptitude for implicit learning (henceforth: implicit aptitude) refers to cognitive abilities that facilitate implicit L2 processing and learning in the absence of conscious awareness (). Whereas explicit aptitude may primarily predict achievement at beginner levels (; ), implicit aptitude is expected to predict ultimate attainment (). Indeed, two hypothesized measures of implicit aptitude, the serial reaction time (SRT) task and LLAMA D (further discussed below), have been found to significantly predict grades achieved in foreign-language classes (), speech fluency (), and pronunciation accuracy () at intermediate to advanced L2 levels. Therefore, in addition to considering different L2 skills, it is important to take learners’ L2 proficiency level into account when interpreting results pertaining to the predictive power of (components of) explicit and implicit aptitude.

WM refers to the ability to simultaneously store and process information while engaging in a cognitive task (; ). In L2 research, Baddeley’s (, ) multiple-component model of WM has been most influential, with two components of central interest: phonological working memory (PWM), which is responsible for the short-term storage of phonological information and articulatory rehearsal, and executive working memory (EWM), which controls processes such as inhibition, updating and switching ().

The importance of PWM and EWM in L2 processing, learning and use is well-documented (; ), and the role of WM as an individual-difference variable that can potentially predict L2 outcomes has been acknowledged in aptitude research too (, ). Furthermore, Robinson’s (, ) model of aptitude complexes argues that different aptitude complexes are dependent on specific combinations of underlying primary cognitive abilities, including WM.

Empirical investigations into the relationship between WM and aptitude have led to mixed results. Several studies have identified no or weak relationships between measures of the two constructs (; ); Li’s () meta-analysis identified a weak correlation between PWM and EWM and overall aptitude and aptitude components.

Yalçın et al. () found a relationship between EWM measured by first-language (L1) and L2 reading span tasks and language-analytic ability, while no correlation was found between EWM measured by an operation span task and any aptitude component. A study by Sáfár and Kormos () replicated these findings, but did not find a relationship between aptitude and PWM operationalized as a non-word repetition task, a result that was subsequently confirmed (). Those of the previously mentioned studies that used factor analysis, as well as Granena (), found that PWM and EWM loaded on the same factor and separately from overall aptitude or aptitude components. This contrasts with findings reported by Li (), where EWM operationalized as a listening span task loaded on the same factor as language-analytic ability.

Taken together, these findings present a mixed picture, no doubt at least partly due to the range of measures used, but also due to differences in participants’ profiles, not least in terms of language background and L2 proficiency level. Specifically, PWM appears to be an important predictor of vocabulary, grammar and reading at lower levels of proficiency and/or in novice learners (; ), whereas the role of WM at higher levels is less clear. Linck et al. () reported a positive influence of PWM on long-term listening and reading attainment in a group of advanced learners, whereas other studies with experienced learners at advanced levels found no effect of PWM on vocabulary and grammar knowledge () and no association between EWM and knowledge of a grammatical structure of high learning difficulty ().

Nevertheless, some common threads can be identified. First, (at least some) components of WM are (weakly) related to (at least some) components of aptitude (e.g., ; ; ), suggesting a role for WM in L2 learning that is (partly) independent of aptitude—a situation which calls for the inclusion of WM measures in studies aimed at identifying predictors of L2 achievement. Second, including measures of both PWM and EWM seems advisable, given that the two components have been shown to contribute differently to L2 proficiency (). Third, taking into account learners’ L2 proficiency level appears to be of critical importance (e.g., ; ; ; ).

2.2. Measuring aptitude

Studies measuring L2 learners’ aptitude have increasingly drawn on the LLAMA battery (; ), a suite of computer-administered tests that is freely available and can be used with participants from a range of L1 backgrounds (). The LLAMA comprises four subtests that essentially operationalize the classic Carrollian notion of aptitude, that is, associative memory in the sense of vocabulary learning (LLAMA B), phonetic coding ability in the sense of auditory pattern recognition (LLAMA D) and sound-symbol correspondence (LLAMA E) and language-analytic ability in the sense of grammatical inferencing (LLAMA F). LLAMA B, E and F have learning and testing phases, while LLAMA D consists of an exposure and testing phase (with variations in different versions from v.1 to v.3, as discussed below).

While LLAMA B, E and F are regarded as measures of explicit aptitude, it has been suggested that LLAMA D may be a measure of implicit aptitude (, 2016), although this view has recently been challenged (). Another proposed measure of implicit aptitude that seems to be accepted more widely is the probabilistic SRT task (), a computer-administered, non-verbal test in which participants react to changes in the location of visual stimuli by pressing keys corresponding to the position of the stimuli on the computer screen. The stimuli follow a probabilistic sequence in an attempt to mirror implicit sequence learning (of language) in the real world (). Unknown to participants, a training sequence is presented 85% of the time, while a control sequence appears for the remaining 15% of the time. Learning is operationalized as faster responses in the training condition compared to the control condition. A growing number of studies employing this measure is testimony to its increasing popularity in L2 research (e.g., , , ; ; ; , ; ).

Research to date has reported convergent validity of the SRT task and LLAMA D with measures of implicit knowledge and divergent validity with measures of explicit knowledge (; ), as well as an absence of correlations between the SRT task and tests of explicit aptitude, PWM and EWM (; ; ; ). The status of LLAMA D in relation to other measures of aptitude and measures of WM is less clear. On the one hand, LLAMA D has been found to be uncorrelated with other LLAMA sub-tests (). On the other hand, it did correlate with PWM, long-term memory retrieval as measured via a semantic priming task, and LLAMA B, while at the same time being uncorrelated with the SRT task (). Studies drawing on factor analysis likewise show mixed results. LLAMA D loaded on the same factor as a probabilistic SRT task (; ), but on a different factor from a deterministic SRT task ().

Taken together, these results could be interpreted as emerging evidence for a multi-componential structure of implicit aptitude (), with LLAMA D potentially probing implicit memory ability in the verbal domain and the SRT task domain-general implicit learning ability (). At the same time, seemingly inconsistent findings involving LLAMA D could be attributable to the test instructions used in any given study and thus be a methodological issue (). Specifically, if participants are informed that they will be tested on the items they hear in the exposure phase, attempts at intentional and therefore explicit learning could ensue. In order to test this hypothesis, Iizuka and DeKeyser () compared three types of LLAMA D instructions ranging from more to less explicit (‘listen and memorize’, ‘just listen’, ‘sound check’) and their effects on task performance. The researchers found that only the ‘just listen’ instructions that asked participants to carefully listen to the stimuli resulted in a relationship between LLAMA D and the SRT task. However, surprisingly, the relationship was negative. In an attempt to interpret this unexpected result, the researchers suggest that an ability to focus on the auditory stimuli helped with LLAMA D, but had the opposite effect in the case of the SRT task, where focusing on the stimuli on a trial-to-trial basis may have prevented successful (implicit) learning of the probabilistic sequence. Such an interpretation, in turn, might suggest that implicit aptitude is primarily a lack of interference rather than a measurable ability that enhances learning (). In this regard, the study reports a novel finding and offers an interesting interpretation that could potentially have wide-ranging implications for the conceptualization of implicit aptitude. However, replication is clearly needed.

3. The current study

The preceding sections have highlighted several open questions in relation to the theoretical status and empirical measurement of explicit and implicit aptitude. First, the status of LLAMA D and the SRT task as measures of implicit aptitude is still unresolved, leading to the question of exactly how these two tasks relate to each other and, as a consequence, how resulting scores are to be interpreted. Second, the role of WM remains unclear, both in relation to measures of explicit and implicit aptitude and in relation to the relative importance of PWM and EWM at different L2 proficiency levels. Third, an understanding of the predictive validity of explicit and implicit aptitude and WM is crucial to the field, yet no study to date has included measures of all these variables in combination with an assessment of several components of L2 proficiency. With a view to addressing these issues, we posed the following research questions:

  1. Is there evidence of convergence between auditory pattern recognition ability as measured by LLAMA D and implicit sequence learning ability as measured by a probabilistic SRT task?
  2. What is the relationship between measures of aptitude for explicit and implicit learning and measures of WM?
  3. To what extent do aptitude for explicit and implicit learning and WM predict L2 proficiency?

3.1. Method

The present study used a correlational design involving the online administration of the LLAMA test suite, a probabilistic SRT task, measures of PWM and EWM, and a measure of L2 proficiency capturing the dimensions of reading comprehension, listening comprehension and morphosyntactic knowledge of selected structures.

3.1.1. Participants

A total of 86 L1 Croatian learners of L2 English participated in the study. At the time of data collection, the participants had been learning English for between 6 and 13 years (M = 10, SD = 1.72) in the context of mandatory classes as a part of their school curriculum. The sample included 62 women, 22 men, and two participants who preferred not to disclose their gender. Participants were in secondary education and ranged in age from 15 to 18 years (M = 16.14, SD = 1.29).

3.1.2. Instruments and procedure

All measures with the exception of the L2 reading and listening comprehension tests were programmed into PsychoPy and subsequently administered via the Pavlovia platform (). All test instructions were provided in L1; the participants were instructed to use headphones in a quiet environment. The first author monitored participants via Zoom to ensure adherence to protocol and allow participants to ask clarification questions. Completion of the L2 reading and listening tests was not monitored because these tests relied on a commercial testing program, as detailed below. Testing proceeded in the following order: SRT task, operation span task (EWM) (Day 1 – c. 50 minutes); forward digit span task (PWM) (Day 2 – c. 20 minutes); gap-fill task (L2 morphosyntactic knowledge), LLAMA (Day 3 – c. 50 minutes); Oxford Placement test (L2 reading and listening) (Day 4 – c. 45 minutes).

3.1.3. Explicit and implicit aptitude

Language learning aptitude was measured by means of the LLAMA suite and a probabilistic SRT task. The LLAMA battery comprises four subtests: LLAMA B, LLAMA D, LLAMA E and LLAMA F. LLAMA B assesses associative memory, requiring participants to learn 20 new vocabulary items associated with novel picture stimuli during a two-minute learning phase. In the subsequent untimed test phase, participants are presented with a word and must select the corresponding picture from the entire array of 20 pictures. LLAMA B as used in the present study was identical to v.2, except for the removal of the feedback sound in the testing phase. The maximum score was 20, with 1 point awarded for each correct answer and no penalty for guessing.

LLAMA D tests auditory pattern recognition ability. During the exposure phase, participants hear 10 words played one by one in an unknown language. In the test phase, participants listen to words from the same language, including items heard previously and items not heard before. They respond in a yes/no format to indicate whether an item is familiar or not. Incorrect responses were penalized to compensate for guessing. The feedback sound from v.2 was removed. While this subtest had 30 items in v.1, we included 40 items (i.e., 20 familiar items, with each of the 10 items from the exposure phase appearing twice, and 20 unfamiliar items, 15 from v.1 and another 5 unused in v.1 but available as downloadable files). The instructions in the present study told participants to listen carefully to the sounds because they would be tested subsequently. The maximum score was 40.

LLAMA E assesses sound-symbol correspondence. Participants are presented with 24 phonetic symbols, each corresponding to a unique syllable. Upon clicking on a symbol, the associated syllable is played. Participants can click on any symbol any number of times during the two-minute learning phase. In the untimed test phase, participants hear a combination of two syllables and must select the correct answer from an array of 20 combinations of previously seen symbols. The version used in the present study was equivalent to v.3. We applied a partial-credit scoring system which awarded one point for each correct syllable in any given two-syllable combination. The maximum score was 40.

LLAMA F is a grammatical inferencing task in which participants have four minutes to work out the rules of an unknown language. During the learning phase, they click on buttons that reveal picture stimuli with corresponding written descriptions. There are 20 items, and participants can click on any button any number of times. During the untimed test phase which comprises 20 items, participants are presented with similar stimuli and must select a combination of words that correctly describes the picture at hand. Participants construct their answers from a board of 16 words. The version used in the present study was equivalent to v.3 except for the fact that all 20 items from v.1 were used. In our partial-credit scoring system, each correct word yielded up to two points: one point for the appropriate word itself, and one point if the word was in the correct position. The maximum score was 132.
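The two-point partial-credit rule can be illustrated with a minimal sketch. It assumes that a target word earns one point if it appears anywhere in the response and a second point if it also occupies the same slot as in the target; the function name and the pseudo-words are invented for illustration and are not taken from the LLAMA materials. Under this rule, a maximum of 132 implies 66 scorable target words across the 20 test items.

```python
def score_item(target, response):
    """Score one item: 1 point per target word present, plus 1 if in position."""
    points = 0
    for position, word in enumerate(target):
        if word in response:
            points += 1  # the appropriate word was selected
            if position < len(response) and response[position] == word:
                points += 1  # the word also occupies the correct slot
    return points


# Hypothetical item built from invented pseudo-words
target = ["ek", "tomo", "zab"]
print(score_item(target, ["ek", "zab", "tomo"]))  # 2 for "ek", 1 each for the others: 4
print(score_item(target, target))                 # full marks: 2 points x 3 words = 6
```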

The probabilistic SRT task was administered to gauge aptitude for domain-general implicit sequence learning. The task required participants to react to visual stimuli in the form of black squares that appeared in one of four possible locations on the computer screen by pressing a corresponding key as quickly and as accurately as possible. The sequence of stimuli was produced by a probabilistic rule which meant that 85% of the time the stimuli followed a training sequence, while the remaining 15% of the time the stimuli followed a control sequence. Instructions accompanied by video animations and a 60-trial practice phase preceded the task itself, which consisted of 8 blocks, each comprising 120 trials, resulting in a total of 960 trials. There were short breaks between blocks. Following the study protocol from Kaufman et al. (), trials were first randomized within their respective block, and subsequently administered in a pre-determined sequence. The task was scored by subtracting the mean response time (RT) in the training condition from that in the control condition.

3.1.4. WM

PWM was tested by means of a forward digit span task. We used the format developed by Linck et al. () through adaptation of a component of the operation span task created by Unsworth et al. (). Participants were presented with a series of auditory number sequences in L1, varying in length from three to nine digits. There were four sequences of a given length in each set, with the task comprising seven sets, resulting in a total of 28 sequences. Points were awarded for correctly recalled digits in their respective positions. This partial-credit scoring system has been shown to be preferable to an all-or-nothing system due to greater reliability and better discrimination (for details, see ). The maximum score was 168.
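As a minimal sketch of the partial-credit scoring described above, the following assumes that a digit earns a point only when it is recalled in its original serial position. The function names are illustrative and are not part of the original task software. The second function confirms that four sequences at each length from three to nine digits yield the stated maximum of 168.

```python
def score_sequence(presented, recalled):
    """Award one point per digit recalled in its original serial position."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def max_total_score(min_len=3, max_len=9, sequences_per_length=4):
    """Maximum attainable score: every digit of every sequence recalled in place."""
    return sequences_per_length * sum(range(min_len, max_len + 1))


# Hypothetical trial: the third and fourth digits are swapped at recall.
print(score_sequence([4, 7, 1, 9, 2], [4, 7, 9, 1, 2]))  # 3 of 5 digits in position
print(max_total_score())  # 4 sequences x (3 + 4 + ... + 9) digits = 168
```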

EWM was assessed by means of an automated operation span task (). Participants first solved a simple mathematical problem and then indicated whether the solution shown on screen was correct or incorrect. Subsequently, they were presented with a letter and asked to memorize it. Upon completion of a sequence of mathematical problems followed by letters, participants had to select the memorized letters from an array in the order in which they had been encountered previously. The task comprised 18 sets of letter sequences that ranged in length from three to eight, totalling 99 letters. The maximum score was 99, based on a partial-credit scoring system that awarded points for each correctly recalled letter in a given sequence. Participants’ responses to the mathematical problems were used to monitor engagement with the task; a cut-off point of 85% accuracy was set in order to ensure that cognitive resources were duly deployed towards solving the arithmetic equations rather than rehearsing the letters to be recalled.
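The scoring and exclusion logic can be sketched as follows. The sketch assumes that the cut-off is inclusive (accuracy of at least 85%), that letters are credited only in their original serial position, and that the 18 sets comprise three sets at each length from three to eight letters, a design that is consistent with the stated total of 99 letters. All names and example data are hypothetical.

```python
ACCURACY_CUTOFF = 0.85  # criterion stated in the text; inclusive bound assumed


def meets_accuracy_criterion(correct_judgements, total_judgements):
    """True if arithmetic accuracy reaches the 85% cut-off."""
    return correct_judgements / total_judgements >= ACCURACY_CUTOFF


def ospan_partial_credit(sets):
    """Sum letters recalled in their original position across all sets.

    `sets` is a list of (presented, recalled) pairs of letter strings.
    """
    return sum(
        sum(1 for p, r in zip(presented, recalled) if p == r)
        for presented, recalled in sets
    )


# Assumed design: three sets at each length from three to eight letters.
set_lengths = [length for length in range(3, 9) for _ in range(3)]
print(len(set_lengths), sum(set_lengths))  # 18 sets, 99 letters

# Hypothetical participant data
print(meets_accuracy_criterion(90, 99))  # True (about 91% accuracy)
print(ospan_partial_credit([("FKP", "FKP"), ("HJLQ", "HJQL")]))  # 3 + 2 = 5
```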

3.1.5. L2 proficiency

Participants’ morphosyntactic knowledge of L2 English was assessed by means of a gap-fill task with a three-way multiple-choice answer format. The test comprised 75 sentences targeting the use of articles, the simple past tense and the passive voice. The choice of targeted structures was based on the participants’ grammar syllabus in the context of their English language classes and informed by frequently made mistakes as reported by their teacher (D. Linić Učur, personal communication, June 7, 2020). The maximum score was 75.

The Oxford Placement test () was used to assess participants’ L2 reading and listening comprehension. The reading section draws on test takers’ knowledge of grammatical form and meaning, implied meaning and overall reading comprehension. The listening section assesses test takers’ listening comprehension through ten dialogues of varying lengths and five short monologues. Both sections are designed to test how well learners understand the meaning of what is being communicated as an indicator of general language ability (). The test is adaptive (i.e., the difficulty of presented items is kept in line with each learner’s performance). The test provider’s platform generates a total test score as well as separate scores for each section, with a maximum score of 120 for each section.

3.1.6. Data analysis

In order to answer the research questions, reliability indices (Cronbach’s alpha) and descriptive statistics were calculated. Normality of data distributions was assessed by means of Shapiro-Wilk tests. Bivariate correlations and exploratory factor analysis were employed to investigate the relationships between variables and to examine the structure of the constructs of explicit and implicit aptitude and WM. Correlations followed by multiple regression analyses were used to establish the predictive power of the aptitude and WM measures with regard to L2 proficiency. The alpha level was set at .05. We conducted the statistical analyses in the R package, v.2021.09.2 () and IBM SPSS Statistics, v.27.0 ().
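For readers less familiar with the reliability index, the following minimal sketch computes Cronbach's alpha with the standard formula, alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores). It is not the authors' R or SPSS code, and the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of participants' item-score lists."""
    k = len(item_scores[0])          # number of items
    items = list(zip(*item_scores))  # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: 5 participants x 4 dichotomously scored items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))  # 0.8
```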

3.2. Results

This section addresses our three research questions in turn.

3.2.1. Is there evidence of convergence between auditory pattern recognition ability as measured by LLAMA D and implicit sequence learning ability as measured by a probabilistic SRT task?

To address this question, we first considered participants’ overall performance on the LLAMA. Participants scored highest on LLAMA B (M = 57.18, SD = 22.34) and lowest on LLAMA D (M = 37.57, SD = 23.10). LLAMA E (α = 0.97) and LLAMA F (α = 0.95) showed excellent reliability; LLAMA B also showed very good reliability (α = 0.81). LLAMA D yielded a lower but still acceptable coefficient (α = 0.72). The full descriptive statistics are shown in Table A in the online supplementary materials.

Additionally, we calculated participants’ SRT task scores and scrutinized differences in mean RT between the eight task blocks to establish the time course of any learning effects. First, error responses (9% of the data) were discarded. Significant outliers (1% of the data), defined as values of more than three SDs from the mean RT for each participant in each block (; ), were likewise discarded, reducing the sample size to 83. Each participant’s SRT score was calculated by subtracting the average RT in the training condition from the average RT in the control condition. Figure 1 shows mean RTs in each block of the SRT task on both training and control trials.
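The trimming and scoring steps can be sketched as follows. The sketch assumes that the three-SD criterion is applied to all correct responses within a block, pooled across the training and control conditions, and that the difference score is computed over all retained trials. The data structure and function name are illustrative only.

```python
from statistics import mean, stdev


def srt_score(trials, n_sd=3.0):
    """Mean control RT minus mean training RT for one participant.

    `trials` is a list of dicts with keys "block", "condition" ("training" or
    "control"), "rt" (ms) and "correct" (bool). Positive scores indicate learning.
    """
    kept = {"training": [], "control": []}
    for block in sorted({t["block"] for t in trials}):
        # Error responses are discarded before outlier trimming.
        valid = [t for t in trials if t["block"] == block and t["correct"]]
        rts = [t["rt"] for t in valid]
        m = mean(rts)
        sd = stdev(rts) if len(rts) > 1 else 0.0
        for t in valid:
            if abs(t["rt"] - m) <= n_sd * sd:  # keep RTs within n_sd SDs of the block mean
                kept[t["condition"]].append(t["rt"])
    return mean(kept["control"]) - mean(kept["training"])


# Hypothetical single-block data
trials = (
    [{"block": 1, "condition": "training", "rt": 400, "correct": True}] * 10
    + [{"block": 1, "condition": "control", "rt": 460, "correct": True}] * 2
    + [{"block": 1, "condition": "training", "rt": 3000, "correct": True}]  # slow outlier
    + [{"block": 1, "condition": "control", "rt": 100, "correct": False}]   # error response
)
print(srt_score(trials))  # 60
```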

Figure 1 

Mean RTs on the SRT task.

As Figure 1 indicates, RTs were longer in the control condition than in the training condition in all blocks except blocks 3 and 7. The mean RT for the training condition across all blocks was 447 ms (SD = 67); the mean RT for the control condition was 457 ms (SD = 71). Table B in the supplementary materials shows mean RTs broken down by block and condition.

Split-half reliability with Spearman-Brown correction resulted in a coefficient of 0.42 for all eight blocks and 0.44 for blocks 4–8. These indices are comparable to the reliability of similar tasks in previous studies (; ; , 2017). Overall, a reliability coefficient of above 0.4 is considered acceptable for measures of implicit processes, since they typically yield lower indices than measures of explicit processes ().
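The Spearman-Brown correction steps the correlation between two half-tests up to full-test length: r_full = 2r / (1 + r). The minimal sketch below assumes that each participant has one score per half (for instance, from odd and even trials; the text does not state how the halves were formed). By this formula, a corrected coefficient of 0.42 corresponds to an uncorrected half-test correlation of roughly 0.27.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(first_half_scores, second_half_scores):
    """Corrected split-half reliability from per-participant half scores."""
    return spearman_brown(pearson_r(first_half_scores, second_half_scores))


print(round(spearman_brown(0.266), 2))  # 0.42
```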

A series of paired-samples t-tests was run to identify differences between the training and control conditions in each block. A statistically significant difference was observed in all blocks except blocks 3 and 7. A small effect size was detected in blocks 4 and 5, and a medium effect size in blocks 2, 6, and 8. Cohen’s d across the last five blocks was 0.75, suggesting a medium effect size that is substantially higher than the effect sizes reported in previous research: 0.19 in Kaufman et al. () and 0.21 in Suzuki and DeKeyser (). As our data show a more stable learning effect from block 4 onwards, all subsequent analyses used scores based on the RT differences from blocks 4 to 8. Table C in the supplementary materials shows the results of the comparisons with effect sizes for each block.
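The text does not state which variant of Cohen's d was computed. As one possibility, the following minimal sketch standardizes the mean of the paired control-minus-training differences by the standard deviation of those differences; other variants (e.g., standardizing by the pooled SD of the two conditions) would give different values. The data are hypothetical.

```python
from statistics import mean, stdev


def cohens_d_paired(control, training):
    """Cohen's d for paired data: mean difference over SD of the differences."""
    diffs = [c - t for c, t in zip(control, training)]
    return mean(diffs) / stdev(diffs)


# Hypothetical per-participant mean RTs (ms) in each condition
control = [460, 455, 470, 450, 465]
training = [450, 452, 455, 448, 452]
print(round(cohens_d_paired(control, training), 2))
```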

Bivariate correlations (Spearman’s rho) were run to examine whether there was any convergence between the SRT task and LLAMA D as hypothesized measures of implicit aptitude. Figure 2 shows correlation coefficients (upper triangle), scatterplots for variable pairs (lower triangle) and density plots for each variable (on the diagonal). The results show no significant association between SRT and LLAMA D, thus indicating divergence.

Figure 2 

Correlations (Spearman’s rho) between measures of aptitude, WM, and L2 proficiency.

Finally, we conducted an exploratory factor analysis, using principal component analysis with direct oblimin (oblique) rotation, following confirmation that underlying factors were related (). Assumptions were met, with a Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy of 0.55 and Bartlett’s test of sphericity significant. The analysis yielded three components with eigenvalues above 1. The factor loadings are shown in Figure 3. A detailed overview of factor loadings can be found in Table D in the supplementary materials.

Figure 3 

Factor loadings for a three-component solution (PCA).

The SRT task and LLAMA D, the two hypothesized measures of implicit aptitude, loaded on the same factor (factor 3), distinct from the measures of explicit aptitude (factor 1) and WM (factor 2). While the initial factor loadings were as expected, it is noteworthy that the SRT task and LLAMA D are not correlated (see Figure 2). Furthermore, the factor loading for the SRT task is positive, while the loading for LLAMA D is negative. This discrepancy suggests a complex relationship between the two measures that cannot be fully captured with a simple convergence test.

3.2.2. What is the relationship between measures of aptitude for explicit and implicit learning and measures of WM?

Prior to considering the relationship between the various measures, we calculated descriptive statistics, normality and reliability for the forward digit span (FDS) task that was used to assess PWM and the operation span (OSPAN) task used to assess EWM. Reliability of the FDS task was very good (α = 0.86). The OSPAN required the exclusion of three participants who did not meet the 85% accuracy criterion on the mathematical equations that preceded the letter memorization and recall component. Internal consistency was acceptable (α = 0.73) and comparable to the reliability indices reported in previous studies: 0.78 in Unsworth et al. () and 0.69 in Suzuki and DeKeyser (). Full descriptive statistics can be found in Table E in the supplementary materials.

Figure 2 shows the correlations between the measures of aptitude for explicit and implicit learning and the WM measures. LLAMA B and E, as well as LLAMA E and F, are significantly correlated at a moderate level of strength, which is in accordance with expectations. The two WM measures are likewise positively and significantly associated, as one might expect, and LLAMA B is moderately correlated with the OSPAN, suggesting an association between the ability to learn new lexical items and the central executive component of WM.

As can be seen from the high factor loadings in Figure 3, LLAMA E, B, and F as measures of explicit aptitude load on factor 1 (λ = 1.97), which accounts for 28% of the variance. The two WM measures load on factor 2 (λ = 1.31), which explains 19% of the variance. Finally, the SRT task and LLAMA D, conceptualized as measures of implicit aptitude, load on factor 3 (λ = 1.12), which explains 16% of the variance. Taken together, the three factors explain 63% of the variance and highlight the distinct loading patterns of aptitude measures compared to those assessing WM.
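The link between the three values given in parentheses above and the percentages of variance can be made explicit. The sketch below assumes that these values are eigenvalues of the correlation matrix of seven standardized measures (LLAMA B, D, E and F, the SRT task, FDS and OSPAN), so that each component's share of variance is its eigenvalue divided by seven. Under that assumption the reported percentages are reproduced.

```python
N_VARIABLES = 7  # assumed: LLAMA B, D, E, F, SRT, FDS and OSPAN

eigenvalues = {"factor 1": 1.97, "factor 2": 1.31, "factor 3": 1.12}

# Share of total variance = eigenvalue / number of standardized variables
shares = {name: value / N_VARIABLES for name, value in eigenvalues.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%}")  # 28%, 19%, 16%
print(f"total: {sum(shares.values()):.0%}")  # 63%
```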

3.2.3. To what extent do aptitude for explicit and implicit learning and WM predict L2 proficiency?

We began addressing the final research question by examining the descriptive statistics for the measures of L2 proficiency used in the present study (gap-fill task, reading and listening sections of the Oxford Placement test), as well as the reliability and normality of the gap-fill task.

The reliability indices for the gap-fill test are all above .98 and therefore deemed excellent. Data were not normally distributed, with a negative skew suggesting a tendency for participants to score at the higher end of the spectrum. With regard to the targeted morphosyntactic structures, articles posed the greatest challenge (M = 61.85, SD = 10.35), while the passive voice was easiest for participants (M = 88.68, SD = 13.27). Scores on the Oxford Placement test were likewise not normally distributed, again due to a negative skew indicative of generally high scores. In terms of the Common European Framework of Reference for language proficiency, 1% of participants were at level B1 (‘Threshold’), 16% at B2 (‘Vantage’), 45% at C1 (‘Effective operational proficiency’) and 38% at the highest possible level C2 (‘Mastery’). Put differently, 83% of the learners were proficient users of L2 English (i.e., at advanced levels). Full descriptive statistics can be found in Table F in the supplementary materials.

Next, we examined the relationships between all variables, as shown in Figure 2. The three L2 proficiency measures are positively associated with each other at a medium level of strength, which is not unexpected. The measures of explicit aptitude LLAMA B, E and F are moderately but significantly correlated with reading, and LLAMA E and F are moderately correlated with the gap-fill test assessing morphosyntactic knowledge. It is worth noting that none of the hypothesized predictor variables is correlated with listening. As a consequence, we conducted two hierarchical multiple regression analyses with reading and gap-fill as dependent variables, respectively. In each analysis, predictor variables were entered in descending order according to the absolute values of their correlation coefficients with the outcome variable, as shown in Figure 2. All assumptions were met in accordance with Field () and Jeon (). Univariate outliers (cases with an absolute z-score larger than 3.3) and multivariate outliers (cases with a Mahalanobis distance greater than 26.125; χ²[8] = 26.125, p < 0.001) were removed. Following the analysis, only variables that significantly predicted variance in the dependent variable were included in the final model.
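The Mahalanobis cut-off corresponds to the critical value of the chi-square distribution with 8 degrees of freedom at p = 0.001. As a check that needs no statistics library, the minimal sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom and locates the critical value by bisection. The function names are illustrative; in practice a statistics package would normally be used.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def chi2_critical(p, df, lo=0.0, hi=200.0):
    """Find x such that P(X > x) = p by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > p:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical(0.001, 8)
print(f"{critical:.2f}")  # about 26.12, in line with the threshold of 26.125
```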

The final model for reading as shown in Table 1 includes four predictor variables, LLAMA B, LLAMA E, SRT and LLAMA D, which accounted for 30% of the variance in reading scores. Importantly, LLAMA B and E positively predict scores in reading, while the SRT task and LLAMA D are negative predictors.

Table 1

Hierarchical multiple regression model for reading.




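| Predictor | Step 1 B | Step 1 β | Step 2 B | Step 2 β | Step 3 B | Step 3 β | Step 4 B | Step 4 β |
|---|---|---|---|---|---|---|---|---|
| LLAMA B | 0.175* | 0.345 | 0.122 | 0.240 | 0.120 | 0.236 | 0.120* | 0.237 |
| LLAMA E | | | 0.097* | 0.271 | 0.089* | 0.248 | 0.098* | 0.273 |
| LLAMA D | | | | | | | –0.117* | –0.234 |
| R² | 0.119 | | 0.182 | | 0.244 | | 0.298 | |
| ΔR² | 0.119 | | 0.062 | | 0.062 | | 0.054 | |
| ΔF | 8.521* | | 4.726* | | 5.023* | | 4.616* | |

Note: * Significant at 0.05 level; ** Significant at 0.001 level.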
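The R² change and F change statistics reported for each step of a hierarchical regression follow from the R² values of successive models. The sketch below shows the standard computation; the function names are hypothetical, and the sample size in the usage note is a made-up value, since the number of cases retained after outlier removal is not stated in this section.

```python
def r2_increments(r2_by_step):
    """R-squared gained at each step, rounded to three decimals."""
    previous = 0.0
    increments = []
    for r2 in r2_by_step:
        increments.append(round(r2 - previous, 3))
        previous = r2
    return increments


def f_change(r2_full, r2_reduced, n_cases, k_full, k_added=1):
    """F statistic for the R-squared gain when k_added predictors are entered.

    k_full is the number of predictors in the larger model (intercept excluded),
    so the residual degrees of freedom are n_cases - k_full - 1.
    """
    df_residual = n_cases - k_full - 1
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / k_added) / ((1.0 - r2_full) / df_residual)
```

Applied to the R² values reported for reading, `r2_increments([0.119, 0.182, 0.244, 0.298])` gives 0.119, 0.063, 0.062 and 0.054. The second value differs from the tabled 0.062 in the third decimal, presumably because the published R² values are themselves rounded. With an invented sample of 103 cases, `f_change(0.5, 0.4, n_cases=103, k_full=2)` returns approximately 20.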

Table 2 shows the final model for the gap-fill test which includes two predictor variables explaining 19% of the variance in gap-fill scores: LLAMA F and forward digit span (FDS). In this model too there is both a positive predictor (LLAMA F) and a negative predictor (FDS).

Table 2

Hierarchical multiple regression model for gap-fill.




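| Predictor | Step 1 B | Step 1 β | Step 2 B | Step 2 β |
|---|---|---|---|---|
| LLAMA F | 0.165* | 0.379 | 0.180 | 0.413 |
| R² | 0.144 | | 0.193 | |
| ΔR² | 0.144 | | 0.049 | |
| ΔF | 12.928** | | 4.651* | |

Note: * Significant at 0.05 level; ** Significant at 0.001 level.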

4. Discussion

The present study sought to contribute to our understanding of theory and measurement of the constructs of aptitude for explicit and implicit learning. To this end, we examined the relationship between LLAMA D and a probabilistic SRT task as hypothesized measures of implicit aptitude, and we scrutinized the relationship between all LLAMA subtests, the SRT and two measures of WM, that is, a construct that has been posited as another potential component of aptitude. Last but not least, we investigated the predictive power of aptitude for explicit and implicit learning and WM in relation to L2 proficiency, operationalized as grammar knowledge, reading, and listening comprehension. In the following, we discuss the findings in terms of their contribution to the conceptualization and operationalization of explicit and implicit aptitude in the field of L2 learning.

4.1. WM as a component of aptitude

Unlike most previous studies investigating WM in relation to aptitude, the present study included a comprehensive battery aimed at measuring not only explicit and implicit aptitude, but also both PWM and EWM. A factor analysis yielded separate factors for WM comprising PWM and EWM on the one hand, and explicit and implicit aptitude on the other hand, thus corroborating existing findings to the extent that comparisons can be made (; ; ; ). The cumulative evidence to date indicates that WM measures appear to tap a construct that is qualitatively different from aptitude, so research aimed at investigating variables interacting with and/or predicting L2 learning and use would ideally include measures of both aptitude and WM.

At the same time, we found a moderate correlation between EWM as operationalized via an operation span task and associative memory as measured by LLAMA B, a finding which suggests an involvement of executive function in the learning of new lexical items. Interestingly, we found no correlation between PWM operationalized via a forward digit span task and any of the LLAMA subtests. While similar results have been reported in other studies including measures of both EWM and PWM (; ), this result may seem counter-intuitive at first glance because PWM has been shown to be implicated in vocabulary acquisition (). However, if we take into consideration the factor of L2 proficiency, the finding is perhaps less surprising. Existing research suggests that the importance of PWM declines as proficiency increases (; ; ), and our advanced L2 learners may have crossed the threshold at which individual differences in PWM play a role. Indeed, the shared variance between the correlated measures of EWM and PWM in our study appears to confirm that it was the executive function component of WM that played a role in successful LLAMA B performance in the present study.

4.2. Implicit aptitude as a multi-componential construct

A factor analysis that included the LLAMA subtests, the SRT task and the two WM measures yielded three factors. LLAMA D and the SRT task as hypothesized measures of implicit aptitude loaded on the same factor and separately from measures of explicit aptitude and WM. This result substantiates the argument that implicit and explicit aptitude are separate constructs, each comprising distinct underlying abilities (), and that LLAMA D and the SRT task tap abilities that are part of the same construct of implicit aptitude (, 2016). However, this finding needs to be considered in conjunction with another, seemingly contradictory result, that is, the absence of a correlation between LLAMA D and the SRT task. A possible explanation that immediately suggests itself is the difference in modality between the two tests. The SRT task is visual in nature, whereas LLAMA D is an auditory task. Research in cognitive psychology has shown that sensory modality can constrain higher-level cognition, including learning and memory (). Moreover, the respective accuracy of auditory versus visual pattern perception may not be comparable (). Furthermore, the SRT task and LLAMA D differ in terms of stimulus domain, with the former relying on non-verbal and the latter on verbal stimuli. Findings from neurocognitive research suggest numerous neurophysiological differences in the processing of verbal as opposed to non-verbal stimuli (). Finally, the two tests differ in what they measure. The SRT task is an RT measure that gauges the process of on-task learning (i.e., it is a processing-based measure, ). By contrast, LLAMA D is an accuracy measure which assesses learning offline (). Thus, the SRT task measures the process of learning, whereas LLAMA D measures the product of learning.

Having said this, the fact that the SRT task and LLAMA D differ in terms of sensory modality, draw on different stimulus domains, assess the process versus the product of learning and are not statistically associated does not necessarily mean that they cannot be part of the same construct. Indeed, a lack of correlation between assumed measures of implicit aptitude has been reported in several recent studies (; ; ). If the primary abilities involved in implicit aptitude are relatively disparate in nature, a lack of association between measures tapping these primary abilities would be less surprising. This suggestion is supported by DeKeyser and Li (), who have argued that implicit learning may occur via diverse pathways, and therefore abilities tested by implicit aptitude measures may not necessarily overlap or even intersect. In other words, implicit aptitude may be a multi-componential construct (; ; ).

This line of argument is further supported by the fact that even though LLAMA D and the SRT task loaded on the same factor, the SRT task loaded positively and LLAMA D negatively on that factor. A recent study () investigating the effect of different types of LLAMA D instructions reported a similar finding when participants were instructed to ‘just listen’ to the sound sequences in LLAMA D. In that condition, LLAMA D and SRT scores were negatively correlated (i.e., the abilities measured by these two tests were pulling in opposite directions). In the present study, participants were instructed to listen carefully to the sound sequences, and they were also told that they would be tested subsequently. Our instructions were thus different and arguably more explicit, yet the abilities measured by LLAMA D and the SRT task likewise pulled in opposite directions.

A possible explanation for this pattern of results is that participants may be approaching both LLAMA D and the SRT task in the same way, in line with their individual proclivities and regardless of the instructions they are given. In other words, they employ the same set of abilities on all versions of LLAMA D and on the SRT task, but due to the distinct nature of these two tests, such an approach has a facilitative effect in one case and a debilitative effect in the other. Specifically, Iizuka and DeKeyser () suggest that focal attention may facilitate performance on LLAMA D but hinder performance on the SRT task. Success on the latter may depend on “the degree to which one is able to let go of the tendency to look for patterns and process input without focal attention” (). Along similar lines, Kaufman et al. () have suggested that, among other factors, openness and intuition are associated with success on the SRT task.

These considerations arguably shed new light on the construct of aptitude more generally because they imply that more is not necessarily better. If implicit aptitude in particular were not an ability in the classic sense (i.e., higher levels are invariably advantageous), but rather a propensity (see also ), where reliance on the right capacity at the right time and in the right context determines success, then this would no doubt change the outlook of L2 researchers, L2 teachers and L2 learners alike. At this point, such a line of argument is admittedly speculative. However, we believe it can usefully inform further research into the interrelations between different measures of (implicit) aptitude and their predictive power with regard to different components of L2 proficiency.

4.3. Predictors of L2 proficiency at advanced levels

Through hierarchical multiple regression analyses, we identified predictors for two of our three proficiency measures. Overall, both explicit and implicit abilities predicted L2 morphosyntactic knowledge and L2 reading comprehension, in line with previously reported findings (). More specifically, LLAMA F positively predicted L2 morphosyntactic knowledge, while PWM as measured by a forward digit span task was a negative predictor, with a total of 19% of the variance accounted for. Moreover, LLAMA B and E positively predicted L2 reading, while the SRT task and LLAMA D were negative predictors, with a total of 30% of the variance explained.

Taking the latter finding first, we can see that two components of explicit aptitude, associative memory as measured by a vocabulary learning task and phonetic coding ability as measured by a sound-symbol association task, predicted performance on a reading test that assesses knowledge of language form and meaning, implied meaning and reading comprehension. This is entirely in line with expectations: Grapheme-phoneme mappings (or sound-symbol correspondence) and lexical knowledge are the very foundations of reading skill (in an alphabetic language). More strikingly, the two hypothesized measures of implicit aptitude used in the present study, the SRT task and LLAMA D, proved to be significant negative predictors. Put differently, domain-general implicit sequence learning ability (SRT task) and auditory pattern recognition ability (LLAMA D) were disadvantageous for reading performance, if relied upon solely (given that explicit aptitude components were already accounted for in the model).

With regard to morphosyntactic knowledge, language-analytic ability as measured by LLAMA F was a significant predictor. This is not only in line with previous empirical research (, , ), but also theoretically coherent, since language-analytic ability can be expected to be important for the acquisition of grammar. As in the case of reading, the regression analysis for morphosyntactic knowledge also yielded a negative predictor, that is, PWM as measured by a forward digit span task. An argument similar to that made for reading can be put forward: Exclusive reliance on phonological storage and rehearsal works against successful performance on a gap-fill task targeting selected linguistic structures (given that language-analytic ability was already accounted for in the model).

Finally, it is worth noting that none of the aptitude or WM measures included in the present study predicted L2 listening, our third measure of proficiency. As L2 listening was correlated with both L2 reading and L2 morphosyntactic knowledge, it is possible to conjecture that these latter two skills may have functioned as mediators. In other words, learners invested their explicit aptitude in acquiring morphosyntactic knowledge and reading skill, whereas listening skill was developed on the back of these. While this proposed explanation must remain speculative, it does sit well with the context in which the present study was conducted (i.e., an English-as-a-foreign-language setting characterized by form-focused classroom instruction that heavily relies on metalinguistic and literacy skills).

As in the case of the role of WM discussed above, the findings relating to predictors of L2 grammar, reading and listening highlight the importance of not only the learning context, but also learners’ prior language learning experience in the sense of their proficiency level at the time of testing. Different constellations of cognitive (and other) variables can be expected to play different roles at beginner, intermediate and more advanced levels. Therefore, it is crucial to bear in mind that results from more advanced participants as reported here may not be generalizable to learners at lower levels of proficiency, and vice versa.

5. Conclusion

The present study measured explicit and implicit aptitude and WM in a group of L2 English learners of relatively advanced proficiency. Our empirical results corroborate a conceptual differentiation between explicit and implicit aptitude on the one hand and WM on the other hand, which suggests that the use of separate measures for these constructs is advisable.

In theoretical terms, the findings that (1) the hypothesized components of implicit aptitude pulled in different directions and (2) implicit aptitude components and PWM were negative predictors of L2 grammar and reading skill encourage us to consider the possibility that (implicit) aptitude may be a cognitive proclivity rather than an ability of immutable, context-independent value. This argument is in alignment with a comment put forward by Iizuka and DeKeyser () in which they refer to aptitude considered in this way as being reminiscent of (cognitive or learning) style (see also ). It also chimes with earlier research taking a multi-dimensional and dynamic view of aptitude (, ), according to which the sensitivity of aptitude to environmental factors is such that it can be either activated or inhibited, based on the characteristics of various learning conditions. Over and above the role of learning context, our findings have highlighted the role of learners’ proficiency level in the aptitude-outcome equation.

Despite yielding valuable insights, the present study was not without limitations. In particular, a limited number of exclusively cognitive variables was measured. Moreover, a larger sample size would have been desirable because this would have allowed for the empirical corroboration or otherwise of the currently entirely speculative argument that grammar and reading skill mediated the subsequent acquisition of listening skill.

In line with the findings reported here and in other recent studies on explicit and implicit aptitude, future research seeking to track the changing roles and relative weights of cognitive predictors as L2 proficiency develops would be of great value. In addition, the conceptualization of aptitude as a proclivity rather than as a fixed, context-independent ability deserves consideration in both the empirical and the theoretical domain, hopefully leading to well-informed research designs that capture multiple variables characterizing learners and the learning context. Last but not least, research aimed at identifying predictors would ideally draw on an experimental design where not only the product of learning, but also the process of learning is subject to experimental control and thus more ready interpretability. Work within an aptitude-treatment interaction paradigm (e.g., ) would satisfy these criteria.