1. Introduction

Learning to read Mandarin Chinese characters at beginner-level is thought to be one of the most challenging aspects of learning the language (; ; ; ). In this study, by exploring the visual processes involved in noticing linguistic elements in Chinese characters, we aimed to shed light on aspects of this challenge and to understand better how beginner-level learners of Mandarin begin to build a lexical repertoire. Specifically, we focused on the potential role of eyegaze patterns and visual working memory in decoding and retrieving character components, such as radicals, when learning new words, within the Paired Associate Learning paradigm () for connecting and storing visual and verbal knowledge.

Our overarching aim was to explore what learners paid attention to when learning new characters. L2 Chinese learners are routinely expected to memorise characters individually using flashcards or other repetition activities, relying on visual strategies to help them learn (; ). It is not fully understood, though, how learners transform such visual memorisation activities into a workable vocabulary store (e.g., to help build up reading fluency). Previous research (e.g., ) has identified that complex visualisation processes are involved in learning Chinese characters at the word level and in building reading fluency, both by using working memory and visual learning strategies (; ; ). However, it is not yet clearly established how much learners rely on focused visual attention (e.g., in patterns of longer eyegaze on different elements of a character). It is also possible that non-attended semantic cues foster character recognition (e.g., if Chinese follows findings in other languages of having a noun bias, where nouns seem to be more easily learned than other semantic categories, ; ), although there may be some debate about this in Mandarin (). Using eyegaze patterns as an indication of focused attention on target components of a character (), where semantic cues are carefully controlled, could therefore provide more insight into the role of visual cognition in learning new words.

There are also gaps in our understanding of how visual working memory (VWM) may be involved, in view of assumptions that working memory capacity supports learning, including second-language (L2) vocabulary learning, or at least may play some role in explaining individual variation in learning (). For early-stage learners, it could logically be assumed that greater VWM would support ease of speedy visual processing and could therefore help in learning new words, but it remains unclear if VWM would be more likely to support intentional or incidental learning and how VWM may interact with other factors such as general exposure to written language outside the classroom. The study presented here, although exploratory and small-scale and acknowledging limitations in design, is thus intended to shed some light, theoretically and empirically, on key aspects of the role of visual attention in learning Chinese. In view of the rapidly growing numbers of students learning Chinese around the world (), better understanding of some of the processes involved in Chinese word learning can help unlock the potential for research and pedagogic approaches alike.

2. Literature review

2.1. Chinese character learning

Chinese is a logographic language, using characters rather than an alphabetic writing system. Learning to read involves form-meaning, form-sound and sound-meaning mapping processes, requiring multi-level representations according to construction-integration models of reading (, ). The memory processes in creating such representations are complex, according to the Paired Associate Learning model (), which argues that each visual or phonological stimulus needs to be paired with an associated long-term memory response, enabling new items to be stored and retrieved appropriately. Form-sound mappings are particularly hard in Chinese, given its opaque orthography (), in that few characters have predictable overt phonological clues in the visual word form. Characters in modern simplified Mandarin Chinese are typically composed of two components: a left-hand semantic component and a right-hand component which may have phonetic characteristics; components may be based, but not exclusively, on core root elements or radicals (). Character properties impacting word knowledge retention and reading ability include frequency of radicals, complexity of strokes, transparency of semantic and phonetic components, and density of connections in semantic networks (; ). Establishing strong pairings between these different levels of representation at early stages of reading is therefore highly cognitively demanding ().

Existing research on first-language (L1) Chinese character processing confirms that reading is a complex interaction between multiple areas of cognition—visual, motor and verbal/phonological working memory processes (e.g., ; ). For L2 learners, learning to decode and read in Chinese similarly requires multiple cognitive processes (; ) as they begin to map visual, auditory and semantic cues. Beginner-level tests of Chinese, such as the Hanyu Shuiping Kaoshi tests at level 1 and 2 (equivalent to Novice on the American Council on the Teaching of Foreign Languages scale, and A1 in the Common European Framework of Reference for Languages) acknowledge this challenge and do not require character knowledge until higher levels of proficiency. Studies show that strong links remain between visual processing of a character/word and knowing its pronunciation or meaning, even at intermediate to advanced levels (; ). However, there does not yet seem to be extensive research on how visual and linguistic processing may combine at initial learner stages, despite the struggles beginner-level learners face in mastering Chinese characters ().

One route to investigate how characters are learned relates to whether characters are processed in parts (analytically) or holistically. Research indicates that a higher number of strokes, particularly for less frequent words, can require slower analytic processing () even among intermediate-level learners. Su and Samuels () suggest that “beginning Chinese readers process characters in an analytic way, but that the decoding process changes gradually from analytic to holistic as their reading skills develop” (p. 1085).

To improve their decoding and reading skills, L2 Chinese learners adopt many useful strategies, including visualisation, when learning or being tested on words/characters (e.g., ; ). Shen () found the most common strategies included “paying attention to graphic structures”, “visualizing the graphic structure of the character” and “making use of the phonetic and semantic information in radicals”. Visual and semantic information, therefore, are tightly bound in learning characters (as the Paired Associate Learning model would predict). However, the link between character composition and word meaning is complex: one character with one meaning can take on a different meaning if combined with another, leading to plenty of ambiguity in character interpretation and making the semantic-form connections difficult to establish (). Character instruction often uses written repetition, following the traditional approach of practising fixed stroke order to fix character knowledge (). The risk here is that characters and words are often presented as decontextualised lists, encouraging rote memorisation in which the character form may not be well connected to meaning (; ). Research shows that learners struggle to surmount the perceived character learning challenges well beyond initial stages (). Teachers could therefore motivate their learners in early-stage learning by focusing on some of the most frequent, easily identifiable and semantically predictable radicals within a character’s composition, easing the transition to character recognition and learnability (; ).

2.2. Visual processing using eye-tracking methodologies and VWM

To understand better the visual/cognitive processes involved in “paying attention to graphic structures” () and how such processes are involved in learning character form and meanings, we argue that eye-tracking methodologies () can provide fresh insights into these processes in real time. Recent eye-tracking studies with learners generally find that longer gaze patterns, seen in slow or repeated text reading, can indicate more focused attention (e.g., ; ). We suggest that longer eye gaze would therefore indicate visual attention to specific character components in the learning process, particularly among novice readers using the kind of analytic learning discussed earlier. However, to our knowledge, eyetracking methodologies have not been used to investigate word learning processes in beginner-level learners of Mandarin.

We also argue that focused visual attention using longer eyegaze would entail greater use of working memory, particularly VWM. Working memory (WM) has been argued to play a potentially critical role in L2 learning and processing where attention is required (; ). Attention is controlled by a central executive system, binding phonological and/or visual information from short and long-term memory to complete a task (e.g., learning words). Visual short-term memory (STM) storage is assumed to be more space-constrained than phonological (around three chunks for visual STM compared to around seven for phonological STM, ). Therefore, visual storage capacity alone may be relatively hard to use in linguistic studies to predict individual variation in terms of character recognition or learning. Nevertheless, we believe it is logical to assume that greater WM executive capacity (combining storage with processing) and VWM specifically may be associated with greater ease in character reading and learning processes. And if so, it may follow that VWM is associated with specific eyetracking behaviour (e.g., greater VWM capacity may facilitate shorter and fewer fixations to free up WM capacity elsewhere).

Some studies using WM in relation to Mandarin support the logical assumption that greater WM capacity supports ease of character learning, as with most other forms of learning (). Reder et al. (), using a Paired Associate Learning paradigm, argued that familiarity with components aids learning of novel words made up of multiple components (e.g., bi-morphemes) by reducing WM load in processing the novel input. Their design was a lab-based experiment using university students with no prior knowledge of Mandarin, trained in identifying visually similar or different characters in a visual search task, for an hour a week for four weeks. Participants’ ability to create form-meaning pairings for novel bi-morphemic words was tested in a recall task for words including familiar frequently-presented characters or unfamiliar characters. They found that pairs combining more familiar characters were more easily learned. Performance in the recall task was also associated with performance on a WM n-back task requiring recognition of Chinese characters in a sequence, indicating that familiarity or ease of recognition reduces WM pressure in the learning process. This study supports the logical connection made here that visual processing and WM capacity are connected in learning characters. However, the controlled experimental conditions make it difficult to extend the value of their experimental findings to more typical vocabulary learning processes for early-stage learners.

In another study of WM effects, including VWM, Kim et al. () investigated WM in character processing among intermediate-level learners (with prior knowledge of 1500 characters), but without eye-tracking. Testing involved VWM and verbal WM, associated with speed in recognising input-enhanced characters (where one stroke was made bold), phonologically connected characters or semantically connected characters. VWM was found to have an effect on recognition of the enhanced characters but not on semantic or phonological connections. This would suggest that VWM may be associated with intentional focus, but perhaps more in relation to visual signals than in the kind of semantic pairing needed in word learning.

While many existing visual cognition studies used participants who already had a relatively workable Chinese vocabulary, we wanted to see if a similar rationale could explain learning processes required at beginner-level. Specifically, we aimed to explore if visual character processing would be easier when including consistent semantic information in radicals, which could help beginner-level learners to “notice” and recognise character-meaning connections more easily, or whether visual salience alone would draw learners’ visual attention to help learn new words.

2.3. Individual differences and role of exposure

Alongside internal cognitive factors that may impact word learning, individual variation in external factors, such as amount of exposure in and outside the classroom, may also play a role (; ). Research into initial language learning in Chinese shows that even very little exposure (fewer than 10 encounters with a word) can lead to sound-meaning word knowledge (), though this did not include written character learning. As noted earlier in Reder et al. (), characters can be decoded accurately by novice learners with minimal training, albeit in a laboratory training setting. Immersion, such as in study abroad settings, has generally been found to boost lexical acquisition, even in short stays of less than eight weeks, though findings can be variable and may depend on quality and quantity of exposure during immersion and whether learners are true novices ().

Thus, there appear to be several gaps in understanding the underlying cognitive processes that support initial paired associations between form and meaning at the earliest stages of learning Chinese, and in how focused visual attention and VWM may be implicated, particularly in a fully-immersed Chinese setting. Based on our interpretation of the literature discussed above, we suggest that careful focused attention in visual processing as measured in eyegaze patterns and VWM may be connected to the kind of visual strategies reportedly used by learners when processing characters, assuming that beginner-level learners use visual salience as a way to bootstrap meaning, but this is an issue that needs further exploration.

Therefore, to address this gap, this study investigated learnability of three specific target radicals in monomorphemic words, comparing visually similar but semantically different radicals for animal (nominal) and hand (verbal) and a visually salient but less semantically predictable radical for mouth (verbal). Extending previous research by Kim et al. (), Reder et al. () and Zhang et al. (), our study is the first, to our knowledge, to test visual salience versus semantic predictability in character learning, using eyetracking patterns as evidence of focused attention, in combination with visual working memory processes, among beginner-level learners in an immersion setting.

Our research questions and associated predictions are:

  • (1) How do beginner-level learners recall different types of monomorphemic characters, measured in accuracy and reaction time of responses? Is there an effect of type (semantic predictability or visual salience)?

We predicted that overall salience would impact on recall more than semantic predictability, but there may also be evidence of a general noun-bias if semantic predictability plays a role.

  • (2) Do eyegaze patterns indicate ease of recall?

We predicted that better recall would pattern with reduced focal attention (i.e., fewer fixations and shorter durations on the target area, in proportion to overall eyegaze).

  • (3) Are there effects for VWM on recall or eyegaze?

We predicted that greater VWM capacity could facilitate either higher character recall scores, or reduced eyegaze, or both.

3. Methodology

3.1. Character learning test

We created a self-paced learning task using PsychoPy to test speed and accuracy in recalling 30 target monomorphemic characters, along with 10 distractor characters. The test characters were divided between three types of radicals, always found in the left-side component of the character.

Character stimuli consisted of:

  • 10 verbs using 扌hand radical (TargetV)
  • 5 nouns using 犭animal radical (TargetN), visually similar to TargetV but a different functional/semantic category
  • 5 verbs using 口mouth radical (TargetM), in the same functional/semantic category as TargetV, with high visual salience, but less consistent semantic meaning

The characters were a mix of more or less frequent words, but most would not yet have been taught in textbooks (indicated by personal communication from the head of the participating institute, in which a standard textbook was used for all beginner-level classes). The hand and animal radicals were chosen as reasonably visually similar, though the hand radical is usually rated as more frequent in Chinese dictionaries (). The mouth radical was chosen as very visually salient. It is also found in a character included in early taught input for making introductions (叫 “jiao”, shout or call – as in “wo jiao Ma-ke” – I am called Mark) and was used as a baseline to test if learners had started to achieve some character familiarity. Table 1 shows all the characters used in the study.

Table 1

List of target characters.

TargetV morphemes – verbs with hand radical 扌:
to copy (chao)
to beat (da)
to shake (dou)
to protect (hu4)
to carry (kang)
to fasten (kou)
to expand (kuo)
to pat (pai)
to throw (tou)
to find (zhao)

TargetN morphemes – nouns with animal radical 犭:
dog (gou)

TargetM morphemes – verbs with mouth radical 口:
to blow (chui)
to be called (jiao)
to vomit (tu)
to inhale (xi)
to scare (xia)

Two bilingual Mandarin/English speakers confirmed that the selection of characters showed a reasonable mix of visual salience, semantic consistency, and frequency. TargetV (verbs, hand-radical) words chosen here are more frequent, but of medium semantic consistency in connection to verbs linked with the idea of “hand”. TargetN (noun, animal-radical) words are less frequent but more semantically consistent in connection to types of animal. The radicals in these two types of words were judged by the two raters to be visually similar in stroke form and salience, but the semantic types were distinct. This would allow us to test our prediction of a potential general noun-preference bias in early-stage word learning, which may be evident in the comparison of scores for the two types of words.

TargetM (verb, mouth-radical) morphemes, by contrast, were chosen to represent frequent, very visually salient words but with lower semantic consistency in connection to verbs linked with the idea of “mouth”. The 10 hand-radical verbs were taken as the baseline for learning. The 5 animal-radical nouns aimed to identify differences by semantic type, while the 5 mouth-radical verbs aimed to identify differences by visual salience.

The test was carried out using a standard Windows-based PC, connected to an Eyelink 2000 tower-mounted system, with a sampling rate of 1 kHz (SR-Research, Ontario, Canada) and a screen resolution of 1920 × 1080. Although viewing occurred with both eyes, eye movements were recorded from the left eye only. Items were presented on a 21-inch LCD monitor, positioned 71 cm from participants, subtending approximately 1° of visual angle. Characters were presented in SimSun 350 font, in black on a grey background, after piloting various sizes and colours to ensure legibility and accuracy of target gaze fixation (). This format was judged to be clearly legible and large enough to distinguish eyegaze direction between the left and right-side components.
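As a rough illustration of the viewing geometry described above, the sketch below converts between physical size on screen and visual angle at the reported 71 cm viewing distance. It uses the standard formula, angle = 2·atan(size / (2·distance)). The function names are hypothetical and the physical size of the stimuli is not reported here, so the example only shows what on-screen size would correspond to 1° at this distance.

```python
import math

VIEWING_DISTANCE_CM = 71.0  # viewing distance reported in the setup


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Size on screen that corresponds to 1 degree at 71 cm (about 1.24 cm).
one_degree_cm = size_for_angle_cm(1.0, VIEWING_DISTANCE_CM)
print(f"1 degree at {VIEWING_DISTANCE_CM} cm is about {one_degree_cm:.2f} cm")
```

A conversion of this kind is mainly useful for checking that an area of interest is comfortably larger than the eye tracker's typical spatial error.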

In the learning phase, an image of the Chinese character was presented on screen first, then the English word; participants pressed the space bar to advance to the next screen. We then tested recall of those target words, presenting the English word, then a Chinese character, asking participants to judge if the character matched the meaning, by pressing a Shift key (Right for match, Left for unmatched). The recall test was carried out after an intervening 15-minute gap used to conduct a VWM test (see below) and to gather biodata and other relevant participant information. Both the teaching phase and recall testing phase used randomised presentations of the target words to avoid practice effects.

Participants were given time to get comfortably settled at the computer using the Eyelink headrest. They were then given three practice character/word pairs ahead of the real test to feel at ease using the computer and keyboard buttons, because it required some practice to keep hands still on the keys without looking down, which would alter the pace of the test. They were instructed to complete the test at a brisk but natural pace. We recommended that they do the untimed practice test in about two to three seconds per item, which many managed to do. The screen was automatically set to move to the next item after six seconds. Results from the recall test were automatically analysed by the PsychoPy test software to create accuracy scores and reaction times in seconds, allowing us to test our prediction that greater salience would lead to higher accuracy and faster reaction-time (RT) scores on the TargetM radical words compared to others.

3.2. Eyegaze procedure

For the eyegaze analysis, we set specific Areas of Interest (AOIs), encompassing the key radical on the left-side component, as shown in the example in Figure 1 for 叫 (“jiao”, shout, be called).

Figure 1 

The character jiao marked up with the area of interest.

We collected data on first fixation, length of first fixation, number of total fixations and total gaze time per item to give insights into proportion of time spent on target AOI during recall. The outcomes allowed us to test our prediction that better recall scores (higher accuracy, lower RTs) on the character recall test would pattern with reduced focal attention (i.e., fewer fixations and shorter durations on the target area, in proportion to overall eyegaze).
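The proportion measures described above can be sketched as follows. This is a minimal illustration only, assuming each fixation is exported as screen coordinates plus a duration and that the AOI is a rectangle around the left-side component; the class names, pixel coordinates and durations are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal screen position in pixels
    y: float            # vertical screen position in pixels
    duration_ms: float  # fixation duration in milliseconds


@dataclass
class AOI:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        """True if the fixation falls inside the rectangular area of interest."""
        return self.left <= fix.x <= self.right and self.top <= fix.y <= self.bottom


def aoi_summary(fixations, aoi):
    """Count and time fixations on the AOI, raw and as a proportion of the total."""
    in_aoi = [f for f in fixations if aoi.contains(f)]
    total_n = len(fixations)
    total_ms = sum(f.duration_ms for f in fixations)
    aoi_ms = sum(f.duration_ms for f in in_aoi)
    return {
        "total_fixations": total_n,
        "aoi_fixations": len(in_aoi),
        "total_gaze_ms": total_ms,
        "aoi_gaze_ms": aoi_ms,
        "prop_fixations": len(in_aoi) / total_n if total_n else 0.0,
        "prop_gaze": aoi_ms / total_ms if total_ms else 0.0,
    }


# Hypothetical trial: four fixations on a character shown on a 1920 x 1080 screen.
trial = [
    Fixation(700, 540, 200),
    Fixation(1100, 560, 300),
    Fixation(760, 500, 100),
    Fixation(1200, 600, 400),
]
left_component = AOI(left=600, top=300, right=900, bottom=800)  # hypothetical AOI
summary = aoi_summary(trial, left_component)
print(summary)
```

Reporting both raw and proportional values keeps items with many overall fixations from dominating the comparison between radical types.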

3.3. VWM procedure

To test VWM, we used a visual shape search and recall task (adapted from ), tapping individuals’ capacity for retaining memory for a specific version of a particular shape while doing a distractor letter-spotting activity. This was selected to mirror the kind of visual shape knowledge entailed in distinguishing characters’ stroke shape and position. The VWM recall test consisted of an automatically-timed task in which one of four types of shape (diamond, square, rhombus or star) was presented in different positions on the screen, with four variants per shape (e.g., vertical or horizontal orientation, skewed left or skewed right). Participants then did a brief distractor test, looking at an image of circles on the screen with one circle labelled M or Z (randomly presented in different positions around the circle) and pressing the matching keyboard letter. Finally, they viewed a set of four variant shapes presented horizontally across the screen and selected which one had been previously presented (using Z, X, N, M letters to indicate diamond, square, rhombus or star respectively). The target recall shape was presented for 250 milliseconds, then the circle screen automatically appeared for four seconds, and finally the target choice screen appeared for four seconds. Participants could also press the space bar to proceed through the test at their own speed. They were given three sample tests to practise ahead of the real test, as before, to feel comfortable keeping their hands on the keys without looking down. Accuracy and reaction times in pressing the correct key at the final target choice stage, recalling the previously presented shape, were automatically calculated by the test software.

3.4. Participants

Bio-data were gathered using a simple questionnaire to identify suitability for the study (adapted from ). Recruited participants were Anglophone speakers, L1 English or dominant L2 English bilinguals with alphabetic-based L1s. Age was not considered to be a relevant variable because all had to be over 18 to attend university classes. Almost all had arrived in China just one to two weeks before starting classes. All participants had 15 hours per week of in-class instruction. They were tested within five to seven weeks of starting their beginner-level Chinese programme, to maximise homogeneity in terms of learning status. Despite our best efforts to recruit novices, there were some participants with some knowledge of Chinese (e.g., having travelled in China in previous years or having studied Chinese in their home country several years ago). However, they reported that they had not learned written Chinese or had not kept up Chinese over the years, hence they were all assigned to beginner-level classes. To mark this variability in prior knowledge, a between-group variable of novice/non-novice was tabulated, where more than two months’ knowledge of Chinese counted as non-novice (i.e., longer than the in-class sessions all our cohort had received at the time of testing). Length of residence in China in months and daily levels of exposure (on a scale of one, for less than five hours a day, to three, for more than 10 hours a day, following ) were also recorded but proved to have no relation to later results.

Full ethical and consent procedures were followed according to university protocols; participation was voluntary with an inexpensive coffee-shop voucher offered as a thanks. No class teachers were involved in conducting the experiment, and all participants were informed there was no connection between the experiment and class progression.

Forty learners participated, originally 21 novices (no prior knowledge) and 19 with some prior knowledge. Missing data in the eyegaze recordings and VWM tests and some outlier extremely slow responses led to five participants being removed; remaining scores presented here are from 17 novices and 18 non-novices, 35 in total (10 male, 15 female). Data were found to be non-normally distributed, likely due to the small number of participants, and were analysed in SPSS using non-parametric measures of association or difference.

4. Results

The descriptive results, which serve to answer the first research question, are shown for character recall test scores in Table 2, split by group (novice/non-novice). Test scores mark accuracy of recall out of a possible maximum of 40; RTs are presented in seconds for average response time.

Table 2

Test Accuracy and RT scores by Group.


Group        Measure                 N     Mean     SD
Novice       Character test score    17    32.53    4.50
Novice       Character test RT       17    3.24     1.69
Non-novice   Character test score    18    33.83    3.81
Non-novice   Character test RT       18    2.63     0.79

Next, character test accuracy and RT scores are presented, split by type of radical (Table 3). The raw numbers on each type differed (five noun-based, compared to 15 verb-based, split between hand and mouth radicals). Scores were therefore converted to a ratio between 0 and 1 (calculated as mean/maximum scored). This calculation made comparisons more straightforward to present and easier to analyse in statistical tests of difference or association. Mean RT scores are measured in seconds; some items were judged slowly among the novice group, but as there was no consistent outlier pattern (i.e., one much slower participant on most items, or one item judged more slowly by several participants), the full range is retained to indicate the variation in responses.
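The conversion to a common 0 to 1 scale can be illustrated with a short sketch. It assumes that the maximum for each type is simply the number of items of that type in the stimulus list (10 hand-radical verbs, 5 animal-radical nouns, 5 mouth-radical verbs); the raw scores shown are hypothetical and the function name is our own.

```python
# Number of items per radical type, taken from the stimulus list (assumed maximum score).
MAX_ITEMS = {"TargetV": 10, "TargetN": 5, "TargetM": 5}


def accuracy_ratios(raw_scores):
    """Convert raw correct counts per type into ratios between 0 and 1."""
    return {t: raw_scores[t] / MAX_ITEMS[t] for t in MAX_ITEMS}


# Hypothetical participant: 8/10 hand, 5/5 animal, 4/5 mouth items correct.
example = {"TargetV": 8, "TargetN": 5, "TargetM": 4}
ratios = accuracy_ratios(example)
print(ratios)
```

Putting all three types on the same scale is what allows a 5-item type to be compared directly with the 10-item baseline.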

Table 3

Accuracy and RT by Group and by Type.


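Group        Type       Accuracy ratio, mean (SD)   RT in seconds, mean (SD)
Novice       TargetV    0.76 (0.429)*               3.638 (2.856)
Novice       TargetN    0.91 (0.294)                3.031 (2.328)
Novice       TargetM    0.81 (0.393)                3.758 (3.531)
Non-novice   TargetV    0.81 (0.397)                3.070 (2.596)
Non-novice   TargetN    0.87 (0.342)                2.665 (1.616)
Non-novice   TargetM    0.80 (0.402)                3.085 (2.338)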

Note: * p < .05.

Comparing both groups, the novices’ recall scores were generally less accurate and slower overall compared to the non-novice group. In both groups, though, characters with the animal noun radical (TargetN) were recalled most accurately and most quickly. The novice group recalled characters with the mouth radical (TargetM) more accurately than the non-novice group, but these characters had the slowest RTs. The novice group also had slower and less accurate recall for the hand verb radical (TargetV). For the non-novices, the mouth and hand radicals patterned very similarly in both RT and accuracy. The accuracy score for TargetV was significantly lower than the TargetN or TargetM scores for the novices (using a Kruskal-Wallis Test, H = 8.26, p < .05). Other comparisons between groups, or by radical type, were non-significant.
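For readers unfamiliar with the Kruskal-Wallis test, the sketch below shows how the H statistic is computed from pooled ranks, with the usual correction for ties. It is a plain-Python illustration rather than the SPSS procedure used in the study, the function names are our own, and the three groups of values are hypothetical and chosen so that the arithmetic is easy to follow.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (needs at least 2 values)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied groups.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical, fully separated groups: rank sums are 6, 15 and 24.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_stat, 2))
```

With k groups, H is referred to a chi-square distribution with k − 1 degrees of freedom, which is where the reported p value comes from.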

Therefore, in answer to the first research question, there were some differences by type and learner group for both accuracy and RT in recalling words; overall the non-novice group performed better, but in both groups there seemed to be an apparent semantic effect favouring noun-based word recall accuracy and RT. There did not seem to be a visual salience benefit for the target mouth radical as predicted.

Next, to answer the second research question, we present the eye-tracking data, reporting numbers of fixation and gaze time data, split by group (novice/non-novice), then split by target radical, and tested for associations with recall scores. Data are given for total number of fixations and overall gaze time per type, as well as fixations and time on the target radical area of interest (the left-side character component) and then showing the proportion of target to non-target eyegaze patterns (see Table 4). N represents an aggregated number of occurrences of fixations for each target type (maximum of 90 for noun TargetN and verb TargetM types, and maximum of 180 for verb TargetV types; note there were some missing individual items on each type in the novice group).

Table 4

Mean scores for eyegaze data, reported by group and type.









Note: * p < .05, ** p < .01.

Similar to the recall scores observed earlier, results revealed an overall advantage for the non-novice group across all measures. However, against predictions, the eyegaze data indicated a preference or ease of recognition for the visually salient mouth verb radical (TargetM), with a lower number of fixations and shorter durations on the target radical area of interest, both in raw scores and in proportion to the total fixations and durations for the character as a whole. The TargetV hand radical yielded the highest measures as raw and proportion scores for fixations and duration.

Running Kruskal-Wallis tests for comparisons by type for each group, adjusted by Bonferroni correction for multiple comparisons, we found a significant effect of target type on fixations and duration in both groups. Post-hoc analysis showed that for the novice group TargetM fixations were significantly fewer (H = –3.817, p < .01) and shorter (H = –3.555, p < .01) compared to TargetV fixations. For the non-novices, the same pattern was found for target fixation (H = –3.090, p < .01) and duration (H = –2.632, p < .01). Using Spearman correlation tests, recall scores (RT and accuracy) were tested for association with eyegaze data, but no pattern of significant correlations was found, either in total or on the target AOIs.

Although the eyegaze findings are somewhat mixed, we draw the conclusion here that words containing the more visually salient mouth radical were perhaps more easily “noticed”, requiring less focal attention, in terms of reduced eyegaze time, although this did not carry over to accuracy of recall. Equally, the semantic recall advantage apparently found earlier in higher accuracy on the target noun words did not seem clearly associated with visual focal attention, at least as measured here. Thus, for the second research question, our prediction of a connection between reduced focal attention and accuracy or speed of semantic recall was not sustained.

Turning now to VWM to address the third research question, we analysed patterns in the VWM shape recall results for accuracy and response times. Then we ran tests of association to see if VWM capacity was connected to better character recall or if it was associated with reduced visual focal attention in the eyegaze data. Descriptive results are reported in Table 5, split by group, showing accuracy and RT speed for recall of the test shapes. The groups behaved similarly in the VWM test; non-novices scored slightly higher than novices on accuracy, though with slightly slower RTs. No differences were significant.

Table 5

Mean VWM scores by group.


Group        Measure        N    Mean    SD
Novice       VWM accuracy   17   33.59   8.86
Novice       VWM RT         17   2.29    0.67
Non-novice   VWM accuracy   18   36.94   7.39
Non-novice   VWM RT         18   2.38    0.67

Spearman’s correlations were computed to test for any rank-order associations between VWM and recall test scores. For the non-novices, a significant strong positive correlation was found between VWM RT and recall RT (r = .705, p < .01), suggesting some overlap in visual and semantic processing speed independent of task. No significant correlations were found for accuracy, either in VWM or in character recall, for either group. Therefore, we are unable to support our prediction that VWM capacity predicts ease of character recall for beginner-level learners in general terms.
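As an illustration of the rank-order association tested here, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks, using only the Python standard library. The function names and response-time values are hypothetical and are not study data; the sketch returns the coefficient only and does not compute a significance level.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical response times in seconds for five participants (not study data).
vwm_rt = [2.1, 1.8, 2.9, 2.4, 3.0]
recall_rt = [1.9, 1.7, 2.2, 2.6, 3.1]
rho = spearman_rho(vwm_rt, recall_rt)
```

A positive coefficient means that participants who were slower on one task tended also to be slower on the other, whatever the absolute size of the response times.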

For the novices, however, we noted a significant negative correlation between VWM RT speed and character recall RT speed (r = –.500, p < .05). This may be simply a methodological confound, or it could suggest some kind of visual processing threshold effect at novice level. In other words, until learners have some familiarity with characters, VWM may not readily be channelled to boost character recognition and recall (similar to general WM threshold effects found in other studies of learner development, such as ; ). Further experimental research could test this suggestion in more detail.

We next tested for any associations between VWM scores and eyegaze patterns. This allowed us to evaluate our prediction that higher VWM capacity would facilitate reduced focal attention (i.e., fewer and shorter fixations). Again, running Spearman correlation tests, we found no significant correlations for the non-novices with either VWM accuracy or RTs. For the novices there were no associations for VWM accuracy. However, we found that higher VWM RT scores were significantly positively associated with longer eyegaze durations on TargetM characters overall (r = .542, p < .05) and with TargetV characters both overall (r = .493, p < .05) and on AOI (r = .537, p < .05). Overall, for our third research question, we did not find a significant or consistent beneficial effect of VWM to facilitate higher scores or reduced eyegaze in character learning as predicted. Indeed, among the novices, slower VWM seemed to be connected with longer eyegaze patterns. Again, we suggest this may indicate some kind of visual processing threshold, independent of semantic processing.

To recap, across all findings, the data indicate some separation between semantic and visual processing for our beginner-level cohort. This separation seems to emerge after only a little exposure, in view of the better performance found among the non-novice group compared to the absolute novices. Overall, noun-based radicals were most successfully recalled in the test measures, while neither eyegaze patterns nor VWM capacity showed consistent associations with radical types. We found some indication that individuals with faster visual processing were able to tap those capacities to complete any of the visually presented tasks (recall of characters or of shapes) more quickly, though not necessarily more accurately, while for the novices, slower eyegaze seemed to be connected to slower VWM.

5. Discussion and conclusion

This study examined the nature of visual attention among beginner-level learners of Chinese learning new characters after around five to seven weeks of instruction in a classroom setting in China. The participants had either no prior knowledge (novices) or around two months of prior knowledge (non-novices), but all reported they could not read in Chinese before starting the classes. Eyegaze data were used to compare visual focal attention across three types of target radicals within characters, alongside accuracy and speed in a character recall test; VWM was also tested. Overall scores showed a slight advantage of prior knowledge, in that the non-novice group generally scored better in terms of accuracy and speed on the recall task, but not significantly so, and not across all target types. In view of the exploratory nature of the study, our findings at this stage are somewhat limited in scope; however, we argue there are some valuable insights meriting further research in this area.

We had made three predictions as to potential connections between radical type, eyegaze and VWM capacity. First, we predicted that character learning would be easier for characters containing radicals with greater visual salience. Second, we hypothesized that characters with radicals that were easier to learn and recall would require less visual focal attention (shorter and fewer eyegaze fixations). Third, we predicted that VWM capacity could facilitate higher character recall scores, reduced focal attention, or both. Our data suggested that visual salience on the mouth TargetM radical did reduce load on visual attention, in that eyegaze was shorter and with fewer fixations on the characters with that radical. However, visual salience did not translate into significantly easier character recall for that radical. In contrast, characters with the noun (animal) TargetN radical were recalled most accurately and most quickly. VWM capacity likewise did not seem to facilitate better character recall scores or reduced visual attention.

Therefore, visual cognitive processes as measured by eyegaze patterns and VWM, at least in this study, did not seem to be strongly implicated in the linguistic and cognitive pairing processes needed to link visual form to semantic meaning for our participants. Perhaps, even among beginner-level learners of Chinese trying to find a way into decoding characters, the linguistic processes involved in searching for meaning take precedence over visual pattern-spotting. We tentatively take the success in learning the TargetN forms as some support for the notion of semantic bias found in other languages, in which nouns seem to be acquired more easily than verbs (), though further testing with more participants and a wider range of characters would be needed to reinforce this tentative conclusion.

There could also be some kind of WM threshold effect (). We noted a significant positive association between RT scores (but not accuracy) in both the recall test and VWM test for the non-novices. By contrast, among the novices, there was a significant negative association between character recall RT and VWM RT. In other words, some participants seemed to process visual information, whether semantic or shape-based, more quickly or more slowly than others. This also could be taken to indicate a separation of semantic from visual processing, as suggested in dual-code models of word learning (), particularly in early-level beginners, compared to the complex representations accessed by skilled readers with large vocabularies (e.g., as found by ). We suggest, in line with other research on WM thresholds, that there could be some kind of potential VWM processing threshold to be overcome before VWM starts to assist in making cognitive connections between semantic and visual memory.

The multi-level mappings needed for building literacy and developing firm paired-associate learned items (; ) take some time and familiarity with linguistic input to become established. Specifically, in terms of opaque Chinese orthography, it could be that it takes more than a few exposures to make the connections between visual shape recognition, character learning and semantic retrieval compared to the relative ease of word learning from contextual oral input (as found by ). It could also be that learners had not had any training in how to use radicals to identify patterns of functional type or semantic meaning. Post-hoc discussion with the participants revealed that some could recognise individual items holistically (such as the character for dog) or had spotted the use of the hand radical in some verbs, but none reported being trained in radical recognition strategies to help learn word meanings.

In view of the logical assumptions for VWM benefits drawn from the literature, we were surprised that the visual cognition measures did not lead to clearer outcomes. It could be that technical and methodological limitations in our eyegaze and VWM data obscured potential patterns; ensuring consistent eyegaze on the target areas was not always easy, and many participants reported finding the VWM test very hard. It could also be that our character presentation on the screen had some limitations: making characters large enough to detect eyegaze patterns on and off target areas made them seem non-authentic, according to some participants. Finally, many in our relatively small sample had more exposure to Chinese characters than originally intended, so some of our original assumptions and conclusions about initial exposure processing have had to be hedged.

To conclude, it remains an open question how visual cognitive processes such as VWM assist learnability in mapping semantically predictable word types (noun, verb) to surface forms for Chinese. Similarly, it remains inconclusive whether visual focal attention, measured in eyegaze patterns, provides an informative window into beginner-level learners’ abilities in encoding character forms. The data here suggest there are some generalisable semantic processes involved in word learning, even for languages with deep or non-transparent orthography, and that these processes are independent of working memory, at least for VWM. Further research including suitable verbal phonological WM tests would be valuable to establish a clearer model of how form-meaning-sound connections are made for Chinese, particularly at beginner-level. Additional insights could be gained by using a wider range of radicals, more authentic screen-sizing, and a longitudinal design with more participants, including absolute ab-initio learners.

Meanwhile, we suggest that with consistent linguistically based guidance on selected radicals, character learning can be effective from the outset. Semantic categories can also be used to inform teaching characters if Chinese follows findings in other languages of having a noun bias, where nouns seem to be more easily learned than other semantic categories (; ). However, there may be some debate about this in Mandarin (). If learning nouns, particularly those with consistent semantic radical components, is a general linguistic ability that “comes for free”, this means that teachers and learners can devote more time and attention to learning characters for other words such as verbs, where there is less semantic concreteness and consistency. We hope this study prompts more research along these lines to help learners and teachers to find successful ways to meet the challenge of learning to read in Chinese.