Lexical Decision Corpora

On this page, you will find the experimental details for the two lexical decision corpora that we have collected. The lexical decision data are available for anyone to download. Our goal is to make a large dataset available for addressing a variety of questions about lexical decision performance. In doing so, we have only two requests:

1. Any use of this data should be acknowledged via the appropriate citation:

Balota, D.A., Cortese, M.J., & Pilotti, M. (1999). Item-level analyses of lexical decision performance: Results from a mega-study. In Abstracts of the 40th Annual Meeting of the Psychonomic Society (p. 44). Los Angeles, CA: Psychonomic Society.

2. Distribution of the lexical decision data should be exclusively from this site. In other words, do not distribute the data secondhand; please direct any requests for these data to this site. This is primarily to ensure accurate citations and to ensure that the corpora are distributed in their entirety rather than in parts.
Please direct any questions or comments to either:

Below we report the methods for the data collection.



Thirty younger adults (mean age: 21.1) were recruited from the undergraduate student population at Washington University. Thirty older adults (mean age: 73.6) were recruited from Washington University’s Aging and Development Subject Pool. All individuals were paid $40.00 for their participation.


Several different IBM-compatible computers were used to control the display of stimuli and to collect response latencies to the nearest millisecond. The stimuli were displayed in white on a black background on a 14-inch color monitor in 40-column mode.


The stimuli for the lexical decision task consisted of 2,906 monosyllabic words and an equal number of length-matched monosyllabic nonwords. The words ranged in frequency from 0 to 69,971 counts per million (Kucera & Francis, 1967), and from 2 to 8 letters in length.


For the lexical decision task, each individual participated in two experimental sessions that took place on separate days within a one-week period, with half the stimuli presented in each session. Each trial consisted of the following sequence of events: (a) a fixation point (+) presented in the center of the computer screen for 400 ms, (b) a blank screen for 200 ms, and (c) the LDT stimulus, centered at fixation, which remained on screen until a response was made. Subjects pressed the “/” key for words and the “z” key for nonwords on the keyboard. The next fixation point appeared 1,200 ms after a correct response and 2,700 ms after an incorrect response.
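
The trial sequence described above can be laid out as a simple timeline. The following Python snippet is only an illustrative sketch, not the software used to run the experiment; the function and constant names are hypothetical, and it assumes that the interval before the next fixation point is measured from the moment of the response.

```python
# Illustrative sketch only; names are hypothetical, durations come from the text.
FIXATION_MS = 400            # (a) fixation point "+"
BLANK_MS = 200               # (b) blank screen
INTERVAL_CORRECT_MS = 1200   # delay before next fixation after a correct response
INTERVAL_ERROR_MS = 2700     # delay before next fixation after an incorrect response


def trial_timeline(response_latency_ms, correct):
    """Return (event, onset_ms, duration_ms) tuples for a single trial.

    The stimulus stays on screen until the response, so its duration
    equals the response latency.
    """
    stimulus_onset = FIXATION_MS + BLANK_MS
    response_time = stimulus_onset + response_latency_ms
    interval = INTERVAL_CORRECT_MS if correct else INTERVAL_ERROR_MS
    return [
        ("fixation", 0, FIXATION_MS),
        ("blank", FIXATION_MS, BLANK_MS),
        ("stimulus", stimulus_onset, response_latency_ms),
        ("interval", response_time, interval),
    ]


def trial_duration_ms(response_latency_ms, correct):
    """Time from fixation onset to the onset of the next fixation point."""
    _, onset, duration = trial_timeline(response_latency_ms, correct)[-1]
    return onset + duration
```

Under these assumptions, a correct response with a latency of 650 ms gives a total of 400 + 200 + 650 + 1,200 = 2,450 ms from one fixation onset to the next.
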
The stimuli were organized in 10 blocks of trials (Blocks 1-9 = 600 stimuli/block; Block 10 = 412 stimuli). Blocks were counterbalanced across subjects in a Latin square design to control for list order effects. Trials within each block were presented in random order, with the constraints that each block contained an equal number of words and nonwords and that the lengths of the words and nonwords were equated. Participants were given the opportunity to take a break between blocks and two breaks within each block. Each session began with 20 practice trials.
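
The block structure and counterbalancing can be checked with a short sketch. The Python below is an illustration under stated assumptions, not the procedure actually used: the text does not say which Latin square was used or how subjects were assigned to rows, so the cyclic square and the modulo assignment here are hypothetical choices, as are all names.

```python
# Illustrative sketch only; the cyclic square and the subject-to-row
# assignment are assumptions, not details reported in the text.
N_WORDS = 2906                    # from the text; matched by 2,906 nonwords
BLOCK_SIZES = [600] * 9 + [412]   # Blocks 1-9 and Block 10


def cyclic_latin_square(n):
    """Build an n x n Latin square by rotating the sequence 1..n.

    Each block number appears exactly once in every row (one subject's
    block order) and once in every column (serial position).
    """
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def block_order_for(subject_index, square):
    """Assign a subject (0-based index) to a row of the square, cycling."""
    return square[subject_index % len(square)]


square = cyclic_latin_square(len(BLOCK_SIZES))

# The ten blocks account for every word and nonword: 9 * 600 + 412 = 5,812.
assert sum(BLOCK_SIZES) == 2 * N_WORDS
```

With 30 subjects per age group and 10 possible block orders, a cycling assignment of this kind would place three subjects from each group on every order.
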

Outliers were eliminated as in Spieler and Balota (1997). These criteria eliminated 2% of the data for young adults and 3% of the data for older adults. The word “apse” was eliminated because neither subject group produced any correct responses to it.

