An effective 'Readability Index'?

I am new to this site and am still trying to find out how it works. However, let me put out some of my ideas about 'readability' in the hope that I am not covering ground already extensively passed through by others and that my main idea might encourage someone who is computer-savvy and interested enough to take it further.

The last time I tried to use a readability index, back when they came installed on PCs, it was a disappointing experience. They had names like the 'Flesch Index' and 'Othername Index'. One index automatically calculated the average number of letters per word in the selected text, made some hidden internal calculations and gave a score. The second counted the average number of words per sentence and gave a score. The scores then indicated which U.S. grade level the text was suitable for. However, the recommended grade levels varied widely when I applied the two methods to the same reading texts. There was also a 'Flesch-Othername Index', which presumably combined the two methods. I disagreed with the results from all three systems for the following reason:

All three systems indicated that, for example:

'Mars is arid'
is simple and would be suitable for first grade readers, whereas
'All the children and the teacher went to the zoo and saw a lot of interesting animals'
is difficult and would only be suitable for 10th grade readers.

It seems self-evident that the first sentence would in fact defeat early readers (in terms of understanding, if not enunciation) and that the second sentence would be understandable to any reader with even the most basic vocabulary.
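To make the objection concrete, here is a minimal sketch that computes the two surface statistics described above (average letters per word and average words per sentence) for both example sentences. It does not reproduce the hidden grade-level calculation of any real index, and the function name `surface_stats` is made up for illustration.

```python
import re

def surface_stats(text):
    """Return (average letters per word, average words per sentence)."""
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    avg_letters = sum(len(w) for w in words) / len(words)
    avg_words = len(words) / len(sentences)
    return avg_letters, avg_words

short = "Mars is arid."
long_ = ("All the children and the teacher went to the zoo "
         "and saw a lot of interesting animals.")

for text in (short, long_):
    letters, words = surface_stats(text)
    print(f"{letters:.2f} letters/word, {words:.0f} words/sentence: {text}")
```

Both statistics come out lower for 'Mars is arid', so any score built only on them will rate it the easier sentence, whatever the vocabulary.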

What I would hope is that someone out there with the necessary computer skills would develop a really effective readability index computer program or application for use with texts meant for foreign learners by taking into account the ratio of low to high frequency words in the selected text.

In developing my series of graded readers and EFL texts, I used a high to low frequency word list which combines the British National Corpus list, the Wellington University list of the first 3,000 most common words, common sense and a list of the most to least common words used on Internet websites. The list has over 20,000 entries. However, most sensible high to low frequency lists would suffice.
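As a rough sketch of how such a list might be used by a program, the snippet below turns a list ordered from most to least common into a lookup from word to rank. The function name `build_rank_lookup` and the tiny `toy_list` are hypothetical stand-ins; a real list would have thousands of entries.

```python
def build_rank_lookup(words_by_frequency):
    """Map each word to its 1-based position in a most-to-least-common list."""
    ranks = {}
    for position, word in enumerate(words_by_frequency, start=1):
        # If a merged list repeats a word, keep its first (most common) rank.
        ranks.setdefault(word.lower(), position)
    return ranks

# Toy stand-in for a full frequency list (assumption: one entry per word).
toy_list = ["the", "of", "and", "to", "a", "The"]
ranks = build_rank_lookup(toy_list)
print(ranks)
```

Lower-casing the entries means 'The' and 'the' share one rank, which is one simple way to handle capitalised sentence openings.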

Using a list ranked by frequency of general use, the most common English word 'the' receives a score of 1, and the low frequency words 'Mars' and 'arid' receive scores of 5,506 and 12,132 respectively. By my list, the two sentences would score as follows:

Mars: 5,506
is: 8
arid: 12,132

TOTAL: 17,646 Readability Index (average): 5,882

The: 1
children: 176
and: 3
the: 1
teacher: 114
went: 195
to: 4
the: 1
zoo: 272
and: 3
saw: 423
a: 5
lot: 131
of: 2
interesting: 315
animals: 202

TOTAL: 1,848 Readability Index (average): 116

The program would then indicate that:

1. the second text has no word ranked beyond 423 in the frequency list and has a readability index of 116, suitable for (say) second grade children or 'near beginner' students of English as a foreign language, and

2. the first text consists mainly of very low frequency words, has a readability index of 5,882 and is suitable for (say) eighth grade children or 'mid-intermediate' students of English as a foreign language.
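The scoring described above could be sketched along the following lines. This is only an illustration under stated assumptions, not a finished index: the ranks are the ones quoted in this post, the names `RANKS` and `readability_report` are hypothetical, and words missing from the list are reported instead of being given a guessed rank.

```python
import re

# Ranks copied from the post; a real program would load a full
# frequency list (the post mentions one with over 20,000 entries).
RANKS = {
    "the": 1, "of": 2, "and": 3, "to": 4, "a": 5, "is": 8,
    "teacher": 114, "lot": 131, "children": 176, "went": 195,
    "animals": 202, "zoo": 272, "interesting": 315, "saw": 423,
    "mars": 5506, "arid": 12132,
}

def readability_report(text, ranks=RANKS):
    """Summarise a text by the frequency ranks of its words."""
    words = re.findall(r"[a-z]+", text.lower())
    known = [ranks[w] for w in words if w in ranks]
    # Words missing from the list are reported, not guessed at.
    unknown = [w for w in words if w not in ranks]
    return {
        "total": sum(known),
        "index": sum(known) / len(known) if known else None,
        "rarest_rank": max(known) if known else None,
        "unknown": unknown,
    }

first = "Mars is arid"
# The word list in the post starts at "The", so the leading "All" is left out.
second = ("The children and the teacher went to the zoo "
          "and saw a lot of interesting animals")

for text in (first, second):
    print(text, readability_report(text))
```

Turning the index into a grade level or a learner band would need thresholds calibrated against real readers, which this sketch does not attempt.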

Hopefully, the above is of interest to some users of this website.

Comments

Computer Aided Extensive Reading PhD

Hi Phillip,

I am developing software as part of my PhD for Computer Assisted ER. I am using readability measures similar to those you suggest to help readers find articles on the internet appropriate to their ability... My software also automatically creates L1 glosses for the articles.
