Finding suitable reading content

Finding Chinese content suited to your current level is one of the most important things you can do if you want to read native materials.

If you try to read something too far above your current level, reading will feel like a chore and will erode your confidence. Even after learning hundreds (or even thousands) of new words, you will still need frequent dictionary lookups to understand the text, and it will feel like you aren’t making any progress despite the large number of words already learnt.

On the other hand, reading something at the right level, or only slightly above your level, will make you feel energized and give you a huge sense of satisfaction because you will feel that you can read Chinese, which is an awesome feeling to have.

You’ll also be picking up new vocabulary which will make content that was previously too difficult much more accessible. It becomes a positive feedback loop where you feel good about reading, which makes you read more, which makes you learn more vocabulary, which makes it easier to read, which makes you feel good about reading, and so on.

Finding something at the right level can be a challenge, however, because it’s difficult to know beforehand how suitable a given text is for your current vocabulary. That’s where a program like Chinese Text Analyser can come in useful.

Chinese Text Analyser analyses Chinese text and provides you with statistics you can use to compare the relative difficulties of different texts based on your known vocabulary. It works with novels, newspaper articles, blog posts or any other Chinese text that has an electronic copy, and it’s fast enough that it can be used to analyse a full novel, or even several novels, in just a few seconds.

To make a comparison, simply open the texts you are interested in, and compare the various word statistics. This can be done by opening files from disk, or directly pasting from the clipboard.

For example, the table below shows the ‘Total Percent Known’ statistics for HSK 4, 5 and 6 level vocabularies for the 5 novels 《许三观卖血记》,《活着》,《圈子圈套》,《雪山飞狐》and《天龙八部》:

Title	HSK 4	HSK 5	HSK 6
许三观卖血记	61.5%	65.9%	67.3%
活着	60.7%	65.4%	68.4%
圈子圈套	58.7%	64.4%	67.4%
雪山飞狐	41.5%	45.9%	49.0%
天龙八部	40.8%	46.0%	49.0%

Although not a perfect indicator of difficulty, if you’ve been using Chinese Text Analyser long enough and it has built up an accurate model of your vocabulary, then the percentage of total known words in a text will be a reasonable proxy for difficulty, especially when comparing the relative difficulty of multiple texts against each other.
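To make the ‘Total Percent Known’ idea concrete, here is a minimal Python sketch of how such a figure could be computed. It is an illustration only, not Chinese Text Analyser’s own implementation: the function name `percent_known` is hypothetical, and the sketch assumes the text has already been segmented into words, since Chinese is written without spaces.

```python
def percent_known(tokens, known_words):
    """Return the percentage of running words in `tokens` found in `known_words`.

    Assumption: `tokens` is a list of already-segmented words, with one entry
    per occurrence, so a word that appears ten times counts ten times.
    """
    if not tokens:
        return 0.0
    known = sum(1 for word in tokens if word in known_words)
    return 100.0 * known / len(tokens)


# A tiny made-up example: 6 of the 8 running words are known.
tokens = ["我", "喜欢", "看", "书", "我", "喜欢", "武侠", "小说"]
known_words = {"我", "喜欢", "看", "书"}
print(percent_known(tokens, known_words))
```

Because every occurrence is counted, a handful of very common known words can account for a large share of the total, which is why this figure works better for comparing texts than as an absolute measure of difficulty.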

From this table, we can see that《许三观卖血记》,《活着》and《圈子圈套》are of similar difficulty at each of the above vocabulary levels, with the first two novels being slightly easier at the HSK 4 level. By comparison,《雪山飞狐》and《天龙八部》are significantly more difficult across all levels.

If you have read these books, you’ll find this closely corresponds to their actual difficulty.

The number of currently known words is not the only thing you should consider when deciding which book would be easier to read. It’s also important to get an idea of how many new words you would need to learn to reach a certain percentage of understanding of the text.

In Chinese Text Analyser you can do this by looking at the ‘Unknown’ tab of the word list view, sorting by the ‘Frequency’ column in descending order, and then scrolling down until the ‘Cumulative % Frequency’ is at the level you’d like to reach. Now you can look at the row number to get the approximate number of words you’d need to learn to reach that level.
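The procedure above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the program’s actual code: the name `words_to_reach` is hypothetical, the text is assumed to be pre-segmented into words, and the cumulative percentage is assumed to be measured against all running words in the text, including those already known.

```python
from collections import Counter


def words_to_reach(tokens, known_words, target_percent=98.0):
    """Count how many unknown words must be learnt, most frequent first,
    before at least `target_percent` of the running words are known."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0

    # Occurrences already covered by the known vocabulary.
    covered = sum(n for word, n in counts.items() if word in known_words)

    # Frequencies of the unknown words, in descending order
    # (the equivalent of sorting the 'Unknown' tab by 'Frequency').
    unknown = sorted(
        (n for word, n in counts.items() if word not in known_words),
        reverse=True,
    )

    learnt = 0
    for n in unknown:
        if 100.0 * covered / total >= target_percent:
            break
        covered += n
        learnt += 1
    return learnt


# Made-up example: 10 running words, of which only "a" is known.
tokens = ["a"] * 5 + ["b"] * 3 + ["c"] + ["d"]
print(words_to_reach(tokens, {"a"}, target_percent=90.0))
```

Learning the most frequent unknown words first gives the fastest gain in coverage, which is why the list is sorted in descending order before counting rows.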

98% is generally around the point at which unknown words in the text do not hinder understanding. See this video for a good explanation of why. ‘Sinosplice’ also has demonstrations of this in English and Chinese.

If we take the same novels above and look at the number of words it takes to reach 98% comprehension of the text starting from a base of HSK 4, 5 and 6 then we get the following results:

Title	HSK 4	HSK 5	HSK 6
许三观卖血记	2,400	2,100	1,910
活着	2,910	2,570	2,300
圈子圈套	5,350	4,650	4,060
雪山飞狐	5,630	5,280	4,850
天龙八部	9,920	9,370	8,560

For someone at HSK 4 level trying to choose which book to read,《许三观卖血记》would probably be the best choice out of these five texts.

Not only does it have the highest percentage of known words, it also requires learning the fewest new words before you can understand 98% of the text.
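As a quick check of the HSK 4 comparison, the short Python sketch below ranks the five novels on both figures. The numbers are copied from the HSK 4 columns of the two tables above; the variable names are only illustrative.

```python
# (title, total percent known, words to learn to reach 98%) at HSK 4,
# copied from the two tables above.
books = [
    ("许三观卖血记", 61.5, 2400),
    ("活着", 60.7, 2910),
    ("圈子圈套", 58.7, 5350),
    ("雪山飞狐", 41.5, 5630),
    ("天龙八部", 40.8, 9920),
]

# Highest percentage of known words first.
by_known = sorted(books, key=lambda book: book[1], reverse=True)

# Fewest words left to learn first.
by_to_learn = sorted(books, key=lambda book: book[2])

print("Most known vocabulary:", by_known[0][0])
print("Fewest words to learn:", by_to_learn[0][0])
```

When the two rankings agree on the top title, as they do here, the choice is straightforward. When they disagree, you have to decide which figure matters more to you.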

For someone at HSK 6, it’s a bit less clear.《活着》has the highest percentage of currently known vocabulary, but《许三观卖血记》requires learning fewer words to reach 98%, and ultimately that probably makes it a better choice to read. In any case, those two books have the same author, style and setting, so many of the words you learn in one novel will be directly relevant to the other.

《圈子圈套》has a similar percentage of known words to the first two books. However, the second table shows that it requires learning a much larger number of new words to get to 98% understanding of the text. This means that if you were trying to choose between these 5 books, you’d be better off leaving this book for later. As you read other texts, you’ll expand your vocabulary and that will eventually make this novel more accessible.

Rounding out the 5 books we have《雪山飞狐》and《天龙八部》, which by both metrics are significantly more difficult than the other texts and therefore they would be poor choices to read at this point in time.

Whatever texts you are considering, Chinese Text Analyser provides a quick and easy way to work out which of them is the most suitable for you to read at a given point in time.

Just open the texts, compare the relevant statistics, choose the one that’s easiest (either in terms of how well you can currently understand it or how many words you need to learn to reach a certain level of understanding of the text) and then get reading!

By choosing content suitable for your level, you can make reading (and studying) an enjoyable process rather than a chore, and improve your Chinese at the same time.

Chinese the Hard Way

..because there is no easy way
