Comparison of Languages by Vocabulary Size

asked 2015-03-19

anonymous user


I would like help in arriving at very rough estimates of the vocabulary sizes of Ancient Hebrew (say 1000 BCE to 200 BCE), Koine Greek around 200 BCE, and 16th-century English. When I say very rough, I mean probably in the thousands for the first two, and tens or hundreds of thousands for the last.


Michael Alliston

(Transferred from old LINGUIST List Ask-a-Linguist site)

answered 2015-03-19

Hi, Michael,

I don't have much access to information on the languages you ask about, but I can give you an answer applicable to all languages: there is no limit to the size of the vocabulary for any language. If you want a reference for this, I can refer you to a book and a doctoral thesis. The doctoral thesis, from about 1968, was written by George Bedell and examined words in English like internation-al, international-ize, internationaliz-ation, internationalization-al, etc.; that is, there is a cycle in English of adding suffixes to Latinate stems in the order -al, -ize, -ation, [repeat successively]. Now, each suffix adds some meaning to the word (and changes its part of speech), successively narrowing down the meaning of the result, and Dr. Bedell showed that there was no theoretical limit to how often you could add suffixes in such cases. That is, he showed that there are an infinite (unbounded) number of such words in English. Most languages have such ways of forming new words, and so we may suppose that all languages have a theoretically infinite number of words. This conclusion is also supported by the fact that even the very largest dictionaries constantly get added to and by other evidence.
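To make the suffix cycle concrete, here is a minimal Python sketch of it. It is only an illustration of the -al, -ize, -ation pattern described above, not anything taken from Dr. Bedell's thesis; the function name `suffix_cycle` and the spelling rule (dropping the final "e" of -ize before -ation) are my own assumptions.

```python
from itertools import cycle, islice

# The three suffixes, in the order in which the cycle applies them.
SUFFIXES = ("al", "ize", "ation")

def suffix_cycle(stem):
    """Yield stem+al, then +ize, then +ation, then +al again, without end.

    Hypothetical helper for illustration only.
    """
    word = stem
    for suffix in cycle(SUFFIXES):
        # Assumed spelling rule: "-ation" attaches to "-iz",
        # so drop the final "e" of "-ize" first.
        if suffix == "ation" and word.endswith("e"):
            word = word[:-1]
        word += suffix
        yield word

# The generator never terminates on its own, so take only the first few words.
first_five = list(islice(suffix_cycle("internation"), 5))
print(first_five)
```

Because every pass through the loop produces a longer word than the one before, no word is ever repeated, which is the sense in which the set of such words is unbounded.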

The book is: Word Frequency Distributions (2002) - by R. Harald Baayen, published by Springer Verlag. [available in paperback from Amazon for as little as $65.] Amazon includes a good review of the content of the book. The review does not mention, however, a possible conclusion of the elegant mathematics that Baayen introduces, namely, that in fact there can be no limit to the number of words in (at least) English, since enlarging the corpus introduces an ever-growing number of new words, and the curve describing the resulting total number of words has no asymptote (that's math-speak for: there is no upper limit to the number of English words, which in turn is how mathematicians say: there are an infinite number of words in English). While we need to do further investigations for corpora in other languages, there is no reason to doubt that the number of words in all languages is in fact infinite.
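For readers who want to see what such a curve is, here is a rough Python sketch that counts how many distinct words have been seen after each successive word of a text. This is only a toy: the function name `vocabulary_growth` and the sample sentence are made up for illustration, and a handful of words obviously cannot demonstrate anything about asymptotes; one would need a large corpus and the statistical models in Baayen's book for that.

```python
def vocabulary_growth(tokens):
    """Return the number of distinct word types seen after each token.

    Hypothetical helper for illustration; it lowercases tokens and does
    no other normalization (no lemmatization, no punctuation handling).
    """
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token.lower())
        growth.append(len(seen))
    return growth

# Made-up sample text, standing in for a real corpus.
sample = "the cat saw the dog and the dog saw a bird".split()
curve = vocabulary_growth(sample)
print(curve)
```

Plotting such counts against the number of words read gives the growth curve; the question is whether it ever flattens out as the corpus grows.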

Of course I am perfectly aware that this is not the answer you were looking for. Now, for Ancient Hebrew, for example, the total textual output available is obviously finite, so the number of different words (I assume this is what you are actually asking) is also finite. Likewise, we can surmise that many documents in Ancient Hebrew were lost over time, which means that there must be unattested (undocumented) words that existed in Ancient Hebrew. In any case, the point is that, given the lexical resources which speakers of A.H. had, there was at the time an unbounded number of possible words in A.H. Please note that I have not even included in the calculations I have described the proper nouns which existed at the time and, as in all languages, which have no upper ...

I don't think that this applies to all languages. Each language has its own limitations when it comes to building vocabulary. Again, the acquisition of vocabulary is also linked to the culture and knowledge base of the native community.

A S Sundar (2015-05-25 03:25:43 -0400)
Asked: 2015-03-19 13:29:57 -0400

