
Hi, Michael,

I don't have much access to information on the languages you ask about, but I can give you an answer that applies to all languages: there is no limit to the size of the vocabulary of any language. If you want references for this, I can point you to a doctoral thesis and a book. The thesis, from about 1968, was written by George Bedell and examined English words like internation-al, international-ize, internationaliz-ation, internationalization-al, etc.; that is, English has a cycle of adding suffixes to Latinate stems in the order -al, -ize, -ation, [repeat successively]. Each suffix adds some meaning to the word (and changes its part of speech), successively narrowing down the meaning of the result, and Dr. Bedell showed that there is no theoretical limit to how often suffixes can be added in such cases. That is, he showed that the number of such words in English is infinite (unbounded). Most languages have similar ways of forming new words, so we may suppose that every language has a theoretically infinite number of words. This conclusion is also supported by other evidence, such as the fact that even the very largest dictionaries are constantly being added to.
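To make the suffix cycle concrete, here is a minimal Python sketch of it. It is only an illustration of the -al, -ize, -ation pattern described above, not Bedell's own formalism; the function name suffix_cycle and the spelling rule (drop the final "e" of -ize before -ation) are my assumptions.

```python
from itertools import islice


def suffix_cycle(stem):
    """Yield an endless sequence of words built by cycling -al, -ize, -ation.

    Hypothetical helper for illustration: it models only the spelling of
    the cycle, not the meaning or part of speech of each form.
    """
    word = stem
    while True:
        word += "al"                  # noun -> adjective
        yield word
        word += "ize"                 # adjective -> verb
        yield word
        word = word[:-1] + "ation"    # verb -> noun (drop the final "e")
        yield word


# The generator never terminates by itself, so take only the first few forms.
words = list(islice(suffix_cycle("internation"), 6))
for w in words:
    print(w)
```

The first six forms are international, internationalize, internationalization, internationalizational, internationalizationalize and internationalizationalization. Every pass through the loop produces a strictly longer word, so no two forms coincide and the sequence has no last member.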

The book is: Word Frequency Distributions (2002) - by R. Harald Baayen, published by Springer Verlag. [available in paperback from Amazon for as little as $65.] Amazon includes a good review of the content of the book. The review does not mention, however, a possible conclusion of the elegant mathematics that Baayen introduces, namely, that there can be no limit to the number of words in (at least) English: enlarging the corpus keeps introducing new words, and the curve describing the resulting total number of different words has no horizontal asymptote (that's math-speak for: the curve never levels off at an upper limit, which in turn is how mathematicians say: the number of words in English is infinite). While further investigation is needed for corpora in other languages, there is no reason to doubt that the number of words in every language is in fact infinite.
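The curve in question plots the number of different words seen so far against the number of running words read. As a rough sketch of how such a curve is computed (this is a toy illustration, not one of Baayen's models; the function name vocabulary_growth and the sample sentence are my own assumptions):

```python
def vocabulary_growth(tokens):
    """Return the number of distinct word types seen after each token.

    Hypothetical helper for illustration: tokens are compared
    case-insensitively and no other normalization is applied.
    """
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token.lower())
        curve.append(len(seen))
    return curve


# Toy "corpus"; a real study would use millions of running words.
sample = "the cat saw the dog and the dog saw a bird".split()
curve = vocabulary_growth(sample)
print(curve)
```

For this toy sentence the curve is [1, 2, 3, 3, 4, 5, 5, 5, 5, 6, 7]. The curve can pause when words repeat but it never goes down, and the claim above is that in large real corpora it keeps rising as more text is added.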

Of course, I am perfectly aware that this is not the answer you were looking for. For Ancient Hebrew, for example, the total textual output available is obviously finite, so the number of different attested words (I assume this is what you are actually asking about) is also finite. At the same time, we can surmise that many documents in Ancient Hebrew were lost over time, which means that there must have been words in Ancient Hebrew that are now unattested (undocumented). In any case, the point is that, given the lexical resources which speakers of A.H. had, there was at the time an unbounded number of possible words in A.H. Please note that the reasoning I have described does not even include proper nouns, which I consider an integral part of every language and which, in A.H. as in all languages, have no upper bound on their number. So the bottom line is that, apparently, all languages have an infinite number of words, both theoretically and in fact.


James L. Fidelholtz
Posgrado en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, México