It seems like what you're looking for is a speech corpus from which you could extract frequency counts of the different phones of the language. If so, of the various corpora available online, I'd recommend starting here:

Both of them can be obtained for free for academic purposes. Depending on how narrow or broad their transcriptions are, you might be able to extract the information you're looking for. An in-depth examination of the data will probably require programming skills, so if you don't know how to use Python, R, etc., you could ask a friend/colleague/professor for help.
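To give a rough idea of what the programming side might look like, here is a minimal Python sketch of counting phones. It assumes each utterance is already available as a string of space-separated phone symbols; real corpora use their own file formats and symbol sets, so the loading step would differ. The function name `count_phones` and the sample data are made up for illustration.

```python
from collections import Counter


def count_phones(transcribed_utterances):
    """Count phones across utterances.

    Assumption: each utterance is a string of space-separated phone symbols.
    """
    counts = Counter()
    for utterance in transcribed_utterances:
        counts.update(utterance.split())
    return counts


# Made-up sample data, not taken from any real corpus.
sample = ["DH AH K AE T", "AH D AO G"]
phone_counts = count_phones(sample)

# Print the phones from most to least frequent.
for phone, n in phone_counts.most_common():
    print(phone, n)
```

How fine-grained the counts are depends entirely on the transcription: a broad transcription collapses distinctions that a narrow one keeps separate.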

Also, I should note that the answer will depend on what kind of speech you're looking at. Just as text corpora differ in many ways depending on whether they are from novels, news, etc., your phone frequency distributions will probably depend on whether you're looking at spontaneous speech or semi-structured/elicited speech, and what kind of discourse it is. (Some of these effects will be because certain kinds of words, containing certain kinds of sounds, are more/less frequent in one genre compared to another.)
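If you want to see how much the genre matters, one option is to compare relative frequencies (proportions) rather than raw counts, since corpora differ in size. The following is only a sketch under the same assumption of space-separated phone symbols; the helper names and the two tiny samples are hypothetical.

```python
from collections import Counter


def relative_frequencies(utterances):
    """Return each phone's share of all phone tokens in the given utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(utterance.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phone: n / total for phone, n in counts.items()}


def frequency_differences(freq_a, freq_b):
    """Difference in relative frequency (a minus b) for every phone in either set."""
    phones = set(freq_a) | set(freq_b)
    return {p: freq_a.get(p, 0.0) - freq_b.get(p, 0.0) for p in phones}


# Made-up samples standing in for two kinds of speech.
spontaneous = ["AH M AY", "AH N OW"]
elicited = ["DH AH K AE T"]

diff = frequency_differences(
    relative_frequencies(spontaneous),
    relative_frequencies(elicited),
)

# Positive values: relatively more frequent in the spontaneous sample.
for phone in sorted(diff, key=diff.get, reverse=True):
    print(f"{phone}\t{diff[phone]:+.3f}")
```

With real data you would want far larger samples before reading anything into the differences.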