Significance Testing in Corpus Linguistics for Corpus Comparison

asked 2016-02-20 15:54:25 -0500

Dear linguists,

I would like to compare two corpora, one consisting of Wikipedia articles and one made up of articles from printed encyclopedias. The two corpora are thematically comparable because I always chose the same entry in both sources, and I grouped the entries by discipline into subcorpora for geography, chemistry and so on. Now I would like to compare the word frequencies in the two subcorpora (Wikipedia vs. printed encyclopedia) in order to find the typical words for each subcorpus.

I have compared relative frequencies, but I wonder whether I could apply a statistical measure to show that my results are not due to chance and to assess their significance.
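
To make the question concrete, the relative-frequency comparison mentioned above can be sketched roughly as follows. This is only a minimal illustration in Python: the function name `relative_frequencies`, the normalisation per million tokens and the toy token lists are assumptions made for the example, not the actual data or scripts.

```python
from collections import Counter


def relative_frequencies(tokens, per=1_000_000):
    """Return each word's frequency normalised to `per` tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total * per for word, count in counts.items()}


# Toy token lists standing in for the two subcorpora (invented data).
wikipedia_tokens = "the river flows through the city and the valley".split()
print_tokens = "the river is a large natural stream of water".split()

wiki_rel = relative_frequencies(wikipedia_tokens)
print_rel = relative_frequencies(print_tokens)

# Difference in relative frequency for every word seen in either subcorpus.
vocabulary = set(wiki_rel) | set(print_rel)
differences = {
    word: wiki_rel.get(word, 0.0) - print_rel.get(word, 0.0)
    for word in vocabulary
}

# Show the words whose relative frequencies differ most.
ranked = sorted(differences.items(), key=lambda item: -abs(item[1]))
for word, diff in ranked[:5]:
    print(f"{word}\t{diff:+.0f}")
```

Such a ranking shows which words differ most in relative frequency, but it does not say whether a difference could have arisen by chance, which is the open point of the question.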

Thanks for answers!

Bettina Eiber
