Significance Testing in Corpus Linguistics for Corpus Comparison
I would like to compare two corpora, one consisting of Wikipedia articles and one made up of articles from printed encyclopedias. The corpora are thematically comparable because I always chose the same entry in both sources, and I kept the disciplines as subcorpora (geography, chemistry and so on). Now I would like to compare the word frequencies in the two subcorpora (Wikipedia vs. printed encyclopedia) to find the words that are typical of each.
I have compared relative frequencies, but I wonder whether I could apply a statistical test to show that the differences are not due to chance and to measure how significant they are.
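To make the setup concrete, here is a minimal sketch of the kind of relative-frequency comparison I mean. The token lists are toy data, and the function names (`relative_frequencies`, `frequency_differences`) and the per-million normalization are only assumptions for illustration, not my actual pipeline.

```python
from collections import Counter


def relative_frequencies(tokens, per=1_000_000):
    """Return {word: occurrences per `per` tokens} for one subcorpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count * per / total for word, count in counts.items()}


def frequency_differences(rf_a, rf_b):
    """Rank words by (relative frequency in A) minus (relative frequency in B).

    Words missing from one subcorpus are treated as frequency 0 there.
    """
    vocabulary = set(rf_a) | set(rf_b)
    diffs = {w: rf_a.get(w, 0.0) - rf_b.get(w, 0.0) for w in vocabulary}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for the two subcorpora (hypothetical, already tokenized).
wikipedia_tokens = "the atom is a small particle the atom has a nucleus".split()
printed_tokens = "the element is a pure substance the element has atoms".split()

wiki_rf = relative_frequencies(wikipedia_tokens)
printed_rf = relative_frequencies(printed_tokens)

# Words at the top are relatively more frequent in Wikipedia,
# words at the bottom are relatively more frequent in the printed text.
for word, diff in frequency_differences(wiki_rf, printed_rf):
    print(f"{word}\t{diff:+.1f}")
```

This only ranks words by the raw difference in relative frequency; it says nothing yet about whether a difference could be due to chance, which is exactly the part I am asking about.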
Thanks for answers!