See the answer here:

http://askaling.linguistlist.org/question/199/straightforward-software-for-forming-a-corpus-of-on-line-texts/

For example, AntConc should be useful. General Unix command-line tools also let you process or analyze corpora in other languages, assuming that by "other languages" you mean other scripts or text encodings (e.g. a Unicode encoding such as UTF-8). Otherwise, you can always write simple tools yourself in Python or Java. You might want to look into NLTK for Python, which can process various file formats and Unicode-encoded texts.
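
To give an idea of what "simple tools in Python" could look like, here is a minimal sketch using only the standard library (not NLTK, and not how AntConc works internally). It counts word frequencies and builds a rough keyword-in-context listing. The function names (tokenize, word_frequencies, concordance, read_corpus) are made up for this illustration, and it assumes your texts are UTF-8 and separate words with spaces or punctuation.

```python
import re
from collections import Counter

# In Python 3, \w is Unicode-aware for str patterns, so this matches
# letters and digits in most scripts, not only ASCII. Caveat: scripts that
# rely on combining marks, or that do not separate words with spaces,
# need a more careful tokenizer than this.
TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN.findall(text.lower())


def word_frequencies(text):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokenize(text))


def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens of context kept on each side.
    """
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def read_corpus(path, encoding="utf-8"):
    """Read one text file; the UTF-8 default is an assumption, pass another encoding if needed."""
    with open(path, encoding=encoding) as f:
        return f.read()
```

You would call read_corpus on each of your files, then pass the resulting string to word_frequencies(text).most_common(20) or concordance(text, "someword"). For anything beyond this, NLTK or AntConc will save you a lot of work.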

DC