
MichaelGauthier's profile - activity

2015-05-25 05:56:02 -0500 received badge  Editor (source)
2015-05-25 05:55:03 -0500 answered a question What are some English corpora which have updated taboo language?


Corpora with "updated" (though it depends on what you mean by "updated" I guess) taboo language are not easy to find, as naturally occurring instances of swearing/taboo language are hard to record to begin with. It also depends on what you are looking for. Do you want to analyze taboo language in spoken or written contexts? Do you want to focus on a specific region? I think that all these kinds of questions need to be answered before looking for corpora.

I was in the exact same situation two years ago, and I found no available corpus meeting all my requirements (though they were relatively specific), so I eventually decided to build my own corpus from Twitter. However, some corpora which may still be of interest in this regard are:

  • The Scottish Corpus of Texts and Speech
  • The GloWbE Corpus, though only a very small fraction of it will be interesting for swearing in my opinion
  • The Limerick Corpus of Irish English, though I don't know the extent to which swear words are present in this one

I hope this helps, and if anyone has updated information on these kinds of corpora, I would be very interested in learning about it too!

If you want to discuss it further, feel free to contact me on my academia profile. I'd be glad to know more about the details of your research, or to exchange ideas!

Good luck!