Analyzing the breadth of vocabulary covered in a TV show
Good afternoon, everybody,
I am developing a course/product that uses a telenovela (TV series) to teach people Spanish. I would like to analyze the episode transcripts to determine which words the show covers and which it does not. More specifically, I need to compare the words used in the show against a frequency list for the language, so that I can see whether any important (high-frequency) words are missing from the show.
Does anybody know how I can do this? If you know how to do this for English but not Spanish, that is fine too, since I am also interested in analyzing the transcripts of a few English shows.
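To make the question concrete, here is a rough Python sketch of the comparison I have in mind. It is only a naive version under some assumptions: the transcript is already available as plain text, the frequency list is a Python list of words ordered from most to least frequent, and the function names (tokenize, missing_words, coverage) are just ones I made up for illustration. It matches exact word forms only, so conjugated or inflected forms (e.g. "está" vs. "estar") are treated as different words, which is probably the part I need the most help with.

```python
import re
import unicodedata


def tokenize(text):
    """Lowercase the text and split it into words made of letters only.

    The text is normalized to NFC first so that accented letters
    (á, ñ, ü) are single characters and stay inside their word.
    """
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text)


def missing_words(transcript_text, frequency_list, top_n=1000):
    """Return the words among the top_n most frequent that never occur
    in the transcript, in frequency-list order.

    Assumption: frequency_list is ordered most frequent first.
    """
    seen = set(tokenize(transcript_text))
    return [w for w in frequency_list[:top_n] if w.lower() not in seen]


def coverage(transcript_text, frequency_list, top_n=1000):
    """Return the fraction of the top_n frequent words found in the transcript."""
    top = frequency_list[:top_n]
    if not top:
        return 0.0
    seen = set(tokenize(transcript_text))
    return sum(1 for w in top if w.lower() in seen) / len(top)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    transcript = "¿Dónde está la casa? La casa está aquí."
    frequency_list = ["la", "de", "que", "casa", "está"]
    print(missing_words(transcript, frequency_list, top_n=5))
    print(coverage(transcript, frequency_list, top_n=5))
```

With real data I would read each episode's transcript from a file and concatenate them, and load the frequency list from whatever source people recommend. The same code should work unchanged for the English shows.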
Thank you to everyone for taking the time to read this!