Hello,

in a talk at the PyData Berlin meetup I saw this project: https://github.com/lusy/hora-de-decir-bye-bye , where Spanish articles are scraped and searched for English words. To identify English words, she used the Open Office dictionaries and compared the scraped words against them. She mentioned the problem that not all words were in the dictionaries.

So I thought this approach could be used to find (or at least help find) most of the missing words in dictionaries for all languages. One could scrape e.g. all Wikipedia articles in a given language and build a candidate list of missing words. It could also be used to find domain-specific words by scraping e.g. scientific articles, articles from certain types of websites, and so on.
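A minimal sketch of the comparison step, to make the idea concrete (the word list and sample text below are made up; a real run would load an Open Office / hunspell word list and scraped article text instead):

```python
import re
from collections import Counter

def find_missing_words(text, known_words):
    """Return candidate missing words: tokens not found in the
    dictionary, sorted by how often they occur in the scraped text."""
    # Naive tokenizer; a real scraper would need proper per-language tokenization
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    counts = Counter(t for t in tokens if t not in known_words)
    return counts.most_common()

# Toy stand-in for a real dictionary word list
known = {"el", "nuevo", "es", "muy", "popular", "en", "la", "red"}

sample = "El nuevo smartphone es muy popular en la red. El smartphone..."
print(find_missing_words(sample, known))  # → [('smartphone', 2)]
```

Sorting by frequency would help a human reviewer triage the candidate list, since frequent unknown tokens are more likely to be genuinely missing words than scraping noise.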

My question is whether this would be helpful at all, or whether missing words in dictionaries are no longer a problem. Also, I unfortunately don't have much spare time to work on this at the moment, so if anyone wants to pick it up, feel free to do so. I will let you know when I have implemented something myself.

I'm looking forward to your feedback.

Cheers,

Andrej

_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice
