On Tue, Oct 8, 2024 at 8:59 AM Bryan Davis <bd...@wikimedia.org> wrote:
>
> I was asked recently what I knew about the types of tools that use
> data from the https://dumps.wikimedia.org/ project. I had to admit
> that I really didn't know of many tools off the top of my head that
> relied on dumps. Most of the use cases I have heard about are for
> research topics like looking at word frequencies and sentence
> complexity, or machine learning things that consume some or all of the
> wiki corpus.
>
> Do you run a tool that needs data from Dumps to do its job? I would
> love to hear some stories about how this data helps folks advance the
> work of the movement.

YiFeiBot uses the dumps to build a list of pages that contain
interlanguage links, as input for its interlanguage link removal task.
It does this by running a regex over each page's wikitext.

> Bryan
> --
> Bryan Davis                                  Wikimedia Foundation
> Principal Software Engineer                         Boise, ID USA
> [[m:User:BDavis_(WMF)]]                                irc: bd808
> _______________________________________________
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information:
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
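
For illustration, a scan of this kind can be sketched in a few lines of
Python. This is only a minimal sketch under stated assumptions, not
YiFeiBot's actual code: the regex, the function names, and the idea of
reading a bz2-compressed pages-articles XML dump are all hypothetical
choices made for the example. In particular, the pattern treats any
lowercase 2-3 letter prefix as a language code, so a real bot would need
to check the prefix against the wiki's interwiki table.

```python
import bz2
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: [[xx:Title]] where xx merely *looks like* a
# language code (e.g. de, zh-min-nan). It does not consult the real
# interwiki table, so prefixes such as "wp" would be false positives.
INTERLANG_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\[\]|]+)\]\]")


def local_name(tag):
    """Drop the '{namespace}' prefix that the export schema puts on tags."""
    return tag.rsplit("}", 1)[-1]


def pages_with_interlanguage_links(stream):
    """Yield (title, [(lang, target), ...]) for each matching page text."""
    title = None
    # iterparse streams the file, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            links = INTERLANG_RE.findall(elem.text or "")
            if links:
                yield title, links
        elif name == "page":
            elem.clear()  # drop the finished page's children and text
            title = None


def scan_dump(path):
    """Scan a bz2-compressed XML dump file (path supplied by the caller)."""
    with bz2.open(path, "rb") as stream:
        yield from pages_with_interlanguage_links(stream)
```

Calling scan_dump() on a locally downloaded dump file would then yield
one entry per page revision whose wikitext matches; on a dump with only
current revisions that is one entry per page.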