Oh, I don't know where to even start. AI/ML work done by me:
- https://www.mediawiki.org/wiki/User:Ladsgroup/masz, which uses dumps to
  help checkusers of 11 wikis do their work more efficiently.
- A tool that automatically finds the "bad words" of each wiki using the
  edit-history dumps; its results were later used in ORES and in abuse
  filters: https://gist.github.com/Ladsgroup/cc22515f55ae3d868f47

General research done by others:
- There are around 5,000 papers indexed in Google Scholar that explicitly
  mention "dumps.wikimedia.org":
  https://scholar.google.com/scholar?hl=de&as_sdt=0%2C5&q=%22dumps.wikimedia.org%22&btnG=

Other projects I remember off the top of my head:
- https://www.wikidata.org/wiki/Wikidata:Automated_finding_references_input
- https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia, which
  cleans up syntax and styling issues in articles.

I have also done all sorts of small clean-ups using dumps:
- Guessing the name of a person in another language based on how first
  names and last names are used in Wikidata (a rough sketch of the idea is
  in the P.S. at the end of this mail).
- The same for translating the names of species into Persian using their
  scientific names (e.g. Acacia sibirica -> آکاسیای سیبری, which is
  Siberian Acacia).
- Finding duplicate items to merge.
- Many more that I can't remember right now.

Hope that helps.

On Tue, 8 Oct 2024 at 18:06 Kimmo Virtanen <kimmo.virta...@wikimedia.fi>
wrote:

> From time to time I use dumps to parse data that I cannot get via
> SQL/API. For example, this summer I fetched the Wikimedia Commons page
> history to get the list of old categories of images, so that my bot would
> not re-insert categories that had at least once been removed from a
> photo.
>
> Br,
> -- Kimmo Virtanen, Zache
>
> On Tue, Oct 8, 2024 at 6:59 PM Bryan Davis <bd...@wikimedia.org> wrote:
>
>> I was asked recently what I knew about the types of tools that use
>> data from the https://dumps.wikimedia.org/ project. I had to admit
>> that I really didn't know of many tools off the top of my head that
>> relied on dumps. Most of the use cases I have heard about are for
>> research topics like looking at word frequencies and sentence
>> complexity, or machine learning things that consume some or all of
>> the wiki corpus.
>>
>> Do you run a tool that needs data from Dumps to do its job? I would
>> love to hear some stories about how this data helps folks advance the
>> work of the movement.
>>
>> Bryan
>> --
>> Bryan Davis                    Wikimedia Foundation
>> Principal Software Engineer    Boise, ID USA
>> [[m:User:BDavis_(WMF)]]        irc: bd808

-- 
Amir (he/him)
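P.S. For the name-guessing item above, the approach is roughly the
following. This is a minimal sketch in Python, not the actual script: it
assumes the standard Wikidata properties P735 ("given name") and P734
("family name"), fetches labels through the public wbgetentities API, and
naively joins the parts in given-name-first order, which not every
language follows.

    # Minimal sketch, not the original script. Assumes P735 = given name,
    # P734 = family name, and given-name-first word order in the target
    # language (not true for every language).
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def get_entity(qid):
        # Fetch the raw JSON of a single Wikidata entity.
        r = requests.get(API, params={"action": "wbgetentities",
                                      "ids": qid, "format": "json"})
        r.raise_for_status()
        return r.json()["entities"][qid]

    def first_item_value(entity, prop):
        # QID of the first statement value for a property, or None.
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim["mainsnak"]
            if snak.get("snaktype") == "value":
                return snak["datavalue"]["value"]["id"]
        return None

    def guess_name(person_qid, lang):
        # Compose "<given> <family>" in `lang` from the labels of the
        # person's given-name and family-name items.
        person = get_entity(person_qid)
        parts = []
        for prop in ("P735", "P734"):
            name_qid = first_item_value(person, prop)
            if not name_qid:
                return None  # no statement to work from
            name_item = get_entity(name_qid)
            label = name_item.get("labels", {}).get(lang, {}).get("value")
            if not label:
                return None  # name item has no label in this language
            parts.append(label)
        return " ".join(parts)

    print(guess_name("Q937", "fa"))  # Q937 = Albert Einstein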
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/