Oh I don't know where to even start:

AI/ML done by me:

   - https://www.mediawiki.org/wiki/User:Ladsgroup/masz: this uses dumps
   to help CheckUsers on 11 wikis do their work more efficiently.
   - A tool that automatically finds the "bad words" of each wiki using
   the edit-history dumps; the word lists later got used in ORES and in
   abuse filters (a rough sketch of the idea follows this list):
   https://gist.github.com/Ladsgroup/cc22515f55ae3d868f47
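
A minimal sketch of the "bad words" idea (not the actual gist): score
each word by how often it shows up in edits that later got reverted
versus edits that survived. The (added_text, was_reverted) input format
is an assumption; extracting those pairs from the edit-history dumps is
a separate preprocessing step.

    import re
    from collections import Counter

    def bad_word_scores(edits, min_count=50):
        """edits: iterable of (added_text, was_reverted) pairs,
        assumed to be pre-extracted from an edit-history dump."""
        reverted, total = Counter(), Counter()
        for added_text, was_reverted in edits:
            # Count each word once per edit, not once per occurrence.
            for word in set(re.findall(r"\w+", added_text.lower())):
                total[word] += 1
                if was_reverted:
                    reverted[word] += 1
        # Words that appear almost only in reverted edits score ~1.0;
        # min_count filters out rare words with noisy ratios.
        return {w: reverted[w] / n
                for w, n in total.items() if n >= min_count}

    # Top candidates, e.g.:
    # sorted(scores.items(), key=lambda kv: -kv[1])[:100]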

General research done by others:

   - There are around 5,000 papers indexed in Google Scholar that
   explicitly mention "dumps.wikimedia.org":
   https://scholar.google.com/scholar?hl=de&as_sdt=0%2C5&q=%22dumps.wikimedia.org%22&btnG=

Other projects I remember off the top of my head:

   - https://www.wikidata.org/wiki/Wikidata:Automated_finding_references_input
   - https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia,
   which cleans up syntax and styling issues in articles
   - I have also done all sorts of small clean-ups using dumps (a rough
   sketch of the label-matching trick is below this list):
      - guessing a person's name in another language, based on how the
      first-name and last-name items are used across Wikidata.
      - the same trick for translating species names into Persian using
      their scientific names (e.g. Acacia sibirica -> آکاسیای سیبری,
      which is Siberian acacia).
      - finding duplicate items to merge.
      - many more that I can't remember right now.
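
A rough sketch of that label-matching trick, assuming the Wikidata JSON
dump (one entity per line), with P735 = given name and P734 = family
name; the "fa" target language, the file name, and the simple
given-then-family order are illustrative assumptions, not how my actual
scripts worked.

    import gzip
    import json

    def entities(path):
        # The JSON dump is one big array with one entity per line;
        # strip the trailing comma and parse each line on its own.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip().rstrip(",")
                if line.startswith("{"):
                    yield json.loads(line)

    def item_of(entity, prop):
        # First wikibase-item value of the given property, if any.
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim["mainsnak"]
            if snak.get("snaktype") == "value":
                return snak["datavalue"]["value"]["id"]

    labels = {}   # QID of a name item -> its Persian label
    people = []   # (person QID, given-name QID, family-name QID)
    for e in entities("wikidata-all.json.gz"):
        fa = e.get("labels", {}).get("fa")
        if fa:
            labels[e["id"]] = fa["value"]
        given, family = item_of(e, "P735"), item_of(e, "P734")
        if given and family and not fa:
            people.append((e["id"], given, family))

    # Resolve after the full pass so every name label has been seen.
    for qid, given, family in people:
        if given in labels and family in labels:
            print(qid, labels[given], labels[family])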

Hope that helps

On Tue, Oct 8, 2024 at 6:06 PM Kimmo Virtanen <
kimmo.virta...@wikimedia.fi> wrote:

> From time to time I use dumps to parse data that I cannot get via
> SQL/API. For example, this summer I fetched the Wikimedia Commons page
> history to get the list of old categories of images, so that my bot
> would not re-insert categories that had been removed from a photo at
> least once.
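>
> A rough sketch of that kind of single pass (assuming the
> pages-meta-history XML dump and the mwxml library; the file name and
> the simplified category regex are illustrative):
>
>     import re
>     import mwxml
>
>     # Matches [[Category:...]] links in wikitext (simplified; a real
>     # scan would also handle whitespace and localized aliases).
>     CAT = re.compile(r"\[\[Category:([^\]|]+)", re.IGNORECASE)
>
>     removed = {}  # page title -> categories removed at least once
>     dump = mwxml.Dump.from_file(
>         open("commonswiki-pages-meta-history.xml"))
>     for page in dump:
>         gone, current = set(), set()
>         for revision in page:  # revisions arrive oldest first
>             cats = set(CAT.findall(revision.text or ""))
>             gone |= current - cats  # present before, missing now
>             current = cats
>         if gone:
>             removed[page.title] = gone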
>
> Br,
> -- Kimmo Virtanen, Zache
>
> On Tue, Oct 8, 2024 at 6:59 PM Bryan Davis <bd...@wikimedia.org> wrote:
>
>> I was asked recently what I knew about the types of tools that use
>> data from the https://dumps.wikimedia.org/ project. I had to admit
>> that I really didn't know of many tools off the top of my head that
>> relied on dumps. Most of the use cases I have heard about are for
>> research topics like looking at word frequencies and sentence
>> complexity, or machine learning things that consume some or all of the
>> wiki corpus.
>>
>> Do you run a tool that needs data from Dumps to do its job? I would
>> love to hear some stories about how this data helps folks advance the
>> work of the movement.
>>
>> Bryan
>> --
>> Bryan Davis                                        Wikimedia Foundation
>> Principal Software Engineer                               Boise, ID USA
>> [[m:User:BDavis_(WMF)]]                                      irc: bd808


-- 
Amir (he/him)
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
