Hi Luis,

This sounds great! Sourcing the images and inserting them into the articles sounds like the hard part. Finding the occupations with page views seems doable, and here's my attempt using Wikidata.
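
In case it helps with scripting this, here is a rough Python sketch of the same idea. The query text is what the query.wikidata.org link in this message decodes to. The helper names (query_service_link, run_query, rank_by_views) and the User-Agent string are made up for illustration, and the view counts in the example are invented, so treat it as a starting point rather than a finished tool.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# SPARQL decoded from the query.wikidata.org link in this message:
# instances of Q12737077 (occupations) that have an English Wikipedia article.
OCCUPATION_QUERY = """SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q12737077
  { ?article schema:about ?item. ?article schema:isPartOf <https://en.wikipedia.org/>. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}"""


def query_service_link(sparql: str) -> str:
    """Build a shareable query.wikidata.org link (the query goes in the URL fragment)."""
    return "https://query.wikidata.org/#" + quote(sparql, safe="")


def run_query(sparql: str, user_agent: str = "occupation-images-sketch/0.1"):
    """Fetch (item URI, label) pairs from the SPARQL endpoint. Needs network access.

    The User-Agent value is a placeholder; replace it with real contact details.
    """
    url = "https://query.wikidata.org/sparql?" + urlencode(
        {"query": sparql, "format": "json"}
    )
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [
        (row["item"]["value"], row["itemLabel"]["value"])
        for row in data["results"]["bindings"]
    ]


def rank_by_views(views: dict[str, int]) -> list[tuple[str, int]]:
    """Sort article titles by page views, highest first, to prioritize by traffic."""
    return sorted(views.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(query_service_link(OCCUPATION_QUERY))
    # Invented view counts, only to show the ordering step.
    print(rank_by_views({"Chef": 10, "Lawyer": 30, "Sommelier": 20}))
```

The page view numbers themselves would still have to come from somewhere, such as a Massviews export; this sketch only covers building the query and ordering the results.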

Here's a link to around 1,200 occupations that have English Wikipedia articles, listed in reverse order of page views over the last month:
https://pageviews.wmcloud.org/massviews/?platform=all-access&agent=user&source=pagepile&range=latest-20&sort=views&direction=1&view=list&target=62078

I created this by entering this SPARQL query from Wikidata:
https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%0AWHERE%0A%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ12737077%0A%20%20%7B%20%3Farticle%20schema%3Aabout%20%3Fitem.%20%3Farticle%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cmul%2Cen%22.%20%7D%0A%7D

into PagePile:
https://pagepile.toolforge.org/api.php?id=62078&action=get_data&format=html

and then viewing the results in Massviews, which is the first link above.

Hope this helps,
-Travis

On Thu, Nov 14, 2024 at 7:57 PM Luis Villa <[email protected]> wrote:

> Hey! I mentioned this at the meeting tonight and I thought I'd share it -
> and wondering if anyone here has thoughts on how to script this to make it
> a little more systematic?
>
> My project was to improve the diversity of photos of careers, since NPOV
> is slightly ambiguous there and we know there's impact for kids in terms of
> representation.
>
> The basic strategy was a Wikidata query on… jobs? careers? This was then
> joined with pageview data, so that I could prioritize the pages by traffic.
> Someone on Twitter helped me find the right Wikidata items and construct
> the query; sadly I can't find it in my notes, though. (I’ve been poking at
> building a new one with the help of ChatGPT but haven’t had much time for
> it.) The output was a CSV that I then jammed into Google Sheets to track
> it, but presumably it wouldn't be that hard to regenerate the list
> dynamically (and extend it beyond enwiki).
>
> I then simply did a lot of Flickr, IA, and usgov searches to find better
> photos - not just women, also geographic/racial diversity. Some were pretty
> easy (especially where the US government has many people in the named
> career role) but others harder. As a general matter, I didn't start with
> Commons; I mostly assumed I had to look off Commons first and then bring
> the images to Commons, though that wasn't always true.
>
> Some example edits:
>
> - adding women, an African, and an Asian to “Presidents”:
>   https://en.wikipedia.org/w/index.php?title=President_(government_title)&diff=prev&oldid=841458719
> - add a woman to "Sommelier":
>   https://en.wikipedia.org/w/index.php?title=Sommelier&diff=prev&oldid=842546650
> - add an African man and Mexican group to “Chef”:
>   https://en.wikipedia.org/w/index.php?title=Chef&diff=prev&oldid=842550576
> - add a gender-diverse photo and black man to “System Administrator”;
>   if I recall correctly the black man was reverted but I didn’t fight it too
>   hard:
>   https://en.wikipedia.org/w/index.php?title=System_administrator&diff=prev&oldid=840728240
>
> I seem to recall that the attempt to diversify “Lawyer” was reverted, but
> most stuck at least in the short term.
>
> Now that Wikidata has matured, and maybe there are more photos out there,
> it'd be interesting to turn this into something more structured. E.g.,
> there are obviously problems with relying on LLMs to do gender
> identification of photos, but as a first pass to identify the most
> problematic pages?
>
> Anyway, throwing that out into the void-
> Luis
> _______________________________________________
> Wikimedia-SF mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
