Thanks, Jaime, for your recommendation. If I understand the result of [1] correctly, there are around 3.5 million pages on Wikidata with a GeoNames property specified. I'm sure some of them are redirects, or not cities, etc. But still, paging through millions of results via API calls, 1,000 at a time, is cumbersome and inefficient. (The example you gave takes 20 seconds to run; at 1,000 results per call that is about 3,500 calls, or 20 * 3,500 = 70,000 seconds, which is roughly 19 hours, assuming no lag or errors.)
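Just to illustrate, the kind of loop I am trying to avoid would look roughly like this (an untested sketch in Python, assuming the "requests" library and the public query.wikidata.org/sparql endpoint, using the same query as in your link minus the label service):

import requests  # assumption: the 'requests' library and the public SPARQL endpoint

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item ?geoname WHERE {
  ?item wdt:P1566 ?geoname .
}
LIMIT 1000 OFFSET %d
"""

offset = 0
while True:
    r = requests.get(
        ENDPOINT,
        params={"query": QUERY % offset, "format": "json"},
        headers={"User-Agent": "P1566-dump-sketch/0.1 (example only)"},
    )
    r.raise_for_status()
    rows = r.json()["results"]["bindings"]
    if not rows:
        break
    for row in rows:
        qid = row["item"]["value"].rsplit("/", 1)[-1]  # e.g. Q2113430
        print(qid, row["geoname"]["value"])            # e.g. Q2113430 18918
    # Note: without an ORDER BY, paging with OFFSET has no guaranteed order.
    offset += 1000  # ~3,500 round trips at ~20 s each is where the ~19 hours comes from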
However, what you suggested gave me an idea: I can take a look at the code for the API itself (I guess it is at [2]), figure out how the query is written there, and then try to write a similar query on my own. If I figure it out, I will report back here. (I have also pasted, below the quoted thread, a rough sketch of the API-based iteration you described, for future reference.)

Huji

[1] https://quarry.wmflabs.org/query/25418
[2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php

On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcre...@wikimedia.org> wrote:

> I am not 100% sure there is a perfect way to do what you want by querying
> the metadata databases (I assume that is what you mean by "query"); I don't
> think that data is metadata, but content itself, which is not on the
> metadata databases.
>
> Calling the Wikidata Query Service is probably what you want:
>
> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
>
> Note the LIMIT and OFFSET that will let you iterate over the dataset (a
> WHERE clause would be faster).
>
> Another way to get results is iterating over:
> <https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
>
> That is a standard MediaWiki API query; you will also find this in the
> pagelinks table, but you should check every page you get afterwards (by
> retrieving its contents), as it could include false positives or be behind
> on updates.
>
> On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.h...@gmail.com> wrote:
>
>> Hello,
>>
>> I need help writing a query that I would like to run on the Cloud. The
>> goal of the query is to retrieve the following information from
>> wikidatawiki_p:
>>
>> * Find all pages that have a claim for the property P1566; for example,
>> see https://www.wikidata.org/wiki/Q2113430
>> * Find out the value of their P1566 property (in this case, 18918)
>>
>> Output format should be like this:
>>
>> page_id   entity     property_value
>> 2039804   Q2113430   18918
>> ...
>>
>> Thanks in advance,
>>
>> Huji
>>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
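P.S. For future reference, here is a rough, untested Python sketch of the Special:WhatLinksHere / pagelinks iteration Jaime described, through the standard API (list=backlinks). As he notes, each page it returns would still need its claims checked afterwards (e.g. with wbgetclaims) because pagelinks can contain false positives or lag behind:

import requests  # assumption: the 'requests' library against the standard MediaWiki API

API = "https://www.wikidata.org/w/api.php"
params = {
    "action": "query",
    "list": "backlinks",
    "bltitle": "Property:P1566",      # same target as Special:WhatLinksHere
    "blfilterredir": "nonredirects",  # mirrors hideredirs=1
    "bllimit": "500",
    "format": "json",
}

while True:
    data = requests.get(
        API,
        params=params,
        headers={"User-Agent": "P1566-backlinks-sketch/0.1 (example only)"},
    ).json()
    for link in data["query"]["backlinks"]:
        # Each hit still needs a follow-up call (e.g. action=wbgetclaims) to read
        # the actual P1566 value and to weed out false positives or stale links.
        print(link["pageid"], link["title"])
    if "continue" not in data:
        break
    params.update(data["continue"])  # standard API continuation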