Huji, do you need the page_id in the query results? If not, I would suggest using either the Wikidata Query Service, as Jaime suggested (though I'd omit the LIMIT and OFFSET; I think it's better to let the server send you all the results at once), or the Linked Data Fragments server: https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP1566&object= (this URL will return HTML, RDF/XML, Turtle, JSON-LD, … depending on the Accept header).
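As a rough illustration of the two options above, here is a minimal Python sketch that only builds the two HTTP requests; nothing is sent. The SPARQL endpoint path (https://query.wikidata.org/sparql), the helper names, the simplified query, and the User-Agent string are assumptions made for this sketch rather than anything stated in this thread, so treat it as a starting point.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The LDF base URL is the one quoted above.
LDF_ENDPOINT = "https://query.wikidata.org/bigdata/ldf"
# Assumption: the Wikidata Query Service accepts SPARQL queries at this path.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder User-Agent (hypothetical); replace it with one identifying your tool.
USER_AGENT = "geonames-example/0.1 (hypothetical contact)"

# Same triple pattern as Jaime's query, but simplified for this sketch:
# no label service, and no LIMIT or OFFSET.
GEONAMES_QUERY = "SELECT ?item ?geoname WHERE { ?item wdt:P1566 ?geoname . }"


def build_ldf_request(property_id="P1566", accept="text/turtle"):
    """Build a request for all triples whose predicate is the given direct property.

    The Accept header selects the response format (Turtle here).
    """
    params = {
        "subject": "",
        "predicate": "http://www.wikidata.org/prop/direct/" + property_id,
        "object": "",
    }
    url = LDF_ENDPOINT + "?" + urlencode(params)
    return Request(url, headers={"Accept": accept, "User-Agent": USER_AGENT})


def build_sparql_request(query=GEONAMES_QUERY,
                         accept="application/sparql-results+json"):
    """Build a GET request that submits the query to the assumed SPARQL endpoint."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept, "User-Agent": USER_AGENT})
```

Passing either request to urllib.request.urlopen would perform the actual download. As far as I know, Linked Data Fragments responses are paged, so a full crawl would have to follow the next-page links in each response rather than expect everything in one reply.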
Cheers,
Lucas

2018-03-14 1:03 GMT+01:00 Huji Lee <huji.h...@gmail.com>:

> Thanks, Jaime, for your recommendation.
>
> If I understand the result of [1] correctly, there are around 3.5 million
> pages with a GeoNames property specified on Wikidata. I'm sure some of them
> are redirects, or not cities, etc. But still, going through millions of
> pages through API calls of 1000 at a time is cumbersome and inefficient.
> (The example you gave takes 20 seconds to run; that would mean a total of
> 20 * 3.5 * 1000 seconds, which is about 19 hours, assuming no lag or error.)
>
> However, what you suggested gave me an idea: I can take a look at the code
> for the API itself (I guess it is at [2]) and figure out how the query is
> written there, then try to write a similar query on my own. If I figure it
> out, I will report back here.
>
> Huji
>
> [1] https://quarry.wmflabs.org/query/25418
> [2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php
>
> On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcre...@wikimedia.org>
> wrote:
>
>> I am not 100% sure there is a perfect way to do what you want by querying
>> the metadata databases (I assume that is what you mean by "query"). I
>> don't think that data is metadata, but content itself, which is not in
>> the metadata databases.
>>
>> Calling the Wikidata Query Service is probably what you want:
>>
>> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
>>
>> Note the LIMIT and OFFSET that will let you iterate over the dataset (a
>> WHERE clause would be faster).
>>
>> There is a way to get results, which is iterating over:
>> <https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
>>
>> That is a standard MediaWiki API query; you will also find this in the
>> pagelinks table, but you should check every page you get afterwards (by
>> retrieving its contents), as it could include false positives or be
>> behind on updates.
>>
>> On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.h...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I need help writing a query that I would like to run on the Clouds. The
>>> goal of the query is to retrieve the following information from
>>> wikidatawiki_p:
>>>
>>> * Find all pages that have a claim for the property P1566, for example
>>>   see https://www.wikidata.org/wiki/Q2113430
>>> * Find out what the value of their P1566 property is (in this case,
>>>   18918)
>>>
>>> Output format should be like this:
>>>
>>> page_id   entity     property_value
>>> 2039804   Q2113430   18918
>>> ...
>>>
>>> Thanks in advance,
>>>
>>> Huji
>>>
>>> _______________________________________________
>>> Wikimedia Cloud Services mailing list
>>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>> --
>> Jaime Crespo
>> <http://wikimedia.org>

--
Lucas Werkmeister
Software Developer (Intern)

Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de

Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 B. Recognized as a non-profit organization
by the Finanzamt für Körperschaften I Berlin (tax office for corporations),
tax number 27/029/42207.