Thanks, Jaime, for your recommendation.

If I understand the result of [1] correctly, there are around 3.5 million
pages with a GeoNames property specified on Wikidata. I'm sure some of them
are redirects, or not cities, etc. But still, going through millions of
pages via API calls of 1000 at a time is cumbersome and inefficient.
(The example you gave takes 20 seconds to run; 3.5 million pages at 1000
per call is 3,500 calls, so a total of 3,500 * 20 = 70,000 seconds, which
is roughly 19 hours, assuming no lag or errors.)
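
To make that estimate easy to re-check with other numbers, here is the same
arithmetic as a small Python sketch. The three input figures are the rough
ones quoted above (the count from [1], the batch size, and the observed
runtime of the example query); they are estimates, not measurements of the
full run.

```python
import math

# Rough figures from this message; all three are estimates.
total_pages = 3_500_000   # pages with a GeoNames (P1566) claim, per [1]
batch_size = 1000         # results returned per call
seconds_per_call = 20     # observed runtime of the example query

calls = math.ceil(total_pages / batch_size)
total_seconds = calls * seconds_per_call
total_hours = total_seconds / 3600

print(f"{calls} calls, {total_seconds} s, about {total_hours:.1f} h")
# prints: 3500 calls, 70000 s, about 19.4 h
```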

However, what you suggested gave me an idea: I can take a look at the code
for the API itself (I guess it is at [2]) and figure out how the query is
written there, then try to write a similar query on my own. If I figure it
out, I will report back here.

Huji

[1] https://quarry.wmflabs.org/query/25418
[2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php

On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcre...@wikimedia.org> wrote:

> I am not 100% sure there is a perfect way to do what you want by querying
> the metadata databases (I assume that is what you mean by "query"). I don't
> think that data is metadata; it is content itself, which is not in the
> metadata databases.
>
> Calling the wikidata query service is probably what you want:
>
> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
>
> Note the LIMIT and OFFSET that will let you iterate over the dataset (a
> WHERE clause would be faster).
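
A minimal sketch of that LIMIT/OFFSET iteration in Python, which only builds
the request URLs and does not send them. It assumes the public endpoint
https://query.wikidata.org/sparql with `query` and `format=json` parameters;
the function names are made up for illustration, and the label service from
the query above is left out to keep it short. Note that without an ORDER BY,
SPARQL does not promise a stable row order between pages, so results could
overlap or be skipped.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

def build_query(offset, limit=1000):
    """Return the SPARQL text for one page of P1566 (GeoNames ID) claims."""
    return (
        "SELECT ?item ?geoname WHERE {\n"
        "  ?item wdt:P1566 ?geoname .\n"
        "}\n"
        f"LIMIT {limit} OFFSET {offset}"
    )

def page_urls(total, limit=1000):
    """Yield one request URL per page of `limit` results."""
    for offset in range(0, total, limit):
        params = {"query": build_query(offset, limit), "format": "json"}
        yield f"{ENDPOINT}?{urlencode(params)}"

# Three pages cover the first 3000 results (offsets 0, 1000, 2000).
urls = list(page_urls(total=3000))
```
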
>
> There is a way to get results, which is iterating over:
> <https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
>
> That is a standard MediaWiki API query; you will also find this in the
> pagelinks table, but you should check every page you get afterwards (by
> retrieving its contents), as it could include false positives or be behind
> on updates.
>
> On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.h...@gmail.com> wrote:
>
>> Hello,
>>
>> I need help writing a query that I would like to run on the Clouds. The
>> goal of the query is to retrieve the following information from
>> wikidatawiki_p:
>>
>> * Find all pages that have a claim for the property P1566, for example
>> see https://www.wikidata.org/wiki/Q2113430
>> * Find out what is the value of their P1566 property (in this case, 18918)
>>
>> Output format should be like this:
>>
>> page_id    entity      property_value
>> 2039804    Q2113430    18918
>> ...
>>
>> Thanks in advance,
>>
>> Huji
>>
>> _______________________________________________
>> Wikimedia Cloud Services mailing list
>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>