Huji, do you need the page_id in the query results? Otherwise, I would
suggest using either the Wikidata Query Service, as Jaime suggested (though
I’d omit the LIMIT and OFFSET – I think it’s better to let the server send
you all the results at once) or the Linked Data Fragments server:
https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP1566&object=
(this URL returns HTML, RDF/XML, Turtle, JSON-LD, … depending on the Accept
header).
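
A minimal Python sketch of both options, in case it helps. It is only an
illustration under some assumptions: the function names and the user_agent
parameter are made up for this example, the SPARQL text is Jaime's query with
the label service, LIMIT and OFFSET removed, and the parser assumes the
standard SPARQL JSON results layout. Neither option returns a page_id, and I
have not checked whether the full, unpaginated query finishes within the
server's time limit.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
LDF_ENDPOINT = "https://query.wikidata.org/bigdata/ldf"
P1566 = "http://www.wikidata.org/prop/direct/P1566"

# No LIMIT/OFFSET: ask the server for the whole result set in one response.
QUERY = "SELECT ?item ?geoname WHERE { ?item wdt:P1566 ?geoname . }"


def wdqs_url(query: str = QUERY) -> str:
    """Build a GET URL for the Wikidata Query Service SPARQL endpoint."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query})


def ldf_url(predicate: str = P1566) -> str:
    """Build a Linked Data Fragments URL matching every triple with `predicate`."""
    params = {"subject": "", "predicate": predicate, "object": ""}
    return LDF_ENDPOINT + "?" + urlencode(params)


def rows_from_sparql_json(payload: dict) -> list[tuple[str, str]]:
    """Turn SPARQL JSON results into (entity, property_value) pairs."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # Item values are full URIs such as http://www.wikidata.org/entity/Q2113430
        entity = binding["item"]["value"].rsplit("/", 1)[-1]
        rows.append((entity, binding["geoname"]["value"]))
    return rows


def fetch_all(user_agent: str) -> list[tuple[str, str]]:
    """Run the query over the network (user_agent is a hypothetical parameter)."""
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    }
    with urlopen(Request(wdqs_url(), headers=headers)) as resp:
        return rows_from_sparql_json(json.load(resp))
```

Calling ldf_url() with no arguments reproduces the URL above; sending a
different Accept header to that URL selects the serialization.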

Cheers,
Lucas

2018-03-14 1:03 GMT+01:00 Huji Lee <huji.h...@gmail.com>:

> Thanks, Jaime, for your recommendation.
>
> If I understand the result of [1] correctly, there are around 3.5 million
> pages with a GeoNames property specified on Wikidata. I'm sure some of them
> are redirects, or not cities, etc. But still, going through millions of
> pages through API calls of 1000 at a time is cumbersome and inefficient.
> (The example you gave takes 20 seconds to run; at 3,500 calls that would
> mean a total of 20 * 3,500 = 70,000 seconds, or about 19 hours, assuming
> no lag or errors.)
>
> However, what you suggested gave me an idea: I can take a look at the code
> for the API itself (I guess it is at [2]) and figure out how the query is
> written there, then try to write a similar query on my own. If I figure it
> out, I will report back here.
>
> Huji
>
> [1] https://quarry.wmflabs.org/query/25418
> [2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php
>
> On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcre...@wikimedia.org>
> wrote:
>
>> I am not 100% sure there is a perfect way to do what you want by querying
>> the metadata databases (I assume that is what you mean by "query"). I don't
>> think that data is metadata; it is content itself, which is not in the
>> metadata databases.
>>
>> Calling the Wikidata Query Service is probably what you want:
>>
>> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
>>
>> Note the LIMIT and OFFSET that will let you iterate over the dataset (a
>> WHERE clause would be faster).
>>
>> There is a way to get results, which is iterating over:
>> <https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
>>
>> That is a standard MediaWiki API query; you will also find this in the
>> pagelinks table, but you should check every page you get afterwards (by
>> retrieving its contents), as it could include false positives or be behind
>> on updates.
>>
>> On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.h...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I need help writing a query that I would like to run on the Clouds. The
>>> goal of the query is to retrieve the following information from
>>> wikidatawiki_p:
>>>
>>> * Find all pages that have a claim for the property P1566, for example
>>> see https://www.wikidata.org/wiki/Q2113430
>>> * Find out what is the value of their P1566 property (in this case,
>>> 18918)
>>>
>>> Output format should be like this:
>>>
>>> page_id   entity     property_value
>>> 2039804   Q2113430   18918
>>> ...
>>>
>>> Thanks in advance,
>>>
>>> Huji
>>>
>>> _______________________________________________
>>> Wikimedia Cloud Services mailing list
>>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>>
>>
>>
>>
>> --
>> Jaime Crespo
>> <http://wikimedia.org>
>>
>



-- 
Lucas Werkmeister
Software Developer (Intern)

Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de

Imagine a world, in which every single human being can freely share in the
sum of all knowledge. That‘s our commitment.

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.