Actually, never mind. I reviewed the Java code behind it and it doesn't
support more items per page. It also gets slow on later pages (the first
few pages are in a warm cache and are fast).

I think my best bet is to just download the latest JSON dump from
https://www.wikidata.org/wiki/Wikidata:Database_download and parse it
myself.
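
For the record, here is a rough sketch of the parser I have in mind (a
minimal sketch, assuming the latest-all.json.bz2 dump keeps its usual
layout of one entity object per line inside one big JSON array):

    import bz2
    import json

    # Stream the dump line by line; every line except the enclosing
    # "[" and "]" is one entity, with a trailing comma to strip.
    with bz2.open("latest-all.json.bz2", "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip().rstrip(",")
            if line in ("[", "]") or not line:
                continue
            entity = json.loads(line)
            for claim in entity.get("claims", {}).get("P1566", []):
                snak = claim["mainsnak"]
                if snak.get("snaktype") == "value":
                    # P1566 is an external identifier, so the datavalue
                    # is a plain string (the GeoNames ID).
                    print(entity["id"], snak["datavalue"]["value"],
                          sep="\t")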

Thanks again!

Huji

On Wed, Mar 14, 2018 at 12:12 PM, Huji Lee <huji.h...@gmail.com> wrote:

> Lucas,
>
> No I don't need the page_id. The other two are enough.
>
> The Wikidata Query Service seems very slow (it'll take about one day of
> continuous querying to get all the data). The Linked Data Fragments
> server seems faster, but I wish I knew how to make it return more than
> 100 results at a time. Do you know how?
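>
> For what it's worth, this is roughly how I page through it right now,
> 100 triples at a time (a rough sketch; I am only assuming the endpoint
> honours the usual Linked Data Fragments "page" parameter):
>
>     import requests
>     from rdflib import Graph, URIRef
>
>     LDF = "https://query.wikidata.org/bigdata/ldf"
>     P1566 = URIRef("http://www.wikidata.org/prop/direct/P1566")
>
>     page = 1
>     while True:
>         resp = requests.get(
>             LDF,
>             params={"predicate": str(P1566), "page": page},
>             headers={"Accept": "text/turtle"},
>         )
>         resp.raise_for_status()
>         graph = Graph()
>         graph.parse(data=resp.text, format="turtle")
>         # Keep only the data triples; each fragment also describes
>         # itself with hydra/void metadata under other predicates.
>         pairs = list(graph.subject_objects(P1566))
>         if not pairs:
>             break  # ran past the last page
>         for item, geoname in pairs:
>             print(item, geoname, sep="\t")
>         page += 1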
>
> Thanks,
>
> Huji
>
> On Wed, Mar 14, 2018 at 7:00 AM, Lucas Werkmeister <
> lucas.werkmeis...@wikimedia.de> wrote:
>
>> Huji, do you need the page_id in the query results? Otherwise, I would
>> suggest using either the Wikidata Query Service, as Jaime suggested (though
>> I’d omit the LIMIT and OFFSET – I think it’s better to let the server send
>> you all the results at once) or the Linked Data Fragments server:
>> https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP1566&object=
>> (this URL will return HTML, RDF/XML, Turtle, JSON-LD, … depending on the
>> Accept header).
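>>
>> For example, something along these lines (an untested sketch on my
>> part) pulls the whole result set in one request, as CSV:
>>
>>     import requests
>>
>>     QUERY = "SELECT ?item ?geoname WHERE { ?item wdt:P1566 ?geoname . }"
>>
>>     # The query service content-negotiates much like the LDF server;
>>     # asking for text/csv avoids parsing SPARQL XML/JSON by hand.
>>     resp = requests.get(
>>         "https://query.wikidata.org/sparql",
>>         params={"query": QUERY},
>>         headers={"Accept": "text/csv"},
>>     )
>>     resp.raise_for_status()
>>     print(resp.text)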
>>
>> Cheers,
>> Lucas
>>
>> 2018-03-14 1:03 GMT+01:00 Huji Lee <huji.h...@gmail.com>:
>>
>>> Thanks, Jaime, for your recommendation.
>>>
>>> If I understand the result of [1] correctly, there are around 3.5
>>> million pages with a GeoNames property specified on Wikidata. I'm sure
>>> some of them are redirects, not cities, etc. But still, going through
>>> millions of pages via API calls of 1,000 at a time is cumbersome and
>>> inefficient. (The example you gave takes 20 seconds to run; at 1,000
>>> results per call that is about 3,500 calls, or 20 * 3,500 = 70,000
>>> seconds, which is roughly 19 hours, assuming no lag or errors.)
>>>
>>> However, what you suggested gave me an idea: I can take a look at the
>>> code for the API itself (I guess it is at [2]) and figure out how the query
>>> is written there, then try to write a similar query on my own. If I figure
>>> it out, I will report back here.
>>>
>>> Huji
>>>
>>> [1] https://quarry.wmflabs.org/query/25418
>>> [2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php
>>>
>>> On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcre...@wikimedia.org>
>>> wrote:
>>>
>>>> I am not 100% sure there is a perfect way to do what you want by
>>>> querying the metadata databases (I assume that is what you mean by
>>>> "query"). I don't think that data is metadata; it is content itself,
>>>> which is not stored in the metadata databases.
>>>>
>>>> Calling the Wikidata Query Service is probably what you want:
>>>>
>>>> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
>>>>
>>>> Decoded, that is:
>>>>
>>>>     SELECT ?item ?itemLabel ?geoname
>>>>     WHERE {
>>>>       ?item wdt:P1566 ?geoname .
>>>>       SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
>>>>     }
>>>>     LIMIT 1000 OFFSET 1000
>>>>
>>>> Note the LIMIT and OFFSET, which let you iterate over the dataset (a
>>>> narrower WHERE clause would be faster).
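>>>>
>>>> In code, that iteration could look something like this (a rough
>>>> sketch; strictly you would add an ORDER BY for stable paging, and
>>>> large OFFSETs tend to get slower and slower):
>>>>
>>>>     import requests
>>>>
>>>>     ENDPOINT = "https://query.wikidata.org/sparql"
>>>>     TEMPLATE = """
>>>>     SELECT ?item ?geoname
>>>>     WHERE { ?item wdt:P1566 ?geoname . }
>>>>     LIMIT 1000 OFFSET %d
>>>>     """
>>>>
>>>>     offset = 0
>>>>     while True:
>>>>         resp = requests.get(
>>>>             ENDPOINT,
>>>>             params={"query": TEMPLATE % offset},
>>>>             headers={"Accept": "text/csv"},
>>>>         )
>>>>         resp.raise_for_status()
>>>>         rows = resp.text.splitlines()[1:]  # drop the CSV header
>>>>         if not rows:
>>>>             break  # no more results
>>>>         for row in rows:
>>>>             print(row)
>>>>         offset += 1000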
>>>>
>>>> Another way to get results is to iterate over:
>>>> <https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
>>>>
>>>> That is a standard MediaWiki API query; you will also find this in the
>>>> pagelinks table. Either way, you should check every page you get
>>>> afterwards (by retrieving its contents), as the results could include
>>>> false positives or be behind on updates.
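>>>>
>>>> With the standard API, that iteration would be along these lines (a
>>>> sketch; as said, every page you get still needs its content checked):
>>>>
>>>>     import requests
>>>>
>>>>     API = "https://www.wikidata.org/w/api.php"
>>>>     params = {
>>>>         "action": "query",
>>>>         "list": "backlinks",      # the API side of WhatLinksHere
>>>>         "bltitle": "Property:P1566",
>>>>         "blnamespace": 0,         # items live in the main namespace
>>>>         "bllimit": "max",
>>>>         "format": "json",
>>>>     }
>>>>     while True:
>>>>         data = requests.get(API, params=params).json()
>>>>         for link in data["query"]["backlinks"]:
>>>>             print(link["pageid"], link["title"], sep="\t")
>>>>         if "continue" not in data:
>>>>             break
>>>>         params.update(data["continue"])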
>>>>
>>>> On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.h...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I need help writing a query that I would like to run on Wikimedia
>>>>> Cloud Services.
>>>>> The goal of the query is to retrieve the following information from
>>>>> wikidatawiki_p:
>>>>>
>>>>> * Find all pages that have a claim for the property P1566, for example
>>>>> see https://www.wikidata.org/wiki/Q2113430
>>>>> * Find out what is the value of their P1566 property (in this case,
>>>>> 18918)
>>>>>
>>>>> Output format should be like this:
>>>>>
>>>>> page_id    entity      property_value
>>>>> 2039804    Q2113430    18918
>>>>> ...
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Huji
>>>>>
>>>>
>>>> --
>>>> Jaime Crespo
>>>> <http://wikimedia.org>
>>>>
>>>
>>
>> --
>> Lucas Werkmeister
>> Software Developer (Intern)
>>
>> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
>> Phone: +49 (0)30 219 158 26-0
>> https://wikimedia.de
>>
>> Imagine a world, in which every single human being can freely share in
>> the sum of all knowledge. That's our commitment.
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
>> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
>> Körperschaften I Berlin, Steuernummer 27/029/42207.
>>
>
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
