Actually, never mind. I reviewed the Java code behind the Linked Data Fragments server and it doesn't support more than 100 items per page. It also gets slow when you request later pages (the first few pages are in a warm cache and are fast).
I think my best bet is to just download the latest JSON dump from https://www.wikidata.org/wiki/Wikidata:Database_download and parse it myself.

Thanks again!

Huji

On Wed, Mar 14, 2018 at 12:12 PM, Huji Lee <huji.h...@gmail.com> wrote:

> Lucas,
>
> No, I don't need the page_id. The other two are enough.
>
> Wikidata Query Service seems very slow (it'll take about one day of continuous querying to get all the data). The Linked Data Fragments server seems faster, but I wish I knew how to make it return more than 100 results at a time. Do you?
>
> Thanks,
>
> Huji
>
> On Wed, Mar 14, 2018 at 7:00 AM, Lucas Werkmeister <lucas.werkmeis...@wikimedia.de> wrote:
>
>> Huji, do you need the page_id in the query results? Otherwise, I would suggest using either the Wikidata Query Service, as Jaime suggested (though I’d omit the LIMIT and OFFSET – I think it’s better to let the server send you all the results at once) or the Linked Data Fragments server:
>>
>> https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP1566&object=
>>
>> (this URL will return HTML, RDF/XML, Turtle, JSON-LD, … depending on the Accept header).
>>
>> Cheers,
>> Lucas
>>
>> 2018-03-14 1:03 GMT+01:00 Huji Lee <huji.h...@gmail.com>:
>>
>>> Thanks, Jaime, for your recommendation.
>>>
>>> If I understand the result of [1] correctly, there are around 3.5 million pages with a GeoNames property specified on Wikidata. I'm sure some of them are redirects, or not cities, etc. But still, going through millions of pages through API calls of 1000 at a time is cumbersome and inefficient. (The example you gave takes 20 seconds to run; that would mean a total of 20 * 3.5 * 1000 seconds, which is about 19 hours, assuming no lag or errors.)
>>>
>>> However, what you suggested gave me an idea: I can take a look at the code for the API itself (I guess it is at [2]) and figure out how the query is written there, then try to write a similar query on my own. If I figure it out, I will report back here.
>>>
>>> Huji
>>>
>>> [1] https://quarry.wmflabs.org/query/25418
>>> [2] https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/client/includes/Api/ApiPropsEntityUsage.php
>>>
>>> On Tue, Mar 13, 2018 at 5:39 AM, Jaime Crespo <jcre...@wikimedia.org> wrote:
>>>
>>>> I am not 100% sure there is a perfect way to do what you want by querying the metadata databases (I assume that is what you mean by "query"). I don't think that data is metadata, but content itself, which is not in the metadata databases.
>>>>
>>>> Calling the Wikidata Query Service is probably what you want:
>>>>
>>>> <https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fgeoname%0AWHERE%20%7B%0A%09%3Fitem%20wdt%3AP1566%20%3Fgeoname%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D%0ALIMIT%201000%20OFFSET%201000>
>>>>
>>>> Note the LIMIT and OFFSET that will let you iterate over the dataset (a WHERE clause would be faster).
>>>>
>>>> There is also a way to get results by iterating over:
>>>>
>>>> <https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P1566&hidetrans=1&hideredirs=1>
>>>>
>>>> That is a standard MediaWiki API query. You will also find this in the pagelinks table, but you should check every page you get afterwards (by retrieving its contents), as it could include false positives or be behind on updates.
>>>>
>>>> On Sun, Mar 11, 2018 at 3:44 PM, Huji Lee <huji.h...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I need help writing a query that I would like to run on the Clouds. The goal of the query is to retrieve the following information from wikidatawiki_p:
>>>>>
>>>>> * Find all pages that have a claim for the property P1566, for example see https://www.wikidata.org/wiki/Q2113430
>>>>> * Find out the value of their P1566 property (in this case, 18918)
>>>>>
>>>>> Output format should be like this:
>>>>>
>>>>> page_id  entity    property_value
>>>>> 2039804  Q2113430  18918
>>>>> ...
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Huji
>>>>>
>>>>> _______________________________________________
>>>>> Wikimedia Cloud Services mailing list
>>>>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>>>>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>>>
>>>> --
>>>> Jaime Crespo
>>>> <http://wikimedia.org>
>>
>> --
>> Lucas Werkmeister
>> Software Developer (Intern)
>>
>> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
>> Phone: +49 (0)30 219 158 26-0
>> https://wikimedia.de
>>
>> Imagine a world, in which every single human being can freely share in the sum of all knowledge. That‘s our commitment.
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
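
For anyone following along, here is a minimal sketch of the dump-parsing approach mentioned at the top of this message. It assumes the JSON dump is one large array with a single entity per line (each line ending in a comma), and that GeoNames IDs sit under claims -> P1566 -> mainsnak -> datavalue -> value. The function name iter_property_values and the sample lines are made up for illustration; the page_id column is left out since it turned out not to be needed.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_property_values(
    lines: Iterable[str], prop: str = "P1566"
) -> Iterator[Tuple[str, str]]:
    """Yield (entity_id, value) pairs for every `prop` claim that has a value.

    Assumes one entity per line, wrapped in "[" and "]" lines, with a
    trailing comma after each entity (the assumed dump layout).
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim.get("mainsnak", {})
            # "novalue" / "somevalue" snaks carry no datavalue, so skip them
            if snak.get("snaktype") == "value":
                yield entity["id"], snak["datavalue"]["value"]


# Illustrative input only: the first entity uses the example from this
# thread, the other two are invented to exercise the skip paths.
SAMPLE_LINES = [
    "[",
    '{"id": "Q2113430", "claims": {"P1566": [{"mainsnak": {"snaktype": "value", '
    '"datavalue": {"value": "18918", "type": "string"}}}]}},',
    '{"id": "Q900000001", "claims": {}},',
    '{"id": "Q900000002", "claims": {"P1566": [{"mainsnak": {"snaktype": "novalue"}}]}}',
    "]",
]

rows = list(iter_property_values(SAMPLE_LINES))
for entity_id, value in rows:
    print(f"{entity_id}\t{value}")
```

To run this against a real dump, the same function should accept a file object opened in text mode, for example bz2.open(path, "rt", encoding="utf-8") from the standard library, where path points at the downloaded file (the exact file name depends on which dump you fetch). Reading line by line keeps memory use flat, which matters for a file of that size.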