On 11.07.22 22:37, Asaf Bartov wrote:
Yes, it sounds like the missing link here is a tool for creating the list of resources to offline.  Z made one particular specification, but I suppose it could be made a little more general, and potentially even leverage some existing general-purpose pageset curation tools, such as PetScan.  AFAIK, PetScan currently doesn't support the use-case of "get me N levels of pages linked from this first page (or from these P first pages)", but we can imagine (and advocate for) PetScan supporting it at some point.

Then, a PetScan query ID (which is enough to generate the page-set) can be an input to the Wikipedia-on-Demand tool, and the problem is solved. (Well, almost: we'd still need to specify the logic for collecting related resources -- i.e. none/some/all images included in the pages, Wikidata items, etc.)

At a high level, this is indeed the kind of challenge we face now. The lack of such a tool was identified at Kiwix a long time ago. I believe we are now ready to move forward on this because the underlying software pieces are in place.

The overall strategy is to extend wp1.openzim.org (API) so that sophisticated selection modules can be implemented on top of it. What these modules will look like in the end is still wide open, and all ideas are welcome (please open tickets at https://github.com/openzim/wp1). Collaborating with, or relying on, PetScan is an idea that should be assessed.
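
To make the "N levels of pages linked from these first pages" selection mentioned above a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how wp1 or PetScan work: the names select_pages, get_links and TOY_LINKS are hypothetical, and the link lookup is passed in as a plain function so the sketch does not assume any particular API. It does a breadth-first walk from the seed pages and stops following links once the depth limit is reached.

```python
from collections import deque
from typing import Callable, Iterable, List


def select_pages(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
) -> List[str]:
    """Return the seeds plus every page within max_depth link hops of a seed.

    get_links is a hypothetical callback returning the titles linked from a
    page; a real module would back it with whatever link source it chooses.
    """
    seen = set()
    order = []          # pages in the order they were discovered
    queue = deque()     # (title, depth) pairs still to expand

    for title in seeds:
        if title not in seen:
            seen.add(title)
            order.append(title)
            queue.append((title, 0))

    while queue:
        title, depth = queue.popleft()
        if depth >= max_depth:
            continue    # do not follow links beyond the requested level
        for linked in get_links(title):
            if linked not in seen:
                seen.add(linked)
                order.append(linked)
                queue.append((linked, depth + 1))

    return order


# Toy link graph standing in for real page links (made-up data).
TOY_LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A", "D"],
    "D": ["E"],
}


def toy_get_links(title: str) -> List[str]:
    return TOY_LINKS.get(title, [])


if __name__ == "__main__":
    print(select_pages(["A"], toy_get_links, max_depth=2))
```

The resulting list of titles is the "selection"; deciding which related resources (images, Wikidata items, etc.) to collect for each page would be a separate step.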

Once a selection is done, our Zimfarm infrastructure is ready to build the snapshots (ZIM files). We will probably have to build one or more dedicated frontends to bring these two tools together in a user-friendly manner.

This is the goal of the Wikipedia-on-Demand project (supported by a WMCH grant), https://meta.wikimedia.org/wiki/Kiwix/Wikipedia_on_demand, which we have started to work on. We have another project (related to the war in Ukraine) in the pipeline, which should extend this tool even further.

Kelson

--
Kiwix - Wikipedia Offline & more
* Web: https://kiwix.org/
* Twitter: https://twitter.com/KiwixOffline
* Wiki: https://wiki.kiwix.org/


