Re: [Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-11-08 Thread Amirouche Boubekki
Hello Emmanuel, I figured I will make it work: I will translate the .zim files into an SQLite database with an ad hoc Python script; that way I can add any metadata I need.
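For readers wondering what such a conversion might look like on the SQLite side, here is a minimal sketch. It is not the script mentioned in the message. The table layout, the column names and the helpers `create_store` and `store_entries` are all hypothetical, and the step that reads entries out of a .zim file is deliberately left out: the sketch assumes some reader already yields one dict per entry.

```python
import json
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) the SQLite database and its single table.

    The schema is an assumption made for illustration only.
    """
    db = sqlite3.connect(path)
    db.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            zim_path     TEXT PRIMARY KEY,  -- path of the entry inside the .zim file
            original_url TEXT,              -- source URL, when it can be recovered
            title        TEXT,
            content      BLOB,
            metadata     TEXT               -- any extra metadata, serialised as JSON
        )
        """
    )
    return db


def store_entries(db, entries):
    """Insert entries and return how many rows were written.

    Each entry is a dict assumed to come from a ZIM reader (not shown here).
    """
    rows = [
        (
            entry["zim_path"],
            entry.get("original_url"),
            entry.get("title"),
            entry["content"],
            json.dumps(entry.get("metadata", {}), sort_keys=True),
        )
        for entry in entries
    ]
    with db:  # commit on success, roll back on error
        db.executemany(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Keeping the extra metadata in a JSON text column means new fields, such as the original URL of a document, can be added later without changing the schema.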

Re: [Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-11-08 Thread Amirouche Boubekki
Hello Samuel, On Tue, Nov 3, 2020 at 00:05, Samuel Klein wrote: > > Hi Amirouche -- is this for an offline search? Would love to read more about > it. > The primary use case is not offline search. I am working on a description of how it works, but it is not ready yet. That said with the "ca

Re: [Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-11-03 Thread Emmanuel Engelhart
Hi Amirouche On 01.11.20 08:52, Amirouche Boubekki wrote: > I am working on a search engine (unlike sphinx or elastic search, more > like bing or google), I was planning to use .zim files to feed the > index, the problem is there is no systematic way to find the original > URL of the documents. Y

Re: [Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-11-02 Thread Samuel Klein
Hi Amirouche -- is this for an offline search? Would love to read more about it. On Sun, Nov 1, 2020 at 6:36 AM Amirouche Boubekki < amirouche.boube...@gmail.com> wrote: > Hello, > > > I am working on a search engine (unlike sphinx or elastic search, more > like bing or google), I was planning t

Re: [Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-11-01 Thread Amirouche Boubekki
Hello, I am working on a search engine (unlike Sphinx or Elasticsearch, more like Bing or Google). I was planning to use .zim files to feed the index; the problem is that there is no systematic way to find the original URL of the documents. I am wondering whether one of the following will be possibl

Re: [Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-10-29 Thread Asaf Bartov
oh neat! That does indeed open all kinds of interesting possibilities! :) A.

[Offline-l] [NEW] first releases of openZIM tool warc2zim

2020-10-29 Thread Emmanuel Engelhart
Hi, I'm very proud to announce the release of our new tool: warc2zim. Warc2zim is a command-line tool for GNU/Linux and macOS that converts a WARC file to a ZIM file. Since WARC is a widely used storage format in the web archiving world, warc2zim offers new opportunities to reuse WARC stored data