Re: [Wikisource-l] The very first result of IA _abbyy.gz parsing & bot uploading into nsPage

Anika Born Mon, 16 Oct 2017 11:09:50 -0700

Like Aubrey: thank you very much!

I shared this news at the Scriptorium of de.ws.


I also used the opportunity to inform them about your "Visualizzatore".
This is so cool!!!! (especially the search-function)

And because I had some time (and the best things come in threes) I invited
them to your it.WikiCon in Trento (
https://meta.wikimedia.org/wiki/ItWikiCon/2017/Proposte#Wikisource). Have
fun there! My best wishes to the organizers. I co-organized it three times
in a row for the whole German-speaking community....

https://de.wikisource.org/wiki/Wikisource:Skriptorium#Italien:_17._bis_19._November_WikiCon_in_Trient



Anika

2017-10-16 19:35 GMT+02:00 Andrea Zanni <[email protected]>:

> Thanks Alex!
> I really hope this is a direction that other developers will follow:
> being able to harness the full potential of structured data from OCR
> software is absolutely crucial for Wikisource.
> We could actually automate *a lot* of the formatting work now done by
> volunteers, and their time could still be spent formatting, proofreading
> and validating, but with much more power than before.
> IMO, it changes a lot if a book is formatted ~50% by a machine: we could
> do many more books in less time.
> Go Alex!
>
> Aubrey
>
> On Mon, Oct 16, 2017 at 5:42 PM, Asaf Bartov <[email protected]>
> wrote:
>
>> That's really promising!
>>
>> Thank you for sharing this.
>>
>>    A.
>>
>> On Oct 17, 2017 00:11, "Alex Brollo" <[email protected]> wrote:
>>
>>> Here:
>>> Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
>>> <https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46>
>>> and in the immediately preceding and following pages, both the text and
>>> some formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz
>>> <https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz>
>>> (in the preceding pages only some templates have been added and a little
>>> bit of regex manipulation has been done).
>>>
>>> Internet Archive _abbyy.gz files are gzipped, enormous XML files in which
>>> every detail of the FineReader OCR output is exported. Even though they are
>>> enormous and terribly complex, they can be parsed, and any detail can be
>>> used (a little bit painfully...). So far only bold, italic, small caps and
>>> paragraphs have been explored and translated into wiki code by some pretty
>>> simple Python code.
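
For readers who want to try the approach described above, here is a minimal
sketch of such a parser. It is not Alex's actual code. It assumes the usual
FineReader XML nesting (page > block > text > par > line > formatting >
charParams) and that the bold, italic and smallcaps attributes on
<formatting> are written as "true" or "1"; check both against a real
_abbyy.gz file. The function names and the {{Sc|...}} small-caps template
are illustrative choices.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}par' -> 'par'."""
    return tag.rsplit("}", 1)[-1]


def is_true(value):
    # Assumption: booleans are serialized as "true" or "1".
    return value in ("1", "true")


def run_to_wiki(fmt):
    """Turn one <formatting> run into wikitext."""
    text = "".join(c.text or "" for c in fmt if local(c.tag) == "charParams")
    core = text.strip()
    if not core:
        return text
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if is_true(fmt.get("smallcaps")):
        core = "{{Sc|" + core + "}}"  # hypothetical template name
    if is_true(fmt.get("italic")):
        core = "''" + core + "''"
    if is_true(fmt.get("bold")):
        core = "'''" + core + "'''"
    return lead + core + trail


def page_to_wiki(page):
    """Join the paragraphs of one <page>, separated by blank lines."""
    paragraphs = []
    for par in page.iter():
        if local(par.tag) != "par":
            continue
        lines = []
        for line in par:
            if local(line.tag) != "line":
                continue
            runs = [run_to_wiki(f) for f in line if local(f.tag) == "formatting"]
            lines.append("".join(runs).strip())
        text = " ".join(l for l in lines if l)
        if text:
            paragraphs.append(text)
    return "\n\n".join(paragraphs)


def iter_wiki_pages(source):
    """Yield wikitext page by page from a gzipped ABBYY XML file.

    `source` is a filename or a binary file object. iterparse plus
    elem.clear() avoids holding the whole (enormous) tree in memory.
    """
    with gzip.open(source, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local(elem.tag) == "page":
                yield page_to_wiki(elem)
                elem.clear()


def _chars(s):
    # Build one <charParams> element per character, as ABBYY does.
    return "".join(f"<charParams>{c}</charParams>" for c in s)


# Tiny hand-made sample; real files carry many more attributes.
SAMPLE = (
    '<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">'
    '<page><block blockType="Text"><text>'
    "<par><line>"
    f'<formatting bold="true">{_chars("Abri")}</formatting>'
    f'<formatting>{_chars(", ")}</formatting>'
    f'<formatting italic="true">{_chars("s. m.")}</formatting>'
    "</line></par>"
    "<par><line>"
    f'<formatting smallcaps="true">{_chars("Soldat")}</formatting>'
    "</line><line>"
    f'<formatting>{_chars("de garde")}</formatting>'
    "</line></par>"
    "</text></block></page></document>"
)

if __name__ == "__main__":
    data = gzip.compress(SAMPLE.encode("utf-8"))
    for wikitext in iter_wiki_pages(io.BytesIO(data)):
        print(wikitext)
```

On a real file one would pass the path of the downloaded _abbyy.gz instead
of the in-memory sample. Rejoining words hyphenated across lines and the
bot upload into nsPage are left out of this sketch.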
>>>
>>> Alex
>>>
>>>
>>>
>>> _______________________________________________
>>> Wikisource-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>>>
>>>
