Hi Arnaud,
What do you think about moving bible-scraper from the GitHub repo to our GitLab group? I have already imported it there, but without the latest commits. I have made you a developer on it: https://gitlab.com/crosswire-bible-society/bible-scraper

On 2 June 2024 at 11:46, Arnaud Vié wrote:
Thank you both for your interest!

> What about commentary?
> https://www.awmi.net/reading/online-bible-commentary/

Not yet, I'm really focusing on Bibles for the time being; that's a lot of work already! But nothing prevents adapting the solution to commentaries in the future. I'll keep that idea in mind :-)

> If you want to use CzeBKR as your test case, I am ready to help
> you with any testing or Czech issues or whatever

Thanks a lot!
I've just pushed a scraper configuration for this Bible:
https://github.com/UnasZole/bible-scraper/blob/master/src/main/resources/scrapers/GenericHtml/KralickaWikisource.yaml

The main books were easy to parse; the deuterocanonical books, extracted from a different manuscript, were a bit messier. I made a few assumptions (I interpret italics in verses as translation additions, side notes in deuterocanonical books as section titles, etc.).

Feel free to test it: after checking out and building the repository, you should just need to run, for example:

> ./run.sh scrape -s GenericHtml -i KralickaWikisource -b Ps -c 1 -w USFM

Cheers,

Arnaud

On Sun, 2 June 2024 at 08:50, Matěj Cepl <mc...@cepl.eu> wrote:

    On Sun Jun 2, 2024 at 1:09 AM CEST, Arnaud Vié wrote:
    > I'm open to any kind of feedback or suggestions of course!
    > In particular:
    >
    >    - if you have any specific website in mind that you would like to be
    >    able to build SWORD modules from, let me know, we can try to add it.
    >    (Currently I only included a few French websites, but I'm interested
    >    in adding some other languages).

    The SWORD module CzeBKR is sourced from the Czech WikiSource [1],
    and there seems to be an official way [2] to get the source
    in some hopefully more useful formats (plain text, RTF, HTML,
    EPUB). I was using my own home-grown Python script [3], but it
    seems that, like all web-scraping scripts, it has rotted away (that
    script is under some kind of very free open source license,
    let's say MIT/X11 … I am going to add the proper LICENSE file
    shortly). It started from [4] (look at the source view), but that
    doesn't seem to be very useful anymore.

    >    - And if you are knowledgeable about the intellectual property laws
    >    in other countries, I'm interested: currently, I've added a section
    >    to the README explaining why the usage of the scraper on any public
    >    website is allowed in France, with references to the related texts,
    >    but it would probably be useful to have similar information for
    >    users from other countries.

    I am absolutely certain there are no problems with CzeBKR:

        1. It is WikiSource, so we have somebody else to blame ;)
        2. The original Bible of Kralice [5] is from the sixteenth
           century and it is absolutely in the public domain.
        3. The source for the WikiSource text was a scan [6] of the
           book from 1918, with no authors shown. The works of the
           only possible editor of that Bible I know about [7] (he is
           not shown on the title page, but he was working in the
           early 20th century with the International Bible Society on
           the revision of the Bible) are in the public domain as well
           under the Berne Convention (death in 1929 + 75 years).
        4. We are in the EU as well.

    If you want to use CzeBKR as your test case, I am ready to help
    you with any testing or Czech issues or whatever.

    Blessed Sunday!

    Matěj

    [1] https://cs.wikisource.org/wiki/Bible_kralick%C3%A1_(1918)
    [2] https://ws-export.wmcloud.org/?lang=cs&title=Bible_kralick%C3%A1_%281918%29
    [3] https://gitlab.com/crosswire-bible-society/CzeBKR/-/blob/master/kralicka.py
    [4] https://cs.wikisource.org/wiki/Speci%C3%A1ln%C3%AD:Exportovat_str%C3%A1nky/Bible_kralick%C3%A1_(1918)
    [5] https://en.wikipedia.org/wiki/Bible_of_Kralice
    [6] http://archive.org/details/biblsvatanebvec00socigoog
    [7] https://cs.wikipedia.org/wiki/Jan_Karafi%C3%A1t
-- http://matej.ceplovi.cz/blog/, @mcepl@floss.social
    GPG Finger: 3C76 A027 CA45 AD70 98B5  BC1D 7920 5802 880B C9D8

    The ratio of literacy to illiteracy is a constant, but nowadays
    the illiterates can read.
        -- Alberto Moravia

    _______________________________________________
    sword-devel mailing list: sword-devel@crosswire.org
    http://crosswire.org/mailman/listinfo/sword-devel
    Instructions to unsubscribe/change your settings at above page


--
Do you love the Bible? Are you a theology student? Use the free application Xiphos <https://xiphos.org/> or Andbible <https://andbible.github.io/> to access source texts, commentaries, dictionaries and many other features... Contact me for French translations.