Hi Arnaud,
What do you think about moving bible-scraper from the GitHub repo to our
GitLab repo? I already did it, but not with the latest commits. I've made
you a developer on it.
https://gitlab.com/crosswire-bible-society/bible-scraper
On 02/06/2024 at 11:46, Arnaud Vié wrote:
Thank you both for your interest!
> What about commentary?
> https://www.awmi.net/reading/online-bible-commentary/
Not yet; I'm focusing on Bibles for the time being, and that's a
lot of work already!
But nothing prevents adapting the solution to commentaries in the
future; I'll keep that idea in mind :-)
> If you want to use CzeBKR as your test case, I am ready to help
> you with any testing or Czech issues or whatever
Thanks a lot!
I've just pushed a scraper configuration for this Bible:
https://github.com/UnasZole/bible-scraper/blob/master/src/main/resources/scrapers/GenericHtml/KralickaWikisource.yaml
The main books were easy to parse; the deuterocanonical books, extracted
from a different manuscript, were a bit messier.
I made a few assumptions (I interpret italics within verses as translator
additions, side notes in the deuterocanonical books as section titles,
etc.).
Feel free to test it: after checking out and building the repository,
you should only need to run, for example:
> ./run.sh scrape -s GenericHtml -i KralickaWikisource -b Ps -c 1 -w USFM
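If you want to try several chapters in one go, a minimal shell sketch could loop over chapter numbers and reuse the flags from the command above. The function name `scrape_cmds` is hypothetical (it is not part of bible-scraper), and it assumes `-c` accepts a single chapter number, as in the example. It only prints the commands, so they can be checked first and then run from the repository root, e.g. with `scrape_cmds Ps 1 3 | sh`.

```shell
# Hypothetical helper, not part of bible-scraper: print one scrape command
# per chapter, reusing the flags from the example command.
# Assumes -c takes a single chapter number.
scrape_cmds() {
  book="$1"     # book code, e.g. Ps
  first="$2"    # first chapter to scrape
  last="$3"     # last chapter to scrape (inclusive)
  chapter="$first"
  while [ "$chapter" -le "$last" ]; do
    echo "./run.sh scrape -s GenericHtml -i KralickaWikisource -b $book -c $chapter -w USFM"
    chapter=$((chapter + 1))
  done
}
```

Printing instead of executing keeps the sketch harmless: nothing runs until you pipe the output to a shell.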
Cheers,
Arnaud
On Sun, 2 June 2024 at 08:50, Matěj Cepl <mc...@cepl.eu> wrote:
On Sun Jun 2, 2024 at 1:09 AM CEST, Arnaud Vié wrote:
> I'm open to any kind of feedback or suggestions of course !
> In particular :
>
> - if you have any specific website in mind that you would like to be
> able to build sword modules from, let me know, and we can try to add it.
> (Currently I have only included a few French websites, but I'm
> interested in adding other languages.)
The Sword module CzeBKR is sourced from the Czech WikiSource [1],
and there seems to be an official way [2] to get the source
in some hopefully more useful formats (plain text, RTF, HTML,
EPUB). I was using my own home-grown Python script [3], but, as
with all web-scraping scripts, it seems to have rotted away (that
script is under some kind of very permissive open source license,
let's say MIT/X11; I am going to add the proper LICENSE file
shortly). It started at [4] (look at the source view), but that
doesn't seem to be very useful anymore.
> - And if you are knowledgeable about intellectual property law in
> other countries, I'm interested: I've added a section to the README
> explaining why using the scraper on any public website is allowed in
> France, with references to the relevant legal texts, but it would
> probably be useful to have similar information for users from other
> countries.
I am absolutely certain there are no problems with CzeBKR:
1. It is WikiSource, so we have somebody else to blame ;)
2. The original Bible of Kralice [5] is from the sixteenth
century and it is absolutely in the public domain.
3. The source for the WikiSource edition was a scan [6] of the
   1918 printing, with no authors named. The only possible editor
   of that Bible I know of [7] (he is not named on the title page,
   but in the early 20th century he worked with the International
   Bible Society on the revision of the Bible) died in 1929; more
   than 75 years have passed since his death, so under the Bern
   Convention his works are in the public domain as well.
4. We are in the EU as well.
If you want to use CzeBKR as your test case, I am ready to help
you with any testing or Czech issues or whatever.
Blessed Sunday!
Matěj
[1] https://cs.wikisource.org/wiki/Bible_kralick%C3%A1_(1918)
[2] https://ws-export.wmcloud.org/?lang=cs&title=Bible_kralick%C3%A1_%281918%29
[3] https://gitlab.com/crosswire-bible-society/CzeBKR/-/blob/master/kralicka.py
[4] https://cs.wikisource.org/wiki/Speci%C3%A1ln%C3%AD:Exportovat_str%C3%A1nky/Bible_kralick%C3%A1_(1918)
[5] https://en.wikipedia.org/wiki/Bible_of_Kralice
[6] http://archive.org/details/biblsvatanebvec00socigoog
[7] https://cs.wikipedia.org/wiki/Jan_Karafi%C3%A1t
--
http://matej.ceplovi.cz/blog/, @mcepl@floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8
The ratio of literacy to illiteracy is a constant, but nowadays
the illiterates can read.
-- Alberto Moravia
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
--
Do you love the Bible? Are you a theology student? Use
the free application Xiphos <https://xiphos.org/> or Andbible
<https://andbible.github.io/> and access the source texts,
commentaries, dictionaries and many other features...
Contact me for translations into French.