Hello there,
If I may suggest a good harvester library, Catmandu may do the job pretty well. I haven't used its OAI module, but I have used Catmandu to harvest from a JSON source and transform the result into a UNIMARC file, with pretty good success so far.
It can export seamlessly to ISO 2709 or MARCXML.
https://metacpan.org/dist/Catmandu-OAI

Best,
Arthur

On 2022-11-22 15:57, Mike D. wrote:
Hey,
I'm really glad to see the OAI-PMH harvester debate going on for Koha. I think that if we choose a good, supported external harvester, we can save a lot of energy and resources when implementing the related functionality in the system.
Harvesting the records is only part of the story, and the easy part. Since the
result of harvesting is a lot of records, most of the time we can't avoid
post-processing and merging them with the records in the local database. For
example, you may need to update records from a source that holds millions of
records while the local database holds only hundreds of thousands, so only a
slice of that huge amount is relevant. If we design the processing workflow
badly, it will take unnecessarily long and burn valuable resources.
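
One way to picture the "only a slice is relevant" point is the sketch below. It is an illustration only, not anything from Koha or Catmandu: the function name `relevant_records`, the `identifier` field and the sample data are all assumptions. It streams the harvested records and keeps only those whose identifier already exists locally, so memory use follows the size of the local set rather than the size of the remote source.

```python
from typing import Dict, Iterable, Iterator, Set


def relevant_records(
    harvested: Iterable[Dict[str, str]],
    local_ids: Set[str],
    id_field: str = "identifier",  # assumed field name, purely illustrative
) -> Iterator[Dict[str, str]]:
    """Yield only the harvested records whose identifier exists locally.

    The harvested stream is consumed lazily, one record at a time.
    """
    for record in harvested:
        if record.get(id_field) in local_ids:
            yield record


# Hypothetical data: a large remote source and a much smaller local catalogue.
remote = ({"identifier": f"rec-{n}", "title": f"Title {n}"} for n in range(100_000))
local = {"rec-10", "rec-50000", "rec-not-in-remote-source"}

matches = list(relevant_records(remote, local))
print([m["identifier"] for m in matches])
```

A set lookup per record keeps the filtering step linear in the number of harvested records, which is the part that grows with the remote source.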
I would like to invite everyone to stay in touch, to debate and to share our
experiences. Let's get this area moving towards a successful finish.

Take care.

Michal

On Tue, 22 Nov 2022 at 15:13, BOUIS Sonia <[email protected]> wrote:

Hi,
Thanks to David, Tomas, Michal and Michael for your replies.

So we have decided to evaluate several external OAI-PMH clients that could
be used by Koha and to choose one at the end of January.
There is a lot to do after that. We discussed the background jobs, and cronjobs seem to be appropriate. We thought that the settings in the
Koha intranet should only define URLs, sets, or XSLT stylesheets (for
example, to transform DC XML into MARCXML).

We are only at the beginning of the process 😊

Kind regards,
Sonia

------------------------------

Message: 2
Date: Wed, 26 Oct 2022 10:37:49 +1100
From: "David Cook" <[email protected]>
To: "'Tomas Cohen Arazi'" <[email protected]>, "'BOUIS Sonia'"
        <[email protected]>
Cc: "'koha'" <[email protected]>, "'koha-devel'"
        <[email protected]>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Hi Sonia,



I’m excited to hear that KohaLA would like to finance an OAI-PMH client in Koha! This functionality is always brewing in the back of my mind, since I
first raised 10662 back in 2013.



As Tomas says, I think that the background jobs are a key component for
processing incoming OAI-PMH records.



However, the ***missing component right now is the scheduling of the
OAI-PMH harvesting tasks***, and I think this is where opinions get
divided. Below, I’ll provide some history and opinions on Koha OAI-PMH.



--



With 10662, the sponsored goal was for Koha library staff to schedule
OAI-PMH harvests through the Web UI. However, Fridolin from BibLibre raised a point with me at Kohacon18 about how letting library staff control the timing of harvesting tasks could be a problem for support vendors. If too many libraries using the same public IP address tried to harvest from the same OAI-PMH repository, they could be rate limited or blocked. There could
also be server load concerns. So there probably needs to be a balance
between user configuration and system configuration. If I recall correctly, this is how DSpace’s OAI-PMH harvester works. Users set up targets and can start/stop harvests, but things like frequency and concurrency are handled
by the system configuration.
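
As a rough illustration of keeping frequency under system control, the sketch below enforces a minimum delay between requests to the same repository. It is an assumption-laden example, not how DSpace or Koha does it: the class name `RepositoryThrottle`, the 30-second interval and the repository URL are all hypothetical.

```python
import time
from typing import Callable, Dict


class RepositoryThrottle:
    """Enforce a minimum delay between requests to the same repository."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval  # seconds, set by the sysadmin
        self._clock = clock               # injectable so tests need no sleeping
        self._last_request: Dict[str, float] = {}

    def seconds_until_allowed(self, repository_url: str) -> float:
        """Return how long a caller must still wait (0.0 means go ahead)."""
        last = self._last_request.get(repository_url)
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (self._clock() - last))

    def record_request(self, repository_url: str) -> None:
        """Remember that a request to this repository was just sent."""
        self._last_request[repository_url] = self._clock()


# Hypothetical usage: one throttle shared by every instance on the server.
throttle = RepositoryThrottle(min_interval=30.0)
url = "https://repository.example.org/oai"  # placeholder URL
if throttle.seconds_until_allowed(url) == 0.0:
    throttle.record_request(url)
```

Sharing one such object across all instances behind the same public IP address is what would address the rate-limiting concern; per-instance throttles would not.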



Based on my experience working on OAI-PMH on and off for nearly 10 years and as a Koha support vendor, I think my preference would be for sysadmins
to handle most of the OAI-PMH harvesting details.



The sponsorship for 10662 had certain requirements that many other
libraries might not have, which is what made me think that it might be
better to have an external client that connects to Koha. I thought maybe I
could get the ordinary requirements pushed into Koha, and then handle
extraordinary requirements externally. However, an external harvester won’t perform as fast as an internal harvester. (The compromise would be to write the harvester in such a way that people could provide different OAI-PMH
harvester Perl modules that all stage records using the same core Koha
modules.)



Even then… the scheduling would depend on a library’s needs. Back in 2013, I had a Koha OAI-PMH harvester which worked as a cronjob. It would run each night. However, some libraries want to run OAI-PMH harvests as frequently as every 3 seconds. A cronjob’s shortest interval is 60 seconds, so that
wouldn’t work for that requirement.



If a cronjob isn’t suitable, then I think you’d need a daemon created by a
new command like “koha-oai --start <instance_name>”. It could read a
configuration file and handle scheduling accordingly. With 10662, I used
the POE module, because I knew it well and it has some timer tools for
scheduling tasks. If I were to work on it again, I’d probably use
Mojo::IOLoop instead these days, since Mojolicious is already part of Koha
while POE is not. (That said, using modules like Mojo and POE is
difficult, because they’re hard to test using automation. That was one of the stumbling blocks with 10662. While the 10662 harvester worked very well, it was difficult to unit test. In hindsight, I should’ve written it in a way that was easier to unit test, but it had a lot of event-driven
code which made things more difficult.)
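
One common way to make this kind of scheduler easier to unit test is to keep the "which harvests are due?" decision in a plain function that takes the current time as an argument, and let the event loop only call it. The sketch below illustrates that idea in Python. It is not the 10662 code, and `HarvestTask` and `pop_due_tasks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HarvestTask:
    name: str
    interval: float        # seconds between runs
    next_run: float = 0.0  # timestamp of the next scheduled run


def pop_due_tasks(tasks: List[HarvestTask], now: float) -> List[HarvestTask]:
    """Return the tasks due at `now` and reschedule each one `interval` later."""
    due = [task for task in tasks if task.next_run <= now]
    for task in due:
        task.next_run = now + task.interval
    return due


# Hypothetical configuration: one repository polled every 3 seconds,
# another harvested once a day.
tasks = [HarvestTask("fast-repo", 3.0), HarvestTask("nightly-repo", 86400.0)]

# A daemon's timer callback would pass the real clock value here; a unit
# test can pass any number it likes, with no event loop and no sleeping.
for now in (0.0, 3.0, 4.0):
    print(now, [task.name for task in pop_due_tasks(tasks, now)])
```

With this split, the event-driven part shrinks to a timer that calls one function, and the scheduling rules themselves can be covered by ordinary tests.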



Another option would be to create a generic daemon for task scheduling in general (e.g. “koha-schedule”). Koha could use this for many things, but
it’s a project in itself.



--



Downloading OAI-PMH records and importing MARCXML into Koha is actually fairly straightforward. The difficulty is the
scheduling and management of tasks (and unit testing).
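
To show what the download side involves, here is a minimal sketch of OAI-PMH paging with ListRecords and resumptionToken, using only the Python standard library. The function names, the base URL and the `marcxml` metadata prefix are assumptions for illustration (repositories advertise their own prefixes), and the sketch does no networking or error handling.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None when done."""
    root = ET.fromstring(response_xml)
    token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # An absent or empty resumptionToken means the list is complete.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def list_records_url(
    base_url: str, metadata_prefix: str, token: Optional[str] = None
) -> str:
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: follow-up
    requests carry only the verb and the token.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical response fragment from a repository.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

token = next_resumption_token(sample)
print(list_records_url("https://repository.example.org/oai", "marcxml", token))
```

A harvester would repeat the request with each returned token until `next_resumption_token` returns None, staging the records from every page along the way.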



I don’t know the answer that will make everyone happy. There are lots of
different ways of managing and scheduling the tasks. Based on my
experience, I’d suggest targeting the simplest approach first, because
complexity will make it less likely for the project to succeed.



On that note, I’d be happy to test/QA any OAI-PMH harvester put forward. When I was writing OAI-PMH harvester patches, I found it really hard to get QA, so I’m happy to be that resource for someone else. I’ve spent a lot of
time thinking about this topic, so I’m happy to provide advice, warnings,
and emotional support 😉.



David Cook

Senior Software Engineer

Prosentient Systems

Suite 7.03

6a Glen St

Milsons Point NSW 2061

Australia



Office: 02 9212 0899

Online: 02 8005 0595



From: Koha-devel <[email protected]> On Behalf
Of Tomas Cohen Arazi
Sent: Wednesday, 26 October 2022 3:46 AM
To: BOUIS Sonia <[email protected]>
Cc: koha <[email protected]>; koha-devel <
[email protected]>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester



I think with background jobs we have most of the framework that is needed
to deal with this within Koha.



Best regards



On Tue, 25 Oct 2022 at 7:08, BOUIS Sonia <[email protected]> wrote:

Hi,
KohaLA would like to finance an OAI-PMH client in Koha, but we have
questions that we want to raise with the community.
There have already been attempts to propose an OAI-PMH client:
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 : an old project that doesn't seem compatible with the current version of Koha
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 : the scope is rather to use an external OAI-PMH client and to connect it to Koha

Our main question is about the way to handle this. Do you think it is a better idea to use external software or a Perl routine and find a way to connect it to Koha? Or would it be better to write a new module in Koha from
scratch, so that Koha has its own OAI-PMH client?

Please let us hear your thoughts about this project.

Kind regards

Sonia

Sonia BOUIS
------------------------------------------------------
Head of Library IT Services
Département d'Appui à la Recherche et aux Projets (DARP)
Bibliothèques universitaires
Université Jean Moulin Lyon 3
Street address > Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
Postal address > Bibliothèque de la Manufacture | 1C avenue des Frères Lumière | CS 78242 - 69372 LYON CEDEX 08

Direct line: 33 (0)4 78 78 79 03

http://bu.univ-lyon3.fr | Follow us > Facebook <https://www.facebook.com/bulyon3/> | Twitter <https://twitter.com/bulyon3> | Instagram <https://www.instagram.com/bu.lyon3/?hl=fr>

_______________________________________________

Koha mailing list  http://koha-community.org [email protected]
<mailto:[email protected]>
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha

------------------------------

Subject: Digest Footer

_______________________________________________
Koha-devel mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/ git :
https://git.koha-community.org/ bugs : https://bugs.koha-community.org/


------------------------------

End of Koha-devel Digest, Vol 203, Issue 15
*******************************************

--
Arthur Suzuki, 🌈🏔️
Développeur @BibLibre