Hey, I'm really glad to see the OAI-PMH harvester debate going on for Koha. I think that if we choose a good, well-supported external harvester, we can save a lot of energy and resources when implementing the related activities in the system. Harvesting the records is only part of the story, and the easy part. Harvesting produces a lot of records, so most of the time we can't avoid post-processing them and merging them with the records in the local database. For example, you may need to update records from a source that holds millions of records while the local database holds only hundreds of thousands. In that case only a slice of that huge amount is relevant. If we design the processing workflow wrong, it will take unnecessarily long and burn valuable resources. I would like to invite everyone to stay in touch, to debate and to share our experiences. Let's get this area moving towards a successful finish.
Take care.
Michal

On Tue, 22 Nov 2022 at 15:13, BOUIS Sonia <sonia.bo...@univ-lyon3.fr> wrote:
> Hi,
>
> Thanks to David, Tomas, Michal and Michael for your replies.
>
> So we have decided to evaluate several external OAI-PMH clients that could
> be used by Koha and to choose one by the end of January.
> There is a lot to do after that. We discussed background jobs and
> cronjobs, which seem to be appropriate. We thought that the settings in the
> Koha intranet should only define URLs, SETs, or XSLT sheets (for
> example, to transform DC XML into MARCXML).
>
> We are only at the beginning of the process 😊
>
> Kind regards,
> Sonia
>
> ------------------------------
>
> Message: 2
> Date: Wed, 26 Oct 2022 10:37:49 +1100
> From: "David Cook" <dc...@prosentient.com.au>
> To: "'Tomas Cohen Arazi'" <tomasco...@gmail.com>, "'BOUIS Sonia'"
>     <sonia.bo...@univ-lyon3.fr>
> Cc: "'koha'" <koha@lists.katipo.co.nz>, "'koha-devel'"
>     <koha-de...@lists.koha-community.org>
> Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
> Message-ID: <07af01d8e8ca$dfbddef0$9f399cd0$@prosentient.com.au>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Sonia,
>
> I’m excited to hear that KohaLA would like to finance an OAI-PMH client in
> Koha! This functionality has been brewing in the back of my mind ever since I
> first raised bug 10662 back in 2013.
>
> As Tomas says, I think that background jobs are a key component for
> processing incoming OAI-PMH records.
>
> However, *the missing component right now is the scheduling of the
> OAI-PMH harvesting tasks*, and I think this is where opinions get
> divided. Below, I’ll provide some history and opinions on Koha OAI-PMH.
>
> --
>
> With 10662, the sponsored goal was for Koha library staff to schedule
> OAI-PMH harvests through the Web UI. However, Fridolin from BibLibre raised
> a point with me at KohaCon18: letting library staff control the
> timing of harvesting tasks could be a problem for support vendors. If too
> many libraries sharing the same public IP address tried to harvest from the
> same OAI-PMH repository, they could be rate limited or blocked. There could
> also be server load concerns. So there probably needs to be a balance
> between user configuration and system configuration. If I recall correctly,
> this is how DSpace’s OAI-PMH harvester works: users set up targets and can
> start/stop harvests, but things like frequency and concurrency are handled
> by the system configuration.
>
> Based on my experience working on OAI-PMH on and off for nearly 10 years,
> and as a Koha support vendor, my preference would be for sysadmins
> to handle most of the OAI-PMH harvesting details.
>
> The sponsorship for 10662 had certain requirements that many other
> libraries might not have, which is what made me think that it might be
> better to have an external client that connects to Koha. I thought maybe I
> could get the ordinary requirements pushed into Koha, and then handle
> extraordinary requirements externally. However, an external harvester won’t
> perform as fast as an internal harvester. (The compromise would be to write
> the harvester in such a way that people could provide different OAI-PMH
> harvester Perl modules that all stage records using the same core Koha
> modules.)
>
> Even then… the scheduling would depend on a library’s needs. Back in 2013,
> I had a Koha OAI-PMH harvester which worked as a cronjob. It would run each
> night. However, some libraries want to run OAI-PMH harvests as frequently
> as every 3 seconds. A cronjob can run at most once every 60 seconds, so it
> wouldn’t work for that requirement.
>
> If a cronjob isn’t suitable, then I think you’d need a daemon started by a
> new command like “koha-oai --start <instance_name>”. It could read a
> configuration file and handle scheduling accordingly. With 10662, I used
> the POE module, because I knew it well and it has some timer tools for
> scheduling tasks. If I were to work on it again, I’d probably use
> Mojo::IOLoop instead these days, since Mojolicious is already part of Koha
> while POE is not. (That said, modules like Mojo and POE are tricky to use,
> because code built on them is difficult to test with automation. That was one
> of the stumbling blocks with 10662. While the 10662 harvester worked very
> well, it was difficult to unit test. In hindsight, I should’ve written it
> in a way that was easier to unit test, but it had a lot of event-driven
> code which made things more difficult.)
>
> Another option would be to create a generic daemon for task scheduling in
> general (e.g. “koha-schedule”). Koha could use this for many things, but
> it’s a project in itself.
>
> --
>
> Downloading OAI-PMH records and importing MARCXML into Koha
> is actually fairly straightforward. The difficulty is the task
> scheduling and management of tasks (and unit testing).
>
> I don’t know the answer that will make everyone happy. There are lots of
> different ways of managing and scheduling the tasks. Based on my
> experience, I’d suggest targeting the simplest approach first, because
> complexity will make it less likely for the project to succeed.
>
> On that note, I’d be happy to test/QA any OAI-PMH harvester put forward.
> When I was writing OAI-PMH harvester patches, I found it really hard to get
> QA, so I’m happy to be that resource for someone else. I’ve spent a lot of
> time thinking about this topic, so I’m happy to provide advice, warnings and
> emotional support 😉.
>
> David Cook
> Senior Software Engineer
> Prosentient Systems
> Suite 7.03
> 6a Glen St
> Milsons Point NSW 2061
> Australia
>
> Office: 02 9212 0899
> Online: 02 8005 0595
>
> From: Koha-devel <koha-devel-boun...@lists.koha-community.org> On Behalf
> Of Tomas Cohen Arazi
> Sent: Wednesday, 26 October 2022 3:46 AM
> To: BOUIS Sonia <sonia.bo...@univ-lyon3.fr>
> Cc: koha <koha@lists.katipo.co.nz>; koha-devel <koha-de...@lists.koha-community.org>
> Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
>
> I think that with background jobs we have most of the framework needed
> to deal with this within Koha.
>
> Best regards
>
> On Tue, 25 Oct 2022 7:08, BOUIS Sonia <sonia.bo...@univ-lyon3.fr> wrote:
>
> Hi,
> KohaLA would like to finance an OAI-PMH client in Koha, but we have
> questions that we want to raise with the community.
> There have already been attempts to propose an OAI-PMH client:
> - https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 : an
>   old project that doesn't seem compatible with the current version of Koha
> - https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 : the
>   scope is rather to use an external OAI-PMH client and connect it to Koha
>
> Our main question is how to handle this. Do you think it's a better
> idea to use external software or a Perl routine and find a way to
> connect it to Koha? Or would it be better to write a new module in Koha
> from scratch, so that Koha has its own OAI-PMH client?
>
> Please let us hear your thoughts about this project.
>
> Kind regards
>
> Sonia
>
> Sonia BOUIS
> ------------------------------------------------------
> Head of the Library Information Systems Service, Research and Projects
> Support Department (DARP), University Libraries, Université Jean Moulin Lyon 3
> Street address: Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
> Postal address: Bibliothèque de la Manufacture | 1C avenue des Frères
> Lumière | CS 78242 - 69372 LYON CEDEX 08
>
> Direct line: 33 (0)4 78 78 79 03
>
> http://bu.univ-lyon3.fr | Follow us: Facebook <https://www.facebook.com/bulyon3/>
> | Twitter <https://twitter.com/bulyon3> | Instagram <https://www.instagram.com/bu.lyon3/?hl=fr>
>
> _______________________________________________
>
> Koha mailing list http://koha-community.org
> Koha@lists.katipo.co.nz
> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Koha-devel mailing list
> koha-de...@lists.koha-community.org
> https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
> website : https://www.koha-community.org/
> git : https://git.koha-community.org/
> bugs : https://bugs.koha-community.org/
>
> ------------------------------
>
> End of Koha-devel Digest, Vol 203, Issue 15
> *******************************************
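David notes in the quoted thread that downloading OAI-PMH records is the straightforward part. The download side can be illustrated with a minimal Python sketch under stated assumptions. It follows the OAI-PMH 2.0 `ListRecords` flow: the first request carries `metadataPrefix` and an optional `set` (the URL and SET that Sonia suggests configuring in the intranet), and later requests carry only the `resumptionToken`. An empty token marks the end of the list. The names `list_records_url`, `harvest`, `http_get` and the injectable `fetch` parameter are hypothetical. `fetch` is passed in so the paging logic can be unit tested without a network. The default prefix `oai_dc` is used because every OAI-PMH repository must support it. Transforming and staging the records into Koha is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for one page of the harvest."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def http_get(url: str) -> bytes:
    """Default fetcher: plain HTTP GET with a timeout."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def harvest(base_url: str, metadata_prefix: str = "oai_dc",
            set_spec: Optional[str] = None,
            fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page."""
    token: Optional[str] = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        root = ET.fromstring(fetch(url))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result is not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        token = token_el.text.strip() if token_el is not None and token_el.text else None
        if not token:
            return  # missing or empty token: this was the last page
```

Records whose header carries `status="deleted"` have no metadata element and would need separate handling. The same applies to flow control, such as an HTTP 503 response with `Retry-After`.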
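David's scheduling concerns in the quoted thread can be made concrete with a Python sketch. Those concerns are: cron's 60-second granularity versus harvests every 3 seconds; sysadmins controlling frequency and concurrency, as in DSpace; too many harvests hitting one repository; and event-driven code being hard to unit test. One way to address the last point is to keep the scheduling decision in a pure function that an event loop merely calls on each tick. The names `Target` and `due_targets` are hypothetical. The "at most one running harvest per repository host" rule and the global `max_concurrent` cap are assumed policies reflecting Fridolin's rate-limiting concern, not settings from any existing Koha patch.

```python
from dataclasses import dataclass
from typing import List, Set
from urllib.parse import urlsplit


@dataclass
class Target:
    """A harvest target: staff might set name/base_url, sysadmins the interval."""
    name: str
    base_url: str
    interval_s: float                  # minimum seconds between harvest starts
    last_run: float = float("-inf")    # epoch seconds when the last harvest started


def due_targets(targets: List[Target], now: float, running: Set[str],
                max_concurrent: int) -> List[Target]:
    """Decide which targets to start on this tick.

    Pure function: no clock, no I/O, no event loop, so it can be unit tested
    by passing `now` and the names of currently `running` targets directly.
    """
    # Hosts already being harvested: allow at most one harvest per repository host.
    busy_hosts = {urlsplit(t.base_url).hostname for t in targets if t.name in running}
    slots = max_concurrent - len(running)
    chosen: List[Target] = []
    # Most overdue first (earliest next-due time).
    for t in sorted(targets, key=lambda t: t.last_run + t.interval_s):
        if slots <= 0:
            break
        host = urlsplit(t.base_url).hostname
        if t.name in running or host in busy_hosts:
            continue
        if now - t.last_run < t.interval_s:
            continue  # not due yet
        chosen.append(t)
        busy_hosts.add(host)
        slots -= 1
    return chosen
```

A daemon, such as the `koha-oai --start <instance_name>` command David floats, would only need a thin timer around this function. In Perl that could be a `Mojo::IOLoop->recurring` timer. On each tick the timer would call the function, start the chosen harvests and record their `last_run`. Because the tick can be as short as the daemon likes, a 3-second interval is possible where cron is not.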
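David's suggested compromise in the quoted thread was pluggable harvester modules that all stage records through the same core Koha code. A minimal Python sketch of that shape, under stated assumptions: any harvester implementation only has to yield records (assumed here to be MARCXML strings), and a single shared staging callable consumes them. `Harvester`, `records`, `stage_all` and `StaticHarvester` are hypothetical names. The `stage` callable merely stands in for Koha's real staging modules, which are not shown.

```python
from typing import Callable, Iterable, Iterator, Protocol


class Harvester(Protocol):
    """Anything that can produce harvested records as MARCXML strings."""

    def records(self) -> Iterator[str]:
        ...


def stage_all(harvester: Harvester, stage: Callable[[str], None]) -> int:
    """Send every record from any harvester through one shared staging step."""
    count = 0
    for marcxml in harvester.records():
        stage(marcxml)  # the shared "core" step that every plug-in reuses
        count += 1
    return count


class StaticHarvester:
    """Toy implementation that replays fixed records, e.g. in unit tests."""

    def __init__(self, records: Iterable[str]) -> None:
        self._records = list(records)

    def records(self) -> Iterator[str]:
        return iter(self._records)
```

An internal harvester, a wrapper around an external client, or a library-specific variant would each implement `records()` differently. Deduplication, matching and import rules would then live in one place.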