Hi,
Thanks to David, Tomas, Michal and Michael for your replies.
So we have decided to evaluate several external OAI-PMH clients that
could be used by Koha, and to choose one at the end of January.
There is a lot to do after that. We discussed the background jobs, and
cronjobs seem to be appropriate. We thought that the settings in the
Koha intranet should only define URLs, sets, or XSLT sheets (for
example, to transform DC XML into MARCXML).
We are only at the beginning of the process 😊
Kind regards,
Sonia
------------------------------
Message: 2
Date: Wed, 26 Oct 2022 10:37:49 +1100
From: "David Cook" <[email protected]>
To: "'Tomas Cohen Arazi'" <[email protected]>, "'BOUIS Sonia'"
<[email protected]>
Cc: "'koha'" <[email protected]>, "'koha-devel'"
<[email protected]>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
Hi Sonia,
I’m excited to hear that KohaLA would like to finance an OAI-PMH client
in Koha! This functionality has been brewing in the back of my mind ever
since I first raised bug 10662 back in 2013.
As Tomas says, I think that the background jobs are a key component for
processing incoming OAI-PMH records.
However, the ***missing component right now is the scheduling of the
OAI-PMH harvesting tasks***, and I think this is where opinions get
divided. Below, I’ll provide some history and opinions on Koha OAI-PMH.
--
With 10662, the sponsored goal was for Koha library staff to schedule
OAI-PMH harvests through the Web UI. However, Fridolin from BibLibre
raised a point with me at Kohacon18 about how letting library staff
control the timing of harvesting tasks could be a problem for support
vendors. If too many libraries using the same public IP address tried to
harvest from the same OAI-PMH repository, they could be rate limited or
blocked. There could also be server load concerns. So there probably
needs to be a balance between user configuration and system
configuration. If I recall correctly, this is how DSpace’s OAI-PMH
harvester works. Users set up targets and can start/stop harvests, but
things like frequency and concurrency are handled by the system
configuration.
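
To make the user/system split above concrete, here is a minimal sketch. It is in Python purely for illustration (Koha itself is Perl), and every class, field, and function name is hypothetical. It does not reflect DSpace's or Koha's actual configuration. The idea is only that staff describe targets, while the sysadmin's limits cap frequency and concurrency.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SystemLimits:
    """Set by the sysadmin (hypothetical fields)."""
    min_interval_seconds: int      # floor on how often any target is harvested
    max_concurrent_harvests: int   # cap on harvests running at the same time


@dataclass(frozen=True)
class HarvestTarget:
    """Set by library staff in the UI (hypothetical fields)."""
    base_url: str
    set_spec: str
    requested_interval_seconds: int
    enabled: bool = True


def effective_interval(target: HarvestTarget, limits: SystemLimits) -> int:
    """Staff may ask for any interval, but never get less than the system floor."""
    return max(target.requested_interval_seconds, limits.min_interval_seconds)


def runnable_targets(targets: List[HarvestTarget], limits: SystemLimits,
                     running: int) -> List[HarvestTarget]:
    """Return the enabled targets that may start now, given the concurrency cap."""
    free_slots = max(limits.max_concurrent_harvests - running, 0)
    enabled = [t for t in targets if t.enabled]
    return enabled[:free_slots]
```

With a system floor of 300 seconds, a target whose staff asked for a 3-second interval would still be harvested at most every 300 seconds, which addresses the rate-limiting concern for vendors sharing one public IP address.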
Based on my experience working on OAI-PMH on and off for nearly 10 years
and as a Koha support vendor, I think my preference would be for
sysadmins to handle most of the OAI-PMH harvesting details.
The sponsorship for 10662 had certain requirements that many other
libraries might not have, which is what made me think that it might be
better to have an external client that connects to Koha. I thought maybe
I could get the ordinary requirements pushed into Koha, and then handle
extraordinary requirements externally. However, an external harvester
won’t perform as fast as an internal harvester. (The compromise would be
to write the harvester in such a way that people could provide different
OAI-PMH harvester Perl modules that all stage records using the same
core Koha modules.)
Even then… the scheduling would depend on a library’s needs. Back in
2013, I had a Koha OAI-PMH harvester which worked as a cronjob. It would
run each night. However, some libraries want to run OAI-PMH harvests as
frequently as every 3 seconds. A cronjob’s shortest interval is 60
seconds, so it wouldn’t work for that requirement.
If a cronjob isn’t suitable, then I think you’d need a daemon created by
a new command like “koha-oai --start <instance_name>”. It could read a
configuration file and handle scheduling accordingly. With 10662, I used
the POE module, because I knew it well and it has some timer tools for
scheduling tasks. If I were to work on it again, I’d probably use
Mojo::IOLoop instead these days, since Mojolicious is already part of
Koha while POE is not. (That said, working with modules like Mojo and
POE is tricky, because they’re difficult to test using automation. That
was one of the stumbling blocks with 10662. While the 10662 harvester
worked very well, it was difficult to unit test. In hindsight, I
should’ve written it in a way that was easier to unit test, but it had a
lot of event-driven code which made things more difficult.)
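
One way to keep a sub-minute scheduler unit-testable is to separate the timing logic from the real clock. The following is only a sketch of that idea, in Python for illustration. It is not the 10662 code, and it uses neither POE nor Mojo::IOLoop. The names `IntervalScheduler`, `run_due`, and `run_forever` are made up for this example.

```python
import heapq
import time
from typing import Callable, List, Tuple


class IntervalScheduler:
    """Runs tasks at fixed intervals; the clock is injectable for tests."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Heap entries: (due_time, sequence_number, interval, task).
        # The sequence number breaks ties so tasks are never compared.
        self._queue: List[Tuple[float, int, float, Callable[[], None]]] = []
        self._seq = 0

    def add(self, interval_seconds: float, task: Callable[[], None]) -> None:
        if interval_seconds <= 0:
            raise ValueError("interval must be positive")
        self._push(self._clock() + interval_seconds, interval_seconds, task)

    def _push(self, due: float, interval: float,
              task: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (due, self._seq, interval, task))
        self._seq += 1

    def run_due(self) -> int:
        """Run every task whose due time has passed; return how many ran."""
        now = self._clock()
        ran = 0
        while self._queue and self._queue[0][0] <= now:
            _, _, interval, task = heapq.heappop(self._queue)
            task()
            ran += 1
            # Reschedule relative to "now", so a late run does not pile up.
            self._push(now + interval, interval, task)
        return ran


def run_forever(scheduler: IntervalScheduler, tick_seconds: float = 0.5) -> None:
    """Daemon-style loop: the only part that touches real time."""
    while True:
        scheduler.run_due()
        time.sleep(tick_seconds)
```

A test can pass a fake clock (for example `lambda: now[0]`), advance it by hand, and assert how many times a task ran, without sleeping or starting an event loop. Only `run_forever` would need the real clock in a daemon.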
Another option would be to create a generic daemon for task scheduling
in general (e.g. “koha-schedule”). Koha could use this for many things,
but it’s a project in itself.
--
Downloading OAI-PMH records and importing MARCXML into Koha is actually
fairly straightforward. The difficulty is the scheduling and management
of tasks (and unit testing).
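
As a rough illustration of why the download side is the easy part, here is a sketch of paging through OAI-PMH ListRecords responses by following the resumptionToken. It is in Python for illustration only. `fetch_page` is a hypothetical callable standing in for the HTTP request, and staging the MARCXML into Koha is left out entirely.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means there are no more pages.
    if token_el is not None and token_el.text and token_el.text.strip():
        return identifiers, token_el.text.strip()
    return identifiers, None


def harvest_all(fetch_page: Callable[[Optional[str]], str]) -> List[str]:
    """Follow resumption tokens until the repository reports the last page.

    fetch_page(None) should return the first ListRecords response, and
    fetch_page(token) the response for that resumption token.
    """
    identifiers: List[str] = []
    token: Optional[str] = None
    while True:
        page_ids, token = parse_list_records(fetch_page(token))
        identifiers.extend(page_ids)
        if token is None:
            return identifiers
```

Because `fetch_page` is passed in, the paging logic can be tested with canned XML strings. A real harvester would also need error handling, retries, and the actual import step.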
I don’t know the answer that will make everyone happy. There are lots of
different ways of managing and scheduling the tasks. Based on my
experience, I’d suggest targeting the simplest approach first, because
complexity will make it less likely for the project to succeed.
On that note, I’d be happy to test/QA any OAI-PMH harvester put forward.
When I was writing OAI-PMH harvester patches, I found it really hard to
get QA, so I’m happy to be that resource for someone else. I’ve spent a
lot of time thinking about this topic, so I’m happy to provide advice,
warnings, emotional support 😉.
David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia
Office: 02 9212 0899
Online: 02 8005 0595
From: Koha-devel <[email protected]> On Behalf Of Tomas Cohen Arazi
Sent: Wednesday, 26 October 2022 3:46 AM
To: BOUIS Sonia <[email protected]>
Cc: koha <[email protected]>; koha-devel <[email protected]>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
I think with background jobs we have most of the framework that is
needed to deal with this within Koha.
Best regards
On Tue, 25 Oct 2022 at 7:08, BOUIS Sonia <[email protected]> wrote:
Hi,
KohaLA would like to finance an OAI-PMH client in Koha, but we have
questions that we want to raise with the community.
There have already been attempts to propose an OAI-PMH client:
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 :
  it's an old project that doesn't seem compatible with the current
  version of Koha
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 :
  the scope is more to use an external OAI-PMH client and to connect it
  to Koha
Our main question is about the way to handle this. Do you think it's a
better idea to use external software or a Perl routine and find a way to
connect it to Koha? Or would it be better to write a new module in Koha
from scratch, so that Koha has its own OAI-PMH client?
Please let us hear your thoughts about this project.
Kind regards
Sonia
Sonia BOUIS
------------------------------------------------------
Head of the Library IT Service, Research and Project Support Department
(DARP), University Libraries, Université Jean Moulin Lyon 3
Street address > Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
Postal address > Bibliothèque de la Manufacture | 1C avenue des Frères
Lumière | CS 78242 - 69372 LYON CEDEX 08
Direct line: 33 (0)4 78 78 79 03
http://bu.univ-lyon3.fr | Follow us >
Facebook <https://www.facebook.com/bulyon3/> |
Twitter <https://twitter.com/bulyon3> |
Instagram <https://www.instagram.com/bu.lyon3/?hl=fr>
_______________________________________________
Koha mailing list http://koha-community.org [email protected]
<mailto:[email protected]>
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
------------------------------
Subject: Digest Footer
_______________________________________________
Koha-devel mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/ git :
https://git.koha-community.org/ bugs :
https://bugs.koha-community.org/
------------------------------
End of Koha-devel Digest, Vol 203, Issue 15
*******************************************