Re: [Koha] best practice for indexing and re-indexing

Katrin Fischer Fri, 13 Sep 2024 10:56:10 -0700

Hi Eric,

Searching with Zebra or Elasticsearch only works when each record
includes a unique identifier that links the record in the index to the
record in your database. Koha achieves this by automatically adding the
biblionumber to the MARC record. In MARC21, field 999 is used for this.
These fields and mappings should not be changed.


If you import records, the biblionumber will be added automatically. If
you want to carry over an identifier from your old system to Koha, in
MARC21 you could use 035$a with a prefix, or 001/003.
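
As a rough illustration of the "035$a with a prefix" idea, here is a
minimal Python sketch that builds such a value. It assumes the common
035$a convention of putting a source code in parentheses in front of the
number; the function name, the code "OLDSYS", and the sample number are
made up for the example and are not anything Koha defines.

```python
def legacy_control_number(source_code: str, old_id: str) -> str:
    """Build an 035$a value of the form "(SOURCE)id".

    source_code is a hypothetical label for the old system; old_id is the
    identifier the record had there.
    """
    source_code = source_code.strip()
    old_id = old_id.strip()
    if not source_code or not old_id:
        raise ValueError("both a source code and an identifier are required")
    return f"({source_code}){old_id}"


if __name__ == "__main__":
    # Illustrative values only.
    print(legacy_control_number("OLDSYS", "000123"))
```

The resulting string would go into 035$a of each record before import,
so the old identifier stays searchable next to the biblionumber that
Koha assigns.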

You can't speed up the indexing process by adding anything to your MARC
data.

In general, indexing using Elasticsearch will be much quicker than using
Zebra for this number of records.

You can always do another full reindex without deleting anything. But if
you load new, improved records, you will need to index them again.
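
For the full reindex, a small Python sketch that only assembles the
koha-rebuild-zebra command line might look like this. It assumes a
package-based install where the script is available, and that the
instance is called "library" (a placeholder, use your own instance
name). The sketch prints the command rather than running it; the command
itself usually needs to be run as root or via sudo.

```python
import shlex
from typing import List


def full_reindex_command(instance: str, verbose: bool = True) -> List[str]:
    """Return the argument list for a full Zebra rebuild of bibliographic records.

    -f requests a full rebuild, -b limits it to biblios, -v turns on
    verbose output. The instance name is whatever your Koha instance is
    called; "library" below is only an example.
    """
    if not instance:
        raise ValueError("an instance name is required")
    cmd = ["koha-rebuild-zebra", "-f", "-b"]
    if verbose:
        cmd.append("-v")
    cmd.append(instance)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(full_reindex_command("library")))
```

This rebuilds the index from the records already in the database, so
nothing has to be deleted first.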

Hope that helps,

Katrin

On 09.09.24 20:11, Eric Lease Morgan wrote:

What are some of the best practices for Zebra indexing and re-indexing of MARC 
records; ought my MARC records include unique identifiers in some 9xx field?

I am in the process of curating about .7 million MARC records, putting them 
into Koha, and providing access to them via both the traditional catalogue as 
well as the Search-Retrieve Via URL (SRU) interfaces. I am in a constant 
process of improving the records in one way or another. Adding date values. 
Adding subject headings. Adding content notes. Removing duplicates. Etc.

After creating an improved set of records, I have been zealously deleting 
bibliographic records using the command line, but this process also deletes 
things I don't want to be deleted. See: https://bit.ly/3XkMeKV

I know I can use bulkmarcimport.pl to delete records, but the process is very 
slow, especially when I want to delete hundreds of thousands of items.

A few days ago I learned about the koha-rebuild-zebra command, and I believe I 
saw something about Zebra identifiers in 9xx fields flashing by on the screen. 
Maybe, if I put identifiers in 9xx fields, I can re-index things more 
quickly? If so, then how?

Maybe, if my records have magic 9xx fields, then, when I use bulkmarcimport.pl 
to import things, Zebra will really overwrite my existing records? That would 
be nice.

After I create a new set of improved MARC records, how can I efficiently 
reindex them sans deleting them from the MySQL database?

--
Eric Morgan


_______________________________________________

Koha mailing list  http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
