Hi again, Mason, I just remembered a little trick that you might find useful.
Try the following:

    echo "PZ 7 .W663 1984" | yaz-icu -x -c /path/to/phrases-icu.xml
    echo "PZ 7 .W663 1984" | yaz-icu -x -c /path/to/words-icu.xml

That should show you how the string is normalized and tokenized for indexing with ICU. You should see the same thing when you're using yaz-client, but this can be a bit more convenient, I reckon.

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia
Office: 02 9212 0899
Direct: 02 8005 0595

-----Original Message-----
From: koha-devel-boun...@lists.koha-community.org <koha-devel-boun...@lists.koha-community.org> On Behalf Of dc...@prosentient.com.au
Sent: Wednesday, 10 July 2019 5:05 PM
To: 'Mason James' <m...@kohaaloha.com>; koha@lists.katipo.co.nz; koha-de...@lists.koha-community.org
Subject: Re: [Koha-devel] Problems searching callnumbers with Koha and ICU

Hi Mason,

Can you tell us what version of Zebra you're running? And what is your exact query?

According to https://packages.debian.org/stretch/idzebra-2.0, you're probably running Zebra 2.0.59, unless you're pulling packages from Indexdata's APT repository.

I discovered an ICU bug in Zebra 2.0.59 back in February 2015, which could very well be impacting you now. At the time, I thought it was just an issue when hyphens were used in search terms, but I've had the same problem with spaces lately when using "se,phr,ext" (which uses the phrase register rather than the word register) with Zebra 2.0.59 on Debian.

I think most people using ICU are using Zebra from Indexdata's APT repositories. I had an issue with that recently but I'm going to revisit it soon.

I have a few other ICU-related questions that I have asked Indexdata, but so far I haven't heard back. They're mostly about how normalization and tokenization are done at search time vs index time, as I don't think the documentation is clear about that.
(For instance, https://software.indexdata.com/zebra/doc/icuchain-files.html says "The ICU chain files defines a chain of rules which specify the conversion process to be carried out for each record string for indexing. Both searching and sorting is based on the sort normalization that ICU provides. This means that scan and sort will return terms in the sort order given by ICU." To me that sounds like different rules are used for indexing and for searching/sorting, which is consistent with my testing. I think search/sort uses default ICU settings while indexing uses custom settings. For example, we replace apostrophes with a space when indexing in the word register, but search replaces apostrophes with nothing, so the tokens produced at search time don't match the ones in the index. But I digress...)

Relevant reading:
1. Look for ZEB-664 in https://software.indexdata.com/zebra/doc/NEWS
2. Robin opened a bug report in Debian but it never went anywhere: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777515;msg=5
3. https://github.com/indexdata/idzebra/commit/704fd190292cb771df94553b0ed6f9f4b71660a6
4. https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=16581

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia
Office: 02 9212 0899
Direct: 02 8005 0595

-----Original Message-----
From: koha-devel-boun...@lists.koha-community.org <koha-devel-boun...@lists.koha-community.org> On Behalf Of Mason James
Sent: Wednesday, 10 July 2019 3:54 PM
To: koha@lists.katipo.co.nz; koha-de...@lists.koha-community.org
Subject: [Koha-devel] Problems searching callnumbers with Koha and ICU

Hi Folks

Has anyone hit a problem searching callnumbers with Koha and ICU - specifically callnumbers with SPACE ' ' characters?

An example problematic callnumber is 'PZ 7 .W663 1984'

Or, has anyone had *success* searching callnumbers with ICU?
:)

Either way, I'd be curious to hear from you.

I tested on Koha 18.05.12 and Debian 9.8.

Cheers,
Mason

_______________________________________________
Koha-devel mailing list
koha-de...@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
https://lists.katipo.co.nz/mailman/listinfo/koha
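
Editor's sketch: the parenthetical in David's earlier message suspects that apostrophes become a space at index time but are dropped at search time. The toy script below only illustrates why that would produce tokens that never match. It does not call ICU or Zebra; the function names (index_tokens, search_tokens), the lowercasing step, and the sample term are all hypothetical, chosen purely for illustration.

```shell
#!/bin/sh
# Toy illustration only: plain tr pipelines, NOT ICU and NOT Zebra code.

# Assumed index-time rule: apostrophe -> space, lowercase, split on spaces.
index_tokens() {
    printf '%s\n' "$1" | tr "'" ' ' | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n'
}

# Assumed search-time rule: apostrophe removed, lowercase, split on spaces.
search_tokens() {
    printf '%s\n' "$1" | tr -d "'" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n'
}

term="O'Brien"   # hypothetical sample term

echo "index-time tokens:"
index_tokens "$term"
echo "search-time tokens:"
search_tokens "$term"
```

Under these assumed rules the index holds two tokens while the query produces a single fused token, so a word-register lookup would find nothing. To see what your real chain files do, the yaz-icu commands earlier in the thread remain the thing to run.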