Actually, rebuild_zebra.pl -b -r -v does the trick. I had broken my copy of rebuild_zebra.pl with some debugging logic while tracking down this problem, so zebraidx was not being called at all. We look good now on this issue.
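In case it is useful to anyone hitting the same problem, here is a rough sketch of the duplicate control number check described below (the marcprint | grep "=001" | sort | uniq -c pipeline) done with pymarc instead of a shell pipeline. It is only a sketch: it assumes pymarc is installed and that all the biblios have been exported to a binary MARC file first (the file name is just an example).

#!/usr/bin/env python
# dup001.py - report 001 control numbers that occur on more than one record
# in a MARC export of the catalogue.
# Usage: python dup001.py biblios.mrc
import sys
from collections import Counter

from pymarc import MARCReader

counts = Counter()
with open(sys.argv[1], 'rb') as fh:
    for record in MARCReader(fh):
        if record is None:          # skip anything pymarc cannot parse
            continue
        field = record['001']
        if field is not None:
            counts[field.data.strip()] += 1

duplicates = [(value, n) for value, n in counts.most_common() if n > 1]
for value, n in duplicates:
    print('%s occurs %d times' % (value, n))
if not duplicates:
    print('no duplicate 001 values found')

Any 001 value reported more than once is a candidate for the z:id collision Ian describes further down the thread.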
Solution:

1. Use 'checkNonIndexedBiblios.pl -c' to get a report of missing biblios.
2. Visit http://mylibrary.org/cgi-bin/koha/catalogue/detail.pl?biblionumber=14769 (a missing biblio), click Edit, and fix the control number so it is unique (repeat for each missing biblio).
3. Run rebuild_zebra.pl -b -r -v

It would be good to build a tool to find duplicate control numbers. I did this by exporting all the biblios and piping the output of marcprint (my python utility) through grep "=001" | sort | uniq -c | sort -r | less, then looking for counts greater than 1. A better approach would be to make the control number a unique field and enforce this in the database. Is this possible?

-Doug-

On 1 September 2012 16:11, Doug Kingston <d...@randomnotes.org> wrote:

> So doing some further research, it definitely looks like we have duplicate
> control numbers (001). This is a data entry mistake; it looks like the
> cataloger copied the biblios for similar entries. I have gone back and
> altered the control numbers to be unique, but rebuild_zebra.pl -b -r is
> not adding the new entries. Any idea what else we might need to do?
>
> -Doug-
>
> On 1 September 2012 15:32, Ian Bays <ian.b...@ptfs-europe.com> wrote:
>
>> Hi.
>> The 3.8 upgrade offers DOM indexing by default, and if you have taken
>> that option (as seen in $KOHA_CONF) the xsl used instead of record.abs
>> (~/koha-dev/etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl)
>> uses a construct (z:id) for the 001 which uses that (if it exists) as the
>> zebra unique id. This means that if you have more than one bib record with
>> the same 001 (as you get if you duplicate a bib, for instance) it will only
>> index the last one, and it won't complain at all about it.
>> Not sure if it's a hangover from the xml used by authorities, which stores
>> the auth_id in the 001, or from UNIMARC, which might use 001 as the bib
>> number. Either way, I bet if you remove the 001 or make it unique then it
>> will index OK.
>> The better solution is probably to fix the xsl so it does not use the z:id
>> for biblios, or maybe to get it to use the 999$c, but the zebra config
>> scares me. It took ages to find the cause, so I hope this helps someone.
>> Ian
>>
>> On 01/09/2012 18:11, Doug Kingston wrote:
>>
>>> On 1 September 2012 09:46, Jared Camins-Esakov
>>> <jcam...@cpbibliography.com> wrote:
>>>
>>>> Doug,
>>>>
>>>>> So environment variables are not the issue. We are carefully managing
>>>>> those.
>>>>>
>>>> Make sure when you are using cron jobs that you set the environment
>>>> variables IN YOUR CRONTAB. Setting environment variables elsewhere is a
>>>> recipe for confusion and misery down the road. However, this is -- as
>>>> you say -- not the problem.
>>>>
>>>>> I have tried using the new tool checkNonIndexedBiblios.pl (from patch
>>>>> 6566) and it indeed finds a few recent biblios that are not indexed.
>>>>> Using the -z option to mark them for indexing followed by a manual run
>>>>> of rebuild_zebra -b -v -z did not get the biblios indexed. I cranked up
>>>>> the debugging on zebraidx (by modifying rebuild_zebra.pl and using
>>>>> -v -v) and did not see any obvious errors in the output that would
>>>>> suggest why indexing was failing.
>>>>>
>>>> Did you change your bibliographic frameworks? It could be a matter of
>>>> the biblionumber not being stored properly. The other thing to do is to
>>>> confirm that the non-indexed biblios are *actually* getting added to the
>>>> zebraqueue by the 6566 script.
>>>> It's kind of a long shot, but it could be an issue with the zebraqueue
>>>> table getting corrupted. I've seen this happen when the zebraqueue table
>>>> got too large and disk space was low.
>>>>
>>> So I think this is working as expected. Disk space is ample on the system
>>> in question, and the catalogue is small by most standards (about 2500
>>> biblios). I ran rebuild_zebra.pl with the -k flag so it left the exported
>>> records, and here's the tree I got:
>>>
>>> library:/tmp# ls -altR p6tjtKrrK3/
>>> p6tjtKrrK3/:
>>> total 0
>>> drwxrwxrwt 6 root root 1040 Sep 1 17:50 ..
>>> drwx------ 5 koha koha 100 Sep 1 06:36 .
>>> drwxr-xr-x 2 koha koha 60 Sep 1 06:36 upd_biblio
>>> drwxr-xr-x 2 koha koha 60 Sep 1 06:36 del_biblio
>>> drwxr-xr-x 2 koha koha 40 Sep 1 06:36 biblio
>>>
>>> p6tjtKrrK3/upd_biblio:
>>> total 16
>>> -rw-r--r-- 1 koha koha 12670 Sep 1 06:36 exported_records
>>> drwxr-xr-x 2 koha koha 60 Sep 1 06:36 .
>>> drwx------ 5 koha koha 100 Sep 1 06:36 ..
>>>
>>> p6tjtKrrK3/del_biblio:
>>> total 0
>>> drwx------ 5 koha koha 100 Sep 1 06:36 ..
>>> drwxr-xr-x 2 koha koha 60 Sep 1 06:36 .
>>> -rw-r--r-- 1 koha koha 0 Sep 1 06:36 exported_records
>>>
>>> p6tjtKrrK3/biblio:
>>> total 0
>>> drwx------ 5 koha koha 100 Sep 1 06:36 ..
>>> drwxr-xr-x 2 koha koha 40 Sep 1 06:36 .
>>>
>>> Using marcprint.py, a small python program built around the pymarc
>>> package, I decoded this file and found 13 MARC records, as expected.
>>> Example:
>>>
>>> =LDR 00871nam a22002417a 4500
>>> =001 201112071555.ls
>>> =003 UkLoVW
>>> =005 20111209110116.0
>>> =008 111207t1982\\\\enkg\\\\r\\\\\001\0\eng\d
>>> =040 \\$aUkLoVW$cUkLoVW
>>> =099 \\$aQS 40
>>> =100 1\$aSheffield, Ken$92330
>>> =245 \0$aTen country dances :$bmainly from Thompson, Wright & Wilson.
>>> =260 \\$aOxford :$b[The Author],$c1982.
>>> =300 \\$a12 p. :$bmusic ;$c30 cm.
>>> =490 1\$aFrom two barns ;$vv. 1
>>> =650 \\$9117$aCountry dances
>>> =650 \\$9127$aDance music
>>> =830 \5$aFrom two barns$92331
>>> =942 \\$2VWML$cBK$hQS 40$n0$6QS_00040
>>> =999 \\$c14879$d14879
>>> =952 \\$w2011-12-07$p10914$r2011-12-07$40$00$6QS_00040$915083$bVWML$10$oQS 40$d2011-12-07$70$cBOX$2VWML$yBK$aVWML
>>> =952 \\$w2011-12-07$p11121$r2011-12-07$40$00$6QS_00040$915084$bVWML$10$oQS 40$d2011-12-07$71$cBOX$2VWML$yBK$aVWML
>>>
>>> I have attached an ascii printout of all 13 records in case someone wants
>>> to look for a pattern in these records.
>>>
>>> The problem is either in the format/contents of those records, or in
>>> zebraidx/zebrasrv or their config files. My suspicion is with the latter,
>>> since we have already had to fix one problem there for bug 6566.
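For what it's worth, that inspection can also be done directly with pymarc, dumping the 001 and 999$c of every record in exported_records so that a repeated or suspicious control number stands out. This is only a rough sketch; it assumes pymarc is installed, and the path used in the usage comment is simply the one from the listing above.

#!/usr/bin/env python
# dump_ctl.py - print the 001 and 999$c (Koha biblionumber) of every record
# in a rebuild_zebra.pl export so duplicate 001 values are easy to spot.
# Usage: python dump_ctl.py /tmp/p6tjtKrrK3/upd_biblio/exported_records
import sys

from pymarc import MARCReader

with open(sys.argv[1], 'rb') as fh:
    for record in MARCReader(fh):
        if record is None:          # skip anything pymarc cannot parse
            continue
        ctl = record['001']         # control number
        koha = record['999']        # Koha's biblionumber field
        print('001=%s  999$c=%s' % (
            ctl.data if ctl is not None else '(none)',
            koha['c'] if koha is not None else '(none)'))

A 001 that repeats here, or that matches a record already in the index, would point at the z:id behaviour Ian describes above.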
>>>
>>> -Doug-
>>>
>>>> Regards,
>>>> Jared
>>>>
>>>> --
>>>> Jared Camins-Esakov
>>>> Bibliographer, C & P Bibliography Services, LLC
>>>> (phone) +1 (917) 727-3445
>>>> (e-mail) jcam...@cpbibliography.com
>>>> (web) http://www.cpbibliography.com/
>>>
>>
>> --
>> Ian Bays
>> Director of Projects, PTFS Europe Limited
>> Content Management and Library Solutions
>> +44 (0) 800 756 6803 (phone)
>> +44 (0) 7774 995297 (mobile)
>> +44 (0) 800 756 6384 (fax)
>> skype: ian.bays
>> email: ian.b...@ptfs-europe.com
>

_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha