Hi,

As you already read in Paul's previous message about "BibLibre strategy for 3.4 and next version", we are growing and want to stay as involved in the community as before. Paul promised some POCs; here is one that is available. We also worked on Plack and support, and we created a base set of scripts to search for memory leaks. We'll demonstrate those later.

Zebra is fast and embeds a native Z39.50 server. But it also has some major drawbacks that we have to cope with in our everyday work, and they make it quite difficult to maintain.

1. Zebra config files are a nightmare. You can't drive the configuration easily. Namely, you can't edit indexes via HTTP or through configuration; everything is hardcoded in files on disk. So you can't list indexes, you can't change or edit indexes, and you can't say "I want this index in the OPAC and that one in the intranet". (It could be done by scraping ccl.properties, then record.abs and bib1.att… but what a HELL.) So you cannot easily customize the configuration to define the indexes you want. And people do not get translated index names, since all the indexes are hardcoded in ccl.properties and we have no translation process for the CCL attributes.

2. No real-time indexing. Relying on a crontab is poor: when you add an authority while creating a biblio, you have to wait some minutes before you can finish your biblio. (This might be solved, since zebra has a way to index biblios via Z39.50 extended services, but it is hard and would need testing; when the community first tested it, a performance problem was raised on indexing.)

3. No way to access, process or delete data easily. If you have problems with your indexes or your data, you have to reindex the whole lot, and indexing errors are quite difficult to detect.

4. While indexing a file, if there is a problem in your data, zebraidx just fails silently… And this is NOT safe. You have no way to know WHICH biblio made the process crash. We had a LOT of trouble with the Aix-Marseille universities, which have some transliterated Arabic biblios that make zebra/ICU crash completely! We had to write a recursive script to find the 14 biblios out of 730,000 that make zebra crash (even though they are properly stored and displayed).

5. Facets are not working properly: they are built only from the results displayed, because there are problems with diacritics and facets that can't be solved as of today. And no one can provide a solution (we spoke about that with Index Data and no clear solution was really provided).

6. Zebra does not evolve anymore. There is no real community around it; it is just open source software from Index Data. We sent many questions to the list and never got answers. We could pay for better support, but the fee required is quite a deterrent and the benefit is still questionable.

7. ICU and zebra are colleagues, not really friends: right truncation does not work, fuzzy search does not work, and neither do facets.

8. We use a deprecated way to define indexes for biblios (GRS-1), and the tool developed by Index Data to switch to DOM has many flaws. We could manage and make do with it. But is it worth the effort?

I think everyone agrees that we have to refactor C4::Search. Indeed, the query parser is not able to handle all the configuration options independently. And using USMARC as the internal format for biblios comes with a serious limitation of 99,999 bytes per record, which is not enough for big biblios with many items.

BibLibre investigated a catalogue based on Solr. A university in France contracted us for that development. This university is in contact with the whole community here in France, and Solr will certainly be adopted by libraries France-wide. We are planning to release the code on our git early next spring and to rebase it on whatever Koha version is released at that time, 3.4 or 3.6.

Why? Solr indexes data over HTTP. It can provide fuzzy search, search on synonyms, and suggestions. It can provide faceted search and stemming. UTF-8 support is built in. The community is impressively reactive, numerous and efficient. And the documentation is very good and exhaustive.

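To make the "indexes data over HTTP" point concrete, here is a minimal Python sketch of what pushing one record to Solr can look like. It is not the POC code: the update URL (a local single-core Solr), the field names and the helper names are all assumptions made for illustration. It only builds the XML <add> message and the POST request; the line that actually sends it is left commented out.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a local Solr instance exposing the classic single-core update URL.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(doc):
    """Serialize one record as a Solr XML <add> message (UTF-8 bytes)."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, values in doc.items():
        # Accept a single value or a list of values (multi-valued field).
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def build_update_request(doc, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) the HTTP POST that indexes one record."""
    return urllib.request.Request(
        url + "?commit=true",
        data=build_add_message(doc),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical field names; a real Solr schema defines its own.
    record = {
        "id": "biblio-42",
        "title": "Les Misérables",
        "author": ["Hugo, Victor"],
    }
    request = build_update_request(record)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending it requires a running Solr server:
    # urllib.request.urlopen(request)
```

The point of the sketch is that a record can be (re)indexed with a single HTTP request as soon as it is saved, instead of waiting for a crontab to pick it up.
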
You can see the results on solr.biblibre.com and catalogue.solr.biblibre.com:

http://catalogue.solr.biblibre.com/cgi-bin/koha/opac-search.pl?q=jean
http://solr.biblibre.com/cgi-bin/koha/admin/admin-home.pl

You can log in there with demo/demo as login/password.

http://solr.biblibre.com/cgi-bin/koha/solr/indexes.pl is the page where people can manage their indexes and links.

a) Librarians can define their own indexes, and there is a plugin that fetches data from rejected authorities and from authorised_values (that could/should have been achieved with zebra, but only with major work on XSLT).
b) The number of lines of code in C4/Search.pm could be shrunk tenfold.

You can test from the poc_solr branch on git://git.biblibre.com/koha_biblibre.git
But you have to install Solr.

Any feedback/idea welcome.

-- 
Henri-Damien LAURENT
BibLibre
_______________________________________________
Koha-devel mailing list
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
