I did some (very limited) testing on storing and retrieving MARC in YAML. The results were not encouraging. IIRC, I just did a direct conversion of the MARC::Record object into YAML and back. Perhaps there's a way to optimize the formatting that would improve performance, but in my testing YAML sometimes performed even worse than XML.
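For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a timing harness. It is not the original test: it is Python rather than Perl, the record is made up (one title field plus N item fields tagged 952), the XML is a simplified MARCXML-like shape, and JSON from the standard library stands in for a structured serialization such as YAML. All function names are hypothetical, and the timings say nothing about MARC::Record or any particular YAML library.

```python
import json
import timeit
import xml.etree.ElementTree as ET


def make_record(n_items):
    """Build a made-up record: one title field plus n_items item fields."""
    fields = [("245", {"a": "Example title"})]
    fields += [("952", {"p": f"barcode{i}", "o": f"QA76 .{i}"}) for i in range(n_items)]
    return fields


def to_xml(fields):
    """Serialize to a simplified MARCXML-like string (no namespace, no leader)."""
    record = ET.Element("record")
    for tag, subfields in fields:
        datafield = ET.SubElement(record, "datafield", {"tag": tag})
        for code, value in subfields.items():
            ET.SubElement(datafield, "subfield", {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


def from_xml(text):
    """Parse the XML string back into the same list-of-fields structure."""
    return [
        (df.get("tag"), {sf.get("code"): sf.text for sf in df})
        for df in ET.fromstring(text)
    ]


def to_json(fields):
    """Serialize the same structure as JSON (the stand-in format)."""
    return json.dumps(fields)


def from_json(text):
    # JSON has no tuples, so rebuild them to match make_record's output.
    return [(tag, subfields) for tag, subfields in json.loads(text)]


def time_parsers(n_items=1000, repeats=20):
    """Return the seconds spent parsing each format `repeats` times."""
    fields = make_record(n_items)
    xml_text, json_text = to_xml(fields), to_json(fields)
    return {
        "xml": timeit.timeit(lambda: from_xml(xml_text), number=repeats),
        "json": timeit.timeit(lambda: from_json(json_text), number=repeats),
    }


if __name__ == "__main__":
    for name, seconds in time_parsers().items():
        print(f"{name}: {seconds:.4f}s")
```

Only parsing is timed, since serialization happens once at save time while parsing happens on every display. Raising `n_items` shows how each format scales as items are added to a record.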
MARCXML is a performance killer at this point, but there's no other apparent way to handle large bib records. The parsing is the issue, not the data transfer load. Perhaps cached BSON-formatted MARC::Record objects are a way out of this.

Clay

On Tue, Oct 12, 2010 at 11:45 AM, Thomas Dukleth <[email protected]> wrote:

> Reply inline:
>
>
> On Tue, October 12, 2010 16:20, LAURENT Henri-Damien wrote:
> > On 12/10/2010 14:48, Thomas Dukleth wrote:
> >> Reply inline:
> >>
> >>
> >> Original Subject: [Koha-devel] Search Engine Changes : let's get some solr
> >>
> >> On Mon, October 4, 2010 08:10, LAURENT Henri-Damien wrote:
>
> [...]
>
> >>> I think that everyone agrees that we have to refactor C4::Search.
> >>> Indeed, the query parser is not able to manage all the configuration
> >>> options independently. And usage of usmarc as the internal format for
> >>> biblios comes with a serious limitation of 99999 bytes, which for big
> >>> biblios with many items is not enough.
> >>
> >> How do MARC limitations on record size relate to Solr/Indexing or Zebra
> >> indexing which lacks Solr/Lucene support in the current version?
> > Koha is now using iso2709 returned from zebra in order to display result
> > lists.
>
> I recall that having Zebra return ISO2709, MARC communications format,
> records had the supposed advantage of faster response time from Zebra.
>
> > Problem is that if zebra is returning only part of the biblio and/or
> > MARC::Record is not able to parse the whole data then the biblio is not
> > displayed. We have biblio records which contain more than 1000 items.
> > And MARC::Record/MARC::File::XML fails to parse that.
> >
> > So this is a real issue.
>
> Ultimately, we need a specific solution to various problems arising from
> storing holdings directly in the MARC bibliographic records.
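To make the size limit concrete: ISO 2709 stores the total record length as decimal digits in the first five characters of the leader, so a record longer than 99999 bytes cannot be represented. A minimal sketch in Python, not Koha code; the function names and the byte sizes in the usage note are assumptions for illustration only.

```python
MAX_ISO2709_LENGTH = 99999  # largest value that fits in leader positions 00-04


def leader_length_field(record_length):
    """Return the 5-digit length field, or raise if the record is too long."""
    if not 0 <= record_length <= MAX_ISO2709_LENGTH:
        raise ValueError(f"{record_length} bytes does not fit in 5 digits")
    return f"{record_length:05d}"


def max_items(base_bytes, bytes_per_item):
    """How many embedded item fields fit before the record overflows."""
    return (MAX_ISO2709_LENGTH - base_bytes) // bytes_per_item
```

Assuming a 2000-byte bibliographic part and 150 bytes per embedded item (field data plus its directory entry), `max_items(2000, 150)` returns 653. Under those assumed sizes, a record carrying more than 1000 items cannot be expressed in ISO 2709 at all.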
>
> >>
> >> How does BibLibre intend to fix the limitation on the size of
> >> bibliographic records as part of its work on record indexing and
> >> retrieval in Koha or in some parallel work?
> > Solr/Lucene can return indexes and those could be used in order to
> > display desired data, or we could also do the same as we do with zebra:
> > - store the data record (format could be iso2709 or marcxml or YAML)
> > - use that for display.
>
> If using ISO 2709, MARC communications format, how would the problem of
> excess record size be addressed?
>
> > Or we could use GetBiblio in order to get the data from database.
> > Problem now would be the fact that storing xml in database is not really
> > optimal for process.
>
> I like the idea of using YAML for some purposes.
>
> As you state, previous testing showed that returning every record in a
> large result set from the SQL database was very inefficient as compared to
> using the records as part of the response from the index server.
>
> Is there any practical way of sufficiently improving the efficiency of
> accessing a large set of records from the SQL database? How much might
> retrieving and parsing YAML records from the database help?
>
> I can imagine using XSLT to pre-process MARCXML records into an
> appropriate format, such as YAML with embedded HTML, pure HTML, or whatever
> is needed for a particular purpose, and storing the pre-processed
> records in appropriate special purpose columns. Real time parsing would
> be minimised. The OPAC result set display might use
> biblioitems.recordOPACDisplayBrief. The standard single record view might
> use biblioitems.recordOPACDisplayDetail. An ISBD card view might use
> biblioitems.recordOPACDisplayISBD.
>
> [...]
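A minimal sketch of the pre-processing idea, in Python with an in-memory SQLite table rather than Koha's actual schema. The column name recordOPACDisplayBrief is taken from the proposal. The table layout, the function names, and the plain-Python rendering (standing in for an XSLT transform) are assumptions for illustration.

```python
import html
import sqlite3
import xml.etree.ElementTree as ET


def render_brief(marcxml):
    """Parse once at save time and return a small HTML snippet (title only)."""
    record = ET.fromstring(marcxml)
    title = record.findtext("datafield[@tag='245']/subfield[@code='a']", default="")
    return f'<span class="title">{html.escape(title)}</span>'


def save_record(conn, biblionumber, marcxml):
    """Store the source record together with its pre-rendered brief display."""
    conn.execute(
        "INSERT OR REPLACE INTO biblioitems VALUES (?, ?, ?)",
        (biblionumber, marcxml, render_brief(marcxml)),
    )


def brief_results(conn, biblionumbers):
    """Fetch display-ready snippets for a result set without parsing any XML."""
    marks = ",".join("?" * len(biblionumbers))
    rows = conn.execute(
        "SELECT biblionumber, recordOPACDisplayBrief FROM biblioitems "
        f"WHERE biblionumber IN ({marks})",
        list(biblionumbers),
    )
    return dict(rows)


# Hypothetical three-column table; Koha's real biblioitems table differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biblioitems ("
    "biblionumber INTEGER PRIMARY KEY, marcxml TEXT, recordOPACDisplayBrief TEXT)"
)
```

The cost moves from display time to save time: each record is parsed once when it is stored, and a result list becomes a plain column read. The trade-off is that stored snippets must be regenerated whenever the record or the display template changes.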
>
>
> Thomas Dukleth
> Agogme
> 109 E 9th Street, 3D
> New York, NY 10003
> USA
> http://www.agogme.com
> +1 212-674-3783
>
>
> _______________________________________________
> Koha-devel mailing list
> [email protected]
> http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
>
