Hi! Thanks!! I'll try DocValues for sure, and of course the smaller chunk size. Just to add some detail on the number of bytes stored: it's, for instance, 72 bytes for CEDD, ~96 for JCD, 64 bytes for OpponentHistogram, etc., and there are 0<n<10 such fields per image (i.e. per document).
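
To make the "encode all fields into a single byte[]" idea from the thread concrete, here is a minimal sketch of how a handful of small features could be packed into one array, so that a single read (for example from one binary DocValues field) returns everything for a document. The class name FeaturePacker, the numeric feature ids and the byte layout are assumptions made up for illustration; none of this is LIRE or Lucene API.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of LIRE or Lucene): packs small feature vectors into one byte[]. */
final class FeaturePacker {

    private FeaturePacker() {}

    /**
     * Assumed layout: [count: 1 byte], then for each feature
     * [id: 1 byte][length: 2 bytes, big-endian][payload: length bytes].
     */
    static byte[] pack(Map<Integer, byte[]> features) {
        if (features.size() > 255) {
            throw new IllegalArgumentException("at most 255 features per document");
        }
        int size = 1; // one byte for the feature count
        for (Map.Entry<Integer, byte[]> e : features.entrySet()) {
            int id = e.getKey();
            int len = e.getValue().length;
            if (id < 0 || id > 255 || len > 0xFFFF) {
                throw new IllegalArgumentException("feature id or length out of range");
            }
            size += 3 + len; // id byte + two length bytes + payload
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put((byte) features.size());
        for (Map.Entry<Integer, byte[]> e : features.entrySet()) {
            buf.put((byte) e.getKey().intValue());
            buf.putShort((short) e.getValue().length);
            buf.put(e.getValue());
        }
        return buf.array();
    }

    /** Reverses pack(): returns feature id -> payload in the stored order. */
    static Map<Integer, byte[]> unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        int count = buf.get() & 0xFF;
        Map<Integer, byte[]> features = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int id = buf.get() & 0xFF;
            int len = buf.getShort() & 0xFFFF;
            byte[] payload = new byte[len];
            buf.get(payload);
            features.put(id, payload);
        }
        return features;
    }
}
```

With the sizes above (72 + 96 + 64 bytes), three features would pack into 242 bytes: 232 bytes of payload plus 10 bytes of framing. Whether one packed array beats several separate fields in practice is something I'd have to measure.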

cheers,
Mathias

On Sun, Jun 23, 2013 at 9:08 PM, Savia Beson <eks...@googlemail.com> wrote:
> Uwe,
> I think Mathias was talking about the case with many smallish fields that all
> get read per document. The DV approach would mean seeking N times, while
> stored fields, only once? Or did you mean he should encode all his fields
> into a single byte[]?
>
> Or did I get it all wrong about stored vs DV :)
>
> What helped a lot in a similar case was to make your own codec and reduce the
> chunk size to something smallish, depending on your average document size...
> there is a sweet spot somewhere between compression and speed.
>
> Simply make your own Codec and delegate to:
>
> import org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat;
> import org.apache.lucene.codecs.compressing.CompressionMode;
>
> public final class MySmallishChunkStoredFieldFormat extends
>     CompressingStoredFieldsFormat {
>
>   /** Sole constructor. */
>   public MySmallishChunkStoredFieldFormat() {
>     // TODO: try different chunk sizes, maybe 1-2 KB? (1 << 12 is 4 KB)
>     super("YourFormatName", CompressionMode.FAST, 1 << 12);
>   }
>
> }
>
>
> On Jun 23, 2013, at 7:40 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Hi,
>>
>> To do this type of processing, use the new DocValues field type. They are
>> like FieldCache but persisted to disk. Different datatypes exist and can be
>> used to get random access based on document number. They are organized as
>> column-stride fields, meaning each column is a separate data structure with
>> random access like a big array (persisted on disk).
>>
>> Stored Fields should *only* ever be used to display search results!
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: mathias....@gmail.com [mailto:mathias....@gmail.com] On Behalf Of
>>> Mathias Lux
>>> Sent: Sunday, June 23, 2013 7:27 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Stored fields: decompression slows down in my scenario ... any idea
>>> for a workaround?
>>>
>>> Hi!
>>>
>>> I'm managing the development of LIRE
>>> (https://code.google.com/p/lire/), an image search toolbox based on Lucene.
>>> While optimizing different search routines for global image features I got
>>> around to taking a look at the CPU usage, i.e. to see if my new distance
>>> function is faster than the old one :)
>>>
>>> Unfortunately I found out that the decompression routine for stored fields
>>> accounted for nearly 60% of the search time. (see
>>> http://www.semanticmetadata.net/?p=1092)
>>>
>>> So what I basically do is open each document in an index sequentially,
>>> compute its distance to a query feature and maintain my result list. The
>>> image features are in stored fields, as byte[] arrays. I optimized quite a
>>> lot to get them really small and fast to parse and store.
>>>
>>> I know that this is not the way Lucene is intended to be used, I've been
>>> working with Lucene for years now :) And just to assure you: approximate
>>> indexing and local feature search are based on terms, ... and fast.
>>> But linear search makes up an important part of LIRE, so I'd be glad to get
>>> some suggestions on how either to disable compression, or how to sneak in
>>> byte[] data with some textual data that is "fast as hell" to read.
>>>
>>> cheers,
>>> Mathias
>>>
>>> ps. I know that it'd be possible to write it to a data file, put it into
>>> memory and gain a lot of speed. But of course I'd prefer to maintain "just
>>> one" index and not two of them :)
>>>
>>> --
>>> Dr. Mathias Lux
>>> Assistant Professor, Klagenfurt University, Austria
>>> http://tinyurl.com/mlux-itec
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec