Hi!

Thanks!! I'll try DocValues for sure, and of course the smaller
chunk size. To add some detail on the number of bytes stored: it is,
for instance, 72 bytes for CEDD, ~96 for JCD, 64 bytes for
OpponentHistogram, etc., and there are 0<n<10 fields per image (i.e.
per document).
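
To put rough numbers on the chunk size question, here is a
back-of-the-envelope sketch in plain Java (no Lucene calls). It assumes
documents sit back to back inside a chunk and ignores per-field and
per-chunk overhead, so the output is an estimate only. The class and
method names are made up for illustration.

```java
public class ChunkEstimate {
    // Feature sizes in bytes as quoted above (the JCD value is approximate).
    static final int CEDD = 72;
    static final int JCD = 96;
    static final int OPPONENT_HISTOGRAM = 64;

    /** Rough number of documents sharing one chunk (never less than 1). */
    static int docsPerChunk(int chunkSizeBytes, int docSizeBytes) {
        return Math.max(1, chunkSizeBytes / docSizeBytes);
    }

    public static void main(String[] args) {
        // Hypothetical document carrying exactly these three features: 232 bytes.
        int docSize = CEDD + JCD + OPPONENT_HISTOGRAM;
        for (int shift = 10; shift <= 14; shift++) {
            int chunk = 1 << shift;
            System.out.println(chunk + " byte chunk: ~"
                    + docsPerChunk(chunk, docSize) + " docs");
        }
    }
}
```

The fewer documents share a chunk, the less unrelated data has to be
decompressed to reach one document, at the cost of a worse compression
ratio.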

cheers,
  Mathias

On Sun, Jun 23, 2013 at 9:08 PM, Savia Beson <eks...@googlemail.com> wrote:
> Uwe,
> I think Mathias was talking about the case with many smallish fields that all
> get read per document. The DV approach would mean seeking N times, while
> stored fields seek only once? Or did you mean he should encode all his fields
> into a single byte[]?
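
For the "all fields in a single byte[]" variant, a minimal sketch in
plain Java could look like the following. The layout (one byte for a
feature id, one byte for the length, then the payload) and the class
name are assumptions made for illustration, not LIRE's actual format.
It only works for features of at most 255 bytes, which holds for the
sizes mentioned in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class FeaturePacker {

    /** Packs features as repeated [id: 1 byte][length: 1 byte][payload]. */
    static byte[] pack(Map<Integer, byte[]> features) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, byte[]> e : features.entrySet()) {
            int id = e.getKey();
            byte[] data = e.getValue();
            if (id < 0 || id > 255 || data.length > 255) {
                throw new IllegalArgumentException("id and length must fit in one byte");
            }
            out.write(id);                   // feature id
            out.write(data.length);          // payload length
            out.write(data, 0, data.length); // raw feature bytes
        }
        return out.toByteArray();
    }

    /** Reverses pack(); keeps the order in which features were written. */
    static Map<Integer, byte[]> unpack(byte[] packed) {
        Map<Integer, byte[]> features = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(packed);
        while (buf.hasRemaining()) {
            int id = buf.get() & 0xFF;
            int len = buf.get() & 0xFF;
            byte[] data = new byte[len];
            buf.get(data);
            features.put(id, data);
        }
        return features;
    }
}
```

The packed array could then go into one binary field per document, so
that reading all features of an image costs a single lookup instead of
one per feature.
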
>
> Or did I get it all wrong about stored vs DV :)
>
> What helped a lot in a similar case was to write a custom codec and reduce
> the chunk size to something smallish, depending on your average document
> size… there is a sweet spot somewhere between compression and speed.
>
> Simply make your own Codec and delegate to:
>
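> import org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat;
> import org.apache.lucene.codecs.compressing.CompressionMode;
>
> public final class MySmallishChunkStoredFieldFormat
>     extends CompressingStoredFieldsFormat {
>
>   /** Sole constructor. */
>   public MySmallishChunkStoredFieldFormat() {
>     // 1 << 12 is a 4 KB chunk.
>     // TODO: try different chunk sizes, maybe 1-2KB?
>     super("YourFormatName", CompressionMode.FAST, 1 << 12);
>   }
>
> }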
>
>
> On Jun 23, 2013, at 7:40 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Hi,
>>
>> To do this type of processing, use the new DocValues field type. DocValues
>> are like FieldCache but persisted to disk. Different datatypes exist and can
>> be used to get random access based on document number. They are organized as
>> column-stride fields, meaning each column is a separate data structure with
>> random access like a big array (persisted on disk).
>>
>> Stored Fields should *only* ever be used to display search results!
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: mathias....@gmail.com [mailto:mathias....@gmail.com] On Behalf Of
>>> Mathias Lux
>>> Sent: Sunday, June 23, 2013 7:27 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Stored fields: decompression slows down in my scenario ... any idea
>>> for a workaround?
>>>
>>> Hi!
>>>
>>> I'm managing the development of LIRE
>>> (https://code.google.com/p/lire/), an image search toolbox based on Lucene.
>>> While optimizing different search routines for global image features I got
>>> around to taking a look at the CPU usage, i.e. to see if my new distance
>>> function is faster than the old one :)
>>>
>>> Unfortunately I found out that the decompression routine for stored fields
>>> accounted for nearly 60% of the search time. (see
>>> http://www.semanticmetadata.net/?p=1092)
>>>
>>> So what I basically do is open each document in an index sequentially,
>>> compute its distance to a query feature and maintain my result list. The
>>> image features are in stored fields, as byte[] arrays. I optimized quite a
>>> lot to get them really small and fast to parse and store.
>>>
>>> I know that this is not the way Lucene is intended to be used; I have been
>>> working with Lucene for years now :) And just to assure you: approximate
>>> indexing and local feature search are based on terms, ... and fast.
>>> But linear search makes up an important part of LIRE, so I'd be glad to get
>>> some suggestions on how either to disable compression, or how to sneak in
>>> byte[] data with some textual data that is "fast as hell" to read.
>>>
>>> cheers,
>>>  Mathias
>>>
>>> ps. I know that it'd be possible to write it to a data file, put it into
>>> memory and gain a lot of speed. But of course I'd prefer to maintain
>>> "just one" index and not two of them :)
>>>
>>> --
>>> Dr. Mathias Lux
>>> Assistant Professor, Klagenfurt University, Austria
>>> http://tinyurl.com/mlux-itec
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>



-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
