GROUP BY in Lucene

2015-08-09 Thread Gimantha Bandara
Hi all, Is there a way to achieve $subject? For example, consider the following SQL query. SELECT A, B, C, SUM(D) AS E FROM `table` WHERE time BETWEEN fromDate AND toDate *GROUP BY X,Y,Z* In the above query we can group the records by X, Y, Z. Is there a way to achieve the same in Lucene? (I gues

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Olivier Binda
On 08/09/2015 06:29 PM, Uwe Schindler wrote: Hi, My values are unique and equal to the number of documents. They have varying sizes, say at least 10 bytes, and may be a lot bigger (say 4 kbytes). I don't share, index or sort them. I don't do grouping/faceting either. I only want to store, retri

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Toke Eskildsen
Arjen van der Meijden wrote: > On 9-8-2015 16:22, Toke Eskildsen wrote: > > Maybe you could update the JavaDoc for that field to warn against using it? > It (probably) depends on the contents of the values. That was my impression too, but we both seem to be second-guessing Robert's very non-nuan

RE: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Uwe Schindler
Hi, > My values are unique and equal to the number of documents, They have > varying sizes, say at least 10 bytes and may be a lot bigger (say 4kbytes) > > I don't share, index or sort them. > I don't do grouping/faceting either > > > I only want to store, retrieve and traverse those values T

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Olivier Binda
On 08/09/2015 04:55 PM, Arjen van der Meijden wrote: On 9-8-2015 16:22, Toke Eskildsen wrote: Robert Muir wrote: I am tired of repeating this: Don't use BINARY docvalues Don't use BINARY docvalues Don't use BINARY docvalues Use types like SORTED/SORTED_SET which will compress the term diction

Re: Mapping doc values back to doc ID (in decent time)

2015-08-09 Thread András Péteri
If I understand it correctly, the Zoie library [1][2] implements the "sledgehammer" approach by collecting docValues for all documents when a segment reader is opened. If you have some RAM to throw at the problem, this could indeed bring you an acceptable level of performance. [1] http://senseidb.

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Arjen van der Meijden
On 9-8-2015 16:22, Toke Eskildsen wrote: > Robert Muir wrote: >> I am tired of repeating this: >> Don't use BINARY docvalues >> Don't use BINARY docvalues >> Don't use BINARY docvalues >> Use types like SORTED/SORTED_SET which will compress the term >> dictionary and make use of ordinals in your

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Toke Eskildsen
Robert Muir wrote: > I am tired of repeating this: > Don't use BINARY docvalues > Don't use BINARY docvalues > Don't use BINARY docvalues > Use types like SORTED/SORTED_SET which will compress the term > dictionary and make use of ordinals in your application instead. This seems contrary to http

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Robert Muir
That makes no sense at all, it would make it slow as shit. I am tired of repeating this: Don't use BINARY docvalues Don't use BINARY docvalues Don't use BINARY docvalues Use types like SORTED/SORTED_SET which will compress the term dictionary and make use of ordinals in your application instead.

Re: Mapping doc values back to doc ID (in decent time)

2015-08-09 Thread Trejkaz
On Fri, Aug 7, 2015 at 5:34 PM, Adrien Grand wrote: > Does your application actually iterate in order over dense ids, or is > it just for benchmarking purposes? Because if it does, you probably > don't actually need seeking, you could just see what the current ID in > the terms enum is. Both dens

Re: Lucene TermsFilter lookup slow

2015-08-09 Thread jamie
Mike, thank you kindly for the reply. I am using Lucene v4.10.4. Are the optimizations you refer to available in this version? We haven't yet upgraded to Lucene 5 as there appear to be many API changes. Jamie On 2015/08/08 5:13 PM, Michael McCandless wrote: Which version of Lucene are you us