I agree, I'll improve the docs about this limit. Thanks Sheng.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 10:59 PM, Sheng <sheng...@gmail.com> wrote:

> I agree. That said, wouldn't it also make sense to clearly point it out by
> adding the comments to the corresponding classes. This is not the first
> time I am running into this "magic number" pitfall when using Lucene
> (e.g., 1024 limit for the token length in early version of Lucene).
> Generally speaking, the documentation is pretty good and helpful. But
> without documenting subtle issues like this, they may only manifest
> themselves in production when the real data come in and they are "big".
>
> On Wednesday, July 6, 2016, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Well, if you must sort on a 32K single value (although I think this is
>> extremely silly, _nobody_ will notice that two docs are out of order
>> because they were identical up until the 30,000th character but the
>> 30,001st character isn't sorted correctly), do as Mike suggests and
>> chop it off before sending it to Lucene.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 6, 2016 at 3:53 PM, Sheng <sheng...@gmail.com> wrote:
>>
>>> You misunderstand. I have many fields, and unfortunately a few of them
>>> are quite big, i.e. exceeding the 32k limit. In order to make these "big"
>>> fields sortable, they have to be stored as SortedDocValueField. Or that
>>> is wrong, one can actually sort the search result by a "big" field
>>> without indexing it to a SortedDocValueField. Suggestion ?
>>>
>>> On Wednesday, July 6, 2016, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>>> bq: In this case, we have to index a particular data structure which
>>>> has bunch of fields and each of them is promised to be searchable and
>>>> search-sortable to the user
>>>>
>>>> If I'm reading this right, you have some structure. You say
>>>> "each of them is promised to be searchable and search-sortable"
>>>>
>>>> It _sounds_ like what you want to do is break these fields out
>>>> into separate fields each of which is searchable and sortable
>>>> independently. But from what you've described, putting the entire
>>>> thing into a single DV field isn't useful.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Jul 6, 2016 at 3:10 PM, Sheng <sheng...@gmail.com> wrote:
>>>>
>>>>> To be clear, the "field" is indeed tokenized, which is accompanied
>>>>> with a SortedDocValueField so that it is sortable too. Am I making
>>>>> the wrong assumption here ?
>>>>>
>>>>> On Wednesday, July 6, 2016, Sheng <sheng...@gmail.com> wrote:
>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> I am refactoring a legacy system. One of the most annoying things is
>>>>>> I have to keep the old feature even though it makes little sense. In
>>>>>> this case, we have to index a particular data structure which has
>>>>>> bunch of fields and each of them is promised to be searchable and
>>>>>> search-sortable to the user. Turns out one field is notoriously
>>>>>> large. I think the old implementation uses some quite clumsy way to
>>>>>> make it happen. But since we decide to refactor the system with all
>>>>>> the goodies from Lucene, we want to do the sorting right, and here
>>>>>> we are at this issue... :-(
>>>>>>
>>>>>> On Wednesday, July 6, 2016, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>>
>>>>>>> Is this an "XY" problem? Meaning, why do you need DV fields larger
>>>>>>> than 32K?
>>>>>>>
>>>>>>> You can't search it as text as it's not tokenized. Faceting and
>>>>>>> sorting by a 32K field doesn't seem very useful. You may have a
>>>>>>> perfectly valid reason, but it's not obvious what use-case you're
>>>>>>> serving from this thread so far....
>>>>>>>
>>>>>>> Nobody has yet put forth a compelling use-case for such large
>>>>>>> fields, perhaps this would be one.
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <sheng...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Mike - Thanks for the prompt response. Is there a way to bypass
>>>>>>>> this constraint for SortedDocValueField ? Or we have to live with
>>>>>>>> it, meaning no fix even in future release?
>>>>>>>>
>>>>>>>> On Wednesday, July 6, 2016, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>>>>>>
>>>>>>>>> I believe only binary DVs can be larger than 32K bytes.
>>>>>>>>>
>>>>>>>>> Mike McCandless
>>>>>>>>>
>>>>>>>>> http://blog.mikemccandless.com
>>>>>>>>>
>>>>>>>>> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <sheng...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am getting an IAE indicating one of the SortedDocValueField is
>>>>>>>>>> too large, > 32k
>>>>>>>>>>
>>>>>>>>>> I googled a bit, and it seems like #Lucene-4583 has addressed
>>>>>>>>>> this issue in 4.5 and 6.0, while I am currently using Lucene 6.1.
>>>>>>>>>> Do I miss or misunderstand anything ?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
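
For anyone landing on this thread later, here is a minimal sketch of the "chop it off before sending it to Lucene" suggestion. It is not from the thread participants and uses only the JDK. The class name `DocValueTruncator`, the method `truncateUtf8`, and the constant `MAX_SORTED_DV_BYTES` are made up for illustration, and the value 32766 is an assumption about the exact byte limit behind the "32k" discussed above; check the limit reported in your own IllegalArgumentException message. The only subtlety is cutting on a UTF-8 character boundary so the truncated bytes stay valid.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
final class DocValueTruncator {

    // ASSUMPTION: the sorted doc values limit is 32766 bytes in the Lucene
    // version in use. Verify against the exception message you actually get.
    static final int MAX_SORTED_DV_BYTES = 32766;

    // Encodes value as UTF-8 and returns at most maxBytes bytes,
    // never cutting through the middle of a multi-byte character.
    static byte[] truncateUtf8(String value, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return utf8; // already short enough, nothing to cut
        }
        int end = maxBytes;
        // If the first dropped byte is a continuation byte (10xxxxxx), the cut
        // would split a character, so back up to the start of that character.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(utf8, end);
    }

    public static void main(String[] args) {
        char[] chars = new char[40000];
        Arrays.fill(chars, 'x');
        byte[] sortKey = truncateUtf8(new String(chars), MAX_SORTED_DV_BYTES);
        System.out.println(sortKey.length); // prints 32766

        // With Lucene on the classpath, the truncated bytes could then be used
        // as the sort key, for example (field name "body_sort" is made up):
        //   doc.add(new SortedDocValuesField("body_sort", new BytesRef(sortKey)));
    }
}
```

The full text can still go into the tokenized field for searching; only the sort key is shortened, which matches Erick's point that ordering differences past the first tens of thousands of characters are unlikely to be noticed.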