Re: dv field is too large

2016-07-07 Thread Michael McCandless
I agree, I'll improve the docs about this limit. Thanks Sheng.

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 10:59 PM, Sheng wrote:
> I agree. That said, wouldn't it also make sense to clearly point it out by
> adding the comments to the corresponding classes. This is

Re: dv field is too large

2016-07-06 Thread Sheng
I agree. That said, wouldn't it also make sense to point it out clearly by adding comments to the corresponding classes? This is not the first time I have run into this "magic number" pitfall when using Lucene (e.g., the 1024 limit for the token length in early versions of Lucene). Generally speak

Re: dv field is too large

2016-07-06 Thread Erick Erickson
Well, if you must sort on a 32K single value (although I think this is extremely silly, _nobody_ will notice that two docs are out of order because they were identical up until the 30,000th character but the 30,001st character isn't sorted correctly), do as Mike suggests and chop it off before send

Re: dv field is too large

2016-07-06 Thread Sheng
You misunderstand. I have many fields, and unfortunately a few of them are quite big, i.e. exceeding the 32k limit. In order to make these "big" fields sortable, they have to be stored as SortedDocValueField. Or is that wrong, and one can actually sort the search result by a "big" field without indexin

Re: dv field is too large

2016-07-06 Thread Michael McCandless
Yes, or you could get the utf8 bytes yourself client side and check that length.

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 6:16 PM, Sheng wrote:
> Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
> characters a payload string can carry?
>
> On We
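
The client-side check suggested above can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and the 32766-byte constant is an assumption about the exact limit behind the "32k" mentioned in this thread, so verify it against the Lucene version in use.

```java
import java.nio.charset.StandardCharsets;

public final class DocValuesLengthCheck {
    // ASSUMPTION: exact byte limit behind the "32k" in this thread; verify for your Lucene version.
    static final int MAX_DV_BYTES = 32766;

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded value is within the assumed limit.
    static boolean fits(String value) {
        return utf8Length(value) <= MAX_DV_BYTES;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9"; // 4 chars, 5 UTF-8 bytes
        System.out.println(utf8Length(sample) + " bytes, fits=" + fits(sample));
    }
}
```

Checking the encoded length rather than the character count matters because a single Java char can take one, two, or three bytes in UTF-8.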

Re: dv field is too large

2016-07-06 Thread Erick Erickson
bq: In this case, we have to index a particular data structure which has bunch of fields and each of them is promised to be searchable and search-sortable to the user

If I'm reading this right, you have some structure. You say "each of them is promised to be searchable and search-sortable" It _so

Re: dv field is too large

2016-07-06 Thread Sheng
Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of characters a payload string can carry?

On Wednesday, July 6, 2016, Michael McCandless wrote:
> Maybe you could simply truncate the user-supplied values at 32 KB?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed
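
As a rough worked example of the arithmetic in this question: dividing the byte limit by the worst-case bytes per char gives a character count that is always safe, but it is a conservative lower bound rather than an exact limit, since ASCII-only strings can be about three times longer. The constants below are assumptions (32766 bytes for the limit, 3 for MAX_UTF8_BYTES_PER_CHAR, which is what I believe Lucene's UnicodeUtil defines), and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class SafeCharBound {
    // ASSUMPTION: worst-case UTF-8 bytes per Java char (a surrogate pair is 4 bytes over 2 chars).
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;

    // Largest char count that is guaranteed to fit in maxBytes whatever the content.
    static int safeCharCount(int maxBytes) {
        return maxBytes / MAX_UTF8_BYTES_PER_CHAR;
    }

    // Actual encoded size, for comparison with the worst-case estimate.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int maxBytes = 32766; // ASSUMPTION: exact value behind "32k"
        // Integer division: 32766 / 3
        System.out.println("always-safe char count: " + safeCharCount(maxBytes));
    }
}
```

A string longer than this bound may still fit, so the bound is useful for a cheap pre-check while the encoded byte length remains the authoritative test.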

Re: dv field is too large

2016-07-06 Thread Sheng
To be clear, the "field" is indeed tokenized, and it is accompanied by a SortedDocValueField so that it is sortable too. Am I making the wrong assumption here?

On Wednesday, July 6, 2016, Sheng wrote:
> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I
> have

Re: dv field is too large

2016-07-06 Thread Michael McCandless
Maybe you could simply truncate the user-supplied values at 32 KB?

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 5:55 PM, Sheng wrote:
> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I have
> to keep the old feature even though it mak
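
One way the truncation suggested above might look, as a minimal sketch: encode the value as UTF-8, cut at the byte limit, and back up so that a multi-byte character is never split. The class and method names are hypothetical, and the 32766-byte figure in the example is an assumption about the exact limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class Utf8Truncate {
    // Returns the UTF-8 bytes of value, cut to at most maxBytes (assumed >= 0)
    // without splitting a multi-byte character.
    static byte[] truncateUtf8(String value, int maxBytes) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return bytes;
        }
        int end = maxBytes;
        // UTF-8 continuation bytes have the form 10xxxxxx. If the first dropped
        // byte is one, the cut falls inside a character, so back up to its start.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(bytes, end);
    }

    public static void main(String[] args) {
        int maxBytes = 32766; // ASSUMPTION: exact value behind "32 KB"
        byte[] sortKey = truncateUtf8("some very long user-supplied value", maxBytes);
        System.out.println(sortKey.length + " bytes kept");
    }
}
```

The resulting bytes could then be used as the sortable value while the full text is indexed separately. Sort order is affected only for values that are identical across the whole retained prefix.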

Re: dv field is too large

2016-07-06 Thread Sheng
Hi Eric, I am refactoring a legacy system. One of the most annoying things is I have to keep the old feature even though it makes little sense. In this case, we have to index a particular data structure which has bunch of fields and each of them is promised to be searchable and search-sortable to

Re: dv field is too large

2016-07-06 Thread Erick Erickson
Is this an "XY" problem? Meaning, why do you need DV fields larger than 32K? You can't search it as text as it's not tokenized. Faceting and sorting by a 32K field doesn't seem very useful. You may have a perfectly valid reason, but it's not obvious what use-case you're serving from this thread so

Re: dv field is too large

2016-07-06 Thread Sheng
Mike - Thanks for the prompt response. Is there a way to bypass this constraint for SortedDocValueField? Or do we have to live with it, meaning no fix even in a future release?

On Wednesday, July 6, 2016, Michael McCandless wrote:
> I believe only binary DVs can be larger than 32K bytes.
>
> Mike Mc

Re: dv field is too large

2016-07-06 Thread Michael McCandless
I believe only binary DVs can be larger than 32K bytes.

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 10:31 AM, Sheng wrote:
> Hi,
>
> I am getting an IAE indicating one of the SortedDocValueField is too large,
> > 32k
>
> I googled a bit, and it seems like #Lucene-4583