I agree, I'll improve the docs about this limit. Thanks Sheng.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 10:59 PM, Sheng wrote:
> I agree. That said, wouldn't it also make sense to clearly point it out by
> adding the comments to the corresponding classes. This is
I agree. That said, wouldn't it also make sense to point it out clearly by
adding comments to the corresponding classes? This is not the first
time I have run into this "magic number" pitfall when using Lucene
(e.g., the 1024 limit for the token length in early versions of Lucene).
Generally speak
Well, if you must sort on a 32K single value (although I think this is
extremely silly, _nobody_ will notice that two docs are out of order
because they were identical up until the 30,000th character but the
30,001st character isn't sorted correctly), do as Mike suggests and
chop it off before send
You misunderstand. I have many fields, and unfortunately a few of them are
quite big, i.e. exceeding the 32k limit. In order to make these "big"
fields sortable, they have to be stored as SortedDocValuesField. Or is that
wrong, and one can actually sort the search result by a "big" field without
indexin
Yes, or you could get the utf8 bytes yourself client side and check that
length.
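
A minimal sketch of that client-side check, using only the JDK. The class name, method names, and the 32766-byte constant are assumptions for illustration; the thread only says "32k", so verify the exact limit against your Lucene version.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthCheck {
    // Assumed limit in bytes; the thread only says "32k". Verify for your Lucene version.
    static final int ASSUMED_MAX_BYTES = 32766;

    // Number of bytes the value occupies once encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded value is no longer than maxBytes.
    static boolean fitsInLimit(String value, int maxBytes) {
        return utf8Length(value) <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abc"));     // 3: ASCII is one byte per char
        System.out.println(utf8Length("\u00e9"));  // 2: e with acute accent
        System.out.println(utf8Length("\u20ac"));  // 3: euro sign
        System.out.println(fitsInLimit("abc", ASSUMED_MAX_BYTES));
    }
}
```

Values that fail the check can then be rejected or shortened before they are added to the document.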
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 6:16 PM, Sheng wrote:
> Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
> characters a payload string can carry?
>
> On We
bq: In this case, we
have to index a particular data structure which has bunch of fields and
each of them is promised to be searchable and search-sortable to the user
If I'm reading this right, you have some structure. You say
"each of them is promised to be searchable and search-sortable"
It _so
Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
characters a payload string can carry?
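
As a rough illustration of that arithmetic: a Java `char` is a UTF-16 code unit and needs at most 3 bytes in UTF-8 (a supplementary character uses two chars for 4 bytes), so dividing the byte limit by 3 gives a count that is always safe, not an exact maximum. The sketch below assumes a 32766-byte limit and a worst case of 3 bytes per char; both constants and all names are illustrative, not taken from Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8CharBound {
    static final int ASSUMED_MAX_BYTES = 32766;      // assumption, see lead-in
    static final int WORST_CASE_BYTES_PER_CHAR = 3;  // worst case for one UTF-16 code unit

    // Number of chars that is guaranteed to fit, whatever the content.
    static int guaranteedChars(int maxBytes) {
        return maxBytes / WORST_CASE_BYTES_PER_CHAR;
    }

    // Helper: a string made of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int safe = guaranteedChars(ASSUMED_MAX_BYTES);
        System.out.println(safe);                                            // 10922
        // Worst case: every char takes 3 bytes, so the safe count exactly fills the limit.
        System.out.println(utf8Length(repeat('\u20ac', safe)));              // 32766
        // Best case: ASCII text can be three times longer and still fit.
        System.out.println(utf8Length(repeat('a', ASSUMED_MAX_BYTES)));      // 32766
    }
}
```

So the quotient tells you which strings certainly fit; a longer string may still fit, and only the encoded byte length settles it.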
On Wednesday, July 6, 2016, Michael McCandless
wrote:
> Maybe you could simply truncate the user-supplied values at 32 KB?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed
To be clear, the "field" is indeed tokenized, and it is accompanied by a
SortedDocValuesField so that it is sortable too. Am I making the wrong
assumption here?
On Wednesday, July 6, 2016, Sheng wrote:
> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I
> have
Maybe you could simply truncate the user-supplied values at 32 KB?
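
One way such truncation might look, sketched with the JDK only. It cuts on a code point boundary so that no multi-byte character is split; the class and method names are hypothetical, and the byte limit is passed in by the caller because the thread gives it only as "32 KB".

```java
class Utf8Truncate {
    // Returns the longest prefix of value whose UTF-8 encoding is at most maxBytes,
    // never cutting through the middle of a code point.
    static String truncateToUtf8Bytes(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // UTF-8 size of this code point. An unpaired surrogate is counted as 3 bytes,
            // which can only overestimate, so the result stays within the limit.
            int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + n > maxBytes) {
                break;
            }
            bytes += n;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(truncateToUtf8Bytes("abc\u20ac", 5)); // "abc": the euro sign needs 3 more bytes
        System.out.println(truncateToUtf8Bytes("abc\u20ac", 6)); // whole string fits
    }
}
```

The truncated copy would be used only for the sort value; the full text can still be indexed in the tokenized field.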
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 5:55 PM, Sheng wrote:
> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I have
> to keep the old feature even though it mak
Hi Eric,
I am refactoring a legacy system. One of the most annoying things is that I
have to keep the old feature even though it makes little sense. In this case, we
have to index a particular data structure which has a bunch of fields and
each of them is promised to be searchable and search-sortable to
Is this an "XY" problem? Meaning, why do you need DV fields larger than 32K?
You can't search it as text as it's not tokenized. Faceting and sorting by a 32K
field doesn't seem very useful. You may have a perfectly valid reason, but it's
not obvious what use-case you're serving from this thread so
Mike - Thanks for the prompt response. Is there a way to bypass this
constraint for SortedDocValuesField? Or do we have to live with it, meaning
no fix even in a future release?
On Wednesday, July 6, 2016, Michael McCandless
wrote:
> I believe only binary DVs can be larger than 32K bytes.
>
> Mike Mc
I believe only binary DVs can be larger than 32K bytes.
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jul 6, 2016 at 10:31 AM, Sheng wrote:
> Hi,
>
> I am getting an IAE indicating one of the SortedDocValuesField is too large,
> > 32k
>
> I googled a bit, and it seems like #Lucene-4583