[
https://issues.apache.org/jira/browse/LUCENE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969292#comment-13969292
]
Uwe Schindler commented on LUCENE-5596:
---------------------------------------
There is one problem with the patch: Lucene currently encodes the shift value
in the indexed tokens (xxxToPrefixCoded) with some offset as "type marker"
(SHIFT_START_*; see e.g.,
[http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.6.0/org/apache/lucene/util/NumericUtils.java#149]).
This ensures that you hit no documents if you index a field as integer but
query it as long: because the shift is encoded differently, the term is not
found in the dictionary, so no documents are returned. With your patch,
one can index a BigInteger and it might return random hits if queried as a long
or int range. Unfortunately, the current "shift encoding" only supports the
fictional "short" and "byte" types, which are never used.
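To illustrate the type marker idea, here is a minimal, self-contained sketch
(not the actual NumericUtils code). The class and method names are
hypothetical, and the two constant values are assumptions meant to mirror
SHIFT_START_LONG and SHIFT_START_INT as found in Lucene 4.x; check them
against the linked source before relying on them.

```java
final class ShiftMarkerSketch {
    // Assumed values, intended to mirror NumericUtils.SHIFT_START_LONG and
    // NumericUtils.SHIFT_START_INT in Lucene 4.x.
    static final byte SHIFT_START_LONG = 0x20;
    static final byte SHIFT_START_INT = 0x60;

    // Leading byte of a prefix-coded long term: type offset plus shift (0..63).
    static byte longMarker(int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        return (byte) (SHIFT_START_LONG + shift);
    }

    // Leading byte of a prefix-coded int term: type offset plus shift (0..31).
    static byte intMarker(int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be 0..31");
        }
        return (byte) (SHIFT_START_INT + shift);
    }

    public static void main(String[] args) {
        // The same shift yields a different leading byte per type, so a term
        // indexed as int can never equal a term looked up as long.
        System.out.println(longMarker(0) + " vs " + intMarker(0));
        // With these offsets the two marker ranges do not overlap.
        System.out.println("long: " + longMarker(0) + ".." + longMarker(63));
        System.out.println("int:  " + intMarker(0) + ".." + intMarker(31));
    }
}
```

Under these assumed offsets the long markers occupy 0x20..0x5F and the int
markers 0x60..0x7F, which leaves nothing in the positive byte range for a
wider type.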
A second problem is: You are limited to a maximum shift of 127 (or 255 if you
correctly mask the shift byte) currently, otherwise the encoding overflows.
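The limit can be seen in a small sketch that assumes the shift is stored in a
single byte with no type offset. The patch's real layout is not shown here, so
the class and method names below are hypothetical.

```java
final class ShiftByteOverflowSketch {
    // Store the shift in one byte (assumption: no type offset is added).
    static byte storeShift(int shift) {
        return (byte) shift;
    }

    // Read the byte back as a signed value: only correct up to 127.
    static int readSigned(byte b) {
        return b;
    }

    // Read the byte back with masking: correct up to 255.
    static int readMasked(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(readSigned(storeShift(127))); // 127
        System.out.println(readSigned(storeShift(128))); // -128, wrong without masking
        System.out.println(readMasked(storeShift(128))); // 128
        System.out.println(readMasked(storeShift(255))); // 255
        System.out.println(readMasked(storeShift(256))); // 0, the byte has wrapped around
    }
}
```

Adding any type offset on top of the shift, as the current encoding does,
lowers these limits further.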
I am not sure how to handle this. The main problem is Lucene's schemaless
design (the index does not know the type of the field, except for stored
fields). The "shift encoding" with the type marker bits is just a hack around
that, so that we do not produce incorrect results.
Because of that, we should really do some investigation before starting to push
those changes in. Maybe make it work on Lucene trunk only and change the
index encoding completely.
> Support for index/search large numeric field
> --------------------------------------------
>
> Key: LUCENE-5596
> URL: https://issues.apache.org/jira/browse/LUCENE-5596
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Kevin Wang
> Assignee: Uwe Schindler
> Attachments: LUCENE-5596.patch, LUCENE-5596.patch
>
>
> Currently, if a number is larger than Long.MAX_VALUE, we can't index/search
> it in Lucene as a number. For example, an IPv6 address is a 128-bit number,
> so we can't index it as a numeric field and run numeric range queries etc.
> It would be good to support BigInteger / BigDecimal.
> I've tried using BigInteger for IPv6 in Elasticsearch and that works fine, but
> there are still lots of things to do:
> https://github.com/elasticsearch/elasticsearch/pull/5758