[jira] [Commented] (LUCENE-5596) Support for index/search large numeric field

Uwe Schindler (JIRA) Tue, 15 Apr 2014 00:20:13 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969292#comment-13969292
 ]


Uwe Schindler commented on LUCENE-5596:
---------------------------------------

There is one problem with the patch: Lucene currently encodes the shift value 
in the indexed tokens (xxxToPrefixCoded) with an offset that acts as a "type 
marker" (SHIFT_START_*; see e.g. 
[http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.6.0/org/apache/lucene/util/NumericUtils.java#149]).
This ensures that you hit no documents if you index a field as integer but 
query it as long: the differently encoded shift means the term is not found 
in the dictionary, so no documents are returned. With your patch, one can 
index a BigInteger and it might return random hits if queried with a long or 
int range. Unfortunately the current "shift encoding" only has room left for 
fictional "short" and "byte" types, which are never used.
A second problem: you are currently limited to a maximum shift of 127 (or 255 
if you correctly mask the shift byte); beyond that the encoding overflows.
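
To make both points concrete, here is a minimal, self-contained sketch of the 
idea. It is not the code from NumericUtils: the class and method names are 
made up for illustration, the term layout is simplified, and the marker 
values 0x20 and 0x60 are assumptions modeled on the SHIFT_START_* constants. 
It shows that a long term and an int term for the same value differ in their 
first byte, and that a header byte above 127 wraps around unless it is masked.

```java
import java.util.Arrays;

final class ShiftMarkerSketch {
    // Assumed type-marker offsets, modeled on the SHIFT_START_* constants.
    static final int SHIFT_START_LONG = 0x20;
    static final int SHIFT_START_INT = 0x60;

    /** Header byte: type marker plus shift, stored in one signed Java byte. */
    static byte header(int shiftStart, int shift) {
        return (byte) (shiftStart + shift);
    }

    /** Simplified term: header byte, then the shifted value in 7-bit groups. */
    static byte[] prefixCoded(long sortableBits, int valueBits, int shift, int shiftStart) {
        if (shift < 0 || shift >= valueBits) {
            throw new IllegalArgumentException("shift out of range: " + shift);
        }
        int groups = (valueBits - shift + 6) / 7; // 7-bit groups needed for the remaining bits
        byte[] term = new byte[groups + 1];
        term[0] = header(shiftStart, shift);
        long bits = sortableBits >>> shift;
        for (int i = groups; i >= 1; i--) {
            term[i] = (byte) (bits & 0x7f);
            bits >>>= 7;
        }
        return term;
    }

    static byte[] longTerm(long value, int shift) {
        // Flip the sign bit so that encoded terms sort in numeric order.
        return prefixCoded(value ^ Long.MIN_VALUE, 64, shift, SHIFT_START_LONG);
    }

    static byte[] intTerm(int value, int shift) {
        return prefixCoded((value ^ Integer.MIN_VALUE) & 0xffffffffL, 32, shift, SHIFT_START_INT);
    }

    public static void main(String[] args) {
        // Same value, different type: the header bytes (and lengths) differ,
        // so a lookup with the wrong type cannot match the indexed term.
        System.out.println(Arrays.toString(longTerm(42L, 0)));
        System.out.println(Arrays.toString(intTerm(42, 0)));
        System.out.println(Arrays.equals(longTerm(42L, 0), intTerm(42, 0)));

        // A 128-bit value needs shifts up to 127; adding any marker offset
        // pushes the header past what a signed byte can hold.
        System.out.println(header(0, 128));        // -128 (wrapped)
        System.out.println(header(0, 128) & 0xff); // 128 (masked read)
    }
}
```

In this sketch the long headers occupy 0x20..0x5f and the int headers 
0x60..0x7f, so there is no free range left for a wider type without changing 
the encoding.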

I am not sure how to handle this. The main problem is Lucene's schemaless 
design (the index does not know the type of the field, except for stored 
fields); the "shift encoding" with the type marker bits is just a hack around 
that, to avoid producing incorrect results.

Because of that we should really do some investigation before starting to push 
those changes in. Maybe make it work on Lucene trunk only and change the 
index encoding completely.

> Support for index/search large numeric field
> --------------------------------------------
>
>                 Key: LUCENE-5596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5596
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Kevin Wang
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-5596.patch, LUCENE-5596.patch
>
>
> Currently, if a number is larger than Long.MAX_VALUE, we can't index/search 
> it in Lucene as a number. For example, an IPv6 address is a 128-bit number, 
> so we can't index it as a numeric field and run numeric range queries etc.
> It would be good to support BigInteger / BigDecimal.
> I've tried using BigInteger for IPv6 in Elasticsearch and that works fine, 
> but there are still lots of things to do:
> https://github.com/elasticsearch/elasticsearch/pull/5758
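
As an illustration of the motivating case in the description above, this 
small sketch converts an IPv6 literal into a BigInteger and shows that it 
does not fit into a long. The class and method names are hypothetical; only 
standard JDK classes are used, and nothing here comes from the attached 
patch or from the Elasticsearch pull request.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class Ipv6AsBigInteger {
    /**
     * Interprets the raw address bytes as an unsigned big-endian integer.
     * Intended for IP literals only; a host name would trigger a DNS lookup.
     */
    static BigInteger toBigInteger(String ipLiteral) {
        try {
            byte[] raw = InetAddress.getByName(ipLiteral).getAddress();
            return new BigInteger(1, raw); // signum 1: treat the bytes as unsigned
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        BigInteger address = toBigInteger("2001:db8::1");
        BigInteger longMax = BigInteger.valueOf(Long.MAX_VALUE);
        System.out.println(address);
        System.out.println(address.compareTo(longMax) > 0); // true: does not fit into a long
    }
}
```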



--
This message was sent by Atlassian JIRA
(v6.2#6252)
