Actual min and max-value of NumericField during codec flush

2014-02-06 Thread Ravikumar Govindarajan
I use a Codec to flush data. All methods delegate to the actual Lucene42Codec,
except that I intercept a single field. This field is indexed as an
IntField [Numeric-Trie...], with precisionStep=4.

The purpose of the Codec is as follows:

1. Note the first BytesRef for this field
2. During the finish() call [TermsConsumer.java], note the last BytesRef for
this field
3. Convert both the first/last BytesRef to the respective integers
4. Store these 2 ints in the segment-info diagnostics

The problem with this approach is that the first/last BytesRefs are totally
different from the actual "int" values I try to index. I guess this is
because Numeric-Trie explodes every integer into its own format of
BytesRefs. Hence my Codec stores the wrong values in the segment diagnostics.

Is there a way I can record the actual min/max int values correctly in my codec
and still support NumericRange search?

--
Ravi


Re: Actual min and max-value of NumericField during codec flush

2014-02-06 Thread Michael McCandless
Somewhere in those numeric trie terms are the exact integers from your
documents, encoded.

You can use oal.util.NumericUtils.prefixCodedToInt to get the int
value back from the BytesRef term.

But you need to filter out the "higher level" (lower-precision) terms, e.g.
using NumericUtils.getPrefixCodedIntShift(term) == 0.  Or use
NumericUtils.filterPrefixCodedInts to wrap a TermsEnum (getPrefixCodedLongShift
and filterPrefixCodedLongs are the LongField equivalents).  I believe all
the terms you want come first, so once you hit a term whose shift is > 0,
the term just before it is your max term and you can stop checking.
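To make the filtering above concrete, here is a Lucene-free Java model of the trie terms for an IntField with precisionStep=4. It is a sketch under stated assumptions: the byte layout (a first byte of 0x60 + shift, then 7 bits per byte of the sign-flipped value) is modeled on Lucene 4.x's NumericUtils and may differ in detail, and the class and helper names are hypothetical. What it demonstrates is the ordering argument: because the shift is encoded in the first byte, every full-precision (shift 0) term sorts before any lower-precision term, so the first shift-0 term is the minimum and the last one before the first shift > 0 term is the maximum. In a real codec you would call NumericUtils.getPrefixCodedIntShift and NumericUtils.prefixCodedToInt instead of these helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Lucene-free model of trie-encoded int terms (hypothetical helper class).
 * ASSUMPTION: byte layout modeled on Lucene 4.x NumericUtils: first byte is
 * 0x60 + shift, followed by the sign-flipped value packed 7 bits per byte.
 */
public class TrieMinMax {
    static final int SHIFT_START_INT = 0x60;

    /** Encodes one trie term for value at the given shift. */
    static byte[] encode(int value, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        byte[] term = new byte[nChars + 1];
        term[0] = (byte) (SHIFT_START_INT + shift);
        // Flipping the sign bit makes unsigned byte order match signed int order.
        int sortableBits = (value ^ 0x80000000) >>> shift;
        for (int i = nChars; i > 0; i--) {
            term[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return term;
    }

    static int shiftOf(byte[] term) {
        return term[0] - SHIFT_START_INT;
    }

    /** Decodes a term back to an int (exact only when shift == 0). */
    static int decode(byte[] term) {
        int sortableBits = 0;
        for (int i = 1; i < term.length; i++) {
            sortableBits = (sortableBits << 7) | term[i];
        }
        return (sortableBits << shiftOf(term)) ^ 0x80000000;
    }

    /** All trie terms for the values, sorted like BytesRef (unsigned bytes). */
    static List<byte[]> sortedTerms(int[] values, int precisionStep) {
        List<byte[]> terms = new ArrayList<>();
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                terms.add(encode(v, shift));
            }
        }
        terms.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return terms;
    }

    /** Returns {min, max} from the shift-0 terms, stopping at the first shift > 0. */
    static int[] minMax(List<byte[]> sortedTerms) {
        boolean found = false;
        int min = 0;
        int max = 0;
        for (byte[] term : sortedTerms) {
            if (shiftOf(term) > 0) {
                break; // lower-precision terms sort after every full-precision term
            }
            int v = decode(term);
            if (!found) {
                min = v; // first full-precision term is the smallest value
                found = true;
            }
            max = v; // last full-precision term seen so far
        }
        if (!found) {
            throw new IllegalStateException("no full-precision terms");
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] indexed = {42, -7, 1_000_000, 3, -123_456};
        int[] mm = minMax(sortedTerms(indexed, 4));
        System.out.println("min=" + mm[0] + " max=" + mm[1]);
    }
}
```

With the sample values, main prints min=-123456 max=1000000. The model also suggests why the original codec recorded wrong values: the very last term in sorted order is a lowest-precision (shift 28) term, not the largest indexed value.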

BTW, in 5.0, the codec API for PostingsFormat has improved, so that
you can e.g. pull your own TermsEnum and iterate the terms yourself.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 6, 2014 at 5:16 AM, Ravikumar Govindarajan
 wrote:

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Wildcard searches

2014-02-06 Thread Allison, Timothy B.
Ditto Jack on ComplexPhraseQueryParser.

See also: https://issues.apache.org/jira/i#browse/LUCENE-5205

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, February 05, 2014 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: Wildcard searches

Take a look at the complex phrase query parser.

See:
http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html

See also:
https://issues.apache.org/jira/browse/LUCENE-1486

-- Jack Krupansky

-Original Message- 
From: raghavendra.k@barclays.com
Sent: Wednesday, February 5, 2014 6:30 PM
To: java-user@lucene.apache.org
Subject: Wildcard searches

Hi,

Can Lucene support wildcard searches such as the ones shown below?

Indexed value is "XYZ CORPORATION LIMITED".

XYZ CORPORATION LIMI*
XYZ CORPORATION *MIT*
XYZ *PORAT* LIMI*
*YZ CORPO* LIMITE*

In other words, the flexibility for the user to provide a wild card at any 
position, in a situation where they aren't sure about the exact value. 
Ignoring the performance aspect, please suggest if it is even possible. If 
yes, please provide further inputs on how to approach it such as Analyzer / 
Tokenizer to consider, whether PhraseQueries can be formed etc.

Any input is greatly appreciated.

Regards,
Raghu


___

This message is for information purposes only, it is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is not intended for retail customer use. Intended for 
recipient only. This message is subject to the terms at: 
www.barclays.com/emaildisclaimer.

For important disclosures, please see: 
www.barclays.com/salesandtradingdisclaimer regarding market commentary from 
Barclays Sales and/or Trading, who are active market participants; and in 
respect of Barclays Research, including disclosures relating to specific 
issuers, please see http://publicresearch.barclays.com.

___ 


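To make the semantics of Jack's suggestion concrete: ComplexPhraseQueryParser lets a quoted phrase contain wildcard terms, so "XYZ *PORAT* LIMI*" asks for adjacent tokens matching those patterns in order. The Lucene-free Java model below only illustrates that matching rule against a single stored value; it is not how Lucene executes the query (Lucene expands each wildcard against the terms in the index). The whitespace tokenization, lowercasing on both sides, and slop of 0 are assumptions standing in for a real Analyzer and parser settings, and the class name is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical, Lucene-free model of a phrase whose terms may contain wildcards.
 * ASSUMPTIONS: whitespace tokenization, lowercasing, '*' = any run of
 * characters, '?' = exactly one character, slop 0 (adjacent and in order).
 */
public class WildcardPhraseModel {

    /** Converts one wildcard term into an anchored regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    /** True if consecutive tokens of indexedValue match the query terms in order. */
    static boolean phraseMatches(String indexedValue, String query) {
        String[] tokens = indexedValue.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] terms = query.trim().split("\\s+");
        Pattern[] patterns = new Pattern[terms.length];
        for (int i = 0; i < terms.length; i++) {
            patterns[i] = toRegex(terms[i]);
        }
        // Try every start position; all patterns must match adjacent tokens.
        for (int start = 0; start + patterns.length <= tokens.length; start++) {
            boolean ok = true;
            for (int j = 0; j < patterns.length && ok; j++) {
                ok = patterns[j].matcher(tokens[start + j]).matches();
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "XYZ CORPORATION LIMITED";
        String[] queries = {
            "XYZ CORPORATION LIMI*", "XYZ CORPORATION *MIT*",
            "XYZ *PORAT* LIMI*", "*YZ CORPO* LIMITE*", "XYZ LIMI*"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + phraseMatches(value, q));
        }
    }
}
```

Under this model all four of Raghu's example queries match "XYZ CORPORATION LIMITED", while "XYZ LIMI*" does not, because its terms are not adjacent.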



Re: Highlighting text, do I seriously have to reimplement this from scratch?

2014-02-06 Thread Michael Sokolov

On 2/6/2014 12:53 AM, Earl Hood wrote:
> On Tue, Feb 4, 2014 at 6:05 PM, Michael Sokolov wrote:
>
>> Thanks for the feedback.  I think it's difficult to know what to do about
>> attribute value highlighting in the general case - do you have any
>> suggestions?
>
> That is a challenging one since one has to know how attribute data will
> be transformed for rendering purposes.
>
> I do not know the workings of Lux, so I cannot provide any specific
> suggestions on what Lux can do.  I would need time to dive into it.
>
> However, one solution is to work around the limitation by preprocessing
> the data into a form that is friendly to Lux (or at least the highlighter).
> For example, if I have attribute data I know will be transformed into
> renderable content, I would transform it into element-style content,
> which should be more friendly for indexing and highlighting purposes.

Lux's XmlHighlighter wraps matching text in an XML element tag.  The 
name of the tag is configurable.  But it won't work for attribute values 
since XML doesn't allow "<" in an attribute value.  I think Olivier's 
suggestion of providing a callback is interesting; that way we can 
provide the user much greater control, and the "highlighter" can 
actually become more of a query-driven document-processing engine: you 
could imagine fairly complex document transformations driven by Lucene 
query matching.
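Olivier's callback idea can be sketched as follows: the highlighter only locates matches and hands each one to a caller-supplied function, which decides how to mark it. This is a hypothetical Java sketch, not Lux's XmlHighlighter API; the class name, the space-separated tokenization, and the case-insensitive exact-term matching are assumptions for illustration. It shows why a callback helps with attribute values: element text can be wrapped in a tag, while inside an attribute value, where a literal "<" is not allowed, the caller can pick a different marker.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Hypothetical callback-driven highlighter (NOT Lux's API).
 * The highlighter finds matches; the callback decides how each is rendered.
 */
public class CallbackHighlighter {

    /** Passes every space-separated token found in matchTerms through onMatch. */
    static String highlight(String text, Set<String> matchTerms, UnaryOperator<String> onMatch) {
        String[] tokens = text.split(" ", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String tok = tokens[i];
            // Naive exact-term match; a real highlighter would reuse the query's analysis.
            boolean hit = matchTerms.contains(tok.toLowerCase(Locale.ROOT));
            out.append(hit ? onMatch.apply(tok) : tok);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = Set.of("lucene");
        // Element text: wrapping the match in a tag is well-formed XML.
        System.out.println(highlight("Apache Lucene rocks", terms, t -> "<hi>" + t + "</hi>"));
        // Attribute value: a literal '<' is not allowed there, so use another marker.
        System.out.println(highlight("Apache Lucene rocks", terms, t -> "[" + t + "]"));
    }
}
```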


I created http://issues.luxdb.org/browse/LUX-73 to track that.  If 
anybody is interested in continuing this discussion, I'd suggest picking 
it up over on Lux's mailing list at lu...@luxdb.org since this seems a 
little off topic here.


-Mike




Regarding CorruptedIndexException in using Lucene Facet Search

2014-02-06 Thread Jebarlin Robertson
Hi,

I am using Lucene 3.6 for indexing and searching in an Android
application.
I have implemented facet search, but when I search, the following
exception is thrown while creating the DirectoryTaxonomyReader object:

02-06 21:00:58.082: W/System.err(15518): org.apache.lucene.index.CorruptIndexException: Missing parent data for category 1


Can anyone help me understand what the problem is? Are the categories
not being added to the Lucene index, or is it some other problem?


It would help if somebody could provide some sample code for using the
Lucene 3.6 facet module.


-- 
Thanks & Regards,
Jebarlin Robertson.R
GSM: 91-9538106181.


Re: Regarding CorruptedIndexException in using Lucene Facet Search

2014-02-06 Thread Shai Erera
It looks like something's wrong with the index indeed. Are you sure you
committed both the IndexWriter and TaxoWriter?
Do you have some sort of testcase / short program which demonstrates the
problem?

I know there were a few issues running Lucene on Android, so I cannot
guarantee it works fully; we never tested this code on Android.

Shai


On Thu, Feb 6, 2014 at 3:21 PM, Jebarlin Robertson wrote:



RE: Wildcard searches

2014-02-06 Thread raghavendra.k.rao
Thank you, Tim.

I have read that ComplexPhraseQueryParser has issues while searching in more 
than one field. In my case, I need to search the value in multiple fields of 
each document.

Do you think it is possible? Also, could you please direct me to any useful 
links for ComplexPhraseQueryParser that you may be aware of? I am looking for 
some examples. Thanks!

Regards,
Raghu


-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Thursday, February 06, 2014 8:02 AM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches




RE: Wildcard searches

2014-02-06 Thread Allison, Timothy B.
Sorry, you're right.  I'm not sure that it analyzes multiterm components, 
either.  The Surround query parser also has similar limitations.

Best bet might be to compile: 
https://issues.apache.org/jira/i#browse/LUCENE-5205 or 
https://issues.apache.org/jira/browse/LUCENE-1486 .


-Original Message-
From: raghavendra.k@barclays.com [mailto:raghavendra.k@barclays.com] 
Sent: Thursday, February 06, 2014 11:49 AM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches




RE: Wildcard searches

2014-02-06 Thread raghavendra.k.rao
Sorry, but I don't understand exactly what you mean by compiling from these
locations. Do you mean I could download and customize the code?

Regards,
Raghu


-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Thursday, February 06, 2014 2:35 PM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches




RE: Wildcard searches

2014-02-06 Thread Allison, Timothy B.
Yes, if you're game to try it, that is one option.  If svn and build tools (mvn) 
are new to you, there are some startup costs to getting it up and running (e.g. 
http://stackoverflow.com/a/10073686/2938927).

This might help:
http://lucene.472066.n3.nabble.com/How-can-I-compile-and-debug-Solr-from-source-code-td4049712.html

Or this for Windows 7/8: 
http://wiki.apache.org/solr/HowToCompileSolr 

I might also post jars for LUCENE-5205 and SOLR-5410 on GitHub if there is 
interest.

-Original Message-
From: raghavendra.k@barclays.com [mailto:raghavendra.k@barclays.com] 
Sent: Thursday, February 06, 2014 5:19 PM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches


Re: Regarding CorruptedIndexException in using Lucene Facet Search

2014-02-06 Thread Jebarlin Robertson
Dear Shai,

Thank you for your reply.

Actually, I am using Lucene 3.6 on Android, and it is working fine; with the
latest versions there are some issues.
I have now added the facet search library alongside the existing Lucene
code.
Since this facet search integration, I get these CorruptIndexException and
NullPointerException errors when I add the Document object to the IndexWriter.

Below is the exception.

02-07 12:38:11.006: W/System.err(5411): java.lang.NullPointerException
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CategoryParentsStream.incrementToken(CategoryParentsStream.java:138)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CountingListTokenizer.incrementToken(CountingListTokenizer.java:63)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CategoryTokenizer.incrementToken(CategoryTokenizer.java:48)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:141)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)
02-07 12:38:11.006: W/System.err(5411): at com.example.lucene.threads.AsyncIndexWriter.addDocumentSynchronous(AsyncIndexWriter.java:343)
02-07 12:38:11.006: W/System.err(5411): at com.example.lucene.threads.AsyncIndexWriter.addDocumentSync(AsyncIndexWriter.java:371)


Please let me know if I am missing something.

Thanks and regards,
Jebarlin.R


On Thu, Feb 6, 2014 at 11:04 PM, Shai Erera  wrote:




-- 
Thanks & Regards,
Jebarlin Robertson.R
GSM: 91-9538106181.


Re: Actual min and max-value of NumericField during codec flush

2014-02-06 Thread Ravikumar Govindarajan
Thanks Mike,

I will try your suggestion. Let me also describe the actual use case.

There is a requirement for merging time-adjacent segments [append-only,
rolling time-series data].

Every document has a timestamp, and during flush I need to record, through
the Codec, the least timestamp across all documents in the segment.

Then I define a TimeMergePolicy that extends LogMergePolicy and defines the
segment size as Long.MAX_VALUE - SEG_LEAST_TIME [read from the segment
diagnostics].

LogMergePolicy will then automatically arrange segments into levels according
to time and proceed with merges. The latest segments will be smaller in size
and preferred for merging over older, bigger segments.

Do you think such an approach will work, or are there better ways to
solve this?
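The proposed size mapping can be modeled without Lucene to see how segments would order. This is a hypothetical illustration, not an actual MergePolicy: the class, the sample segment names and timestamps, and the level formula log(size) / log(mergeFactor), used as a rough stand-in for how LogMergePolicy groups segments into levels, are all assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical, Lucene-free model of the proposed TimeMergePolicy "size":
 * size = Long.MAX_VALUE - leastTimestamp, so newer segments look smaller.
 * ASSUMPTION: level ~ log(size) / log(mergeFactor), roughly as LogMergePolicy does.
 */
public class TimeSizeModel {
    static final class Segment {
        final String name;
        final long leastTimestampMillis; // what the codec would store in diagnostics

        Segment(String name, long leastTimestampMillis) {
            this.name = name;
            this.leastTimestampMillis = leastTimestampMillis;
        }
    }

    static long timeSize(Segment s) {
        return Long.MAX_VALUE - s.leastTimestampMillis; // newer -> smaller "size"
    }

    static double level(long size, int mergeFactor) {
        return Math.log(size) / Math.log(mergeFactor);
    }

    /** Segments ordered from smallest "size" (newest) to largest (oldest). */
    static Segment[] bySize(Segment... segs) {
        Segment[] sorted = segs.clone();
        Arrays.sort(sorted, Comparator.comparingLong(TimeSizeModel::timeSize));
        return sorted;
    }

    public static void main(String[] args) {
        Segment[] sorted = bySize(
            new Segment("_0", 1_391_600_000_000L),
            new Segment("_1", 1_391_603_600_000L),  // one hour later
            new Segment("_2", 1_391_690_000_000L)); // about a day later
        for (Segment s : sorted) {
            System.out.printf("%s size=%d level=%.9f%n", s.name, timeSize(s), level(timeSize(s), 10));
        }
    }
}
```

Sorting by this size puts the newest segment first, as intended. Note, however, that with epoch-millisecond timestamps the subtraction only changes the low-order digits of Long.MAX_VALUE, so under the assumed level formula these three segments get levels that differ by less than 1e-11; it may be worth checking how the merge policy actually buckets such sizes before relying on it.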

--
Ravi


On Thu, Feb 6, 2014 at 4:34 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Somewhere in those numeric trie terms are the exact integers from your
> documents, encoded.
>
> You can use oal.util.NumericUtils.prefixCodecToInt to get the int
> value back from the BytesRef term.
>
> But you need to filter out the "higher level" terms, e.g. using
> NumericUtils.getPrefixCodedLongShift(term) == 0.  Or use
> NumericUtils.filterPrefixCodedLongs to wrap a TermsEnum.  I believe
> all the terms you want come first, so once you hit a term where
> .getPrefixCodedLongShift is > 0, that's your max term and you can stop
> checking.
>
> BTW, in 5.0, the codec API for PostingsFormat has improved, so that
> you can e.g. pull your own TermsEnum and iterate the terms yourself.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 6, 2014 at 5:16 AM, Ravikumar Govindarajan wrote:
> > I use a Codec to flush data. All methods delegate to the actual Lucene42Codec,
> > except for intercepting one single field. This field is indexed as an
> > IntField [Numeric-Trie...], with precisionStep=4.
> >
> > The purpose of the Codec is as follows:
> >
> > 1. Note the first BytesRef for this field
> > 2. During the finish() call [TermsConsumer.java], note the last BytesRef for
> > this field
> > 3. Convert both the first/last BytesRef to the respective integers
> > 4. Store these 2 ints in the segment-info diagnostics
> >
> > The problem with this approach is that the first/last BytesRef values are totally
> > different from the actual "int" values I try to index. I guess this is
> > because Numeric-Trie explodes all the integers into its own format of
> > BytesRefs. Hence my Codec stores the wrong values in the segment diagnostics.
> >
> > Is there a way I can record the actual min/max int values correctly in my
> > codec and still support NumericRange search?
> >
> > --
> > Ravi
>

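To make Mike's suggestion concrete without needing Lucene on the classpath, here is a plain-Java simulation of the int trie encoding. The byte layout (a leading byte of 0x60 + shift, then the sign-flipped value shifted right, 7 bits per byte) is reconstructed from memory of Lucene 4.x's NumericUtils, so treat it as an assumption; real code should call NumericUtils directly. Because Ravi's field is an IntField, the int variants (NumericUtils.getPrefixCodedIntShift, NumericUtils.filterPrefixCodedInts) would presumably apply rather than the Long ones mentioned above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simulation of Lucene 4.x-style int trie terms (hypothetical names, illustration only).
final class TrieMinMax {
    static final int SHIFT_START_INT = 0x60; // assumed first byte = SHIFT_START_INT + shift

    // Encode one trie term: shift byte, then (val ^ sign bit) >>> shift in 7-bit chunks.
    static byte[] encode(int val, int shift) {
        int nChars = (((31 - shift) * 37) >> 8) + 1; // number of 7-bit chunks needed
        byte[] term = new byte[nChars + 1];
        term[0] = (byte) (SHIFT_START_INT + shift);
        int sortableBits = (val ^ 0x80000000) >>> shift; // flip sign so bytes sort like signed ints
        for (int i = nChars; i > 0; i--) {
            term[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return term;
    }

    static int shiftOf(byte[] term) {
        return term[0] - SHIFT_START_INT;
    }

    // Inverse of encode; lower-precision terms come back with their low bits cleared.
    static int decode(byte[] term) {
        int sortableBits = 0;
        for (int i = 1; i < term.length; i++) {
            sortableBits = (sortableBits << 7) | term[i];
        }
        return (sortableBits << shiftOf(term)) ^ 0x80000000;
    }

    // Unsigned lexicographic byte order, the order in which BytesRef terms are sorted.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // All distinct trie terms the field would contribute for these values, in term order.
    static List<byte[]> sortedTerms(int[] values, int precisionStep) {
        TreeSet<byte[]> terms = new TreeSet<byte[]>(TrieMinMax::compareUnsigned);
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                terms.add(encode(v, shift));
            }
        }
        return new ArrayList<byte[]>(terms);
    }

    // Full-precision (shift == 0) terms sort first: the first is the min, and the
    // last one before any lower-precision term is the max.
    static int[] minMax(List<byte[]> sortedTerms) {
        Integer min = null;
        int max = 0;
        for (byte[] term : sortedTerms) {
            if (shiftOf(term) > 0) break;
            int v = decode(term);
            if (min == null) min = v;
            max = v;
        }
        if (min == null) throw new IllegalStateException("no full-precision terms");
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] mm = minMax(sortedTerms(new int[] {42, -7, 1000, 3}, 4));
        System.out.println("min=" + mm[0] + " max=" + mm[1]); // prints "min=-7 max=1000"
    }
}
```

The last term overall is a shift-28 term, which is why taking the raw last BytesRef in finish() gives a wrong value; stopping at the first term with a nonzero shift avoids that.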

Re: Regarding CorruptedIndexException in using Lucene Facet Search

2014-02-06 Thread Jebarlin Robertson
Dear Shai,

It was my mistake: I was using the same directory for both the IndexWriter
and the taxonomy writer. Now it is working fine. Thank you :)
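The fix described here is to keep the main index and the taxonomy in separate directories. The helper below is a hypothetical plain-Java guard, not part of the Lucene facet API: it derives two sibling directories and refuses to continue if they resolve to the same path. In a 3.6 app, each File would then presumably be opened (e.g. with FSDirectory.open) for the IndexWriter and the DirectoryTaxonomyWriter respectively; it uses only java.io so it should also work on Android.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical guard: the main index and the taxonomy must live in different directories.
final class FacetDirs {
    // Assumed layout: <base>/index for the main index, <base>/taxonomy for the facet taxonomy.
    static File[] layout(File baseDir) {
        File indexDir = new File(baseDir, "index");
        File taxoDir = new File(baseDir, "taxonomy");
        requireDistinct(indexDir, taxoDir);
        return new File[] {indexDir, taxoDir};
    }

    // Compare canonical paths so two spellings of the same location are treated as equal.
    static void requireDistinct(File indexDir, File taxoDir) {
        String a = canonical(indexDir);
        String b = canonical(taxoDir);
        if (a.equals(b)) {
            throw new IllegalArgumentException(
                    "IndexWriter and taxonomy writer must not share a directory: " + a);
        }
    }

    static String canonical(File f) {
        try {
            return f.getCanonicalPath();
        } catch (IOException e) {
            throw new IllegalStateException("cannot resolve " + f, e);
        }
    }

    public static void main(String[] args) {
        File[] dirs = layout(new File("facet-demo"));
        System.out.println("index=" + dirs[0] + " taxonomy=" + dirs[1]);
    }
}
```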

Could you please tell me whether there is any major performance difference
between the 3.6 and 4.x facet libraries?
Since I use Lucene 3.6, I am also using the same version of the facet
library.

Kindly advise me which one is the best to use and works reliably. :)
Thank you :)


Thanks and Regards,
Jebarlin Robertson.R



On Fri, Feb 7, 2014 at 12:41 PM, Jebarlin Robertson wrote:

> Dear Shai,
>
> Thank you for your reply.
>
> Actually, I am using Lucene 3.6 on Android. It works fine, but with the
> latest versions there are some issues.
> Now I have added this facet search library along with the old Lucene
> code.
> Since this facet search integration, it throws these CorruptIndexException and
> NullPointerException errors when I add the document object to the IndexWriter.
>
> Below is the exception.
>
> 02-07 12:38:11.006: W/System.err(5411): java.lang.NullPointerException
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CategoryParentsStream.incrementToken(CategoryParentsStream.java:138)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CountingListTokenizer.incrementToken(CountingListTokenizer.java:63)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CategoryTokenizer.incrementToken(CategoryTokenizer.java:48)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:141)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
> 02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)
> 02-07 12:38:11.006: W/System.err(5411): at com.example.lucene.threads.AsyncIndexWriter.addDocumentSynchronous(AsyncIndexWriter.java:343)
> 02-07 12:38:11.006: W/System.err(5411): at com.example.lucene.threads.AsyncIndexWriter.addDocumentSync(AsyncIndexWriter.java:371)
>
>
> Please help if I am missing something.
>
> Thanks and regards,
> Jebarlin.R
>
>
> On Thu, Feb 6, 2014 at 11:04 PM, Shai Erera wrote:
>
>> It looks like something's wrong with the index indeed. Are you sure you
>> committed both the IndexWriter and TaxoWriter?
>> Do you have some sort of testcase / short program which demonstrates the
>> problem?
>>
>> I know there were a few issues running Lucene on Android, so I cannot
>> guarantee it works fully; we never tested this code on Android.
>>
>> Shai
>>
>>
>> On Thu, Feb 6, 2014 at 3:21 PM, Jebarlin Robertson wrote:
>>
>> > Hi,
>> >
>> > I am using Lucene 3.6 for indexing and searching in an Android
>> > application.
>> > I have implemented facet search, but when I try to search, I get the
>> > exception below while creating the DirectoryTaxonomyReader object.
>> >
>> > 02-06 21:00:58.082: W/System.err(15518): org.apache.lucene.index.CorruptIndexException: Missing parent data for category 1
>> >
>> >
>> > Can anyone help me understand the cause of this? Are the categories
>> > not being added to the Lucene index, or is it some other problem?
>> >
>> >
>> > It would help if somebody could provide some sample code for using
>> > Lucene facets with version 3.6.
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Jebarlin Robertson.R
>> > GSM: 91-9538106181.
>> >
>>
>
>
>
>


