Actual min and max-value of NumericField during codec flush
I use a Codec to flush data. All methods delegate to the actual Lucene42Codec, except that I intercept one single field. This field is indexed as an IntField [Numeric-Trie...], with precisionStep=4.

The purpose of the Codec is as follows:

1. Note the first BytesRef for this field
2. During the finish() call [TermsConsumer.java], note the last BytesRef for this field
3. Convert both the first and last BytesRef to their respective integers
4. Store these 2 ints in the segment-info diagnostics

The problem with this approach is that the first/last BytesRef is totally different from the actual "int" values I try to index. I guess this is because Numeric-Trie explodes all the integers into its own format of BytesRefs. Hence my Codec stores the wrong values in the segment diagnostics.

Is there a way I can record the actual min/max int values correctly in my codec and still support NumericRange search?

--
Ravi
Re: Actual min and max-value of NumericField during codec flush
Somewhere in those numeric trie terms are the exact integers from your documents, encoded.

You can use oal.util.NumericUtils.prefixCodedToInt to get the int value back from the BytesRef term.

But you need to filter out the "higher level" terms, e.g. using NumericUtils.getPrefixCodedIntShift(term) == 0 (the Int variants, since the field is an IntField). Or use NumericUtils.filterPrefixCodedInts to wrap a TermsEnum. I believe all the terms you want come first, so once you hit a term where the shift is > 0, the term just before it is your max term and you can stop checking.

BTW, in 5.0, the codec API for PostingsFormat has improved, so that you can e.g. pull your own TermsEnum and iterate the terms yourself.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Feb 6, 2014 at 5:16 AM, Ravikumar Govindarajan wrote:
> Is there a way I can record actual min/max int-values correctly in my codec
> and still support NumericRange search?

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
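The filtering idea in the reply above can be sketched in plain Java without Lucene. This is only a simplified model: TrieTerm, explode and minMax are hypothetical names, and the (shift, prefix) pair stands in for Lucene's real prefix-coded bytes. It assumes, as the reply suggests, that terms sort by shift first, so all full-precision terms come before the coarser ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for a prefix-coded trie term; NOT Lucene's byte format.
final class TrieTerm implements Comparable<TrieTerm> {
    final int shift;   // number of low bits dropped; 0 means full precision
    final int prefix;  // value >> shift

    TrieTerm(int shift, int prefix) {
        this.shift = shift;
        this.prefix = prefix;
    }

    // Assumed ordering: by shift first, then by value.
    @Override
    public int compareTo(TrieTerm other) {
        if (shift != other.shift) {
            return Integer.compare(shift, other.shift);
        }
        return Integer.compare(prefix, other.prefix);
    }
}

final class TrieMinMaxSketch {
    // Expand each value into one term per precision level, as a trie field does.
    static List<TrieTerm> explode(int[] values, int precisionStep) {
        List<TrieTerm> terms = new ArrayList<>();
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                terms.add(new TrieTerm(shift, v >> shift));
            }
        }
        Collections.sort(terms);
        return terms;
    }

    // Scan in term order; only shift == 0 terms carry the indexed values.
    static int[] minMax(List<TrieTerm> sortedTerms) {
        Integer min = null;
        Integer max = null;
        for (TrieTerm t : sortedTerms) {
            if (t.shift > 0) {
                break; // every full-precision term has been seen
            }
            if (min == null) {
                min = t.prefix;
            }
            max = t.prefix;
        }
        if (min == null) {
            throw new IllegalArgumentException("no full-precision terms");
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        List<TrieTerm> terms = explode(new int[] {42, 7, 1000, 7}, 4);
        int[] mm = minMax(terms);
        System.out.println("min=" + mm[0] + " max=" + mm[1]); // prints "min=7 max=1000"
        // The naive "last term" is a coarse term, not an indexed value.
        TrieTerm last = terms.get(terms.size() - 1);
        System.out.println("last term: shift=" + last.shift + " prefix=" + last.prefix);
    }
}
```

In a real codec the same scan would decode each BytesRef with the NumericUtils methods named in the reply; the sketch only shows why taking the very last term of the field gives a misleading value.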
RE: Wildcard searches
Ditto Jack on ComplexPhraseQueryParser.

See also: https://issues.apache.org/jira/i#browse/LUCENE-5205

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, February 05, 2014 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: Wildcard searches

Take a look at the complex phrase query parser. See:
http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html

See also: https://issues.apache.org/jira/browse/LUCENE-1486

-- Jack Krupansky

-----Original Message-----
From: raghavendra.k@barclays.com
Sent: Wednesday, February 5, 2014 6:30 PM
To: java-user@lucene.apache.org
Subject: Wildcard searches

Hi,

Can Lucene support wildcard searches such as the ones shown below?

Indexed value is "XYZ CORPORATION LIMITED".

XYZ CORPORATION LIMI*
XYZ CORPORATION *MIT*
XYZ *PORAT* LIMI*
*YZ CORPO* LIMITE*

In other words, the flexibility for the user to provide a wild card at any position, in a situation where they aren't sure about the exact value.

Ignoring the performance aspect, please suggest if it is even possible. If yes, please provide further inputs on how to approach it such as Analyzer / Tokenizer to consider, whether PhraseQueries can be formed etc.

Any input is greatly appreciated.

Regards,
Raghu

___
This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer.

For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com.
___
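To make the four example queries in the question above concrete, here is a plain-Java sketch of the matching behaviour being asked for. It involves no Lucene at all: it assumes whitespace tokenization, case-sensitive comparison, and a query treated as a phrase whose terms may each contain `*`. WildcardPhraseSketch and its methods are hypothetical names, and this says nothing about how ComplexPhraseQueryParser is implemented.

```java
import java.util.regex.Pattern;

final class WildcardPhraseSketch {
    // Turn one wildcard term into a regex: '*' matches any run of characters,
    // everything else is matched literally.
    static Pattern termPattern(String term) {
        String[] parts = term.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the query terms match some run of consecutive tokens, in order.
    static boolean matches(String indexedValue, String query) {
        String[] tokens = indexedValue.trim().split("\\s+");
        String[] terms = query.trim().split("\\s+");
        for (int start = 0; start + terms.length <= tokens.length; start++) {
            boolean all = true;
            for (int i = 0; i < terms.length && all; i++) {
                all = termPattern(terms[i]).matcher(tokens[start + i]).matches();
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "XYZ CORPORATION LIMITED";
        String[] queries = {
            "XYZ CORPORATION LIMI*",
            "XYZ CORPORATION *MIT*",
            "XYZ *PORAT* LIMI*",
            "*YZ CORPO* LIMITE*"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + matches(value, q));
        }
    }
}
```

A query parser that supports this has to expand every wildcard term against the index and then enforce the phrase positions, which is where the performance cost mentioned in the question comes from.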
Re: Highlighting text, do I seriously have to reimplement this from scratch?
On 2/6/2014 12:53 AM, Earl Hood wrote:
> On Tue, Feb 4, 2014 at 6:05 PM, Michael Sokolov wrote:
> > Thanks for the feedback. I think it's difficult to know what to do about attribute value highlighting in the general case - do you have any suggestions?
>
> That is a challenging one since one has to know how attribute data will be transformed for rendering purposes. I do not know the workings of Lux, so I cannot provide any specific suggestions on what Lux can do. I would need time to dive into it.
>
> However, one solution is to work around the limitation by preprocessing the data into a form that is friendly to Lux (or at least the highlighter). For example, if I have attribute data I know will be transformed into renderable content, I would transform it into element-style content, which should be more friendly for indexing and highlighting purposes.

Lux's XmlHighlighter wraps matching text in an XML element tag. The name of the tag is configurable. But it won't work for attribute values since XML doesn't allow "<" in an attribute value.

I think Olivier's suggestion of providing a callback is interesting; that way we can provide the user much greater control, and the "highlighter" can actually become more of a query-driven document-processing engine: you could imagine fairly complex document transformations driven by Lucene query matching. I created http://issues.luxdb.org/browse/LUX-73 to track that.

If anybody is interested in continuing this discussion, I'd suggest picking it up over on Lux's mailing list at lu...@luxdb.org since this seems a little off topic here.

-Mike
Regarding CorruptedIndexException in using Lucene Facet Search
Hi,

I am using Lucene 3.6 for indexing and searching in an Android application. I have implemented Facet search, but when I try to search, it gives the exception below while creating the DirectoryTaxonomyReader object.

02-06 21:00:58.082: W/System.err(15518): org.apache.lucene.index.CorruptIndexException: Missing parent data for category 1

Can anyone help me understand what the problem is? Is it that the categories are not added to the Lucene index, or is it some other problem?

It would be helpful if somebody could provide some sample code for using Lucene facets with version 3.6.

--
Thanks & Regards,
Jebarlin Robertson.R
GSM: 91-9538106181.
Re: Regarding CorruptedIndexException in using Lucene Facet Search
It looks like something's wrong with the index indeed. Are you sure you committed both the IndexWriter and TaxoWriter? Do you have some sort of testcase / short program which demonstrates the problem?

I know there were a few issues running Lucene on Android, so I cannot guarantee it works fully; we never tested this code on Android.

Shai

On Thu, Feb 6, 2014 at 3:21 PM, Jebarlin Robertson wrote:
> I have implemented Facet search. But when I try to search, it is giving the
> below exception while creating the DirectoryTaxonomyReader object.
>
> 02-06 21:00:58.082: W/System.err(15518):
> org.apache.lucene.index.CorruptIndexException: Missing parent data for
> category 1
RE: Wildcard searches
Thank you, Tim.

I have read that ComplexPhraseQueryParser has issues while searching in more than one field. In my case, I need to search the value in multiple fields of each document. Do you think it is possible?

Also, could you please direct me to any useful links for ComplexPhraseQueryParser that you may be aware of? I am looking for some examples.

Thanks!

Regards,
Raghu

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, February 06, 2014 8:02 AM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches

Ditto Jack on ComplexPhraseQueryParser. See also: https://issues.apache.org/jira/i#browse/LUCENE-5205
RE: Wildcard searches
Sorry, you're right. I'm not sure that it analyzes multiterm components, either. The Surround query parser also has similar limitations.

Best bet might be to compile:
https://issues.apache.org/jira/i#browse/LUCENE-5205
or
https://issues.apache.org/jira/browse/LUCENE-1486

-----Original Message-----
From: raghavendra.k@barclays.com [mailto:raghavendra.k@barclays.com]
Sent: Thursday, February 06, 2014 11:49 AM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches

Thank you, Tim. I have read that ComplexPhraseQueryParser has issues while searching in more than one field. In my case, I need to search the value in multiple fields of each document. Do you think it is possible?
RE: Wildcard searches
Sorry, but I don't know what exactly you mean by compile from these locations. Do you mean I could download and customize the code?

Regards,
Raghu

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, February 06, 2014 2:35 PM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches

Sorry, you're right. I'm not sure that it analyzes multiterm components, either. The Surround query parser also has similar limitations. Best bet might be to compile: https://issues.apache.org/jira/i#browse/LUCENE-5205 or https://issues.apache.org/jira/browse/LUCENE-1486 .
RE: Wildcard searches
Yes, if you're game to try it, that is one option. If svn and build tools (mvn) are new to you, there are some startup costs to getting it up and running (e.g. http://stackoverflow.com/a/10073686/2938927).

This might help: http://lucene.472066.n3.nabble.com/How-can-I-compile-and-debug-Solr-from-source-code-td4049712.html
Or this for Windows 7/8: http://wiki.apache.org/solr/HowToCompileSolr

I also might post jars for LUCENE-5205 and SOLR-5410 on GitHub if there is interest.

-----Original Message-----
From: raghavendra.k@barclays.com [mailto:raghavendra.k@barclays.com]
Sent: Thursday, February 06, 2014 5:19 PM
To: java-user@lucene.apache.org
Subject: RE: Wildcard searches

Sorry, but I don't know what exactly you mean by compile from these locations. Do you mean I could download and customize the code?
Re: Regarding CorruptedIndexException in using Lucene Facet Search
Dear Shai,

Thank you for your reply.

Actually I am using Lucene 3.6 on Android. It is working fine, but with the latest versions there are some issues. Now I have just added the Facet search library alongside the old Lucene code. After this Facet search integration, it gives the CorruptIndexException and a NullPointerException when I add the document object to the IndexWriter.

Below is the exception.

02-07 12:38:11.006: W/System.err(5411): java.lang.NullPointerException
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CategoryParentsStream.incrementToken(CategoryParentsStream.java:138)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CountingListTokenizer.incrementToken(CountingListTokenizer.java:63)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.facet.index.streaming.CategoryTokenizer.incrementToken(CategoryTokenizer.java:48)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:141)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
02-07 12:38:11.006: W/System.err(5411): at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)
02-07 12:38:11.006: W/System.err(5411): at com.example.lucene.threads.AsyncIndexWriter.addDocumentSynchronous(AsyncIndexWriter.java:343)
02-07 12:38:11.006: W/System.err(5411): at com.example.lucene.threads.AsyncIndexWriter.addDocumentSync(AsyncIndexWriter.java:371)

Please help if I am missing something.

Thanks and regards,
Jebarlin.R

On Thu, Feb 6, 2014 at 11:04 PM, Shai Erera wrote:
> It looks like something's wrong with the index indeed. Are you sure you
> committed both the IndexWriter and TaxoWriter?
> Do you have some sort of testcase / short program which demonstrates the
> problem?
Re: Actual min and max-value of NumericField during codec flush
Thanks Mike,

Will try your suggestion. Let me describe the actual use case itself.

There is a requirement for merging time-adjacent segments [append-only, rolling time-series data].

All documents have a timestamp affixed, and during flush I need to note down the least timestamp across all documents, through the Codec.

Then I define a TimeMergePolicy that extends LogMergePolicy and define the segment size as Long.MAX_VALUE - SEG_LEAST_TIME [taken from the segment diagnostics].

LogMergePolicy will then auto-arrange the levels of segments according to time and proceed with merges. The latest segments will be smaller in size and will be preferred during merges over older and bigger segments.

Do you think such an approach will be fine, or are there better ways to solve this?

--
Ravi

On Thu, Feb 6, 2014 at 4:34 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Somewhere in those numeric trie terms are the exact integers from your
> documents, encoded.
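The pseudo-size described in the message above can be illustrated with a few lines of plain Java. This is only a sketch of the arithmetic, not a merge policy: TimeSizeSketch, pseudoSize and orderBySize are hypothetical names, and the timestamps are made-up illustrative values standing in for the least timestamp stored in each segment's diagnostics.

```java
import java.util.Arrays;
import java.util.Comparator;

final class TimeSizeSketch {
    // The size from the message: Long.MAX_VALUE minus the segment's least timestamp.
    static long pseudoSize(long leastTime) {
        if (leastTime < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return Long.MAX_VALUE - leastTime;
    }

    // Returns the least-timestamps ordered by ascending pseudo-size,
    // which puts the newest segment first.
    static long[] orderBySize(long[] leastTimes) {
        Long[] boxed = new Long[leastTimes.length];
        for (int i = 0; i < leastTimes.length; i++) {
            boxed[i] = leastTimes[i];
        }
        Arrays.sort(boxed, Comparator.comparingLong((Long t) -> pseudoSize(t)));
        long[] ordered = new long[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            ordered[i] = boxed[i];
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Illustrative least-timestamps for three segments.
        long[] ordered = orderBySize(new long[] {1000L, 3000L, 2000L});
        System.out.println(Arrays.toString(ordered)); // prints "[3000, 2000, 1000]"
    }
}
```

One thing that may be worth checking before relying on this: with real epoch timestamps the pseudo-sizes all sit very close to Long.MAX_VALUE, so their logarithms are nearly identical, and a policy that assigns levels from the logarithm of the size might not separate them.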
Re: Regarding CorruptedIndexException in using Lucene Facet Search
Dear Shai,

The mistake was mine: I was using the same directory for both the IndexWriter and the FacetWriter. Now it is working fine. Thank you :)

Could you please tell me if there is any major performance difference between the 3.6 and 4.x Facet libraries? Since I use Lucene 3.6, I am using the same version of the Facet library. Kindly guide me to the best working one. :)

Thank you :)

Thanks and Regards,
Jebarlin Robertson.R
GSM: 91-9538106181.

On Fri, Feb 7, 2014 at 12:41 PM, Jebarlin Robertson wrote:
> After this Facet search integration, it is giving these Corrupted and
> NullpointerExcpetion when I add the document object to the IndexWriter.