Re: Token Stream with Offsets (Token Sources class)

2013-04-09 Thread vempap
Well, I found the issue: maxDocCharsToAnalyze defaults to 0 in WeightedSpanTermExtractor. It works fine if I change it there, or if I use QueryScorer, which has a default limit of 51200. Thanks. - -- Phani -- View this message in context: http://lucene.472066.n3.nabble
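A minimal sketch of the fix described above, assuming the Lucene 4.x highlighter API; the query field ("content") and term are placeholders, not taken from the thread:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

public class HighlightLimitSketch {
    public static void main(String[] args) {
        TermQuery query = new TermQuery(new Term("content", "ipod"));

        // Going through QueryScorer + Highlighter picks up Highlighter's
        // default analysis limit, DEFAULT_MAX_CHARS_TO_ANALYZE = 50 * 1024
        // = 51200 characters; a bare WeightedSpanTermExtractor does not.
        QueryScorer scorer = new QueryScorer(query);
        Highlighter highlighter =
            new Highlighter(new SimpleHTMLFormatter(), scorer);

        // Or set the limit explicitly (value here is arbitrary).
        highlighter.setMaxDocCharsToAnalyze(100 * 1024);
    }
}
```

This block needs the lucene-core and lucene-highlighter jars on the classpath; it only shows where the limit lives, not a full highlighting run.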

Re: Token Stream with Offsets (Token Sources class)

2013-04-08 Thread vempap
I apologize; I did not know where exactly I needed to post this, so I'll remove the others. As for indexing, I'm using the Solr example-docs script to post the documents, and then using the code mentioned to get the token stream from that index. I have the following doc : ipod_video_1.xml : MA147LL/A

Token Stream with Offsets (Token Sources class)

2013-04-07 Thread vempap
Hi, I have the following code snippet, where I'm trying to extract weighted span terms from the query (I do have term vectors enabled on the fields): File path = new File( ""); FSDirectory directory = FSDirectory.open(path);
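The snippet above is cut off in the archive. A hedged sketch of how such an extraction typically continues on the Lucene 4.x API; the field name "content", the query term, and doc id 0 are assumptions for illustration, not the poster's actual values:

```java
import java.io.File;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.search.highlight.WeightedSpanTerm;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;
import org.apache.lucene.store.FSDirectory;

public class ExtractSpanTermsSketch {
    public static void main(String[] args) throws Exception {
        File path = new File(args[0]); // index directory (placeholder)
        FSDirectory directory = FSDirectory.open(path);
        IndexReader reader = DirectoryReader.open(directory);

        // Rebuild a token stream from the stored term vector of doc 0;
        // this requires term vectors with positions/offsets on the field.
        TokenStream stream =
            TokenSources.getTokenStream(reader.getTermVector(0, "content"));

        TermQuery query = new TermQuery(new Term("content", "video"));
        WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();
        Map<String, WeightedSpanTerm> terms =
            extractor.getWeightedSpanTerms(query, stream, "content");

        for (WeightedSpanTerm t : terms.values()) {
            System.out.println(t.getTerm() + " weight=" + t.getWeight());
        }
        reader.close();
    }
}
```

Needs lucene-core and lucene-highlighter on the classpath and an existing index; it is a sketch of the call pattern, not a verified reproduction of the poster's code.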

WeightedSpanTermsExtractor

2013-04-05 Thread vempap
Hi, I have multiple fields (name, name2; content copied below). If I extract the weighted span terms based on a query against a specific field, why am I not getting the positions properly out of the WeightedSpanTerm covering multiple fields? Is it because the query is speci

To get Term Offsets of a term per document

2013-02-20 Thread vempap
Hello, is there a way to get the term offsets of a given term per document without enabling term vectors? Is it that the Lucene index stores positions but not offsets by default - is that correct? Thanks, Phani. -- View this message in context: http://lucene.472066.n3.nabble.com/To-get-
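For context: since Lucene 4.0 offsets can be stored in the postings themselves, without term vectors, by widening the field's index options. A sketch on the 4.x API (field and text are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.FieldInfo.IndexOptions;

public class OffsetsInPostingsSketch {
    public static void main(String[] args) {
        // By default a TextField indexes positions but not offsets.
        FieldType ft = new FieldType(TextField.TYPE_STORED);
        ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        ft.freeze();

        Document doc = new Document();
        doc.add(new Field("content", "some example text", ft));
        // Offsets are then readable per posting via
        // DocsAndPositionsEnum.startOffset() / endOffset(),
        // with no term vectors involved.
    }
}
```

This needs lucene-core on the classpath; it only shows the field configuration, not a full index/read round trip.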

RE: StandardTokenizer generation from JFlex grammar

2012-10-04 Thread vempap
Thanks, Steve, for the pointers. I'll look into it. -- View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizer-generation-from-JFlex-grammar-tp4011940p4011944.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

StandardTokenizer generation from JFlex grammar

2012-10-04 Thread vempap
Hello, I'm trying to regenerate the standard tokenizer from the JFlex specification (StandardTokenizerImpl.jflex), but I'm not able to because of some errors (I would like to create my own JFlex file based on the standard tokenizer, which is why I'm first trying to generate using that to get a
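For reference, regeneration is normally done by running the JFlex tool on the grammar; the jar name below is a placeholder, and note that Lucene's grammars tend to require the specific JFlex version its own build expects (Lucene invokes JFlex via an ant target), which is a common source of errors like those described:

```shell
# Run JFlex on the grammar; the generated StandardTokenizerImpl.java
# is written alongside it. Jar name/version here is a placeholder.
java -jar jflex.jar StandardTokenizerImpl.jflex
```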

Re: SpanNearQuery distance issue

2012-09-19 Thread vempap
Shoot me. Thanks, I did not notice that the doc has ".. e a .." in the content. Thanks again for the reply :) -- View this message in context: http://lucene.472066.n3.nabble.com/SpanNearQuery-distance-issue-tp4008975p4009033.html Sent from the Lucene - Java Users mailing list archive at Nabble

SpanNearQuery distance issue

2012-09-19 Thread vempap
Hello all, I have an issue with respect to the distance measure of SpanNearQuery in Lucene. Let's say I have the following two documents: DocID: 6, content:"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001 1002 1003 1004 1005 1006 1007 1008 1009 1100", DocID: 7, content:"a b c d e a b c f g h i j k
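The reply earlier in this thread resolves the issue: the doc also contains ".. e a ..", so a closer pair of matches exists than the one the poster had in mind. To make that concrete without Lucene, here is a plain-Java sketch (names are made up for illustration) that computes the minimal number of intervening tokens between two terms, which is what SpanNearQuery's slop is checked against for an unordered match:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinGapSketch {
    // Minimal number of intervening tokens between any occurrence of a
    // and any occurrence of b, in either order (unordered span match).
    static int minGap(List<String> tokens, String a, String b) {
        List<Integer> posA = new ArrayList<>();
        List<Integer> posB = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(a)) posA.add(i);
            if (tokens.get(i).equals(b)) posB.add(i);
        }
        int best = Integer.MAX_VALUE;
        for (int i : posA)
            for (int j : posB)
                best = Math.min(best, Math.abs(i - j) - 1);
        return best;
    }

    public static void main(String[] args) {
        // Prefix of doc 7's content from the post.
        List<String> doc = Arrays.asList(
            "a", "b", "c", "d", "e", "a", "b", "c", "f", "g", "h");
        // "e" at position 4 is immediately followed by "a" at position 5,
        // so a SpanNearQuery on (e, a) matches even with slop 0.
        System.out.println(minGap(doc, "e", "a")); // prints 0
    }
}
```

The point is that the slop check is taken over the closest pair of occurrences in the document, not the pair the query author happens to be thinking of.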