Hi Sirish, StandardTokenizer does not produce a token from '#', as you suspected. Something that fits the "word" definition, but which won't ever be encountered in your documents, is what you should use for the delimiter - something like a1b2c3c2b1a .
Sentence boundary handling is clunky in Lucene right now - there has been some discussion of how to directly support this kind of thing, but no code at this point. Steve > -----Original Message----- > From: Sirish Vadala [mailto:sirishre...@gmail.com] > Sent: Thursday, October 07, 2010 7:13 PM > To: java-user@lucene.apache.org > Subject: RE: Issue with sentence specific search > > > Hi Steven, > > I have implemented sentence specific proximity search as suggested below. > However, unfortunately it still doesn't identify the sentence boundaries > for > my search. > > I am using # as a delimiter between my sentences while indexing the > content: > > ------------ > ArrayList<String> sentencesList = sentenceScanner.getAllSentences(); > StringBuffer textWithToken = new StringBuffer(); > for (String sentence : sentencesList){ > textWithToken.append(sentence + " # "); > } > addFieldToDocument(document, IFIELD_TEXT, textWithToken.toString(), true, > true); > ------------ > * Used StandardAnalyzer to initialize the indexWriter while adding the > document > > This is how I am performing my search: > > ------------ > Query query = null; > strQuery = strQuery.replaceAll("\\s+", " "); > String[] spanTerms = strQuery.split(" "); > SpanQuery[] spanQueries = new SpanQuery[spanTerms.length]; > for (int count = 0; count < spanTerms.length; count++) { > String spanTerm = spanTerms[count]; > spanQueries[count] = new SpanTermQuery(new Term(field, spanTerm)); > } > if(!withinSentence){ > SpanQuery spanQuery = new SpanNearQuery(spanQueries, span, true); > query = spanQuery; > } else if (withinSentence){ > SpanQuery queryInclude = new SpanNearQuery(spanQueries, span, true); > SpanQuery queryExclude = new SpanTermQuery(new Term(field, "#")); > SpanQuery spanNotQuery = new SpanNotQuery(queryInclude, > queryExclude); > query = spanNotQuery; > } > bQuery.add(query, BooleanClause.Occur.MUST); > > ------------ > > When I eventually read my query on the console, this is how it looks in > both > cases: > > With no sentence boundary > +(author:amanda) +spanNear([text:efficiency, text:delta], 10, true) > +(year:2009 year:2010) > > With sentence boundary > +(author:amanda) +spanNot(spanNear([text:efficiency, text:delta], 10, > true), > text:#) +(year:2009 year:2010) > > My guess is that probably, my index isn't saving the sentence boundary > value > # as a separate term. Any hints or pointers on where exactly I am > mis-implementing would be highly appreciated. > > Thanks. > -- > View this message in context: http://lucene.472066.n3.nabble.com/Issue- > with-sentence-specific-search-tp1644352p1651512.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org