RE: Issue with sentence specific search

Steven A Rowe Thu, 07 Oct 2010 19:28:30 -0700

Hi Sirish,

As you suspected, StandardTokenizer does not produce a token from '#', so no 
text:# term reaches the index and your SpanNotQuery has nothing to exclude.  
Use a delimiter that fits the tokenizer's "word" definition but will never 
occur in your documents, something like a1b2c3c2b1a.
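
A minimal sketch of the indexing side, in plain Java with no Lucene 
dependency. The class SentenceDelimiter and its methods are hypothetical names 
made up for illustration. It assumes the marker should consist only of 
lowercase letters and digits: StandardAnalyzer lowercases tokens at index 
time, while SpanTermQuery terms are not analyzed, so a lowercase marker 
matches as written.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: joins sentences with a
// word-like boundary marker instead of '#'.
class SentenceDelimiter {

    // The marker suggested above; any letters-and-digits string that never
    // occurs in the documents would serve equally well.
    static final String BOUNDARY = "a1b2c3c2b1a";

    // Assumption: a marker of lowercase ASCII letters and digits only is
    // "word-like" enough to survive tokenization as a single term.
    static boolean isWordLike(String delimiter) {
        return delimiter.matches("[a-z0-9]+");
    }

    // Appends the marker after every sentence, mirroring the original
    // loop that appended " # ".
    static String join(List<String> sentences, String delimiter) {
        if (!isWordLike(delimiter)) {
            throw new IllegalArgumentException(
                "Delimiter is not word-like: " + delimiter);
        }
        StringBuilder text = new StringBuilder();
        for (String sentence : sentences) {
            // Refuse content that already contains the marker, since that
            // would create a false sentence boundary.
            if (sentence.toLowerCase(Locale.ROOT).contains(delimiter)) {
                throw new IllegalArgumentException(
                    "Sentence contains the delimiter: " + sentence);
            }
            text.append(sentence).append(' ').append(delimiter).append(' ');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> sentences =
            Arrays.asList("Efficiency improved.", "Delta was small.");
        System.out.println(join(sentences, BOUNDARY));
    }
}
```

On the search side, the exclude clause would then use the same marker, i.e. 
new Term(field, SentenceDelimiter.BOUNDARY) in place of new Term(field, "#").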


Sentence boundary handling is clunky in Lucene right now - there has been some 
discussion of how to directly support this kind of thing, but no code at this 
point.

Steve

> -----Original Message-----
> From: Sirish Vadala [mailto:sirishre...@gmail.com]
> Sent: Thursday, October 07, 2010 7:13 PM
> To: java-user@lucene.apache.org
> Subject: RE: Issue with sentence specific search
> 
> 
> Hi Steven,
> 
> I have implemented sentence specific proximity search as suggested below.
> However, unfortunately it still doesn't identify the sentence boundaries
> for my search.
> 
> I am using # as a delimiter between my sentences while indexing the
> content:
> 
> ------------
> ArrayList<String> sentencesList = sentenceScanner.getAllSentences();
> StringBuffer textWithToken = new StringBuffer();
> for (String sentence : sentencesList){
>       textWithToken.append(sentence + " # ");
> }
> addFieldToDocument(document, IFIELD_TEXT, textWithToken.toString(), true,
> true);
> ------------
> * Used StandardAnalyzer to initialize the indexWriter while adding the
> document
> 
> This is how I am performing my search:
> 
> ------------
> Query query = null;
> strQuery = strQuery.replaceAll("\\s+", " ");
> String[] spanTerms = strQuery.split(" ");
> SpanQuery[] spanQueries = new SpanQuery[spanTerms.length];
> for (int count = 0; count < spanTerms.length; count++) {
>       String spanTerm = spanTerms[count];
>       spanQueries[count] = new SpanTermQuery(new Term(field, spanTerm));
> }
> if (!withinSentence) {
>       query = new SpanNearQuery(spanQueries, span, true);
> } else {
>       SpanQuery queryInclude = new SpanNearQuery(spanQueries, span, true);
>       SpanQuery queryExclude = new SpanTermQuery(new Term(field, "#"));
>       query = new SpanNotQuery(queryInclude, queryExclude);
> }
> bQuery.add(query, BooleanClause.Occur.MUST);
> 
> ------------
> 
> When I eventually read my query on the console, this is how it looks in
> both cases:
> 
> With no sentence boundary
> +(author:amanda) +spanNear([text:efficiency, text:delta], 10, true)
> +(year:2009 year:2010)
> 
> With sentence boundary
> +(author:amanda) +spanNot(spanNear([text:efficiency, text:delta], 10, true), text:#)
> +(year:2009 year:2010)
> 
> My guess is that probably, my index isn't saving the sentence boundary
> value # as a separate term. Any hints or pointers on where exactly I am
> mis-implementing would be highly appreciated.
> 
> Thanks.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-tp1644352p1651512.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
