One simple hack which may or may not meet your objectives:

1) index each paragraph as if it were a document (Boolean queries could then 
not span paragraphs, which could be a problem)

2) set the position increment gap to, say, 100 and then index each sentence 
within the paragraph as another value in a multivalued field.  This would then 
prevent phrasal matches across sentence boundaries if the user is searching for 
proximity < 100.
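
A rough sketch of the arithmetic behind option 2, in plain Java rather than Lucene calls. The class and method names (SentenceGapSketch, index, within) are made up for illustration, and the position bookkeeping is a simplification of what the indexer does. In Lucene itself the gap would come from the analyzer's getPositionIncrementGap(String fieldName).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustration only: simulates token positions for a multivalued field. */
class SentenceGapSketch {
    /** Assumed gap between consecutive values (sentences) of the field. */
    static final int GAP = 100;

    /** Maps each lower-cased term to the positions it would occupy. */
    static Map<String, List<Integer>> index(List<String> sentences) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = -1;
        boolean first = true;
        for (String sentence : sentences) {
            if (!first) {
                position += GAP; // jump ahead before each later sentence
            }
            first = false;
            for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                position++;
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            }
        }
        return postings;
    }

    /** True if some occurrence of a lies within maxDistance positions of some occurrence of b. */
    static boolean within(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        List<Integer> empty = Collections.emptyList();
        for (int pa : postings.getOrDefault(a, empty)) {
            for (int pb : postings.getOrDefault(b, empty)) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        sentences.add("The quick fox jumped.");
        sentences.add("A lazy dog slept.");
        Map<String, List<Integer>> postings = index(sentences);
        System.out.println(within(postings, "fox", "jumped", 5)); // true: same sentence
        System.out.println(within(postings, "fox", "dog", 99));   // false: the gap separates them
        System.out.println(within(postings, "fox", "dog", 200));  // true: distance exceeds the gap
    }
}
```

The last call shows the limitation: once the requested proximity reaches the gap, matches leak across sentence boundaries, so the gap has to be larger than any proximity you intend to allow.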

Another hack along the lines you mention would be to index an otherwise 
impossible token such as "SENTENCE" or "PARAGRAPH" at each boundary and then 
wrap the user's query in a SpanNotQuery that excludes that token.  
LUCENE-5205's SpanOnlyParser might be of use for this.
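
To make the boundary-token idea concrete, here is a minimal plain-Java simulation of the matching logic. It does not call Lucene; BoundaryTokenSketch and its methods are hypothetical names. It only mimics the effect of a span-near match that is rejected when a boundary token falls inside the matched span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustration only: proximity matching that refuses to cross a boundary token. */
class BoundaryTokenSketch {
    /** Upper-case marker; real terms are lower-cased, so it cannot collide with them. */
    static final String BOUNDARY = "SENTENCE";

    /** Lower-cases and splits each sentence, appending a boundary token after it. */
    static List<String> tokenize(List<String> sentences) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : sentences) {
            for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            tokens.add(BOUNDARY);
        }
        return tokens;
    }

    /** True if a and b occur within maxDistance positions with no boundary token between them. */
    static boolean nearWithinSentence(List<String> tokens, String a, String b, int maxDistance) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b) || Math.abs(i - j) > maxDistance) {
                    continue;
                }
                int lo = Math.min(i, j);
                int hi = Math.max(i, j);
                // The "not" part: discard the match if the span covers a boundary.
                if (!tokens.subList(lo, hi).contains(BOUNDARY)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        sentences.add("The quick fox jumped.");
        sentences.add("A lazy dog slept.");
        List<String> tokens = tokenize(sentences);
        System.out.println(nearWithinSentence(tokens, "fox", "jumped", 5)); // true
        System.out.println(nearWithinSentence(tokens, "fox", "dog", 10));   // false
    }
}
```

Unlike the position-gap trick, this does not depend on the proximity value the user asks for, but the boundary token takes up a position of its own and has to survive analysis unchanged.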

You may also want to look into the PostingsHighlighter's use of BreakIterator 
for ideas. It isn't immediately clear to me how that could be used for 
retrieval, but it does work for highlighting.
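
For what it's worth, the JDK's own java.text.BreakIterator can at least find the sentence boundaries at index time, for example to produce the per-sentence values for the multivalued-field approach. This is only a sketch of that splitting step, not what the PostingsHighlighter does internally; SentenceSplitSketch is a made-up name, and Locale.ENGLISH is an assumption about your content.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustration only: splits a paragraph into sentences with the JDK BreakIterator. */
class SentenceSplitSketch {

    static List<String> sentences(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) range is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : sentences("The quick fox jumped. A lazy dog slept.")) {
            System.out.println(s);
        }
    }
}
```

Each returned string could then be added as one value of the field, or followed by a boundary token, depending on which of the hacks above you pick. Abbreviations and unusual punctuation may still confuse the splitter.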

-----Original Message-----
From: Jigar Shah [mailto:jigaronl...@gmail.com] 
Sent: Monday, April 07, 2014 3:47 AM
To: java-user@lucene.apache.org
Subject: Proximity Search for SENTENCE and PARAGRAPH

Hello all,

I need to implement 2 features in my application:

1. "Proximity for words and phrases within the same sentence"

2. "Proximity for words and phrases within the same paragraph"

After doing some research on the internet, I found the following.

There is a "ProximityQueryNode" which has an enum for this, but there seems
to be no support for it in the parser.

There is no out-of-the-box support or contrib module for such a feature,
except https://github.com/markrmiller/qsol, which is not maintained.

There are some suggested workarounds, such as marking sentence/paragraph
boundaries and then searching with the SpanQuery API.

Please let me know if any work has been done on such features, or if there
is a proven approach.

Thanks
Jigar Shah.
