: One thing that I know has bogged me is when matching a phrase where I
: would expect mathematical formula (which is "just a subphrase"). I
: would have liked the phrase-query to extend as far as it wishes but not
: passed a given token... would this be possible ?
: Presumably a period token and this feature would have provided the same?

I haven't tried it myself, but my reading of SpanQueries leads me to
believe you could accomplish what you want (and what Grant describes) by
inserting special Terms to denote formula/sentence/paragraph/section/chapter
boundaries, and then using a SpanNearQuery with a high slop inside a
SpanNotQuery whose excluded clause is a SpanTermQuery for the boundary you
don't want to cross (or a SpanOrQuery containing many SpanTermQueries, one
for each kind of boundary you don't want to cross).

If you get your Tokenizer to put the special boundary terms at the exact
same position as the token they mark (a position increment of 0), regular
PhraseQueries should still work fine without needing any special slop, and
you could say things like "find me this phrase near the beginning of a
sentence" or "find me this phrase near the end of a chapter".

: > Was wondering what people's experience is with storing sentence (or
: > other) boundary information in Lucene. For instance, for phrase
: > queries, you may not want to match when two terms lie on either side
: > of a sentence boundary. I know for phrase queries the common approach
: > is to make the position increment larger than one, which solves that
: > immediate problem, but I have other uses for such information, too.
: > Should I just store some type of boundary marker at the appropriate
: > position and check to see if I have a boundary marker when doing my
: > processing? I know I need an Analyzer that can detect the boundaries,
: > for starters. What other issues have people run up against?

-Hoss
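
PS: to make the span arithmetic concrete, here is a toy model in plain Java. It is deliberately not Lucene's API: the class BoundarySpanSketch, its methods near and notCrossing, and the "<S>" marker are all made up for illustration, and a String array stands in for the indexed positions. It assumes the boundary marker occupies its own position between sentences, and it enumerates every matching pair of terms, which is simpler than what a real SpanNearQuery does. The point is only to show the "near with high slop, minus anything overlapping a boundary term" idea.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: plain arrays stand in for an index; none of this is Lucene API.
class BoundarySpanSketch {

    // Ordered "near" match: `first` followed by `second` with at most `slop`
    // other positions in between. Each hit is {start, end}, end exclusive.
    static List<int[]> near(String[] tokens, String first, String second, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) continue;
            for (int j = i + 1; j < tokens.length && j - i - 1 <= slop; j++) {
                if (tokens[j].equals(second)) spans.add(new int[] {i, j + 1});
            }
        }
        return spans;
    }

    // "Not" filter: drop every span that covers a position holding the boundary token.
    static List<int[]> notCrossing(String[] tokens, List<int[]> spans, String boundary) {
        List<int[]> kept = new ArrayList<>();
        for (int[] span : spans) {
            boolean crosses = false;
            for (int p = span[0]; p < span[1]; p++) {
                if (tokens[p].equals(boundary)) { crosses = true; break; }
            }
            if (!crosses) kept.add(span);
        }
        return kept;
    }

    public static void main(String[] args) {
        // "<S>" is a hypothetical sentence-boundary token emitted by the analyzer.
        String[] tokens = {"we", "measured", "the", "speed", "<S>",
                           "light", "travels", "fast", "<S>",
                           "the", "speed", "of", "light"};
        List<int[]> hits = notCrossing(tokens, near(tokens, "speed", "light", 10), "<S>");
        for (int[] h : hits) System.out.println(h[0] + ".." + h[1]); // prints 10..13
    }
}
```

With a slop of 10 the near step finds three candidate spans, but two of them straddle a "<S>" position and are thrown away, leaving only the "speed of light" match inside the last sentence. In Lucene terms the near step corresponds roughly to the SpanNearQuery and the filter to the SpanNotQuery wrapped around it.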