Keeping in mind that Hoss's input is much more valuable than mine...
It sounds like you want what I originally gave you. You want to be able to perform complex queries with the QueryParser, you want '-' and '_' not to break words, and you want quoted phrases to come through as a single token with no extra processing. Erick's concerns are obviously valid...but I hope you are not hacking the Lucene code for the new StandardAnalyzer...pull it out and make your own...that should just be a custom analyzer that acts a lot like the standard one. As for the QueryParser...if you want to mix normal searching with your quoted requirements, you are going to have to make your own query parser...no fun there...so why not pull the QueryParser out, make the single-line change, and later, if the QueryParser is updated, take the new one and make the single-line change again. Not that big of a nightmare.
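To make the analyzer half of that concrete, here is a rough, untested sketch written against the old Lucene 2.x analysis API. The class name KeepHyphenUnderscoreAnalyzer and the exact set of kept characters are just illustrative, and this does not cover the quoted-phrase handling, which is the QueryParser side of the problem:

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;

    // Acts a lot like a simplified StandardAnalyzer, except that '-' and '_'
    // are treated as ordinary token characters, so "MS-WORD" and "ms_word"
    // each survive as a single (lowercased) token.
    public class KeepHyphenUnderscoreAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new CharTokenizer(reader) {
          protected boolean isTokenChar(char c) {
            // keep letters, digits, '-' and '_' together; split on the rest
            return Character.isLetterOrDigit(c) || c == '-' || c == '_';
          }
        };
        return new LowerCaseFilter(ts);
      }
    }

Use the same analyzer at index time and query time so the terms line up.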
Hopefully Hoss can give you something better...but from what I understand, you want the QueryParser language and you want your quoting behavior, and they do not go together without the change I gave you. If there is another way to do it, it's hard to believe it will be easier than maintaining a single-line change in QueryParser.
Keep in mind I am a Lucene beginner. Both Hoss and Erick are more knowledgeable than I am about Lucene. Just putting in my two cents.
- Mark

Chris Hostetter wrote:
: Thanks for your input. I'm sure I could do as you suggest (and maybe that
: will end up being my best option), but I had hoped to use a string for
: creating the query object, particularly as some of my queries are a bit
: complex.

You have to clarify what you mean by "use a string for creating the query
object" ... there's nothing in what I suggested that implies you can't do
that; that's exactly what I'm suggesting you do...

    String input = ...;
    Analyzer a = new YourCustomAnalyzer();
    // because you know your analyzer always produces exactly one token...
    Token t = a.tokenStream("yourField", new StringReader(input)).next();
    Query yourQuery = new TermQuery(new Term("yourField", t.termText()));

...if your queries are more complex than just the "exactish" matching you
described before, then that's a separate issue -- what you described didn't
sound like it required any special input processing -- you said you had a
"string" and you wanted to find exact matches on that string (with some
normalization) ... but that you didn't want your input split on whitespace,
or hyphens, or any of the "special" characters QueryParser uses. If you want
other things, then that certainly makes things more complicated, but the
basic idea is still the same ... so what exactly do you mean when you say
it's more complicated?

: > I haven't really been following this thread, but it's gotten so long
: > I got interested.
: >
: > From what I can tell skimming the discussion so far, it seems like the
: > biggest confusion is about the definition of a "phrase", what analyzers
: > do with "quote" characters, and what the QueryParser does with "quote"
: > characters -- when ultimately you don't seem to really care about
: > "phrases" in a textual searching sense, nor do you seem to care about
: > any of the "features" of the QueryParser.
: >
: > It seems that what you care about is:
: >
: > 1) making documents, and adding a list of "text chunks" to those
: >    documents (what you've been calling phrases)
: > 2) you then want to be able to search for "almost-exact" matches on
: >    those "text chunks" ... these matches should be "exactish" because
: >    you don't want partial matches based on whitespace, or splitting on
: >    hyphens, but they shouldn't be truly exact because you want some
: >    simple normalization...
: >
: > : actually would like to "normalize" a phrase (spaces) or a hyphenated
: > : word or an underscored word to the same value -- e.g. MS-WORD or
: > : ms_WORd or "MS Word" --> ms_word.
: >
: > ...in which case, you should:
: > a) write yourself an analyzer which does no "tokenizing" (i.e. each
: >    input Field value generates a single token) but does the
: >    normalization you want.
: > b) use this Analyzer when you add the fields to your documents; even
: >    though you don't want *real* tokenization, make the field type
: >    TOKENIZED so your analyzer gets used.
: > c) when you get some text input to search on, pass it to the same
: >    Analyzer, take the Token you get back, and manually construct a
: >    TermQuery out of it for the necessary field.
: >
: > ...that's it. That's all she wrote -- don't even look in QueryParser's
: > general direction, at all.
: >
: > -Hoss

-Hoss
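For what it's worth, here is one rough, untested way to write the kind of non-tokenizing, normalizing analyzer Hoss describes in step (a) above, again against the old Lucene 2.x API. The class name ExactishAnalyzer and the exact normalization rules are only made up for illustration:

    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;

    // Emits the entire field value as ONE token: lowercased, with runs of
    // whitespace, '-' and '_' collapsed to a single '_', so "MS-WORD",
    // "ms_WORd" and "MS Word" all become the term "ms_word".
    public class ExactishAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, final Reader reader) {
        return new TokenStream() {
          private boolean done = false;
          public Token next() throws IOException {
            if (done) return null;        // one token per field value
            done = true;
            StringBuffer buf = new StringBuffer();
            char[] chunk = new char[1024];
            for (int n = reader.read(chunk); n != -1; n = reader.read(chunk)) {
              buf.append(chunk, 0, n);
            }
            String norm = buf.toString().trim().toLowerCase()
                .replaceAll("[\\s_-]+", "_");
            return new Token(norm, 0, buf.length());
          }
        };
      }
    }

Index the "text chunk" fields as TOKENIZED with this analyzer (step b), then at search time run the user's string through the same analyzer and build a TermQuery from the single Token you get back (step c), exactly as in Hoss's snippet above. No QueryParser involved.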