Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Chris Hostetter
: Sorry for the confusion and thanks for taking the time to educate me. So, if : I am just indexing literal values, what is the best way to do that (what : analyzer)? Sounds like this approach, even though it works, is not the : preferred method. if you truly want just the literal values then

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Philip Brown
> : Here's a little sample program (borrowed some code from Erick Erickson :)). > : Whether I add a

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Chris Hostetter
: From: Philip Brown <[EMAIL PROTECTED]> : : Here's a little sample program (borrowed some code from Erick Erickson :)). : Whether I add as TOKENIZED or UN_

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Philip Brown
Here's a little sample program (borrowed some code from Erick Erickson :)). Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in the output. Is this what you'd expect?

- Philip

package com.test;
import java.io.IOException;
import java.util.HashSet;
import java.util.regex.

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Mark Miller
Some info to help you on your journey :)
1. If you add a field as untokenized then it will not be analyzed when added to the index. However, QueryParser will not know that this happened and will tokenize queries on that field.
2. The solution that Hoss has explained to you is to leave the defa
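Point 1 above can be illustrated with a toy model. This is a minimal sketch in plain Java with no Lucene dependency; the class name `AnalysisMismatchDemo`, the two stand-in "analyzers", and the all-terms matching rule are assumptions made for illustration and are not Lucene APIs. It only shows why a value indexed as one literal term is not found when the query side splits the same text into words.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalysisMismatchDemo {
    // Stand-in for a tokenizing analyzer: lowercases and splits on whitespace.
    static List<String> tokenizing(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Stand-in for keyword-style analysis: the whole value is one unchanged term.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    // Toy matching rule (an assumption): every query term must exist in the index.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Field added "untokenized": the index holds the single term "phrase test".
        Set<String> index = new HashSet<>(keyword("phrase test"));

        // Query side tokenizes: looks for "phrase" and "test", neither is in the index.
        System.out.println(matches(index, tokenizing("phrase test")));

        // Query side uses the same keyword-style analysis: the single term is found.
        System.out.println(matches(index, keyword("phrase test")));
    }
}
```

The design point is that index-time and query-time analysis have to agree for a given field, which is what a per-field analyzer setup is meant to guarantee.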

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Chris Hostetter
: So, if I do as you suggest below (using PerFieldAnalyzerWrapper with : StandardAnalyzer) then I still need to enclose in quotes the phrases : (keywords with spaces) when I issue the search, and they are only returned Yes, quotes will be necessary to tell the QueryParser "this is one chunk of t

Re: Phrase search using quotes -- special Tokenizer

2006-09-04 Thread Philip Brown
So, if I do as you suggest below (using PerFieldAnalyzerWrapper with StandardAnalyzer) then I still need to enclose in quotes the phrases (keywords with spaces) when I issue the search, and they are only returned in the results if the case is identical to how it was added? (This seems to be what

Re: Phrase search using quotes -- special Tokenizer

2006-09-04 Thread Chris Hostetter
: Yeah, they are more complex than the "exactish" match -- basically, there are : more fields involved -- combined sometimes with AND and sometimes with OR, : and sometimes negated field values, sometimes groupings, etc. These other : field values are all single words (no spaces), and a search mi

Re: Phrase search using quotes -- special Tokenizer

2006-09-04 Thread Mark Miller
More to consider: perhaps there is some way to get what you want by overriding getFieldQuery(String, String) instead. I have not been able to come up with anything simple off the top of my head, but overriding getFieldQuery would free you from having to make that line change on every Lucene up

Re: Phrase search using quotes -- special Tokenizer

2006-09-04 Thread Philip Brown
Yeah, they are more complex than the "exactish" match -- basically, there are more fields involved -- combined sometimes with AND and sometimes with OR, and sometimes negated field values, sometimes groupings, etc. These other field values are all single words (no spaces), and a search might invo

Re: Phrase search using quotes -- special Tokenizer

2006-09-04 Thread Mark Miller
Keeping in mind that Hoss's input is much more valuable than mine... It sounds like you want what I originally gave you. You want to be able to perform complex queries with the QueryParser and you want '-' and '_' to not break words, and you want quoted words to be tokenized as one token with

Re: Phrase search using quotes -- special Tokenizer

2006-09-03 Thread Chris Hostetter
: Thanks for your input. I'm sure I could do as you suggest (and maybe that : will end up being my best option), but I had hoped to use a string for : creating the query object, particularly as some of my queries are a bit : complex. you have to clarify what you mean by "use a string for creatin

Re: Phrase search using quotes -- special Tokenizer

2006-09-03 Thread Philip Brown
Thanks for your input. I'm sure I could do as you suggest (and maybe that will end up being my best option), but I had hoped to use a string for creating the query object, particularly as some of my queries are a bit complex. Thanks. Chris Hostetter wrote: > > > I haven't really been followi

Re: Phrase search using quotes -- special Tokenizer

2006-09-03 Thread Erick Erickson
Yeah, what he said.
On 9/3/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: I haven't really been following this thread, but it's gotten so long i got interested. from what i can tell skimming the discussion so far, it seems like the biggest confusion is about the definition of a "phrase" a

Re: Phrase search using quotes -- special Tokenizer

2006-09-03 Thread Chris Hostetter
I haven't really been following this thread, but it's gotten so long i got interested. from what i can tell skimming the discussion so far, it seems like the biggest confusion is about the definition of a "phrase" and what analyzers do with "quote" characters and what the QueryParser does with "q

Re: Phrase search using quotes -- special Tokenizer

2006-09-03 Thread Philip Brown
Just as you, I would PREFER not to change any of the base Lucene code -- and I imagine there is still some way to do what I want (possibly by extending some other existing class) with what is already available. Regarding point 0) -- You are right in that if I add "test phrase" to index as UN_TO

Re: Phrase search using quotes -- special Tokenizer

2006-09-03 Thread Erick Erickson
Disclaimer: Of course I'm not as familiar with your problem space as you are, so I may be way out in left field, but... I *still* think you're making waay too much work for yourself and need to examine your assumptions. 0> But when you index something UN_TOKENIZED as in your example, I don't

Re: Phrase search using quotes -- special Tokenizer

2006-09-02 Thread Philip Brown
I tend to agree with Mark. I tried a query like so...

TermQuery query = new TermQuery(new Term("keywordField", "phrase test"));
IndexSearcher searcher = new IndexSearcher(activeIdx);
Hits hits = searcher.search(query);

And this produced the expected results. Whe

Re: Phrase search using quotes -- special Tokenizer

2006-09-02 Thread Mark Miller
I think if he wants to use the queryparser to parse his search strings that he has no choice but to modify it. It will eat any pair of quotes going through it no matter what analyzer is used. - Mark Well, you're flying blind. Is the behavior rooted in the indexing or querying? Since you can't

Re: Phrase search using quotes -- special Tokenizer

2006-09-02 Thread Erick Erickson
Well, you're flying blind. Is the behavior rooted in the indexing or querying? Since you can't answer that, you're reduced to trying random things hoping that one of them works. A little like voodoo. I've wasted far too much time trying to solve what I was *sure* was the problem only to f

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Well Philip...bad news. I should have thought of this before...I think the query parser is the problem. You are tokenizing "all in the quotes" to one token...but when QueryParser sees that, it doesn't matter what analyzer you use, it's going to see the quotes and strip them right off. Then it pas
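The quote-stripping problem described above can be sketched with a toy parser. This is plain Java written for illustration only: `QuoteStrippingDemo` and `parseClauses` are hypothetical names, and the regular expression is an assumed simplification of how a query parser might split a query string into clauses. It is not the Lucene QueryParser implementation. The point it shows is that the text handed on after parsing no longer carries the quotes, so it cannot equal a token that was indexed with its quotes intact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteStrippingDemo {
    // Either a quoted run (group 1, quotes excluded) or a bare run of non-whitespace (group 2).
    private static final Pattern CLAUSE = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Splits a query string into clause texts; quoted clauses lose their quotes here.
    static List<String> parseClauses(String query) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            clauses.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return clauses;
    }

    public static void main(String[] args) {
        // A token as a quote-preserving tokenizer would have indexed it.
        String indexedToken = "\"phrase test\"";

        for (String clause : parseClauses("\"phrase test\"")) {
            // The clause text is: phrase test (no quotes), so the comparison is false.
            System.out.println(clause + " equals indexed token? " + clause.equals(indexedToken));
        }
    }
}
```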

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
I am out of ideas. If I'm feeling perky I'll build you one in the morning. No, I've never used Luke. Is there an easy way to examine my RAMDirectory index? I can create the index with no quoted keywords, and when I search for a keyword, I get back the expected results (just can't search for a p

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
No, I've never used Luke. Is there an easy way to examine my RAMDirectory index? I can create the index with no quoted keywords, and when I search for a keyword, I get back the expected results (just can't search for a phrase that has whitespace in it). If I create the index with phrases in quo

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Erick Erickson
OK, I've gotta ask. Have you examined your index with Luke to see if what you *think* is in the index actually *is*??? Erick On 9/1/06, Philip Brown <[EMAIL PROTECTED]> wrote: Interesting...just ran a test where I put double quotes around everything (including single keywords) of source text

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Interesting...just ran a test where I put double quotes around everything (including single keywords) of source text and then ran searches for a known keyword with and without double quotes -- doesn't find either time. Mark Miller-5 wrote: > > Sorry to hear you're having trouble. You indeed nee

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Added the to the other section and reran the javacc and imported the new files...but, I still get the same result -- no results. (Quotes are in the source text and query string.) Anything else I might be missing? Philip Mark Miller-5 wrote: > > Sorry to hear you're having trouble. You indee

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Sorry to hear you're having trouble. You indeed need the double quotes in the source text. You will also need them in the query string. Make sure they are in both places. My machine is hosed right now or I would do it for you real quick. My guess is that I forgot to mention...not only do you need t

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Well, I tried that, and it doesn't seem to work still. I would be happy to zip up the new files, so you can see what I'm using -- maybe you can get it to work. The first time, I tried building the documents without quotes surrounding each phrase. Then, I retried by enclosing every phrase within

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
That is a good point. I was just thinking that it would be a pain for searchers to have to include the quotes when searching, but I guess there is little way around it. The best you could do is have an option that specified a quoted search...and you might as well make that option be to put the

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Thanks, but I don't "think" I need that. But curious, how will it know it's a phrase if it's not enclosed in quotes? Won't all its terms be treated separately then? Philip Mark Miller-5 wrote: > > One more tip...if you would like to be able to search phrases without > putting in the quotes

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
One more tip...if you would like to be able to search phrases without putting in the quotes you must strip them with the analyzer. In standardfilter (in the standard analyzer code) add this:

private static final String QUOTED_TYPE = tokenImage[QUOTED];

- you'll see where to put that and you'll s

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
So this will recognize anything in quotes as a single token and '_' and '-' will not break up words. There may be some repercussions for the NUM token but nothing I'd worry about. Maybe you want to use Unicode for '-' and '_' as well...I wouldn't worry about it myself. - Mark TOKEN : {
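The tokenizing behavior described above (anything in quotes is one token, and '_' and '-' do not break words) can be sketched outside the JavaCC grammar. The following is a minimal plain-Java approximation using a regular expression. `QuoteAwareTokenizer` and its pattern are assumptions made for illustration; this is not the StandardTokenizer grammar and it ignores that grammar's other token types such as NUM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // First alternative: a complete double-quoted run, kept with its quotes.
    // Second alternative: letters, digits, '_' and '-' treated as word characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|[\\p{L}\\p{N}_-]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Yields four tokens: "phrase test" (quotes kept), foo-bar, baz_qux, end
        for (String token : tokenize("\"phrase test\" foo-bar baz_qux, end")) {
            System.out.println(token);
        }
    }
}
```

In this sketch the quotes stay part of the token, which matches the advice elsewhere in the thread that the quotes must then appear in both the source text and the query string unless a later filter strips them.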

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Philip Brown wrote: Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm not seeing StandardAnalyzer.jj in the Lucene source download. Mark Miller-5 wrote: Philip Brow

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm not seeing StandardAnalyzer.jj in the Lucene source download. Mark Miller-5 wrote: > > Philip Brown wrote: >> Hi, >> >

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Philip Brown wrote: Hi, After running some tests using the StandardAnalyzer, and getting 0 results from the search, I believe I need a special Tokenizer/Analyzer. Does anybody have something that parses like the following: - doesn't parse apart phrases (in quotes) - doesn't parse/separate hyph