> The query <> is parsed to
> PhraseQuery("smith ann").
> And that seems right, from a user standpoint.
>
> In fact, considering this, I realize <> should be parsed
> to MultiPhraseQuery("smith", "ann*"), not <<+smith +ann*>> as I said earlier.
this issue: how to get QueryParser to generate
MultiPhraseQueries. Got some good ideas from it, but unfortunately no
complete solution. I'll keep on hacking.
--Renaud
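For reference, a minimal sketch of what building that MultiPhraseQuery by hand could look like against the 2.x-era API. The field name "name" and the already-open IndexReader are assumptions for illustration, not something stated in the thread:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.MultiPhraseQuery;

public class NamePhraseBuilder {
    // Builds: exact "smith" in position 1, any indexed term starting with "ann" in position 2
    public static MultiPhraseQuery build(IndexReader reader) throws IOException {
        MultiPhraseQuery query = new MultiPhraseQuery();
        query.add(new Term("name", "smith"));

        List expansions = new ArrayList();                       // all terms matching ann*
        TermEnum terms = reader.terms(new Term("name", "ann"));  // enum starts at "ann"
        try {
            do {
                Term t = terms.term();
                if (t == null || !"name".equals(t.field()) || !t.text().startsWith("ann")) {
                    break;
                }
                expansions.add(t);
            } while (terms.next());
        } finally {
            terms.close();
        }
        query.add((Term[]) expansions.toArray(new Term[expansions.size()]));
        return query;
    }
}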
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 14, 2007 12:07 PM
To: java-user@lucene.apache.org
to MultiPhraseQuery("smith", "ann*"), not <<+smith +ann*>> as I said earlier.
B. Getting hairy. Any hope?
--Renaud
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 14, 2007 6:43 AM
To: java-user@lucene.apache.org
Subject: Re: Wildcard query with untokenized punctuation (again)
Gotta agree with Erick here...best idea is just to preprocess the query
before sending it to the QueryParser.
My first thought is always to get out the sledgehammer...
- Mark
Erick Erickson wrote:
Well, perhaps the simplest thing would be to pre-process the query and
make the comma into a whitespace before sending anything to the query parser.
If you don't use the same tokenizer for indexing and searching, you will
have troubles like this.
Mixing exact match (with ") and wildcard (*) is a strange idea.
Typographical rules say that you have a space after a comma, no?
Your field is tokenized?
M.
Renaud Waldura wrote:
> My very simple
Well, perhaps the simplest thing would be to pre-process the query and
make the comma into a whitespace before sending anything to the
query parser. I don't know how generalizable that sort of solution is in
your problem space, though.
Best
Erick
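A minimal sketch of that pre-processing, assuming a 2.x-era QueryParser, a "name" field and WhitespaceAnalyzer (all illustrative assumptions):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class CommaStrippingSearch {
    // "smith,ann*" becomes "smith ann*" before QueryParser ever sees it
    public static Query parse(String userInput) throws ParseException {
        String cleaned = userInput.replace(',', ' ');
        QueryParser parser = new QueryParser("name", new WhitespaceAnalyzer());
        return parser.parse(cleaned);   // e.g. a TermQuery on "smith" plus a PrefixQuery on "ann"
    }
}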
On 6/13/07, Renaud Waldura <[EMAIL PROTECTED]> wrote:
After taking a quick look, I don't see how you can do this without
modifying the QueryParser. In QueryParser.jj you will find the conflict
of interest at line 891. This line will cause a match on smith,ann* and
trigger a wildcard term match on the whole piece.
This is again caused by the fact
: You're entirely correct about the analyzer (I'm using one that breaks on
: non-alphanumeric characters, so all punctuation is ignored). To be
: honest, I hadn't thought about altering this, but I guess I could; just
: reticent that there might be unforeseen consequences.
This is where the PerFieldAnalyzerWrapper comes in.
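A sketch of that PerFieldAnalyzerWrapper idea, assuming the punctuation-sensitive field is "filename" and SimpleAnalyzer is the current default (both assumptions for illustration):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.SimpleAnalyzer;

public class AnalyzerSetup {
    // Keep the current analyzer for every field except "filename", which is
    // left as a single untokenized term; pass the wrapper to both IndexWriter
    // and QueryParser so indexing and search agree.
    public static Analyzer make() {
        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
        wrapper.addAnalyzer("filename", new KeywordAnalyzer());
        return wrapper;
    }
}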
The reasoning behind not analyzing wildcard queries is also explained in
the FAQ: "Are Wildcard, Prefix, and Fuzzy queries case sensitive?"
Regards,
Doron
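One practical consequence of wildcard terms bypassing the analyzer, as a small sketch (the "filename" field and the manual lowercasing are assumptions, matching an index whose terms were lowercased at index time):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class WildcardHelper {
    // The analyzer never touches wildcard terms, so normalize case by hand
    public static Query filenameWildcard(String userTerm) {
        return new WildcardQuery(new Term("filename", userTerm.toLowerCase()));
    }
}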
>
> --Colin McGuigan
>
> -Original Message-
> From: Doron Cohen [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 10, 2007 2:08 AM
To: java-user@lucene.apache.org
Subject: Re: Wildcard query with untokenized punctuation
Hi Colin,
Is it possible that you are using an analyzer that breaks words on
non-letters? For instance SimpleAnalyzer? If so, the doc text:
pagefile.sys
is indexed as two words:
pagefile sys
At search time, the query text:
pagefile.sys
is also parsed-tokenized into a two-word query:
pagefile sys
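A quick way to watch that splitting happen, sketched against the 2.x-era analysis API (SimpleAnalyzer and the "filename" field are just the ones from the example above):

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class TokenDump {
    // Prints "pagefile" and then "sys" -- the dot never makes it into the index
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new SimpleAnalyzer();
        TokenStream stream = analyzer.tokenStream("filename", new StringReader("pagefile.sys"));
        for (Token token = stream.next(); token != null; token = stream.next()) {
            System.out.println(token.termText());
        }
    }
}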
-Original Message-
From: Steffen Heinrich [mailto:[EMAIL PROTECTED]
Sent: Fri 3/9/2007 4:31 PM
To: java-user@lucene.apache.org
Subject: Re: Wildcard query with untokenized punctuation
On 9 Mar 2007 at 15:10, McGuigan, Colin wrote:
> I have a "filename" field in Lucene that holds a value, like this:
> pagefile.sys
>
Hi Colin,
I'm still _very_ new to lucene, but isn't that what the un-tokenized
indexing is for?
Like in 1.9.1
doc.add(Field.Keyword("filename", "pagefile.sys"));
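For what it's worth, a sketch of the same idea with the 2.x Field API (Field.Keyword was the 1.x shortcut); the WildcardQuery part shows the kind of query that then matches the intact term:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class UntokenizedFilename {
    // Store and index the filename as one untokenized term, dot and all
    public static Document makeDoc() {
        Document doc = new Document();
        doc.add(new Field("filename", "pagefile.sys",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        return doc;
    }

    // A wildcard query against that single term then matches as expected
    public static Query makeQuery() {
        return new WildcardQuery(new Term("filename", "pagefile.*"));
    }
}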