Re: Indexing punctuation and symbols

2007-10-01 Thread Erick Erickson
You might be able to create an analyzer that breaks your stream up (from the example) into tokens "foo" and "," and then (using the same analyzer) search on phrases with a slop of 0. That seems like it'd do what you want. Best, Erick
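A minimal sketch of what Erick is suggesting, written against the Lucene 2.x API of the day; the PunctuationAnalyzer/PunctuationTokenizer names and the "content" field are invented for the example, not anything from Lucene itself:

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public class PunctuationAnalyzer extends Analyzer {

        public TokenStream tokenStream(String field, Reader reader) {
            return new PunctuationTokenizer(reader);
        }

        /** Emits runs of letters/digits as one token, and every other
         *  non-whitespace character (",", "%", "$", "£", ...) as a
         *  single-character token of its own. */
        static class PunctuationTokenizer extends Tokenizer {
            private int pos = 0;  // offset of the lookahead character
            private int ch = -2;  // lookahead character; -2 = none read yet

            PunctuationTokenizer(Reader in) { super(in); }

            public Token next() throws IOException {
                if (ch == -2) ch = input.read();
                while (ch != -1 && Character.isWhitespace((char) ch)) {
                    ch = input.read(); pos++;
                }
                if (ch == -1) return null;
                int start = pos;
                if (Character.isLetterOrDigit((char) ch)) {
                    StringBuffer sb = new StringBuffer();
                    do {
                        sb.append((char) ch);
                        ch = input.read(); pos++;
                    } while (ch != -1 && Character.isLetterOrDigit((char) ch));
                    return new Token(sb.toString(), start, pos);
                }
                // Punctuation or symbol: one character, one token.
                Token t = new Token(String.valueOf((char) ch), start, pos + 1);
                ch = input.read(); pos++;
                return t;
            }
        }

        public static void main(String[] args) {
            // "foo," is indexed as the adjacent tokens "foo" and ",", so a
            // phrase query with slop 0 matches only documents where the
            // comma directly follows "foo".
            PhraseQuery pq = new PhraseQuery();
            pq.add(new Term("content", "foo"));
            pq.add(new Term("content", ","));
            pq.setSlop(0);
            System.out.println(pq);
        }
    }

Using the same analyzer at index time and query time is what makes the slop-0 phrase trick work: both sides see the comma as its own token in the same position.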

Re: Indexing punctuation and symbols

2007-10-01 Thread Patrick Turcotte
Of course, it depends on the kind of query you are doing, but (I did find the query parser in the meantime) MultiFieldQueryParser mfqp = new MultiFieldQueryParser(useFields, analyzer, boosts); where analyzer can be a PerFieldAnalyzer, followed by Query query = mfqp.parse(queryString); would do the trick.
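Patrick's snippet, fleshed out into a compilable form; the field names ("contentStd", "contentWs") and the boost values are assumptions for the example, and "PerFieldAnalyzer" here is Lucene's PerFieldAnalyzerWrapper:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;

    public class SearchBothFields {
        public static Query parse(String queryString) throws ParseException {
            String[] useFields = { "contentStd", "contentWs" };

            // One analyzer per field, matching how each field was indexed.
            PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("contentWs", new WhitespaceAnalyzer());

            // Optional per-field boosts; here hits on the
            // punctuation-preserving field count double.
            Map boosts = new HashMap();
            boosts.put("contentStd", new Float(1.0f));
            boosts.put("contentWs", new Float(2.0f));

            MultiFieldQueryParser mfqp =
                new MultiFieldQueryParser(useFields, analyzer, boosts);
            return mfqp.parse(queryString);
        }
    }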

Re: Indexing punctuation and symbols

2007-10-01 Thread John Byrne
Well, the size wouldn't be a problem; we could afford the extra field. But it would seem to complicate the search quite a lot. I'd have to run the search terms through both analyzers. It would be much simpler if the characters were indexed as separate tokens.

Re: Indexing punctuation and symbols

2007-10-01 Thread Patrick Turcotte
Hi, Don't know the size of your dataset. But couldn't you index into 2 fields with PerFieldAnalyzer, tokenizing with StandardAnalyzer for one field and WhitespaceAnalyzer for the other? Then use a multiple-field query (there is a query parser for that, just don't remember the name right now). Patrick
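A minimal sketch of that indexing scheme against the Lucene 2.x API; the field names, index path, and sample text are made up for illustration:

    import java.io.IOException;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexBothWays {
        public static void main(String[] args) throws IOException {
            // StandardAnalyzer for one field (strips most punctuation),
            // WhitespaceAnalyzer for the other (keeps it attached to terms).
            PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("contentWs", new WhitespaceAnalyzer());

            IndexWriter writer = new IndexWriter("/tmp/index", analyzer, true);
            Document doc = new Document();
            String text = "a 100% discount, i.e. $100 off";
            // The same text is indexed twice, analyzed two different ways.
            doc.add(new Field("contentStd", text,
                              Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("contentWs", text,
                              Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();
        }
    }

The cost is roughly double the postings for that text, which is the size trade-off being discussed in this thread.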

Re: Indexing punctuation and symbols

2007-10-01 Thread John Byrne
WhitespaceAnalyzer does preserve those symbols, but not as tokens. It simply leaves them attached to the original term. As an example of what I'm talking about, consider a document that contains (without the quotes) "foo, ". Now, using WhitespaceAnalyzer, I could only get that document by searching for "foo," with the comma attached.
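A quick way to see the behaviour John describes, using the Lucene 2.x TokenStream API (the field name is arbitrary, since WhitespaceAnalyzer ignores it):

    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;

    public class WhitespaceDemo {
        public static void main(String[] args) throws IOException {
            TokenStream ts = new WhitespaceAnalyzer()
                .tokenStream("f", new StringReader("foo, bar"));
            // Prints "foo," then "bar": the comma stays glued to the term,
            // so a search for "foo" alone will not match this document.
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println(t.termText());
            }
        }
    }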

Re: Indexing punctuation and symbols

2007-10-01 Thread Karl Wettin
On 1 Oct 2007, at 15:33, John Byrne wrote: Has anyone written an analyzer that preserves punctuation and symbols ("£", "$", "%" etc.) as tokens? WhitespaceAnalyzer? You could also extend the lexical rules of StandardAnalyzer. -- karl

Indexing punctuation and symbols

2007-10-01 Thread John Byrne
Hi, Has anyone written an analyzer that preserves punctuation and symbols ("£", "$", "%" etc.) as tokens? That way we could distinguish between searching for "100" and "100%" or "$100". Does anyone know of a reason why that wouldn't work? I notice that even Google doesn't support that. But