You might be able to create an analyzer that breaks your
stream up (from the example) into tokens
"foo" and "," and then (using the same analyzer)
search on phrases with a slop of 0. That seems like
it'd do what you want.
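Roughly like this, maybe (just a sketch against the Lucene 2.x token
API, untested, and the class names are made up): runs of letters and
digits become word tokens, and any other non-whitespace character
comes out as its own single-character token.

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

public class SymbolAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new SymbolTokenizer(reader);
  }

  static class SymbolTokenizer extends Tokenizer {
    private int offset = 0;
    private int pushback = -2;  // -2 means "no pushed-back character"

    SymbolTokenizer(Reader in) {
      super(in);
    }

    public Token next() throws IOException {
      StringBuffer sb = new StringBuffer();
      int start = -1;
      while (true) {
        int c = (pushback != -2) ? pushback : input.read();
        pushback = -2;
        if (c == -1) {
          break;  // end of stream
        }
        int pos = offset++;
        char ch = (char) c;
        if (Character.isLetterOrDigit(ch)) {
          if (sb.length() == 0) start = pos;
          sb.append(ch);  // grow the current word token
        } else if (Character.isWhitespace(ch)) {
          if (sb.length() > 0) break;  // whitespace ends the word
        } else {
          if (sb.length() > 0) {
            pushback = c;  // finish the word, re-read the symbol next call
            offset--;
            break;
          }
          // a punctuation/symbol character is a token of its own
          return new Token(String.valueOf(ch), pos, pos + 1);
        }
      }
      if (sb.length() == 0) return null;  // stream exhausted
      return new Token(sb.toString(), start, start + sb.length());
    }
  }
}

Index and query with the same analyzer, and a quoted phrase like
"foo ," parsed with the default slop of 0 becomes a PhraseQuery that
matches a document containing "foo," but not one containing a bare
"foo".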
Best
Erick
On 10/1/07, Patrick Turcotte <[EMAIL PROTECTED]> wrote:
Of course, it depends on the kind of query you are doing, but (I did
find the query parser in the meantime)

MultiFieldQueryParser mfqp = new MultiFieldQueryParser(useFields, analyzer, boosts);

where analyzer can be a PerFieldAnalyzerWrapper, followed by

Query query = mfqp.parse(queryString);

would do the trick.
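Putting it together, something like this should work (a sketch against
the Lucene 2.x API; the field names "body" and "body_ws" and the boost
value are placeholders, not anything from your schema):

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class SymbolQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical fields: "body" analyzed with StandardAnalyzer,
    // "body_ws" with WhitespaceAnalyzer so symbols stay attached.
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("body_ws", new WhitespaceAnalyzer());

    String[] useFields = { "body", "body_ws" };

    // Optional per-field boosts; the value here is arbitrary.
    Map boosts = new HashMap();
    boosts.put("body_ws", new Float(2.0f));

    MultiFieldQueryParser mfqp =
        new MultiFieldQueryParser(useFields, analyzer, boosts);

    // One query string, analyzed per field -- no need to run the
    // terms through both analyzers by hand.
    Query query = mfqp.parse("$100");
    System.out.println(query);
  }
}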
Well, the size wouldn't be a problem, we could afford the extra field.
But it would seem to complicate the search quite a lot. I'd have to run
the search terms through both analyzers. It would be much simpler if the
characters were indexed as separate tokens.
Patrick Turcotte wrote:
Hi,
I don't know the size of your dataset, but couldn't you index into 2
fields with a PerFieldAnalyzerWrapper, tokenizing with StandardAnalyzer
for one field and WhitespaceAnalyzer for the other?
Then use a multiple-field query (there is a query parser for that, I
just don't remember the name right now).
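A rough sketch of the indexing side (Lucene 2.x API; the field names
and index path are placeholders):

import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class TwoFieldIndexer {
  public static void main(String[] args) throws Exception {
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("body_ws", new WhitespaceAnalyzer());

    IndexWriter writer = new IndexWriter("/tmp/index", analyzer, true);

    String text = "the fee is $100, not 100%";
    Document doc = new Document();
    // Same text twice: "body" is tokenized by StandardAnalyzer (symbols
    // stripped), "body_ws" by WhitespaceAnalyzer (symbols kept attached).
    doc.add(new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED));
    doc.add(new Field("body_ws", text, Field.Store.NO, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.close();
  }
}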
Patrick
On 10/1/07, John Byrne wrote:
Whitespace analyzer does preserve those symbols, but not as tokens. It
simply leaves them attached to the original term.
As an example of what I'm talking about, consider a document that
contains (without the quotes) "foo, ".
Now, using WhitespaceAnalyzer, I could only get that document by
searching for "foo," with the comma attached.
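To see it, here is a quick check against the Lucene 2.x TokenStream
API ("body" is just a placeholder field name):

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class ShowTokens {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new WhitespaceAnalyzer()
        .tokenStream("body", new StringReader("foo, bar"));
    Token t;
    while ((t = ts.next()) != null) {
      System.out.println(t.termText());  // prints "foo," then "bar"
    }
  }
}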
On 1 Oct 2007, at 15:33, John Byrne wrote:
Has anyone written an analyzer that preserves punctuation and
symbols ("£", "$", "%" etc.) as tokens?
WhitespaceAnalyzer?
You could also extend the lexical rules of StandardAnalyzer.
--
karl
Hi,
Has anyone written an analyzer that preserves punctuation and symbols
("£", "$", "%" etc.) as tokens?
That way we could distinguish between searching for "100" and "100%" or
"$100".
Does anyone know of a reason why that wouldn't work? I notice that even
Google doesn't support that. But