That's a great technique - thanks for sharing it!
Erik
On Mar 14, 2005, at 6:54 AM, Volodymyr Bychkoviak wrote:
Hi all.
I have a large index of documents (about 1.6 million).
One field (called "number", for example) contains a string of digits.
I need to do wildcard searches on this field such as "*expression*"
(i.e. all documents that contain "expression" in this field).
When I run such a search with a very short expression (e.g. "*321") I get
an OutOfMemoryError or a TooManyClauses exception (which one depends on
the BooleanQuery.maxClauseCount setting).
So I found the following workaround. I index this field as a sequence of
terms, each containing a single digit of the value. (For example,
the value "123214213" is indexed as the sequence of terms
"1","2","3","2","1","4","2","1","3".)
This can be done with a custom Analyzer class.
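As a rough illustration of the term sequence such an analyzer would be expected to produce, here is a minimal plain-Java sketch. It does not use the Lucene API; the class name DigitTerms and the method split are hypothetical, and it assumes the field value consists only of digits, so every character becomes its own term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows the terms a
// one-character-per-term analyzer would emit for a single field value.
public class DigitTerms {

    // Turns each character of the value into its own term, in order.
    // Assumption: the value contains only digits, as described in the message.
    public static List<String> split(String value) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            terms.add(String.valueOf(value.charAt(i)));
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 2, 1, 4, 2, 1, 3]
        System.out.println(split("123214213"));
    }
}
```

In a real index the same splitting would have to happen inside the analyzer's token stream, with each term at the next position, so that phrase matching on adjacent digits works.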
To run the "wildcard" search I use a PhraseQuery
made up of single-digit terms.
For example, to find documents that contain "321" in the field named
"number", I create the following PhraseQuery:
PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.add(new Term("number", "3"));
phraseQuery.add(new Term("number", "2"));
phraseQuery.add(new Term("number", "1"));
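To see why an exact phrase over single-digit terms behaves like a "*321*" substring search, here is a small plain-Java sketch of the matching rule. It only simulates the idea and does not call Lucene; the names PhraseMatchSketch, toTerms and matchesPhrase are hypothetical, and it assumes an exact phrase (no slop), where the query terms must occur at consecutive positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Lucene code: an exact phrase matches when
// the query terms appear at consecutive positions in the document's terms.
public class PhraseMatchSketch {

    // One term per character, mirroring the single-digit indexing scheme.
    public static List<String> toTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (char c : value.toCharArray()) {
            terms.add(String.valueOf(c));
        }
        return terms;
    }

    // Returns true if queryTerms occur as a contiguous run inside docTerms.
    public static boolean matchesPhrase(List<String> docTerms, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        int n = queryTerms.size();
        for (int start = 0; start + n <= docTerms.size(); start++) {
            if (docTerms.subList(start, start + n).equals(queryTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = toTerms("123214213");
        System.out.println(matchesPhrase(doc, toTerms("321"))); // prints true
        System.out.println(matchesPhrase(doc, toTerms("312"))); // prints false
    }
}
```

The query needs only one clause per digit of the expression, however many distinct values the index holds, which is why it stays clear of the maxClauseCount limit.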
This approach is faster when you need to search for a very
short expression, and it never runs out of memory (or throws a
TooManyClauses exception).
I think this can be useful for anyone who needs similar functionality.
Any comments are appreciated.
Regards,
Volodymyr Bychkoviak
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]