RE: best practice: 1.4 billions documents

2010-11-24 Thread Uwe Schindler
ParallelMultiSearcher, as a subclass of MultiSearcher, has the same problems. These are not crashes; rather, some queries do not return correctly scored results. This especially affects all MultiTermQueries (TermRange, Fuzzy, NumericRange, Wildcard, Prefix) if they are used in a

Re: best practice: 1.4 billions documents

2010-11-24 Thread Ganesh
Since there was a debate about using MultiSearcher, what about using ParallelMultiSearcher? I have indexes with 60 million documents, and they sometimes grow to 100 million. I shard the DB by week and use ParallelMultiSearcher to search across the shards. All data is on a single system. Till n

[ANNOUNCE] Katta 0.6.3 released

2010-11-24 Thread Johannes Zillmann
Release 0.6.3 of Katta is now available. Katta - Lucene (or Hadoop Mapfiles or any content which can be split into shards) in the cloud. http://katta.sourceforge.net The changes of the 0.6.3 release: fix KATTA-165, fix IndexOutOfBoundsException when adding index with enabled throttling fix KAT

RE: custom attributs in tokens

2010-11-24 Thread jan.kurella
Of course: We are trying to search in documents that contain text in several languages. We are also investigating other approaches*, so this is not about finding other variants. The goal is to match only tokens from one or more given languages and not to match the token if it is by accident the s