Hi all,
I see in the javadoc for ICUTokenizer that it has special handling for
Lao, Myanmar, and Khmer word breaking, but no details in the javadoc about what it
does with CJK, which for Chinese and Japanese appears to be breaking into
unigrams. Is this correct?
Tom
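For what it's worth, ICU's word breaking follows the UAX #29 default rules (plus dictionaries for Thai/Lao/Myanmar/Khmer), and under those defaults each Han ideograph forms its own segment. The JDK's `java.text.BreakIterator` applies similar rules, so a stdlib-only sketch (not ICUTokenizer itself, just an illustration of the default segmentation) can show the unigram behavior:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CjkBreakDemo {
    // Segment text into words using the JDK's BreakIterator
    // (similar UAX #29 defaults to ICU's rule-based word breaking).
    static List<String> segments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four Han ideographs: the default rules yield one segment per
        // character, i.e. unigrams.
        System.out.println(segments("今天天气", Locale.CHINESE));
    }
}
```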
eheheheh,
1.4 billion documents = 1,400,000,000 documents, for almost 2T = 2
terabytes = 2,000 GB on disk!
On Mon, Nov 22, 2010 at 10:16 AM, wrote:
> of course I will distribute my index over many machines:
> storing everything on
> one computer is just crazy, 1.4B docs is going to be an index
> of almost 2T
> (in my case)
billion = 10^9 (giga) in English (short scale)
billion = 10^12 (tera) in many non-English languages (long scale)
2T docs = 2,000,000,000,000 docs... ;)
AFAIK 2^31 - 1 docs is the hard limit for a single index anyway.
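For scale: Lucene addresses documents with Java int doc IDs, so a single index tops out around Integer.MAX_VALUE = 2^31 - 1 ≈ 2.1 billion documents (the "2.1 B marker" mentioned elsewhere in the thread). A quick sanity check of the numbers, using the ~2 TB / 1.4 B figures from the messages above:

```java
public class DocIdLimit {
    public static void main(String[] args) {
        // Lucene doc IDs are Java ints, so one index tops out at
        // Integer.MAX_VALUE documents.
        long limit = Integer.MAX_VALUE;   // 2^31 - 1
        System.out.println(limit);        // 2147483647, i.e. ~2.1 billion

        // 1.4 B docs filling ~2 TB works out to roughly 1.4 KB of index
        // per document (integer division).
        long bytesPerDoc = 2_000_000_000_000L / 1_400_000_000L;
        System.out.println(bytesPerDoc);  // 1428
    }
}
```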
Thank you all, I really got some good hints!
of course I will distribute my index over many machines: storing everything on
one computer is just crazy, 1.4B docs is going to be an index of almost 2T
(in my case)
the best solution for me at the moment (from your suggestions) seems to be to
identify a crit
Hi Yonik,
Can we do the same for Lucene? The problem is combining the rewritten
queries using the broken method in Query.
As far as I know, the problem is that e.g. MTQs (multi-term queries) rewrite
*per searcher*, so each searcher uses a different rewritten query (with
different terms). So the scores are totally different.
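A toy simulation of why per-searcher rewrite breaks score merging (plain Java, not the Lucene API; `rewritePrefix` is a made-up stand-in for MultiTermQuery rewriting): the same prefix query expands against each searcher's local term dictionary, so the two searchers end up scoring different term sets:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PerSearcherRewrite {
    // Hypothetical stand-in for multi-term query rewrite: expand a prefix
    // against one searcher's local term dictionary.
    static Set<String> rewritePrefix(String prefix, List<String> localTerms) {
        Set<String> expanded = new TreeSet<>();
        for (String t : localTerms) {
            if (t.startsWith(prefix)) expanded.add(t);
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Two searchers over shards with different term dictionaries.
        Set<String> shard1 = rewritePrefix("lu", List.of("lucene", "lucid", "java"));
        Set<String> shard2 = rewritePrefix("lu", List.of("lucene", "lumber", "solr"));
        // Each searcher uses a different rewritten query, so document
        // scores from the two shards are not comparable.
        System.out.println(shard1); // [lucene, lucid]
        System.out.println(shard2); // [lucene, lumber]
    }
}
```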
On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler wrote:
> The latest discussion was more about MultiReader vs. MultiSearcher.
>
> But you are right, 1.4 B documents is not easy to handle, especially when
> your index grows and you get to the 2.1 B mark; then no MultiSearcher or
> whatever helps.
>
The latest discussion was more about MultiReader vs. MultiSearcher.
But you are right, 1.4 B documents is not easy to handle, especially when
your index grows and you get to the 2.1 B mark; then no MultiSearcher or
whatever helps.
On the other hand, even distributed Solr has the same problems as
MultiSearcher
Am I the only one who thinks this is not the way to go? MultiReader (or
MultiSearcher) is not going to fix your problems. Having 1.4B documents on
one machine is a big number, no matter how you partition them (unless you
have some really expensive hardware at your disposal). Did I miss the point
A local multithreaded search can be done in another way even for a single
index, but not using the impl of (Parallel)MultiSearcher. This may be a new
class directly extending IndexSearcher, which may even do parallel search on
e.g. different segments (because searching a MultiReader is no longer
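The per-segment idea above can be sketched without any Lucene classes (the "segments" here are just float arrays of per-document scores, and `searchSegment` is a made-up stand-in): search each segment on its own thread via an ExecutorService, offset local doc IDs by a per-segment docBase, then merge the per-segment top hits:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerSegmentSearch {
    record Hit(int doc, float score) {}

    // Stand-in for searching one segment: score every doc, keep the best,
    // reporting the global doc ID (docBase + local ID).
    static Hit searchSegment(float[] segmentScores, int docBase) {
        Hit best = new Hit(-1, Float.NEGATIVE_INFINITY);
        for (int i = 0; i < segmentScores.length; i++) {
            if (segmentScores[i] > best.score()) best = new Hit(docBase + i, segmentScores[i]);
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        float[][] segments = { {0.1f, 0.9f}, {0.5f}, {0.3f, 0.8f, 0.2f} };
        ExecutorService pool = Executors.newFixedThreadPool(segments.length);
        List<Future<Hit>> futures = new ArrayList<>();
        int docBase = 0;
        for (float[] seg : segments) {              // one task per segment
            final int base = docBase;
            final float[] s = seg;
            futures.add(pool.submit(() -> searchSegment(s, base)));
            docBase += seg.length;
        }
        List<Hit> merged = new ArrayList<>();
        for (Future<Hit> f : futures) merged.add(f.get());
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        pool.shutdown();
        System.out.println(merged.get(0));          // best hit across all segments
    }
}
```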
> it has only problems.
Perhaps these known problems should be added to the API docs, so users who are
encouraged to start clean with the 3.x API don't build bad applications from
scratch?
Parallel searching is extremely powerful and should not be abandoned.
Hello,
I'm just stuck with one problem and don't know how to figure it out. I'm
working on indexing objects that are in computer memory (they
exist only in my Java code). I don't have any problems indexing them; however,
I have no idea how to re-index them if they change during the e
There is no reason to use MultiSearcher instead of the much more consistent and
effective MultiReader! We (Robert and I) are already planning to deprecate
it. MultiSearcher itself has had no benefit over a simple IndexSearcher on top of
a MultiReader since Lucene 2.9; it has only problems.
Use case
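A MultiReader presents its sub-readers as one logical index by offsetting each sub-reader's local doc IDs with a docBase. A minimal sketch of that mapping (plain Java, not the Lucene classes themselves):

```java
public class DocBaseMapping {
    // Compute the starting global doc ID (docBase) of each sub-reader,
    // given the number of documents in each one.
    static int[] docBases(int[] subReaderSizes) {
        int[] bases = new int[subReaderSizes.length];
        int total = 0;
        for (int i = 0; i < subReaderSizes.length; i++) {
            bases[i] = total;        // this sub-reader starts here
            total += subReaderSizes[i];
        }
        return bases;
    }

    public static void main(String[] args) {
        // Three sub-readers with 3, 5, and 2 docs: bases are 0, 3, 8.
        int[] bases = docBases(new int[] {3, 5, 2});
        // Local doc 1 of sub-reader 2 maps to global doc bases[2] + 1.
        System.out.println(bases[2] + 1); // 9
    }
}
```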
>> We have a couple billion docs in our archives as well...Breaking them up by
>> day worked well for us
We do not have 2 billion segments in one index. We have roughly 5-10 million
documents per index. We are currently using a MultiSearcher, but unresolved
Lucene issues in this will force us to
Are you looking at Solr? It has a lot of the infrastructure you'll be
building
yourself for Lucene already built in. Including replication, distributed
searching, etc. Yes, there's a learning curve for something new, but
your Lucene experience will help you a LOT with that. It has support
for shard
> if there is a solr newsgroup better suited for my question, please point me
> there.
http://lucene.apache.org/solr/mailing_lists.html
--
Ian.
Hi,
if there is a solr newsgroup better suited for my question, please point me
there.
Using the SearchHandler with the defType="dismax" option enables the
DisMaxQParserPlugin. From investigating, it seems it just tokenizes on
whitespace.
Although by looking in the code I could not find t
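A rough illustration (not Solr's actual implementation; the score numbers and `disMaxScore` helper are made up) of what dismax does with a whitespace-tokenized query: each token becomes a disjunction across the queried fields, scored by the maximum per-field score, and the per-token scores are summed:

```java
import java.util.Map;

public class DisMaxSketch {
    // Hypothetical per-token score: the max over the queried fields,
    // mirroring dismax's per-token disjunction-max behavior.
    static double disMaxScore(String term, Map<String, Map<String, Double>> fieldScores) {
        double max = 0.0;
        for (Map<String, Double> field : fieldScores.values()) {
            max = Math.max(max, field.getOrDefault(term, 0.0));
        }
        return max;
    }

    public static void main(String[] args) {
        // Scores one document would get per term, per field (made-up numbers).
        Map<String, Map<String, Double>> fieldScores = Map.of(
            "title", Map.of("lucene", 2.0),
            "body",  Map.of("lucene", 0.5, "search", 1.0));
        double total = 0.0;
        for (String token : "lucene search".split("\\s+")) { // whitespace tokenization
            total += disMaxScore(token, fieldScores);
        }
        System.out.println(total); // 2.0 (title wins for "lucene") + 1.0 = 3.0
    }
}
```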