nguages/character in the world for *one person*...
>
> Regards,
> Mead
>
> On Sun, Oct 23, 2011 at 1:27 AM, Petite Abeille wrote:
> >
> > On Oct 22, 2011, at 2:49 AM, Luca Rondanini wrote:
> > >
> > > I usually use Nutch for this but, jus
Hi all,
I usually use Nutch for this but, just for fun, I tried to create a language
identifier based on Lucene only.
I had a really small set of "training data": 10 files (roughly 2 MB each) for
10 languages. I indexed those files using an NGram analyzer.
I have to say that I was not expecting mu
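The n-gram idea above can be sketched without Lucene at all: build a character-trigram frequency profile per language from the training files, then pick the language whose profile shares the most trigram mass with the unknown text. This is only a minimal illustration of the technique, not the poster's actual indexing setup; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal character-trigram language identifier (illustrative sketch,
// not the Lucene-based version discussed above).
public class LangId {
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    // Count all 3-character substrings of the (lowercased) text.
    private static Map<String, Integer> trigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    public void train(String language, String sample) {
        profiles.put(language, trigrams(sample));
    }

    // Return the trained language whose profile overlaps the query's
    // trigram counts the most (simple dot product of counts).
    public String identify(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        long bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> lang : profiles.entrySet()) {
            long score = 0;
            for (Map.Entry<String, Integer> t : query.entrySet()) {
                score += (long) t.getValue()
                        * lang.getValue().getOrDefault(t.getKey(), 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = lang.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LangId id = new LangId();
        id.train("en", "the quick brown fox jumps over the lazy dog and then the cat");
        id.train("it", "la volpe veloce salta sopra il cane pigro e poi il gatto");
        System.out.println(id.identify("the dog and the cat")); // en
        System.out.println(id.identify("il cane e il gatto"));  // it
    }
}
```

With real 2 MB training files instead of one-line samples, the profiles become far more discriminative; an NGram analyzer inside Lucene effectively does the same counting via term frequencies.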
why not just use the StandardAnalyzer? It works pretty well even with
Asian languages!
On Wed, Jan 19, 2011 at 12:23 AM, Shai Erera wrote:
> If you index documents, each in a different language, but all of a given
> document's fields are in the same language, then what you can do is the
> following:
>
> Creat
eheheheh,
1.4 billion documents = 1,400,000,000 documents, for almost 2 TB = 2
terabytes = 2,000 GB on disk!
On Mon, Nov 22, 2010 at 10:16 AM, wrote:
> > of course I will distribute my index over many machines:
> > storing everything on
> > one computer is just crazy, 1.4B docs is going to b
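The arithmetic above also gives a quick feel for shard counts. A back-of-envelope sketch using the thread's numbers (1.4B docs, ~2 TB); the 50M-docs-per-shard budget is an assumption for illustration only, not a recommendation:

```java
// Back-of-envelope sizing for 1.4B docs / ~2 TB, as quoted above.
public class ShardMath {
    public static void main(String[] args) {
        long docs = 1_400_000_000L;
        long totalBytes = 2_000_000_000_000L;        // ~2 TB
        long bytesPerDoc = totalBytes / docs;        // ~1428 bytes per doc
        long docsPerShard = 50_000_000L;             // assumed per-shard budget
        // Ceiling division: shards needed to hold all docs.
        long shards = (docs + docsPerShard - 1) / docsPerShard;
        System.out.println(bytesPerDoc + " bytes/doc, " + shards + " shards");
    }
}
```

At roughly 1.4 KB per document, the per-machine question is less about raw disk and more about how many docs a single index can search with acceptable latency, which is exactly why the replies below push toward partitioning.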
> > > There is an acknowledged/proven bug with a small unit test, but there is
> > > some disagreement about the internal reasons it fails. I have been unable to
> > > generate further discussion or a resolution. This was supposed to be
> > > added as a
> We have a couple billion docs in our archives as well. Breaking them up by
> day worked well for us, but you'll need to do something.
>
> -----Original Message-----
> From: Luca Rondanini [mailto:luca.rondan...@gmail.com]
> Sent: Sunday, November 21, 2010 8:13 PM
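The "breaking them up by day" approach boils down to routing each document to a per-day index directory. A minimal sketch of that routing; the `idx-YYYY-MM-DD` naming scheme is invented here for illustration, not taken from the thread:

```java
import java.time.LocalDate;

// Route documents to one index per day, as the reply above suggests.
public class DayShards {
    // LocalDate.toString() is ISO-8601 (YYYY-MM-DD), so names sort
    // chronologically, which makes pruning old shards easy.
    static String shardFor(LocalDate docDate) {
        return "idx-" + docDate;
    }

    public static void main(String[] args) {
        System.out.println(shardFor(LocalDate.of(2010, 11, 21))); // idx-2010-11-21
    }
}
```

Searches over a date range then open only the shards in that range (e.g. via a multi-reader), instead of one monolithic multi-billion-doc index.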
> On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini
> wrote:
> > Hi everybody,
> >
> > I really need some good advice! I need to index in Lucene something like 1.4
> > billion documents. I have experience with Lucene but I've never worked with
> > such a bi
Hi everybody,
I really need some good advice! I need to index in Lucene something like 1.4
billion documents. I have experience with Lucene but I've never worked with
such a big number of documents. Also, this is just the number of docs at
"start-up": they are going to grow, and fast.
I don't have to
Sometimes I feel stupid! ;)
Thank you very much!
Luca
testn wrote:
Boost must be Map&lt;String, Float&gt;
Luca123 wrote:
Hi all,
I've always used the MultiFieldQueryParser class without problems but
now I'm experiencing a strange problem.
This is my code:
Map boost = new HashMap();
boost.put("field1",5);
boos
Hi all,
I've always used the MultiFieldQueryParser class without problems but
now I'm experiencing a strange problem.
This is my code:
Map boost = new HashMap();
boost.put("field1",5);
boost.put("field2",1);
Analyzer analyzer = new StandardAnalyzer(STOP_WORDS);
String[] s_fields = new String[2
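As testn's reply points out, the boosts map handed to MultiFieldQueryParser must be a `Map<String, Float>`; in the snippet above the raw map holds autoboxed Integers, which likely fails when the parser reads a boost back as a Float. A minimal corrected map (the Lucene parser construction itself is left as a comment so this sketch compiles without the Lucene jar):

```java
import java.util.HashMap;
import java.util.Map;

public class BoostFix {
    static Map<String, Float> boosts() {
        // Float values, not int literals, so the parser's boost
        // lookup gets the type it expects.
        Map<String, Float> boost = new HashMap<>();
        boost.put("field1", 5.0f);
        boost.put("field2", 1.0f);
        return boost;
        // e.g. new MultiFieldQueryParser(fields, analyzer, boost)
        // (exact constructor signature depends on the Lucene version).
    }
}
```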