Anshum:
On 23/04/2008, Anshum <[EMAIL PROTECTED]> wrote:
> The issue seems to be with the initialization of the index writer; try
> initializing it with the last parameter set to false, i.e.
> writer = new IndexWriter(indexLocation, new StandardAnalyzer(), false);
writer = new IndexWriter(indexL
Hi there
I am using the latest version of Lucene and have ten threads indexing
documents. I keep getting the following errors throughout the indexing process:
Exception in thread "Thread-569"
org.apache.lucene.index.MergePolicy$MergeException:
java.io.FileNotFoundEx
Hi Hasan,
The issue seems to be with the initialization of the index writer; try
initializing it with the last parameter set to false, i.e.
writer = new IndexWriter(indexLocation, new StandardAnalyzer(), false);
If you initialize it with the last argument as true, it creates a new
index each time.
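A minimal sketch of that advice, assuming the Lucene 2.3-era constructor that takes a path, an analyzer and a create flag (the WriterFactory class and openOrAppend method names are just illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    public class WriterFactory {
        // Open the index for appending if it already exists; only create it
        // when nothing is there yet, so restarting the indexer does not wipe
        // previously indexed documents.
        public static IndexWriter openOrAppend(String indexLocation) throws Exception {
            boolean create = !IndexReader.indexExists(indexLocation);
            return new IndexWriter(indexLocation, new StandardAnalyzer(), create);
        }
    }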
Hi Glen,
I am using Red Hat Enterprise Linux ES release 4, kernel 2.6.9-55.ELsmp.
It's a 32-bit, dual-processor, HT-enabled machine with 12 GB of RAM.
The JVM is Java HotSpot(TM) Client VM (build 1.6.0_02-ea-b02, mixed mode),
and yes, I am using a single searcher instance for all searches.
writer = new IndexWriter(indexLocation, new StandardAnalyzer(), true);
String string = request.getParameter("text");
this.log("Text is " + string);
Date date = new Date();
String dateString = DateTools.dateToString(date,
Hi Mathieu,
*What do you want to do?*
A spell checker and related keyword suggestion.
If you want an ngram => popularity map, just use a Berkeley DB, and use this
information in your Lucene application. Lucene is an inverted index; Berkeley
DB is an index.
*Great idea! Berkeley DB is definitely a t
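A rough sketch of that ngram => count map, assuming Berkeley DB Java Edition (the environment path, database name and the sample n-gram/count are placeholders):

    import java.io.File;
    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;

    public class NgramStore {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envCfg = new EnvironmentConfig();
            envCfg.setAllowCreate(true);
            // The environment home directory ("ngram-env") is expected to exist.
            Environment env = new Environment(new File("ngram-env"), envCfg);

            DatabaseConfig dbCfg = new DatabaseConfig();
            dbCfg.setAllowCreate(true);
            Database db = env.openDatabase(null, "ngramCounts", dbCfg);

            // key = the n-gram itself, value = its observed frequency count
            DatabaseEntry key = new DatabaseEntry("new york".getBytes("UTF-8"));
            DatabaseEntry value = new DatabaseEntry("12345".getBytes("UTF-8"));
            db.put(null, key, value);

            db.close();
            env.close();
        }
    }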
Thanks Julien,
I'll definitely give it a try!!!
[]s
Rafael
On Wed, Apr 23, 2008 at 8:38 AM, Julien Nioche <[EMAIL PROTECTED]> wrote:
> Hi Raphael,
>
> We initially tried to do the same but ended up developing our own API for
> querying the Web 1T. You can find more details on
> http://digita
This is a patch I made to be able to boost the terms with a specific factor
besides the relevancy returned by MoreLikeThis. This is helpful when having
more than one MoreLikeThis in the query, so words in field A (e.g. Title)
can be boosted more than words in field B (e.g. Description).
Any f
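For reference, a similar effect can be approximated with the stock contrib API rather than the patch itself; in this sketch the index path, field names and boost values are placeholders:

    import java.io.StringReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.similar.MoreLikeThis;

    public class FieldBoostedMlt {
        public static Query build(String indexPath, String sourceText) throws Exception {
            IndexReader reader = IndexReader.open(indexPath);

            // One MoreLikeThis per field, each sub-query boosted differently.
            MoreLikeThis titleMlt = new MoreLikeThis(reader);
            titleMlt.setFieldNames(new String[] { "title" });
            Query titleQuery = titleMlt.like(new StringReader(sourceText));
            titleQuery.setBoost(3.0f);   // title terms weigh more

            MoreLikeThis descMlt = new MoreLikeThis(reader);
            descMlt.setFieldNames(new String[] { "description" });
            Query descQuery = descMlt.like(new StringReader(sourceText));
            descQuery.setBoost(1.0f);

            BooleanQuery combined = new BooleanQuery();
            combined.add(titleQuery, BooleanClause.Occur.SHOULD);
            combined.add(descQuery, BooleanClause.Occur.SHOULD);
            return combined;
        }
    }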
Hi Stu,
I just committed the fix for this, on 2.4 & 2.3.2. If you're able to
test that this fixes your hang, that'd be great. If not, that's fine
(I got a unit test to reproduce the issue).
It's quite easy:
svn checkout https://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_3
Hi Anshum,
2008/4/23 Anshum <[EMAIL PROTECTED]>:
> Hi Glen,
>
> As far as stats for index/search are concerned, here they are:
> * Yes, it is a web based application
> * I am currently facing issues when the number of concurrent searches goes
> high. The search is not able to handle over 2.5 s
Jonathan Ariel wrote:
Yes, it will be too much to do in real time, but it is a good idea though.
I don't know if a vector of term frequencies is stored with the document.
Because I could search on the index to get the subset of documents and then
take the term frequencies from there.
In that case
Stu Hood wrote:
Thank you very much for looking into this issue!
You're welcome! Thank you for catching it & reporting it.
I originally switched to the SerialMergeScheduler to try and work
around this bug: http://lucene.markmail.org/message/awkkunr7j24nh4qj .
I switched back to the Conc
Rafael Turk wrote:
Hi Folks,
I'm trying to load the Google Web 1T 5-gram corpus into Lucene. (This corpus
contains English word n-grams and their observed frequency counts. The length
of the n-grams ranges from unigrams (single words) to five-grams.)
I'm loading each ngram (each row is a ngram) as an
Yes, it will be too much to do in real time, but it is a good idea though.
I don't know if a vector of term frequencies is stored with the document.
Because I could search on the index to get the subset of documents and then
take the term frequencies from there.
In that case I could change MoreLike
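Term frequency vectors are only stored if the field was indexed with them. A small sketch of reading them back, assuming a field literally named "contents" (an illustrative name) that was indexed with Field.TermVector.YES:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;

    public class ShowTermVector {
        public static void main(String[] args) throws Exception {
            // args[0] = path to the index; the field must have been indexed
            // with Field.TermVector.YES, otherwise getTermFreqVector returns null.
            IndexReader reader = IndexReader.open(args[0]);
            TermFreqVector tfv = reader.getTermFreqVector(0, "contents");  // doc id 0
            if (tfv != null) {
                String[] terms = tfv.getTerms();
                int[] freqs = tfv.getTermFrequencies();
                for (int i = 0; i < terms.length; i++) {
                    System.out.println(terms[i] + " -> " + freqs[i]);
                }
            }
            reader.close();
        }
    }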
Hi Raphael,
We initially tried to do the same but ended up developing our own API for
querying the Web 1T. You can find more details on
http://digitalpebble.com/resources.html
There could be a way to reuse elements from Lucene e.g. the Term index only
but I could not find an obvious way to achieve
I modified some of Lucene's code so that Lucene can be used like this:
doc = new Document();
byte[] additionalInfo = new byte[]{'x', 'x', 'x'};
doc.add(new Field("field1", "aa aa", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
I changed the *.frp file as:
if
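As an aside, stock Lucene can already attach extra bytes at the document level without source modifications, via a stored binary field. This is not the posting-level change described above, just a per-document alternative; the field names and payload are made up:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class ExtraInfoDoc {
        public static Document build() {
            Document doc = new Document();
            doc.add(new Field("field1", "aa aa", Field.Store.YES, Field.Index.TOKENIZED));
            // Store the additional bytes alongside the document instead of
            // inside the posting list.
            byte[] additionalInfo = new byte[] { 'x', 'x', 'x' };
            doc.add(new Field("field1_info", additionalInfo, Field.Store.YES));
            return doc;
        }
    }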
Hi Folks,
I'm trying to load the Google Web 1T 5-gram corpus into Lucene. (This corpus
contains English word n-grams and their observed frequency counts. The length
of the n-grams ranges from unigrams (single words) to five-grams.)
I'm loading each ngram (each row is a ngram) as an individual Document.
Th
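A minimal sketch of that per-row Document, assuming one field for the n-gram text and one for its count (the field names, index path and sample row are illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class NgramIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("ngram-index", new StandardAnalyzer(), true);

            // One Document per corpus row, e.g. "new york<TAB>12345".
            Document doc = new Document();
            doc.add(new Field("ngram", "new york", Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("count", "12345", Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);

            writer.optimize();
            writer.close();
        }
    }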
Jonathan Ariel wrote:
Smart idea, but it won't help me. I have almost 50 categories and eventually
I would like to "filter" not just on category but maybe also on language,
etc.
Karl: what do you mean by measure the distance between the term vectors and
cluster them in real time?
I mean exactly
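The answer is cut off above, but "measuring the distance between the term vectors" could look roughly like this cosine similarity over two documents' term-frequency vectors; this is only a sketch, and the field name is a placeholder:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;

    public class TermVectorDistance {
        public static double cosine(IndexReader reader, int docA, int docB, String field)
                throws IOException {
            Map<String, Integer> a = toMap(reader.getTermFreqVector(docA, field));
            Map<String, Integer> b = toMap(reader.getTermFreqVector(docB, field));
            double dot = 0, normA = 0, normB = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                normA += (double) e.getValue() * e.getValue();
                Integer other = b.get(e.getKey());
                if (other != null) dot += (double) e.getValue() * other;
            }
            for (int f : b.values()) normB += (double) f * f;
            return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
        }

        private static Map<String, Integer> toMap(TermFreqVector tfv) {
            Map<String, Integer> m = new HashMap<String, Integer>();
            if (tfv == null) return m;  // field was indexed without term vectors
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) m.put(terms[i], freqs[i]);
            return m;
        }
    }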
On Tue, 2008-04-22 at 09:40 +0530, Anshum wrote:
> Any other suggestions for handling a concurrency of over 7 search requests
> per second for an index size of over 15Gigs containing over 13 million
> records?
Our index is 30GB+ with 9 million records and a machine handles an
average search in abo