Re: indexdir/segments (No such file or directory) lock file present..

2005-05-12 Thread Ramya
I have changed my logic as suggested: boolean create = !(new File("indexdir/segments").exists()); System.out.println("Number of objects writing to index=" + count); index = new IndexWriter(d, new StandardAnalyzer(), create); But one problem still persists. When I do a huge number of documents, th

java.io.IOException: read past EOF

2005-05-12 Thread Matt Magoffin
I ran into the following exception during an index update process (1.4.3 on Windows)... and was wondering if anyone had an idea what might cause this. It occurred after a new IndexWriter was opened, a document added, and then IndexWriter.close() was called. This occurs after many successful updates

Re: Search Theory Book

2005-05-12 Thread David Spencer
Anna Bing wrote: Firstly, the Lucene in Action book is great. It really helped me with implementing search for a project. Sorry if this is the wrong forum, but as you are all search people, I wondered if you could recommend any good books about search theory/algorithms, readable if that is possible

Lucene Search Capabilities.

2005-05-12 Thread Goel, Nikhil
Hi, I have two questions regarding the search capability of Lucene. 1) Lucene does inverted indexing, by which we mean it keeps track of how many times a particular token is used. Is there a way to get the list of the most frequently used words in descending order? For example: Suppose I have
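The first question above (most frequent words, in descending order) can be sketched without touching the index at all. This is a minimal illustration in plain Java, not Lucene's API: the class name TopTerms and both method names are hypothetical, and tokens are assumed to be whitespace-separated. With a real index, the counts would come from the index's own term statistics rather than from re-reading the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class TopTerms {

    /** Counts lower-cased, whitespace-separated tokens across all texts. */
    static Map<String, Integer> countTokens(List<String> texts) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns up to n terms, most frequent first; ties are broken alphabetically. */
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countTokens(Arrays.asList("the cat and the dog", "the dog"));
        System.out.println(topN(counts, 3)); // prints [the, dog, and]
    }
}
```

Sorting the full entry list is fine for a vocabulary that fits in memory; for a very large vocabulary and a small n, a bounded priority queue would avoid sorting everything.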

RE: indexdir/segments (No such file or directory) lock file present..

2005-05-12 Thread Monsur Hossain
Ramya, I don't have an answer to your specific lock file question, but a couple of thoughts. You say you're using multiple threads to index 50,000 documents. Have you tried a single-thread version first? I'd try that, and then scale out to multiple threads as needed. We index over ten times that

indexdir/segments (No such file or directory) lock file present..

2005-05-12 Thread Ramya
1. I am trying to pump in a large number of documents (to the tune of 5) ... I use multiple threads and I depend on the internal locks of Lucene to synchronize the write access to the index. try { index = new IndexWriter(d, new StandardAnalyzer(), false); }
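One pattern often suggested for this situation is to open the writer once and share it among the worker threads, instead of letting each thread open its own writer and contend for the lock file. The sketch below shows only the threading shape in plain Java and makes no Lucene calls: SharedWriterDemo and SharedWriter are hypothetical names, and SharedWriter merely stands in for the single open index writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; nothing here is a Lucene API.
final class SharedWriterDemo {

    /** Stand-in for the one open index writer. */
    static final class SharedWriter {
        private final List<String> written = new ArrayList<>();

        // synchronized: concurrent callers are serialized on this one instance
        synchronized void add(String doc) {
            written.add(doc);
        }

        synchronized int size() {
            return written.size();
        }
    }

    /** Adds every document through one shared writer, using several worker threads. */
    static int indexAll(List<String> docs, int threads) {
        SharedWriter writer = new SharedWriter(); // opened once, never per thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.execute(() -> writer.add(doc));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return writer.size(); // a real writer would be closed once, here
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        System.out.println(indexAll(docs, 4));
    }
}
```

The point of the design is that the writer's lifetime (open once, close once) is owned by a single place in the code, so no thread ever finds a lock left behind by another thread's writer.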

Re: Search Theory Book

2005-05-12 Thread Paul Libbrecht
How about: http://www.dcs.gla.ac.uk/Keith/Preface.html quite an old one but a recognized one, I think. Also, browse http://www.lt-world.org/ I think. paul On 12 May 05, at 14:04, Pasha Bizhan wrote: Hi, Managing Gigabytes http://www.amazon.com/exec/obidos/tg/detail/-/1558605703/qid=1115

Re: Search Theory Book

2005-05-12 Thread Otis Gospodnetic
I believe there are links to several Information Retrieval books on Lucene's Wiki. Otis --- Anna Bing <[EMAIL PROTECTED]> wrote: > Firstly the Lucene in Action Book is great. It really helped me with > implementing search for a project. > > Sorry if this is the wrong forum but as you are all sea

Search Theory Book

2005-05-12 Thread Anna Bing
Firstly the Lucene in Action Book is great. It really helped me with implementing search for a project. Sorry if this is the wrong forum but as you are all search people. I wondered if you could recommend any good books about search theory/algorithms (readable if that is possible in an algorithm

RE: Search Theory Book

2005-05-12 Thread Pasha Bizhan
Hi, Managing Gigabytes http://www.amazon.com/exec/obidos/tg/detail/-/1558605703/qid=1115898416/sr=8-1/ref=pd_csp_1/104-0210366-8377506?v=glance&s=books&n=507846 Pasha Bizhan http://lucenedotnet.com > -Original Message- > From: Anna Bing [mailto:[EMAIL PROTECTED] > Sent: Thursday, May 1

Re: Search Theory Book

2005-05-12 Thread Gary Moore
Salton, Gerald and McGill, Michael J. /Introduction to Modern Information Retrieval/. McGraw-Hill, 1983. -Gary Anna Bing wrote: Firstly the Lucene in Action Book is great. It really helped me with implementing search for a project. Sorry if this is the wrong forum but as you are all search peo

Re: Top most frequent words

2005-05-12 Thread Sven Duzont
Hi, yeah, I just added it to Simpy when I read René's post ;) Congrats on Simpy. Sven On Thursday 12 May 2005 at 09:59:18, you wrote: OG> Somebody asked about this today, and I just found this through Simpy: OG> http://www.unine.ch/info/clef/ OG> Scroll half-way through the page, look on th

Search Theory Book

2005-05-12 Thread Anna Bing
Firstly the Lucene in Action Book is great. It really helped me with implementing search for a project. Sorry if this is the wrong forum but as you are all search people. I wondered if you could recommend any good books about search theory/algorithms, readable if that is possible in an algorithm

Re: Top most frequent words

2005-05-12 Thread René Hackl
Hi John, > >from a slightly skewed source -- newspapers in a fixed interval > perhaps. (I don't think "Los Angeles" makes it into everyday parlance You're right there. Most likely the frequencies in that list are based on a volume of the Los Angeles Times, that's one of the standard CLEF-Co

Re: Top most frequent words

2005-05-12 Thread Ahmet Aksoy
Hi John, I haven't investigated the sources yet, but you might be right. However, as you stated, those types of lists depend directly on the subject and the source. Anyway, it is not very important for my study, and I'm sure it will help me very much. I will prepare optimized lists if I can obtai

AW: How does Lucene to compute score ?

2005-05-12 Thread Kai Gulzau
>I suppose this question has been asking before but there is no way to >search such a thing in the archive. You can search the lists on: http://www.mail-archive.com/java-user%40lucene.apache.org/ http://www.mail-archive.com/java-dev%40lucene.apache.org/ ...but a lucene search on the mailing l

Re: Top most frequent words

2005-05-12 Thread Ahmet Aksoy
Hi Otis, Thank you for the source address. Best regards, Ahmet Otis Gospodnetic wrote: Somebody asked about this today, and I just found this through Simpy: http://www.unine.ch/info/clef/ Scroll half-way through the page, look on the right side: 1,000 most frequent words for several languages. Ot

Re: I need 100 most frequently used words in different languages.

2005-05-12 Thread Ahmet Aksoy
Hi René, That is an excellent source! There is more there than I wanted. Thanks a lot. Best regards, Ahmet René Hackl wrote: Hi Ahmet, Now I have only Turkish, English, German, and Finnish lists. At http://www.unine.ch/info/clef/ you can find some more lists you might find useful. Best regard

Re: Top most frequent words

2005-05-12 Thread John Haxby
Otis Gospodnetic wrote: Somebody asked about this today, and I just found this through Simpy: http://www.unine.ch/info/clef/ Scroll half-way through the page, look on the right side: 1,000 most frequent words for several languages. Hmm. I'm not sure how valuable that is. For English "los" a

Top most frequent words

2005-05-12 Thread Otis Gospodnetic
Somebody asked about this today, and I just found this through Simpy: http://www.unine.ch/info/clef/ Scroll half-way through the page, look on the right side: 1,000 most frequent words for several languages. Otis Simpy -- si

How does Lucene to compute score ?

2005-05-12 Thread Bertrand VENZAL
Hi, I suppose this question has been asked before, but there is no way to search for such a thing in the archive. Anyway, I need to merge two different types of search, but I am not really sure that the calculation of the score is the same. So if someone can tell me how Lucene computes it or

Re: I need 100 most frequently used words in different languages.

2005-05-12 Thread René Hackl
Hi Ahmet, > Now I have only Turkish, English, German, and Finnish lists. At http://www.unine.ch/info/clef/ you can find some more lists you might find useful. Best regards, René Sorry Ahmet for the double post, I meant to address the mailing list...

RE: Real time indexing with RAMDirectory

2005-05-12 Thread Otis Gospodnetic
Yes, try minMergeDocs = 1 but keep in mind that you'll have to be re-opening a new IndexReader every time your index changes, and it sounds like this will be very frequent/constant. This may cost you... Otis --- Rifflard Mickaël <[EMAIL PROTECTED]> wrote: > Hi Otis, > > If I swap these tw

RE: Real time indexing with RAMDirectory

2005-05-12 Thread Rifflard Mickaël
Hi Otis, If I swap these two lines, the result is the same. I want to build an indexing process that is as fast as possible, and I can't use a greater minMergeDocs because I need to see all documents of my index in real time. Do you think that an FSDirectory with minMergeDocs equal to 1 can be th