Re: Can I delete without shuffling document IDs?

2007-06-28 Thread karl wettin
On 29 Jun 2007, at 05:08, Daniel Noll wrote: I just wanted to put the question out in case someone has solved the exact same problem already. I've posted some experiments in LUCENE-879. The patch replaces deleted documents with a new dummy document. The second patch contains some merge
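To make the dummy-document idea concrete, here is a toy model in plain Java. It does not use Lucene, and the class name is hypothetical. Document IDs are list positions, and a delete overwrites the slot with a placeholder instead of removing it, so no later ID ever moves. In Lucene itself, it is segment merging (including optimize) that squeezes out deleted documents and renumbers the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy model (plain Java, not Lucene code): a document's ID is its position in
// the list. Deleting overwrites the slot with null, which plays the role of
// the dummy document, so the IDs of all later documents stay the same.
class StableIdStore {
    private final List<String> docs = new ArrayList<>();

    // Appends a document and returns its ID (its position).
    int add(String doc) {
        docs.add(Objects.requireNonNull(doc));
        return docs.size() - 1;
    }

    // Marks a document as deleted without renumbering anything after it.
    void delete(int id) {
        docs.set(id, null);
    }

    boolean isDeleted(int id) {
        return docs.get(id) == null;
    }

    String get(int id) {
        return docs.get(id);
    }

    // Like Lucene's maxDoc(): deleted slots are still counted.
    int maxDoc() {
        return docs.size();
    }
}
```

Removing the entry outright (`docs.remove(id)`) would instead shift every later document down by one, which is exactly the ID shuffling the thread is trying to avoid.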

Re: Lucene as primary object storage

2007-06-28 Thread karl wettin
On 28 Jun 2007, at 15:37, Emmanuel Bernard wrote: I don't really like the idea actually: I'm much more comfortable with having my data in a relational DB :) If you don't mind, please develop that a bit further. I think Lucene is suited pretty well for object storage if you also need it as an index.

Can I delete without shuffling document IDs?

2007-06-28 Thread Daniel Noll
Hi all. Is there currently any way to delete documents from the middle of a text index without a risk of the document IDs changing later? I'm aware that they probably won't change unless we optimise or unless the user adds more data, but unfortunately adding more data is now a potential occurr

Re: Adding Documents to index in a batch process

2007-06-28 Thread Kai Weber
* Erick Erickson <[EMAIL PROTECTED]>: > I guess I don't understand the problem. Can you build the documents from within a loop or not? If you can, it's simple... open indexwriter; while (build a document) write to index; close/optimize. Or are you saying that you can't build f

Re: Rewrite one phrase to another in search query

2007-06-28 Thread Mark Miller
You might try my Query Parser, Qsol. http://myhardshadow.com/qsol.php There is a find/replace feature that will do what you want. FindReplace takes the find string, the replace string, boolean for case sensitive, boolean to indicate the replacement will act as an operator (allows for correct de
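A minimal sketch of a phrase find/replace applied to the raw query string before it reaches a parser. This is plain Java, not Qsol's actual API; the class name and constructor parameters are hypothetical, loosely mirroring the find string, replace string and case-sensitivity flag described above (the "act as an operator" flag is not modelled).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parse rewrite (NOT Qsol's API): replaces every occurrence
// of a literal phrase in the query string before the query is parsed.
class PhraseRewriter {
    private final Pattern find;
    private final String replacement;

    PhraseRewriter(String find, String replace, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // Pattern.quote makes characters such as '+' or '*' match literally.
        this.find = Pattern.compile(Pattern.quote(find), flags);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        this.replacement = Matcher.quoteReplacement(replace);
    }

    // Returns the query string with all matches of the phrase replaced.
    String rewrite(String query) {
        return find.matcher(query).replaceAll(replacement);
    }
}
```

Note that this matches the phrase anywhere, including inside longer words; a real implementation would probably want to respect term boundaries.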

Re: Lucene as primary object storage

2007-06-28 Thread Otis Gospodnetic
Karl, you might want to have a look at Zoe (the email app from several years ago that uses Lucene as its storage). Also, there is DbDirectory for Lucene, which should have XA support. Andi will know. Otis -- Simpy -- http://www.simpy.com

Re: Searching over multiple indexes with 1:m relationship

2007-06-28 Thread Erick Erickson
Chris is spot-on. Your data set is so small that I wouldn't worry about speed unless and until you have proof that it's a problem. The complexity you'll introduce by having multiple indexes just won't be worth it. In your case, following Chris's advice and de-normalizing the data would be the fir

Re: LUCENE on Eclipse

2007-06-28 Thread Chris Hostetter
When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, and your question is "hidden" in that thread and gets less atten

Re: queryparser

2007-06-28 Thread Erik Hatcher
On Jun 28, 2007, at 1:29 PM, pratik shinghal wrote: I'm using Lucene (org.apache.lucene) and I want the Java code for parsing single-character strings. My code is: QueryParser qp = new QueryParser("",analyser); String str = " track 9"; Query que = qp.parse(str); System.out.println(que);

Re: queryparser

2007-06-28 Thread Erick Erickson
What do you get if you do a System.out.println(que.toString())? And what analyzer are you using? Erick On 6/28/07, pratik shinghal <[EMAIL PROTECTED]> wrote: I'm using Lucene (org.apache.lucene) and I want the Java code for parsing single-character strings. My code is: QueryParser qp = new

Re: Adding Documents to index in a batch process

2007-06-28 Thread Erick Erickson
I guess I don't understand the problem. Can you build the documents from within a loop or not? If you can, it's simple... open indexwriter; while (build a document) write to index; close/optimize. Or are you saying that you can't build from within a loop? Best Erick On 6/28/07, Kai Weber <[E
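The loop Erick describes can be sketched as follows. To keep it runnable without Lucene on the classpath, a hypothetical `DocWriter` interface stands in for `IndexWriter` (whose real calls in Lucene 2.x would be `addDocument`, then optionally `optimize`, then `close`). The point is simply that the writer is opened once and closed once for the whole batch, rather than once per document.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical stand-in for Lucene's IndexWriter, so the sketch runs without Lucene.
interface DocWriter extends AutoCloseable {
    void add(String doc);

    @Override
    void close(); // narrowed: no checked exception
}

class BatchIndexer {
    // Opens one writer, adds every document, and closes it once at the end
    // (try-with-resources guarantees the close even if an add throws).
    static int indexAll(Supplier<DocWriter> opener, Iterator<String> docs) {
        int count = 0;
        try (DocWriter writer = opener.get()) {
            while (docs.hasNext()) {
                writer.add(docs.next());
                count++;
            }
        }
        return count;
    }
}
```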

Re: inserting millions of entries

2007-06-28 Thread Erick Erickson
Yes, opening/closing will be very costly. But I *believe*, although I haven't tried it, that IndexModifier (2.1) will work for you. But do NOT take my word for it as I haven't tried to do what you're doing. But it should be easy to write a short test or two to prove that you can find recently-ins

Re: Luke faster + Index Searcher is slow

2007-06-28 Thread Chris Hostetter
: Are you opening the IndexSearcher every time you query? This is a costly operation. Just repeating the above line because it's important. Also... : > The code I use is: > File indexFile = new File(fileName); > FSDirectory dir = FSDirectory.getDirecto

Re: Searching over multiple indexes with 1:m relationship

2007-06-28 Thread Chris Lu
What you should do is denormalize the 1:m relationships. Don't try to mimic the database. If you need to, you can keep the original 2 indexes and create a third one. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo:
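Denormalizing a 1:m relationship means folding the child rows into the parent's document. A rough sketch in plain Java follows, where a map from field name to a list of values stands in for a Lucene `Document` (which likewise accepts several fields with the same name). The class name, method name and `childPrefix` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Denormalizer {
    // Builds one flat "document" from a parent row and its child rows.
    // childPrefix keeps child fields from colliding with parent fields.
    static Map<String, List<String>> flatten(Map<String, String> parent,
                                             List<Map<String, String>> children,
                                             String childPrefix) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : parent.entrySet()) {
            addValue(doc, e.getKey(), e.getValue());
        }
        for (Map<String, String> child : children) {
            for (Map.Entry<String, String> e : child.entrySet()) {
                // Same-named child fields accumulate as multiple values.
                addValue(doc, childPrefix + e.getKey(), e.getValue());
            }
        }
        return doc;
    }

    private static void addValue(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }
}
```

A query on a child field then matches the parent document directly, with no cross-index join.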

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Chris Lu
Basically, for a scalable solution you need to separate your web app from your searching. Searching is a different concern. You can develop more kinds of search when new requirements come in. Technorati's way is very similar to one of the DBSight configurations. One machine is dedicated to indexing,

queryparser

2007-06-28 Thread pratik shinghal
I'm using Lucene (org.apache.lucene) and I want the Java code for parsing single-character strings. My code is: QueryParser qp = new QueryParser("",analyser); String str = " track 9"; Query que = qp.parse(str); System.out.println(que); and I want the answer as: track, 9 but I'm gett

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Grant Ingersoll
Hadoop is not designed for this type of scenario. Have a look at Solr (http://lucene.apache.org/solr); this is pretty much one of its main use cases. I think it will do what you need to do and will more than likely work with minimal configuration on your existing index (but don't hold

Re: Luke faster + Index Searcher is slow

2007-06-28 Thread Grant Ingersoll
Are you opening the IndexSearcher every time you query? This is a costly operation. -Grant On Jun 28, 2007, at 12:03 PM, Nott wrote: I have an index in one file that has a size of about 18GB of data. When I run some queries on Luke the response comes in < 40 ms but the same when I use Inde
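A minimal sketch of opening the searcher once and sharing it across queries. The generic holder below is hypothetical and Lucene-free: the `Supplier` stands in for whatever code actually opens the `IndexSearcher`. Since a Lucene searcher sees the index as it was when the searcher was opened, `invalidate()` is there for when the index has changed and a fresh one is needed.

```java
import java.util.function.Supplier;

// Hypothetical holder: creates the expensive object on first use and then
// hands the same instance to every caller until it is invalidated.
class SharedSearcher<T> {
    private final Supplier<T> opener;
    private T instance; // guarded by "this"

    SharedSearcher(Supplier<T> opener) {
        this.opener = opener;
    }

    // Opens on the first call only; later calls reuse the cached instance.
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }

    // Forces the next get() to open a fresh instance (e.g. after index updates).
    synchronized void invalidate() {
        instance = null;
    }
}
```

In a real application the old searcher would also need to be closed once no in-flight query is using it; that bookkeeping is left out here.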

Adding Documents to index in a batch process

2007-06-28 Thread Kai Weber
Hello, In my application I have to add documents to the index as follows: 1. build the document to add from a repository 2. obtain an IndexWriter 3. add document to index 4. write and optimize index, close writer 5. goto 1 until no documents left. I must work with legacy code which does the doc

Luke faster + Index Searcher is slow

2007-06-28 Thread Nott
I have an index in one file that has a size of about 18GB of data. When I run some queries on Luke the response comes in < 40 ms, but the same query through IndexSearcher takes 300-600 ms. Any suggestions? The code I use is: File indexFile = new File(fileName);

Re: several existential issues about Lucene's filesystem

2007-06-28 Thread Grant Ingersoll
On Jun 28, 2007, at 9:06 AM, Samuel LEMOINE wrote: Grant Ingersoll wrote: On Jun 28, 2007, at 5:29 AM, Samuel LEMOINE wrote: Thanks for the resources about payloads, I'll have a look over it. About the positions/offsets in .tvf, please tell me if I've understood correctly: The . (quote) Fi

Re: inserting millions of entries

2007-06-28 Thread Mathieu Lecarme
Stop writing; scp the index to another computer; play with it; scp indexModified to the server; mv indexModified indexCurrent; all done. mv is atomic. Jens Grivolla wrote: > Hi, > I have a Lucene index with a few million entries, and I will need to add batches of a few hundred thousand or a few mil
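The same build-then-rename approach can be expressed with Java NIO instead of `scp`/`mv`. This is a sketch, not a drop-in tool. The class and directory names are hypothetical. Each `ATOMIC_MOVE` is atomic on its own, or fails with `AtomicMoveNotSupportedException`. Moving the old index aside and moving the new one in are still two separate steps, so there is a short window in which the live path does not exist. Renaming onto an existing non-empty directory generally fails, which is why the current index is moved out of the way first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class IndexSwap {
    // Moves the current index to backup (which must not exist yet),
    // then moves the freshly built staging index into the live location.
    static void swap(Path staging, Path live, Path backup) throws IOException {
        if (Files.exists(live)) {
            // Step 1: get the old index out of the way with one atomic rename.
            Files.move(live, backup, StandardCopyOption.ATOMIC_MOVE);
        }
        // Step 2: atomically rename the new index into place.
        Files.move(staging, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

All three paths should be on the same file system; otherwise an atomic rename is not possible.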

inserting millions of entries

2007-06-28 Thread Jens Grivolla
Hi, I have a Lucene index with a few million entries, and I will need to add batches of a few hundred thousand or a few million additional entries. Unfortunately, I absolutely need to have all indexed entries available when inserting a new one, even within one batch, in order to do some duplicat

AW: Searching over multiple indexes with 1:m relationship

2007-06-28 Thread Michael Böckling
Hi Erickson, thanks for your reply. Of course you are right that it's a bit insane to mimic a database schema with indices, but that's how it is. The primary index is already in use; the extended requirements came later. The index isn't really that big, the primary one has 2-3 MB of data, I don't

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Mathieu Lecarme
Samuel LEMOINE wrote: > I'm acutely interested in this issue too, as I'm working on a distributed architecture for Lucene. I'm only at the very beginning of my study, so I can't help you much, but Hadoop could perhaps fit your requirements. It's a sub-project of Lucene aiming to paralle

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Samuel LEMOINE
Chun Wei Ho wrote: Hi, We are currently running a Tomcat web application serving searches over our Lucene index (10GB) on a single server machine (Dual 3GHz CPU, 4GB RAM). Due to performance issues and to scale up to handle more traffic/search requests, we are getting another server machine.

Re: Scaling up to several machines with Lucene

2007-06-28 Thread Mathieu Lecarme
Server One handles the website. Server Two is a light version of Tomcat which handles Lucene search. In front, a lighttpd uses Server Two for /search and Server One for everything else. With this scheme, you can add more Lucene servers with round robin in lighttpd. Be careful with fault tolerance and index
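To illustrate the round-robin part, here is a toy selector of the kind a front end such as lighttpd applies when several Lucene search servers sit behind /search. It is plain Java with hypothetical server names, and it deliberately has no health checks, which is precisely the fault-tolerance concern raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy round-robin selector: hands out servers in turn, with no health checks.
class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    // Thread-safe; floorMod keeps the index valid even after int overflow.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```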

Re: Searching over multiple indexes with 1:m relationship

2007-06-28 Thread Erick Erickson
I do have an off-the-wall question: why have two indexes? There are, of course, good reasons, but they're things like size and speed. Where I'm going here is that Lucene does NOT require that all documents have the same fields. So it's perfectly reasonable to index heterogeneous data (or differi

Scaling up to several machines with Lucene

2007-06-28 Thread Chun Wei Ho
Hi, We are currently running a Tomcat web application serving searches over our Lucene index (10GB) on a single server machine (Dual 3GHz CPU, 4GB RAM). Due to performance issues and to scale up to handle more traffic/search requests, we are getting another server machine. We are looking at two

Searching over multiple indexes with 1:m relationship

2007-06-28 Thread Michael Böckling
Hi folks! I know there is a MultiSearcher for searching over multiple indices, but my requirement is a bit special. I have two indices whose documents have a 1:m relationship. Most queries will only use the primary index, but some will have to look for detailed information in the secondary index (

Re: Lucene as primary object storage

2007-06-28 Thread Emmanuel Bernard
Hibernate Search (formerly known as Hibernate Lucene) is not designed to use Lucene as the primary and only backend. It is designed to complement a database. I don't really like the idea actually: I'm much more comfortable with having my data in a relational DB :) So this product will not help f

Re: several existential issues about Lucene's filesystem

2007-06-28 Thread Samuel LEMOINE
Grant Ingersoll wrote: On Jun 28, 2007, at 5:29 AM, Samuel LEMOINE wrote: Thanks for the resources about payloads, I'll have a look over it. About the positions/offsets in .tvf, please tell me if I've understood correctly: The .tvd provides the needed information concerning the occurrences of

LUCENE on Eclipse

2007-06-28 Thread spilirit
Hello, I would like some help finding documentation about how to import the Lucene source into the Eclipse IDE. I'm a new user of this API, and I would like to learn how to use it, as it seems powerful... Thank you for your answers. I would be very grateful if somebody has any tutorial

Re: several existential issues about Lucene's filesystem

2007-06-28 Thread Grant Ingersoll
On Jun 28, 2007, at 5:29 AM, Samuel LEMOINE wrote: Thanks for the resources about payloads, I'll have a look over it. About the positions/offsets in .tvf, please tell me if I've understood correctly: The .tvd provides the needed information concerning the occurrences of each term in documents, a

Re: several existential issues about Lucene's filesystem

2007-06-28 Thread Samuel LEMOINE
Grant Ingersoll wrote: On Jun 27, 2007, at 8:51 AM, Samuel LEMOINE wrote: Hi everyone! I'm working on bibliographical research on Lucene as an intern at Lingway (which uses Lucene in its main product), and I'm currently studying Lucene's file system. There are several things I don't c