Re: Document ids in Lucene index

2008-04-11 Chris Hostetter
: I am wondering if there are possible "holes" in the set of index document ids. More specifically, is it possible that there exists an integer i between 0 and IndexReader.maxDoc() such that reader.document(i) == null and reader.isDeleted(i) == false???
That should not ever happen ... if i
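Hoss's answer rests on how Lucene numbers documents. Ids run from 0 to maxDoc() - 1, and a deletion only marks an id as deleted rather than removing its slot, so the usual loop checks isDeleted(i) before calling document(i). The class below is a hypothetical, self-contained stand-in (ToySegment is not a Lucene class) that models that invariant in plain Java. With a real IndexReader the same loop would use maxDoc(), isDeleted(int) and document(int).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical in-memory stand-in for one index segment; NOT the Lucene API.
// Doc ids are list positions 0..maxDoc()-1. Deleting only sets a bit; the slot stays.
class ToySegment {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1; // a new document always gets the next id
    }

    void delete(int id) { deleted.set(id); }

    boolean isDeleted(int id) { return deleted.get(id); }

    int maxDoc() { return docs.size(); }

    int numDocs() { return docs.size() - deleted.cardinality(); }

    String document(int id) {
        if (isDeleted(id)) {
            throw new IllegalArgumentException("doc " + id + " is deleted");
        }
        return docs.get(id);
    }

    // The usual loop: visit every id below maxDoc() and skip the deleted ones.
    List<String> liveDocuments() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (isDeleted(i)) continue;
            out.add(document(i)); // a non-deleted id always has a document
        }
        return out;
    }
}
```

After three adds and one delete, maxDoc() is still 3 while numDocs() drops to 2. That gap is the only kind of "hole" the loop has to expect.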

Re: Lucene index on relational data

2008-04-11 Rajesh parab
Thanks Karl. How do we specify the primary key or doc id so that a newly added document will use the same doc id? Do you have any sample code that makes use of this patch? Secondly, there was a comment saying it is a proof of concept and not a real project. Is anyone using this patch on their produ

Re: Lucene index on relational data

2008-04-11 Karl Wettin
Rajesh parab wrote: https://issues.apache.org/jira/browse/LUCENE-879 > As per the hack you mentioned inside JIRA, if some of the documents are deleted and re-inserted into the secondary index, the other documents inside the index do not change their doc id. However, the newly added documents will
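Some background for the LUCENE-879 discussion. In stock Lucene an update is a delete followed by an add, so the new copy is appended with a fresh id. Segment merges also drop deleted slots, which shifts later ids down. The toy model below (ToyIdAssignment is hypothetical, not the patch's code) shows both effects, which is why ids are hard to keep stable across two indexes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model, not Lucene code: why re-adding a document changes doc ids.
class ToyIdAssignment {
    final List<String> docs = new ArrayList<>();
    final BitSet deleted = new BitSet();

    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // "Update" = mark the old copy deleted, then append the new copy at the end.
    int update(int oldId, String newDoc) {
        deleted.set(oldId);
        return add(newDoc);
    }

    // A merge keeps only live documents, so everything after a hole shifts down.
    void merge() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) live.add(docs.get(i));
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
    }

    // Current id of a live document, or -1 if it is absent or deleted.
    int idOf(String doc) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i).equals(doc)) return i;
        }
        return -1;
    }
}
```

Updating "a" in [a, b, c] gives the new copy id 3 and leaves "b" at 1. After merge(), "b" has moved to id 0 and the new copy to id 2.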

Re: Lucene index on relational data

2008-04-11 Rajesh parab
> How much data do you have? I have a hard time understanding the relationship between your objects and what sort of normalized data you add to the documents. If you are lucky it is just a single field or a few fields that need to be updated, and you can manage to keep it in RAM and rebuild the whole thin

Re: Lucene index on relational data

2008-04-11 Rajesh parab
While going over the forum, I found one more thread where Otis asked a similar question about the synchronization of doc ids between 2 indexes: http://www.gossamer-threads.com/lists/lucene/java-user/50227?search_string=parallelreader;#50227 Otis, have you found the answer to your question? Reg

Re: Lucene index on relational data

2008-04-11 Karl Wettin
How much data do you have? I have a hard time understanding the relationship between your objects and what sort of normalized data you add to the documents. If you are lucky it is just a single field or a few fields that need to be updated, and you can manage to keep it in RAM and rebuild the whole th
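Karl's suggestion of keeping the volatile fields in RAM and rebuilding the whole secondary index can be sketched as follows. The sketch assumes the main index stores a primary key per document and that you can list those keys in doc id order, with a placeholder for deleted slots so both indexes keep the same maxDoc. SecondaryRebuilder is a hypothetical name, and the lists stand in for real indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the frequently changing field lives in a RAM map keyed by
// primary key, and the small secondary "index" is rebuilt from scratch on change.
class SecondaryRebuilder {
    // primaryKeysInDocIdOrder.get(i) is the key stored in main-index document i
    // (deleted slots included, as placeholders, so the sizes stay equal).
    static List<String> rebuild(List<String> primaryKeysInDocIdOrder,
                                Map<String, String> volatileFieldByKey) {
        List<String> secondary = new ArrayList<>(primaryKeysInDocIdOrder.size());
        for (String key : primaryKeysInDocIdOrder) {
            // Emit exactly one entry per main-index document, even when there is
            // no value, so secondary doc i always lines up with main doc i.
            secondary.add(volatileFieldByKey.getOrDefault(key, ""));
        }
        return secondary;
    }
}
```

Rebuilding everything is wasteful in principle, but when the volatile data fits in RAM it sidesteps the id-synchronization problem: the secondary index is always written in the main index's current order.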

Re: Lucene index on relational data

2008-04-11 Rajesh parab
Thanks Mathieu. On your comments on partitioning of data: > Yes. You can index unfolded data, which takes a lot of space, or use two queries on two indexes. The first builds a Filter for the second, just like with the previous JDBC example. You can even cache the filter, like Solr does with its faceted
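Here is a minimal plain-Java sketch of the two-index approach Mathieu describes. The hits of the first query are turned into a BitSet over the main index's doc ids, and the second query only considers documents whose bit is set. FilterJoin and its cache keyed by a query string are hypothetical stand-ins for Lucene's Filter and for the kind of filter caching Solr does. The cache must be dropped whenever either index changes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical two-index "join" in plain Java; not Lucene's Filter API.
class FilterJoin {
    private final Map<String, BitSet> cache = new HashMap<>();

    // mainKeys.get(doc) = join key stored in main-index document `doc`.
    // On a cache hit the other arguments are ignored and the cached bits are returned.
    BitSet buildFilter(String cacheKey, List<String> matchingKeysFromFirstIndex,
                       List<String> mainKeys) {
        return cache.computeIfAbsent(cacheKey, k -> {
            Set<String> wanted = new HashSet<>(matchingKeysFromFirstIndex);
            BitSet bits = new BitSet(mainKeys.size());
            for (int doc = 0; doc < mainKeys.size(); doc++) {
                if (wanted.contains(mainKeys.get(doc))) bits.set(doc);
            }
            return bits;
        });
    }

    // Second query: a doc is a hit only if its filter bit is set AND it matches.
    static List<Integer> search(BitSet filter, Predicate<Integer> secondQuery) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
            if (secondQuery.test(doc)) hits.add(doc);
        }
        return hits;
    }
}
```

Iterating with nextSetBit means the second query is evaluated only on documents the filter allows, which is the point of building the filter first.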

Re: Lucene index on relational data

2008-04-11 Rajesh parab
Thanks for the details, Karl. I was looking for something like it. However, I have a question about the warning mentioned in the javadoc of ParallelReader. It says: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need t
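One way to detect the drift this javadoc warns about is to compare the keys at each doc id. This assumes each index stores the same primary key field in every document. ParallelAlignmentCheck is a hypothetical helper, not part of Lucene, and it works on plain lists of keys read from each index in doc id order.

```java
import java.util.List;

// Hypothetical consistency check for two indexes meant to be read in parallel:
// both must have the same number of docs, and doc i must carry the same key in each.
class ParallelAlignmentCheck {
    /** Returns the first misaligned doc id, or -1 if the two indexes line up. */
    static int firstMismatch(List<String> keysA, List<String> keysB) {
        int common = Math.min(keysA.size(), keysB.size());
        for (int doc = 0; doc < common; doc++) {
            if (!keysA.get(doc).equals(keysB.get(doc))) return doc;
        }
        // Equal prefixes but different lengths: the first extra doc is misaligned.
        return keysA.size() == keysB.size() ? -1 : common;
    }
}
```

For example, if a document is deleted and re-added in only one index and that index is then merged, every later id in it shifts. The check reports the first position where the two indexes stop describing the same entity.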

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Antony Bowesman
Paul Elschot wrote: On Friday 11 April 2008 13:49:59, Mathieu Lecarme wrote: Use Filter and BitSet. From the personal data, you build a Filter (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Filter.html) which is used in the main index. With 1 billion mails, and possibly
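Here is a rough sizing of the Filter/BitSet approach at the scale mentioned in this message. It assumes one bit per document id, packed into 64-bit words as java.util.BitSet does. For 1,000,000,000 mails this comes to 125,000,000 bytes (about 119 MiB) per filter before any object overhead, and every cached filter of that size costs the same again.

```java
// Back-of-envelope memory for one bitset-style filter: one bit per doc id,
// packed 64 bits to a long (the layout java.util.BitSet uses internally).
class FilterMemory {
    static long bitsetBytes(long maxDoc) {
        long words = (maxDoc + 63) / 64; // round up to whole 64-bit words
        return words * 8;                // 8 bytes per long
    }
}
```

FilterMemory.bitsetBytes(1_000_000_000L) evaluates to 125000000.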

Re: Lucene index on relational data

2008-04-11 Mathieu Lecarme
On 11 Apr 2008 at 19:29, Rajesh parab wrote: Thanks for these pointers, Mathieu. We looked at Compass earlier, but the main issue with a database index is DB vendor support for BLOB locators. I understand that Oracle provides this support to get partial data from a BLOB, but I guess the

Re: Lucene index on relational data

2008-04-11 Karl Wettin
Hi Rajesh, I think you are looking for ParallelReader:
public class ParallelReader extends IndexReader
An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of doc
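The hypothetical ToyParallelView below models the ParallelReader contract with plain maps to make it concrete. Every sub-index must have the same number of documents, and document i of the combined view is the union of the fields of document i in each sub-index. It is an illustration only. In this sketch, when two indexes share a field name, the first index wins.

```java
import java.util.List;
import java.util.Map;
import java.util.HashMap;

// Hypothetical in-memory model of a parallel view over several "indexes";
// each index is a list of documents, each document a field-name -> value map.
class ToyParallelView {
    private final List<List<Map<String, String>>> indexes;

    ToyParallelView(List<List<Map<String, String>>> indexes) {
        if (indexes.isEmpty()) throw new IllegalArgumentException("no indexes");
        int maxDoc = indexes.get(0).size();
        for (List<Map<String, String>> idx : indexes) {
            if (idx.size() != maxDoc) {
                throw new IllegalArgumentException("all indexes must have the same number of documents");
            }
        }
        this.indexes = indexes;
    }

    // Same doc id in every index; fields are unioned, first index wins on clashes.
    Map<String, String> document(int doc) {
        Map<String, String> merged = new HashMap<>();
        for (List<Map<String, String>> idx : indexes) {
            for (Map.Entry<String, String> e : idx.get(doc).entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }
}
```

The join is purely positional: nothing in the view checks that document i in one index is about the same entity as document i in another. That is exactly the responsibility the javadoc leaves to the caller.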

Re: Lucene index on relational data

2008-04-11 Rajesh parab
Thanks for these pointers, Mathieu. We looked at Compass earlier, but the main issue with a database index is DB vendor support for BLOB locators. I understand that Oracle provides this support to get partial data from a BLOB, but I guess similar support is not available in SQL Server an

RE: designing a dictionary filter with multiple word entries

2008-04-11 Allen Atamer
Mathieu, your suggestion pointed me in the right direction: I was using a private queue instead of the inherited TokenStream from the superclass. Thanks. However, that still didn't stop the PhraseQuery problem from happening. I instead needed to add one more item into the code below
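The excerpt does not show Allen's actual fix, so the following is only a generic sketch of multi-word dictionary matching. It scans the tokens and, at each position, collapses the longest run of words found in the dictionary into a single token. DictionaryMatcher is hypothetical and works on a plain List<String> rather than a Lucene TokenStream. It also ignores token positions, which a real filter would have to set carefully for PhraseQuery to keep matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical greedy longest-match over a token list: a multi-word dictionary
// entry (e.g. "new york city") is collapsed into one output token.
class DictionaryMatcher {
    static List<String> collapse(List<String> tokens, Set<String> entries, int maxWords) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1;
            String emitted = tokens.get(i);
            // Try the longest window first so "new york city" beats "new york".
            for (int n = Math.min(maxWords, tokens.size() - i); n > 1; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (entries.contains(candidate)) {
                    matched = n;
                    emitted = candidate;
                    break;
                }
            }
            out.add(emitted); // either a dictionary phrase or the single token
            i += matched;
        }
        return out;
    }
}
```

With the entries "new york" and "new york city", the tokens [i, love, new, york, city, and, new, york] become [i, love, new york city, and, new york].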

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Paul Elschot
On Friday 11 April 2008 13:49:59, Mathieu Lecarme wrote: > Antony Bowesman wrote: > > We're planning to archive email over many years and have been > > looking at using a DB to store mail metadata and Lucene for the > > indexed mail data, or just Lucene on its own with email data and > > structu

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Mathieu Lecarme
Antony Bowesman wrote: We're planning to archive email over many years and have been looking at using a DB to store mail metadata and Lucene for the indexed mail data, or just Lucene on its own with email data and structure stored as XML and the raw message stored in the file system. For so

Re: about NullPointerException in DocumentsWriter$ThreadState.init(DocumentsWriter.java:751)

2008-04-11 Michael McCandless
I think this is something you just shouldn't do? I.e., if you call Document.addField(null), it is silently accepted and then causes that exception when the document is added, but my feeling is you shouldn't do that. Mike
kai.hu wrote: I got a problem yesterday, java.lang.NullPointerException
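Mike's advice amounts to failing fast: reject a null field when it is added, instead of letting it surface later as a NullPointerException inside DocumentsWriter. CheckedDocument below is a hypothetical wrapper, not Lucene's Document class, and it shows the pattern with Objects.requireNonNull.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical wrapper: reject null at add time, where the bug actually is,
// rather than failing later while the document is being indexed.
class CheckedDocument {
    private final List<Object> fields = new ArrayList<>();

    CheckedDocument add(Object field) {
        // requireNonNull throws NullPointerException with this message if field is null
        fields.add(Objects.requireNonNull(field, "field must not be null"));
        return this;
    }

    int size() { return fields.size(); }
}
```

The stack trace then points at the caller that built the bad document, not at the indexing thread.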

Using Lucene partly as DB and 'joining' search results.

2008-04-11 Antony Bowesman
We're planning to archive email over many years and have been looking at using a DB to store mail metadata and Lucene for the indexed mail data, or just Lucene on its own with email data and structure stored as XML and the raw message stored in the file system. For some customers, the volumes a

Re: Lucene index on relational data

2008-04-11 Mathieu Lecarme
Have a look at Compass 2.0M3: http://www.kimchy.org/searchable-cascading-mapping/ Your multiple-index approach will be good for massive writes. With a classical read/write ratio, Compass will be much easier. M.
Rajesh parab wrote: Hi, We are using Lucene 2.0 to index data stored inside relational dat