Using analyzer while constructing Lucene queries
Hi,

For proper results during searches, the recommendation is to use the same analyzer for indexing and querying. With QueryParser this is easy: we pass the analyzer that was used for indexing to QueryParser, and use the resulting query to search the index.

The question is: how can we apply the analyzer that was used for indexing if we want to construct Lucene queries manually using the Query classes (BooleanQuery, TermQuery, PhraseQuery, etc.) instead of QueryParser? Is there any way to achieve this?

Regards,
Rajesh

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Using analyzer while constructing Lucene queries
Thanks Ian. I agree with you on lowercasing of characters. My main concern is specific to stemming done by analyzers. For example, an analyzer that performs stemming (one built on PorterStemFilter, say; StandardAnalyzer by itself does not stem) will reduce words like playing, played, and plays to the common token "play", which is what gets stored in the index. During searches we then need the same stemming applied to the search terms, so that exact term comparisons return correct results. In this example, the search terms "playing" or "plays" would not match the document, because it was indexed under the token "play". What I am not really getting is how I can use the same analyzer during searches if I am constructing queries manually.

Regards,
Rajesh

--- On Tue, 1/13/09, Ian Lea wrote:

> From: Ian Lea
> Subject: Re: Using analyzer while constructing Lucene queries
> To: java-user@lucene.apache.org, rajesh_para...@yahoo.com
> Date: Tuesday, January 13, 2009, 9:33 AM
>
> If you are building queries manually, bypassing analysis, you just
> need to make sure that you know what you are doing. As a trivial
> example, if you are indexing with an analyzer that downcases
> everything, then you need to pass lowercase terms to TermQuery.
>
> You can still use an analyzer where appropriate, e.g. to parse a
> string into a Query that you add to a BooleanQuery.
>
> --
> Ian.
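Ian's advice can be made concrete: whatever normalization the indexing analyzer applies must be re-applied to every term before it goes into a hand-built TermQuery. The sketch below is a toy, Lucene-free illustration of the principle; `normalize` stands in for the analyzer's token stream (in real code you would feed each query term through the same Analyzer and build queries from the tokens it emits), and the suffix-stripping here is deliberately simplistic, not Porter stemming:

```java
import java.util.*;

public class SharedAnalysis {
    // Toy "analyzer": lowercase, then strip a few English suffixes.
    // Stands in for whatever lowercasing/stemming the real analyzer does.
    static String normalize(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.endsWith(suffix) && t.length() > suffix.length() + 2) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Index side: all three surface forms are stored as one token.
        Set<String> indexedTokens = new HashSet<>();
        for (String word : new String[] {"Playing", "played", "plays"}) {
            indexedTokens.add(normalize(word));
        }

        // Query side: a manually built term query must normalize too,
        // otherwise "playing" never matches the indexed token "play".
        String rawQueryTerm = "playing";
        System.out.println(indexedTokens.contains(rawQueryTerm));
        System.out.println(indexedTokens.contains(normalize(rawQueryTerm)));
    }
}
```

The same pattern applies per field: run the raw term through the field's analyzer, and only then wrap the result in a TermQuery or add it to a PhraseQuery.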
Lucene 2.3.0 and NFS
Hi,

We are currently using Lucene 2.0 for full-text searches within our enterprise application, which can be deployed in a clustered environment. We generate the Lucene index from data stored in a relational database.

Because Lucene 2.0 did not have solid NFS support, and because we wanted Lucene-based searches to work properly in a clustered environment, we decided on the following approach:

1. Index generation happens on one machine (one of the cluster nodes or a separate machine); once the Lucene index is generated, we copy all the index files to the database.
2. A search request on each cluster node retrieves the index files from the database (during the first search, or after an index update), copies them to the file system, and uses that copy for searches.
3. Thus, each cluster node has its own copy of the index and keeps picking up the latest version whenever one is available in the database.

This has worked fine for us so far, but we will not be able to continue with this model, because we want to support Lucene-based searches across our application and also want to index large components such as wikis and forums. As the index grows, storing and retrieving index files from the database will not be an efficient operation.

My questions are:
- Will we be able to use NFS if we move to Lucene 2.3.0?
- Will there be any significant performance impact on index generation and searches if we move to NFS?
- Is the Lucene + NFS combination supported on all operating systems? (We support Windows, Solaris, AIX, HP-UX, and Red Hat Linux.)
- Is there any other alternative available besides NFS?

I would really appreciate your comments/thoughts on this topic.

Regards,
Rajesh
Re: Lucene 2.3.0 and NFS
Hi Michael,

Thanks a lot for your suggestions.

I was looking at rsync; according to http://samba.anu.edu.au/rsync/features.html, rsync is a file transfer program for UNIX. Is there rsync support for Windows as well? I found a few rsync programs that work on Windows, but I am not sure whether they will serve the purpose. Is anyone using rsync on Windows?

Regards,
Rajesh

--- Michael McCandless wrote:

> Note that you can also do incremental replication: often the Lucene
> index changes in minor ways (e.g. a single new segment is flushed and
> a new segments_N and segments.gen are written), so you should only
> sync the files that are new (and remove the ones that are now gone).
> Lucene's write-once approach makes this very simple (you just have to
> compare file names, not the contents of each file).
>
> It's also possible to replicate without using a DB. E.g. rsync does a
> great job.
>
> Make sure you update to 2.3.1, not 2.3.0.
>
> NFS *should* work, however:
>
>   * It's not widely used, so test thoroughly in your particular setup.
>
>   * Most likely to work is if you use a single machine writing to the
>     index, and many readers.
>
>   * Performance is likely not great, especially on searching, but you
>     should test in your specific situation.
>
> You should look at Solr, since it already has all the infrastructure
> to accept updates, replicate index changes to remote machines, etc.
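Michael's write-once observation reduces replication to a file-name diff: copy names that exist only in the source directory, delete names that exist only in the destination, and never touch a name present in both. A minimal, Lucene-free sketch of that idea follows (helper names and the demo layout are illustrative; no locking, atomicity, or partial-failure handling — rsync or Solr replication do this far more robustly). `segments.gen` is the one file that is rewritten in place, so it is copied unconditionally, and last:

```java
import java.io.*;
import java.util.*;

public class IndexSync {
    // Sync dst to match src by comparing file names only.
    static void sync(File src, File dst) throws IOException {
        Set<String> srcNames = new HashSet<>(Arrays.asList(src.list()));
        Set<String> dstNames = new HashSet<>(Arrays.asList(dst.list()));
        for (String name : srcNames) {          // new segment files
            if (!dstNames.contains(name) && !name.equals("segments.gen")) {
                copy(new File(src, name), new File(dst, name));
            }
        }
        for (String name : dstNames) {          // files removed by merges
            if (!srcNames.contains(name)) {
                new File(dst, name).delete();
            }
        }
        // segments.gen is rewritten in place each commit: copy it always, last.
        File gen = new File(src, "segments.gen");
        if (gen.exists()) copy(gen, new File(dst, "segments.gen"));
    }

    static void copy(File from, File to) throws IOException {
        InputStream in = new FileInputStream(from);
        OutputStream out = new FileOutputStream(to);
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
        in.close();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        File src = mkTmp("lucsync_src"), dst = mkTmp("lucsync_dst");
        touch(src, "segments_2"); touch(src, "_1.cfs"); touch(src, "segments.gen");
        touch(dst, "segments_1"); touch(dst, "segments.gen");
        sync(src, dst);
        List<String> after = new ArrayList<>(Arrays.asList(dst.list()));
        Collections.sort(after);
        System.out.println(after); // segments_1 removed; segments_2 and _1.cfs copied
    }

    static File mkTmp(String name) {
        File d = new File(System.getProperty("java.io.tmpdir"), name);
        d.mkdirs();
        for (File f : d.listFiles()) f.delete(); // start clean for the demo
        return d;
    }

    static void touch(File dir, String name) throws IOException {
        new FileOutputStream(new File(dir, name)).close();
    }
}
```

This is essentially what `rsync --delete` does over a Lucene index directory, minus rsync's checksumming (which write-once files make unnecessary).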
Re: Lucene 2.3.0 and NFS
Hi All,

Has anyone used rsync or a similar utility on Windows to replicate a Lucene index across multiple machines? Any pointers would be very useful.

Regards,
Rajesh
Lucene index on relational data
Hi,

We are using Lucene 2.0 to index data stored in a relational database. Like any relational database, ours has quite a few one-to-one and one-to-many relationships. For example, let's say an Object A has a one-to-many relationship with Object X and Object Y. As we need to de-normalize relational data into key-value pairs before storing it in the Lucene index, we de-normalize these relationships (Object X and Object Y) while building the index entry for Object A.

We have a large number of such object relationships, and most of the time the related objects are modified more frequently than the base objects. In our example, objects X and Y are updated very frequently, whereas Object A is not updated that often. Still, we need to update the Object A entries in the index every time its related objects X and/or Y are modified.

To avoid this, we were thinking of having two separate indexes: the first would index only the data of the base objects (Object A in the example), and the second would contain the data of the relationship objects (Object X and Y), which are updated more frequently. This way, the frequent updates to Object X and Y would only touch the second index, which stores the relationship information, and reduce the cost of re-indexing Object A. However, I don't think MultiSearcher will help if we want to run a search that spans both indexes (e.g. some fields of Object A in the first index and some fields of Object X or Y in the second).

Do we have any option in Lucene to handle such a scenario? Can we search across multiple indexes that have relationships between them, with search fields that span those indexes?

Regards,
Rajesh

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
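The de-normalization step described above — flattening Object A together with its related X and Y rows into one flat, multi-valued key-value document — can be sketched roughly as follows. The field names and the plain-Map document representation are illustrative only (a real implementation would build a Lucene Document with repeated Fields):

```java
import java.util.*;

public class Denormalizer {
    // Flatten a base object (A) and its one-to-many relations (X, Y rows)
    // into one multi-valued key/value document, the shape Lucene indexes.
    static Map<String, List<String>> toDocument(String aName,
                                                List<String> xValues,
                                                List<String> yValues) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("a.name", Collections.singletonList(aName));
        doc.put("x.value", new ArrayList<>(xValues)); // one-to-many: repeated field
        doc.put("y.value", new ArrayList<>(yValues)); // one-to-many: repeated field
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = toDocument(
                "alpha",
                Arrays.asList("x1", "x2"),
                Arrays.asList("y1"));
        System.out.println(doc);
    }
}
```

The cost Rajesh describes falls out of this shape: any change to an X or Y row invalidates the whole flattened document for A, forcing a delete and re-add of the entire entry.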
Re: Lucene index on relational data
Thanks for these pointers, Mathieu.

We looked at Compass earlier, but the main issue with a database-backed index is DB vendor support for BLOB locators. I understand that Oracle has support for reading partial data from a BLOB, but I believe similar support is not available in SQL Server and DB2. Our application currently supports all three databases.

Secondly, I have read that search performance degrades drastically with a database-backed index.

Would it be possible to partition the data into a main index and a relationship index using file-system Lucene indexes, and search across those indexes?

Regards,
Rajesh

--- Mathieu Lecarme wrote:

> Have a look at Compass 2.0M3
> http://www.kimchy.org/searchable-cascading-mapping/
>
> Your multiple-index design will be nice for massive writes. In a
> classical read/write ratio, Compass will be much easier.
>
> M.
Re: Lucene index on relational data
Thanks for the details, Karl.

I was looking for something like this. However, I have a question about the warning in the ParallelReader javadoc:

It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior.

Now, if I want to update a document in my dynamic index, I will have to delete the document and insert it again, since Lucene does not allow updating a document in place. Correct? If so, re-inserting the document into the dynamic index will change its order relative to the static index, which is not modified. How should we handle this situation? Am I missing something here?

Regards,
Rajesh

--- Karl Wettin wrote:

> Hi Rajesh,
>
> I think you are looking for ParallelReader.
>
> <http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/index/ParallelReader.html>
>
> public class ParallelReader
> extends IndexReader
>
> An IndexReader which reads multiple, parallel indexes. Each index
> added must have the same number of documents, but typically each
> contains different fields. Each document contains the union of the
> fields of all documents with the same document number. When
> searching, matches for a query term are from the first index added
> that has the field.
>
> This is useful, e.g., with collections that have large fields which
> change rarely and small fields that change more frequently. The
> smaller fields may be re-indexed in a new index and both indexes may
> be searched together.
>
> Warning: It is up to you to make sure all indexes are created and
> modified the same way. For example, if you add documents to one
> index, you need to add the same documents in the same order to the
> other indexes. Failure to do so will result in undefined behavior.
>
> karl
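The misalignment Rajesh is worried about can be illustrated with two plain lists standing in for the parallel indexes, since ParallelReader matches documents purely by position. (In Lucene the doc-id shift actually happens at merge/optimize time rather than immediately on delete, but the end effect is the same.)

```java
import java.util.*;

public class ParallelAlignment {
    // Position i in each list stands for doc id i in the two indexes;
    // ParallelReader joins documents by this position alone.
    static String pairAt(List<String> staticIdx, List<String> dynamicIdx, int i) {
        return staticIdx.get(i) + " / " + dynamicIdx.get(i);
    }

    public static void main(String[] args) {
        List<String> staticIdx = Arrays.asList("A1-static", "A2-static", "A3-static");
        List<String> dynamicIdx = new ArrayList<>(
                Arrays.asList("A1-dyn", "A2-dyn", "A3-dyn"));

        // Aligned: doc id 1 pairs A2-static with A2-dyn.
        System.out.println(pairAt(staticIdx, dynamicIdx, 1));

        // "Update" A2 in the dynamic index only: delete, then re-add.
        // The re-added document lands at the end, and everything after
        // the deletion point shifts down one slot.
        dynamicIdx.remove("A2-dyn");
        dynamicIdx.add("A2-dyn-v2");

        // Misaligned: doc id 1 now pairs A2-static with A3-dyn —
        // the "undefined behavior" the javadoc warns about.
        System.out.println(pairAt(staticIdx, dynamicIdx, 1));
    }
}
```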
Re: Lucene index on relational data
Thanks, Mathieu.

On your comment about partitioning the data:

<< Yes. You can index unfolded data, which takes a lot of space, or use two queries against two indexes. The first builds a Filter for the second, just like with the previous JDBC example. You can even cache the filter, like Solr does with its faceted search. >>

I am looking for a way to use a single query that runs across the two indexes (the static and the dynamic index), where the search query has fields from both.

Rajesh

--- Mathieu Lecarme wrote:

> You misunderstood something. Compass can use a JDBC index, but it's
> only an option; a classical file index is available too. Other
> specific indexes are GigaSpace and Terracotta, for cluster
> environments.
>
> You can build a Filter from a JDBC query to mix it with a Lucene
> search. If your JDBC query uses too many joins it will be slow, so
> your Lucene search, which waits on its Filter, will be slow too.
> Building a Filter from a Set of ids is not slow.
>
> > Will it be possible to partition data like main index and
> > relationship index using file-system Lucene indexes and search
> > across these indexes?
>
> Yes. You can index unfolded data, which takes a lot of space, or use
> two queries against two indexes. The first builds a Filter for the
> second, just like with the previous JDBC example. You can even cache
> the filter, like Solr does with its faceted search.
>
> M.
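Mathieu's two-pass approach — query the dynamic index first, then use its matching keys as a filter on the static index — can be sketched with maps standing in for the two indexes. The key and term names are made up for illustration, and a real implementation would build a Lucene Filter (a bit set over doc ids) from the first pass rather than a HashSet of keys:

```java
import java.util.*;

public class TwoIndexJoin {
    // Pass 1: query the dynamic index; pass 2: query the static index,
    // restricted to the ids that matched in pass 1 (the "filter").
    static List<String> search(Map<String, String> dynamicIndex,
                               Map<String, String> staticIndex,
                               String dynTerm, String statTerm) {
        Set<String> filter = new HashSet<>();
        for (Map.Entry<String, String> e : dynamicIndex.entrySet()) {
            if (e.getValue().contains(dynTerm)) filter.add(e.getKey());
        }
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : staticIndex.entrySet()) {
            if (filter.contains(e.getKey()) && e.getValue().contains(statTerm)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits); // stable output for the example
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for the dynamic index: Object X data, keyed by A's id.
        Map<String, String> dynamicIndex = new HashMap<>();
        dynamicIndex.put("A1", "status=active");
        dynamicIndex.put("A2", "status=retired");

        // Stand-in for the static index: Object A data, same keys.
        Map<String, String> staticIndex = new HashMap<>();
        staticIndex.put("A1", "name=alpha");
        staticIndex.put("A2", "name=alpha");

        // A search spanning both indexes: name=alpha AND status=active.
        System.out.println(search(dynamicIndex, staticIndex,
                                  "status=active", "name=alpha"));
    }
}
```

This is not the single spanning query Rajesh asks for, but it answers the same question in two round trips, and the pass-1 filter can be cached as long as the dynamic index is unchanged.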
Re: Lucene index on relational data
While going over the forum, I found another thread where Otis asked a similar question about synchronizing doc ids between two indexes:

http://www.gossamer-threads.com/lists/lucene/java-user/50227?search_string=parallelreader;#50227

Otis, did you ever find an answer to your question?

Regards,
Rajesh
Re: Lucene index on relational data
<< How much data do you have? I have a hard time understanding the relationship between your objects and what sort of normalized data you add to the documents. If you are lucky it is just a single field or a few fields that need to be updated, and you can manage to keep it in RAM and rebuild the whole thing every time something happens, or on some schedule. >>

Regarding the data and its relationships: the use case I am trying to solve is to partition my data into two indexes. The primary index will contain the majority of the data and is fairly static. The secondary index will hold related information for the same data set, and this related information will change very frequently. The number of documents in each index will run into the millions, so rebuilding the index in memory will not work. :-(

<< There are some hacks in JIRA that allow you to replace a document at a certain position at index-optimization time. You might want to update a number of documents every time you do that. https://issues.apache.org/jira/browse/LUCENE-879 >>

About the hack you mentioned in JIRA: if some documents are deleted and re-inserted into the secondary index, the other documents in the index keep their doc ids, but the newly added documents get different doc ids, so we would have to sync them with the primary index doc ids. Is my understanding correct? If so, we would have to update both indexes every time something in the secondary index changes.
Re: Lucene index on relational data
Thanks Karl. How do we specify the primary key or doc id so that a newly added document will use the same doc id? Do you have any sample code that makes use of this patch? Secondly, there was a comment saying it is a proof of concept and not a real project. Is anyone using this patch in their production environments? Will this fix get rolled into the latest Lucene release? Regards, Rajesh

--- Karl Wettin <[EMAIL PROTECTED]> wrote:
> From the JIRA comments to the second patch in there:
>
> This new patch allows the consumer to, based on a primary key, delete a document and add a new document with the same document number as the deleted one. The events will occur on merging.
>
> karl
Re: Lucene index on relational data
Hi Mathieu, I can definitely store the foreign key inside the dynamic index. However, if I understand correctly, for ParallelReader to work properly the doc ids for all documents in both the primary and secondary (dynamic) index should be in the same order. How can we achieve that if there are frequent changes to the dynamic index? The doc ids will keep changing as we delete and re-insert records in the dynamic index. As Karl pointed out, there is a hack available in JIRA that can take care of this doc id update issue, but it is not an official patch and has not been tested for performance. How are people updating their indexes when used in conjunction with ParallelReader? I think ParallelReader will work well for data partitioned between 2 indexes (static and dynamic). However, I am not finding any better approach to update just the dynamic index. Regards, Rajesh
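The ordering problem described above can be illustrated without Lucene at all. In the sketch below (plain Java; the lists are hypothetical stand-ins for each index's document order), an update modeled as delete-plus-re-insert moves the document to the end, so the positional doc ids of the two indexes no longer line up:

```java
import java.util.ArrayList;
import java.util.List;

public class DocOrderDemo {
    // Model a Lucene "update": delete the document, then re-add it.
    // Re-added documents always receive the highest doc id, i.e. go last.
    public static List<String> afterUpdate(List<String> docOrder, String key) {
        List<String> updated = new ArrayList<>(docOrder);
        updated.remove(key);
        updated.add(key);
        return updated;
    }

    public static void main(String[] args) {
        List<String> primary = List.of("A", "B", "C");      // static index order
        List<String> secondary = afterUpdate(primary, "B"); // dynamic index after updating B
        // Position 1 is "B" in the primary index but "C" in the secondary one,
        // which is exactly what breaks ParallelReader's positional join.
        System.out.println(primary.get(1) + " vs " + secondary.get(1));
    }
}
```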
Re: Lucene index on relational data
Thanks Karl. I think your solution would be useful in case we would like to partition the index into two indexes and use ParallelReader to query both indexes simultaneously. If this solution is not getting included in future Lucene releases, what other options do we have to update just one of the two indexes and keep the doc ids in sync so that we can use ParallelReader? Regards, Rajesh

--- Karl Wettin <[EMAIL PROTECTED]> wrote:
> Sorry, there is only the test case in the patch.
>
> I very much doubt this patch would ever be rolled in. It is just something I did to see if it was possible to solve some way without doing major changes to the core architecture.
>
> It works though. Feel free to report back in the issue with any results you get in case you try it out.
>
> karl
Re: Lucene index on relational data
Hi Everyone, Any help around this topic will be very useful. Is anyone partitioning their data into 2 or more indexes and using ParallelReader to search these indexes? If yes, how do you handle updates to the indexes and make sure the doc ids for all indexes are in the same order? Regards, Rajesh
ParallelReader and synchronization between indexes
Hi, This is from the javadoc of ParallelReader:

==
An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.

This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.
==

I have a similar use case to the one mentioned above and hence would like to use ParallelReader to search across multiple indexes. I have an object that has 50 fields. Out of these 50 fields, 45 are relatively static and the other 5 are modified very often. So, I am planning to partition this object's data into 2 indexes such that the 45 static fields will be part of one index and the remaining 5 dynamic fields will constitute the second index. While generating the index for the first time, I can make sure that the document order inside both indexes is the same, and hence ParallelReader will work properly with it.

The question is - what if the data inside the second (smaller) index changes? In order to update an index document, I will have to delete it and re-insert it, as Lucene does not support in-place document updates. This action (delete and re-insert) will change the internal document id of the updated document inside the second index, and in order to sync it with the first index, I will also have to modify the first (relatively big and static) index. If we have to update both indexes, how is that different from having a single index with all the fields? What is the use case in which ParallelReader gets used? Per the documentation, I was thinking it would apply to my use case, but synchronizing the indexes seems to be a problem. Please help.
Regards, Rajesh
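For reference, wiring up ParallelReader for the two-index layout described above looks roughly like this. This is a non-runnable sketch against the Lucene 2.x-era API discussed in the thread (exact constructors vary by version), and `staticIndexDir`/`dynamicIndexDir` are hypothetical names:

```java
// Non-runnable sketch, Lucene 2.x-era API.
ParallelReader reader = new ParallelReader();
reader.add(IndexReader.open(staticIndexDir));   // index with the 45 rarely-changing fields
reader.add(IndexReader.open(dynamicIndexDir));  // index with the 5 frequently-changing fields
Searcher searcher = new IndexSearcher(reader);
// Queries may now mix fields from either index, but the join is purely
// positional: document N of one index must describe the same object as
// document N of the other - which is the synchronization problem raised
// in this thread.
```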
Re: ParallelReader and synchronization between indexes
Hi All, Any suggestions/comments on my questions in this thread will be really helpful. We are planning to use Lucene indexes throughout the application and are exploring possibilities of partitioning data between multiple indexes. Regards, Rajesh
Re: ParallelReader and synchronization between indexes
Hi Guys, Any comments on this? I was looking into the Lucene archive and came across this thread, which asks the same question:

http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477

Any pointers will be helpful. Regards, Rajesh
Re: ParallelReader and synchronization between indexes
My apologies for the quick follow-ups, and thanks for the pointers/suggestions, Grant and Otis. I did check various threads on the Java user forum around this topic, but could not find a solution. These are the most relevant topics, and they end with the same question I currently have:

http://www.gossamer-threads.com/lists/lucene/java-user/15063?search_string=parallelreader;#15063
http://www.gossamer-threads.com/lists/lucene/java-user/31435?search_string=parallelreader;#31435
http://www.gossamer-threads.com/lists/lucene/java-user/50164?search_string=parallelreader;#50164

Otis, during incremental indexing, the option of re-creating the second index entirely will not work well in our case, as we will be dealing with millions of documents. I am sorry for creating confusion by referring to the index as a "small" index. I should have referred to it as the index with a small number of fields, which change very often. So, if the first index with a large number of fields is not changing, and the second index with a small set of fields requires constant updates due to frequent changes, is there a way to keep the document ids of both indexes in sync without either re-creating the second index entirely or modifying both indexes? Can we somehow keep the internal document id the same after updating (i.e. deleting and re-inserting) an index document? Regards, Rajesh

--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Bravo Grant!
>
> Rajesh, I believe the following will work:
> - delete your small index
> - optimize your big index (needed? Not 100% sure, but I think it is)
> - loop through the docs in your "big" index
> - for each document in the big index, add a document to the small index
>
> When you are done you have big+small with docIDs in sync.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

--- Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Rajesh,
>
> You are asking a fairly complicated question on a seldom used piece of functionality. Constantly pinging the list is just making it less likely that someone will respond with an answer. The likelihood that the 1 person who understands that code (and trust me, it really is likely very few people who know how to practically employ it) enough to give practical advice has read it in the time period you have allotted us to respond is next to nil. We are all volunteers with day jobs.
>
> Have you bothered to search the dev and user mailing lists for information on the class in question? I would look for threads from Doug or Chuck Williams.
>
> -Grant
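Otis's rebuild recipe can be sketched roughly as follows. This is a non-runnable outline against the Lucene 2.x-era API, where `loadDynamicFields()` and the `primaryKey` field are hypothetical stand-ins for fetching the handful of volatile fields per object:

```java
// Non-runnable sketch, Lucene 2.x-era API.
IndexReader big = IndexReader.open(bigIndexDir);      // optimized first, so no deleted docs
IndexWriter small = new IndexWriter(smallIndexDir, analyzer, true); // recreate small index
for (int docId = 0; docId < big.maxDoc(); docId++) {
    String key = big.document(docId).get("primaryKey"); // hypothetical key field
    Document dynamicDoc = loadDynamicFields(key);       // build the few volatile fields
    small.addDocument(dynamicDoc);  // added in big-index order, so doc ids stay aligned
}
small.optimize();
small.close();
big.close();
```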
Re: ParallelReader and synchronization between indexes
Thanks Yonik. So, if rebuilding the second index is not an option due to the large number of documents, then ParallelReader will not work :-( And I believe there is no other way than ParallelReader to search across multiple indexes that contain related data. Is there any other alternative? I think MultiSearcher or MultiReader will only work with multiple, unrelated indexes. Regards, Rajesh
Re: ParallelReader and synchronization between indexes
One trick I can think of is somehow keeping the internal document id of a Lucene document the same after the document is updated (i.e. deleted and re-inserted). I am not sure if we have this capability in Lucene. Regards, Rajesh

--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> That's correct, Rajesh. ParallelReader has its uses, but I guess your case is not one of them, unless we are all missing some key aspect of PR or a trick to make it work in your case.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Question on storing object hierarchy in Lucene Index
Hi, Let's consider the following object structure:

X
| - Y
| - Z

The objects Y and Z do not have an existence of their own; they are owned by object X. How do we effectively search such an object structure using Lucene? The way I see it is to denormalize this object structure and save the values of X, Y and Z in the same field, separated by some separator. During search, again combine the values of X, Y and Z while constructing the query. Are there any best practices around storing such a data structure inside a Lucene index? Regards, Rajesh
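The flattening described above is simple string work; the important part is that the same function runs on both the indexing and the query side. A minimal sketch in plain Java (the `/` separator is my own choice for illustration; any character that cannot occur in the values would do):

```java
public class HierarchyFlattener {
    // Join the owned-object chain X -> Y -> Z into one indexable field value.
    public static String flatten(String... path) {
        return String.join("/", path);
    }

    public static void main(String[] args) {
        // Index side: store flatten("X", "Y", "Z") in a single field.
        String indexed = flatten("X", "Y", "Z");
        // Query side: flatten the search values the same way before matching.
        String queried = flatten("X", "Y", "Z");
        System.out.println(indexed + " matches query: " + indexed.equals(queried));
    }
}
```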
Modelling relational data in Lucene Index?
Hi, As I understand it, Lucene has a flat structure where you can define multiple fields inside a document; there is no relationship between any fields. I would like to enable index-based search for some of the components inside a relational database. For example, take a "Folder" object. The Folder object can have a relationship with a File object. The File object, in turn, can have attributes like "is image", "is text file", etc. So, the structure is:

Folder --> File
           | --> is image, is text file, ...

I would like to enable a search to find a Folder with a File of type image. How can we model such relational data inside a Lucene index? Regards, Rajesh
Re: Modelling relational data in Lucene Index?
Thanks Mark. Can you please tell me more about the Lucene add-on you are talking about? Are you talking about Compass? Regards, Rajesh

- Original Message From: Mark Miller <[EMAIL PROTECTED]>:
> Lucene is probably not the solution if you are looking for a relational model. You should be using a database for that. If you want to combine Lucene with a relational model, check out Hibernate and the new EJB annotations that it supports... there is a cool little Lucene add-on that lets you declare fields to be indexed (and how) with annotations.
>
> - Mark
Re: Modelling relational data in Lucene Index?
Thanks for the feedback, Chris. I agree with you: the data set should be flattened out to store inside a Lucene index. The Folder-File case was just an example; as you know, in a relational database we can have more complex relationships. I understand that this model may not work for deeper relationships. What I am mainly interested in is just a one-level-deep relationship, but I would like to search on the additional attributes of the related object. For example, in the Folder-File relationship, I would like to use additional file attributes as search criteria, along with the file name, while searching for folders. The way I see it is having a single field for the related object and all its additional attributes, using some separator while capturing this data inside the Lucene Field object. For example - new Field("file", "abc.txtimage"). But I am not quite sure if this model will work. BTW, I did not understand what you meant by the detached approach. Can you please elaborate? Regards, Rajesh

- Original Message From: Chris Lu <[EMAIL PROTECTED]>:
> For this specific question, you can create an index on files, search for files of type image, and from the matched files, find the unique directories (this can be done in Lucene or via Java). Of course this does not scale to deeper relationships. Usually you do need to flatten the database objects in order to use Lucene. It's just trading space for speed.
>
> I would prefer a detached approach instead of Hibernate's or EJB's approach, which is kind of too tightly coupled with any system. How do you rebuild if the index is corrupted, or you have a new Analyzer, or the schema evolves? How do you make it multi-thread safe?
>
> -- Chris Lu - Instant Full-Text Search On Any Database/Application
> site: http://www.dbsight.net demo: http://search.dbsight.com
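The single-field idea above can be sketched as follows. The `|` separator and the attribute layout are my assumptions for illustration, not from the thread:

```java
public class FileFieldBuilder {
    // Combine a file name and its attributes into one field value,
    // so a folder document carries e.g. "abc.txt|image" in its "file" field.
    public static String fileField(String name, String... attributes) {
        StringBuilder sb = new StringBuilder(name);
        for (String attr : attributes) {
            sb.append('|').append(attr);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileField("abc.txt", "image"));
        // The query side must build the value with the same separator
        // for an exact match against the denormalized field.
    }
}
```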
Transaction support in Lucene
Hi, Does anyone know if there is any plan to add transaction support to Lucene? Regards, Rajesh
Re: Transaction support in Lucene
Hi Mike, Thanks for the feedback. I am talking about transaction support in Lucene only. If there is a failure during an insert/update/delete of a document inside the index, there is no way to roll back the operation, and this will leave the index in an inconsistent state. I read about Compass providing transaction support on top of the Lucene APIs. But I am not sure it is a good idea to use Compass instead of using the Lucene APIs directly; there will always be a dependency on Compass to support the latest versions of, and additions to, Lucene. Regards, Rajesh

- Original Message From: Michael McCandless <[EMAIL PROTECTED]>:
> I don't know of specific plans. This has been discussed before on the user & dev lists. I know the Compass project builds transactional support on top of Lucene.
>
> Are you asking for transaction support shared with something external (eg a database)? Meaning updates to the DB and to Lucene either atomically succeed or fail, together? Or, are you asking for transactional behaviour of updates just to Lucene, eg, you want to do a bunch of adds & deletes but have them not be visible (committed) until the end of your transaction and roll back on any failure?
>
> Mike
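For what it's worth, later Lucene releases added this kind of single-index atomicity themselves: IndexWriter buffers a batch of changes and can discard them on failure. A rough, non-runnable sketch (method names per the later IndexWriter API; exact constructor signatures vary by version):

```java
// Non-runnable sketch against a later Lucene API, where IndexWriter
// gained rollback(); constructor details vary by release.
IndexWriter writer = new IndexWriter(indexDir, analyzer);
try {
    writer.addDocument(newDoc);
    writer.deleteDocuments(new Term("id", "42"));
    writer.close();       // flushes and makes the whole batch visible
} catch (IOException e) {
    writer.rollback();    // discard all changes buffered since the writer was opened
}
```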
Index generation failure
Hi, I have a question on index generation. What if the index generation fails for some reason - maybe the disk is full, or any other reason? Does that make the index corrupt? I mean, can we still use the index created so far, or do we need to re-generate the entire index? Secondly, what are the possible scenarios for index generation failure, apart from disk full, too many open files, etc.? Regards, Rajesh
Double Quotes and TermQuery
Hi Everyone, I understand that QueryParser allows searches using double quote characters. I was wondering if double quotes will also work with TermQuery. I am not using QueryParser in my application and am constructing queries (TermQuery, RangeQuery, BooleanQuery, etc.) explicitly. But it looks like double quotes do not work with TermQuery. For example: query = new TermQuery(new Term("location", "\"san mateo\"")) Any help/pointers will be much appreciated. Regards, Rajesh
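TermQuery does no analysis, so the embedded quotes above become literal characters of a single term that was never indexed. When building queries by hand, a multi-word match is normally expressed with PhraseQuery instead. A non-runnable sketch against the classic Lucene API, assuming the indexed tokens are the lowercased `san` and `mateo`:

```java
// Instead of: new TermQuery(new Term("location", "\"san mateo\""))
PhraseQuery phrase = new PhraseQuery();          // classic (pre-5.x) API
phrase.add(new Term("location", "san"));
phrase.add(new Term("location", "mateo"));
// Matches documents where "san" and "mateo" occur adjacently in the
// "location" field; the terms must already be in their indexed form
// (e.g. lowercased), since no analyzer runs here.
```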
Re: Clustered Indexing on common network filesystem
One more alternative, though I am not sure if anyone is using it: Apache Compass has added a plug-in that allows storing Lucene index files inside a database. This should work in a clustered environment, as all nodes will share the same database instance. I am not sure what impact it will have on performance. Is anyone using a DB for index storage? Any drawbacks to this approach? Regards, Rajesh

--- Zach Bailey <[EMAIL PROTECTED]> wrote:
> Thanks for your response --
>
> Based on my understanding, hadoop and nutch are essentially the same thing, with nutch being derived from hadoop, and are primarily intended to be standalone applications.
>
> We are not looking for a standalone application; rather we must use a framework to implement search inside our current content management application. Currently the application search functionality is designed and built around Lucene, so migrating frameworks at this point is not feasible.
>
> We are currently re-working our back-end to support clustering (in tomcat) and we are looking for information on the migration of Lucene from a single node filesystem index (which is what we use now and hope to continue to use for clients with a single-node deployment) to a shared filesystem index on a mounted network share.
>
> We prefer this strategy because it means we do not have to have two disparate methods of managing indexes for clients who run in a single-node, non-clustered environment versus clients who run in a multiple-node, clustered environment.
>
> So, hopefully here are some easy questions someone could shed some light on:
>
> Is this not a recommended method of managing indexes across multiple nodes?
>
> At this point would people recommend storing an individual index on each node and propagating index updates via a JMS framework rather than attempting to handle it transparently with a single shared index?
>
> Is the Lucene index code so intimately tied to filesystem semantics that using a shared/networked file system is infeasible at this point in time?
>
> What would be the quickest time-to-implementation of these strategies (JMS vs. shared FS)? The most robust/least error-prone?
>
> I really appreciate any insight or response anyone can provide, even if it is a short answer to any of the related topics, i.e. "we implemented clustered search using per-node indexing with JMS update propagation and it works great", or even something as simple as "don't use a shared filesystem at this point".
>
> Cheers,
> -Zach
>
> testn wrote:
> > Why don't you check out Hadoop and Nutch? It should provide what you are looking for.