Using analyzer while constructing Lucene queries
Hi,

For proper results during searches, the recommendation is to use the same analyzer for indexing and querying. With QueryParser this is easy: we pass the analyzer that was used for indexing to QueryParser, and use the resulting query to search the index.

The question is: how can we apply the analyzer that was used for indexing if we want to construct Lucene queries manually using the Query classes (BooleanQuery, TermQuery, PhraseQuery, etc.) instead of QueryParser? Is there any way to achieve this?

Regards,
Rajesh

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Using analyzer while constructing Lucene queries
Thanks Ian. I agree with you on lowercasing of characters. My main concern is specific to stemming done by analyzers. For example, an analyzer that performs stemming (one built on PorterStemFilter, say; StandardAnalyzer by itself does not stem) will reduce words like playing, played, and plays to the common token "play", which is what gets stored in the index. During searches we then need the same stemming applied to the search terms, so that exact term comparisons return correct results. In this example, the search terms "playing" or "plays" would not match the document, because it was indexed under the token "play". What I am not really getting is how I can use the same analyzer during searches if I am constructing queries manually.

Regards,
Rajesh

--- On Tue, 1/13/09, Ian Lea wrote:

> From: Ian Lea
> Subject: Re: Using analyzer while constructing Lucene queries
> To: java-user@lucene.apache.org, rajesh_para...@yahoo.com
> Date: Tuesday, January 13, 2009, 9:33 AM
>
> If you are building queries manually, bypassing analysis, you just
> need to make sure that you know what you are doing. As a trivial
> example, if you are indexing with an analyzer that downcases
> everything, then you need to pass lowercase terms to TermQuery.
>
> You can still use an analyzer where appropriate, e.g. to parse a
> string into a Query that you add to a BooleanQuery.
>
> --
> Ian.
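Ian's advice can be made concrete: whatever normalization the indexing analyzer applies must be re-applied to every term before it goes into a hand-built TermQuery. The sketch below is a toy, Lucene-free illustration of the principle; `normalize` stands in for the analyzer's token stream (in real code you would feed each query term through the same Analyzer and build queries from the tokens it emits), and the suffix-stripping here is deliberately simplistic, not Porter stemming:

```java
import java.util.*;

public class SharedAnalysis {
    // Toy "analyzer": lowercase, then strip a few English suffixes.
    // Stands in for whatever lowercasing/stemming the real analyzer does.
    static String normalize(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.endsWith(suffix) && t.length() > suffix.length() + 2) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Index side: all three surface forms are stored as one token.
        Set<String> indexedTokens = new HashSet<>();
        for (String word : new String[] {"Playing", "played", "plays"}) {
            indexedTokens.add(normalize(word));
        }

        // Query side: a manually built term query must normalize too,
        // otherwise "playing" never matches the indexed token "play".
        String rawQueryTerm = "playing";
        System.out.println(indexedTokens.contains(rawQueryTerm));
        System.out.println(indexedTokens.contains(normalize(rawQueryTerm)));
    }
}
```

The same pattern applies per field: run the raw term through the field's analyzer, and only then wrap the result in a TermQuery or add it to a PhraseQuery.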
Lucene 2.3.0 and NFS
Hi,

We are currently using Lucene 2.0 for full-text searches within our enterprise application, which can be deployed in a clustered environment. We generate the Lucene index from data stored in a relational database.

Because Lucene 2.0 did not have solid NFS support, and because we wanted Lucene-based searches to work properly in a clustered environment, we decided on the following approach:

1. Index generation happens on one machine (one of the cluster nodes or a separate machine); once the Lucene index is generated, we copy all the index files to the database.
2. A search request on each cluster node retrieves the index files from the database (during the first search, or after an index update), copies them to the file system, and uses that copy for searches.
3. Thus, each cluster node has its own copy of the index and keeps picking up the latest version whenever one is available in the database.

This has worked fine for us so far, but we will not be able to continue with this model, because we want to support Lucene-based searches across our application and also want to index large components such as wikis and forums. As the index grows, storing and retrieving index files from the database will not be an efficient operation.

My questions are:
- Will we be able to use NFS if we move to Lucene 2.3.0?
- Will there be any significant performance impact on index generation and searches if we move to NFS?
- Is the Lucene + NFS combination supported on all operating systems? (We support Windows, Solaris, AIX, HP-UX, and Red Hat Linux.)
- Is there any other alternative available besides NFS?

I would really appreciate your comments/thoughts on this topic.

Regards,
Rajesh
Re: Lucene 2.3.0 and NFS
Hi Michael,

Thanks a lot for your suggestions.

I was looking at rsync; according to http://samba.anu.edu.au/rsync/features.html, rsync is a file transfer program for UNIX. Is there rsync support for Windows as well? I found a few rsync programs that work on Windows, but I am not sure whether they will serve the purpose. Is anyone using rsync on Windows?

Regards,
Rajesh

--- Michael McCandless wrote:

> Note that you can also do incremental replication: often the Lucene
> index changes in minor ways (e.g. a single new segment is flushed and
> a new segments_N and segments.gen are written), so you should only
> sync the files that are new (and remove the ones that are now gone).
> Lucene's write-once approach makes this very simple (you just have to
> compare file names, not the contents of each file).
>
> It's also possible to replicate without using a DB. E.g. rsync does a
> great job.
>
> Make sure you update to 2.3.1, not 2.3.0.
>
> NFS *should* work, however:
>
>   * It's not widely used, so test thoroughly in your particular setup.
>
>   * Most likely to work is if you use a single machine writing to the
>     index, and many readers.
>
>   * Performance is likely not great, especially on searching, but you
>     should test in your specific situation.
>
> You should look at Solr, since it already has all the infrastructure
> to accept updates, replicate index changes to remote machines, etc.
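Michael's write-once observation reduces replication to a file-name diff: copy names that exist only in the source directory, delete names that exist only in the destination, and never touch a name present in both. A minimal, Lucene-free sketch of that idea follows (helper names and the demo layout are illustrative; no locking, atomicity, or partial-failure handling — rsync or Solr replication do this far more robustly). `segments.gen` is the one file that is rewritten in place, so it is copied unconditionally, and last:

```java
import java.io.*;
import java.util.*;

public class IndexSync {
    // Sync dst to match src by comparing file names only.
    static void sync(File src, File dst) throws IOException {
        Set<String> srcNames = new HashSet<>(Arrays.asList(src.list()));
        Set<String> dstNames = new HashSet<>(Arrays.asList(dst.list()));
        for (String name : srcNames) {          // new segment files
            if (!dstNames.contains(name) && !name.equals("segments.gen")) {
                copy(new File(src, name), new File(dst, name));
            }
        }
        for (String name : dstNames) {          // files removed by merges
            if (!srcNames.contains(name)) {
                new File(dst, name).delete();
            }
        }
        // segments.gen is rewritten in place each commit: copy it always, last.
        File gen = new File(src, "segments.gen");
        if (gen.exists()) copy(gen, new File(dst, "segments.gen"));
    }

    static void copy(File from, File to) throws IOException {
        InputStream in = new FileInputStream(from);
        OutputStream out = new FileOutputStream(to);
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
        in.close();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        File src = mkTmp("lucsync_src"), dst = mkTmp("lucsync_dst");
        touch(src, "segments_2"); touch(src, "_1.cfs"); touch(src, "segments.gen");
        touch(dst, "segments_1"); touch(dst, "segments.gen");
        sync(src, dst);
        List<String> after = new ArrayList<>(Arrays.asList(dst.list()));
        Collections.sort(after);
        System.out.println(after); // segments_1 removed; segments_2 and _1.cfs copied
    }

    static File mkTmp(String name) {
        File d = new File(System.getProperty("java.io.tmpdir"), name);
        d.mkdirs();
        for (File f : d.listFiles()) f.delete(); // start clean for the demo
        return d;
    }

    static void touch(File dir, String name) throws IOException {
        new FileOutputStream(new File(dir, name)).close();
    }
}
```

This is essentially what `rsync --delete` does over a Lucene index directory, minus rsync's checksumming (which write-once files make unnecessary).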
Re: Lucene 2.3.0 and NFS
Hi All,

Has anyone used rsync or a similar utility on Windows to replicate a Lucene index across multiple machines? Any pointers would be very useful.

Regards,
Rajesh
Lucene index on relational data
Hi,

We are using Lucene 2.0 to index data stored in a relational database. Like any relational database, ours has quite a few one-to-one and one-to-many relationships. For example, let's say an Object A has a one-to-many relationship with Object X and Object Y. As we need to de-normalize relational data into key-value pairs before storing it in the Lucene index, we de-normalize these relationships (Object X and Object Y) while building the index entry for Object A.

We have a large number of such object relationships, and most of the time the related objects are modified more frequently than the base objects. In our example, objects X and Y are updated very frequently, whereas Object A is not updated that often. Still, we need to update the Object A entries in the index every time its related objects X and/or Y are modified.

To avoid this, we were thinking of having two separate indexes: the first would index only the data of the base objects (Object A in the example), and the second would contain the data of the relationship objects (Object X and Y), which are updated more frequently. This way, the frequent updates to Object X and Y would only touch the second index, which stores the relationship information, and reduce the cost of re-indexing Object A. However, I don't think MultiSearcher will help if we want to run a search that spans both indexes (e.g. some fields of Object A in the first index and some fields of Object X or Y in the second).

Do we have any option in Lucene to handle such a scenario? Can we search across multiple indexes that have relationships between them, with search fields that span those indexes?

Regards,
Rajesh

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
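The de-normalization step described above — flattening Object A together with its related X and Y rows into one flat, multi-valued key-value document — can be sketched roughly as follows. The field names and the plain-Map document representation are illustrative only (a real implementation would build a Lucene Document with repeated Fields):

```java
import java.util.*;

public class Denormalizer {
    // Flatten a base object (A) and its one-to-many relations (X, Y rows)
    // into one multi-valued key/value document, the shape Lucene indexes.
    static Map<String, List<String>> toDocument(String aName,
                                                List<String> xValues,
                                                List<String> yValues) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("a.name", Collections.singletonList(aName));
        doc.put("x.value", new ArrayList<>(xValues)); // one-to-many: repeated field
        doc.put("y.value", new ArrayList<>(yValues)); // one-to-many: repeated field
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = toDocument(
                "alpha",
                Arrays.asList("x1", "x2"),
                Arrays.asList("y1"));
        System.out.println(doc);
    }
}
```

The cost Rajesh describes falls out of this shape: any change to an X or Y row invalidates the whole flattened document for A, forcing a delete and re-add of the entire entry.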
Re: Lucene index on relational data
Thanks for these pointers, Mathieu.

We looked at Compass earlier, but the main issue with a database-backed index is DB vendor support for BLOB locators. I understand that Oracle has support for reading partial data from a BLOB, but I believe similar support is not available in SQL Server and DB2. Our application currently supports all three databases.

Secondly, I have read that search performance degrades drastically with a database-backed index.

Would it be possible to partition the data into a main index and a relationship index using file-system Lucene indexes, and search across those indexes?

Regards,
Rajesh

--- Mathieu Lecarme wrote:

> Have a look at Compass 2.0M3
> http://www.kimchy.org/searchable-cascading-mapping/
>
> Your multiple-index design will be nice for massive writes. In a
> classical read/write ratio, Compass will be much easier.
>
> M.
Re: Lucene index on relational data
Thanks for the details, Karl.

I was looking for something like this. However, I have a question about the warning in the ParallelReader javadoc:

It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior.

Now, if I want to update a document in my dynamic index, I will have to delete the document and insert it again, since Lucene does not allow updating a document in place. Correct? If so, re-inserting the document into the dynamic index will change its order relative to the static index, which is not modified. How should we handle this situation? Am I missing something here?

Regards,
Rajesh

--- Karl Wettin wrote:

> Hi Rajesh,
>
> I think you are looking for ParallelReader.
>
> <http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/index/ParallelReader.html>
>
> public class ParallelReader
> extends IndexReader
>
> An IndexReader which reads multiple, parallel indexes. Each index
> added must have the same number of documents, but typically each
> contains different fields. Each document contains the union of the
> fields of all documents with the same document number. When
> searching, matches for a query term are from the first index added
> that has the field.
>
> This is useful, e.g., with collections that have large fields which
> change rarely and small fields that change more frequently. The
> smaller fields may be re-indexed in a new index and both indexes may
> be searched together.
>
> Warning: It is up to you to make sure all indexes are created and
> modified the same way. For example, if you add documents to one
> index, you need to add the same documents in the same order to the
> other indexes. Failure to do so will result in undefined behavior.
>
> karl
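The misalignment Rajesh is worried about can be illustrated with two plain lists standing in for the parallel indexes, since ParallelReader matches documents purely by position. (In Lucene the doc-id shift actually happens at merge/optimize time rather than immediately on delete, but the end effect is the same.)

```java
import java.util.*;

public class ParallelAlignment {
    // Position i in each list stands for doc id i in the two indexes;
    // ParallelReader joins documents by this position alone.
    static String pairAt(List<String> staticIdx, List<String> dynamicIdx, int i) {
        return staticIdx.get(i) + " / " + dynamicIdx.get(i);
    }

    public static void main(String[] args) {
        List<String> staticIdx = Arrays.asList("A1-static", "A2-static", "A3-static");
        List<String> dynamicIdx = new ArrayList<>(
                Arrays.asList("A1-dyn", "A2-dyn", "A3-dyn"));

        // Aligned: doc id 1 pairs A2-static with A2-dyn.
        System.out.println(pairAt(staticIdx, dynamicIdx, 1));

        // "Update" A2 in the dynamic index only: delete, then re-add.
        // The re-added document lands at the end, and everything after
        // the deletion point shifts down one slot.
        dynamicIdx.remove("A2-dyn");
        dynamicIdx.add("A2-dyn-v2");

        // Misaligned: doc id 1 now pairs A2-static with A3-dyn —
        // the "undefined behavior" the javadoc warns about.
        System.out.println(pairAt(staticIdx, dynamicIdx, 1));
    }
}
```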
Re: Lucene index on relational data
Thanks, Mathieu.

On your comment about partitioning the data:

<< Yes. You can index unfolded data, which takes a lot of space, or use two queries against two indexes. The first builds a Filter for the second, just like with the previous JDBC example. You can even cache the filter, like Solr does with its faceted search. >>

I am looking for a way to use a single query that runs across the two indexes (the static and the dynamic index), where the search query has fields from both.

Rajesh

--- Mathieu Lecarme wrote:

> You misunderstood something. Compass can use a JDBC index, but it's
> only an option; a classical file index is available too. Other
> specific indexes are GigaSpace and Terracotta, for cluster
> environments.
>
> You can build a Filter from a JDBC query to mix it with a Lucene
> search. If your JDBC query uses too many joins it will be slow, so
> your Lucene search, which waits on its Filter, will be slow too.
> Building a Filter from a Set of ids is not slow.
>
> > Will it be possible to partition data like main index and
> > relationship index using file-system Lucene indexes and search
> > across these indexes?
>
> Yes. You can index unfolded data, which takes a lot of space, or use
> two queries against two indexes. The first builds a Filter for the
> second, just like with the previous JDBC example. You can even cache
> the filter, like Solr does with its faceted search.
>
> M.
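Mathieu's two-pass approach — query the dynamic index first, then use its matching keys as a filter on the static index — can be sketched with maps standing in for the two indexes. The key and term names are made up for illustration, and a real implementation would build a Lucene Filter (a bit set over doc ids) from the first pass rather than a HashSet of keys:

```java
import java.util.*;

public class TwoIndexJoin {
    // Pass 1: query the dynamic index; pass 2: query the static index,
    // restricted to the ids that matched in pass 1 (the "filter").
    static List<String> search(Map<String, String> dynamicIndex,
                               Map<String, String> staticIndex,
                               String dynTerm, String statTerm) {
        Set<String> filter = new HashSet<>();
        for (Map.Entry<String, String> e : dynamicIndex.entrySet()) {
            if (e.getValue().contains(dynTerm)) filter.add(e.getKey());
        }
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : staticIndex.entrySet()) {
            if (filter.contains(e.getKey()) && e.getValue().contains(statTerm)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits); // stable output for the example
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for the dynamic index: Object X data, keyed by A's id.
        Map<String, String> dynamicIndex = new HashMap<>();
        dynamicIndex.put("A1", "status=active");
        dynamicIndex.put("A2", "status=retired");

        // Stand-in for the static index: Object A data, same keys.
        Map<String, String> staticIndex = new HashMap<>();
        staticIndex.put("A1", "name=alpha");
        staticIndex.put("A2", "name=alpha");

        // A search spanning both indexes: name=alpha AND status=active.
        System.out.println(search(dynamicIndex, staticIndex,
                                  "status=active", "name=alpha"));
    }
}
```

This is not the single spanning query Rajesh asks for, but it answers the same question in two round trips, and the pass-1 filter can be cached as long as the dynamic index is unchanged.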
Re: Lucene index on relational data
While going over the forum, I found another thread where Otis asked a similar question about synchronizing doc ids between two indexes:

http://www.gossamer-threads.com/lists/lucene/java-user/50227?search_string=parallelreader;#50227

Otis, did you ever find an answer to your question?

Regards,
Rajesh
Re: Lucene index on relational data
<< How much data do you have? I have a hard time understanding the relationship between your objects and what sort of normalized data you add to the documents. If you are lucky it is just a single field or a few fields that need to be updated, and you can manage to keep it in RAM and rebuild the whole thing every time something happens, or on some schedule. >>

Regarding the data and its relationships: the use case I am trying to solve is to partition my data into two indexes. The primary index will contain the majority of the data and is fairly static. The secondary index will hold related information for the same data set, and this related information will change very frequently. The number of documents in each index will run into the millions, so rebuilding the index in memory will not work. :-(

<< There are some hacks in JIRA that allow you to replace a document at a certain position at index-optimization time. You might want to update a number of documents every time you do that. https://issues.apache.org/jira/browse/LUCENE-879 >>

About the hack you mentioned in JIRA: if some documents are deleted and re-inserted into the secondary index, the other documents in the index keep their doc ids, but the newly added documents get different doc ids, so we would have to sync them with the primary index doc ids. Is my understanding correct? If so, we would have to update both indexes every time something in the secondary index changes.
Re: Lucene index on relational data
Thanks Karl. How do we specify the primary key or doc id so that a newly added document will use the same doc id? Do you have any sample code that makes use of this patch? Secondly, there was a comment saying it is a proof of concept and not a real project. Is anyone using this patch in their production environments? Will this fix get rolled into the latest Lucene release? Regards, Rajesh

--- Karl Wettin <[EMAIL PROTECTED]> wrote:
> From the JIRA comments to the second patch in there:
>
> This new patch allows the consumer to, based on a primary key, delete a document and add a new document with the same document number as the deleted one. The events will occur on merging.
>
> karl
Re: Lucene index on relational data
Hi Mathieu, I can definitely store the foreign key inside the dynamic index. However, if I understand correctly, for ParallelReader to work properly the doc ids for all documents in both the primary and secondary (dynamic) index should be in the same order. How can we achieve that if there are frequent changes to the dynamic index? The doc ids will keep changing as we delete and re-insert records in the dynamic index. As Karl pointed out, there is a hack available in JIRA that can take care of this doc id update issue, but it is not an official patch and has not been tested for performance. How are people updating their indexes when used in conjunction with ParallelReader? I think ParallelReader will work well for data partitioned between 2 indexes (static and dynamic). However, I am not finding any better approach to update just the dynamic index. Regards, Rajesh
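The ordering problem described above can be illustrated without Lucene at all. In the sketch below (plain Java; the lists are hypothetical stand-ins for each index's document order), an update modeled as delete-plus-re-insert moves the document to the end, so the positional doc ids of the two indexes no longer line up:

```java
import java.util.ArrayList;
import java.util.List;

public class DocOrderDemo {
    // Model a Lucene "update": delete the document, then re-add it.
    // Re-added documents always receive the highest doc id, i.e. go last.
    public static List<String> afterUpdate(List<String> docOrder, String key) {
        List<String> updated = new ArrayList<>(docOrder);
        updated.remove(key);
        updated.add(key);
        return updated;
    }

    public static void main(String[] args) {
        List<String> primary = List.of("A", "B", "C");      // static index order
        List<String> secondary = afterUpdate(primary, "B"); // dynamic index after updating B
        // Position 1 is "B" in the primary index but "C" in the secondary one,
        // which is exactly what breaks ParallelReader's positional join.
        System.out.println(primary.get(1) + " vs " + secondary.get(1));
    }
}
```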
Re: Lucene index on relational data
Thanks Karl. I think your solution would be useful in case we would like to partition the index into two indexes and use ParallelReader to query both indexes simultaneously. If this solution is not getting included in future Lucene releases, what other options do we have to update just one of the two indexes and keep the doc ids in sync so that we can use ParallelReader? Regards, Rajesh

--- Karl Wettin <[EMAIL PROTECTED]> wrote:
> Sorry, there is only the test case in the patch.
>
> I very much doubt this patch would ever be rolled in. It is just something I did to see if it was possible to solve some way without doing major changes to the core architecture.
>
> It works though. Feel free to report back in the issue with any results you get in case you try it out.
>
> karl
Re: Lucene index on relational data
Hi Everyone, Any help around this topic will be very useful. Is anyone partitioning their data into 2 or more indexes and using ParallelReader to search these indexes? If yes, how do you handle updates to the indexes and make sure the doc ids for all indexes are in the same order? Regards, Rajesh
ParallelReader and synchronization between indexes
Hi, This is from the javadoc of ParallelReader:

==
An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.

This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.
==

I have a similar use case to the one mentioned above and hence would like to use ParallelReader to search across multiple indexes. I have an object that has 50 fields. Out of these 50 fields, 45 are relatively static and the other 5 are modified very often. So, I am planning to partition this object's data into 2 indexes such that the 45 static fields will be part of one index and the remaining 5 dynamic fields will constitute the second index. While generating the index for the first time, I can make sure that the document order inside both indexes is the same, and hence ParallelReader will work properly with it.

The question is - what if the data inside the second (smaller) index changes? In order to update an index document, I will have to delete it and re-insert it, as Lucene does not support in-place document updates. This action (delete and re-insert) will change the internal document id of the updated document inside the second index, and in order to sync it with the first index, I will also have to modify the first (relatively big and static) index. If we have to update both indexes, how is that different from having a single index with all the fields? What is the use case in which ParallelReader gets used? Per the documentation, I was thinking it would apply to my use case, but synchronizing the indexes seems to be a problem. Please help.
Regards, Rajesh
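For reference, wiring up ParallelReader for the two-index layout described above looks roughly like this. This is a non-runnable sketch against the Lucene 2.x-era API discussed in the thread (exact constructors vary by version), and `staticIndexDir`/`dynamicIndexDir` are hypothetical names:

```java
// Non-runnable sketch, Lucene 2.x-era API.
ParallelReader reader = new ParallelReader();
reader.add(IndexReader.open(staticIndexDir));   // index with the 45 rarely-changing fields
reader.add(IndexReader.open(dynamicIndexDir));  // index with the 5 frequently-changing fields
Searcher searcher = new IndexSearcher(reader);
// Queries may now mix fields from either index, but the join is purely
// positional: document N of one index must describe the same object as
// document N of the other - which is the synchronization problem raised
// in this thread.
```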
Re: ParallelReader and synchronization between indexes
Hi All, Any suggestions/comments on my questions in this thread will be really helpful. We are planning to use Lucene indexes throughout the application and are exploring possibilities of partitioning data between multiple indexes. Regards, Rajesh
Re: ParallelReader and synchronization between indexes
Hi Guys, Any comments on this? I was looking into the Lucene archive and came across this thread, which asks the same question:

http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477

Any pointers will be helpful. Regards, Rajesh
Re: ParallelReader and synchronization between indexes
My apologies for the quick follow-ups, and thanks for the pointers/suggestions, Grant and Otis. I did check various threads on the Java user forum around this topic, but could not find a solution. These are the most relevant topics, and they end with the same question I currently have:

http://www.gossamer-threads.com/lists/lucene/java-user/15063?search_string=parallelreader;#15063
http://www.gossamer-threads.com/lists/lucene/java-user/31435?search_string=parallelreader;#31435
http://www.gossamer-threads.com/lists/lucene/java-user/50164?search_string=parallelreader;#50164

Otis, during incremental indexing, the option of re-creating the second index entirely will not work well in our case, as we will be dealing with millions of documents. I am sorry for creating confusion by referring to the index as a "small" index. I should have referred to it as the index with a small number of fields, which change very often. So, if the first index with a large number of fields is not changing, and the second index with a small set of fields requires constant updates due to frequent changes, is there a way to keep the document ids of both indexes in sync without either re-creating the second index entirely or modifying both indexes? Can we somehow keep the internal document id the same after updating (i.e. deleting and re-inserting) an index document? Regards, Rajesh

--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Bravo Grant!
>
> Rajesh, I believe the following will work:
> - delete your small index
> - optimize your big index (needed? Not 100% sure, but I think it is)
> - loop through the docs in your "big" index
> - for each document in the big index, add a document to the small index
>
> When you are done you have big+small with docIDs in sync.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

--- Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Rajesh,
>
> You are asking a fairly complicated question on a seldom used piece of functionality. Constantly pinging the list is just making it less likely that someone will respond with an answer. The likelihood that the 1 person who understands that code (and trust me, it really is likely very few people who know how to practically employ it) enough to give practical advice has read it in the time period you have allotted us to respond is next to nil. We are all volunteers with day jobs.
>
> Have you bothered to search the dev and user mailing lists for information on the class in question? I would look for threads from Doug or Chuck Williams.
>
> -Grant
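Otis's rebuild recipe can be sketched roughly as follows. This is a non-runnable outline against the Lucene 2.x-era API, where `loadDynamicFields()` and the `primaryKey` field are hypothetical stand-ins for fetching the handful of volatile fields per object:

```java
// Non-runnable sketch, Lucene 2.x-era API.
IndexReader big = IndexReader.open(bigIndexDir);      // optimized first, so no deleted docs
IndexWriter small = new IndexWriter(smallIndexDir, analyzer, true); // recreate small index
for (int docId = 0; docId < big.maxDoc(); docId++) {
    String key = big.document(docId).get("primaryKey"); // hypothetical key field
    Document dynamicDoc = loadDynamicFields(key);       // build the few volatile fields
    small.addDocument(dynamicDoc);  // added in big-index order, so doc ids stay aligned
}
small.optimize();
small.close();
big.close();
```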
Re: ParallelReader and synchronization between indexes
Thanks Yonik. So, if rebuilding the second index is not an option due to the large number of documents, then ParallelReader will not work :-( And I believe there is no other way than ParallelReader to search across multiple indexes that contain related data. Is there any other alternative? I think MultiSearcher or MultiReader will only work with multiple, unrelated indexes. Regards, Rajesh
Re: ParallelReader and synchronization between indexes
One trick I can think of is somehow keeping the internal document id of a Lucene document the same after the document is updated (i.e. deleted and re-inserted). I am not sure if we have this capability in Lucene. Regards, Rajesh

--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> That's correct, Rajesh. ParallelReader has its uses, but I guess your case is not one of them, unless we are all missing some key aspect of PR or a trick to make it work in your case.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Question on storing object hierarchy in Lucene Index
Hi, Let's consider the following object structure:

X
| - Y
| - Z

The objects Y and Z do not have an existence of their own; they are owned by object X. How do we effectively search such an object structure using Lucene? The way I see it is to denormalize this object structure and save the values of X, Y and Z in the same field, separated by some separator. During search, again combine the values of X, Y and Z while constructing the query. Are there any best practices around storing such a data structure inside a Lucene index? Regards, Rajesh
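The flattening described above is simple string work; the important part is that the same function runs on both the indexing and the query side. A minimal sketch in plain Java (the `/` separator is my own choice for illustration; any character that cannot occur in the values would do):

```java
public class HierarchyFlattener {
    // Join the owned-object chain X -> Y -> Z into one indexable field value.
    public static String flatten(String... path) {
        return String.join("/", path);
    }

    public static void main(String[] args) {
        // Index side: store flatten("X", "Y", "Z") in a single field.
        String indexed = flatten("X", "Y", "Z");
        // Query side: flatten the search values the same way before matching.
        String queried = flatten("X", "Y", "Z");
        System.out.println(indexed + " matches query: " + indexed.equals(queried));
    }
}
```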
Modelling relational data in Lucene Index?
Hi, As I understand it, Lucene has a flat structure where you can define multiple fields inside a document; there is no relationship between any fields. I would like to enable index-based search for some of the components inside a relational database. For example, take a "Folder" object. The Folder object can have a relationship with a File object. The File object, in turn, can have attributes like "is image", "is text file", etc. So, the structure is:

Folder --> File
           | --> is image, is text file, ...

I would like to enable a search to find a Folder with a File of type image. How can we model such relational data inside a Lucene index? Regards, Rajesh
Re: Modelling relational data in Lucene Index?
Thanks Mark. Can you please tell me more about the Lucene add-on you are talking about? Are you talking about Compass? Regards, Rajesh

- Original Message From: Mark Miller <[EMAIL PROTECTED]>:
> Lucene is probably not the solution if you are looking for a relational model. You should be using a database for that. If you want to combine Lucene with a relational model, check out Hibernate and the new EJB annotations that it supports... there is a cool little Lucene add-on that lets you declare fields to be indexed (and how) with annotations.
>
> - Mark
Re: Modelling relational data in Lucene Index?
Thanks for the feedback, Chris. I agree with you: the data set should be flattened out to store inside a Lucene index. The Folder-File case was just an example; as you know, in a relational database we can have more complex relationships. I understand that this model may not work for deeper relationships. What I am mainly interested in is just a one-level-deep relationship, but I would like to search on the additional attributes of the related object. For example, in the Folder-File relationship, I would like to use additional file attributes as search criteria, along with the file name, while searching for folders. The way I see it is having a single field for the related object and all its additional attributes, using some separator while capturing this data inside the Lucene Field object. For example - new Field("file", "abc.txtimage"). But I am not quite sure if this model will work. BTW, I did not understand what you meant by the detached approach. Can you please elaborate? Regards, Rajesh

- Original Message From: Chris Lu <[EMAIL PROTECTED]>:
> For this specific question, you can create an index on files, search for files of type image, and from the matched files, find the unique directories (this can be done in Lucene or via Java). Of course this does not scale to deeper relationships. Usually you do need to flatten the database objects in order to use Lucene. It's just trading space for speed.
>
> I would prefer a detached approach instead of Hibernate's or EJB's approach, which is kind of too tightly coupled with any system. How do you rebuild if the index is corrupted, or you have a new Analyzer, or the schema evolves? How do you make it multi-thread safe?
>
> -- Chris Lu - Instant Full-Text Search On Any Database/Application
> site: http://www.dbsight.net demo: http://search.dbsight.com
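The single-field idea above can be sketched as follows. The `|` separator and the attribute layout are my assumptions for illustration, not from the thread:

```java
public class FileFieldBuilder {
    // Combine a file name and its attributes into one field value,
    // so a folder document carries e.g. "abc.txt|image" in its "file" field.
    public static String fileField(String name, String... attributes) {
        StringBuilder sb = new StringBuilder(name);
        for (String attr : attributes) {
            sb.append('|').append(attr);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileField("abc.txt", "image"));
        // The query side must build the value with the same separator
        // for an exact match against the denormalized field.
    }
}
```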
Transaction support in Lucene
Hi, Does anyone know if there is any plan to add transaction support to Lucene? Regards, Rajesh
Re: Transaction support in Lucene
Hi Mike, Thanks for the feedback. I am talking about transaction support in Lucene only. If there is a failure during an insert/update/delete of a document inside the index, there is no way to roll back the operation, and this will leave the index in an inconsistent state. I read about Compass providing transaction support on top of the Lucene APIs. But I am not sure it is a good idea to use Compass instead of using the Lucene APIs directly; there will always be a dependency on Compass to support the latest versions of, and additions to, Lucene. Regards, Rajesh

- Original Message From: Michael McCandless <[EMAIL PROTECTED]>:
> I don't know of specific plans. This has been discussed before on the user & dev lists. I know the Compass project builds transactional support on top of Lucene.
>
> Are you asking for transaction support shared with something external (eg a database)? Meaning updates to the DB and to Lucene either atomically succeed or fail, together? Or, are you asking for transactional behaviour of updates just to Lucene, eg, you want to do a bunch of adds & deletes but have them not be visible (committed) until the end of your transaction and roll back on any failure?
>
> Mike
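For what it's worth, later Lucene releases added this kind of single-index atomicity themselves: IndexWriter buffers a batch of changes and can discard them on failure. A rough, non-runnable sketch (method names per the later IndexWriter API; exact constructor signatures vary by version):

```java
// Non-runnable sketch against a later Lucene API, where IndexWriter
// gained rollback(); constructor details vary by release.
IndexWriter writer = new IndexWriter(indexDir, analyzer);
try {
    writer.addDocument(newDoc);
    writer.deleteDocuments(new Term("id", "42"));
    writer.close();       // flushes and makes the whole batch visible
} catch (IOException e) {
    writer.rollback();    // discard all changes buffered since the writer was opened
}
```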
Index generation failure
Hi, I have a question on index generation. What if the index generation fails for some reason - maybe the disk is full, or any other reason? Does that make the index corrupt? I mean, can we still use the index created so far, or do we need to re-generate the entire index? Secondly, what are the possible scenarios for index generation failure, apart from disk full, too many open files, etc.? Regards, Rajesh
Double Quotes and TermQuery
Hi Everyone, I understand that QueryParser allows searches using double quote characters. I was wondering if double quotes will also work with TermQuery. I am not using QueryParser in my application and am constructing queries (TermQuery, RangeQuery, BooleanQuery, etc.) explicitly. But it looks like double quotes do not work with TermQuery. For example: query = new TermQuery(new Term("location", "\"san mateo\"")) Any help/pointers will be much appreciated. Regards, Rajesh
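TermQuery does no analysis, so the embedded quotes above become literal characters of a single term that was never indexed. When building queries by hand, a multi-word match is normally expressed with PhraseQuery instead. A non-runnable sketch against the classic Lucene API, assuming the indexed tokens are the lowercased `san` and `mateo`:

```java
// Instead of: new TermQuery(new Term("location", "\"san mateo\""))
PhraseQuery phrase = new PhraseQuery();          // classic (pre-5.x) API
phrase.add(new Term("location", "san"));
phrase.add(new Term("location", "mateo"));
// Matches documents where "san" and "mateo" occur adjacently in the
// "location" field; the terms must already be in their indexed form
// (e.g. lowercased), since no analyzer runs here.
```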
Re: Clustered Indexing on common network filesystem
One more alternative, though I am not sure if anyone is using it: Apache Compass has added a plug-in that allows storing Lucene index files inside a database. This should work in a clustered environment, as all nodes will share the same database instance. I am not sure what impact it will have on performance. Is anyone using a DB for index storage? Any drawbacks to this approach? Regards, Rajesh

--- Zach Bailey <[EMAIL PROTECTED]> wrote:
> Thanks for your response --
>
> Based on my understanding, hadoop and nutch are essentially the same thing, with nutch being derived from hadoop, and are primarily intended to be standalone applications.
>
> We are not looking for a standalone application; rather we must use a framework to implement search inside our current content management application. Currently the application search functionality is designed and built around Lucene, so migrating frameworks at this point is not feasible.
>
> We are currently re-working our back-end to support clustering (in tomcat) and we are looking for information on the migration of Lucene from a single node filesystem index (which is what we use now and hope to continue to use for clients with a single-node deployment) to a shared filesystem index on a mounted network share.
>
> We prefer this strategy because it means we do not have to have two disparate methods of managing indexes for clients who run in a single-node, non-clustered environment versus clients who run in a multiple-node, clustered environment.
>
> So, hopefully here are some easy questions someone could shed some light on:
>
> Is this not a recommended method of managing indexes across multiple nodes?
>
> At this point would people recommend storing an individual index on each node and propagating index updates via a JMS framework rather than attempting to handle it transparently with a single shared index?
>
> Is the Lucene index code so intimately tied to filesystem semantics that using a shared/networked file system is infeasible at this point in time?
>
> What would be the quickest time-to-implementation of these strategies (JMS vs. shared FS)? The most robust/least error-prone?
>
> I really appreciate any insight or response anyone can provide, even if it is a short answer to any of the related topics, i.e. "we implemented clustered search using per-node indexing with JMS update propagation and it works great", or even something as simple as "don't use a shared filesystem at this point".
>
> Cheers,
> -Zach
>
> testn wrote:
> > Why don't you check out Hadoop and Nutch? It should provide what you are looking for.