Re: Lucene Seaches VS. Relational database Queries

Ananth T. Sarathy Thu, 13 Apr 2006 10:33:39 -0700

Also we need to address the Join Between A and B and C, which I don't know
see how with out taking out values out of the hit list.





On 4/13/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote:
>
> I am not sure having an index each for each table solves the problem.
>
> (Going by the schema I put in the earlier mail)
> You have an index each for tables A, B and C. What is the
> lucene-equivalent of the db query
> A.field1 == value1 and B.field2 == value2 and C.field3 == value3.
>
> You cant use MultiSearcher as MultiSearcher executes the same query
> across all indexes. Its not going to return any results.
>
> The remaining alternative is to split the query ourselves into 3 parts,
> one for each index and then do the AND-operation in our program.
> Needless to say, this can get terribly inefficient as query size and
> nesting increases.
>
>
> -----Original Message-----
> From: Chris Lu [mailto:[EMAIL PROTECTED]
> Sent: Thursday, April 13, 2006 10:21 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Seaches VS. Relational database Queries
>
>
> I agree with Jelda.
>
> Lucene is more document-centric. Storing the relationship is not a
> good idea. It's better to simply have 2 indexes. Usually when users
> search, they can choose which index they want.
>
> Of course, building the indexes will take more time to process-data.
>
> Lucene can not replace relational DB altogether. One reason is Lucene
> is more like object-oriented.
>
> Chris Lu
> ---------------------------------------
> Full-Text Lucene Search on Any Databases
> http://www.dbsight.net
> Faster to Setup than reading marketing materials!
>
> On 4/13/06, Ramana Jelda <[EMAIL PROTECTED]> wrote:
> > No.. I don't see your solution is performant..
> > If each lucene Document corresponds to a row in 'A join B' then Index
> > explodes..
> > Index size drastically increases.
> >
> > Why not then creating two indexs A and B.
> > And search for A and then from obtained A documents information search
> in B.
> >
> > It seems for me more performant than indexing all 'A join B'
> documents.
> >
> > Any commenters?
> >
> > Jelda
> >
> > > -----Original Message-----
> > > From: Satuluri, Venu_Madhav [mailto:[EMAIL PROTECTED]
> > > Sent: Thursday, April 13, 2006 6:15 PM
> > > To: java-user@lucene.apache.org
> > > Subject: RE: Lucene Seaches VS. Relational database Queries
> > >
> > > I think you are asking if we can retain 1:n relationships in lucene.
> > >
> > > Ok, I'll go out on a limb and give my solution. Say you have
> > > a table A and table B with B having multiple rows associated
> > > to each row in A.
> > > Also your documents are centered around A, i.e. all your
> > > queries return some row(s) of A, not B, but you should be
> > > able to query on fields in B.
> > >
> > >
> > > In such a case, you need to have multiple documents for each row in
> A.
> > > To be more specific, if a row in A has 5 corresponding rows
> > > in B, then there must be 5 Documents in lucene index
> > > corresponding to A. In other words, each lucene Document
> > > corresponds to a row in 'A join B'.
> > >
> > > I am not sure of this scheme. If there are more tables, then
> > > this quickly explodes the no. of documents. We'll have as
> > > many documents as will be there in {A join B join C join D..
> > > }. Plus, we'll need to remove Documents which correspond
> > > logically to the same row in A from the Hits.
> > >
> > > Is there a better way to do this? Or I don't make sense?
> > >
> > >
> > > -----Original Message-----
> > > From: Ananth T. Sarathy [mailto:[EMAIL PROTECTED]
> > > Sent: Thursday, April 13, 2006 9:04 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Lucene Seaches VS. Relational database Queries
> > >
> > >
> > > Ok,
> > >  Some of the stuff makes  some sense. I was a little loopy
> > > from lack of
> > > sleep and some of these solutions don't really cover my concerns....
> > >
> > >
> > > Let's take this movie example. If each member of a production Crew
> can
> > > have
> > > multiple titles that come from a lookup table of Distinct Jobs
> > >
> > > Titles
> > > Assistant Producer
> > > Producer
> > > Executive Producer
> > > Director
> > > Director Trainee
> > > Stunt Director
> > >
> > > In the Database there would be a Assocation Table Linking each Crew
> > > member
> > > the titles they had
> > >
> > > Crew_Titles
> > > Crew_ID   Title
> > > 1             Producer
> > > 1
> > >
> > > On 4/12/06, Nadav Har'El <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Chris Hostetter <[EMAIL PROTECTED]> wrote on 12/04/2006
> > > 01:41:37
> > > > AM:
> > > > > : them in one field).  One of the problems I see would be with
> > > values
> > > > that
> > > > > : over lap (Example, name where one name is Jason Bateman, and
> one
> > > is
> > > > Jason
> > > > > : Bateman Black, and it would be hard to replicate the Discrete
> > > Search
> > > > for
> > > > >
> > > > > they way field values are "analyzed" is extremely configurable
> --
> > > down
> > > > to
> > > > > the individual field level.  Which means that while you
> > > can have an
> > > > actor
> > > > > field where you can do loose text searching for "bateman" and
> get
> > > back
> > > > > movies staring "Jason Bateman" and "Jason Bateman Black" (and
> even
> > > Guido
> > > > > Batemans" if you use stemming) you can also have another
> > > field using
> > > a
> > > > > KeywordAnalyzer such that a record with teh values "Jason
> Bateman"
> > > and
> > > > > "Jack Black" will only be matched if hte user searches for
> "Jason
> > > > Bateman"
> > > > > or "Jack Black" ... searching for "Bateman Jack" or "Black
> Jason"
> > > will
> > > > not
> > > > > work.
> > > >
> > > > Another possible trick is to have one field, but mark its end with
> > > special
> > > > tokens, say "^" and "$", so that "Jason Bateman" gets
> > > indexed as four
> > > > tokens:
> > > >      ^ Jason Bateman $
> > > > Then, if you want to search for the name Jason Bateman and that
> name
> > > only,
> > > > just search for the phrase "^ Jason Bateman $" - and only this
> entry
> > > will
> > > > match. (you can also continue to search this field normally)
> > > >
> > > > If you'll think about this, you'll notice that you don't
> > > actually need
> > > > the beginning-of-field marker ("^") because it's easy to
> > > recognize the
> > > > beginning of a field because the position there is 0.
> Unfortunately,
> > > > I don't know how to match position 0 using the standard
> QueryParser,
> > > > but you can do it with the SpanFirstQuery: for example if we index
> > > > Jason Bateman as the three tokens
> > > >      Jason Bateman $
> > > > then we can search for it using something like
> > > >      SpanQuery[] terms = {
> > > >            new SpanTermQuery(new Term("actor", "Jason")),
> > > >            new SpanTermQuery(new Term("actor", "Bateman")),
> > > >            new SpanTermQuery(new Term("actor", "$")) };
> > > >      new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
> > > > (or something like that... I didn't test this)
> > > >
> > > >
> > > > --
> > > > Nadav Har'El
> > > >
> > > >
> > > >
> > >
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > >
> > > >
> > >
> > >
> > > --
> > > Ananth T Sarathy
> > >
> > >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


--
Ananth T Sarathy

Re: Lucene Seaches VS. Relational database Queries

Reply via email to