Re: Query joining 2 indexes

frer Fri, 18 Dec 2009 11:25:52 -0800

Hey Erick,

Ok that's what I thought.  Unfortunately doing that will pretty much take
the same time as what I was doing (at least I think it will)


What I ended up doing is fetching them all with a big query.  I obviously
needed to change the MaxClauseCount but in my case this is an acceptable
solution... unless you (or anyone else) has a better idea.

Here is my resulting code:

            QueryParser parser = new QueryParser("ID", new
StandardAnalyzer());
            StringBuffer idQueryBuffer = new StringBuffer();
            Iterator<String> i = hourlyIds.iterator();
            while(i.hasNext()) {
                String id = i.next();
                idQueryBuffer.append(id).append(" ");
            }
            Query query = parser.parse(idQueryBuffer.toString());

            BooleanQuery booleanQuery = new BooleanQuery();
            booleanQuery.add(query, BooleanClause.Occur.MUST);
            
            TopDocs countDocs = searcher.search(booleanQuery, null, 1);


Thanks a lot for your help Erick and happy holidays to everyone,

François



Erick Erickson wrote:
> 
> From your original post...
> 
> <<<My Daily index contains many fields and some of them are IDs to my
> Hourly
> Index>>>
> 
> 
> So it's just the IDs to your hourly index contained in each Daily
> document.
> Problem here is that this ID is probably NOT the Lucene ID, so my original
> idea needs some refinement. Assuming that your IDs to the hourly index
> are contained in your daily document, you should be able to use TermDocs
> to get the Hourly documents efficiently.
> 
> Note to self: Engage brain.
> 
> FWIW
> Erick
> 
> On Fri, Dec 18, 2009 at 11:42 AM, frer <francois.e...@polymtl.ca> wrote:
> 
>>
>> Thanks for your answer,
>>
>> I didn't think that using:          Document doc irHourly.doc(<id in
>> hit>);
>> would be much faster than using the searcher. I will try that.
>>
>> I have one question though: what is the <id in hit> you reffer to.  Since
>> I
>> have searched in the daily index, what is the corresponding hit on the
>> hourly index if i haven't searched in the hourly index yet?
>>
>> Thanks for your help,
>>
>> Francois
>>
>> PS: Concerning your idea to merge the hourly and daily index, I did think
>> about that but the quantity of info I have is already way too large in
>> both
>> indexes (about 100 fields in each) to repeat that information so many
>> times.
>> My Index is already of 10GB.
>>
>>
>>
>> Erick Erickson wrote:
>> >
>> > Well, making a large OR clause is definitely more efficient than making
>> > N different requests, but you would have to search the results. It
>> doesn't
>> > sound very performant.
>> >
>> > Could you go to 50,000 ids? yes, but you have to fiddle with
>> > setMaxClauseCount
>> > because Lucene defaults to a max of 1,024.
>> >
>> > There's no way I know of to emulate a DB join. Usually, whenever I try
>> > the answer is to flatten my data but I don't see a good way to do that
>> > in this case.
>> >
>> > Hmmm, what do you think would happen if you opened an IndexReader
>> > into your Hourly index straight from your Daily query? Something like:
>> >
>> > IndexReader irHourly = <open index reader to Hourly>.
>> > for (each hit in Daily) {
>> >     for (each id in this Hourly doc) {
>> >          Document doc irHourly.doc(<id in hit>);
>> >          addToResponse(doc); // your method here.....
>> >     }
>> > }
>> >
>> >
>> > You *probably* want to keep the Hourly reader open between requests,
>> but
>> > since the above isn't searching, you *might* be able to open it each
>> time.
>> > I'd
>> > go for keeping it open between requests if at all possible.
>> >
>> > And here's a wild and crazy idea. Remember that Lucene documents don't
>> > require that *any* fields be in common. It might make your management
>> > easier if you *combined* the indexes. Crudely, prefix each Daily field
>> > with
>> > "D_" and each hourly field with "H_" and put 'em all in the same index.
>> > I'm
>> > not claiming that's a good solution in your case, but I thought I'd
>> > mention
>> > it
>> > as a possibility. You still can't do joins on them though.....
>> >
>> > Erick
>> >
>> >
>> >
>> > On Fri, Dec 18, 2009 at 9:08 AM, François Eric
>> > <francois.e...@jarca.ca>wrote:
>> >
>> >> Hello,
>> >>
>> >> I have a performance problem and would need expert advice on how to go
>> >> about fixing it:
>> >>
>> >> I currently have 2 indexes: Daily and Hourly.  The Daily index
>> contains
>> >> about 1,000,000 documents and my Hourly index approximately:
>> 24,000,000
>> >> documents.  My Daily index contains many fields and some of them are
>> IDs
>> >> to
>> >> my Hourly Index.
>> >> What I want to do is fetch data in one request (if possible).
>> >> Right now I do it in many requests:
>> >> 1- Get the matching Daily documents (say it returns 500 documents)
>> >> 2- For each of these documents, locate the Hourly Index Id and fetch
>> it.
>> >>
>> >> Therefore I make 501 requests to lucene.  This causes some performance
>> >> issues I guess because of the overhead to  making a request to Lucene.
>> >>
>> >> Is it possible to do this in 1 request?  I'm thinking no because I'm
>> not
>> >> sure what the result set would be but maybe I'm missing something.
>> >>
>> >> If not I guess it would be possible to build a query with my 500
>> hourly
>> >> ids
>> >> and make a OR between them to make it in 2 requests....but then I have
>> to
>> >> find the matching documents.  Will this overflow if I have 50000 ids
>> in
>> >> my
>> >> query?
>> >>
>> >> Anyway, I just want advice on how one would address this situation.
>> >>
>> >> Thank you very much,
>> >>
>> >> François
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Query-joining-2-indexes-tp26843980p26846129.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Query-joining-2-indexes-tp26843980p26848434.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Query joining 2 indexes

Reply via email to