Erick and Mark, thank you very much, you have really given me good
information. I have decided to try HitCollector and see how it works. But
as for storing document IDs, I don't think that is a good idea because the
result may exceed 50,000; I was just being optimistic when I gave that
number ;)

Anyway, I will try them, even storing IDs.
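
This is roughly what I plan to try first (just a sketch, assuming an
already open searcher and query, and the Lucene 2.x HitCollector API with
its collect(int doc, float score) callback):

    // needs: java.util.*, org.apache.lucene.search.HitCollector
    // collect only doc IDs -- no Hits caching, no Document loading
    final List<Integer> ids = new ArrayList<Integer>();
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            ids.add(doc); // or just count, to avoid holding 50,000 IDs in memory
        }
    });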

Thank you very much again.

--
Regards,
Mohammad

On 3/7/07, Mark Miller <[EMAIL PROTECTED]> wrote:

You should not be returning 50 thousand documents. Since you are
implementing paging, you should only return enough to cover your page
size. If a user is viewing page 1 with documents 1-10, you send back
information for those 10 docs. On page 2, documents 11-20, you send back
information for those 10 docs after executing a new search and skimming
off the correct ones.
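
For example, a page fetch can be as simple as this (a rough sketch only;
page, pageSize, searcher and query are placeholders, and it uses the
TopDocs form of search instead of Hits):

    // needs: org.apache.lucene.search.TopDocs, org.apache.lucene.document.Document
    // stateless paging: re-run the search for each page request,
    // then load Documents only for the docs on that page
    int start = page * pageSize;                        // page 0 -> docs 0-9, etc.
    TopDocs topDocs = searcher.search(query, null, start + pageSize);
    int end = Math.min(start + pageSize, topDocs.scoreDocs.length);
    for (int i = start; i < end; i++) {
        Document d = searcher.doc(topDocs.scoreDocs[i].doc);
        // ... copy just the fields you need into your page of results ...
    }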

Also, unless your documents are *very* big, 500,000 documents on any
sort of modern hardware is a cakewalk for Lucene.

Using Hits as it was meant to be used is generally a poor decision in a
multi-user environment in my opinion. You will be faster and scale
better using a stateless paradigm instead of taking advantage of Hits
caching. Besides that, if you are currently loading up 50 thousand
documents from Hits, you are doing something that is extremely
inefficient. As Erick said, Hits will continuously re-query to fill up its
cache: after you grab info for 100 docs it re-queries, then again after
200, and so on.

Stateless, man. Trust me.

- Mark

Mohammad Norouzi wrote:
> Yes, I am very concerned about this because we have a big project with
> many users and I am responsible for it. The thing that preoccupies my
> mind is application performance, because there are more than 500
> thousand records (documents).
>
> A single search may return about 50 thousand documents, and it is not
> possible to put all of them into, say, a java.util.List. So I have to
> keep the searcher open and move forward or backward through the Hits,
> and when the user clicks the "Finish" button, or a time limit is
> exceeded (or the session is destroyed), I set a flag to true so that
> another session can access that searcher without any searcher or reader
> being closed.
>
> Anyway, your comments are useful to me.
> thanks
>
> On 3/7/07, Mark Miller <[EMAIL PROTECTED]> wrote:
>>
>> To address your Hits question: I wouldn't keep Hits around, but would
>> re-search instead. It is often more of a headache than a time savings
>> to keep around all of the Hits objects and to have to manage them. I
>> made my own Hits object that does no caching because of this.
>> Pagination is often best done by re-querying.
>>
>> Also, keep in mind that you probably won't have 1000 Similarities...
>> you will probably have much closer to 1 <g>, maybe a couple if you have
>> created a custom one. The biggest chance of having more than one
>> Searcher cached for an Index is if you have a MultiSearcher cached that
>> searches over it. Out of the box, indexAccessor does not handle
>> MultiSearchers perfectly though...it does not check out a Searcher for
>> each of the underlying Indexes, so you will have to do that yourself...
>> then remember to release them all when you release the MultiSearcher.
>>
>> I think in general you are overly concerned. IndexAccessor will handle
>> most of this for you without much intervention on your part.
>>
>> - Mark
>>
>> Mohammad Norouzi wrote:
>> > Hello Mark,
>> > There is something vague to me about the Lucene-indexAccessor you
>> > created and my problem. As I see your code, you create IndexSearchers
>> > and put them into a Map, and the only thing that separates them is
>> > the Similarity they have. So if, say, 1000 users with different
>> > Similarities connect to my application, there will be 1000
>> > IndexSearchers, each with its own internal Reader.
>> > Now, in my case, I have an IndexResultSet, just like
>> > java.sql.ResultSet, which contains a Hits object. A user may go
>> > forward or backward through the Hits' documents, and in fact every
>> > user is doing this.
>> >
>> > To do so, I have to find the Similarity that a user is working with
>> > and then find the right IndexSearcher in order to support pagination
>> > for her. Is this right? I mean, can I rely on the Similarity to find
>> > the right IndexReader that a user has used before?
>> >
>> > Another question: what if I had one IndexReader shared by all my
>> > IndexSearchers, and managed them so that they all access that single
>> > Reader?
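>> >
>> > Something like this is what I have in mind (just a sketch; the index
>> > path is made up, and I am assuming the IndexSearcher(IndexReader)
>> > constructor, which does not take ownership of the reader):
>> >
>> >     // one shared IndexReader, several lightweight searchers on top of it
>> >     IndexReader reader = IndexReader.open("/path/to/index");
>> >     IndexSearcher searcherA = new IndexSearcher(reader);
>> >     IndexSearcher searcherB = new IndexSearcher(reader);
>> >     // closing a searcher created this way does not close the shared reader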
>> >
>> > thank you very much in advance
>> >
>> >
>> > On 2/22/07, Mark Miller <[EMAIL PROTECTED]> wrote:
>> >>
>> >> I would not do this from scratch...if you are interested in Solr, go
>> >> that route; otherwise I would build off
>> >> http://issues.apache.org/jira/browse/LUCENE-390
>> >>
>> >> - Mark
>> >>
>> >> Mohammad Norouzi wrote:
>> >> > Hi all,
>> >> > I am going to build a Searcher pool. If anyone has experience with
>> >> > this, I would be glad to hear their recommendations and
>> >> > suggestions. I want to know what issues I should consider, given
>> >> > that I am going to use this in a web application with many user
>> >> > sessions.
>> >> >
>> >> > thank you very much in advance.
>> >>
>> >>
>> >
>> >
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

