Re: Can Lucene unite multiple instances run as one ?

Erick Erickson Mon, 16 Nov 2009 06:13:58 -0800

I should have read more carefully.

Look at the Searchable definition. One of the concrete realizations
of that interface is a RemoteSearchable, which is what you're
asking for I think.
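
For illustration only, here is a minimal plain-Java sketch of the idea
discussed in this thread: send the query to every partial index
concurrently, then merge the per-index hits into one ranked list. It uses
no Lucene classes. `ScatterGather`, `Shard`, `Hit` and `inMemoryShard` are
hypothetical names made up for this sketch, and the in-memory shard ignores
the query text; a real shard would wrap a local or remote searcher.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical scatter-gather sketch; none of these types are Lucene classes. */
class ScatterGather {

    /** One scored hit, tagged with the shard (partial index) it came from. */
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Stand-in for one partial index; an implementation could proxy another JVM. */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Orders hits from highest score to lowest. */
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> Float.compare(b.score, a.score);

    /** Demo shard holding fixed scores; it ignores the query (assumption for the sketch). */
    static Shard inMemoryShard(String name, float... scores) {
        return (query, topN) -> {
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < scores.length; i++) {
                hits.add(new Hit(name, i, scores[i]));
            }
            hits.sort(BY_SCORE_DESC);
            return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
        };
    }

    /** Runs the query on all shards in parallel and returns the merged top-N by score. */
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (Shard shard : shards) {
                tasks.add(() -> shard.search(query, topN));
            }
            List<Hit> merged = new ArrayList<>();
            // invokeAll blocks until every shard has answered (or failed).
            for (Future<List<Hit>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get());
            }
            merged.sort(BY_SCORE_DESC);
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard only has to return its own top N, so the merge step handles at
most N hits per shard. One caveat with this simple merge: scores computed
against separately built indexes are not automatically comparable, so a
real setup needs some way to make them consistent across shards.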


Have you thought about Solr? It's built on top of Lucene and has
a lot of built-in support for handling distributed indexes.

Best
Erick


On Mon, Nov 16, 2009 at 9:06 AM, Wenbo Zhao <zha...@gmail.com> wrote:

> About the ParallelMultiSearcher, I don't really know it yet; I've just
> had a quick look at the javadoc.  It seems to be a searcher that
> searches other searchables.  If all the searchables are in the same
> JVM, it won't help.  If there is some Searchable implementation that
> can work as a proxy for a 'remote' Lucene instance, then it might be
> what I'm looking for.  Is there such a class?
>
> 2009/11/16 Erick Erickson <erickerick...@gmail.com>:
> > I confess that I've just skimmed your e-mail, but there's absolutely
> > no requirement that the entire index fit in RAM. The fact that your
> > index is larger than available RAM isn't the reason you're hitting OOM.
> >
> > Typical reasons for this are:
> > 1> you're sorting on a field with many, many, many unique values. If
> > you're sorting on a fine-grained timestamp, this is quite possible.
> > 2> You've bumped MAX_BOOLEAN_CLAUSES and are searching
> > on, say, one-letter wildcards.
> > 3> many other reasons.
> >
> > I agree with Jacob, jumping into a multi-machine solution without
> > understanding the problem in detail may not be your best course.
> >
> > So, can you tell us more about the conditions under which you hit
> > OOM? Maybe with more details we can come up with better solutions.
> >
> > If you absolutely *must* implement a multi-machine solution, have
> > you seen ParallelMultiSearcher?
> >
> > Best
> > Erick
> >
> > On Mon, Nov 16, 2009 at 2:13 AM, Wenbo Zhao <zha...@gmail.com> wrote:
> >
> >> Yes, exactly 'distributed'...
> >> From a maintenance point of view, 'horizontal' scalability is very
> >> important.
> >> In my case, the data files are a kind of 'history' file, categorized
> >> by date.  Once a data file is indexed, it will not change unless
> >> the search fields change.
> >> Say I index a whole ten years of data, generating a 400G index that
> >> requires 8G of RAM.  When I do a backup, I have to back up the entire
> >> 400G every time.  I need another 8G machine for backup.  And 8G is
> >> not enough, because the index grows every day.
> >> Compare that to a distributed solution, where I can split the index
> >> by year or by season.  Say I have 10x40G indexes.  I can easily run
> >> 10 JVM processes, each with 1G of heap space, on 3-5 low-cost,
> >> non-dedicated x86 machines.
> >> As for backup, 9 of the 10 indexes are old and only need to be backed
> >> up once, since they won't change.  Only the 1 hot index changes every
> >> day, so I back up at most 40G.  The spare machine is also very cheap.
> >> And because the machines are so cheap, I can use VMs to run this,
> >> which is more flexible for resource management.  As time goes by, I
> >> just install a new JVM instance when needed.  I don't have to worry
> >> about RAM and search speed anymore.
> >> I do think there must be bigger cases out there just like mine.
> >> A general distributed Lucene would be very useful.  It would bring
> >> Lucene to more enterprise applications, or to bigger, industry-scale
> >> applications.
> >>
> >>
> >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>:
> >> > Sounds like you may need some sort of distributed system. I just
> >> > wanted to make sure you were aware of the costs/benefits of just
> >> > buying a big 64-bit/8GB RAM machine, versus having to not only
> >> > maintain and power several 32-bit machines, but also maintain and
> >> > support your now more complicated code.
> >> >
> >> > I have seen developers/companies too many times spend so much money
> >> > on not just the initial development, but also long-term support and
> >> > maintenance, that could have been saved by just buying a bigger,
> >> > more powerful machine in the first place.
> >> >
> >> > I am interested to see what other people have to say about how to
> >> > solve your problem.
> >> >
> >> > Best regards,
> >> > Jacob
> >> >
> >> > On 16/11/2009, at 3:39 PM, Wenbo Zhao wrote:
> >> >
> >> >> My data is categorized by date.  About 14M+ docs per month, 37M+ terms.
> >> >> When I use a 1G heap to search a 10-month index, I get OOM.
> >> >> The problem is that I can't increase the heap size in an easy way.
> >> >> I have several machines, all 32-bit Windows with 4G RAM.
> >> >> And my goal is to index 10 years' data, plus more data every day!
> >> >> If I put it all together, I will need 8G+ RAM to run searches,
> >> >> and maybe another 8G+ RAM to run IndexWriter.
> >> >>
> >> >> I think splitting the large index into smaller indexes and using a
> >> >> group of machines working as one is more flexible and faster than
> >> >> one huge-RAM machine.
> >> >> Any suggestions, besides more RAM?
> >> >>
> >> >>
> >> >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>:
> >> >>>
> >> >>> Not sure how large your index is, but it might be easier (if possible)
> >> >>> to increase your memory than to develop a fairly complicated
> >> >>> alternative strategy.
> >> >>>
> >> >>> On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote:
> >> >>>
> >> >>>> Hi, all
> >> >>>> I'm facing a large index on an x86 Windows platform, which may not
> >> >>>> have a big enough JVM heap to hold the entire index.
> >> >>>> So I think it's possible to split the index into several smaller
> >> >>>> indexes and run them in different JVM instances on different machines.
> >> >>>> Then, for each query, I can run it concurrently on every index and
> >> >>>> merge the results together.
> >> >>>> This could be a workaround for the OutOfMemory issue.
> >> >>>> But before I start on this, I want to ask whether Lucene already
> >> >>>> has a solution for things like this.
> >> >>>> Thanks.
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>>> Best Regards,
> >> >>>> ZHAO, Wenbo
> >> >>>>
> >> >>>> =======================
> >> >>>>
> >> >>>>
> >> >>>> ---------------------------------------------------------------------
> >> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>>>
> >> >>>
> >> >>> ____________________________________
> >> >>> Information Technology Services,
> >> >>> The University of Melbourne
> >> >>>
> >> >>> Email: jrho...@unimelb.edu.au
> >> >>> Phone: +61 3 8344 2884
> >> >>> Mobile: +61 4 1095 7575
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Best Regards,
> >> >> ZHAO, Wenbo
> >> >>
> >> >> =======================
> >> >>
> >> >>
> >> >
> >> > ____________________________________
> >> > Information Technology Services,
> >> > The University of Melbourne
> >> >
> >> > Email: jrho...@unimelb.edu.au
> >> > Phone: +61 3 8344 2884
> >> > Mobile: +61 4 1095 7575
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Best Regards,
> >> ZHAO, Wenbo
> >>
> >> =======================
> >>
> >>
> >>
> >
>
>
>
> --
>
> Best Regards,
> ZHAO, Wenbo
>
> =======================
>
>
>
