I should have read more carefully. Look at the Searchable interface. One of its concrete implementations is RemoteSearchable, which I think is what you're asking for.
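For what it's worth, here is a rough sketch of the fan-out-and-merge idea behind a searcher that sits on top of several searchables, some of which may be proxies for other JVMs. It is plain JDK code and not Lucene's implementation: `ScatterGather`, `Shard`, `Hit` and `searchAll` are names made up for illustration, and it simply assumes that scores coming from different indexes can be compared directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ScatterGather {

    /** One scored hit; 'shard' records which sub-index produced it. */
    public static final class Hit {
        public final String shard;
        public final int doc;
        public final float score;

        public Hit(String shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one index: local, or a proxy for a remote JVM. */
    public interface Shard {
        List<Hit> search(String query, int n) throws Exception;
    }

    /** Runs the query on every shard concurrently and keeps the n best hits overall. */
    public static List<Hit> searchAll(List<Shard> shards, String query, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: one task per shard, each asking for that shard's own top n.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, n);
                pending.add(pool.submit(task));
            }
            // Gather: wait for every shard, then merge by descending score.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two fake per-year shards with canned results, standing in for real indexes.
        Shard y2008 = (q, n) -> Arrays.asList(new Hit("2008", 1, 0.9f), new Hit("2008", 7, 0.4f));
        Shard y2009 = (q, n) -> Arrays.asList(new Hit("2009", 3, 0.7f));
        for (Hit h : searchAll(Arrays.asList(y2008, y2009), "lucene", 2)) {
            System.out.println(h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Each shard only has to return its own top n, so the merge step stays cheap no matter how large the individual indexes are. In a real setup the `Shard` role would be played by whatever remote proxy you end up using.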
Have you thought about SOLR? It's built on top of Lucene and has lots of stuff built in for handling distributed indexes....

Best
Erick

On Mon, Nov 16, 2009 at 9:06 AM, Wenbo Zhao <zha...@gmail.com> wrote:

> About the ParallelMultiSearcher, I don't really know that yet, just a
> quick look at jdoc. It seems to be a searcher searches other
> searchables. If all searchables are in same jvm, it won't help. If
> there is some searchable implementation can work as proxy for a
> 'remote' lucene instance, then it might be what I'm looking for. Is
> there such a class ?
>
> 2009/11/16 Erick Erickson <erickerick...@gmail.com>:
> > I confess that I've just skimmed your e-mail, but there's absolutely
> > no requirement that the entire index fit in RAM. The fact that your
> > index is larger than available RAM isn't the reason you're hitting OOM.
> >
> > Typical reasons for this are:
> > 1> you're sorting on a field with many, many, many unique values. If
> > you're sorting on a fine-grained timestamp, this is quite possible.
> > 2> You've bumped MAX_BOOLEAN_CLAUSES and are searching
> > on, say, one-letter wildcards.
> > 3> many other reasons.
> >
> > I agree with Jacob, jumping into a multi-machine solution without
> > understanding the problem in detail may not be your best course.
> >
> > So, can you tell us more about the conditions under which you hit
> > OOM? Maybe with more details we can come up with better solutions.
> >
> > If you absolutely *must* implement a multi-machine solution, have
> > you seen ParallelMultiSearcher?
> >
> > Best
> > Erick
> >
> > On Mon, Nov 16, 2009 at 2:13 AM, Wenbo Zhao <zha...@gmail.com> wrote:
> >
> >> Yes, exactly 'distributed'...
> >> From maintenance point of view, the 'horizontal' expandable is very
> >> important.
> >> For my case, the data file is a kind of 'history' file, categorized
> >> by date. Once the data file is indexed, it will not change, unless
> >> the searching fields changed.
> >> Say I make whole ten years data indexed, generated 400G index,
> >> requiring 8G ram. When I do backup, I have to backup the entire 400G
> >> every time. I need another 8G machine for backup. And 8G is not
> >> enough, the index is increasing every day.
> >> Compare to distributed solution, I can split the index by year or by
> >> seasons. Say I have 10x40G index. I can easily run 10 jvm process
> >> each with 1G heap space, in 3-5 low cost not dedicated x86 machines.
> >> Consider the backup, 9 of 10 indexes are old, only need backup once,
> >> they won't change. only 1 hot index is changing every day, so I just
> >> backup up to 40G. The spare machine is also very cheap. And the
> >> machines are so cheap, I can use VMs to run this, it's more flexible
> >> in resource management. As time goes by, I just install new jvm
> >> instance when needed. I don't worry about ram and search speed
> >> anymore.
> >> I do think there should be more bigger cases out there just like mine.
> >> The general distributed Lucene will be very useful. It will bring
> >> Lucene to more enterprise applications, or more bigger, industry
> >> applications.
> >>
> >>
> >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>:
> >> > Sounds like you may need to have some sort of distributed system, I just
> >> > wanted to make sure you were aware of the cost/benefits of just buying a big
> >> > 64bit/8Gb ram machine, vs having to not only maintain and power several 32
> >> > bit machines, but also maintain and support your now more complicated code.
> >> >
> >> > I have seen it too many times developers/companies spend so much money in
> >> > not just the initial development, but long term support and maintenance that
> >> > could have been simplified by just buying a bigger/better more powerful
> >> > machine in the first place.
> >> >
> >> > I am interested to see what other people have to say about how to solve your
> >> > problem.
> >> >
> >> > Best regards,
> >> > Jacob
> >> >
> >> > On 16/11/2009, at 3:39 PM, Wenbo Zhao wrote:
> >> >
> >> >> My data is categorized by date. About 14M+ docs per month, 37M+ terms.
> >> >> When I use 1G heap size to do search of 10 month index, I got OOM.
> >> >> The problem is I can't increase heap size in an easy way.
> >> >> I have several machines, all 32bit windows, 4G ram.
> >> >> And my goal is to index 10 year's data, plus more data every day !
> >> >> If I put all of them together, I will need 8G+ ram to run search.
> >> >> Maybe another 8G+ ram to run indexwriter.
> >> >>
> >> >> I think to split large index into smaller indexes and use a group of
> >> >> machines to work as one is more flexible and faster compare to one
> >> >> huge ram machine.
> >> >> Any suggestions ? beside more rams.
> >> >>
> >> >>
> >> >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>:
> >> >>>
> >> >>> Not sure how large your index is, but it might be easier (if possible to
> >> >>> increase your memory) than to develop a fairly complicated alternative
> >> >>> strategy.
> >> >>>
> >> >>> On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote:
> >> >>>
> >> >>>> Hi, all
> >> >>>> I'm facing a large index, on a x86 win platform which may not have big
> >> >>>> enough jvm heap space to hold the entire index.
> >> >>>> So, I think it's possible to split the index into several smaller
> >> >>>> indexes, run them in different jvm instances on different machine.
> >> >>>> Then for each query, I can concurrently run it on every index and
> >> >>>> merge the result together.
> >> >>>> This can be a workaround of OutOfMemory issue.
> >> >>>> But before I start to do this, I want to ask if Lucene already have a
> >> >>>> solution for things like this.
> >> >>>> Thanks.
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>>> Best Regards,
> >> >>>> ZHAO, Wenbo
> >> >>>>
> >> >>>> =======================
> >> >>>>
> >> >>>> ---------------------------------------------------------------------
> >> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>>>
> >> >>>
> >> >>> ____________________________________
> >> >>> Information Technology Services,
> >> >>> The University of Melbourne
> >> >>>
> >> >>> Email: jrho...@unimelb.edu.au
> >> >>> Phone: +61 3 8344 2884
> >> >>> Mobile: +61 4 1095 7575