I just checked the 2.9.1 docs at http://lucene.apache.org/java/2_9_1/api/core/index.html, but I can't find the RemoteSearchable you mentioned. I don't know SOLR yet; I'll look at it tomorrow. Thanks.
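For what it's worth, RemoteSearchable does not seem to be in the core 2.9.1 jar; it appears to ship with the contrib "remote" module instead, which would explain why it is missing from the core API docs linked above. A minimal sketch of exporting one index shard over RMI with it might look like the following (the index path, port, and RMI name are made up for illustration):

    // Sketch only: exporting one index shard over RMI with RemoteSearchable,
    // which (as of 2.9) appears to live in the contrib "remote" jar, not core.
    import java.io.File;
    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.RemoteSearchable;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.store.FSDirectory;

    public class ExportShard {
        public static void main(String[] args) throws Exception {
            // Open one per-year shard read-only.
            Searchable local = new IndexSearcher(
                    FSDirectory.open(new File("/data/index-2009")), true);

            // Publish it in an RMI registry so a remote searcher can reach it.
            LocateRegistry.createRegistry(1099);
            Naming.rebind("//localhost:1099/shard-2009", new RemoteSearchable(local));
            System.out.println("shard-2009 exported");
        }
    }

Each exporting JVM then only needs enough heap for its own shard, which is the whole point of splitting the index.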
2009/11/16 Erick Erickson <erickerick...@gmail.com>:
> I should have read more carefully.
>
> Look at the Searchable definition. One of the concrete realizations
> of that interface is a RemoteSearchable, which is what you're
> asking for, I think.
>
> Have you thought about SOLR? It's built on top of Lucene and has
> lots of stuff built in for handling distributed indexes....
>
> Best
> Erick
>
>
> On Mon, Nov 16, 2009 at 9:06 AM, Wenbo Zhao <zha...@gmail.com> wrote:
>
>> About ParallelMultiSearcher, I don't really know it yet; I've only had a
>> quick look at the javadoc. It seems to be a searcher that searches other
>> searchables. If all the searchables are in the same JVM, it won't help.
>> If there is a Searchable implementation that can act as a proxy for a
>> 'remote' Lucene instance, then it might be what I'm looking for. Is
>> there such a class?
>>
>> 2009/11/16 Erick Erickson <erickerick...@gmail.com>:
>> > I confess that I've just skimmed your e-mail, but there's absolutely
>> > no requirement that the entire index fit in RAM. The fact that your
>> > index is larger than available RAM isn't the reason you're hitting OOM.
>> >
>> > Typical reasons for this are:
>> > 1> you're sorting on a field with many, many, many unique values. If
>> > you're sorting on a fine-grained timestamp, this is quite possible.
>> > 2> you've bumped MAX_BOOLEAN_CLAUSES and are searching
>> > on, say, one-letter wildcards.
>> > 3> many other reasons.
>> >
>> > I agree with Jacob: jumping into a multi-machine solution without
>> > understanding the problem in detail may not be your best course.
>> >
>> > So, can you tell us more about the conditions under which you hit
>> > OOM? Maybe with more details we can come up with better solutions.
>> >
>> > If you absolutely *must* implement a multi-machine solution, have
>> > you seen ParallelMultiSearcher?
>> >
>> > Best
>> > Erick
>> >
>> > On Mon, Nov 16, 2009 at 2:13 AM, Wenbo Zhao <zha...@gmail.com> wrote:
>> >
>> >> Yes, exactly 'distributed'...
>> >> From a maintenance point of view, being horizontally expandable is
>> >> very important.
>> >> In my case, the data files are a kind of 'history', categorized by
>> >> date. Once a data file is indexed, it will not change unless the
>> >> search fields change.
>> >> Say I index a whole ten years of data and end up with a 400G index
>> >> requiring 8G of RAM. When I do a backup, I have to back up the
>> >> entire 400G every time, and I need another 8G machine for the
>> >> backup. And 8G is not enough; the index grows every day.
>> >> With a distributed solution, I can split the index by year or by
>> >> season, say into 10x40G indexes. I can easily run 10 JVM processes,
>> >> each with 1G of heap space, on 3-5 low-cost, non-dedicated x86
>> >> machines. As for backups, 9 of the 10 indexes are old and never
>> >> change, so they only need to be backed up once; only 1 hot index
>> >> changes every day, so I back up at most 40G. The spare machine is
>> >> also very cheap. And because the machines are so cheap, I can run
>> >> this on VMs, which is more flexible for resource management. As time
>> >> goes by, I just add a new JVM instance when needed. I don't have to
>> >> worry about RAM or search speed anymore.
>> >> I think there are even bigger cases out there just like mine.
>> >> A general distributed Lucene would be very useful. It would bring
>> >> Lucene to more enterprise applications, and to bigger, industry-scale
>> >> applications.
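To make the quoted ParallelMultiSearcher discussion concrete: here is a minimal sketch of a search front-end that looks up several remotely exported shards and queries them in parallel. It assumes the shards were exported as in the sketch above; the host names, RMI names, field name, and query text are all made up for illustration.

    // Sketch only: fan a query out over several remote shards and merge hits.
    import java.rmi.Naming;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.util.Version;

    public class SearchAllShards {
        public static void main(String[] args) throws Exception {
            // Each entry is a RemoteSearchable exported by another JVM/machine.
            Searchable[] shards = new Searchable[] {
                (Searchable) Naming.lookup("//host1:1099/shard-2008"),
                (Searchable) Naming.lookup("//host2:1099/shard-2009")
            };

            // ParallelMultiSearcher runs the query against every shard and
            // merges the hits into a single ranked result.
            ParallelMultiSearcher searcher = new ParallelMultiSearcher(shards);

            Query query = new QueryParser(Version.LUCENE_29, "body",
                    new StandardAnalyzer(Version.LUCENE_29)).parse("lucene");
            TopDocs hits = searcher.search(query, 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println(searcher.doc(sd.doc).get("body"));
            }
            searcher.close();
        }
    }

MultiSearcher would do the same thing over the shards one after another; ParallelMultiSearcher searches them concurrently.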
>> >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>:
>> >> > Sounds like you may need to have some sort of distributed system. I
>> >> > just wanted to make sure you were aware of the costs/benefits of
>> >> > just buying a big 64-bit/8GB RAM machine, versus having to not only
>> >> > maintain and power several 32-bit machines, but also maintain and
>> >> > support your now more complicated code.
>> >> >
>> >> > I have seen it too many times: developers/companies spend so much
>> >> > money, not just on the initial development but on long-term support
>> >> > and maintenance, that could have been saved by just buying a bigger,
>> >> > more powerful machine in the first place.
>> >> >
>> >> > I am interested to see what other people have to say about how to
>> >> > solve your problem.
>> >> >
>> >> > Best regards,
>> >> > Jacob
>> >> >
>> >> > On 16/11/2009, at 3:39 PM, Wenbo Zhao wrote:
>> >> >
>> >> >> My data is categorized by date: about 14M+ docs per month, 37M+ terms.
>> >> >> When I use a 1G heap to search a 10-month index, I get OOM.
>> >> >> The problem is that I can't increase the heap size in an easy way:
>> >> >> I have several machines, all 32-bit Windows with 4G of RAM.
>> >> >> And my goal is to index 10 years' data, plus more data every day!
>> >> >> If I put all of it together, I will need 8G+ of RAM to run searches,
>> >> >> and maybe another 8G+ of RAM to run the IndexWriter.
>> >> >>
>> >> >> I think splitting a large index into smaller indexes and using a
>> >> >> group of machines to work as one is more flexible and faster than
>> >> >> one huge-RAM machine.
>> >> >> Any suggestions, besides more RAM?
>> >> >>
>> >> >> 2009/11/16 Jacob Rhoden <jrho...@unimelb.edu.au>:
>> >> >>> Not sure how large your index is, but it might be easier to
>> >> >>> increase your memory (if possible) than to develop a fairly
>> >> >>> complicated alternative strategy.
>> >> >>>
>> >> >>> On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote:
>> >> >>>
>> >> >>>> Hi, all
>> >> >>>> I'm facing a large index on an x86 Windows platform that may not
>> >> >>>> have a big enough JVM heap to hold the entire index.
>> >> >>>> So I think it's possible to split the index into several smaller
>> >> >>>> indexes and run them in different JVM instances on different
>> >> >>>> machines. Then, for each query, I can run it concurrently on every
>> >> >>>> index and merge the results together.
>> >> >>>> This could be a workaround for the OutOfMemory issue.
>> >> >>>> But before I start to do this, I want to ask whether Lucene
>> >> >>>> already has a solution for things like this.
>> >> >>>> Thanks.
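On the indexing side, the split-by-date plan described in the original question above could be as simple as keeping one IndexWriter per yearly shard. A rough sketch under that assumption (the directory layout and field names are made up):

    // Sketch only: route each document to a per-year index directory,
    // so one large index becomes several independent shards.
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ShardedIndexer {
        private final Map<String, IndexWriter> writers =
                new HashMap<String, IndexWriter>();

        // Pick (or lazily open) the writer for the shard a record belongs to.
        private IndexWriter writerFor(String year) throws Exception {
            IndexWriter w = writers.get(year);
            if (w == null) {
                w = new IndexWriter(
                        FSDirectory.open(new File("/data/index-" + year)),
                        new StandardAnalyzer(Version.LUCENE_29),
                        IndexWriter.MaxFieldLength.UNLIMITED);
                writers.put(year, w);
            }
            return w;
        }

        public void add(String year, String body) throws Exception {
            Document doc = new Document();
            doc.add(new Field("year", year, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("body", body, Field.Store.YES, Field.Index.ANALYZED));
            writerFor(year).addDocument(doc);
        }

        public void close() throws Exception {
            for (IndexWriter w : writers.values()) {
                w.close();   // commits and closes each shard
            }
        }
    }

Only the current year's writer ever changes, which matches the backup scheme described above: the old shards can be backed up once and left alone.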
--

Best Regards,
ZHAO, Wenbo

=======================