The 2 GB size cap is a limitation of the OS and/or file system, not of the index as supported by Lucene. Lucene does have a different kind of limit: the number of documents in a single index must be less than 2147483648 (2^31). In practice, though, a Lucene index can reach tens or hundreds of GB on disk well before hitting that document count.
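The partition-and-merge scheme mentioned further down in this thread (several smaller indexes on separate hosts, queried in parallel, results merged into one result set) can be sketched in plain Java. This is only a sketch of the wrapper you would build yourself: the `Shard` interface and `Hit` record below are hypothetical stand-ins for a per-host Lucene searcher, not Lucene API.

```java
import java.util.*;
import java.util.concurrent.*;

public class ShardedSearch {
    // A scored hit; a real wrapper would carry a Lucene doc id plus a shard id.
    record Hit(String docId, float score) {}

    // Hypothetical per-shard search: stands in for an IndexSearcher on one host.
    interface Shard {
        List<Hit> topK(String query, int k);
    }

    // Fan the query out to all shards in parallel, then merge the partial
    // top-k lists into a single global top-k result set.
    static List<Hit> search(List<Shard> shards, String query, int k)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard s : shards) {
                futures.add(pool.submit(() -> s.topK(query, k)));
            }
            // Min-heap of size k keeps only the best k hits across all shards.
            PriorityQueue<Hit> heap =
                    new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
            for (Future<List<Hit>> f : futures) {
                for (Hit h : f.get()) {
                    heap.offer(h);
                    if (heap.size() > k) heap.poll();
                }
            }
            // Best score first in the merged result set.
            List<Hit> merged = new ArrayList<>(heap);
            merged.sort(Comparator.comparingDouble(Hit::score).reversed());
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two fake shards with hard-coded scores, just to exercise the merge.
        Shard a = (q, k) -> List.of(new Hit("a1", 3.0f), new Hit("a2", 1.0f));
        Shard b = (q, k) -> List.of(new Hit("b1", 2.5f), new Hit("b2", 0.5f));
        for (Hit h : search(List.of(a, b), "lucene", 3)) {
            System.out.println(h.docId() + " " + h.score());
        }
    }
}
```

The merge is cheap because each shard already returns its local top k, so the coordinator only has to re-rank `shards * k` hits, not scan the whole result set.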
If you are thinking about BIG indexes, forget Windows + FAT32. On Linux I've seen big indexes, e.g. 80M relatively small documents, about 50 GB on disk, with reasonable performance on a pretty cheap machine. If you need more documents, better performance, etc., you need to partition your index into several smaller indexes running on separate hosts, query them in parallel, and then merge the results into a single result set. This mode of operation is not built into Lucene, but you can relatively easily build a customized wrapper to do it. AFAIK something similar powers Google: each box handles about 10M docs, and there are thousands of boxes searching in parallel.

On Mon, May 18, 2009 at 12:42, raistlink <ela...@gmail.com> wrote:
>
> Hi,
> I think I've read that there is a size limit for the index, maybe 2 GB on
> FAT machines. If this is right, I'd welcome good resources (sites or books)
> about programming search engines, to learn the techniques big search
> engines use to search such huge data.
>
> Thanks
> --
> View this message in context:
> http://www.nabble.com/Max-size-of-index--How-do-search-engines-avoid-this--tp23594241p23594241.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>