Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

2012-09-19 Thread Reyna Melara
Hi, I'm trying to index a big set of plain text files, almost 8,104,467 files, that are all under the same directory, /media/MAFALDA/yohasebewp2txt/Archivos, and I want to build my index under /media/MAFALDA/LuceneIndex using the IndexFiles.java program from the documentation. I'm using the NetBeans IDE, and I

Re: Wikipedia Index

2012-06-19 Thread Reyna Melara
Would it be possible to index Wikipedia on a 2-core machine with 3 GB of RAM? I have had the same problem trying to index it. I've tried with a dump from April 2011. Thanks, Reyna CIC-IPN Mexico 2012/6/19 Michael McCandless > Likely the bottleneck is pulling content from the database? Maybe >

Re: is it possible to index wiki markup files?

2012-01-11 Thread Reyna Melara
Thanks to all who replied to my question. Regards, Reyna 2012/1/11 Michael Wechner > Maybe Tika is also of help to you > > http://tika.apache.org/ > > HTH > > Michael > > On 11.01.12 20:13, Reyna Melara wrote: > >> Hi, my name is Reyna

is it possible to index wiki markup files?

2012-01-11 Thread Reyna Melara
Hi, my name is Reyna Melara. I'm a PhD student from Mexico, and I have a set of 11,051,447 files with a txt extension, but the content of each file is in fact in wiki format. I want and need them to be indexed, but I don't know if I have to convert this content to flat text. I have been r