Re: Using Lucene to index Wikipedia

Michael Sokolov Sun, 23 Oct 2011 13:49:34 -0700

Daniel, since no one knowledgeable has answered, I'll take a stab - there are a number of ant targets you can run, most of which incorporate some indexing step(s). Basically you can run:


ant -Dtask.alg=<alg file>

It looks as if the ant build.xml is set up to run conf/micro-standard.alg by default, but there are a bunch of other alg files in the conf folder, each of which is set up to run a different benchmark.


The only "document" I found is the build.xml file.


On 10/20/2011 12:30 PM, Daniel Quach wrote:

How do I use the Lucene Benchmark to index a Wikipedia dump? I want to be able to execute phrase queries on the latest English Wikipedia page dump. I'm trying to look for example use cases but I haven't found any.
I downloaded the latest English dump, named: enwiki-latest-pages-articles.xml.bz2
Then I ran the command in the terminal:
java org.apache.lucene.benchmark.utils.ExtractWikipedia -i ~/enwiki-latest-pages-articles.xml.bz2
which I believe extracted the pages into a directory labeled "enwiki"
Now is there something else in benchmarks that I need to run in order to index the wiki? The README.enwiki does not really give me a clear set of instructions; in fact, I'm not even sure if I was supposed to run the ExtractWikipedia class or not.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


