From the javadoc for DocMaker:
* *doc.stored* - specifies whether fields should be stored (default *false*).
* *doc.body.stored* - specifies whether the body field should be stored (default = *doc.stored*).
So out of the box you won't get content stored. Does this help?
regards
-will
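A quick way to check what actually ended up stored is to open the index and print the stored fields of a few documents. Here is a rough sketch against the Lucene 3.6 API; it assumes the default DocMaker field names (via its TITLE_FIELD and BODY_FIELD constants) and takes the index directory as the first argument:

import java.io.File;
import org.apache.lucene.benchmark.byTask.feeds.DocMaker;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class DumpStoredFields {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
    try {
      // Body will print as null unless doc.stored=true (or doc.body.stored=true)
      // was set when the index was built.
      for (int i = 0; i < Math.min(5, reader.maxDoc()); i++) {
        Document doc = reader.document(i);
        System.out.println("title=" + doc.get(DocMaker.TITLE_FIELD)
            + " body=" + doc.get(DocMaker.BODY_FIELD));
      }
    } finally {
      reader.close();
    }
  }
}

If both fields come back null, the text was indexed but not stored, which matches the defaults quoted above.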
Hi,
I have a question regarding the format of the index created by DocMaker from
EnWikiContentSource.
After creating the index from a dump of all of Wikipedia's articles (
https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles-multistream.xml.bz2),
I'm having trouble understanding the resulting format and how the documents are
split into content and title.
thanks,
shaimaa
From: luc...@mikemccandless.com
Date: Tue, 19 Jun 2012 19:48:24 -0400
Subject: Re: Wikipedia Index
To: java-user@lucene.apache.org
I have the index locally ... but it's really impractical to send it,
especially if you already have the source.
I only have the source text in a MySQL database.
Do you know where I can download it in XML, and is it possible to split the
documents into content and title?
thanks,
shaimaa
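The title/content split is exactly what EnwikiContentSource plus DocMaker produce from the pages-articles XML dump. A minimal sketch, assuming the Lucene 3.6 benchmark jar is on the classpath and the property names from its javadoc (content.source, docs.file, content.source.forever, doc.stored); the dump path is a placeholder:

import java.util.Properties;
import org.apache.lucene.benchmark.byTask.feeds.DocMaker;
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException;
import org.apache.lucene.benchmark.byTask.utils.Config;
import org.apache.lucene.document.Document;

public class WikiTitlesAndBodies {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("content.source",
        "org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource");
    props.setProperty("docs.file", "enwiki-pages-articles.xml"); // placeholder path to the dump
    props.setProperty("content.source.forever", "false"); // stop at the end instead of looping
    props.setProperty("doc.stored", "true"); // keep title/body retrievable from the index

    DocMaker docMaker = new DocMaker();
    docMaker.setConfig(new Config(props)); // 3.x DocMaker instantiates the content source named above
    docMaker.resetInputs();
    try {
      while (true) {
        Document doc = docMaker.makeDocument(); // one Wikipedia page per Document
        System.out.println(doc.get(DocMaker.TITLE_FIELD)); // article text is in doc.get(DocMaker.BODY_FIELD)
        // hand doc to an IndexWriter here
      }
    } catch (NoMoreDataException end) {
      // reached the end of the dump
    } finally {
      docMaker.close();
    }
  }
}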
> From: luc...@mikemccandless.com
> Date: Tue, 19 Jun 2012 19:48:24 -0400
> Subject: Re: Wikipedia Index
> To: java-user@lucene.apache.org
> ... title. If there is any place where I can get
> the index, that will save me great time.
> regards,
> shaimaa
>
>> From: luc...@mikemccandless.com
>> Date: Tue, 19 Jun 2012 16:29:39 -0400
>> Subject: Re: Wikipedia Index
>> To: java-user@lucene.apache.org
>>
>>
3 GB of RAM is plenty for indexing Wikipedia (e.g., that nightly benchmark
uses a 2 GB heap).
2 cores just means it'll take longer than with more cores... just use 2
indexing threads.
Mike McCandless
http://blog.mikemccandless.com
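A sketch of the two-indexing-threads idea with one shared IndexWriter (IndexWriter is thread-safe), using a Lucene 3.6-style setup; nextDocument() is a placeholder for whatever produces your Documents (DocMaker, a JDBC loop, etc.):

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TwoThreadIndexer {

  // Placeholder: return the next article as a Document, or null when done.
  // Synchronized so two threads can share one underlying source safely.
  static synchronized Document nextDocument() {
    return null;
  }

  public static void main(String[] args) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36,
        new StandardAnalyzer(Version.LUCENE_36));
    iwc.setRAMBufferSizeMB(256); // bigger RAM buffer, fewer flushes; still well under a 2 GB heap
    final IndexWriter writer = new IndexWriter(FSDirectory.open(new File("wiki-index")), iwc);

    ExecutorService pool = Executors.newFixedThreadPool(2); // one indexing thread per core
    for (int t = 0; t < 2; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            Document doc;
            while ((doc = nextDocument()) != null) {
              writer.addDocument(doc); // concurrent addDocument calls are fine
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.DAYS);
    writer.close();
  }
}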
On Tue, Jun 19, 2012 at 5:26 PM, Reyna Melara wrote:
> Could it be possible to index Wikipedia on a 2-core machine with 3 GB of RAM?
> From: luc...@mikemccandless.com
> Date: Tue, 19 Jun 2012 16:29:39 -0400
> Subject: Re: Wikipedia Index
> To: java-user@lucene.apache.org
>
> Likely the bottleneck is pulling content from the database? Maybe
> test just that and see how long it takes?
>
> 24 hours is way too long to index all of Wikipedia.
Could it be possible to index Wikipedia on a 2-core machine with 3 GB of RAM?
I have had the same problem trying to index it.
I've tried with a dump from April 2011.
Thanks
Reyna
CIC-IPN
Mexico
2012/6/19 Michael McCandless
> Likely the bottleneck is pulling content from the database? Maybe
> test just that and see how long it takes?
Likely the bottleneck is pulling content from the database? Maybe
test just that and see how long it takes?
24 hours is way too long to index all of Wikipedia. For example, we
index Wikipedia every night for our trunk/4.0 performance tests, here:
http://people.apache.org/~mikemccand/luceneb
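One way to test whether the database read is the bottleneck, with Lucene out of the picture entirely, is to time a plain read of all the rows. A sketch with made-up connection details and a made-up articles(title, body) table; the Integer.MIN_VALUE fetch size is the MySQL Connector/J hint to stream rows instead of buffering the whole result set:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TimeDbRead {
  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost/wiki", "user", "password"); // substitute real connection details
    Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    st.setFetchSize(Integer.MIN_VALUE); // stream results row by row
    ResultSet rs = st.executeQuery("SELECT title, body FROM articles"); // made-up schema
    long rows = 0, chars = 0;
    while (rs.next()) {
      String title = rs.getString(1);
      String body = rs.getString(2);
      if (title != null) chars += title.length();
      if (body != null) chars += body.length();
      rows++;
    }
    rs.close();
    st.close();
    conn.close();
    long secs = (System.currentTimeMillis() - start) / 1000;
    System.out.println(rows + " rows, " + chars + " chars in " + secs + " s");
  }
}

If that alone takes hours, the fix is on the database side (streaming the result set, selecting only the columns you index) rather than in Lucene.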
Hi everybody,
I'm using Lucene 3.6 to index the Wikipedia documents, which come to over 3 million
articles. The data is in a MySQL database and indexing has been running for more than
24 hours so far. Do you know any tips that can speed up the indexing process?
Here is my code:
public static void main(String[] args) {