Re: Format of Wikipedia Index

2018-01-22 Thread Will Martin
From the javadoc for DocMaker:

* *doc.stored* - specifies whether fields should be stored (default *false*).
* *doc.body.stored* - specifies whether the body field should be stored (default = *doc.stored*).

So out of the box you won't get the content stored. Does this help?

regards
-will

On 1/22/2
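The defaulting rule quoted from the javadoc can be sketched in plain Java. This is only an illustration of the fallback chain (doc.body.stored falls back to doc.stored, which falls back to false); it is not DocMaker's actual code, and the class name `StoredDefaults` and its helper methods are hypothetical names made up for this example.

```java
import java.util.Properties;

public class StoredDefaults {
    // Hypothetical helper: read a boolean property, using a fallback when the key is absent.
    static boolean getBool(Properties props, String key, boolean fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : Boolean.parseBoolean(value.trim());
    }

    // doc.stored defaults to false, as stated in the quoted javadoc.
    static boolean isStored(Properties props) {
        return getBool(props, "doc.stored", false);
    }

    // doc.body.stored defaults to whatever doc.stored resolves to.
    static boolean isBodyStored(Properties props) {
        return getBool(props, "doc.body.stored", isStored(props));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println("nothing set: body stored = " + isBodyStored(props));

        props.setProperty("doc.stored", "true");
        System.out.println("doc.stored=true: body stored = " + isBodyStored(props));

        props.setProperty("doc.body.stored", "false");
        System.out.println("doc.body.stored=false: body stored = " + isBodyStored(props));
    }
}
```

Under this reading, setting only doc.stored is enough to have the body stored as well, while doc.body.stored can still override it for the body field alone.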

Format of Wikipedia Index

2018-01-22 Thread Armins Stepanjans
Hi, I have a question regarding the format of the index created by DocMaker from EnWikiContentSource. After creating the index from a dump of all Wikipedia articles (https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles-multistream.xml.bz2), I'm having trouble understanding th