Re: Format of Wikipedia Index

2018-01-22 Thread Will Martin
From the javadoc for DocMaker: * *doc.stored* - specifies whether fields should be stored (default *false*). * *doc.body.stored* - specifies whether the body field should be stored (default = *doc.stored*). So out of the box you won't get the content stored. Does this help? regards -will On 1/22/2
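The default chain quoted from the javadoc above can be sketched in plain Java. This is only an illustration of the fallback rule (doc.body.stored falls back to doc.stored, which falls back to false); the class StoredDefaults and its method are hypothetical and are not part of the Lucene benchmark module.

```java
import java.util.Properties;

// Hypothetical helper, not Lucene code: mirrors the defaults described in the javadoc.
final class StoredDefaults {
    // Returns whether the body field would be stored for the given properties.
    static boolean bodyStored(Properties props) {
        // doc.stored defaults to false when it is not set.
        boolean stored = Boolean.parseBoolean(props.getProperty("doc.stored", "false"));
        // doc.body.stored defaults to whatever doc.stored resolved to.
        return Boolean.parseBoolean(
                props.getProperty("doc.body.stored", String.valueOf(stored)));
    }
}
```

With no properties set, bodyStored returns false, which matches the observation that content is not stored unless one of the two properties is enabled.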

Format of Wikipedia Index

2018-01-22 Thread Armins Stepanjans
Hi, I have a question regarding the format of the index created by DocMaker from EnWikiContentSource. After creating the index from a dump of all Wikipedia articles (https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles-multistream.xml.bz2), I'm having trouble understanding th

MultiFieldQueryParser over Analyzer

2018-01-22 Thread Chitra
Hi Team, I have a question about parsing a query using MultiFieldQueryParser over StandardAnalyzer. searchWord: abc.def_...@global-international.com While performing a search using the code, Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40, new StringReader(""));

Re: Custom explain implementation - how to transfer the data

2018-01-22 Thread Adrien Grand
In general, the right way to proceed is to recompute the map the same way in explain() that it would be computed in score(). On Fri, Jan 19, 2018 at 12:57, Vadim Gindin wrote: > Assume, I have some scorer. During the execution of score() method, I'm caching a document id and scoring details
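A minimal sketch of the advice in the reply above, assuming the scoring details can be recomputed from the document id alone. DetailScorer, computeDetails, and combine are hypothetical names used for illustration; this is not Lucene's Scorer or Weight API, only the pattern of sharing one computation between score() and explain() instead of caching.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a custom scorer; not a Lucene class.
final class DetailScorer {
    private final float[] termFreqs; // illustrative per-document input
    private final float boost;

    DetailScorer(float[] termFreqs, float boost) {
        this.termFreqs = termFreqs;
        this.boost = boost;
    }

    // Single place where the scoring details for a document are computed.
    Map<String, Float> computeDetails(int doc) {
        Map<String, Float> details = new LinkedHashMap<>();
        details.put("tf", termFreqs[doc]);
        details.put("boost", boost);
        return details;
    }

    // score() derives its value from the shared computation.
    float score(int doc) {
        return combine(computeDetails(doc));
    }

    // explain() recomputes the same details rather than reading a cache
    // that score() filled in earlier.
    String explain(int doc) {
        Map<String, Float> details = computeDetails(doc);
        return combine(details) + " = product of " + details;
    }

    // Multiplies all detail values into the final score.
    private static float combine(Map<String, Float> details) {
        float product = 1f;
        for (float value : details.values()) {
            product *= value;
        }
        return product;
    }
}
```

Because both methods go through computeDetails, the explanation cannot drift from the score, and nothing has to be carried over from the scoring pass.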

Dubious tokenizing with WordDelimiterGraphFilter

2018-01-22 Thread Parit Bansal
Hi, I have a question about the tokenization performed by WordDelimiterGraphFilter. I am not sure if this is a bug or if I am missing some flags when setting up the filter. Please have a look. The Lucene version used is 6.6.1. Here is a gist with the code: https://gist.github.com/parit/cecf