Re: Format of Wikipedia Index

2018-01-22 Thread Will Martin
From the javadoc for DocMaker: * *doc.stored* - specifies whether fields should be stored (default *false*). * *doc.body.stored* - specifies whether the body field should be stored (default = *doc.stored*). So ootb you won't get content stored. Does this help? regards -will On 1/22/2

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Will Martin
https://doi.org/10.3115/981574.981579 On 12/20/2016 12:21 PM, Dwaipayan Roy wrote: Hello, Can anyone help me understand the scoring function in the LMJelinekMercerSimilarity class? The scoring function in LMJelinekMercerSimilarity is shown below: -
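
For readers following the thread, here is a minimal sketch of Jelinek-Mercer smoothed scoring for one term in one document, in the log form commonly used for ranking. It is not copied from the truncated message above. The class name, method name, and parameter names are illustrative assumptions, and any query boost is left out.

```java
// Illustrative sketch only: a standalone version of Jelinek-Mercer smoothed
// term scoring. Names are hypothetical; this is not the Lucene class itself.
final class JelinekMercerSketch {

    /**
     * @param freq           occurrences of the term in the document
     * @param docLen         document length in tokens
     * @param collectionProb probability of the term in the whole collection, P(t|C)
     * @param lambda         smoothing weight given to the collection model, in (0, 1)
     */
    static double score(double freq, double docLen, double collectionProb, double lambda) {
        // Ratio of the document-model part to the collection-model part;
        // log(1 + ratio) is zero when the term does not occur in the document.
        double ratio = ((1 - lambda) * freq / docLen) / (lambda * collectionProb);
        return Math.log(1 + ratio);
    }

    public static void main(String[] args) {
        // A rarer term (smaller collectionProb) scores higher for the same freq.
        System.out.println(score(3, 100, 0.001, 0.7));
        System.out.println(score(3, 100, 0.01, 0.7));
    }
}
```

A larger lambda shifts weight toward the collection model, which flattens the differences between documents.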

Re: Multi-field IDF

2016-11-18 Thread Will Martin
In this work, we aim to improve the field weighting for structured document retrieval. We first introduce the notion of field relevance as the generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for f

Re: Multi-field IDF

2016-11-17 Thread Will Martin
are you familiar with pivoted normalized document length practice or theory? or croft's recent work on relevance algorithms accounting for structured field presence? On 11/17/2016 5:20 PM, Nicolás Lichtmaier wrote: That depends on what you want. In this case I want to use a discrimination po

Re: Searching in a bitMask

2016-08-27 Thread will martin
hi aren’t we waltzing terribly close to the use of a bit vector in your field caches? there’s no reason to not filter longword operations on a cache if alignment is consistent across multiple caches just be sure to abstract your operations away from individual bits….imo -will > On Aug 27, 2
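
One way to read the "longword operations on a cache" suggestion is sketched below. It assumes each document's flags are held as one long in an in-memory array. The class, method, and array layout are hypothetical and are not a Lucene API.

```java
// Hypothetical sketch: filter documents by testing a whole 64-bit word at once
// instead of probing individual bits. Not a Lucene API.
final class BitMaskFilterSketch {

    /** Returns the ids of all documents whose cached flag word contains every bit set in mask. */
    static int[] matching(long[] flagsByDoc, long mask) {
        // First pass: count the hits so the result array can be sized exactly.
        int count = 0;
        for (long flags : flagsByDoc) {
            if ((flags & mask) == mask) {
                count++;
            }
        }
        // Second pass: collect the matching document ids in order.
        int[] hits = new int[count];
        int next = 0;
        for (int doc = 0; doc < flagsByDoc.length; doc++) {
            if ((flagsByDoc[doc] & mask) == mask) {
                hits[next++] = doc;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] cache = {0b1010L, 0b0110L, 0b1110L};
        System.out.println(java.util.Arrays.toString(matching(cache, 0b1010L)));
    }
}
```

Callers pass a prepared mask and never touch bit positions directly, which is the abstraction the message recommends.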

Re: how to backup index files with Replicator

2016-01-23 Thread will martin
Hi Dancer: Found this thread with good info that may be irrelevant to your scenario but, this in particular struck me writer.waitForMerges(); writer.commit(); replicator.replicate(new IndexRevision(writer)); writer.close(); - even though writer.close() can

Re: SolrIndexSearcher throws Misleading Error Message When timeAllowed is Specified.

2016-01-08 Thread will martin
Please read the javadoc for System.nanoTime(). I won’t bore you with the details about how computer clocks work. > On Jan 8, 2016, at 4:14 AM, Vishnu Mishra wrote: > > I am using Solr 5.3.1 and we are facing OutOfMemory exception while doing > some complex wildcard and proximity query (even fo
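
For context, the System.nanoTime() javadoc says its values have an arbitrary origin, may be negative, and are meaningful only as differences between two readings. It recommends comparing by subtraction (t1 - t0) rather than comparing the raw values. The sketch below shows a time-budget check written that way. The class and method names are assumptions for illustration and are not Solr code.

```java
// Illustrative sketch of a time-budget check built on System.nanoTime().
// Names are hypothetical; this is not how Solr itself is implemented.
final class NanoTimeBudgetSketch {

    /**
     * True when more than budgetNanos have elapsed between the two readings.
     * The difference is taken first, so the result stays correct even if the
     * counter wraps past Long.MAX_VALUE between the readings.
     */
    static boolean budgetExceeded(long startNanos, long nowNanos, long budgetNanos) {
        return nowNanos - startNanos > budgetNanos;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long budget = java.util.concurrent.TimeUnit.MILLISECONDS.toNanos(50);
        // ... the work being timed would run here ...
        System.out.println(budgetExceeded(start, System.nanoTime(), budget));
    }
}
```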

Re: Any lucene query sorts docs by Hamming distance?

2015-12-24 Thread will martin
m distance 0 to 3. > > 2015-12-22 21:42 GMT+08:00 will martin : > >> Yonghui: >> >> Do you mean sort, rank or score? >> >> Thanks, >> Will >> >> >> >>> On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote: >>> >&

Re: range query highlighting

2015-12-23 Thread will martin
Todd: "This trick just converts the multi term queries like PrefixQuery or RangeQuery to boolean query by expanding the terms using index reader." http://stackoverflow.com/questions/7662829/lucene-net-range-queries-highlighting beware cost. (my comment) g’luck will > On Dec 23, 2015, at 4:49

Re: Any lucene query sorts docs by Hamming distance?

2015-12-22 Thread will martin
Yonghui: Do you mean sort, rank or score? Thanks, Will > On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote: > > Hi, > > Is there any query can sort docs by hamming distance if field values are > same length, > > Seems fuzzy query only works on edit distance. ---
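
As a reference point for the question, Hamming distance counts the positions at which two equal-length values differ. The sketch below computes it for strings and for 64-bit words. It only illustrates the metric and says nothing about how a Lucene query would sort by it. The class and method names are hypothetical.

```java
// Illustrative sketch of Hamming distance; names are hypothetical.
final class HammingSketch {

    /** Number of character positions at which two equal-length strings differ. */
    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance needs equal-length inputs");
        }
        int differing = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differing++;
            }
        }
        return differing;
    }

    /** Number of bit positions at which two 64-bit words differ. */
    static int distance(long a, long b) {
        // XOR leaves a 1 exactly where the inputs disagree; bitCount tallies them.
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        System.out.println(distance("karolin", "kathrin"));
        System.out.println(distance(0b1011L, 0b1101L));
    }
}
```

Edit distance also counts insertions and deletions, so the two measures need not agree even on equal-length values.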

Re: Jensen–Shannon divergence

2015-12-14 Thread will martin
cool list. Thanks Uwe. Opportunities to gain competitive advantage in selected domains. > On Dec 14, 2015, at 6:02 PM, Uwe Schindler wrote: > > Hi, > > Next to BM25 and TF-IDF, Lucene also provides many more similarity > implementations: > > https://lucene.apache.org/core/5_4_0/core/org/apac

Re: Jensen–Shannon divergence

2015-12-13 Thread will martin
g'luck > On Dec 13, 2015, at 10:55 AM, Shay Hummel wrote: > > Hi > > I am sorry but I didn't understand your answer. Can you please elaborate? > > Shay > > On Sun, Dec 13, 2015 at 3:41 PM will martin wrote: > >> expand your due d

Re: Jensen–Shannon divergence

2015-12-13 Thread will martin
expand your due diligence beyond wikipedia: i.e. http://ciir.cs.umass.edu/pubfiles/ir-464.pdf > On Dec 13, 2015, at 8:30 AM, Shay Hummel wrote: > > LMDiricletbut its feasibilit

Re: debugging growing index size

2015-11-13 Thread will martin
Hi Rob: Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter Levart and Uwe Schindler on a lucene-dev thread 9/9/2015? MappedByteBuffer …. what OS are you on Rob? What JVM? http://bugs.java.com/view_bug.do?bug_id=4724038 http://mail-archives.apache.org/mod_mbox/lucene-dev/

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-30 Thread will martin
call IndexReader.checkIntegrity. Mike McCandless http://blog.mikemccandless.com On Tue, Sep 29, 2015 at 9:00 PM, will martin wrote: > Ok So I'm a little confused: > > The 4.10 JavaDoc for LiveIndexWriterConfig supports volatile access on > a flag to setCheckIntegrityAtMerge ..

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
rom the runtime system. The file system is EMC Isilon via NFS. Jim ____ From: will martin Sent: 29 September 2015 14:29 To: java-user@lucene.apache.org Subject: RE: Lucene 5 : any merge performance metrics compared to 4.x? This sounds robust. Is the index

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
o we implemented a check step once the index is in its final state to ensure that it is OK. So, since we want to do the check post-merge, is there a way to disable the check during merge so we don't have to do two checks? Thanks! Jim ____ From: will mar

RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
So, if its new, it adds to pre-existing time? So it is a cost that needs to be understood I think. And, I'm really curious, what happens to the result of the post merge checkIntegrity IFF (if and only if) there was corruption pre-merge: I mean if you let it merge anyway could you get a false

RE: Solr java.lang.OutOfMemoryError: Java heap space

2015-09-28 Thread will martin
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/ -Original Message- From: Ajinkya Kale [mailto:kaleajin...@gmail.com] Sent: Monday, September 28, 2015 2:46 PM To: solr-u...@lucene.apache.org; java-user@lucene.apache.org Subject: Solr jav

RE: hello,I have a problem about lucene,please help me to explain ,thank you

2015-09-22 Thread will martin
Hi: Would you mind doing websearch and cataloging the relevant pages into a primer? Thx, Will -Original Message- From: 王建军 [mailto:jianjun200...@163.com] Sent: Tuesday, September 22, 2015 4:02 AM To: java-user@lucene.apache.org Subject: hello,I have a problem about lucene,please help me t

RE: A really hairy token graph case

2014-10-24 Thread Will Martin
lemma2 PI 0 lemmaN PI 0 comp0-1 PI 0 comp1-1 PI 0 comp0-N compM-N That is, group all the first-components, and all the second-components. But now the bits and pieces of the compounds are interspersed. Maybe that's OK. On Fri, Oct 2

RE: A really hairy token graph case

2014-10-24 Thread Will Martin
Hi Benson: This is the case with n-gramming (though you have a more complicated start chooser than most I imagine). Does that help get your ideas unblocked? Will -Original Message- From: Benson Margulies [mailto:bimargul...@gmail.com] Sent: Friday, October 24, 2014 4:43 PM To: java-us