From the javadoc for DocMaker:
* *doc.stored* - specifies whether fields should be stored (default
*false*).
* *doc.body.stored* - specifies whether the body field should be
stored (default = *doc.stored*).
So out of the box you won't get the content stored. Does this help?
regards
-will
https://doi.org/10.3115/981574.981579
On 12/20/2016 12:21 PM, Dwaipayan Roy wrote:
Hello,
Can anyone help me understand the scoring function in the
LMJelinekMercerSimilarity class?
The scoring function in LMJelinekMercerSimilarity is shown below:
-
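For reference, Lucene's LMJelinekMercerSimilarity scores a matching term as boost * log(1 + ((1 - λ) * tf / docLen) / (λ * P(w|C))), where P(w|C) is the term's collection probability. That is the log of the Jelinek-Mercer smoothed probability (1 - λ) * tf / docLen + λ * P(w|C), divided by its background part λ * P(w|C). Below is a minimal stdlib sketch of that arithmetic. The class and method names are hypothetical. The query boost is omitted, and P(w|C) is passed in rather than computed from index statistics.

```java
// Hypothetical helper illustrating Jelinek-Mercer smoothing; not Lucene's API.
final class JelinekMercer {
    /**
     * Per-term score: log(1 + ((1 - lambda) * tf / docLen) / (lambda * pColl)).
     * Equivalent to log(smoothedProb / (lambda * pColl)).
     */
    static double score(double tf, double docLen, double lambda, double pColl) {
        return Math.log(1 + ((1 - lambda) * tf / docLen) / (lambda * pColl));
    }

    /** Smoothed document model: (1 - lambda) * tf / docLen + lambda * pColl. */
    static double smoothedProb(double tf, double docLen, double lambda, double pColl) {
        return (1 - lambda) * tf / docLen + lambda * pColl;
    }
}
```

With tf = 0 the ratio is 1 and the term contributes log(1) = 0. Every matching term therefore adds a non-negative amount, and λ controls how far the document's own term frequency is discounted toward the collection model.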
ver "or" is, perhaps, not
so usual in titles. Then, "or" will have a high IDF value and be treated
as an important term. That's bad.
One solution I see is to modify the Similarity to have a global, or
multi-field IDF value. This value would include in its calculation
longer
n be bad for very short fields (like titles). One
example of this problem: If I don't delete stop words, then "or", "and",
etc. should get low IDF values; however, "or" is, perhaps, not
so usual in titles. Then, "or" will have a high IDF value
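The multi-field IDF idea can be sketched in a few lines. The sketch assumes Lucene's classic TF-IDF shape, idf = 1 + ln(numDocs / (docFreq + 1)). It also assumes a hypothetical pooled docFreq that counts a document once if the term occurs in any of the chosen fields. The class, the field names, and the in-memory document representation are illustrative only, not a Lucene API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: per-field IDF versus a pooled ("global") IDF that counts
// a document if the term occurs in ANY of the given fields.
final class PooledIdf {
    // Same shape as Lucene's classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Each document maps a field name to the set of terms indexed in that field.
    static long docFreq(List<Map<String, Set<String>>> docs, String term, List<String> fields) {
        long df = 0;
        for (Map<String, Set<String>> doc : docs) {
            for (String field : fields) {
                Set<String> terms = doc.get(field);
                if (terms != null && terms.contains(term)) {
                    df++;
                    break; // count each document at most once
                }
            }
        }
        return df;
    }
}
```

Suppose "or" appears in the title of one document out of three but in every body. The title-only IDF is then 1 + ln(3/2) ≈ 1.41, while the pooled IDF is 1 + ln(3/4) ≈ 0.71, so pooling pushes the stop word back down.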
hi
aren’t we waltzing terribly close to the use of a bit vector in your field
caches?
there’s no reason not to use longword operations to filter a cache if alignment is
consistent across multiple caches
just be sure to abstract your operations away from individual bits… imo
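The aligned-cache point can be illustrated in plain Java. If every cache maps document i to bit i, filtering reduces to ANDing 64-bit words, and java.util.BitSet provides the abstraction over individual bits. This is a minimal sketch. It assumes the caches are already exposed as long[] words, and the class name is hypothetical, not a Lucene type.

```java
import java.util.BitSet;

// Hypothetical sketch: two per-document flag caches with the same
// doc-id-to-bit alignment (bit i = document i), stored as 64-bit words.
final class AlignedCacheFilter {
    // Raw longword AND: one operation filters 64 documents at once.
    static long[] andWords(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = a[i] & b[i];
        }
        return out;
    }

    // Same operation behind java.util.BitSet, so callers never touch
    // individual bits or word boundaries.
    static BitSet and(long[] a, long[] b) {
        BitSet result = BitSet.valueOf(a);
        result.and(BitSet.valueOf(b));
        return result;
    }
}
```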
-will
> On Aug
() can trigger a commit. hmmm
thread:
http://grokbase.com/t/lucene/java-user/143dsnrxh8/replicator-how-to-use-it
-will
> On Jan 23, 2016, at 4:39 AM, Dancer <462921...@qq.com> wrote:
>
> Hi,
Please read the javadoc for System.nanoTime(). I won’t bore you with the
details about how computer clocks work.
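The key point in that javadoc is that System.nanoTime() values are only meaningful as differences between two readings taken in the same JVM. The absolute value has an arbitrary origin and is not wall-clock time. Here is a minimal sketch of timing a piece of work that way; the class name is hypothetical.

```java
// Minimal sketch: measure elapsed time with System.nanoTime().
final class ElapsedTimer {
    static long elapsedNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        // Subtract rather than compare raw readings: the javadoc warns that
        // values may overflow, so only (t1 - t0) is safe to use.
        return System.nanoTime() - start;
    }
}
```

For a timestamp to log or to compare across processes, System.currentTimeMillis() is the appropriate call instead.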
> On Jan 8, 2016, at 4:14 AM, Vishnu Mishra wrote:
>
> I am using Solr 5.3.1 and we are facing an OutOfMemory exception while running
> some complex wildcard and proximity queries (even fo
m distance 0 to 3.
>
> 2015-12-22 21:42 GMT+08:00 will martin :
>
>> Yonghui:
>>
>> Do you mean sort, rank or score?
>>
>> Thanks,
>> Will
>>
>>
>>
>>> On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote:
>>>
Todd:
"This trick just converts the multi term queries like PrefixQuery or RangeQuery
to boolean query by expanding the terms using index reader."
http://stackoverflow.com/questions/7662829/lucene-net-range-queries-highlighting
beware the cost (my comment).
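The cost comes from the expansion step. Every indexed term that matches becomes its own clause, so a short prefix on a large field can produce thousands of clauses. Lucene's BooleanQuery rejects more than its maximum clause count (1024 by default) with a TooManyClauses exception. The sketch below shows prefix expansion, assuming a sorted in-memory term set stands in for the terms an index reader would enumerate. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of rewriting a prefix query by expanding it against a
// sorted term dictionary.
final class PrefixExpansion {
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // tailSet walks terms >= prefix in sorted order; stop at the first
        // term that no longer starts with the prefix.
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break;
            }
            clauses.add(t); // each matching term becomes one SHOULD clause
        }
        return clauses;
    }
}
```

The size of the returned list is the number of clauses the rewritten query would carry.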
g’luck
will
> On Dec 2
Yonghui:
Do you mean sort, rank or score?
Thanks,
Will
> On Dec 22, 2015, at 4:02 AM, Yonghui Zhao wrote:
>
> Hi,
>
> Is there any query that can sort docs by Hamming distance if the field values are
> the same length?
>
> Seems fuzzy query onl
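Whether sort, rank, or score is meant, the distance itself is straightforward for equal-length values: count the positions where the characters differ. Here is a minimal sketch that ranks values in memory rather than through a Lucene query. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank same-length values by Hamming distance to a query value.
final class HammingRank {
    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance needs equal-length values");
        }
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                d++;
            }
        }
        return d;
    }

    // Closest values first; ties keep their input order (List.sort is stable).
    static List<String> sortByDistance(List<String> values, String query) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingInt(v -> distance(v, query)));
        return sorted;
    }
}
```

If the values are fixed-width bit signatures packed into a long, the same distance is Long.bitCount(a ^ b).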
>> On Sun, Dec 13, 2015 at 8:30 AM, Shay Hummel
>> wrote:
>>
>>> Hi
>>>
>>> I need help implementing a similarity between a query model and a document
>>> model.
>>> I would like to use the JS-Divergence
>>> <https://en.wikipedia.org/
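For reference, the Jensen-Shannon divergence between a query model P and a document model Q is JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2. The sketch below works over two probability vectors indexed by the same vocabulary and uses natural logs. How the two models are estimated and smoothed is left out, and the class name is hypothetical.

```java
// Minimal sketch of Jensen-Shannon divergence between two discrete
// distributions over the same vocabulary (natural logs).
final class JensenShannon {
    static double divergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions must share a vocabulary");
        }
        double js = 0.0;
        for (int i = 0; i < p.length; i++) {
            double m = 0.5 * (p[i] + q[i]);
            // Zero-probability entries contribute nothing (0 * log 0 := 0).
            if (p[i] > 0) js += 0.5 * p[i] * Math.log(p[i] / m);
            if (q[i] > 0) js += 0.5 * q[i] * Math.log(q[i] / m);
        }
        return js;
    }
}
```

Unlike KL divergence, this stays finite when one model gives zero probability to a term the other contains, because M is positive wherever either model is. Lower values mean a closer match, so a ranking would sort ascending or use a negated value as the score.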
g'luck
> On Dec 13, 2015, at 10:55 AM, Shay Hummel wrote:
>
> Hi
>
> I am sorry but I didn't understand your answer. Can you please elaborate?
>
> Shay
>
> On Sun, Dec 13, 2015 at 3:41 PM will martin wrote:
>
expand your due diligence beyond Wikipedia, e.g.:
http://ciir.cs.umass.edu/pubfiles/ir-464.pdf
> On Dec 13, 2015, at 8:30 AM, Shay Hummel wrote:
>
> LMDirichlet but its feasibilit
/201509.mbox/%3c55f0461a.2070...@gmail.com%3E
hth
-will
> On Nov 13, 2015, at 11:23 AM, Rob Audenaerde wrote:
>
> I'm currently running using NIOFS. It seems to prevent the issue from
> appearing.
>
> This is a second run (with applied deletes etc)
>
> rauden
Hi Rob:
Do you understand how deletes work and how an index is compacted?
There are some configuration/runtime activities you don't mention. And
you make your testing process sound like a mirror of production? (Including
configuration?)
-will
On 11/5/15 7:33 AM, Rob Audenaerde wrot
Kumaran -
Aren't you creating an unworkable scenario for sorting?
-will
On 10/27/15 5:49 AM, Kumaran Ramasubramanian wrote:
Hi All,
I have indexed module-wise data in the same index. In this case, we index two
types of field with the same name in two different documents, like this:
*docu
Hi Bhaskar:
For everyone's benefit, I hope you will collate the emails into a
wiki page and carry it forward. Meritocracies might have RTFM'd the
whole thing.
With all respect:
Will
On 10/5/15 1:06 PM, Bhaskar wrote:
Hi,
Actually I am looking for auto complete only.
call IndexReader.checkIntegrity.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Sep 29, 2015 at 9:00 PM, will martin wrote:
> OK, so I'm a little confused:
>
> The 4.10 JavaDoc for LiveIndexWriterConfig supports volatile access on
> a flag to setCheckIntegrityAtMerge ..
rom the runtime system.
The file system is EMC Isilon via NFS.
Jim
From: will martin
Sent: 29 September 2015 14:29
To: java-user@lucene.apache.org
Subject: RE: Lucene 5 : any merge performance metrics compared to 4.x?
This sounds robust. Is the index batch creation workflow a separate process?
Distributed shared filesystems?
--will
-Original Message-
From: McKinley, James T [mailto:james.mckin...@cengage.com]
Sent: Tuesday, September 29, 2015 2:22 PM
To: java-user@lucene.apache.org
Subject: Re
So, if it's new, it adds to the pre-existing time? So it is a cost that needs to be
understood, I think.
And I'm really curious: what happens to the result of the post-merge
checkIntegrity IFF (if and only if) there was corruption pre-merge? I mean, if
you let it merge anyway, could you get a false
http://opensourceconnections.com/blog/2014/07/13/reindexing-collections-with-solrs-cursor-support/
-Original Message-
From: Ajinkya Kale [mailto:kaleajin...@gmail.com]
Sent: Monday, September 28, 2015 2:46 PM
To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
Subject: Solr jav
Hi:
Would you mind doing a web search and cataloging the relevant pages into a
primer?
Thx,
Will
-Original Message-
From: 王建军 [mailto:jianjun200...@163.com]
Sent: Tuesday, September 22, 2015 4:02 AM
To: java-user@lucene.apache.org
Subject: hello,I have a problem about lucene,please help me
lemma2 PI 0
lemmaN PI 0
comp0-1 PI 0
comp1-1 PI 0
comp0-N
compM-N
That is, group all the first-components, and all the second-components.
But now the bits and pieces of the compounds are interspersed. Maybe that's OK.
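For readers unfamiliar with the notation, "PI 0" is a position increment of 0. Such a token is stacked on the same position as the token before it, which is how synonyms or lemmas are usually injected. Below is a minimal plain-Java sketch of how a sequence of increments maps to absolute positions. It does not use Lucene's TokenStream API, and the class name is hypothetical.

```java
// Minimal sketch: absolute token positions from position increments.
// An increment of 0 stacks a token on the previous token's position.
final class Positions {
    static int[] fromIncrements(int[] increments) {
        int[] positions = new int[increments.length];
        int pos = -1; // a first token with increment 1 lands at position 0
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            positions[i] = pos;
        }
        return positions;
    }
}
```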
On Fri, Oct 2
Hi Benson:
This is the case with n-gramming (though you have a more complicated start
chooser than most, I imagine). Does that help get your ideas unblocked?
Will
-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: Friday, October 24, 2014 4:43 PM
To: java
Hi Michel,
You can do all of this with Lucene, though not with the standard index/query
operators. At Attivio we have a custom Lucene index structure + custom
query operators that support relational joins across records in an index. You
can write the queries in our standard query language or run
On Fri, Jan 8, 2010 at 16:27, Jamie wrote:
> Hi Ian / Will
>
> Thanks. Surely, the Porter Stemmer should not stem proper nouns, i.e. it
> could check the capitalization of the first letter of a word and whether or
> not the word is the start of sentence. If so, it could
result = new PorterStemFilter(result);
PorterStemFilter is changing Lowe to low. Change your tokenizer so
that Lowe's is tokenized as a single token, and that should avoid it.
Will
-
To unsubscribe, e-mail: java-user-unsubscr.
> which means you have to analyze it.
>
>
> I think Will is suggesting that he doesn't want to have to analyze it
> *again* -
> if he really has different fields for every tag type, it would get
> prohibitively
> expensive in terms of Indexing CPU usage to retokenize over an
ng like this:
Document doc = new Document();
doc.add(new Field("h1", "hello\0world"));
doc.add(new Field("alltext", "hello\0world\0goodnight\0moon"));
I think that makes sense. Comments?
Will
>
> HTH
> Erick
>
>
> On Tue, Oct 27, 2009 at
hat's the best way to approach this? My initial thought is to make
some kind of MultiAnalyzer that consumes the text and produces several
token streams, which are added to the document one at a time. Is that
a reasonable strategy?
Thanks!
Will
do fuzzy search ie post1:NW10 post2:7?Y and so on.
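Splitting the postcode into the two fields used in the message (post1, post2) is the part that needs code. This minimal sketch assumes UK-style postcodes, where the inward code is always the final three characters (a digit followed by two letters). The class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch: split a UK postcode into outward and inward codes so
// each can be indexed in its own field (post1, post2).
final class PostcodeSplit {
    static String[] split(String postcode) {
        String compact = postcode.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
        if (compact.length() < 5) {
            throw new IllegalArgumentException("too short for a UK postcode: " + postcode);
        }
        int cut = compact.length() - 3; // inward code = last three characters
        return new String[] {compact.substring(0, cut), compact.substring(cut)};
    }
}
```

Once the postcode is indexed that way, a query such as post1:NW10 post2:7?Y matches any inward code that starts with 7 and ends with Y. Strictly speaking, ? is Lucene's single-character wildcard rather than a fuzzy (edit-distance) operator; fuzzy matching is written with ~.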
- will
-Original Message-
From: Chris Mannion [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 12:28 PM
To: java-user@lucene.apache.org
Subject: Postcode/zipcode search
Hi all
I've got a bit of a niggling problem with how one of
' which are
probably the ones you would want anyways.
- will
-Original Message-
From: Duan, Nick [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 04, 2008 2:29 PM
To: java-user@lucene.apache.org
Subject: RE: Why indexing database is necessary? (RE: indexing database)
Hmm, I guess t
t to say that a
search engine is always better, just that it often is when the
inputs and outputs are carefully defined.
- will
-Original Message-
From: Darren Hartford [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 04, 2008 1:52 PM
To: java-user@lucene.apache.org
Subject: RE: Why in
a CustomScoreQuery combined with a FieldCacheSource that holds the
lat/lon might work.
- will
On Aug 29, 2007, at 11:15 AM, Mike wrote:
I've searched the mailing list archives, the web, read the FAQ, etc
and I
don't see anything relevant so here it goes…
I'm trying
System.out.println(q.getDocValues().getMinValue());
- will
On Aug 24, 2007, at 5:17 PM, Grant Ingersoll wrote:
Can you provide more details on what you are trying to do? Are you
trying to collect information from the FunctionQuery after it is done?
-Grant
On Aug 24, 2007, at 5:03 PM
at a basic level yes, just getting the avg/min/max from a function
query would be awesome. once that is in place getting more complex
stats would be gravy. i need to do something in this area i just
want to know if there is something more fundamental that i'm working against.
- will
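Once the per-document values of a function query are in hand, avg/min/max takes a single pass. This minimal sketch assumes those values have already been collected into a double[]. Getting them out of the FunctionQuery is the part that depends on the Lucene version and is not shown. The sketch uses java.util.DoubleSummaryStatistics rather than anything from Lucene, and the class name is hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical sketch: min/max/avg over already-collected per-document values.
final class FunctionStats {
    static DoubleSummaryStatistics summarize(double[] perDocValues) {
        return DoubleStream.of(perDocValues).summaryStatistics();
    }
}
```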
resting to anyone other than me?
- will
ard only mode and just reverse out the strings on the
display side.
This method makes a number of assumptions about index size constraints and
character sets, i.e. ymmv.
- will
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Friday, July 20, 2007 10:05
Solr, which is built on top of Lucene and adds highlighting among other
features, gets close to what you want. Check out:
http://wiki.apache.org/solr/HighlightingParameters
- will
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Friday, June 01, 2007 8:57 AM
To
It seems to me like a french stemmer is what you need instead of a fuzzy
query. What analyzer are you using for your documents and queries?
-- Stefan
[EMAIL PROTECTED] wrote:
Hi!
I have a problem dealing with a fuzzy query in Lucene 2.1.0.
In order to explain my problem, I illustrate it
This makes perfect sense to me. Of course the hard part will be how to
extract the acronyms.
-- Stefan
Hannes Carl Meyer wrote:
Hi All,
I would like to enable users to do an acronym search on my index.
My idea is the following:
1.) Extract acronyms (ABS, ESP, VCG etc.) from the given document
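One naive way to attack the extraction step is a regular expression over the raw text. The sketch below assumes that any standalone run of 2 to 6 uppercase ASCII letters is an acronym candidate. That rule is an illustration, not a recommendation, and the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive heuristic (an assumption, not a robust extractor): a standalone run of
// 2 to 6 uppercase ASCII letters is treated as an acronym candidate.
final class AcronymExtractor {
    private static final Pattern CANDIDATE = Pattern.compile("\\b[A-Z]{2,6}\\b");

    static Set<String> extract(String text) {
        Set<String> found = new LinkedHashSet<>(); // first-seen order, no duplicates
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Headings typed in capitals and words like OK will slip through. Some filtering, such as a stop list or keeping only candidates that recur across documents, would probably be needed in practice.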
Hey guys, here is the exact thing you want, check out this searchable archive
hosted by Nabble: http://www.nabble.com/Lucene-f44.html - it archives all
Lucene mailing lists into a forum, you can cross search all or drill down and
search a single list. You can also narrow search by author, sort
The challenge with this is always not breaking the HTML page itself.
-Original Message-
From: Fred Toth [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 24, 2005 3:47 PM
To: java-user@lucene.apache.org
Subject: Using Highlighter to highlight entire HTML documents?
Hi,
We have a need to pres
I would recommend not optimizing your index that often. Another solution is to
use a MultiSearcher and keep one fully optimized primary index, and an
unoptimized secondary index that you add to. Then search against both. During
off peak hours you could merge the secondary index onto your pr