Hi
LogMP *always* picks adjacent segments together. Therefore, if you have
segments S1, S2, S3, S4 where the date-wise sort order is S4>S3>S2>S1, then
LogMP will pick S1-S4, S2-S4, S2-S3 and so on, but always adjacent
segments in a row (i.e. it doesn't skip segments).
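To make the "adjacent, never skipping" property concrete, here is a minimal Java sketch. It is a simplified model, not LogMergePolicy's actual selection logic, which also buckets segments into size levels and applies a merge factor. The class and method names are hypothetical. It lists every contiguous run of a given width over segments kept in index order.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "adjacent-only" merge selection (hypothetical names).
// NOT LogMergePolicy's real algorithm: it only illustrates the invariant that a
// merge candidate is a contiguous run of segments in index order, none skipped.
class AdjacentMergeSketch {

    // Returns every contiguous run of exactly `width` segments.
    static List<List<String>> contiguousRuns(List<String> segments, int width) {
        List<List<String>> runs = new ArrayList<>();
        for (int start = 0; start + width <= segments.size(); start++) {
            // subList is a view; copy it so each run is independent of the input
            runs.add(new ArrayList<>(segments.subList(start, start + width)));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Segments in index order: S1 oldest, S4 newest.
        List<String> segments = List.of("S1", "S2", "S3", "S4");
        System.out.println(contiguousRuns(segments, 2)); // [[S1, S2], [S2, S3], [S3, S4]]
        System.out.println(contiguousRuns(segments, 3)); // [[S1, S2, S3], [S2, S3, S4]]
    }
}
```

A run such as [S1, S3] can never appear, which is the property that keeps per-segment time ranges from interleaving.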
I guess what both
Hi list
I’m trying to figure out how customizable scoring and weighting are in the
Lucene API. I read about the APIs but still can’t figure out whether the following
is possible.
I would like to do normal document text indexing, but I would like to control
the weight assigned to tokens myself; also I
@Mike,
I had suggested the same approach in one of my previous mails, whereby
each segment records min/max timestamps in its seg-info diagnostics and uses
them for merging adjacent segments.
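Here is a rough sketch of why recording min/max timestamps per segment helps. If merges only combine neighbors, the merged segments' time ranges stay disjoint. Merging non-neighbors, as an arbitrary merge policy might, produces a range that swallows the segment in between. The `Seg` record and the helper methods are hypothetical stand-ins for the diagnostics data. Real Lucene code would presumably have to parse these values back out of the diagnostics strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TimestampRangeSketch {

    // Hypothetical per-segment metadata: the min/max timestamp the thread proposes
    // recording in each segment's diagnostics. Plain fields here, not Lucene types.
    record Seg(String name, long minTs, long maxTs) {}

    // A merged segment spans [smallest min, largest max] of the segments it absorbs.
    static Seg mergedRange(String name, List<Seg> picked) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Seg s : picked) {
            min = Math.min(min, s.minTs());
            max = Math.max(max, s.maxTs());
        }
        return new Seg(name, min, max);
    }

    // True if no two segments' time ranges overlap, i.e. the segments can still
    // be visited in one global time order.
    static boolean disjoint(List<Seg> segments) {
        List<Seg> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong(Seg::minTs));
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).minTs() <= sorted.get(i - 1).maxTs()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Seg s1 = new Seg("S1", 0, 99);
        Seg s2 = new Seg("S2", 100, 199);
        Seg s3 = new Seg("S3", 200, 299);

        // Adjacent merge S1+S2 -> [0, 199], still disjoint from S3.
        System.out.println(disjoint(List.of(mergedRange("S1+S2", List.of(s1, s2)), s3))); // true
        // Non-adjacent merge S1+S3 -> [0, 299], which overlaps S2.
        System.out.println(disjoint(List.of(mergedRange("S1+S3", List.of(s1, s3)), s2))); // false
    }
}
```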
"Then, I define a TimeMergePolicy extends LogMergePolicy and define the
segment-size=Long.MAX_VALUE - SEG_LEAST_
Hi All,
I have a question about how Lucene retrieves documents.
I know Lucene uses many files on disk to store documents, each comprising
fields, and uses many IR algorithms and an inverted index to match
documents.
My question is:
1. How does Lucene store these documents in the file system?
Hi Uwe,
thanks a lot, I will try that.
Uwe Schindler wrote
> Hi andy,
>
> unfortunately, that is not easy to show with one simple code. You have to
> change the Similarity used.
>
> Before starting to do this, you should be sure that this affects your
> users. The example you gave is sh
Right, I think you'll need to use one of the LogXMergePolicy classes (or
subclass LogMergePolicy and make your own): they always pick adjacent
segments to merge.
SortingMP lets you pass in the MP to wrap, so just pass in a LogXMP,
and then sort by timestamp?
Mike McCandless
http://blog.mikemccandles
Why not use LogByteSizeMP in conjunction w/ SortingMP? LogMP picks adjacent
segments and SortingMP ensures the merged segment is also sorted.
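This is a conceptual illustration of the guarantee described here, that the merged segment is also sorted. Given two input segments that are each sorted newest-first, the output interleaves their docs by timestamp instead of simply appending one segment after the other. Plain `long[]` arrays of timestamps stand in for segments. The names are hypothetical, and this is not how SortingMergePolicy is implemented internally.

```java
import java.util.Arrays;

// Conceptual sketch only: two segments, each sorted by timestamp descending
// (latest first), merged into one segment that is still sorted descending.
class SortedMergeSketch {

    static long[] mergeDescending(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            // take the larger (newer) timestamp first
            out[k++] = (a[i] >= b[j]) ? a[i++] : b[j++];
        }
        // copy whatever remains of either input
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        long[] seg1 = {90, 70, 50};
        long[] seg2 = {80, 60, 40};
        System.out.println(Arrays.toString(mergeDescending(seg1, seg2))); // [90, 80, 70, 60, 50, 40]
    }
}
```

When the inputs are adjacent segments with non-overlapping time ranges, the same merge simply yields all of the newer segment's docs followed by the older one's.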
Shai
On Wed, Feb 12, 2014 at 3:16 PM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:
> Yes exactly as you have described.
>
> Ex: Cons
Yes exactly as you have described.
Ex: Consider segments [S1, S2, S3, S4], in reverse-chronological order, going
for a merge.
While SortingMergePolicy correctly solves the merge part, it does not
play any role in picking which segments to merge, right?
SMP internally delegates to TieredMer
OK, I see (early termination).
That's a challenge, because you really want the docs sorted backwards
from how they were added, right? And, e.g., merged and then searched
in "reverse segment order"?
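Here is a minimal sketch of the early-termination idea. It rests on two assumptions: every segment is sorted by timestamp descending, and segments are visited newest-first with non-overlapping time ranges, which adjacent-only merging preserves. Under those assumptions the first N matching docs seen are the global top N, so the scan can stop there. The names are hypothetical and plain arrays stand in for segments. A real implementation would live in a custom Lucene Collector, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

class EarlyTerminationSketch {

    // Hits found plus how many docs had to be examined to find them.
    record Result(List<Long> hits, int docsVisited) {}

    // Assumes each long[] is one segment sorted newest-first, and the list of
    // segments is itself ordered newest-first with non-overlapping ranges.
    static Result topNewest(List<long[]> segmentsNewestFirst, LongPredicate matches, int n) {
        List<Long> hits = new ArrayList<>();
        int visited = 0;
        for (long[] segment : segmentsNewestFirst) {
            for (long ts : segment) {
                visited++;
                if (matches.test(ts)) {
                    hits.add(ts);
                    if (hits.size() == n) {
                        // early termination: no later doc can be newer than these
                        return new Result(hits, visited);
                    }
                }
            }
        }
        return new Result(hits, visited); // fewer than n matches in total
    }

    public static void main(String[] args) {
        List<long[]> segments = List.of(
            new long[] {300, 290, 280},  // newest segment
            new long[] {200, 190, 180},
            new long[] {100, 90, 80});   // oldest segment
        Result r = topNewest(segments, ts -> ts % 20 == 0, 3);
        System.out.println(r.hits());        // [300, 280, 200]
        System.out.println(r.docsVisited()); // 4
    }
}
```

Only 4 of the 9 docs are examined. If the segment ranges overlapped, as after an arbitrary merge, stopping early could miss newer docs in a later segment.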
I think you should be able to do this w/ SortingMergePolicy? And then
use a custom collector that
Mike,
All our queries need to be sorted by the timestamp field in descending order
of time (latest first).
Each segment is sorted internally, but TieredMergePolicy picks arbitrary
segments and merges them (even with SortingMergePolicy, etc.). I am trying
to avoid this and see if an approximate globa
It sounds like you are just indexing as a TextField and then calling
getDocTermOrds? This then requires a slow "uninvert" step... Hmm, how
are you adding this field to your documents?
Instead, you should use SortedSetDocValuesField, which will store the
doc values directly in the index, and loading
Hi andy,
unfortunately, that is not easy to show with one simple code. You have to
change the Similarity used.
Before starting to do this, you should be sure that this affects your users.
The example you gave is showing very short documents. Lucene is optimized to
handle larger documents, for
Thanks Uwe, could you please give me a more detailed example of how to change
Lucene's behavior?
Uwe Schindler wrote
> Hi Erick,
>
> a statement like "Adding &debug=all to the query will show you if this is
> the case" will not help a Lucene user, as it is only available in the Solr
> server.
Hi Erick,
a statement like "Adding &debug=all to the query will show you if this is the
case" will not help a Lucene user, as it is only available in the Solr server.
But Andy uses Lucene directly. In his case he should use IndexSearcher's
explain functionality to retrieve a structured outpu
Thanks for your reply, Erick, this is the case. But how can I keep the
precision of the field's length?
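The loss of precision can be reproduced outside Lucene. The sketch below assumes Lucene 4.x's DefaultSimilarity, whose length norm is 1/sqrt(numTerms) (boost 1, no overlap discounting). It re-implements in plain Java the single-byte `SmallFloat.floatToByte315` / `byte315ToFloat` encoding used to store that norm. The encoding keeps only three significant bits, so nearby field lengths collapse to the same stored value. The class name is hypothetical.

```java
// Plain-Java re-implementation (assumption: mirrors Lucene 4.x SmallFloat's 315
// encoding and DefaultSimilarity's lengthNorm with boost 1) to show norm precision loss.
class NormPrecisionSketch {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Keep the float's exponent plus its top 2 explicit mantissa bits (3 significant
    // bits counting the implicit leading 1), rebased so the result fits in one byte.
    // Values below the range become 0 or 1; values above it saturate at -1 (0xFF).
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Inverse mapping: rebuild a float from the stored byte (dropped bits are zero).
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The length norm as scoring would see it after the one-byte round trip.
    static float decodedNorm(int numTerms) {
        return byte315ToFloat(floatToByte315(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 12; len++) {
            System.out.printf("%2d terms: exact %.4f -> stored %.4f%n",
                len, lengthNorm(len), decodedNorm(len));
        }
    }
}
```

Under these assumptions, fields with 3 and 4 terms both store 0.5, and fields with 8, 9 and 10 terms all store 0.3125, so the score cannot tell them apart. Short fields are hit hardest. Finer-grained length handling therefore means changing the Similarity, as Uwe says.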