> framework checks all these conditions. The new logic for the pre-check
> that short-circuits the disjunction creation is now different and relies
> on "correct" behaviour of all (sub-)scorers.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee
Should a scorer now return -1 when in its initial state?
Ivan
On Wed, Dec 30, 2015 at 6:17 PM, Ivan Brusic wrote:
> I potentially found the issue, but I am wondering why the code worked in
> the first place. Did the contract for the scorer change with Lucene 5?
>
> T
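For reference, the Lucene 5 iteration contract being discussed (defined by DocIdSetIterator, which Scorer extends) requires docID() to return -1 before the first call to nextDoc() and NO_MORE_DOCS once the iterator is exhausted. A minimal standalone sketch of that contract (plain Java, not the real Lucene classes):

```java
// Stand-in illustrating the Lucene 5 iteration contract; the real classes
// are org.apache.lucene.search.DocIdSetIterator / Scorer.
public class SketchIterator {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = -1; // not yet positioned

    public SketchIterator(int[] docs) { this.docs = docs; }

    // -1 before iteration starts, NO_MORE_DOCS after it ends.
    public int docID() {
        if (pos < 0) return -1;
        if (pos >= docs.length) return NO_MORE_DOCS;
        return docs[pos];
    }

    public int nextDoc() {
        pos++;
        return docID();
    }
}
```

A scorer that reports a real document id (or 0) before being advanced violates this contract, which is exactly the kind of "correct behaviour of all (sub-)scorers" the new disjunction pre-check relies on.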
few more test cases, but ultimately why the code
exists in the first place and potentially replace it with base classes.
Ivan
On Tue, Dec 29, 2015 at 7:01 AM, Ivan Brusic wrote:
> Thanks Adrien. I added the BaseScorer to the gist, but what I was hoping
> to figure out was which direction I should g
off, so I will get around to looking back into it soon.
Ivan
On Mon, Dec 28, 2015 at 5:41 PM, Adrien Grand wrote:
> Ivan, I can't find the BaseScorer class in the gist. Maybe you forgot to
> git add it?
>
> Le lun. 28 déc. 2015 à 23:07, Ivan Brusic a écrit :
>
> >
u can share the code of your scorer, I
> could give it a quick look.
>
> Le lun. 28 déc. 2015 à 22:18, Ivan Brusic a écrit :
>
> > I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom
> > collectors, analyzers, queries, etc.. I have migrated other code ba
I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom
collectors, analyzers, queries, etc. I have migrated other code bases from
Lucene before (2->3, 3->4) and I always had one issue I could not eyeball:
When using a custom query, I get the same document twice in the result set.
Perhaps more of an NLP question, but are there any tests regarding
relevance for Lucene? Given an example corpus of documents, what are the
golden sets for specific queries? The Wikipedia dump is used as a
benchmarking tool for both indexing and querying in Lucene, but there are
no metrics in terms
as I am consistent, it should be valid.
Now onto the real testing.
Cheers,
Ivan
On Mon, Dec 9, 2013 at 9:41 AM, Ivan Brusic wrote:
> I am currently using document-level boosts, which really translates to
> changing the norm for every field under the covers. As part of an
> experimen
I am currently using document-level boosts, which really translates to
changing the norm for every field under the covers. As part of an
experiment, I want to remove the boost, but that would require either
re-indexing content or changing the scoring algorithm (similarity).
If I create my own simi
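For context on what such a similarity would change: in Lucene 3.x, DefaultSimilarity's length norm is roughly boost / sqrt(numTerms), which is where a document-level boost enters the score. A boost-free variant simply drops the boost factor. The computation is sketched standalone below (hypothetical class name; in real code this logic would live in a DefaultSimilarity subclass):

```java
// Standalone sketch of the norm computation a custom Similarity could
// replace (in Lucene 3.x, DefaultSimilarity's lengthNorm/computeNorm).
public class NormSketch {
    // Approximate default: the boost, discounted by field length.
    static float defaultNorm(float boost, int numTerms) {
        return (float) (boost / Math.sqrt(numTerms));
    }

    // Boost-free variant: every document scores as if boost == 1,
    // so re-indexing is not needed to "remove" the boost.
    static float flatNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

Note that norms are encoded into a single byte at index time, so swapping the similarity only affects how the stored norm is interpreted at search time, not the stored value itself.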
) up to freq() times
> otherwise the behaviour is undefined. So essentially, if you don't want
> to take the TF into account in your scoring model, you're kind of left
> with changing your similarity.
>
> simon
>
> On Tue, Aug 6, 2013 at 1:41 AM, Ivan Brusic wrote:
> > As the subje
As the subject says, is it possible to omit the term frequencies for a
field, but still keep positions? Term frequencies are omitted for better
scoring under our model, but positions are required for span queries. Are
the two concepts related? Are they indexed in the same data structure?
One optio
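Simon's suggestion above amounts to overriding tf() in the Similarity so that frequency stops contributing while positions stay indexed for span queries. DefaultSimilarity's tf is sqrt(freq); the flattened version returns 1 for any positive frequency. A standalone sketch (the real override would go in a DefaultSimilarity subclass):

```java
// Sketch of tf() flattening; in Lucene this would override
// DefaultSimilarity.tf(float freq) in a custom Similarity.
public class TfSketch {
    // Default behaviour: score grows with the square root of the frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened: presence matters, repetition does not.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```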
page down by now - oh well. But now I have
> the Solr-only book, self-published as an e-book on Lulu.com.
>
> Yes, LIA2 is still a valuable resource. Details have changed, but most
> concepts are still valid.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ivan Bru
Jack, don't you also have a book coming out on O'Reilly?
http://shop.oreilly.com/product/0636920028765.do
Lucene in Action might be outdated, but many of the core concepts remain
the same. The analysis chain (analyzers/tokenizers) might have a slightly
different API, but the concepts are still va
It depends on which version of Lucene you are using. With Lucene 4,
TermAttribute has been replaced with CharTermAttribute. I believe
TermAttribute was simply deprecated in Lucene 3.
Cheers,
Ivan
On Mon, Jun 24, 2013 at 11:32 PM, 雨与泪 <1137925...@qq.com> wrote:
> I can't find the Class of TermAt
There was a similar question asked a couple of months ago, with a great
answer by Uwe Schindler:
http://search-lucene.com/m/Z2GP220szmS&subj=RE+What+is+equivalent+to+Document+setBoost+from+Lucene+3+6+inLucene+4+1+
I am still on Lucene 3.x, so I have not yet had a chance to mimic document
level bo
, "searchText",
>>>> analyzer).parse("Takeaway");
>>>>
>>>> int hitsPerPage = 10;
>>>> IndexReader reader = IndexReader.open(index);
>>>> IndexSearcher searcher = new IndexSearcher(reader);
>>
Use the explain function to understand why the query is producing the
results you see.
http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
Does your current query return Listing 2 first? That might be because
of term fre
The Snowball analyzer will not work since it stems the field's text. Use
the KeywordAnalyzer, which will preserve the text as is.
--
Ivan
On Mon, May 7, 2012 at 11:25 PM, Yogesh patel
wrote:
> I used the Snowball analyzer with the English language. Is it possible
> with the snowball analyzer?
>
>
> On Mon, May 7,
Thu, Apr 5, 2012 at 3:31 PM, Ivan Brusic wrote:
>
>> On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless
>> wrote:
>>> I'm assuming this is a "build once and never change" index...? Else,
>>> it sounds like you should never run forceMerge...
>&
A cache should be independent of the data store. Ehcache works well in
front of Lucene as well as a (relational) database. However, caches
work great for key/value data, so the cache value would be a result
set. Is caching the grouped result good enough?
--
Ivan
On Tue, Apr 10, 2012 at 1:40 PM,
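The key/value shape suggested above (query as key, grouped result set as value) can be sketched with a small LRU map; Ehcache would play the same role in production. A plain-Java sketch with hypothetical names and no Lucene or Ehcache dependency:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Tiny LRU cache keyed by query string, holding a result set (doc ids).
// Ehcache or any key/value store could replace this in production.
public class ResultCache extends LinkedHashMap<String, List<Integer>> {
    private final int maxEntries;

    public ResultCache(int maxEntries) {
        super(16, 0.75f, true); // access-order enables LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, List<Integer>> eldest) {
        return size() > maxEntries;
    }
}
```

The important property, as noted above, is that the cache knows nothing about Lucene: it stores an opaque result set per key, so invalidation on index updates has to be handled by the caller.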
Hi Mike,
Response inline:
On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless
wrote:
> I'm assuming this is a "build once and never change" index...? Else,
> it sounds like you should never run forceMerge...
Correct. The forceMerge was merely to preserve the previous 2.3
behavior of using opti
I recently migrated a legacy Lucene application from 2.3 to 3.5. The
code was filled with numerous custom
filter/analyzers/similarites/collectors. Took about a week to convert
all the token streams to the new API and remove deprecated classes.
Most importantly, there is a collector that enables fa
Hi Reyna,
I have never used it, but there is a WikipediaTokenizer defined in the
analyzer contrib:
http://lucene.apache.org/java/3_5_0/api/contrib-analyzers/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.html
You can find a test case for this tokenizer in the source code.
Hopefully othe