Re: Duplicate values in search

2016-01-04 Thread Ivan Brusic
test > framework checks all these conditions. The new logic about the pre-check > for short circuiting the disjunction creation is now different and relies > on "correct" behaviour of all (sub-)scorers. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee

Re: Duplicate values in search

2015-12-30 Thread Ivan Brusic
coreAll. Should a scorer now return a -1 when in some initialized state? Ivan On Wed, Dec 30, 2015 at 6:17 PM, Ivan Brusic wrote: > I potentially found the issue, but I am wondering why the code worked in > the first place. Did the contract for the scorer change with Lucene 5? > > T
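
The question above concerns what a scorer reports before it has been positioned. A minimal sketch of that convention follows, in plain Java with no Lucene dependency. `ScorerContractSketch`, `ArrayDocIterator` and `union` are hypothetical names made up for illustration, and the assumption that an unpositioned iterator reports -1 is taken from the question itself rather than confirmed by this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene-style doc id iterator; this is NOT the real API.
class ScorerContractSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class ArrayDocIterator {
        private final int[] docs; // assumed sorted ascending, no duplicates
        private int index = -1;   // -1 means "not positioned yet"

        ArrayDocIterator(int... docs) {
            this.docs = docs;
        }

        // Assumption: before the first nextDoc() call, report -1 and never a real doc id.
        int docID() {
            if (index < 0) {
                return -1;
            }
            return index < docs.length ? docs[index] : NO_MORE_DOCS;
        }

        // Moves to the next doc id, or NO_MORE_DOCS once exhausted.
        int nextDoc() {
            if (index < docs.length) {
                index++;
            }
            return docID();
        }
    }

    // Union of two iterators that emits each doc id exactly once.
    static List<Integer> union(ArrayDocIterator a, ArrayDocIterator b) {
        List<Integer> out = new ArrayList<>();
        int da = a.nextDoc();
        int db = b.nextDoc();
        while (da != NO_MORE_DOCS || db != NO_MORE_DOCS) {
            int doc = Math.min(da, db);
            out.add(doc);
            // Advance every sub-iterator sitting on the emitted doc, so it is not emitted twice.
            if (da == doc) da = a.nextDoc();
            if (db == doc) db = b.nextDoc();
        }
        return out;
    }

    public static void main(String[] args) {
        ArrayDocIterator it = new ArrayDocIterator(2, 5);
        System.out.println("before positioning: " + it.docID());
        System.out.println("union: " + union(new ArrayDocIterator(1, 3, 5), new ArrayDocIterator(3, 4)));
    }
}
```

The union only stays duplicate-free if every sub-iterator behaves consistently, which is the kind of dependency on "correct" sub-scorer behaviour discussed elsewhere in this thread.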

Re: Duplicate values in search

2015-12-30 Thread Ivan Brusic
few more test cases, but ultimately why the code exists in the first place and potentially replace it with base classes. Ivan On Tue, Dec 29, 2015 at 7:01 AM, Ivan Brusic wrote: > Thanks Adrien. I added the BaseScorer to the gist, but what I was hoping to > achieve was which direction I should g

Re: Duplicate values in search

2015-12-29 Thread Ivan Brusic
off, so I will get around to looking back into it soon. Ivan On Mon, Dec 28, 2015 at 5:41 PM, Adrien Grand wrote: > Ivan, I can't find the BaseScorer class in the gist. Maybe you forgot to > git add it? > > On Mon, Dec 28, 2015 at 23:07, Ivan Brusic wrote: > > >

Re: Duplicate values in search

2015-12-28 Thread Ivan Brusic
u can share the code of your scorer, I > could give it a quick look. > > On Mon, Dec 28, 2015 at 22:18, Ivan Brusic wrote: > > > I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom > > collectors, analyzers, queries, etc. I have migrated other code ba

Duplicate values in search

2015-12-28 Thread Ivan Brusic
I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom collectors, analyzers, queries, etc. I have migrated other code bases from Lucene before (2->3, 3->4) and I always had one issue I could not eyeball! When using a custom query, I get the same document twice in the result set.

Relevancy tests

2014-06-12 Thread Ivan Brusic
Perhaps more of an NLP question, but are there any tests regarding relevance for Lucene? Given an example corpus of documents, what are the golden sets for specific queries? The Wikipedia dump is used as a benchmarking tool for both indexing and querying in Lucene, but there are no metrics in terms

Re: Changing similarity at query time

2013-12-09 Thread Ivan Brusic
as I am consistent, it should be valid. Now onto the real testing. Cheers, Ivan On Mon, Dec 9, 2013 at 9:41 AM, Ivan Brusic wrote: > I am currently using document-level boosts, which really translates to > changing the norm for every field under the covers. As part of an > experimen

Changing similarity at query time

2013-12-09 Thread Ivan Brusic
I am currently using document-level boosts, which really translates to changing the norm for every field under the covers. As part of an experiment, I want to remove the boost, but that would require either re-indexing content or changing the scoring algorithm (similarity). If I create my own simi

Re: Omitting term frequencies while preserving positions

2013-08-06 Thread Ivan Brusic
) up to freq() times > otherwise the behaviour is undefined. So essentially if you don't want > to take the TF into account in your scoring model you are kind of left > with changing your similarity. > > simon > > On Tue, Aug 6, 2013 at 1:41 AM, Ivan Brusic wrote: > > As the subje
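
The suggestion quoted above, ignoring term frequency in the similarity rather than in the index, can be sketched with a toy scoring function. This is plain Java and not the Lucene `Similarity` API. `TfSketch`, `TfFunction`, `CLASSIC`, `BINARY` and `score` are hypothetical names, and the score is simplified to tf times idf for illustration only.

```java
// Toy model of a tf-based score; a sketch, NOT the Lucene Similarity API.
class TfSketch {
    // Hypothetical hook for the term-frequency factor of the score.
    interface TfFunction {
        float tf(float freq);
    }

    // Square root of the frequency, the shape used by classic TF-IDF scoring.
    static final TfFunction CLASSIC = freq -> (float) Math.sqrt(freq);

    // Ignores how often the term occurs: any match counts once.
    static final TfFunction BINARY = freq -> freq > 0 ? 1.0f : 0.0f;

    // Simplified score (assumption for illustration): tf factor times idf.
    static float score(TfFunction tf, float freq, float idf) {
        return tf.tf(freq) * idf;
    }

    public static void main(String[] args) {
        float idf = 2.0f;
        System.out.println("classic, freq=1: " + score(CLASSIC, 1, idf));
        System.out.println("classic, freq=9: " + score(CLASSIC, 9, idf));
        System.out.println("binary,  freq=1: " + score(BINARY, 1, idf));
        System.out.println("binary,  freq=9: " + score(BINARY, 9, idf));
    }
}
```

With the binary variant, frequencies and positions can stay in the index for span queries while repeated occurrences no longer raise the score.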

Omitting term frequencies while preserving positions

2013-08-05 Thread Ivan Brusic
As the subject says, is it possible to omit the term frequencies for a field, but still keep positions? Term frequencies are omitted for better scoring under our model, but positions are required for span queries. Are the two concepts related? Are they indexed in the same data structure? One optio

Re: Lucene in Action

2013-07-10 Thread Ivan Brusic
page down by now - oh well. But now I have > the Solr-only book, self-published as an e-book on Lulu.com. > > Yes, LIA2 is still a valuable resource. Details have changed, but most > concepts are still valid. > > -- Jack Krupansky > > -----Original Message----- From: Ivan Bru

Re: Lucene in Action

2013-07-10 Thread Ivan Brusic
Jack, don't you also have a book coming out on O'Reilly? http://shop.oreilly.com/product/0636920028765.do Lucene in Action might be outdated, but many of the core concepts remain the same. The analysis chain (analyzers/tokenizers) might have a slightly different API, but the concepts are still va

Re: Can't find the Class of TermAttribute

2013-06-25 Thread Ivan Brusic
It depends on which version of Lucene you are using. With Lucene 4, TermAttribute has been replaced with CharTermAttribute. I believe TermAttribute was simply deprecated in Lucene 3. Cheers, Ivan On Mon, Jun 24, 2013 at 11:32 PM, 雨与泪 <1137925...@qq.com> wrote: > I can't find the Class of TermAt

Re: Document boosting

2013-04-30 Thread Ivan Brusic
There was a similar question asked a couple of months ago, with a great answer by Uwe Schindler: http://search-lucene.com/m/Z2GP220szmS&subj=RE+What+is+equivalent+to+Document+setBoost+from+Lucene+3+6+inLucene+4+1+ I am still on Lucene 3.x, so I have not yet had a chance to mimic document level bo

Re: Search Ranking

2012-05-17 Thread Ivan Brusic
, "searchText", >>>> analyzer).parse("Takeaway"); >>>> >>>> int hitsPerPage = 10; >>>> IndexReader reader = IndexReader.open(index); >>>> IndexSearcher searcher = new IndexSearcher(reader); >>

Re: Search Ranking

2012-05-16 Thread Ivan Brusic
Use the explain function to understand why the query is producing the results you see. http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int) Does your current query return Listing 2 first? That might be because of term fre

Re: Lucene Question about Query

2012-05-08 Thread Ivan Brusic
The snowball analyzer will not work since it analyzes (tokenizes and stems) the field. Use the KeywordAnalyzer, which will preserve the text as is. -- Ivan On Mon, May 7, 2012 at 11:25 PM, Yogesh patel wrote: > I used SnowBall Analyzer with English language. In snowball analyzer is it > possible? > > > On Mon, May 7,

Re: Slow merging after upgrading to 3.5

2012-04-18 Thread Ivan Brusic
Thu, Apr 5, 2012 at 3:31 PM, Ivan Brusic wrote: > >> On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless >> wrote: >>> I'm assuming this is a "build once and never change" index...? Else, >>> it sounds like you should never run forceMerge... >

Re: Query for "cache" mechanism to used

2012-04-11 Thread Ivan Brusic
A cache should be independent of the data store. Ehcache works well in front of Lucene as well as a (relational) database. However, caches work great for key/value data, so the cache value would be a result set. Is caching the grouped result good enough? -- Ivan On Tue, Apr 10, 2012 at 1:40 PM,
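
The idea above, a key/value cache whose value is a whole result set, sitting in front of whatever store answers the query, can be sketched as follows. This is a minimal in-memory LRU stand-in written with the JDK only. It is not Ehcache, and `ResultCacheSketch`, `LruResultCache` and the `backend` function are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of a result-set cache in front of an arbitrary search backend (hypothetical names).
class ResultCacheSketch {
    static final class LruResultCache {
        private final Map<String, List<String>> entries;
        private final Function<String, List<String>> backend;

        LruResultCache(final int capacity, Function<String, List<String>> backend) {
            this.backend = backend;
            // An access-ordered LinkedHashMap evicts the least recently used entry.
            this.entries = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                    return size() > capacity;
                }
            };
        }

        // Key: the query string. Value: the complete result set for that query.
        List<String> search(String query) {
            List<String> cached = entries.get(query);
            if (cached == null) {
                cached = backend.apply(query); // cache miss: ask the real store
                entries.put(query, cached);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        final int[] backendCalls = new int[1];
        LruResultCache cache = new LruResultCache(2, q -> {
            backendCalls[0]++;
            return Arrays.asList(q + "-doc1", q + "-doc2");
        });
        cache.search("lucene");
        cache.search("lucene"); // served from the cache
        System.out.println("backend calls: " + backendCalls[0]);
    }
}
```

The cache knows nothing about Lucene or a database. It only sees keys and values, which is why the same layer can sit in front of either.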

Re: Slow merging after upgrading to 3.5

2012-04-05 Thread Ivan Brusic
Hi Mike, Response inline: On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless wrote: > I'm assuming this is a "build once and never change" index...? Else, > it sounds like you should never run forceMerge... Correct. The forceMerge was merely to preserve the previous 2.3 behavior of using opti

Slow merging after upgrading to 3.5

2012-04-05 Thread Ivan Brusic
I recently migrated a legacy Lucene application from 2.3 to 3.5. The code was filled with numerous custom filters/analyzers/similarities/collectors. It took about a week to convert all the token streams to the new API and remove deprecated classes. Most importantly, there is a collector that enables fa

Re: is it possible to index wiki markup files?

2012-01-11 Thread Ivan Brusic
Hi Reyna, I have never used it, but there is a WikipediaTokenizer defined in the analyzer contrib: http://lucene.apache.org/java/3_5_0/api/contrib-analyzers/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.html You can find a test case for this tokenizer in the source code. Hopefully othe