Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-01 Thread Atri Sharma
Mine is atris for GitHub, atri for Jira. On Mon, Aug 1, 2022 at 4:03 PM Tomoko Uchida wrote: > > Hi Mike, Marcus, and Praveen: > > I verified the added two mappings - these Jira users have activity on > Lucene Jira, also corresponding GitHub accounts are valid. > - marcussorealheis > - pru30 > > T

Re: Potential bug

2021-06-14 Thread Atri Sharma
+1 to Adrien. Let's keep the tone neutral. On Mon, 14 Jun 2021, 16:00 Adrien Grand, wrote: > Baris, you called out an insult from Alessandro and your replies suggest > anger, but I couldn't see an insult from Alessandro actually. > > +1 to Alessandro's call to make the tone softer on this discu

[ANNOUNCE] Apache Lucene 8.7.0 released

2020-11-04 Thread Atri Sharma
03/11/2020, Apache Lucene™ 8.7 available The Lucene PMC is pleased to announce the release of Apache Lucene 8.7. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-01 Thread Atri Sharma
D (binding) On Wed, 2 Sep 2020 at 01:51, Ryan Ernst wrote: > Dear Lucene and Solr developers! > > > > Sorry for the multiple threads. This should be the last one. > > > > In February a contest was started to design a new logo for Lucene > > [jira-issue]. The initial attempt [first-vote] to call

Re: Resizable LRUQueryCache

2020-03-05 Thread Atri Sharma
On Fri, Mar 6, 2020 at 1:04 AM Aadithya C wrote: > > In my personal opinion, there are a few advantages of resizing - > > > 1) The size of the cache is unpredictable as there is a fixed(guesstimate) > accounting for the key size. With a resizable cache, we can potentially > cache heavier queries a

Re: PhraseQuery

2020-01-24 Thread Atri Sharma
PhraseQuery enforces the order of the terms specified and requires an exact match on that order unless slop is specified. When appending terms in the builder, term position numbers need to be incremental. On Fri, Jan 24, 2020 at 11:15 PM wrote: > > Hi,- > > how do i enforce the order of sequence of ter
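To illustrate what "exact match of order" means when no slop is given, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class `PhraseMatchSketch` and the method `matchesExactPhrase` are hypothetical names introduced only for this example, and slop is deliberately not modeled.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Lucene code. It mimics what an exact
// (slop = 0) phrase match means for an already tokenized field.
class PhraseMatchSketch {

    // Returns true if `phrase` occurs in `tokens` as consecutive terms
    // in exactly the given order.
    static boolean matchesExactPhrase(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty() || phrase.size() > tokens.size()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            // Term i of the phrase must sit at position start + i.
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox");
        // Same terms, same order: matches.
        System.out.println(matchesExactPhrase(doc, Arrays.asList("quick", "brown")));
        // Same terms, reversed order: does not match.
        System.out.println(matchesExactPhrase(doc, Arrays.asList("brown", "quick")));
    }
}
```

The "position start + i" check is the toy counterpart of the incremental term positions mentioned above: each appended term is expected one position after the previous one.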

Re: Lucene index directory grows and shrinks

2019-11-04 Thread Atri Sharma
These are typical symptoms of an index merge. However, it is hard to say more without more data. What is your segment size limit? Have you changed the default merge frequency or max segments configuration? Would you have an estimate of ratio of number of segments reaching max limit / to

Re: Parameterized queries in Lucene

2019-10-23 Thread Atri Sharma
query many times with a different parameter means recreating the > > Query > > > every time. > > > > > > I admit that creation of the Lucene query is not the most expensive > part > > of > > > the planning process still we can gain something by not creati

Re: Parameterized queries in Lucene

2019-10-21 Thread Atri Sharma
I am curious: what use case are you targeting here? In the relational world, this is useful primarily because prepared statements eliminate the need for replanning the query, thus saving the cost of iterating over a potentially large combinatorial space. However, for Lucene, th

Re: partial match

2019-08-05 Thread Atri Sharma
Yes, that will allow specifying wildcard as the first character, but it can lead to very slow queries, especially on larger indices. On Mon, Aug 5, 2019 at 6:08 PM wrote: > > Does QueryParser.setAllowLeadingWildCard(true) work? > > this will allow to use wildcard as first char in the search strin

Re: partial match

2019-08-04 Thread Atri Sharma
It is not clear what you are trying to achieve here. If you want to match terms similar to the one you specify in the query (test, tesk, lest etc), then a fuzzy query (~) should suffice. Note that you cannot specify a mandatory part of the text that has to match in every resul
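As a rough illustration of why terms such as "tesk" and "lest" count as similar to "test", the sketch below computes a plain Levenshtein edit distance in Java. It is an assumption-laden toy: `EditDistanceSketch` and `levenshtein` are hypothetical names, and this is not how Lucene evaluates fuzzy queries internally. Lucene's FuzzyQuery allows at most 2 edits and by default also counts a transposition of adjacent characters as a single edit, which this sketch does not model.

```java
// Toy model only: plain Levenshtein distance (insert, delete, substitute).
// Transpositions are NOT treated as a single edit here.
class EditDistanceSketch {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of `a` to each prefix of `b`.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("test", "tesk"));  // one substitution
        System.out.println(levenshtein("test", "lest"));  // one substitution
        System.out.println(levenshtein("test", "toast")); // substitution + insertion
    }
}
```

Under this model "tesk" and "lest" are each one edit away from "test", while "toast" is two edits away.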

Re: Lucene 5.2.1 score for MUST_NOT query

2019-08-04 Thread Atri Sharma
MUST_NOT represents a clause which must not match against a document in order for it to be qualified as a hit (think of SQL’s NOT IN). MUST_NOT clauses are used as filters to eliminate candidate documents. On Sun, 4 Aug 2019 at 23:11, Claude Lepere wrote: > Hello! > > What score of a hit in res

Re: Impact and WAND

2019-07-11 Thread Atri Sharma
ments in postings lists. > Then this information is leveraged by block-max WAND in order to skip > low-scoring blocks. > > This does indeed help avoid reading norms, but also document IDs and > term frequencies. > > On Wed, Jul 10, 2019 at 4:10 PM Wu,Yunfeng > mailto:wuyunfen.

Re: Multi field Lucene index

2019-07-05 Thread Atri Sharma
Should not matter, AFAIK. If your first MUST clause in a BooleanQuery fails to match for a document, then there is no point for the engine to match further clauses, right? On Fri, Jul 5, 2019 at 7:56 PM wrote: > > Re-sending and please let me know Your amazing thoughts > > Happy July 4th > > Bes

Re: how to find out each score contribution from booleanquery components

2019-06-26 Thread Atri Sharma
n required clause (+countryDFLT:united > (countryDFLT:uniten)^0.4202 +countryDFLT:states > (countryDFLT:statesir)^0.56) > 0.0 = Failure to meet condition(s) of required/prohibited clause(s) >0.0 = no match on required clause (countryDFLT:united) > 0.0 = no matching

Re: how to find out each score contribution from booleanquery components

2019-06-26 Thread Atri Sharma
It depends a lot on the actual clauses (whether they are SHOULD, MUST, MUST_NOT), each query’s type (phrase, term etc). Could you post your query and the explain plan of IndexSearcher post the rewrite? On Wed, 26 Jun 2019 at 6:46 PM, wrote: > Hi,- > > how can one find out each score contribut

Re: Incremental Lucene Index

2019-06-24 Thread Atri Sharma
Yes, Lucene supports incremental indexing. Note that the underlying structure is append-only, so you are still paying the cost of delete + insert, but the semantics are what you expect them to be. On Mon, 24 Jun 2019 at 7:18 PM, Sukhendu Kumar Biswal wrote: > Hi Team, > Does Lucene support incre
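The "delete + insert" cost can be pictured with a small plain-Java toy. This is only a sketch of the idea under simplifying assumptions, not Lucene's segment format: `AppendOnlyIndexSketch` and its methods are hypothetical names. In Lucene itself, the corresponding call is `IndexWriter.updateDocument`, which deletes the documents matching a term and then adds the new document.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: an append-only store where an update never rewrites an
// entry in place. The old entry is marked deleted and a new one is appended.
class AppendOnlyIndexSketch {
    private final List<String> ids = new ArrayList<>();
    private final List<String> bodies = new ArrayList<>();
    private final List<Boolean> live = new ArrayList<>();

    void add(String id, String body) {
        ids.add(id);
        bodies.add(body);
        live.add(true);
    }

    // Update = mark every live entry with this id as deleted, then append.
    void update(String id, String body) {
        for (int i = 0; i < ids.size(); i++) {
            if (live.get(i) && ids.get(i).equals(id)) {
                live.set(i, false);
            }
        }
        add(id, body);
    }

    // Returns the live body for `id`, or null if there is none.
    String get(String id) {
        for (int i = ids.size() - 1; i >= 0; i--) {
            if (live.get(i) && ids.get(i).equals(id)) {
                return bodies.get(i);
            }
        }
        return null;
    }

    // Entries physically stored, including deleted ones.
    int totalEntries() {
        return ids.size();
    }

    // Entries still visible to readers.
    int liveCount() {
        int count = 0;
        for (boolean isLive : live) {
            if (isLive) {
                count++;
            }
        }
        return count;
    }
}
```

After one `add` and one `update` of the same id, the toy store holds two physical entries but only one live one, which is the extra cost referred to above. In this picture, reclaiming the space of deleted entries would be the job of a later merge.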

Re: FuzzyQuery

2019-06-10 Thread Atri Sharma
ing > in the call. > > Best regards > > > > On 6/10/19 10:47 AM, baris.ka...@oracle.com wrote: > > How do i check how it is indexed? lowecase or uppercase? > > > > only way is now to by testing. > > > > i am using standardanalyzer. > > > >

Re: Lucene FuzzyQuery

2019-06-10 Thread Atri Sharma
> i make sure i specify a string with 1 edit away misspelled and that > never gets hit but the word with correct spelling is in the index. How long are your query terms and the actual word? For fuzzy query to match, your edit distance needs to be less than the smaller of the query and the actual w

Re: Sampled Queries -- Use Cases and Feedback

2019-06-09 Thread Atri Sharma
Any thoughts on this? I am envisioning applications to machine learning systems, where the training dataset might be a small sample of the entire dataset, and the user wants scoring to be done only on samples of the dataset. On Fri, Jun 7, 2019 at 5:45 PM Atri Sharma wrote: > > Hi All, >

Re: FuzzyQuery

2019-06-09 Thread Atri Sharma
On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida wrote: > > Hi, > > What analyzer do you use for the text field? Is the term "Main" > correctly indexed? Agreed. Also, it would be good if you could post your actual code. What analyzer are you using? If you are using StandardAnalyzer, then all of your

Re: Lucene FuzzyQuery

2019-06-07 Thread Atri Sharma
>However, with MUST > clause, that restriction is lifted. I meant that with a SHOULD clause, that restriction is lifted, i.e. a query can score hits even if the SHOULD clause does not match the hit (but other MUST clauses do match).
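The matching rule described here can be sketched as a small plain-Java toy. It is an illustration under stated assumptions, not Lucene's implementation: `BooleanClauseSketch`, `Clause`, and `matches` are hypothetical names, documents are plain strings, scoring and FILTER clauses are left out, and the default of no minimum number of SHOULD matches is assumed.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model only: decides whether a document matches a list of clauses.
class BooleanClauseSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final Predicate<String> matcher;

        Clause(Occur occur, Predicate<String> matcher) {
            this.occur = occur;
            this.matcher = matcher;
        }
    }

    static boolean matches(String doc, List<Clause> clauses) {
        boolean hasMust = false;
        boolean anyShouldHit = false;
        for (Clause clause : clauses) {
            boolean hit = clause.matcher.test(doc);
            switch (clause.occur) {
                case MUST:
                    hasMust = true;
                    if (!hit) {
                        return false; // a failed MUST clause rejects the document
                    }
                    break;
                case MUST_NOT:
                    if (hit) {
                        return false; // a matching MUST_NOT clause rejects it
                    }
                    break;
                case SHOULD:
                    anyShouldHit |= hit;
                    break;
            }
        }
        // With a MUST clause present, SHOULD clauses are optional.
        // Without one, at least one SHOULD clause has to match.
        return hasMust || anyShouldHit;
    }
}
```

With a MUST clause on "main" and a SHOULD clause on "avenue", the document "main street" still matches in this model; with the same SHOULD clause alone it does not. A clause list containing only MUST_NOT clauses matches nothing here, since there is no positive clause to select candidates.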

Re: Lucene FuzzyQuery

2019-06-07 Thread Atri Sharma
Is your FuzzyQuery matching any documents at all? It would be helpful if you could post your entire query. It may be that your FuzzyQuery is not matching any hits, but when you specify it as a MUST clause, it becomes a necessary condition for any hit to be returned by your overal

Sampled Queries -- Use Cases and Feedback

2019-06-07 Thread Atri Sharma
Hi All, While working on a new Query type, I was inclined to think of a couple of use cases where the documents being scored need not be all of the data set, but a sample of them. This can be useful for very large datasets, where a query is only interested in getting the "feel" of the data, and ot