Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-24 Thread Mikhail Khludnev
Right. TextField.TYPE_NOT_STORED should be used then. On Thu, Apr 24, 2025 at 10:37 AM Saha, Rajib wrote: > Thanks Mikhail for the suggestion. > Now the previous exception has gone. But a new exception has come from > Field.java. > Here below are the excep

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-23 Thread Mikhail Khludnev
[3] = (byte)(uid>>>24); > PayloadAttributeImpl attributeImpl = new > PayloadAttributeImpl(new BytesRef(buffer)); > addAttributeImpl(attributeImpl); > returnToken = true; > } > public boolean incrementToken() throws IOException { > if (returnToken){ > returnToken = false; > return true; > } > else { > return false; > } > } > } > > Regards > Rajib > > -- Sincerely yours Mikhail Khludnev

Re: Synonyms and searching

2025-03-05 Thread Mikhail Khludnev
include both "licence" and "license"), but the phrase > substitutions are not. "http", "proxy" and "server " are there, but none of > the conjunctions appear. > > > > I don't think synonym replacement should be occurring at search time, if > only for performance reasons, but what have I missed in how this should > work? Am I chasing the impossible dream? > > > > cheers > > T > > > > > > -- Sincerely yours Mikhail Khludnev

Re: How can I know the lucene index version from files

2025-03-02 Thread Mikhail Khludnev
I can't rule out spelling mistakes > > On 02.03.2025 at 08:18, Mikhail Khludnev wrote: > > > > Hi Daniel. > > Given >Lucene41<, my bet is that it was written by a 4.1..4.9 version. > > Presumably you may get 4.9 (a decade old, heh) and invok

Re: How can I know the lucene index version from files

2025-03-01 Thread Mikhail Khludnev
th the same version as the files above? > > Cheers. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Sincerely yours Mikhail Khludnev

Re: Suggestions for modeling an Index

2025-01-20 Thread Mikhail Khludnev
t; 500), so I think this strategy is OK, but I don't like the > idea of having "dynamic fields". > > Given the intersection query requirement, is there a better way to model > the index, aside from creating multiple documents per Root entry? > > Regards > -- Sincerely yours Mikhail Khludnev

Re: Reg Migration to 10.0.0 lucene core jar

2024-12-14 Thread Mikhail Khludnev
ucene-analyzers-common-4.7.0.jar, > lucene-queries-4.7.0.jar, lucene-queryparser-4.7.0.jar, > lucene-sandbox-4.7.0.jar. When lucene core is upgraded is it recommended to > upgrade all these jars. > > > > Regards, > > Lavanya > -- > Lavanya > Give out ' What you want most' to come back > -- Sincerely yours Mikhail Khludnev

Re: Lucene Query Metrics

2024-12-04 Thread Mikhail Khludnev
> Cosmos > > > DB? > > > > > > *Thanks and Regards,* > > > *Ashwini Singh* > > > > > > > > -- > > > *Thanks and Regards,* > *Ashwini Singh* > -- Sincerely yours Mikhail Khludnev

Re: Custom Query Implementation

2024-12-03 Thread Mikhail Khludnev
Thanks for clarification Michael! On Tue, Dec 3, 2024 at 1:56 PM Michael Sokolov wrote: > Sparse is meaning two different things here. In the case you found Mikhail, > it means not every document has a value for some vector field. I think the > question here is about very high di

Re: Custom Query Implementation

2024-12-02 Thread Mikhail Khludnev
. On Mon, Dec 2, 2024 at 8:03 PM Viacheslav Dobrynin wrote: > Hi! > > I need to index sparse vectors, whereas as I understand it, > KnnFloatVectorField is designed for dense vectors. > Therefore, it seems that this approach will not work. > > Sun, Dec 1, 2024 at 18:36, Mikhai

Re: Custom Query Implementation

2024-12-01 Thread Mikhail Khludnev
Hi, might this be a fit for KnnFloatVectorField(... DOT_PRODUCT) and KnnFloatVectorQuery?

Re: Custom Query Implementation

2024-11-30 Thread Mikhail Khludnev
ilder.add(FieldValueAsScoreQuery(field_name, value), > BooleanClause.Occur.SHOULD) > return builder.build() > > it seems to work, but I'm not sure if it's a good way to implement it. > Example 2: > I would also like to use this mechanism for the following index: > term1 -> (doc_id1, score), (doc_idN, score), ... > termN -> (doc_id1, score), (doc_idN, score), ... > Where resulting score will be calculated as: > sum(scores) by doc_id for terms in some query > > Thank you in advance! > > Best Regards, > Viacheslav Dobrynin! > -- Sincerely yours Mikhail Khludnev
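The scoring rule in the quoted "Example 2" (postings of the form term -> (doc_id, score), with the result being the sum of scores per doc_id over the query terms) can be sketched in plain Java. This only illustrates the intended arithmetic; `TermScoreIndex` and `Posting` are hypothetical names, and nothing here is a Lucene API or the poster's actual implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a "term -> (doc_id, score)" index; not a Lucene class.
final class TermScoreIndex {

    // One posting: a document id and the score stored for it under a term.
    static final class Posting {
        final int docId;
        final float score;

        Posting(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    // Registers the posting list for a term.
    void put(String term, List<Posting> termPostings) {
        postings.put(term, termPostings);
    }

    // Sums the stored scores per document over all query terms.
    // Terms that are not in the index contribute nothing.
    Map<Integer, Float> score(List<String> queryTerms) {
        Map<Integer, Float> totals = new HashMap<>();
        for (String term : queryTerms) {
            List<Posting> termPostings =
                    postings.getOrDefault(term, Collections.<Posting>emptyList());
            for (Posting p : termPostings) {
                totals.merge(p.docId, p.score, Float::sum);
            }
        }
        return totals;
    }
}
```

A document that matches several query terms accumulates one contribution per term, which is the same shape of result a disjunction of SHOULD clauses would give if each clause returned the stored value as its score.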

Re: fuzzy search and distance tilde

2024-08-13 Thread Mikhail Khludnev
ired by some code. I suppose it's up to custom code around org.events.business.search.operations.SearchOperation.doRun(SearchOperation.java:202) -- Sincerely yours Mikhail Khludnev

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Mikhail Khludnev
r >kind:"VISI.Story" or kind:" VISI.Dataset" or kind:DataDiscoveryAlbum or >kind:DataDiscovery) > > > > Any comment on the different result set for the above two queries would be > really appreciated. > > > > Regards > > Rajib > > > -- Sincerely yours Mikhail Khludnev

Re: Seeking guidance on uncompressed storage options in Lucene 9.7.0

2024-02-27 Thread Mikhail Khludnev
> Any guidance or insights would be greatly appreciated. Thank you for your > time and assistance. > > Hari > -- Sincerely yours Mikhail Khludnev

Re: Need suggestion for a Lucene upgrade scenario

2024-01-30 Thread Mikhail Khludnev
ValueTermAttribute.toString(); > > //How to get startOffset & endOffset as in Lucene 2.4 > > //Do some calculation based on startOffset & endOffset > } > > Please let me know if any further information is required from > my side. > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev

Re: Regarding extracting Token as String from TokenStream.

2024-01-25 Thread Mikhail Khludnev
ther information from my side. > > Thanks In Advance. > > Regards > Rajib > > -- Sincerely yours Mikhail Khludnev

Re: NumericRangeQuery in Lucene 5.5.5: replacing the deprecated setBoost while keeping the NumericRange type?

2023-11-25 Thread Mikhail Khludnev
e preserving the > NumericRangeQuery type? > BoostQuery doesn't allow this and I haven't found a way. > > Thanks for your help. > > Claude Lepère > -- Sincerely yours Mikhail Khludnev

Re: Filter question

2023-11-21 Thread Mikhail Khludnev
will return results and the second > will not. > > > > However would a query like "NOT product:c" be OK as a filter query if it > was > combined with other queries as per the pseudocode above? > > > > I don't think it's significant but for what it's worth this application is > still using Lucene 8_6.3. > > > > cheers > > T > > > > -- Sincerely yours Mikhail Khludnev

Re: Stored field already compressed

2023-11-14 Thread Mikhail Khludnev
essing it again. > This seems wasteful. Is there a solution to this? Or would I have to > implement my own Codec or some such? I started digging down that route and > it doesn’t look pretty. 😊 > > > > Tony > > -- Sincerely yours Mikhail Khludnev

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Mikhail Khludnev
it's something over there https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159 On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner wrote: > Hi Mikhail > > Thank you very

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Mikhail Khludnev
> with the code above? > I can do this, but want to make sure that I don't update it in the wrong > way. > > > > -- Sincerely yours Mikhail Khludnev

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Mikhail Khludnev
Hello, I'm surprised, and I doubt that this can happen. Would you mind uploading a short test that reproduces it? On Wed, Sep 20, 2023 at 11:44 PM Amitesh Kumar wrote: > Thanks Mikhail! > > I have tried all other tokenizers from Lucene4.4. In case of > WhitespaceTokenizer, it loses roma

Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Mikhail Khludnev
> > On the implementation front, I am using a set of filters like > lowerCaseFilter, EnglishPossessiveFilter etc. in addition to the base tokenizer > StandardTokenizer. > > Per my analysis, StandardTokenizer strips off the % sign and hence the > behavior. Has someone faced a similar requirement? Any help/guidance is highly > appreciated. > -- Sincerely yours Mikhail Khludnev

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-09-01 Thread Mikhail Khludnev
hael Wechner < >> michael.wech...@wyona.com> wrote: >> >>> Hi Together >>> >>> You might be interested in this paper / article >>> >>> https://arxiv.org/abs/2308.14963 >>> >>> Thanks >>> >>> Michael >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> -- Sincerely yours Mikhail Khludnev

Re: Reindexing leaving behind 0 live doc segments

2023-08-28 Thread Mikhail Khludnev
rld = iw.getPooledInstance(sci, true); > segmentReader = rld.getReader(IOContext.READ); > > //process all live docs similar to above using the segmentReader. > > rld.release(segmentReader); > iw.release(rld); > }finally{ >if (iwRef != null) { >iwRef.decref(); > } > } > > Help would be much appreciated! > > Thanks, > Rahul > -- Sincerely yours Mikhail Khludnev

Re: What is the approximate processing mechanism for field length?

2023-08-10 Thread Mikhail Khludnev
ple, "keywords" field has 78 > tokens. I think its field length (dl) is 78, but Lucene treats it as > 76 (approximate), as described in the function explainTF(Explanation freq, long > norm). > Thank you very much for reading; I look forward to your > answer! > > > Koo > Drive development engineer -- Sincerely yours Mikhail Khludnev

Re: Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread Mikhail Khludnev
statistics on those terms or > proceed with this document without affecting its boolean score. > > What is the best way to achieve this? > -- Sincerely yours Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
OK https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/search/Weight.html#matches-org.apache.lucene.index.LeafReaderContext-int- On Mon, Jul 10, 2023 at 2:08 PM nedyalko.zhe...@freelance.de.INVALID wrote: > Hi Mikhail, > > I don't see the matches `searcher.matches(topDo

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
ch.Query-int- On Mon, Jul 10, 2023 at 12:19 PM nedyalko.zhe...@freelance.de.INVALID wrote: > Hello Mikhail, > > Great, thanks for the very fast response! The link that you provided is > very useful and informative. > > Though, I have an understanding issue. After I have search

Re: retrieving search matches with their frequency and positions

2023-07-09 Thread Mikhail Khludnev
other words, I'd like to get the matches in > the form of terms with properties like frequency and positions. > How can I achieve this? > > Thanks in advance! > Ned > > -- Sincerely yours Mikhail Khludnev

Re: Can an analyzer access other field's data during index time?

2023-04-26 Thread Mikhail Khludnev
(Lucene) fields with different content. If your logic is that comprehensive, you may also consider extracting the analysis logic completely https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html#the-preanalyzedfield-type On Tue, Apr 25, 2023 at 4:08 PM Wang, Guan wrote: >

Re: Can an analyzer access other field's data during index time?

2023-04-25 Thread Mikhail Khludnev
Guan, I hardly grasp the particular obstacle. But I don't think that the task is out of reach overall. Can you share a test case formally describing the desired behavior? On Tue, Apr 25, 2023 at 12:29 AM Wang, Guan wrote: > Hi Mikhail, > > Thank you for introducing

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
Well.. maybe something like https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html ? On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan wrote: > Hi Mikhail, > > Thank you for the definitive answer! > > I could "sol

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
ld not > be used for urgent or sensitive issues > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Run time error in IndexWriter.addDocument

2023-04-03 Thread Mikhail Khludnev
Nope, it's embedded completely. You can find Trie.java in lucene-8.11.2 sources. And compiled class in lucene-analyzers-stempel-8.11.2.jar as well. On Mon, Apr 3, 2023 at 12:03 PM Saha, Rajib wrote: > Hi Mikhail, > > In top stack, > java.lang.

Re: Run time error in IndexWriter.addDocument

2023-04-03 Thread Mikhail Khludnev
va:1757) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400) > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-03-03 Thread Mikhail Khludnev
. Enjoy. On Fri, Mar 3, 2023 at 3:48 PM Saha, Rajib wrote: > Hi Mikhail, Uwe, > > We have been able to overcome several hurdles. > Thanks for your suggestions, which helped us a lot. 😊 > > We need one more suggestion. Previously, we had used a sample

Re: Offset-Based Analysis

2023-02-22 Thread Mikhail Khludnev
(BLOOMBERG/ 919 3RD A) < lkotzanie...@bloomberg.net> wrote: > Hi Mikhail, > > Thanks for the quick reply and the suggestion. This is definitely good to > know about. In my case however, there are several such NLP/data extraction > systems and I am not sure if they all use the same

Re: Offset-Based Analysis

2023-02-21 Thread Mikhail Khludnev
sense does a similar solution already exist? If > it doesn’t exist yet would it be something that would be of interest to the > community? > Any thoughts on this would be much appreciated. > > Thanks, > Luke -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Mikhail Khludnev
gory/volume, but unfortunately the highlighter.getBestTextFragments() > method marks all the occurrences of "note" and "extra" in the content too. > This we don't want. > > I can't see how to separate that part of the query out in the highlighter > methods, and I wonder what best practice would be here. I'm probably being > naive in using a single query for the whole job. Do I need to run a query > for category/volume, and then a subquery on text and title, and just use > the > subquery in the highlighter? If that's the approach, is there a nice simple > explanation somewhere you could point me to? Because I'm a simple user who > has never done anything beyond using the simple QueryParser for everything. > > > > cheers > > T > > > > > > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Lucene Hunpell Spell checker

2023-02-19 Thread Mikhail Khludnev
> happens for a bunch of the languages, just presented 2 examples. > > Feel free to propose any changes, comments, fixes :) > > Thanks a lot in advance, > > Thanos > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-31 Thread Mikhail Khludnev
Hello, Rajib. On Mon, Jan 30, 2023 at 4:07 PM Saha, Rajib wrote: > Hi Mikhail, > > Thanks for your suggestion. It solved lots of cases today in my end. 😊 > > I need some more suggestions from your end. I am putting together as below > one by one: >

Re: What is the corresponding class for org.apache.lucene.codecs.memory.DirectDocValuesFormat in Lucene9

2023-01-30 Thread Mikhail Khludnev
ne9. > > But the "DirectPostingFormat" is still in Lucene9. > > Could anyone help me to understand how to replace the DirectDocValueFormat > in Lucene9? > > Thanks > Regards > MyCoy > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-29 Thread Mikhail Khludnev
-org.apache.lucene.index.IndexReader-java.lang.String- On Sun, Jan 29, 2023 at 2:08 PM Saha, Rajib wrote: > Hi Mikhail, > > Thanks for the reference link. > It really helped me. > > For one of my requirements, I need to extract all the Terms in an > IndexReader. > I was trying the refere

Re: Question for SynonymQuery

2023-01-27 Thread Mikhail Khludnev
Right. SynonymMap.html#WORD_SEPARATOR <https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymMap.html#WORD_SEPARATOR> was a redundant complication. Spaces work fine. On Thu, Jan 19, 2023 at 4:26 AM Anh Dũng Bùi wrote: > Thanks Mikhail! > >

Re: Question for SynonymQuery

2023-01-27 Thread Mikhail Khludnev
:18 PM _ SATNAM wrote: > Hey Mikhail and Anh Dung Bui > i am also struggling with synonym query > my use case for eg > I created synonyms for word > API --> Application program interface > UI -> user interface > > doc 1 ---> This is API and it is cal

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-18 Thread Mikhail Khludnev
{ > //Some internal function to process the doc. > forEach.process(termDocs.doc()); > } > > } > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help of example of Lucene use.

2023-01-04 Thread Mikhail Khludnev
> > Currently I badly need some examples of using TokenStream, > tokenAttributes, *Filter. > I need to replace the uses of "Token". > > Could somebody please help me with it? > > Regards > Rajib > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2023-01-02 Thread Mikhail Khludnev
------ > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2023-01-01 Thread Mikhail Khludnev
are computed? As I understand SynonymWeight > will > > > consider all terms as exactly the same while BooleanQuery will favor > the > > > documents with more matched terms. > > > - Is it worth it to support multi-term synonyms in SynonymQuery? My > > feeling > > > is that it's better to just use BooleanQuery in those cases, since to > > > support multi-term synonyms it needs to accept a list of Query, which > > would > > > make it behave like a BooleanQuery. Also how scoring works with > > multi-term > > > is another problem. > > > > > > Thanks & Regards! > > > > > > > > > - > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2022-12-28 Thread Mikhail Khludnev
se BooleanQuery in those cases, since to > support multi-term synonyms it needs to accept a list of Query, which would > make it behave like a BooleanQuery. Also how scoring works with multi-term > is another problem. > > Thanks & Regards! > -- Sincerely yours Mikhail Khludnev

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Thread Mikhail Khludnev
tectorOp.java#L39 > ) at production scale and discovered really bad performance during certain > conditions which I attribute to this unnecessary synching. I suspect this > may have impacted others as well > https://stackoverflow.com/questions/42960569/indexing-taking-long-time-when-using-opennlp-lemmatizer-with-solr > > Many thanks, > > Luke Kot-Zaniewski > > > -- Sincerely yours Mikhail Khludnev

Re: Efficient sort on SortedDocValues

2022-11-07 Thread Mikhail Khludnev
We may have dozens > of such fields in our index, thus there isn't any one field that can be > used to sort the index. So I guess my question is whether what I am trying to > achieve is possible? I tried to look through the Solr codebase, but so far > couldn't come up with anything. Code example is here > https://pastebin.com/i05E2wZy . I am using 9.4.1. Thanks in advance. > > Andrei > > -- Sincerely yours Mikhail Khludnev

Re: Multi-segments and HNSW

2022-11-02 Thread Mikhail Khludnev
> real impact on the retrieving quality and performance. > > I'm wondering if there is any best practice, e.g. how many docs should be > in a single graph? > Or does anyone have some production experience to share? > > Thanks & Regards > MyCoy > -- Sincerely yours Mikhail Khludnev

Re: Help to understand the per-field formats in Lucene

2022-10-25 Thread Mikhail Khludnev
example, I've studied the "KnnVectors" a little. > The "PerFieldKnnVectorsFormat.FieldsWriter" actually uses the > "Lucene94HnswVectorsFormat". > But why do we have this kind of structure? > > Thanks & Regards > > MyCoy > -- Sincerely yours Mikhail Khludnev

Re: Lucene Suggester APIs question

2022-08-14 Thread Mikhail Khludnev
question about lucene suggester APIs. If I build multiple FSTs > using a suggester, is there a way to merge two generated FSTs? > > -- > > Nitish Jain > -- Sincerely yours Mikhail Khludnev

Re: Unclear on what position means

2022-07-21 Thread Mikhail Khludnev
ment, outside of > Lucene? > > Kendall > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Sincerely yours Mikhail Khludnev

Re: Filter and FilteredQuery replacements

2022-07-11 Thread Mikhail Khludnev
> instances representing all of the Lucene Doc IDs in the index, with > the bits turned on for those documents we want to be included in search > results. > > If this has already been answered in a forum post, I apologize. Or if > there's a Lucene specific forum somewhere I could look at, if you could > kindly point me there, I would appreciate it. > > Any help/insight is greatly appreciated. > > Thanks, > Scott Robey > -- Sincerely yours Mikhail Khludnev

Re: Lucene Disable scoring

2022-07-11 Thread Mikhail Khludnev
verhead of function calls can cause delay. > As a result I'm looking for a trick to skip the function call and have > no scoring at all on my whole query > > Is it possible to skip this step? > > thanks a million > -- Sincerely yours Mikhail Khludnev

Re: Question about Benchmark

2022-05-16 Thread Mikhail Khludnev
xisting index for search? Also, is there a way to configure the > benchmark to use multiple threads for indexing (looks to me that it’s a > single-threaded indexing)? > > --Regards, > Balmukund > -- Sincerely yours Mikhail Khludnev

ANN search current state

2020-07-15 Thread Mikhail
/browse/LUCENE-9136 , https://issues.apache.org/jira/browse/LUCENE-9322 . I see that there is some related work and related PRs. What is the current state of this functionality? -- Thanks, Mikhail

Re: About custom score using Solr8/Lucene8

2020-07-02 Thread Mikhail Khludnev
--- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > -- > Vincenzo D'Amore > -- Sincerely yours Mikhail Khludnev

Re: Retrieving query-time join fromQuery hits

2020-06-08 Thread Mikhail Khludnev
> implications of going down this route, especially when dealing with large > result sets. > > @Mikhail: Thanks for the suggestion! I actually hadn't thought of that. > Could you please provide more details on how we could approach the problem > from this angle? > > Tha

Re: Retrieving query-time join fromQuery hits

2020-06-03 Thread Mikhail Khludnev
i > > [1] > > https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html > [2] > > https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html > [3] https://issues.apache.org/jira/browse/LUCENE-3602 > -- Sincerely yours Mikhail Khludnev

Re: Autocompletion based on one field in index

2020-03-03 Thread Mikhail Khludnev
to achieve > this? > > > Regards > Kumaran R > -- Sincerely yours Mikhail Khludnev

Re: How to tell Lucene index search to stop when it takes too long

2020-02-27 Thread Mikhail Khludnev
Pass TopDocsCollector as the first arg into TimeLimitingCollector. On Thu, Feb 27, 2020 at 2:31 PM wrote: > Hi,- > > Sometimes the search takes too long even with PhraseWildcardQuery, so i > would like to limit the search time via TimeLimitingCollector API. > > > Thank

Re: How to tell Lucene index search to stop when it takes too long

2020-02-24 Thread Mikhail Khludnev
> Is there such an API or a plan to implement one? > > > Best regards > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser performance question

2020-02-13 Thread Mikhail Khludnev
> There is none. > Best regards > > On 2/4/20 11:14 AM, baris.ka...@oracle.com wrote: > > > > Thanks but I thought this class would have a mechanism to fix this issue. > > Thanks > > > >> On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote: > >> >

Re: ComplexPhraseQueryParser performance question

2020-02-04 Thread Mikhail Khludnev
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: java-user-h...@lucene.apache.org > >> > > > > --------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Sincerely yours Mikhail Khludnev

Re: Can Lucene be used as Rules Engine?

2020-01-22 Thread Mikhail Khludnev
't use a fixed number of fields to > query on. Even if there is a fixed number of fields, the query has to check > for each field to match at least one word. > > Is it possible to handle this requirement using Lucene? Or should I go for > other options? > > I am new to Lucene, any help would be appreciated. > > > > Thanks, > > Kart > > -- Sincerely yours Mikhail Khludnev

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread Mikhail Khludnev
> > > > I use SmartChineseAnalyzer to do the indexing, and add a document with > a > > > TextField whose value is a long sentence which, when analyzed, yields 18 > > > terms. > > > > > > Then I use the same value to construct a PhraseQuery, setting slop to > > 2, > > > and adding the 18 terms consecutively... > > > > > > I expect the search API to find this document, but it returns nothing. > > > > > > Where am I wrong? > > > > > > > > > -- > > Adrien > > > -- Sincerely yours Mikhail Khludnev

Re: Needs advice on auto-keyword-correction mode custom query

2020-01-06 Thread Mikhail Khludnev
, How can Lucene's Query API become high-order composable? Lucene's > "LeafContext" concept is really very confusing me... > -- Sincerely yours Mikhail Khludnev

Re: Question abount combining InvertedIndex and SortField

2019-12-31 Thread Mikhail Khludnev
o reduce memory footprint by storing only top candidate results in a binary heap. IIRC it's described in this classic paper http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf -- Sincerely yours Mikhail Khludnev
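The heap idea mentioned above can be sketched in plain Java: while scanning candidates, keep only the current top n in a min-heap, so memory stays proportional to n rather than to the number of hits. This is a minimal illustration under assumed names (`TopN`, `largest`); it is not Lucene's collector code and not the algorithm from the linked paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper showing bounded top-n selection with a binary heap.
final class TopN {

    // Returns the n largest values in descending order.
    // The heap never holds more than n elements.
    static List<Integer> largest(int[] values, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest of the kept candidates sits on top.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                // The new value beats the weakest kept candidate, so replace it.
                heap.poll();
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

Each candidate costs at most O(log n) heap work, and values that cannot enter the top n are rejected after a single comparison against the heap's smallest element.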

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

2019-12-27 Thread Mikhail Khludnev
ese doubts. I like to quote this talk https://www.youtube.com/watch?v=T5RmMNDR5XI > > On Fri, Dec 27, 2019 at 5:05 PM, Mikhail Khludnev wrote: > > > Hello, > > It's by design: StringFields are searchable and filled by analysis > output, > > StoredFields are returned input value

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

2019-12-27 Thread Mikhail Khludnev
ll){String term > = byteRef.utf8ToString();terms.add(term);} > } catch (IOException e) {e.printStackTrace(); > log.error(e.getMessage(), e);}* > > To my surprise, terms seems to return only the STORED value, which is the > original value form, but I expected them to be the terms I put in each > StringField! > > Is this a design oversight or an implementation limit? > -- Sincerely yours Mikhail Khludnev

Re: How can i specify a custom Analyzer for a Field of Document?

2019-12-09 Thread Mikhail Khludnev
> I have a document set; most fields to index are plain text, suited for a > StandardAnalyzer or a SmartChineseAnalyzer. But the problem is, I have a > special field which is a KeywordList type, like "A;B;C", for which I hope I can > fully control the analysis step. > > How to do this in Lucene? > -- Sincerely yours Mikhail Khludnev

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
s my conditions: > 1) Uses a StandardAnalyzer > 2) Does the actual query.toString() return lowercase J and S > > David Shifflett > > > On 10/22/19, 10:44 AM, "Mikhail Khludnev" wrote: > > On Tue, Oct 22, 2019 at 5:26 PM Shifflett, David [USA] <

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
On Tue, Oct 22, 2019 at 5:26 PM Shifflett, David [USA] < shifflett_da...@bah.com> wrote: > Mikhail, > > Thanks for running those tests. > I haven’t looked into the test, but can you confirm it uses an analyzer > with the lowercase filter? > Look at his diff. It

Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
"~2 > Type of query : ComplexPhraseQuery > > If I change teststr to "\"Foo Bar\"" > I get > Query : "Foo Bar" > Type of query : ComplexPhraseQuery > > If I change teststr to "Foo Bar" > I get > Query : content:foo content:bar > Type of query : BooleanQuery > > > In the first two cases I was expecting the search terms to be switched to > lowercase. > > Were the Foo and Bar left as originally specified because the terms are > inside double quotes? > > How can I specify a search term that I want treated as a Phrase, > but also have the query parser apply the LowerCaseFilter? > > I am hoping to avoid the need to handle this using PhraseQuery, > and continue to use the QueryParser. > > > Thanks in advance for any help you can give me, > David Shifflett > > -- Sincerely yours Mikhail Khludnev

Re: Lucene one to many query

2019-09-21 Thread Mikhail Khludnev
> > > > > > > -- > Sent from: > https://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Sincerely yours Mikhail Khludnev

Re: Adding and Removing Facet Entries

2019-08-28 Thread Mikhail Khludnev
I'm essentially looking for something similar to `add-distinct` > and `remove` from Solr's atomic updates functionality, just directly in > Lucene. > -- Sincerely yours Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

2019-07-05 Thread Mikhail Khludnev
> parents, I mean, it is already required to be the last document in the > block, why do we need to provide a query for them? > > > > > > > On July 3, 2019 at 10:52 AM ANDREI SOLODIN > wrote: > > > > > > > > > > > > Thanks Mikhai

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

2019-07-03 Thread Mikhail Khludnev
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN wrote: > > This returns "id3", which is unexpected. > > Please check ToPBJQ javadoc. It's absolutely expected. -- Sincerely yours Mikhail Khludnev

Re: block min-max values for Sort Field with Top-N query..

2019-07-02 Thread Mikhail Khludnev
& won't work for multi-sort field queries or out-of-order scoring etc.. > > But, in general will this be a good idea to explore or something that is > best not attempted? > > Any help is much appreciated > > -- > Ravi > -- Sincerely yours Mikhail Khludnev

Re: About custom score using Solr8/Lucene8

2019-05-08 Thread Mikhail Khludnev
example, at least to understand how to start a minimal basic > project? > > Thanks > > -- Sincerely yours Mikhail Khludnev

Re: position-anchored queries

2019-03-21 Thread Mikhail Khludnev
are not any subsequent terms in the field? > > -Mike > -- Sincerely yours Mikhail Khludnev

Re: How can I use FunctionScoreQuery to replace CustomScoreQuery?

2019-01-29 Thread Mikhail Khludnev
base query (in the worst case it's MatchAllDocsQuery) and a custom DoubleValuesSource by calling FunctionScoreQuery.boostByValue(Query, DoubleValuesSource). On Sun, Jan 27, 2019 at 9:34 PM MarcoR wrote: > Thanks Mikhail, > > I'm afraid I don't understand your sugge

Re: How can I use FunctionScoreQuery to replace CustomScoreQuery?

2019-01-26 Thread Mikhail Khludnev
e query type, > but I'm stuck. > > -- Sincerely yours Mikhail Khludnev

Re: Camel case search with Lucene

2018-10-04 Thread Mikhail Khludnev
e, search "redHotChilly" > instead of "red hot chilly" - you should use your own pattern tokenizer to > divide the query by a regex pattern. > > Regards > Vadim Gindin > > On Thu, Oct 4, 2018 at 11:58 AM Gordin, Ira wrote: > > > Hi friends, > > > > How can I implement Camel case search with Lucene? > > > > Thanks, > > Ira > > > > > > > -- Sincerely yours Mikhail Khludnev
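
The regex-based splitting suggested in the quoted reply can be sketched in plain Java. This is only an illustration of the idea, not code from the thread: the class name `CamelCaseSplitter`, the method `split`, and the boundary pattern are assumptions made here, and the same pattern would still have to be wired into whatever tokenizer the analyzer uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Lucene): splits camelCase text into lowercase words.
public class CamelCaseSplitter {

    // Zero-width boundary: a lowercase letter or digit on the left, an uppercase letter on the right.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[a-z0-9])(?=[A-Z])");

    public static List<String> split(String text) {
        return Arrays.stream(BOUNDARY.split(text))
                .map(part -> part.toLowerCase(Locale.ROOT)) // normalize case so "Hot" matches "hot"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(split("redHotChilly")); // prints [red, hot, chilly]
    }
}
```

Applying the same split at both index time and query time would let "redHotChilly" and "red hot chilly" produce the same terms. Runs of capitals such as "HTTPServer" are deliberately left unsplit by this particular pattern.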

Re: Question About FST, multiple-column index

2018-09-21 Thread Mikhail Khludnev
there any > Combined Index structure like multiple-column indexes in MySQL? Is > there any solution that extends the FST so that the FINAL state connects to > another FST? > > > THANKS -- Sincerely yours Mikhail Khludnev

Re: How to access DocValues inside a customized collector?

2018-09-21 Thread Mikhail Khludnev
ave a way to see directly indexed data (Luke seems obsolete, > Marple does not work with lucene 7.4.0 yet)? > > Thanks very much for the help, Lisheng > -- Sincerely yours Mikhail Khludnev

Re: Lucene API to retrieve matched words

2018-09-06 Thread Mikhail Khludnev
highlighting, just a list of the words. So if > I > search for 'ski' and I match on 'skier' and 'skiis', I would like to get > back a list that includes 'skier' and 'skiis'. > > Is there an API call that provides this? > > > > Thanks > > Mike > > -- Sincerely yours Mikhail Khludnev

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
I mean, you'd need offsets rather than positions, but I don't have anything definite to suggest. On Tue, Jun 26, 2018 at 1:29 PM Gordin, Ira wrote: > Hello Mikhail, > > I see in the link you sent that PositionIncrementAttribute determines the > position of this token re

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
e I will get the 'a' positions in TokenStream. > An additional question: how can I get the line numbers and the positions > inside the line? > Many thanks in advance for your help, > Ira > > -- Sincerely yours Mikhail Khludnev
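
On the line-number question above: a minimal sketch, assuming the character start offset of a match in the original file text is already known (for example, from a token's offset attribute). The class `LineLocator` and its method `locate` are hypothetical names used only for illustration and are not part of Lucene.

```java
// Hypothetical helper: maps a 0-based character offset to a 1-based line and column.
public class LineLocator {

    /** Returns {line, column}, both 1-based, for a 0-based character offset into text. */
    public static int[] locate(String text, int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        int line = 1;
        int lineStart = 0; // index of the first character of the current line
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new int[] { line, offset - lineStart + 1 };
    }

    public static void main(String[] args) {
        int[] pos = locate("int a;\nint b;", 11); // offset 11 is the 'b' on the second line
        System.out.println(pos[0] + ":" + pos[1]); // prints 2:5
    }
}
```

This scans from the start of the text on every call; for many lookups in one file, precomputing an array of line-start offsets and binary-searching it would likely be cheaper.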

Re: Explain flag in CustomQuery

2018-06-25 Thread Mikhail Khludnev
ted that SearchContext would be propagated to a Query, but I didn't > find a way to get it. I only have LeafReaderContext or LeafReader. > Could you advise me? > > Regards, > Vadim Gindin > -- Sincerely yours Mikhail Khludnev

Re: Query in a doc context

2017-12-30 Thread Mikhail Khludnev
> > Apologies if I completely misunderstood but if you are looking to do > a > > > full > > > > doc match, you could duplicate the doc into another field > > that > > > > is a true full text index of the document. > > > > >

Re: Wrong ID in explain() method.

2017-12-29 Thread Mikhail Khludnev
ion. When explain(id) is called it checks the specified id in this > > collection and outputs "matched"/"not matched". > > > > The questions. > > 0. This document is found by the plugin, but explain(id) method takes > > the wrong ID. Why? It happens in the real installation, but in the test > > case - it works fine. > > 1. ID=342 and others come to explain(id) method. Note, it is not a > > document id - it is ID of the nested object (category). Why does it > happen? > > 2. I have a test case, based on ESIntegTestCase. It works fine with this > > document. But this document is not found in the real index. > > > > Regards, > > Vadim Gindin > > > -- Sincerely yours Mikhail Khludnev

Re: Terminology. LeafReader -> TermEnum -> PostingsEnum

2017-12-14 Thread Mikhail Khludnev
ference > between these 20 implementations and which of them can be really useful? > > Regards, > Vadim Gindin > -- Sincerely yours Mikhail Khludnev

Re: Query in a doc context

2017-12-14 Thread Mikhail Khludnev
: what terms are matched to what fields and so on. > > > It seems that BooleanQuery/BooleanScorer is not a good place to accumulate > some information from child Queries/Scorers. > -- Sincerely yours Mikhail Khludnev
