Re: Lucene Query Metrics

2024-12-04 Thread Mikhail Khludnev
Hello, There's nothing like that. Off the top of my head, the closest thing is the profile collector in Elasticsearch. On Wed, Dec 4, 2024 at 11:46 PM ashwini singh wrote: > Does lucene provide extensions (utilities) to extract metrics from Lucene > during the request execution? Or applications can only track execution >

Re: Lucene Query Metrics

2024-12-04 Thread ashwini singh
Does Lucene provide extensions (utilities) to extract metrics from Lucene during the request execution? Or can applications only track execution stats on top of Lucene? On Tue, 3 Dec 2024 at 23:20, Adrien Grand wrote: > Lucene doesn't expose query metrics, it's up to the application that > integr

Re: Lucene Query Metrics

2024-12-03 Thread Adrien Grand
Lucene doesn't expose query metrics; it's up to the application that integrates Lucene to compute and expose metrics that are relevant to them. On Wed, Dec 4, 2024, 00:31, ashwini singh wrote: > Hey everyone, > > Does lucene provide any query metrics (perf) ? I am looking for something > very

Re: Lucene Slack Channel

2024-12-03 Thread ashwini singh
Thanks !! On Wed, 13 Nov 2024 at 13:31, Gus Heck wrote: > The slack channel (named 'lucene-dev') is generally for people building > lucene itself, and not generally for people looking for help providing > solutions using lucene. Typically one gets an apache.org address by > contributing enough

Re: Lucene Slack Channel

2024-11-13 Thread Gus Heck
The slack channel (named 'lucene-dev') is generally for people building lucene itself, and not generally for people looking for help providing solutions using lucene. Typically one gets an apache.org address by contributing enough to an apache project to get invited as a committer. Alternatively, (

Re: Lucene Slack Channel

2024-11-13 Thread Michael Wechner
I think you have to be a committer of at least one Apache project https://infra.apache.org/committer-email.html HTH Michael On 13.11.24 at 22:12, ashwini singh wrote: Thanks . How can I get the apache.org email address ? Is there a policy for that ? On Mon, 4 Nov 2024 at 15:06, Michael Wechn

Re: Lucene Slack Channel

2024-11-13 Thread ashwini singh
Thanks . How can I get the apache.org email address ? Is there a policy for that ? On Mon, 4 Nov 2024 at 15:06, Michael Wechner wrote: > I think one can only join when you have an apache.org email address > > https://infra.apache.org/slack.html > > but maybe I misunderstand the access policy? >

Re: Lucene Slack Channel

2024-11-04 Thread Michael Wechner
I think one can only join when you have an apache.org email address https://infra.apache.org/slack.html but maybe I misunderstand the access policy? Thanks Michael On 04.11.24 at 23:56, ashwini singh wrote: Hi How can I get added to lucene slack channel? I am working on Lucene to build a

Re: lucene build failure on Windows using pylucene 9.7.0

2024-10-21 Thread Gautam Worah
Hi Prashant, From your error it looks like the system is somehow trying to run code compiled with Java 23 (major version 67) but is unable to. Gradle 7.6 only has support for Java 19 and lower. Java 23 support was added in Gradle 8.10+. Try running it with JDK 19 or alternatively, the JDK recomm
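
As a rough illustration of the "Java 23 (major version 67)" remark above: class-file major versions have gone up by one with each Java release, so for Java 5 and later the release number is the major version minus 44. The sketch below is not part of the thread; the function names are hypothetical, and it only reads the fixed 8-byte class-file header.

```python
import struct

# Assumption for illustration: Java 5 has class-file major version 49,
# and every later release adds one, so release = major - 44.
CLASS_FILE_OFFSET = 44
CLASS_FILE_MAGIC = 0xCAFEBABE


def java_release_for_major(major: int) -> int:
    """Map a class-file major version (49 or higher) to a Java release number."""
    if major < 49:
        raise ValueError("mapping below Java 5 uses 1.x version names")
    return major - CLASS_FILE_OFFSET


def major_from_class_bytes(data: bytes) -> int:
    """Read the major version from the first 8 bytes of a .class file.

    Header layout (big-endian): 4-byte magic, 2-byte minor, 2-byte major.
    """
    if len(data) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != CLASS_FILE_MAGIC:
        raise ValueError("not a Java class file")
    return major


if __name__ == "__main__":
    header = bytes.fromhex("cafebabe00000043")  # 0x43 == 67
    print(java_release_for_major(major_from_class_bytes(header)))
```

Running this against the header of the class named in an UnsupportedClassVersionError is one way to confirm which JDK compiled it before changing the Gradle or JDK setup.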

Re: Lucene LRUQueryCache question

2024-07-16 Thread Yixun Xu
This post explains why Lucene doesn't cache all queries: https://www.mail-archive.com/java-user@lucene.apache.org/msg51649.html Your queries could be skipping the cache because of the LRUQueryCache constructor parameters, or because of the QueryCachingPolicy.shouldCache predicate. They probably ha

Re: Lucene Index Writer in a distributed system

2023-10-19 Thread Cody Amen
Zookeeper, right? Look how Zookeeper is used in Solr, but Zookeeper does exactly what you want, I believe. Sent from my iPhone > On Oct 19, 2023, at 3:49 AM, Gopal Sharma wrote: > > Hello Team, > > I am new to Lucene and want to use Lucene in a distributed system to write > in a Amazon EFS i

Re: Lucene Index Writer in a distributed system

2023-10-19 Thread Michael McCandless
Hi Gopal, Indeed, for a single Lucene index, only one writer may be open at a time. Lucene tries to catch you if you mess this up, using file-based locking. If you really need concurrent indexing, you could have N IndexWriters each writing into a private Directory, and then periodically use addIn

Re: Lucene in action

2023-06-10 Thread Michael McCandless
Hi Vimal, Indeed I think it is unlikely I have the energy for a 3rd edition ... but anyone can drive the 3rd edition, not just the prior authors. New authors welcome! > Since 2nd edition ( based on lucene 4), I'm sorry to say that 2nd edition is based on Lucene 3.0 not 4! It's even older than

Re: Lucene in action

2023-06-10 Thread Mark Miller
Nature abhors being anything but an author by name on a second tech book. The ruse is up after one when you have the inputs crystalized and the hourly wage in hand. Hard to find anything but executive producers after that. I’d shoot for a persuasive crowdfunding attempt.

Re: Lucene 9.0.0 inconsistent index options

2023-05-30 Thread Tomás Fernández Löbbe
I have a PR with a test and a possible fix for this: https://github.com/apache/lucene/pull/12326, anyone for review? Tomás On Tue, Dec 14, 2021 at 7:11 AM Ian Lea wrote: > Thanks for the response. > https://issues.apache.org/jira/browse/LUCENE-10314 > > Will we still be able to decide, maybe ye

Re: Lucene Hunpell Spell checker

2023-02-19 Thread Mikhail Khludnev
FYI, from what I saw there, there was a `dictionary gap`, i.e. incomplete dictionary files. Another question always makes me wonder: why is there no hunspell-based suggester or spellchecker in the Lucene codebase? On Fri, Feb 17, 2023 at 11:23 AM Dawid Weiss wrote: > Can't open this repository,

Re: Lucene Hunpell Spell checker

2023-02-17 Thread Dawid Weiss
Can't open this repository, it's probably private. Dawid On Tue, Feb 14, 2023 at 2:42 PM Thanos Agelakpoulos wrote: > > Thanks for the response David ! > > I created a quick repo just to showcase, > https://github.com/aggelako/JavaSpellchecker > In there you can see how im using lucene, in the

RE: Lucene Hunpell Spell checker

2023-02-14 Thread Thanos Agelakpoulos
Thanks for the response David! I created a quick repo just to showcase: https://github.com/aggelako/JavaSpellchecker In there you can see how I'm using Lucene, in the SpellChecker class / the spellCheck function where I'm performing a spellcheck. I have also provided the dicts as resources. You

Re: Lucene Hunpell Spell checker

2023-02-13 Thread Dawid Weiss
It'd be good if you could share the problematic scenario as a piece of code (ideally a forked Lucene repository, with a test case?) so that we can take a look. There's been a ton of improvements to hunspell packages in Lucene 9 (and on the main branch) - you should take a look and perhaps take some

RE: Lucene Hunpell Spell checker

2023-02-13 Thread Thanos Agelakpoulos
*here

Re: Lucene 4.10.4 forward slash syntax error

2022-11-28 Thread Younes Bahloul
Thank you, escaping with a backslash does work. I hope this now gets put in the archive so anyone having this question in the future will find the answer. -- Kind regards, Younes Bahloul Junior Engineer On Mon, 28 Nov 2022 at 16:23, Michael Sokolov wrote: > Have you tried escaping with a backslas

Re: Lucene 4.10.4 forward slash syntax error

2022-11-28 Thread Michael Sokolov
Have you tried escaping with a backslash? I have a vague memory that might work. As for modifying classes in 4.10.4, you are welcome to do so in a custom fork, but that version is so old that we no longer post fixes for it on the official Apache release branches. The current release series is 9.x -

Re: Lucene V8 Support

2022-09-15 Thread Mike Drob
Hi Fergal, You should not expect much support on version 8 going forward. It will probably get critical security releases and not much else. Mike On Thu, Sep 15, 2022 at 8:31 AM Fergal Gavin wrote: > Hi there, > > We are a user of the Lucene core library in our product. > > With the release of

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Dawid Weiss
Yeah, no problem. It's dumb that this is needed - thanks for reporting and sorry for not having more faith in what you were saying. I should have known better than believing in computers being predictable. Dawid On Wed, Sep 14, 2022 at 6:40 PM Rahul Goswami wrote: > > Uwe, Dawid, and Robert, > T

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Rahul Goswami
Uwe, Dawid, and Robert, Thank you for the helpful pointers! I do have Visual Studio 2017 on my machine which I don't use much lately. https://github.com/microsoft/vswhere *"vswhere* is included with the installer as of Visual Studio 2017 version 15.2 and later, and can be found at the following lo

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Dawid Weiss
> I have no idea how to fix this. Dawid: Maybe we can also make the > configuration of that native stuff only opt-in? So only detect Visual > Studio when you actively activate native code compilation? It is an opt-in, actually. The problem is: gradle fails on applying the plugin - even if the task

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Robert Muir
I opened an issue with one idea of how we can fix this, for discussion: https://github.com/apache/lucene/issues/11772 On Wed, Sep 14, 2022 at 11:27 AM Uwe Schindler wrote: > > Hi, > > do you have Microsoft Visual Studio installed? It looks like Gradle > tries to detect it and fails with some Null

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Uwe Schindler
Hi, do you have Microsoft Visual Studio installed? It looks like Gradle tries to detect it and fails with some NullPointerException while parsing a JSON file from its installation. The misc module contains some (optional) native code that will get compiled (optionally) with Visual C++. It loo

Re: Lucene 9.2.0 build fails on Windows

2022-09-13 Thread Dawid Weiss
It is a bug in gradle. If you look at the stack trace, it's clearly just happily logging a missing output and returns null: https://github.com/gradle/gradle/blob/v7.3.3/subprojects/platform-native/src/main/java/org/gradle/nativeplatform/toolchain/internal/msvcpp/version/CommandLineToolVersionLocat

Re: Lucene 9.2.0 build fails on Windows

2022-09-13 Thread Robert Muir
Looks to me like a gradle bug, detecting and trying to run some visual studio command (vswhere.exe) elsewhere on your system, and it does the wrong thing parsing its output. On Tue, Sep 13, 2022 at 3:00 PM Rahul Goswami wrote: > > Hi Dawid, > I believe you. Just that for some reason I have never

Re: Lucene 9.2.0 build fails on Windows

2022-09-13 Thread Rahul Goswami
Hi Dawid, I believe you. Just that for some reason I have never been able to get it to work on Windows. Also, being a complete newbie to gradle doesn't help much. So would appreciate some help on this while I find my footing. Here is the link to the diagnostics that you requested (since attachments

Re: Lucene 9.2.0 build fails on Windows

2022-09-13 Thread Dawid Weiss
Hi Rahul, Well, that's weird. > "releases/lucene/9.2.0" -> Run "gradlew help" > > If you need additional stacktrace or other diagnostics I am happy to > provide the same. Could you do the following: 1) run: git --version so that we're on the same page as to what the git version is (I don't thi

Re: Lucene 9.2.0 build fails on Windows

2022-09-13 Thread Rahul Goswami
Hi Dawid, I tried with Gitbash only after "gradlew help" failed on cmd. Just now tried Powershell as well and get the exact same error message. The steps I performed were, clone the repo -> create a branch from tag "releases/lucene/9.2.0" -> Run "gradlew help" If you need additional stacktrace or

Re: Lucene 9.2.0 build fails on Windows

2022-09-13 Thread Dawid Weiss
It does work just fine. Use cmd or powershell though. I don't think things are even tested with cygwin/msys. Dawid On Tue, Sep 13, 2022 at 4:55 AM Rahul Goswami wrote: > > Hello, > I am using gitbash to build lucene 9.2.0 on Windows. I checked out the > release/lucene/9.2.0 tag and tried running

Re: Lucene Suggester APIs question

2022-08-20 Thread Dawid Weiss
Yes, you need to build a third FST. You can build a merging iterator that will combine two or more FST traversal streams so that they're in order and then build a merged FST directly, with no extra sorting cost. https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/util/fst/Builder.html#add-
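
The merging-iterator idea above can be sketched outside Lucene. This is a minimal illustration, assuming each traversal stream yields (key, weight) pairs already in sorted key order; the function name and the keep-the-larger-weight rule for duplicate keys are assumptions made here, not something the thread specifies. Python compares strings by code point, whereas an FST orders inputs by their encoded form, so real code would have to compare keys the same way the FST does.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_entries(*streams):
    """Lazily merge already-sorted (key, weight) streams into one sorted stream.

    heapq.merge never materializes or re-sorts the inputs; it only compares
    the current head of each stream, so the merged order comes at no extra
    sorting cost.
    """
    merged = heapq.merge(*streams, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        # Hypothetical duplicate policy: keep the largest weight per key.
        yield key, max(weight for _, weight in group)


if __name__ == "__main__":
    first = [("apple", 3), ("cherry", 1)]
    second = [("apple", 5), ("banana", 2)]
    for key, weight in merge_sorted_entries(first, second):
        # Each pair arrives in sorted key order, which is the order an
        # FST builder expects its inputs to be added in.
        print(key, weight)
```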

Re: Lucene Suggester APIs question

2022-08-14 Thread Mikhail Khludnev
Hello Nitish. What about https://lucene.apache.org/core/7_2_1/core/org/apache/lucene/util/automaton/Operations.html#union-org.apache.lucene.util.automaton.Automaton-org.apache.lucene.util.automaton.Automaton- ? On Mon, Aug 15, 2022 at 4:42 AM Nitish Jain wrote: > Hi, > > I have a question about

Re: Lucene 9.1.0 has changed name of lucene-analysis-common-9.1.0.jar

2022-07-27 Thread Dawid Weiss
This change was intentional to make it consistent with package naming, Dawid On Tue, Jul 26, 2022 at 10:34 PM Baris Kazar wrote: > Dear Folks,- > I see that Lucene has changed one of the JAR files' name to > lucene-analysis-common-9.1.0.jar in Lucene version 9.1.0. > It used to use analyzers.

Re: Lucene Disable scoring

2022-07-11 Thread Adrien Grand
Note that Lucene automatically disables scoring already when scores are not needed. E.g. queries that compute the top-k hits by score will definitely compute scores, but if you are just counting the number of matches of a query or aggregations, then Lucene skips scoring entirely already. Is there

Re: Lucene Disable scoring

2022-07-11 Thread Mikhail Khludnev
I'd rather agree with Uwe, but you can plug BooleanSimilarity just to check it out. On Mon, Jul 11, 2022 at 6:01 PM Mohammad Kasaei wrote: > Hello > > I have a question. Is it possible to completely disable scoring in lucene? > > Detailed description: > I have an index in elasticsearch and it co

Re: Lucene Disable scoring

2022-07-11 Thread Uwe Schindler
No, that's the only way to do it. The function call does not add overhead because it is optimized away by the runtime. Uwe On 10.07.2022 at 11:34, Mohammad Kasaei wrote: Hello I have a question. Is it possible to completely disable scoring in lucene? Detailed description: I have an index i

Re: Lucene 6.5.1 source code

2022-02-01 Thread Adrien Grand
You can find the 6.5.1 source code on the old lucene-solr repository: https://github.com/apache/lucene-solr/tree/releases/lucene-solr%2F6.5.1 On Tue, Feb 1, 2022 at 2:54 PM Omri wrote: > > It seems that the old versions branches in github were deleted. > There is a way to see Lucene 6.5.1 source

Re: Lucene 9.0.0 inconsistent index options

2021-12-14 Thread Ian Lea
Thanks for the response. https://issues.apache.org/jira/browse/LUCENE-10314 Will we still be able to decide, maybe years down the line, that we do want to search on fieldX after all, and be able to change the code and reindex the maybe small proportion of documents that have a value for fieldX wit

Re: Lucene 9.0.0 inconsistent index options

2021-12-14 Thread Michael Sokolov
Strictly speaking, we could have opened an older index using Lucene 8 (say one that was created using Lucene 7, or 6) that would no longer be valid in Lucene 9, at least according to the policy? I agree we should try to fix this, just want to clarify the policy On Tue, Dec 14, 2021 at 8:54 AM Adri

Re: Lucene 9.0.0 inconsistent index options

2021-12-14 Thread Adrien Grand
This looks related to the new changes around schema validation. Lucene now requires a field to either be absent from a document or be indexed with the exact same options (index options, points dimensions, norms, doc values type, etc.) as already indexed documents that also have this field. However

RE: lucene 4.10.4 punctuation

2021-08-26 Thread Trevor Nicholls
return tokens; } } Cheers T -Original Message- From: Younes Bahloul Sent: Thursday, 26 August 2021 22:07 To: java-user@lucene.apache.org Subject: Re: lucene 4.10.4 punctuation Hi thanks for getting back to me so quickly So to give some context, there are two things we would lik

Re: lucene 4.10.4 punctuation

2021-08-26 Thread Younes Bahloul
Hi thanks for getting back to me so quickly So to give some context, there are two things we would like to be able to do: 1. We want to have the option to be able to search on terms that include punctuation. So for example, if we have the two texts: "they sent an S.O.S from", and "she wrote SOS, b

RE: lucene 4.10.4 punctuation

2021-08-25 Thread Uwe Schindler
Hi, you should explain to us what exactly you want to do: How do you want to search, and what do your documents look like? Why is it important to match on punctuation, and what should this matching look like? Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@t

Re: Lucene cpu utilization & scoring

2021-08-20 Thread Varun Sharma
Thanks, Michael. It's good to know that scorers are also doing matching. I will check and verify whether the scores returned are 0 or not. Just to give some background, we have two setups: a) Old setup - Each machine serves a single lucene index which has roughly 30'ish segments with realtime updat

Re: Lucene cpu utilization & scoring

2021-08-20 Thread Michael Sokolov
I think the usual usage pattern is to *refresh* frequently and commit less frequently. Is there a reason you need to commit often? You may also have overlooked this newish method: MergePolicy.findFullFlushMerges If you implement that, you can tell IndexWriter to (for example) merge multiple small

Re: lucene execute ./gradlew precommit and ./gradlew test failed

2021-08-17 Thread Dawid Weiss
Your files have crlf line endings - your git is set up to convert files to Windows convention and this isn't supported by Lucene. You have to set git to clone exactly the same binary content as present in the repository, otherwise checksums won't match (this is intentional). You can always compare
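
To see why a line-ending conversion breaks a checksum comparison, here is a small sketch. It uses SHA-1 purely for illustration; the thread does not say which checksum the Lucene build computes, and the helper names are hypothetical.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex digest of the raw bytes, exactly as they sit on disk."""
    return hashlib.sha1(data).hexdigest()


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF, similar to what a git checkout with
    line-ending conversion enabled does to text files."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    as_committed = b"line one\nline two\n"
    as_checked_out = to_crlf(as_committed)
    # Same visible text, different bytes, therefore different digests.
    print(sha1_hex(as_committed) == sha1_hex(as_checked_out))
```

Any checksum taken over raw bytes behaves the same way, which is why the checkout has to be byte-identical to the repository content.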

Re: lucene execute ./gradlew precommit and ./gradlew test failed

2021-08-16 Thread Dawid Weiss
What do these two commands say? git config autocrlf git config --global autocrlf Also, can you zip and send me those two offending files from your local checkout (dawid.we...@gmail.com)? Dawid On Tue, Aug 17, 2021 at 4:09 AM wuda wrote: > I have cloned a completely new branch "main", and the o

Re: lucene execute ./gradlew precommit and ./gradlew test failed

2021-08-16 Thread Dawid Weiss
I'm sorry, I was out of office. I can't see that attachment you posted. If it's still a problem, can you copy-paste what you see on the console once you issue "git status"? Dawid On Wed, Aug 4, 2021 at 1:07 PM Da Wu wrote: > > I have executed like this. > > Dawid Weiss wrote on Mon, Aug 2, 2021 at 9:07 PM: >>

Re: lucene execute ./gradlew precommit and ./gradlew test failed

2021-08-04 Thread Da Wu
I have executed like this. [image: 1628075031(1).png] Dawid Weiss wrote on Mon, Aug 2, 2021 at 9:07 PM: > What does "git status" say? The hashes of generated files are not what > they're supposed to be - either something has changed them or you have > a git configuration that replaces something on the fly (lin

Re: lucene execute ./gradlew precommit and ./gradlew test failed

2021-08-02 Thread Dawid Weiss
What does "git status" say? The hashes of generated files are not what they're supposed to be - either something has changed them or you have a git configuration that replaces something on the fly (line endings, most likely). Dawid On Wed, Jul 28, 2021 at 9:55 AM Da Wu wrote: > > i want to contr

Re: Lucene/Solr and BERT

2021-05-27 Thread Julie Tibshirani
Your summary sounds right to me. There are some ideas (being discussed on the issue), but I don't think we have a detailed understanding yet of the performance difference. It would be great to get more eyes on the benchmark if you're interested in double-checking the results. Mike mentioned that h

Re: Lucene/Solr and BERT

2021-05-27 Thread Michael Wechner
Thank you very much for having done these benchmarks! IIUC one could state - Indexing: Lucene is slower than hnswlib/C++, very roughly 10x performance difference - Searching (Queries per second): Lucene is slower than hnswlib/C++, very roughly 8x performance difference right, bu

Re: Lucene/Solr and BERT

2021-05-26 Thread Julie Tibshirani
These JIRA issues contain results against two ann-benchmarks datasets. It'd be great to get your thoughts/ feedback if you have any: * Searching: https://issues.apache.org/jira/browse/LUCENE-9937 * Indexing: https://issues.apache.org/jira/browse/LUCENE-9941 The benchmarks are based on the setup he

Re: Lucene/Solr and BERT

2021-05-26 Thread Alex K
Thanks Michael. IIRC, the thing that was taking so long was merging into a single segment. Is there already benchmarking code for HNSW available somewhere? I feel like I remember someone posting benchmarking results on one of the Jira tickets. Thanks, Alex On Wed, May 26, 2021 at 3:41 PM Michael

Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Sokolov
This java implementation will be slower than the C implementation. I believe the algorithm is essentially the same, however this is new and there may be bugs! I (and I think Julie had similar results IIRC) measured something like 8x slower than hnswlib (using ann-benchmarks). It is also surprising

Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Wechner
Hi Alex Thank you very much for your feedback and the various insights! On 26.05.21 at 04:41, Alex K wrote: Hi Michael and others, Sorry just now getting back to you. For your three original questions: - Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a thorough response. -

Re: Lucene/Solr and BERT

2021-05-25 Thread Alex K
>>>> > >> > https://opendistro.github.io/for-elasticsearch/blog/odfe-updates/2020/04/Building-k-Nearest-Neighbor-(k-NN)-Similarity-Search-Engine-with-Elasticsearch/ > >>> ? > >>>> They are however available in the snapshot releases. I started on a >

Re: Lucene/Solr and BERT

2021-05-24 Thread Michael Wechner
ething wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner < michael.wech...@wyona.com> wrote: Hi I recently found the following articles re Lucene/Solr and BERT https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28 https://medium.com/swlh/fun-with-apach

Re: Lucene/Solr and BERT

2021-05-23 Thread Michael Wechner
seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucene/Solr and BERT https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28 https://medi

Re: Lucene/Solr and BERT

2021-05-23 Thread Russell Jurney
> Is there still something missing? Or what would be the next steps? > > > > Thanks > > > > Michael > > > > > > > Here's the code: > > > https://github.com/alexklibisz/ann-benchmarks-lucene. There are some > test > > > suites

Re: Lucene/Solr and BERT

2021-05-23 Thread Michael Sokolov
> https://github.com/alexklibisz/ann-benchmarks-lucene. There are some test > > suites that index and search Glove vectors. My first impression was that > > indexing seems surprisingly slow, but it's entirely possible I'm doing > > something wrong. > > > > On We

Re: Lucene/Solr and BERT

2021-05-19 Thread Michael Wechner
uites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucene/Solr and BERT

Re: Lucene 8 causing app server threads to hang due to high rate of network usage

2021-04-28 Thread ANDREI SOLODIN
Hello, I am also keenly interested in Lucene performance on NFS/EFS. I have extensive experience with another (proprietary) search engine successfully using NFS for indexing/search. In our case, the key has always been making sure that a large portion of the index is in the host page cache,

Re: Lucene 8 causing app server threads to hang due to high rate of network usage

2021-04-28 Thread Robert Muir
Don't use filesystems such as NFS (that is what EFS is) with lucene! This is really bad design, and it is the root cause of your issue. On Tue, Apr 27, 2021 at 1:21 PM Hilston, Kathleen < kathleen.hils...@snapon.com> wrote: > Hello, > > > > My name is Kathleen Hilston, and I am a Software Enginee

Re: Lucene Explanation

2021-04-23 Thread Puneeth Bikkumanla
Thank you this was very helpful! On Mon, Apr 12, 2021 at 9:07 AM Michael Sokolov wrote: > You might want to check out > https://issues.apache.org/jira/browse/LUCENE-8019 where I tried to > implement some debugging utilities on top of Explain. It never got > committed, but it does explore some of

Re: Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
e are some test suites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucen

Re: Lucene/Solr and BERT

2021-04-21 Thread Alex K
doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: > Hi > > I recently found the following articles re Lucene/Solr and BERT > > https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28 > > https://medium.com/swlh/fun-with-apache-luc

Re: Lucene Explanation

2021-04-12 Thread Michael Sokolov
You might want to check out https://issues.apache.org/jira/browse/LUCENE-8019 where I tried to implement some debugging utilities on top of Explain. It never got committed, but it does explore some of the challenges around introducing a more structured explain response. On Fri, Apr 9, 2021 at 6:40

Re: Lucene custom scoring / analyzer

2021-03-17 Thread Charlie Hull
I think you'll need a SpanQuery with the inOrder flag set: https://lucene.apache.org/core/8_8_1/core/org/apache/lucene/search/spans/SpanNearQuery.html Charlie On 17/03/2021 10:30, Vlad Smirnovskiy wrote: Hello! I`d like to do something like that: When I add a document and some text is going wi

Re: Lucene 8.7 error searching an index created with 8.3

2020-12-22 Thread Nicolás Lichtmaier
I'd like to add that if I enable assertions I get a stack trace like this: java.lang.AssertionError     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:903)     at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePo

Re: Lucene 8.7 error searching an index created with 8.3

2020-11-24 Thread Nicolás Lichtmaier
This is reproducible only within our product, I haven't yet been able to isolate this and reproduce it standalone. It's Java 11. Yes, I've run CheckIndex with the "-slow" option and with assertions enabled. On 24/11/20 at 11:32, Adrien Grand wrote: This is related to phrase matching ind

Re: Lucene 8.7 error searching an index created with 8.3

2020-11-24 Thread Adrien Grand
This is related to phrase matching indeed. Positions are stored in blocks of 128 values, where every block is encoded with a different number of bits per value. And the error you are seeing suggests that one block reports 69 bits per value. The fact that CheckIndex didn't complain is surprising. D
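
The "bits per value" remark above can be made concrete with a simplified sketch. This is not Lucene's actual postings encoder; it only illustrates the general idea of choosing one bit width per block of 128 values, and the names are hypothetical. For 32-bit values the width can never exceed 32, which is why a reported width of 69 points at bad data or a misread block rather than at a legitimately large value.

```python
BLOCK_SIZE = 128   # values per block, as described in the message above
MAX_BITS = 32      # assumption: values fit in 32 bits


def bits_required(block):
    """Smallest bit width that can represent every value in the block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a full block of 128 values")
    if min(block) < 0:
        raise ValueError("values must be non-negative")
    width = max(block).bit_length()   # 0 when every value is zero
    if width > MAX_BITS:
        raise ValueError("value does not fit in 32 bits")
    return width


if __name__ == "__main__":
    small = [3] * BLOCK_SIZE               # every value fits in 2 bits
    large = [0] * 127 + [2**32 - 1]        # one value needs all 32 bits
    print(bits_required(small), bits_required(large))
```

Under these assumptions the only valid widths are 0 through 32, i.e. 33 possibilities.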

Re: Lucene 8.7 error searching an index created with 8.3

2020-11-24 Thread Nicolás Lichtmaier
Lucene 8.7's CheckIndex says there are no errors in the index. On closer inspection this seems related to phrase matching... On 24/11/20 at 05:18, Adrien Grand wrote: Can you run CheckIndex on your index to make sure it is not corrupt? On Tue, Nov 24, 2020 at 1:01 AM Nicolás Lichtmaier

Re: Lucene 8.7 error searching an index created with 8.3

2020-11-24 Thread Adrien Grand
Can you run CheckIndex on your index to make sure it is not corrupt? On Tue, Nov 24, 2020 at 1:01 AM Nicolás Lichtmaier wrote: > I'm seeing errors like this one (using backwards codecs): > > java.lang.ArrayIndexOutOfBoundsException: Index 69 out of bounds for > length 33 > at > org.apache.l

Re: Lucene Migration Query

2020-11-22 Thread Erick Erickson
If you created your index with 7x, you don’t need to do anything, 8x will be able to operate with it. If you ever used 6x to index any docs you must reindex completely by deleting the entire index and starting over, or index to a new collection and use collection aliasing to seamlessly switch.

Re: Lucene Migration query

2020-11-20 Thread Michael Sokolov
Ah, sorry for the misdirection, thanks for the correction, Erick. That does jibe with what I now remember having heard before. I guess we reserve the right to create index data structures in the future for which we did not save sufficient data in the past. On Fri, Nov 20, 2020 at 9:15 AM Erick Eri

Re: Lucene Migration query

2020-11-20 Thread Erick Erickson
The IndexUpgraderTool does a forceMerge(1). If you have a large index, that has its own problems, but will work. The threshold for the issues is 5G. See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ I should emphasize that if you have a very large single segment as a result,

Re: Lucene Migration query

2020-11-20 Thread Michael Sokolov
I think running the upgrade tool would also be necessary to set you up for the next upgrade, when 9.0 comes along. On Fri, Nov 20, 2020, 4:25 AM Uwe Schindler wrote: > Hi, > > > Currently I am using Lucene 7.3, I want to upgrade to lucene 8.5.1. > Should > > I do reindexing in this case ? > > No

RE: Lucene Migration query

2020-11-20 Thread Uwe Schindler
Hi, > Currently I am using Lucene 7.3, I want to upgrade to lucene 8.5.1. Should > I do reindexing in this case ? No, you don't need that. > Can I make use of backward codec jar without a reindex? Yes, just add the JAR file to your classpath and it can read the indexes. Updates written to the

Re: Lucene Migration issue

2020-06-08 Thread Michael McCandless
You're welcome! Mike McCandless http://blog.mikemccandless.com On Mon, Jun 8, 2020 at 10:48 AM Adarsh Sunilkumar < adarshsunilkuma...@gmail.com> wrote: > Hi Michael, > > Thanks for your information. > > > Thanks&Regards, > Adarsh Sunilkumar > > On Mon, Jun 8, 2020, 20:15 Michael McCandless >

Re: Lucene Migration issue

2020-06-08 Thread Adarsh Sunilkumar
Hi Michael, Thanks for your information. Thanks & Regards, Adarsh Sunilkumar On Mon, Jun 8, 2020, 20:15 Michael McCandless wrote: > Ahh, yes it does! That is the change that made Lucene catch this mis-use, > whereas previously it would silently throw things away (term frequencies > and positio

Re: Lucene Migration issue

2020-06-08 Thread Michael McCandless
Ahh, yes it does! That is the change that made Lucene catch this mis-use, whereas previously it would silently throw things away (term frequencies and positions). If you want to simply continue throwing things away like Lucene did before, without rebuilding your index, switch your indexing to Ind

Re: Lucene Migration issue

2020-06-07 Thread Adarsh Sunilkumar
Hi Michael, Thanks for the information. Does the error have any relationship with this patch https://issues.apache.org/jira/browse/LUCENE-8134 ? Thanks & Regards, Adarsh Sunilkumar On Fri, Jun 5, 2020 at 7:28 PM Michael McCandless wrote: > This ju

Re: Lucene Migration issue

2020-06-05 Thread Michael McCandless
This just means you previously indexed only docs (skipping term frequencies, positions) for at least one of the fields in at least one document in your existing index. But now you are trying to also index with term frequencies and positions, which Lucene cannot do. You either have to reindex wit

Re: Lucene Approximation

2020-06-02 Thread Michael Sokolov
Sorry, I thought that you wanted to maintain the true value rather than the approximated value. I am not entirely sure, but I think the approximation arises due to rounding and low-precision storage of these values in the index. You might be able to reverse engineer it by looking at "Norms," which

Re: Lucene Approximation

2020-06-02 Thread moritz
Thank you for your answer, but could you please explain this idea in detail, as I cannot see how this would help solve my problem? For example, I have the indexed Wikipedia article "Alan Smithee" with a document length of 756, which is also used when calculating the average document length. Bu

Re: Lucene Approximation

2020-06-02 Thread Michael Sokolov
You could append an EOF token to every indexed text, and then iterate over Terms to get the positions of those tokens? On Tue, Jun 2, 2020 at 11:50 AM Moritz Staudinger wrote: > > Hello, > > I am not sure if I am at the right place here, but I got a question about > the approximation my Lucene im
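The EOF-token idea can be sketched without any Lucene code: if a sentinel token is appended after the last real token, its zero-based position equals the exact number of tokens before it. The class `EofTokenLength`, the sentinel string `__EOF__`, and the whitespace tokenizer below are all assumptions for illustration; in a real index the position would be read from the postings of the sentinel term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the "append an EOF token" idea. */
public class EofTokenLength {

    /** Made-up sentinel; it must never occur in the real text. */
    public static final String EOF = "__EOF__";

    /** Splits on whitespace and appends the sentinel as the last token. */
    public static List<String> tokenizeWithEof(String text) {
        List<String> tokens = new ArrayList<>();
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        tokens.add(EOF);
        return tokens;
    }

    /** The sentinel's position is the exact token count of the document. */
    public static int exactLength(List<String> tokens) {
        return tokens.indexOf(EOF);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenizeWithEof("Alan Smithee is a pseudonym");
        System.out.println("exact length = " + exactLength(tokens));
    }
}
```

The design trade-off is one extra term per document in exchange for a length that does not go through any lossy encoding.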

Re: Lucene 7.7.2 Indexwriter.numDocs() replacement in Lucene 8.4.1

2020-02-26 Thread Michael McCandless
Yes. Mike McCandless http://blog.mikemccandless.com On Mon, Feb 24, 2020 at 5:55 PM wrote: > A typo corrected below. > > Best regards > > > On 2/24/20 5:54 PM, baris.ka...@oracle.com wrote: > > Hi,- > > > > I hope everyone is doing great. > > > > > > I think the Lucene 7.7.2 Indexwriter.num

Re: Lucene 7.7.2 Indexwriter.numDocs() replacement in Lucene 8.4.1

2020-02-24 Thread baris . kazar
A typo corrected below. Best regards On 2/24/20 5:54 PM, baris.ka...@oracle.com wrote: Hi,- I hope everyone is doing great. I think the Lucene 7.7.2 Indexwriter.numDocs() https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/index/IndexWriter.html#numDocs-- can be replaced by t

Re: Lucene download page

2020-02-24 Thread baris . kazar
Thanks Erick and the Forum. Best regards On 2/23/20 8:32 AM, Erick Erickson wrote: No, 7.7.2 was a patch fix that _was_ released after 8.1.1. On Feb 22, 2020, at 2:49 PM, baris.ka...@oracle.com wrote: Hi,- I hope everyone is doing great. Lucene 7.7.2 is listed as released after Lucene 8

Re: Lucene download page

2020-02-23 Thread Erick Erickson
No, 7.7.2 was a patch fix that _was_ released after 8.1.1. > On Feb 22, 2020, at 2:49 PM, baris.ka...@oracle.com wrote: > > Hi,- > > I hope everyone is doing great. > > Lucene 7.7.2 is listed as released after Lucene 8.1.1 is released on this > page > https://lucene.apache.org/core/corenews.

Re: Lucene 8 early termination

2020-01-23 Thread Uwe Schindler
Hi, There is no early-termination support when calculating facets, because the counts can't be optimized with WAND or block-max. The general recommendation is to execute facets/aggregations in separate Elasticsearch or Solr requests (e.g. using AJAX on your website). The display of search results would be instan

Re: Lucene index directory grows and shrinks

2019-11-04 Thread Erick Erickson
Is merge frequency the mergeFactor? If yes, I'm using the default, which is 10, > read here https://jackrabbit.apache.org/archive/wiki/JCR/Search_115513504.html > > Max segment I don't know, where could I see it? > > Bye > > -Original Message- > From: Sh

Re: Lucene index directory grows and shrinks

2019-11-04 Thread Atri Sharma
These are typical symptoms of an index merge. However, it is hard to say more without more data. What is your segment size limit? Have you changed the default merge frequency or max segments configuration? Would you have an estimate of ratio of number of segments reaching max limit / to

Re: Lucene one to many query

2019-09-21 Thread Mikhail Khludnev
Hi, see ToParentBlockJoinQuery On Sat, Sep 21, 2019 at 6:14 PM ncs88 wrote: > Hi everyone. I am trying to build a Lucene query that will work with the > following one-to-many relationship. I’m trying to do this in Lucene 5.5 but > if I can’t then I’ll move towards upgrading the project to a new

Re: Lucene one to many query

2019-09-21 Thread Jigar Shah
The nested document structure supported by Solr is what you need. But as you are using Lucene, you should denormalize and store each item with its company fields and price. Apply the search on items with a function query on item_price. Once you have results, you can collect the companies in a set. On Sat, Sep 21, 2019, 11
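The denormalization approach can be sketched in plain Java, with an in-memory list standing in for the index. Everything here is an assumption for illustration: the class names, the `companyName` and `itemName` fields, and the "cheaper than" price filter (the original question is truncated, so the real criterion is unknown). Only `item_price` comes from the message itself.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: one flat document per item, carrying company fields. */
public class DenormalizedSearch {

    /** A denormalized "document": the item plus a copy of its company's data. */
    public static final class ItemDoc {
        final String companyName; // duplicated on every item of the company
        final String itemName;
        final double itemPrice;   // corresponds to item_price in the message

        public ItemDoc(String companyName, String itemName, double itemPrice) {
            this.companyName = companyName;
            this.itemName = itemName;
            this.itemPrice = itemPrice;
        }
    }

    /** Matches items on price, then de-duplicates the owning companies in a set. */
    public static Set<String> companiesWithItemCheaperThan(List<ItemDoc> docs, double maxPrice) {
        Set<String> companies = new LinkedHashSet<>(); // keeps first-match order
        for (ItemDoc doc : docs) {
            if (doc.itemPrice < maxPrice) {
                companies.add(doc.companyName);
            }
        }
        return companies;
    }

    public static void main(String[] args) {
        List<ItemDoc> docs = Arrays.asList(
                new ItemDoc("Acme", "bolt", 2.0),
                new ItemDoc("Acme", "nut", 1.0),
                new ItemDoc("Globex", "gear", 50.0));
        System.out.println(companiesWithItemCheaperThan(docs, 10.0));
    }
}
```

The set is what turns several matching items from one company into a single company hit, which is the job a block join would otherwise do.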
