Re: Lucene/Solr and BERT

2021-05-27 Thread Julie Tibshirani
Your summary sounds right to me. There are some ideas (being discussed on the issue), but I don't think we have a detailed understanding yet of the performance difference. It would be great to get more eyes on the benchmark if you're interested in double-checking the results. Mike mentioned that h

Re: Lucene/Solr and BERT

2021-05-27 Thread Michael Wechner
Thank you very much for having done these benchmarks! IIUC one could state
- Indexing: Lucene is slower than hnswlib/C++, very roughly 10x performance difference
- Searching (queries per second): Lucene is slower than hnswlib/C++, very roughly 8x performance difference
right, bu

Re: Lucene/Solr and BERT

2021-05-26 Thread Julie Tibshirani
These JIRA issues contain results against two ann-benchmarks datasets. It'd be great to get your thoughts/feedback if you have any:
* Searching: https://issues.apache.org/jira/browse/LUCENE-9937
* Indexing: https://issues.apache.org/jira/browse/LUCENE-9941
The benchmarks are based on the setup he

Re: Lucene/Solr and BERT

2021-05-26 Thread Alex K
Thanks Michael. IIRC, the thing that was taking so long was merging into a single segment. Is there already benchmarking code for HNSW available somewhere? I feel like I remember someone posting benchmarking results on one of the Jira tickets.
Thanks, Alex
On Wed, May 26, 2021 at 3:41 PM Michael

Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Sokolov
This Java implementation will be slower than the C implementation. I believe the algorithm is essentially the same; however, this is new and there may be bugs! I (and I think Julie had similar results IIRC) measured something like 8x slower than hnswlib (using ann-benchmarks). It is also surprising

Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Wechner
Hi Alex
Thank you very much for your feedback and the various insights!
On 26.05.21 at 04:41, Alex K wrote:
Hi Michael and others,
Sorry, just now getting back to you. For your three original questions:
- Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a thorough response.
-

Re: Lucene/Solr and BERT

2021-05-25 Thread Alex K
> https://opendistro.github.io/for-elasticsearch/blog/odfe-updates/2020/04/Building-k-Nearest-Neighbor-(k-NN)-Similarity-Search-Engine-with-Elasticsearch/ ?
> They are however available in the snapshot releases. I started on a

Re: Lucene/Solr and BERT

2021-05-24 Thread Michael Wechner
... something wrong.
On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner <michael.wech...@wyona.com> wrote:
Hi
I recently found the following articles re Lucene/Solr and BERT
https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
https://medium.com/swlh/fun-with-apach

Re: Lucene/Solr and BERT

2021-05-23 Thread Michael Wechner
... seems surprisingly slow, but it's entirely possible I'm doing something wrong.
On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote:
Hi
I recently found the following articles re Lucene/Solr and BERT
https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
https://medi

Re: Lucene/Solr and BERT

2021-05-23 Thread Russell Jurney
> Is there still something missing? Or what would be the next steps?
> Thanks
> Michael
>
> Here's the code: https://github.com/alexklibisz/ann-benchmarks-lucene. There are some test suites

Re: Lucene/Solr and BERT

2021-05-23 Thread Michael Sokolov
> https://github.com/alexklibisz/ann-benchmarks-lucene. There are some test suites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong.
>
> On We

Re: Lucene/Solr and BERT

2021-05-19 Thread Michael Wechner
... suites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong.
On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote:
Hi
I recently found the following articles re Lucene/Solr and BERT

Re: Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
... There are some test suites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong.
On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote:
Hi
I recently found the following articles re Lucen

Re: Lucene/Solr and BERT

2021-04-21 Thread Alex K
... doing something wrong.
On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote:
> Hi
>
> I recently found the following articles re Lucene/Solr and BERT
>
> https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
>
> https://medium.com/swlh/fun-with-apache-luc

Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
Hi
I recently found the following articles re Lucene/Solr and BERT
https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559
and would like to ask whether there might be more recent developments