Re: Re: Solr Config XML DTD's

2011-05-01 Thread Michael Sokolov
My first post too - but if I can offer a suggestion - there are more modern XML validation technologies available than DTD. I would heartily recommend RelaxNG/Compact notation (see http://relaxng.org/compact-tutorial-20030326.html) - you can generate Relax from a DTD, but it is more expressive

Re: Solr Config XML DTD's

2011-05-04 Thread Michael Sokolov
I'm not sure you will find anyone wanting to put in this effort now, but another suggestion for a general approach might be: (1) very basic static analysis to catch what you can - this should be a pretty minimal effort only, given what can reasonably be achieved; (2) throw runtime errors as Hoss sa

XmlCharFilter

2011-06-14 Thread Michael Sokolov
I work with a lot of XML data sources and have needed to implement an analysis chain for Solr/Lucene that accepts XML. In the course of doing that, I found I needed something very much like HTMLCharFilter, but that does standard XML parsing (understands XML entities defined in an internal or ex
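The core job of such a char filter is to drop markup and decode entities while remembering where each output character came from, so that highlighting offsets still point into the original XML. The plain-Java sketch below shows that idea under stated assumptions: the class `XmlStripSketch` is hypothetical, it is not the XmlCharFilter from the post and does not use Lucene's CharFilter API, and it handles only tags and the five predefined XML entities (no DTD-declared entities, numeric character references, comments, CDATA, or `>` inside attribute values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: strips XML tags, decodes predefined entities, and tracks input offsets. */
public final class XmlStripSketch {
    // Only the five entities predefined by XML itself; DTD-declared entities are not handled.
    private static final Map<String, Character> ENTITIES =
            Map.of("lt", '<', "gt", '>', "amp", '&', "quot", '"', "apos", '\'');

    /** Text content with markup removed. */
    public final String text;
    /** inputOffsets[i] is the offset in the original XML of output character i. */
    public final int[] inputOffsets;

    private XmlStripSketch(String text, int[] inputOffsets) {
        this.text = text;
        this.inputOffsets = inputOffsets;
    }

    public static XmlStripSketch strip(String xml) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (c == '<') {
                // Skip the whole tag; assumes no '>' appears inside attribute values.
                int end = xml.indexOf('>', i);
                i = (end < 0) ? xml.length() : end + 1;
            } else if (c == '&') {
                int semi = xml.indexOf(';', i);
                Character decoded = (semi < 0) ? null : ENTITIES.get(xml.substring(i + 1, semi));
                if (decoded != null) {
                    out.append(decoded.charValue());
                    offsets.add(i); // the decoded char maps to the start of the entity
                    i = semi + 1;
                } else {
                    // Unknown entity: pass the '&' through unchanged.
                    out.append(c);
                    offsets.add(i);
                    i++;
                }
            } else {
                out.append(c);
                offsets.add(i);
                i++;
            }
        }
        int[] arr = offsets.stream().mapToInt(Integer::intValue).toArray();
        return new XmlStripSketch(out.toString(), arr);
    }

    public static void main(String[] args) {
        XmlStripSketch s = strip("<p>fish &amp; chips</p>");
        System.out.println(s.text);            // fish & chips
        System.out.println(s.inputOffsets[5]); // 8 (the '&' of "&amp;")
    }
}
```

A real char filter would typically record only the points where the offset correction changes rather than one entry per character; the per-character array just keeps the sketch easy to follow.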

Re: pro coding style

2012-12-01 Thread Michael Sokolov
On 12/1/2012 7:59 AM, Per Steffensen wrote: It is all about information - git has it, SVN doesn't. And my logical sense tells me that it has to be git and not github! :-) Now tell me that I am stupid :-) This kind of information (merge tracking) has been in svn since 1.5 (see http://subversio

Re: Lucene 9.0 release

2021-10-02 Thread Michael Sokolov
aked and I don't think that they should block 9.0. There are > no other blockers left to my knowledge. > > On Sat, Aug 14, 2021 at 6:24 PM Michael Sokolov wrote: >> >> It's been two years since our last release, we had lots of +1 when we >> rais

Re: Welcome Michael Gibney as Lucene committer

2021-10-07 Thread Michael Sokolov
Welcome, Michael! On Wed, Oct 6, 2021 at 9:34 AM Dawid Weiss wrote: > > Hello everyone! > > Please welcome Michael Gibney as the latest Lucene committer. Michael > - it's a tradition for you to introduce yourself, even if we've been > seeing you for quite a while! :) > > Dawid

should we clean up dev-docs?

2021-10-15 Thread Michael Sokolov
I was poking around looking for info on how we release Lucene, and I stumbled into this dev-docs folder. It seems to have info that's mostly useful for solr workflows, assumes you are working in lucene-solr repo, etc. The info about pmc-chair seems helpful, and maybe Putnam's guide to using git wor

Re: [VOTE] Release Lucene/Solr 8.10.1 RC1

2021-10-15 Thread Michael Sokolov
having a little trouble here with new laptop - after installing jdk8, ant, and finally running the smoke tester - my laptop shut down due to overheating! A little afraid to try it again, but I'll try improving the air circulation ... On Thu, Oct 14, 2021 at 2:53 PM Namgyu Kim wrote: > > +1 SUCCES

Re: [VOTE] Release Lucene/Solr 8.10.1 RC1

2021-10-15 Thread Michael Sokolov
+1 SUCCESS! [1:11:55.023477] I blame some random antivirus or something kicking in at the same time On Fri, Oct 15, 2021 at 10:44 AM Michael Sokolov wrote: > > having a little trouble here with new laptop - after installing jdk8, > ant, and finally running the smoke tester - my laptop

Re: Lucene 9.0 release

2021-10-17 Thread Michael Sokolov
> smoothly. > >> > >> ~2021-11-10: First RC for 9.0 > >> The date is indicative, the plan would be to move forward with the first > >> 9.0 RC as soon as the following conditions are met: > >> - 8.11 is out > >> - all 9.0 blockers have been a

Re: Glove dictionary?

2021-10-27 Thread Michael Sokolov
Yes, I copied some data from those GloVe files into the knn-token-vectors in the demo module. On Wed, Oct 27, 2021 at 2:38 PM Dawid Weiss wrote: > > I'm looking at licenses/pddl-10.txt, trying to figure out what it > applies to. I see this comment: > > * The vector dictionary used in the demo is

Re: Lucene 9.0 release

2021-10-29 Thread Michael Sokolov
> Dawid >>> >>> On Fri, Oct 29, 2021 at 6:00 PM Adrien Grand wrote: >>> >>>> Hearing no objections, I will be moving forward with the plan I >>>> outlined above. Next Monday is a holiday in France so I'll actually be >>>> cu

Re: [JENKINS] Lucene-jdk17panama-Windows (64bit/jdk-17) - Build # 625 - Unstable!

2021-10-30 Thread Michael Sokolov
Seems I had previously neglected to apply the vector similarity to query score conversion function in SimpleTextKnnVectorReader. It's a trivial fix, so I'll push without JIRA/review. On Sat, Oct 30, 2021 at 12:48 AM Policeman Jenkins Server wrote: > > Build: https://jenkins.thetaphi.de/job/Lucene
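The fix described here converts a raw vector similarity into a query score, so that "higher is better" holds for every similarity function. A plain-Java sketch of such a conversion follows, assuming the formulas used by Lucene 9.x's `VectorSimilarityFunction` as we understand them (score = 1 / (1 + squared distance) for Euclidean, (1 + x) / 2 for dot product and cosine); verify them against the Lucene version you use. The class, enum, and method names are hypothetical and are not Lucene API, and the SimpleText reader itself is not shown.

```java
/** Hypothetical sketch: turning raw vector similarities into non-negative query scores. */
public final class VectorScoreSketch {
    public enum Similarity { EUCLIDEAN, DOT_PRODUCT, COSINE }

    /** Raw similarity: squared L2 distance for EUCLIDEAN (smaller = closer), else dot product or cosine. */
    public static float rawSimilarity(Similarity sim, float[] a, float[] b) {
        float dot = 0, normA = 0, normB = 0, squareDist = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
            float diff = a[i] - b[i];
            squareDist += diff * diff;
        }
        switch (sim) {
            case EUCLIDEAN:
                return squareDist;
            case DOT_PRODUCT:
                return dot; // assumes both vectors are unit length, so dot is in [-1, 1]
            case COSINE:
                return (float) (dot / Math.sqrt((double) normA * normB));
            default:
                throw new IllegalArgumentException("unknown similarity: " + sim);
        }
    }

    /** Maps a raw similarity to a score in [0, 1] where higher always means more similar. */
    public static float toScore(Similarity sim, float raw) {
        switch (sim) {
            case EUCLIDEAN:
                return 1f / (1f + raw); // distance 0 gives score 1; large distances approach 0
            case DOT_PRODUCT:
            case COSINE:
                return (1f + raw) / 2f; // maps [-1, 1] onto [0, 1]
            default:
                throw new IllegalArgumentException("unknown similarity: " + sim);
        }
    }

    public static void main(String[] args) {
        float raw = rawSimilarity(Similarity.EUCLIDEAN, new float[] {0f, 0f}, new float[] {3f, 4f}); // 25.0
        System.out.println(toScore(Similarity.EUCLIDEAN, raw));
    }
}
```

Forgetting the conversion step (as in the bug described) means Euclidean results would be ranked by raw distance, where a larger number means a worse match.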

Re: Making org.apache.lucene.search.join.TermsQuery Public

2021-10-31 Thread Michael Sokolov
I'm not really sure why we have these two different implementations, but TermInSetQuery (which is public, and in core) provides a similar function -- have you compared the performance of the two? On Sun, Oct 31, 2021 at 6:57 PM Shad Storhaug wrote: > > Hi, > > > > In Lucene.NET we had a request f

Re: 8.11 and 9.0 release notes

2021-11-02 Thread Michael Sokolov
Re: the statement about our mirroring setup; I wonder if we should re-word it to reflect the new CDN setup https://fossforce.com/2021/10/apache-foundation-moves-from-mirrors-to-a-cdn-to-distribute-software/ Maybe we could strike this note now (deployment to CDNs is expected to be faster): Note:

Re: 8.11 and 9.0 release notes

2021-11-02 Thread Michael Sokolov
I changed the wording so it mentions the new CDN On Tue, Nov 2, 2021 at 2:48 PM Michael Sokolov wrote: > > Re: the statement about our mirroring setup; I wonder if we should > re-word it to reflect the new CDN setup > > https://fossforce.com/2021/10/apache-foundation-moves-from-m

Re: Bump minimum Java version to 17 on main (10.0)

2021-11-04 Thread Michael Sokolov
I can imagine some users might like to keep abreast of main, at least in some kind of testing setup, but aren't ready to cut over their JDK for some reason (as Dawid was describing), but they can always do a small patch and build Lucene themselves, applying the -target for compatibility with their

Re: Bump minimum Java version to 17 on main (10.0)

2021-11-04 Thread Michael Sokolov
> I think you misunderstood what I said: We still have a stable branch where we > backport our stuff to. So the main branch is really for "trying out and > working on new stuff". That's what a main branch is thought for. ... Sorry if I did. I am definitely mixing up -target and --release - old w

Re: [VOTE] Release Lucene/Solr 8.11.0 RC1

2021-11-10 Thread Michael Sokolov
SUCCESS! [0:53:26.276923] +1 On Wed, Nov 10, 2021 at 10:36 AM Timothy Potter wrote: > > +1 (binding) > SUCCESS! [1:21:09.867850] > > + kicked the tires on the UI locally and deployed to EKS using the > Solr operator! > > Cheers, > Tim > > On Wed, Nov 10, 2021 at 4:15 AM Jan Høydahl wrote: > > >

Re: Anyone familiar (or use) MultiRangeQuery?

2021-11-22 Thread Michael Sokolov
I did a little git spelunking and found this PR https://github.com/apache/lucene-solr/pull/794 where it was introduced. It does sound to me as if the intent was to match on multiple multi-dimensional ranges (i.e. hypercubes), not on any dimension among multiple ranges? Why would anyone ever want to d
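To make the two readings concrete, the plain-Java sketch below contrasts "the point lies inside at least one hypercube" with "each dimension falls inside some range, possibly from different boxes". It is a hypothetical illustration (the class `MultiRangeSketch` and its methods are made up here), not Lucene's MultiRangeQuery implementation, and it assumes inclusive bounds on `long` coordinates.

```java
/** Hypothetical sketch contrasting two readings of "match multiple multi-dimensional ranges". */
public final class MultiRangeSketch {
    /** An axis-aligned box: lower[d] <= value[d] <= upper[d] for every dimension d (inclusive). */
    public static final class Box {
        final long[] lower;
        final long[] upper;

        public Box(long[] lower, long[] upper) {
            this.lower = lower;
            this.upper = upper;
        }

        boolean contains(long[] point) {
            for (int d = 0; d < point.length; d++) {
                if (point[d] < lower[d] || point[d] > upper[d]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypercube reading: the point must fall entirely inside at least one box. */
    public static boolean matchesAnyBox(long[] point, Box... boxes) {
        for (Box b : boxes) {
            if (b.contains(point)) {
                return true;
            }
        }
        return false;
    }

    /** Per-dimension reading: each dimension may be satisfied by a different box. */
    public static boolean matchesPerDimension(long[] point, Box... boxes) {
        for (int d = 0; d < point.length; d++) {
            boolean ok = false;
            for (Box b : boxes) {
                if (point[d] >= b.lower[d] && point[d] <= b.upper[d]) {
                    ok = true;
                    break;
                }
            }
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Box a = new Box(new long[] {0, 0}, new long[] {1, 1});
        Box b = new Box(new long[] {10, 10}, new long[] {11, 11});
        long[] p = {0, 10}; // x is inside box a's range, y is inside box b's range
        System.out.println(matchesAnyBox(p, a, b));       // false
        System.out.println(matchesPerDimension(p, a, b)); // true
    }
}
```

The point `{0, 10}` lies in neither square, yet passes the per-dimension test, which is why that reading is rarely what a user wants.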

Re: What should we do of branch_8x?

2021-11-23 Thread Michael Sokolov
+1 to remove all content and leave behind a README in 8.x and 7.x, and it sounds like adding the .asf.yaml file could even prevent further commits? I hope there weren't any consequences of having a few unintended commits in the 7x branch. Makes me feel it would be OK to handle this cleanup asynch

Re: [VOTE] Release Lucene 9.0.0 RC2

2021-11-23 Thread Michael Sokolov
I ran the smoke tester to SUCCESS! - sorry, I lost the timing in my terminal scrollback, but it is so fast :) Also ran unit tests of our service after upgrading, and although there are some fails, it all looks like we are stumbling over our own feet - not Lucene issues. +1 to release On Tue, Nov

Re: [VOTE] Release Lucene 9.0.0 RC2

2021-11-23 Thread Michael Sokolov
and ... now all the tests are passing; no change to Lucene needed. On Tue, Nov 23, 2021 at 5:22 PM Michael Sokolov wrote: > > I ran the smoke tester to SUCCESS! - sorry, I lost the timing in my > terminal scrollback, but it is so fast :) Also ran unit tests of our > service after up

Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-26 Thread Michael Sokolov
SUCCESS! [0:10:10.031522] +1 from me On Fri, Nov 26, 2021 at 9:31 AM Adrien Grand wrote: > > Please vote for release candidate 3 for Lucene 9.0.0. > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-1ddce848cf3d5067efcafc6569d5f8203e56af

Re: [JENKINS] Lucene-jdk17panama-Windows (64bit/jdk-17) - Build # 701 - Still Unstable!

2021-11-27 Thread Michael Sokolov
I see that we periodically get various test failures from this test. Has anybody been tracking this more carefully than me, and if so, do you remember if it's always on Windows where these too-many-open-files (and sometimes out of disk space) errors show up? I wonder if we should reduce the cardina

Re: [JENKINS] Lucene-jdk17panama-Windows (64bit/jdk-17) - Build # 701 - Still Unstable!

2021-11-27 Thread Michael Sokolov
ts.nightly=true -Dtests.seed=B67ECC7381FE35B after 12m44s I killed it On Sat, Nov 27, 2021 at 11:59 AM Michael Sokolov wrote: > > I see that we periodically get various test failures from this test. > Has anybody been tracking this more carefully than me, and if so, do > you remembe

Re: [JENKINS] Lucene-jdk17panama-Windows (64bit/jdk-17) - Build # 701 - Still Unstable!

2021-11-27 Thread Michael Sokolov
mits; then perhaps we might see some more interesting failure modes On Sat, Nov 27, 2021 at 1:30 PM Michael Sokolov wrote: > > Hmm, that test is taking a very long time on my laptop (running > Ubuntu) with JDK11, so doesn't seem to be a Windows or JDK17 issue > > ./gra

Re: [JENKINS] Lucene-jdk17panama-Windows (64bit/jdk-17) - Build # 701 - Still Unstable!

2021-11-27 Thread Michael Sokolov
Oh! Thank you; after reading it I see folks have progressed to a root cause and a proposed solution. On Sat, Nov 27, 2021 at 4:24 PM Dawid Weiss wrote: > > > It's this issue, Mike: > https://issues.apache.org/jira/browse/LUCENE-10088 > > On Sat, Nov 27, 2021 at 8:11 PM

Re: Welcome Julie Tibshirani to the Lucene PMC

2021-11-30 Thread Michael Sokolov
Welcome, Julie! I think Adrien already added you to the PMC LDAP group, but I'll double-check On Tue, Nov 30, 2021, 2:11 PM Anshum Gupta wrote: > Congratulations and welcome, Julie! > > On Tue, Nov 30, 2021 at 1:49 PM Adrien Grand wrote: > >> I'm pleased to announce that Julie Tibshirani has

Re: Welcome Julie Tibshirani to the Lucene PMC

2021-11-30 Thread Michael Sokolov
yup I checked and you are there: https://whimsy.apache.org/roster/committee/lucene -- just curious, does anyone know why some of our names are **bold** on that list? On Tue, Nov 30, 2021 at 5:19 PM Michael Sokolov wrote: > > Welcome, Julie! > > I think Adrien already added you to

Re: [VOTE] Release Lucene 9.0.0 RC4

2021-12-01 Thread Michael Sokolov
SUCCESS! [0:10:23.145686] +1 On Wed, Dec 1, 2021 at 11:56 AM Adrien Grand wrote: > > Please vote for release candidate 4 for Lucene 9.0.0 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC4-rev-0b18b3b965cedaf5eb129aa41243a44c83ca826d > > Yo

Re: Closing GitHub PRs

2021-12-02 Thread Michael Sokolov
In this specific instance, I don't see the harm in leaving these issues there since the entire repo is essentially an archival artifact at this point. If we actually want to notify people that "hey your issue is in a dead zone, do you want to revive it? Here's how ..." we could maybe generate some

Re: [VOTE] Release Lucene 9.0.0 RC4

2021-12-02 Thread Michael Sokolov
I feel I kind of wimped out of being RM, so thank you very much for stepping up, Adrien! On Thu, Dec 2, 2021 at 3:30 PM Adrien Grand wrote: > > Thanks Anshum, Dawid and others for the support! > > On Thu, Dec 2, 2021 at 20:39, Dawid Weiss wrote: >> >> >> SUCCESS! [0:17:10.950384] >> >> +1. >> >

Re: Any potential benefits to a SSDV#bulkLookupOrd(long ord) impl?

2021-12-17 Thread Michael Sokolov
> And you get two single-valued fields instead of one big multi-valued field... > so I'm not sure I am convinced that "dim mixing" is typically a good thing. Mixing enables the user to model multiple (of their) fields within a single Lucene field. You may have a very lightweight and loosely-manag

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Michael Sokolov
Welcome Patrick! On Sun, Dec 19, 2021 at 3:27 PM Xi Chen wrote: > > Congratulations and welcome Haoyu! > > Best, > Zach > > On Dec 19, 2021, at 12:05 PM, Patrick Zhai wrote: > > > Thanks everyone! > > It's a great honor to become a lucene committer and thank you everyone for > building such a

Re: Searching Lucene FAQ with Lucene

2021-12-21 Thread Michael Sokolov
interesting -- it always matches *something* I guess? It might be helpful to show not only the answer, but also the question that was matched? On Mon, Dec 20, 2021 at 5:05 AM Michael Wechner wrote: > > Hi > > I am working on a webapp called "Katie" in order to detect duplicated > questions > > ht

Re: Payloads for each term

2022-01-13 Thread Michael Sokolov
Oh interesting! I did not know about this FeatureField (link was to the old repo, now gone: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/FeatureField.java worked for me) On Wed, Nov 11, 2020 at 4:37 PM Mayya Sharipova wrote: > > For sparse vectors, we

Re: Filtering before a vector search.

2022-01-19 Thread Michael Sokolov
+1 we should extend the functionality to support any Bits, not just liveDocs; we need to propose an API. The implementation should not be too hard - we need to intersect the user-supplied Bits with liveDocs and use that to filter. On Wed, Jan 19, 2022 at 1:42 PM Joel Bernstein wrote: > > Hi, > >
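The intersection described here can be sketched in plain Java: wrap the live-docs bits and the user-supplied filter bits in a view that accepts a document only if both accept it, and consult that view during the search. The `DocBits` interface below is a hypothetical stand-in modeled loosely on Lucene's `Bits` (which exposes `get(int)` and `length()`); it follows Lucene's convention that null live docs means "no deletions". The brute-force `nearestAccepted` loop is an assumption standing in for the real HNSW graph search, only to show where the combined bits are checked.

```java
/** Hypothetical sketch: combining deletions (liveDocs) with a user filter for vector search. */
public final class FilteredBitsSketch {
    /** Minimal random-access bit set, loosely modeled on Lucene's Bits. */
    public interface DocBits {
        boolean get(int doc);

        int length();
    }

    public static DocBits fromArray(boolean[] bits) {
        return new DocBits() {
            public boolean get(int doc) {
                return bits[doc];
            }

            public int length() {
                return bits.length;
            }
        };
    }

    /** Accepts a doc only if it is live AND passes the filter; null liveDocs means no deletions. */
    public static DocBits intersect(DocBits liveDocs, DocBits filter) {
        if (liveDocs == null) {
            return filter;
        }
        return new DocBits() {
            public boolean get(int doc) {
                return liveDocs.get(doc) && filter.get(doc);
            }

            public int length() {
                return liveDocs.length();
            }
        };
    }

    /** Brute-force nearest neighbor (squared L2) over accepted docs; a stand-in for graph search. */
    public static int nearestAccepted(float[][] vectors, float[] query, DocBits accept) {
        int best = -1;
        float bestDist = Float.POSITIVE_INFINITY;
        for (int doc = 0; doc < vectors.length; doc++) {
            if (!accept.get(doc)) {
                continue; // deleted or filtered out
            }
            float dist = 0;
            for (int i = 0; i < query.length; i++) {
                float d = vectors[doc][i] - query[i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = doc;
            }
        }
        return best;
    }
}
```

Because the combined view is evaluated lazily per document, no extra bit set has to be materialized; whether that is cheaper than building one up front depends on how often the search probes each document.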

Re: Welcome Guo Feng as Lucene committer

2022-01-25 Thread Michael Sokolov
Hail Feng, well met, please make yourself at home! On Tue, Jan 25, 2022, 6:04 AM Ignacio Vera wrote: > Congratulations and welcome! > > On Tue, Jan 25, 2022 at 11:50 AM Ishan Chattopadhyaya < > ichattopadhy...@gmail.com> wrote: > >> Welcome and congratulations, Feng! >> >> On Tue, Jan 25, 2022 a

Re: [GitHub] [lucene] msokolov commented on pull request #641: LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls

2022-02-04 Thread Michael Sokolov
Ah nvm I saw your later comment to the effect that we have a copy we can use already On Fri, Feb 4, 2022, 8:57 AM GitBox wrote: > > msokolov commented on pull request #641: > URL: https://github.com/apache/lucene/pull/641#issuecomment-1030007403 > > >I think we cannot use intinthashset in c

Re: AddIndexes(CodecReader...) API Question

2022-02-04 Thread Michael Sokolov
From looking at https://issues.apache.org/jira/browse/LUCENE-2996 I think your analysis is correct, but I wasn't around for that so I am just reading the historical record, same as you. To my way of thinking, doing these opportunistic flushes clutters up the logic and ideally we would *not* do eith

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-14 Thread Michael Sokolov
I think we picked the 1024 number as something that seemed so large nobody would ever want to exceed it! Obviously that was naive. Still, the limit serves as a cautionary point for users; if your vectors are bigger than this, there is probably a better way to accomplish what you are after (e.g. better

Re: How to Increase max vector size?

2022-02-15 Thread Michael Sokolov
er the patch/or do it myself but after 10/03. > Once the pull request is ready (including the Javadoc documentation that > clearly states that if you go above X it's at your own risk), we'll involve > also Michael Sokolov and the other committers familiar with this area of > the c

Re: How to Increase max vector size?

2022-02-16 Thread Michael Sokolov
xamples could > be a service of OpenAI or vector search databases like for example Weaviate > or Pinecone. > > Thanks > > Michael > > > > > On 15.02.22 at 23:34, Michael Sokolov wrote: > > I don't think it makes sense to have a static variable maximum that yo

Re: Lucene 9.1 release soon?

2022-02-25 Thread Michael Sokolov
+1 thanks for volunteering On Thu, Feb 24, 2022, 5:41 AM Mayya Sharipova wrote: > + 1 > > On Thu, Feb 24, 2022 at 11:28 AM Ignacio Vera wrote: > >> +1 >> >> On Thu, Feb 24, 2022 at 9:05 AM Adrien Grand wrote: >> >>> +1 >>> >>> On Thu, Feb 24, 2022 at 8:43 AM Michael Wechner >>> wrote: >>> > >

Re: [jira] [Updated] (LUCENE-10454) UnifiedHighlighter can miss terms because of query rewrites

2022-03-03 Thread Michael Sokolov
Isn't this kind of like if a tree falls in the woods and nobody is there does it make a sound? I mean -- if the index is empty, how can UH fail? No documents will ever match, ergo no highlights will be returned, so it seems fine that it is unable to extract terms from the query. On Thu, Mar 3, 20

Re: [JENKINS] Lucene-9.x-Linux (64bit/jdk-17.0.2) - Build # 1798 - Unstable!

2022-03-09 Thread Michael Sokolov
This did not reproduce for me On Wed, Mar 9, 2022 at 3:41 PM Policeman Jenkins Server wrote: > > Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/1798/ > Java: 64bit/jdk-17.0.2 -XX:-UseCompressedOops -XX:+UseG1GC > > 3 tests failed. > FAILED: org.apache.lucene.search.TestMatchAllDocsQuery

Re: [VOTE] Release Lucene 9.1.0 RC2

2022-03-18 Thread Michael Sokolov
Yeah this is endemic in our world now. I am having the same issue On Fri, Mar 18, 2022 at 2:51 PM Robert Muir wrote: > > >> 2>at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) > >> 2>at Log4jHotPatch.asmVersion(Log4jHotPatch.java:71) > >> 2>at Log4jHotPatch.agen

Re: [VOTE] Release Lucene 9.1.0 RC2

2022-03-18 Thread Michael Sokolov
We had to do a workaround in our internal test suites by setting a system property to trick this thing into not running; Maybe we can apply that also here... On Fri, Mar 18, 2022 at 3:12 PM Dawid Weiss wrote: > > I think this is Amazon trying to cope with log4shell - they've added > external inst

Re: [VOTE] Release Lucene 9.1.0 RC2

2022-03-18 Thread Michael Sokolov
> block (the Policeman Jenkins machine has many cores, so issues with not >> synchronized cache lines can happen easily)." Is this a different problem >> from #1 where we just have slow tests? I'm not sure if this is something we >> want to investigate as part of th

Re: [ANNOUNCE] Apache Lucene 9.1.0 released

2022-03-22 Thread Michael Sokolov
Thank you for another release Adrien! On Tue, Mar 22, 2022 at 10:32 AM Adrien Grand wrote: > > The Lucene PMC is pleased to announce the release of Apache Lucene 9.1.0. > > Apache Lucene is a high-performance, full-featured search engine > library written entirely in Java. It is a technology suit

Lucene PMC Chair Bruno Roustant

2022-03-23 Thread Michael Sokolov
Hello, Lucene developers. The Lucene Project Management Committee has elected a new chair, Bruno Roustant, and the Board has approved. Bruno, thank you for stepping up, and congratulations! -Mike

Re: Can Lucene9.0.0 be used on Android devices?

2022-04-05 Thread Michael Sokolov
I don't know, it probably comes down to how compatible Android's JVM is with JDK 11. Certainly it isn't a platform that gets a lot of attention from devs here, and I suspect Dalvik is not up to JDK11? Not sure though ... let us know what happens! On Tue, Mar 29, 2022 at 10:53 AM Baiyang Liu wrote

spotless targets

2022-04-06 Thread Michael Sokolov
Hi, locally I failed to run spotlessCheck/spotlessApply on main (10.x) branch. I assume it's because of a JVM difference; here's the error: Step 'google-java-format' found problem in 'lucene/core/src/java/module-info.java': null java.lang.reflect.InvocationTargetException at java.base/jd

Re: spotless targets

2022-04-06 Thread Michael Sokolov
OK, this also happens with Oracle's JDK17. Now I'm confused On Wed, Apr 6, 2022 at 4:28 PM Michael Sokolov wrote: > > Hi, locally I failed to run spotlessCheck/spotlessApply on main (10.x) > branch. I assume it's because of a JVM difference; here's the error: >

Re: spotless targets

2022-04-06 Thread Michael Sokolov
that were not there before > > On Wed, Apr 6, 2022 at 3:46 PM Michael Sokolov wrote: >> >> OK, this also happens with Oracle's JDK17. Now I'm confused >> >> On Wed, Apr 6, 2022 at 4:28 PM Michael Sokolov wrote: >> > >> > Hi, locally I fail

Re: spotless targets

2022-04-08 Thread Michael Sokolov
I guess this is related to the use of Java modules that now hide symbols? On Fri, Apr 8, 2022 at 3:05 AM Dawid Weiss wrote: > > > Maybe a check like this? > https://github.com/apache/lucene/pull/802 > > On Thu, Apr 7, 2022 at 9:26 PM Dawid Weiss wrote: >>> >>> Does spotless have an option to for

Re: spotless targets

2022-04-08 Thread Michael Sokolov
" to "deny". In JDK 17 the setting was > completely removed (https://openjdk.java.net/jeps/403). > > So you have to tell Java to export the affected packages (each one > separately listed) also to classpath applications (unnamed module). > > Uwe > > On 8 April 2022 15:07:

Re: FST codec for *infix* queries. No luck so far.

2022-04-26 Thread Michael Sokolov
I'm not sure under which scenario ngrams (edgengrams) would not be an option? Another option to try might be something like BPE (byte pair encoding). In this encoding, you train a set of tokens from a vocabulary based on frequency of occurrence, and agglomerate them iteratively until you have the vo
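The BPE training loop described here (start from single characters, repeatedly fuse the most frequent adjacent pair) can be sketched in a few dozen lines of plain Java. This is a minimal, hypothetical illustration, not tied to any Lucene API: `BpeSketch` is made up, ties between equally frequent pairs are broken lexicographically (an assumption chosen only to make results deterministic), and words are assumed to contain no spaces because a space separates the two halves of each stored merge rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical byte-pair-encoding sketch over characters; each merge rule is stored as "left right". */
public final class BpeSketch {
    /** Learns up to numMerges rules from word frequencies. Assumes words contain no spaces. */
    public static List<String> learnMerges(Map<String, Integer> wordCounts, int numMerges) {
        Map<List<String>, Integer> corpus = new HashMap<>();
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            corpus.merge(split(e.getKey()), e.getValue(), Integer::sum);
        }
        List<String> merges = new ArrayList<>();
        for (int step = 0; step < numMerges; step++) {
            // Count adjacent symbol pairs, weighted by word frequency.
            Map<String, Integer> pairCounts = new HashMap<>();
            for (Map.Entry<List<String>, Integer> e : corpus.entrySet()) {
                List<String> s = e.getKey();
                for (int i = 0; i + 1 < s.size(); i++) {
                    pairCounts.merge(s.get(i) + " " + s.get(i + 1), e.getValue(), Integer::sum);
                }
            }
            if (pairCounts.isEmpty()) {
                break; // every word is already a single symbol
            }
            // Most frequent pair wins; ties go to the lexicographically smallest rule.
            String best = null;
            for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
                if (best == null
                        || e.getValue() > pairCounts.get(best)
                        || (e.getValue().equals(pairCounts.get(best)) && e.getKey().compareTo(best) < 0)) {
                    best = e.getKey();
                }
            }
            merges.add(best);
            // Rewrite every word with the chosen pair fused into one symbol.
            Map<List<String>, Integer> next = new HashMap<>();
            for (Map.Entry<List<String>, Integer> e : corpus.entrySet()) {
                next.merge(applyMerge(e.getKey(), best), e.getValue(), Integer::sum);
            }
            corpus = next;
        }
        return merges;
    }

    /** Segments a word by replaying the learned merges in the order they were learned. */
    public static List<String> encode(String word, List<String> merges) {
        List<String> symbols = split(word);
        for (String rule : merges) {
            symbols = applyMerge(symbols, rule);
        }
        return symbols;
    }

    private static List<String> split(String word) {
        List<String> symbols = new ArrayList<>();
        for (char c : word.toCharArray()) {
            symbols.add(String.valueOf(c));
        }
        return symbols;
    }

    private static List<String> applyMerge(List<String> symbols, String rule) {
        int space = rule.indexOf(' ');
        String left = rule.substring(0, space);
        String right = rule.substring(space + 1);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < symbols.size()) {
            if (i + 1 < symbols.size() && symbols.get(i).equals(left) && symbols.get(i + 1).equals(right)) {
                out.add(left + right); // fuse the pair, scanning left to right
                i += 2;
            } else {
                out.add(symbols.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("low", 5, "lower", 2, "newest", 6, "widest", 3);
        List<String> merges = learnMerges(counts, 4);
        System.out.println(merges);                   // [e s, es t, l o, lo w]
        System.out.println(encode("lowest", merges)); // [low, est]
    }
}
```

For infix search, the learned subword units (here `low` and `est`) could then be indexed as tokens, so a query fragment is matched against a bounded vocabulary rather than every possible n-gram.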

Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-05 Thread Michael Sokolov
I'd like to see some discussion of pros/cons. Personally I don't have a lot of experience working with github's issue system, while I have grown comfortable with JIRA over the years, in spite of its warts. Here are a few things I like and *don't* like about both systems (mostly JIRA), but I don't k

Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-05 Thread Michael Sokolov
> Is the original Jira -> GitHub move just a change of defaults or are we, > once moved to GitHub, not letting people use Jira at all anymore ? Nothing has been decided - it's all open for debate. I just want to re-state the idea (at least as I heard it) behind this proposed move is to make contr

Re: XML retrieval with Intervals

2022-05-06 Thread Michael Sokolov
Many years ago I had started this Lux project that was designed to build an XML-aware index using Solr; see https://github.com/msokolov/lux/tree/master/src/main/java/lux/index/analysis for the analysis chain I used. Maybe you'll find something useful in this project? It's dormant for years, and pre

Re: XML retrieval with Intervals

2022-05-06 Thread Michael Sokolov
> > Disclaimer: I worked there for a couple of years ten years ago. But I’ve been > inside that product and it is non-muggle technology. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > On May 6, 2022, at 5:35 AM,

Re: [GitHub] [lucene] jpountz commented on pull request #859: LUCENE-10552: KnnVectorQuery has incorrect equals/ hashCode

2022-05-13 Thread Michael Sokolov
+1 to back port. It will make things more consistent at least On Thu, May 12, 2022, 11:36 AM GitBox wrote: > > jpountz commented on PR #859: > URL: https://github.com/apache/lucene/pull/859#issuecomment-1125144256 > >FWIW I found about this PR because it is in the 9.2 changelog on `main` > b

Re: [GitHub] [lucene] msokolov commented on pull request #870: LUCENE-10502: Refactor hnswVectors format

2022-05-13 Thread Michael Sokolov
Okay sorry I was confused about these override methods - they are different because of the different access patterns in the sparse/dense cases. Maybe the loss of history was unavoidable since we moved/renamed the file, but I wish we could maintain it. On Fri, May 13, 2022 at 1:45 PM GitBox wrote:

Re: [VOTE] Release Lucene 9.2.0 RC1

2022-05-18 Thread Michael Sokolov
+1 SUCCESS! [0:43:09.481661] I'm not going to get hung up on an issue with the smokeTester if Robert's not :) BTW thank you for running on a slow machine that takes many hours! On Wed, May 18, 2022 at 3:48 PM Robert Muir wrote: > > I opened an issue about this. It shouldn't block the release, but it

Re: [VOTE] Release Lucene 9.2.0 RC2

2022-05-20 Thread Michael Sokolov
+1 SUCCESS! [0:49:44.832567] JDK11 only On Fri, May 20, 2022 at 4:46 PM Houston Putman wrote: > > +1 > > SUCCESS! [2:17:07.370407] (java 11 & 17) > > - Houston > > On Fri, May 20, 2022 at 8:04 AM Jan Høydahl wrote: >> >> +1 >> >> SUCCESS! [1:13:38.226868] >> >> Jan >> >> > On 19 May 2022 at 17:1

Re: Adding a new PointDocValuesField

2022-05-25 Thread Michael Sokolov
Also, there should be examples from other fields. Suppose you are indexing map data and want to support a UI that shows "hot spots" on the map where there is a lot of let's say ... activity of some sort. You'd like to facet on 2-d areas. Or for log analytics -- you want to do anomaly detection and
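The map "hot spots" case boils down to counting matching points per 2-d cell, a histogram over a grid. A minimal plain-Java sketch, assuming square cells of a fixed size and points already available as (x, y) pairs; `GridFacetSketch` is hypothetical and unrelated to any existing Lucene faceting API or to the proposed PointDocValuesField. A real implementation would read each matching document's point from doc values instead of an in-memory array.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: 2-d "hot spot" facet counts by bucketing points into square grid cells. */
public final class GridFacetSketch {
    /** Counts points per cell of side cellSize; keys are "cellX,cellY" (sorted for stable output). */
    public static Map<String, Integer> countByCell(double[][] points, double cellSize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (double[] p : points) {
            // floor() so that negative coordinates land in negative cells (e.g. -0.2 -> cell -1).
            long cx = (long) Math.floor(p[0] / cellSize);
            long cy = (long) Math.floor(p[1] / cellSize);
            counts.merge(cx + "," + cy, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] pts = {{0.5, 0.5}, {0.7, 0.2}, {1.5, 0.1}, {-0.2, 0.3}};
        System.out.println(countByCell(pts, 1.0)); // {-1,0=1, 0,0=2, 1,0=1}
    }
}
```

The same bucketing works for the log-analytics case by treating, for example, time and latency as the two dimensions.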

Re: Welcome Chris Hegarty as Lucene committer

2022-06-01 Thread Michael Sokolov
Welcome Chris! I remember being part of a skeptical bunch of students in 1990 hearing about this new Java thing that was supposedly going to take over the world. Apparently it is still thriving :) -Mike On Wed, Jun 1, 2022 at 12:59 PM David Smiley wrote: > > Welcome Chris! -

Re: Welcome Lu Xugang as Lucene committer

2022-06-01 Thread Michael Sokolov
Welcome! I like finally too, but it seems strange that it has nothing to do with its apparent relative, final. On Wed, Jun 1, 2022 at 4:51 PM Gus Heck wrote: > Welcome and congratulations :) > > On Wed, Jun 1, 2022 at 3:32 PM Alessandro Benedetti > wrote: > >> Welcome on board Xugang! >> --

module not found error in intellij

2022-06-02 Thread Michael Sokolov
In IntelliJ building Lucene main branch I see this: .../workspace/lucene/lucene/core.tests/src/test/module-info.java:23: error: module not found: org.apache.lucene.core.tests.main requires org.apache.lucene.core.tests.main; ^ Am I doing it wrong? Does anyb

Re: module not found error in intellij

2022-06-02 Thread Michael Sokolov
> console. > ./gradlew -p lucene/core.tests/ test > > I'm not sure the exact cause of that though IDEs' java module support > looks far from perfect for now, I would recommend not to use IDE when > running modular tests... > > Tomoko > > On Thu, Jun 2, 2022 at 23:44 Michae

Re: module not found error in intellij

2022-06-02 Thread Michael Sokolov
ssue tracker), they are just not our bugs... > > > On Fri, Jun 3, 2022 at 0:17 Michael Sokolov wrote: > > > > glad to know I'm not the only one! I think it's not OK though. Running > > tests in IDE is super useful, especially for debugging, but also for > > visualizing c

Re: module not found error in intellij

2022-06-03 Thread Michael Sokolov
j compilation mode. >>> It's hacky but I've done it in the past. >>> >>> When I switch to (my preferred) intellij compilation, things break. This >>> is definitely a regression in IntelliJ somewhere because it used to work >>> very recently -

Re: 30% query performance degradation for documents with small stored fields

2022-06-07 Thread Michael Sokolov
I wonder whether it would be worth trying switching from stored fields to doc values. The access patterns are different, so the change would not be trivial, but you might be able to achieve gains this way - I really am not sure whether or not you would, the storage model is completely different, bu

Re: Welcome Greg Miller to the Lucene PMC

2022-06-07 Thread Michael Sokolov
Welcome Greg [copying from other thread, oops!] On Tue, Jun 7, 2022 at 11:41 AM Houston Putman wrote: > > Welcome Greg! > > On Tue, Jun 7, 2022 at 11:35 AM Gautam Worah wrote: >> >> Congratulations Greg! >> >> On Tue, Jun 7, 2022 at 8:04 AM Patrick Zhai wrote: >>> >>> Congrats Greg! >>> >>> Pat

Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-06-07 Thread Michael Sokolov
Sorry I missed the first vote I think; also +1(pmc) from me. I'd be OK with some issues (esp. closed ones) being orphaned in the old system too. On Tue, Jun 7, 2022 at 9:20 AM Dawid Weiss wrote: > > > I'm fine with either system (or both used concurrently). There is significant > research effort

Re: Welcome Lu Xugang as Lucene committer

2022-06-07 Thread Michael Sokolov
Welcome and thanks for spreading the word; your amazingkoala blog looks very active (although I can't read it :() On Thu, Jun 2, 2022 at 4:09 PM Mikhail Khludnev wrote: > > Welcome, Lu. > > On Wed, Jun 1, 2022 at 12:59 PM 陆徐刚 wrote: >> >> Thanks Adrien for the announcement and all for the welcom

exposing per-field storage usage

2022-06-13 Thread Michael Sokolov
At Amazon, we have a need to produce regular metrics on how much disk storage is consumed by each field. We manage an index with data contributed by many teams and business units and we are often asked to produce reports attributing index storage usage to these customers. The best tool we have for

Re: exposing per-field storage usage

2022-06-14 Thread Michael Sokolov
Oh, yes that's a clever idea. It seems it would take quite a while (tens of minutes?) for a larger index though? Much faster than the force-merge solution for sure. I guess to get faster we would have to instrument each format. I mean they generally do know how much space each field is occupying, b

Re: exposing per-field storage usage

2022-06-14 Thread Michael Sokolov
2 at 11:15 AM Robert Muir wrote: >> >> On Tue, Jun 14, 2022 at 10:37 AM Michael Sokolov wrote: >> > >> > Oh, yes that's a clever idea. It seems it would take quite a while >> > (tens of minutes?) for a larger index though? Much faster than the >> >

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-15 Thread Michael Sokolov
Agree with everyone here. Also consider that if we duplicate there will be two copies of the same issue, and they will inevitably diverge... On Wed, Jun 15, 2022 at 9:28 AM Jan Høydahl wrote: > > +1 for a manual approach > > Over time the volume will gravitate to mostly GitHub issues. And JIRA wi

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-20 Thread Michael Sokolov
I think the user mapping must be inferred based on membership in the Apache "organization" https://github.com/settings/organizations On Sun, Jun 19, 2022 at 2:45 AM Dawid Weiss wrote: > > >> User id mapping is an important consideration for me. > > > Some mapping has to be present somewhere alrea

Re: A prototype migration tool Jira to GitHub

2022-06-23 Thread Michael Sokolov
Yes thank you! You say this is not difficult, but it looks like a big job to me! Here are a bunch of things I noticed that we would ideally address (from looking at one long and complex issue, LUCENE-9004). I wouldn't be so bold as to say these should block us from proceeding if they're not address

Re: A prototype migration tool Jira to GitHub

2022-06-23 Thread Michael Sokolov
d, so apologies if this is a duplicate: >>>> >>>> Did you check >>>> https://spring.io/blog/2021/01/07/spring-data-s-migration-from-jira-to-github-issues >>>> >>>> They especially write there is an api that doesn't trigger notifications. >>>>

Re: A prototype migration tool Jira to GitHub

2022-06-26 Thread Michael Sokolov
As for this access control/script monitoring problem, I wonder whether we could import all the issues into a new github repo owned by whoever is running the script, and then transfer from there to the lucene repo? It would be an extra step involving another script (or something), but maybe(?) that

Re: How to avoid double-emails on all git issue/PR updates?

2022-07-11 Thread Michael Sokolov
Oh! thank you - this will be a big help. I just went to https://github.com/apache/lucene and then under "Watch" selected "participating and mentions" instead of "all activity" (which I had before). On Mon, Jul 11, 2022 at 5:46 AM Uwe Schindler wrote: > > Hi, > > I fully agree with Adrien, because

Re: Lucene 9.3.0 release

2022-07-11 Thread Michael Sokolov
I would like to see if we can get https://issues.apache.org/jira/browse/LUCENE-10577 in. It is working and gives nice gains, but there is some controversy about the API. If we can't get it sorted out this week(?) it can certainly slip to the next revision. I know that https://issues.apache.org/jira

Build failures

2022-07-16 Thread Michael Sokolov
Sorry for all the noise. I think it may be a botched backport of the timeout support I did yesterday. Will look at it today

Re: [DISCUSS] Read-only Jira after the GitHub issues migration?

2022-07-17 Thread Michael Sokolov
I think we'd still have the mailing lists open for discussion. So anyone not willing or able to use GitHub would still be able to participate in a meaningful way. Having two parallel bug trackers seems much less useful to me. I'd rather have people emailing to a list that is active rather than post

Re: Lucene 9.3.0 release

2022-07-19 Thread Michael Sokolov
;> >> >> On Tue, Jul 12, 2022 at 2:50 PM Ignacio Vera wrote: >> >>> Thanks for the heads up, I am planning to cut the branch middle next >>> week, Wednesday July 20th. >>> Let me know at the beginning of next week if there is any issue from >>>

Re: Lucene 9.3.0 release

2022-07-21 Thread Michael Sokolov
;> On Tue, Jul 19, 2022 at 4:09 PM Mayya Sharipova >>>> wrote: >>>> >>>>> Thanks for the reminder about the release, Ignacio! >>>>> About LUCENE-10592 >>>>> <https://issues.apache.org/jira/browse/LUCENE-10592> I will see wha

Re: Lucene 9.3.0 release

2022-07-21 Thread Michael Sokolov
Tue, Jul 19, 2022 at 4:09 PM Mayya Sharipova >>>> wrote: >>>>> >>>>> Thanks for the reminder about the release, Ignacio! >>>>> About LUCENE-10592 I will see what progress we can make today, and will >>>>> let you know be

Re: Lucene 9.3.0 release

2022-07-21 Thread Michael Sokolov
LUCENE-10592, it's also a big >>>> change, maybe we should not even try to get it in before cutting the >>>> branch? >>>> >>>> On Tue, Jul 19, 2022 at 4:09 PM Mayya Sharipova >>>> wrote: >>>>> >>>>> Thanks for

Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-26 Thread Michael Sokolov
searching JIRA for "slkjfdf" I found a few issues in other projects, but none seems to be getting the same degree of spam love On Tue, Jul 26, 2022 at 3:50 PM Mike Sokolov (Jira) wrote: > > > [ > https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issueta

Re: Welcome Vigya Sharma as Lucene committer

2022-07-28 Thread Michael Sokolov
Welcome Vigya! On Thu, Jul 28, 2022, 6:48 AM Michael McCandless wrote: > Welcome Vigya!! > > Mike > > On Thu, Jul 28, 2022 at 5:28 AM Lu Xugang > wrote: > >> Congratulations, and welcome Vigya! >> >> Xugang >> >> www.amazingkoala.com.cn >> >> >> >> >> On Jul 28, 2022, at 17:21, Ignacio Vera

Re: [jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-07-28 Thread Michael Sokolov
Thanks David On Wed, Jul 27, 2022 at 5:13 PM David Smiley wrote: > > FYI I had filed https://issues.apache.org/jira/browse/INFRA-23503 > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Tue, Jul 26, 2022 at 3:5

Re: [HELP] Please spot-check the migrated Lucene GitHub issues!

2022-07-30 Thread Michael Sokolov
I did some spot-checking. ooh, it looks so nice! I have one suggestion, totally optional/cosmetic, but I wonder if we could make the original comment authors' names more prominent by moving the [Legacy Jira: ${Name} (@${user}) on ${date}] to the top of each comment rather than the bottom? That wou

Re: [HELP] Please spot-check the migrated Lucene GitHub issues!

2022-07-30 Thread Michael Sokolov
now of any issues like that (except this SPAM!), so I'd be happy if we don't change anything here :) On Sat, Jul 30, 2022 at 6:12 PM Michael Sokolov wrote: > > I did some spot-checking. ooh, it looks so nice! > > I have one suggestion, totally optional/cosmetic, but I wonder

Re: [HELP] Please spot-check the migrated Lucene GitHub issues!

2022-08-09 Thread Michael Sokolov
Yes, looks amazing! All I could find was: This one https://github.com/mocobeta/forks-migration-test-2/issues/8964 seems to be missing its attachment - not sure if this was expected with this round? E.g. https://raw.githubusercontent.com/apache/lucene-jira-archive/attachments/attachments/LUCENE-9004/
