Re: Healthy PR Approaches from Apache Beam

2023-11-11 Thread Stefan Vodita
Thank you for going through all those PRs Mike!
I opened an issue for porting some of the bot functionality:
https://github.com/apache/lucene/issues/12796

Stefan


On Thu, 2 Nov 2023 at 15:30, Michael McCandless wrote:

> Thanks for raising this Stefan.  This is an impressive approach to more
> rigorously responding on PRs and taking them through their lifecycle,
> giving a better community experience especially for newcomers.  I love
> their docs too.
>
> Those graphs are awesome!  Much better than the simple PR open/closed
> count chart we have in our nightlies:
> https://home.apache.org/~mikemccand/lucenebench/github_pr_counts.html
>
> I just made a pass through some of our PRs (sorted oldest to newest, and
> sorry for all the dev list noise!) and it's sad how many PRs we (Lucene dev
> community) really should have responded to, but failed to, in a
> timely manner.  I think something like the Apache Beam bot could help this,
> though we don't really document attaching labels to newly opened PRs.
>
> I wonder what baby step we could adopt from Beam's approach to PRs?  Maybe
> open an issue on GitHub so we can discuss?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Oct 31, 2023 at 5:39 AM Stefan Vodita wrote:
>
>> Hi all,
>>
>> I recently learned a few interesting things that the Beam project does to
>> promote and maintain good interactions on PRs.
>>
>> 1. Community metrics dashboard. The graphs are pretty and insightful. You
>>    can see things like the number of open PRs across time or the mean time
>>    to first interaction on a new PR.
>>
>> 2. Life cycle management for PRs.
>>    a. A bot labels the PR and assigns reviewers based on the labels
>>       (example).
>>    b. Authors can run and re-run the pre-commit checks (doc).
>>    c. If the PR is not reviewed within 3 business days, the author is
>>       encouraged to notify the mailing list (doc).
>>    d. If the PR doesn't have activity, the bot comments on it, warning
>>       that it will be closed (example).
>>
>> It's hard for me to tell which of these ideas would translate well to the
>> Lucene community, but we can try out something small, like an automated
>> comment on stale PRs.
>>
>>
>> Stefan
>>
>>
>> https://github.com/apache/beam
>> http://35.193.202.176/d/code_velocity/code-velocity?orgId=1
>> https://github.com/apache/beam/pull/26424#issuecomment-1522788593
>>
>> https://github.com/apache/beam/blob/master/CONTRIBUTING.md#create-a-pull-request
>> https://github.com/apache/beam/blob/master/CONTRIBUTING.md#get-reviewed
>> https://github.com/apache/beam/pull/26424#issuecomment-1671254755
>>
>>


Re: Ascii folding

2023-11-11 Thread Uwe Schindler

Hi Dawid,

The ASCII folding filter is meant to remove accents. You would like to
search for visually similar characters. These are two different
things.


Actually Robert also has some config options. What I generally use for
Western European searches, where some documents may contain names of
people (author names, titles in Cyrillic or other languages), is to
convert the tokens using ICU transliteration (use one of the ICU folding
filters with the config below):


Transliterator.getInstance("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFKC; CaseFold", Transliterator.FORWARD);


This converts everything to Latin characters in a language-neutral
way and then removes all accents with the trick "decompose, remove
non-spacing marks, compose again, and case-fold the result".
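The "decompose, remove non-spacing marks, compose again, case-fold" part of that trick can be sketched with the JDK alone. This is only an illustration under stated assumptions: it uses java.text.Normalizer rather than ICU, it leaves out the "Any-Latin" step entirely (so Cyrillic input stays Cyrillic), and it approximates case folding with toLowerCase(Locale.ROOT). The class and method names are made up for the example.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or ICU.
final class AccentFoldingSketch {
    // \p{Mn} matches the Unicode general category "Nonspacing Mark".
    private static final Pattern NONSPACING_MARKS = Pattern.compile("\\p{Mn}+");

    static String fold(String input) {
        // 1. Decompose: e-acute becomes 'e' followed by U+0301 COMBINING ACUTE ACCENT.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // 2. Remove the combining (non-spacing) marks.
        String stripped = NONSPACING_MARKS.matcher(decomposed).replaceAll("");
        // 3. Compose again (compatibility composition, as in the ICU rule string).
        String recomposed = Normalizer.normalize(stripped, Normalizer.Form.NFKC);
        // 4. Assumption: lowercasing stands in for ICU's true case folding.
        return recomposed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Creme Brulee" written with grave, circumflex and acute accents.
        System.out.println(fold("Cr\u00e8me Br\u00fbl\u00e9e")); // prints "creme brulee"
        // CYRILLIC SMALL LETTER IE has no decomposition and is left untouched,
        // which is why the transliteration step is needed for mixed-script data.
        System.out.println(fold("\u0435").equals("\u0435"));      // prints "true"
    }
}
```

With ICU on the classpath, the single rule string above would replace all four steps and add the transliteration to Latin.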


Uwe

On 10.11.2023 at 19:03, Dawid Weiss wrote:


Hi Steve, Chris,

Ok, makes sense. Thanks for the pointers. I agree the justification 
for the use of character-level normalization filters is highly 
context-dependent (for example, unsuitable when mixed languages are 
present on input).


Dawid

On Fri, Nov 10, 2023 at 6:58 PM Chris Hostetter wrote:



: Here's the unicode letter after "th":
: https://www.fileformat.info/info/unicode/char/0435/index.htm
:
: To my surprise, I couldn't find it in the ascii folding filter:
:
: https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java
:
: Does anybody remember whether the omission of Cyrillic characters was
: intentional (there are quite a few of them that are nearly identical in
: appearance to Latin letters)?

From the javadocs, i'm going to guess it's because the filter focuses
on "Latin_characters_in_Unicode" ... and your "CYRILLIC SMALL LETTER IE"
isn't described as being a "(adjective) LATIN noun (WITH noun)" like all
of the other characters that are considered to have a direct mapping to
the "ASCII" / latin characters.
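The naming distinction described above can be checked directly from the JDK's Unicode data. The following is a minimal sketch, not anything ASCIIFoldingFilter does internally; the class and method names are hypothetical, and it only relies on Character.getName and Character.UnicodeScript.

```java
// Hypothetical helper for inspecting code points; not part of Lucene.
final class CodePointInfoSketch {
    /** Returns the Unicode character name, or a placeholder if unassigned. */
    static String nameOf(int codePoint) {
        String name = Character.getName(codePoint);
        return name == null ? "<unassigned>" : name;
    }

    /** True if the code point belongs to the Latin script. */
    static boolean isLatinScript(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN;
    }

    public static void main(String[] args) {
        // Plain 'e', 'e' with acute accent, and the Cyrillic letter from the thread.
        int[] samples = {0x0065, 0x00E9, 0x0435};
        for (int cp : samples) {
            System.out.printf("U+%04X %s latin=%b%n", cp, nameOf(cp), isLatinScript(cp));
        }
    }
}
```

U+00E9 is named "LATIN SMALL LETTER E WITH ACUTE" and is in the Latin script, while U+0435 is named "CYRILLIC SMALL LETTER IE" and is not, which is consistent with the guess about why the filter has no mapping for it.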

If you look back at when it was added...

https://issues.apache.org/jira/browse/LUCENE-1390

...the original focus was on deprecating "ISOLatin1AccentFilter" and
replacing it with "a more comprehensive version of this code that included
not just ISO-Latin-1 (ISO-8859-1) but the entire Latin 1 and Latin
Extended A unicode blocks."  (The originally proposed name was
'ISOLatinAccentFilter') ... subsequent discussion focused on adding more
Latin blocks.

There was a related issue at the time which initially aimed to add a
more general "UnicodeNormalizationFilter" that ultimately resulted in
adding the "ICU" analysis classes...

https://issues.apache.org/jira/browse/LUCENE-1343

..which IIUC may better handle "CYRILLIC SMALL LETTER IE" (but i haven't
tested that)



-Hoss
http://www.lucidworks.com/



--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de


Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-11 Thread Mikhail Khludnev
Welcome, Patrick.

On Fri, Nov 10, 2023 at 11:05 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> I'm happy to announce that Patrick Zhai has accepted an invitation to join
> the Lucene Project Management Committee (PMC)!
>
> Congratulations Patrick, thank you for all your hard work improving
> Lucene's community and source code, and welcome aboard!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-11 Thread Uwe Schindler
Welcome Patrick!

Uwe

On 10 November 2023 21:04:32 CET, Michael McCandless wrote:
>I'm happy to announce that Patrick Zhai has accepted an invitation to join
>the Lucene Project Management Committee (PMC)!
>
>Congratulations Patrick, thank you for all your hard work improving
>Lucene's community and source code, and welcome aboard!
>
>Mike McCandless
>
>http://blog.mikemccandless.com

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45394 - Unstable!

2023-11-11 Thread Uwe Schindler

Hi,

I had some time today to do upgrades of JDK versions on Policeman Jenkins:

 * JDK 8, 11, 17, and 21 were updated to the latest Temurin HotSpot
   versions (Linux, Windows, Mac x64): jdk1.8.0_392, jdk-11.0.21,
   jdk-17.0.9, jdk-21.0.1
 * IBM Semeru OpenJ9 was updated to jdk-17.0.8
 * jdk-11.0.20 and jdk-20.0.2 of IBM Semeru OpenJ9 were added to the mix

Uwe

On 06.11.2023 at 14:02, Michael McCandless wrote:

On Sun, Nov 5, 2023 at 5:01 AM Uwe Schindler wrote:

I will update the J9 runtime later today. But this was a real
bug, so it's good that it caught this :-) So - no - I won't remove
OpenJ9 support at all.


I see, it's great that the J9 build is indeed catching real Lucene
bugs!  +1 to keep running it in CI builds.


The errors that sometimes happen are bugs; they might get better with
the latest versions. I see there's now also a Java 20 version. I will
give it a try, too - especially regarding Panama (+ Vector). I want
to see how it behaves.

+1

Thanks Uwe.

Mike McCandless

http://blog.mikemccandless.com


--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de


Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-11 Thread Ignacio Vera
Welcome Patrick!

On Sat, Nov 11, 2023 at 3:29 PM Uwe Schindler wrote:

> Welcome Patrick!
>
> Uwe
>
>
> On 10 November 2023 21:04:32 CET, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I'm happy to announce that Patrick Zhai has accepted an invitation to
>> join the Lucene Project Management Committee (PMC)!
>>
>> Congratulations Patrick, thank you for all your hard work improving
>> Lucene's community and source code, and welcome aboard!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>