dweiss commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2720601955
I think this will work fine in most cases and is a rather inexpensive
way to implement case-insensitive matching, but it comes at a cost:
the output automaton may not be minimal. Consider this example:
```
List<BytesRef> terms = new ArrayList<>(List.of(
    newBytesRef("abc"),
    newBytesRef("aBC")));
Collections.sort(terms);
Automaton a = build(terms, false, false);
```
which produces:

However, when you naively expand just the transitions for each letter
variant, you get this:

which clearly isn't minimal (and doesn't pass checkMinimized).
I think the absolute worst case is that the automaton doubles its number
of transitions while the number of states remains the same. So it's not like
it's going to expand uncontrollably... But it's no longer minimal. Perhaps this
is acceptable, given the constrained worst case?
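
As a rough illustration of the effect, here is a small self-contained sketch. It uses a toy DFA model rather than Lucene's `Automaton` API: the class `CaseExpansionSketch` and all of its methods are hypothetical names, and the expansion rule (add the other-case label to the same target unless the state already has that label) is only an assumed reading of "naively expanding the transitions". It starts from a hand-built minimal DFA for exactly {"abc", "aBC"}, expands it, and checks whether two states have become interchangeable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative toy DFA; not Lucene's Automaton. State 0 is the initial state. */
class CaseExpansionSketch {
  // transitions.get(s) maps a label to the target state reached from state s.
  final List<Map<Character, Integer>> transitions = new ArrayList<>();
  final Set<Integer> accept = new HashSet<>();

  int addState() {
    transitions.add(new TreeMap<>());
    return transitions.size() - 1;
  }

  int numStates() {
    return transitions.size();
  }

  int numTransitions() {
    int n = 0;
    for (Map<Character, Integer> t : transitions) n += t.size();
    return n;
  }

  /** Hand-built minimal DFA accepting exactly "abc" and "aBC" (case-sensitive). */
  static CaseExpansionSketch abcExample() {
    CaseExpansionSketch d = new CaseExpansionSketch();
    for (int i = 0; i < 5; i++) d.addState();
    d.transitions.get(0).put('a', 1);
    d.transitions.get(1).put('b', 2);
    d.transitions.get(1).put('B', 3);
    d.transitions.get(2).put('c', 4);
    d.transitions.get(3).put('C', 4);
    d.accept.add(4);
    return d;
  }

  /** Assumed "naive" expansion: mirror each transition under the other-case label. */
  void expandCaseVariants() {
    for (Map<Character, Integer> t : transitions) {
      // Iterate over a snapshot so that we can add labels while looping.
      for (Map.Entry<Character, Integer> e : new ArrayList<>(t.entrySet())) {
        char c = e.getKey();
        char other = Character.isUpperCase(c)
            ? Character.toLowerCase(c)
            : Character.toUpperCase(c);
        t.putIfAbsent(other, e.getValue()); // never overwrites, so the DFA stays deterministic
      }
    }
  }

  /** Two states with equal acceptance and identical outgoing transitions can be merged. */
  boolean hasMergeableStates() {
    for (int i = 0; i < numStates(); i++) {
      for (int j = i + 1; j < numStates(); j++) {
        if (accept.contains(i) == accept.contains(j)
            && transitions.get(i).equals(transitions.get(j))) {
          return true;
        }
      }
    }
    return false;
  }

  boolean accepts(String s) {
    int state = 0;
    for (char c : s.toCharArray()) {
      Integer next = transitions.get(state).get(c);
      if (next == null) return false;
      state = next;
    }
    return accept.contains(state);
  }

  public static void main(String[] args) {
    CaseExpansionSketch d = abcExample();
    System.out.println(d.numStates() + " states, " + d.numTransitions()
        + " transitions, mergeable=" + d.hasMergeableStates());
    d.expandCaseVariants();
    System.out.println(d.numStates() + " states, " + d.numTransitions()
        + " transitions, mergeable=" + d.hasMergeableStates());
  }
}
```

In this toy model the state count stays at 5 and the transition count grows from 5 to 8 (under the 2x bound), but the two states reached after "ab" and "aB" end up with identical outgoing transitions, so a minimal automaton for the same language would need only 4 states.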
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]