I've put up a PR: https://github.com/apache/spark/pull/50314

It may be slightly less performant than the original one, since we no longer
do a direct get and instead iterate over the keys. However, the offset log is
expected to hold a fixed number of configs (a subset of the SQL configs),
about 10 or so, which should be trivial to iterate.
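
To illustrate the idea of iterating the keys rather than doing a direct get,
here is a rough sketch in Python. It is not the code in the PR: the function
name, the dict representation of the offset log configs, and the choice to
copy the value to the correct key while leaving the old entry in place are
all assumptions made for illustration only.

```python
from typing import Dict

# Correct config name, as quoted in this thread.
CORRECT_KEY = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
# Suffix shared by the correct name and any mistakenly prefixed variant.
SUFFIX = ".optimizer.pruneFiltersCanPruneStreamingSubplan"


def migrate_offset_log_confs(confs: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return a copy of confs with the correct key filled in.

    Instead of looking up the incorrect name directly, scan all keys and
    treat any key that ends with SUFFIX and is longer than CORRECT_KEY as
    the old, incorrect name.
    """
    migrated = dict(confs)
    for key, value in confs.items():
        if key.endswith(SUFFIX) and len(key) > len(CORRECT_KEY):
            # Assumption: keep an existing correct entry if there is one.
            migrated.setdefault(CORRECT_KEY, value)
    return migrated
```

With roughly 10 entries per offset log, the linear scan costs next to nothing
compared to the direct lookup.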

Since the Spark 4.0.0 release has been blocked for a long time by the recent
discussion and vote, I will respect the decision of the Spark 4.0.0 release
manager on whether to include this in 4.0.0 or go with the original PR. We
could miss the Spark 4.0.0 train, but IMHO that should be OK as long as we
address this in Spark 4.0.1 and later.


On Sun, Mar 16, 2025 at 5:24 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Yeah... maybe option 1 is simpler if there is no side effect, and the
> latter part of the pattern is probably long enough to identify aliases
> without full text matching.
>
> On Sun, Mar 16, 2025 at 5:15 PM Mark Hamstra <markhams...@gmail.com>
> wrote:
>
>> Doing something like pattern matching on
>> \u0064\u0061\u0074\u0061\u0062\u0072\u0069\u0063\u006b\u0073 instead of
>> “databricks” might also be an option if including “databricks” in the code
>> is believed to be so offensive.
>>
>>
>> On Sun, Mar 16, 2025 at 12:52 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi dev,
>>>
>>> I'm really tired of the discussion which does not move forward because
>>> the argument is not backed by strict ASF policy. We debate based on
>>> the interpretation of ASF policy by individuals, which I think makes zero
>>> sense.
>>>
>>> I have thought a lot about how to resolve this, and I'm now open to
>>> adding a hack to the migration logic to eliminate the main concern in
>>> the debate. What I have in mind is the following:
>>>
>>> 1) Check the config string against a pattern, e.g. the string ends with
>>> ".optimizer.pruneFiltersCanPruneStreamingSubplan" and the string
>>> is longer than "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
>>> (or strictly 12 chars longer).
>>>
>>> 2) Hash the incorrect config name with some hashing algorithm and compare
>>> against that hash. We would not document the original string, but would
>>> probably reference the offending ticket so that we can at least figure out
>>> what the original string was if we forget it and ever need to debug this.
>>>
>>> 3) Anything else (I'm open to better ideas, except simply removing the
>>> migration logic.)
>>>
>>> Overall, the idea is an indirect comparison.
>>>
>>> This definitely complicates the logic, but the logic is really just 4
>>> lines to begin with, and maybe it will be just 10 lines or so, so it's
>>> still something manageable.
>>>
>>> It might be slightly tricky to test, but this is a lot easier than
>>> continuing the debate. I now regret not allowing myself to introduce a
>>> hack earlier; we could have saved 3 weeks.
>>>
>>> If we are OK with a bit of indirect checking in the logic just to end
>>> the debate, I'm happy to do that. We could then obviously keep this
>>> migration logic much longer than I was proposing, because the main
>>> concern would be eliminated.
>>>
>>> I'm open to hearing support and objections.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
