Bumping this. Again, this is a blocker for Spark 4.0.0. I will treat this
as "lazy consensus" if there are no objections within 3 days of the start
of the thread.

On Tue, Mar 4, 2025 at 2:15 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Hi dev,
>
> This is a spin-off of the original thread "Deprecating and banning
> `spark.databricks.*` config from Apache Spark repository". (link
> <https://lists.apache.org/thread/qwxb21g5xjl7xfp4rozqmg1g0ndfw2jd>)
>
> In the original thread, we decided to deprecate the config in Spark
> 3.5.5 and remove it in Spark 4.0.0. That thread left one thing
> undecided: the logic for a smooth migration.
>
> We "persist" the config into the offset log of a streaming query, since
> the value of the config must stay consistent during the lifecycle of the
> query. This means the problematic config is already persisted for any
> streaming query that has ever run with Spark 3.5.4.
>
> For the migration logic, we re-assign the value of the problematic config
> to the new config. This happens when the query is restarted, and the new
> config is written to the offset log for newer batches, so after a couple
> of new microbatches the migration logic is no longer needed. This
> migration logic is shipped in Spark 3.5.5, so once the query has run with
> Spark 3.5.5 for a couple of microbatches, the issue is mitigated.
>
> But I would say there will always be cases where users just bump the
> minor/major version without going through all the bugfix versions. I think
> it is still dangerous to remove the migration logic in Spark 4.0.0 (and
> probably Spark 4.1.0, depending on the discussion). In the migration
> logic, the problematic config is just a "string", and users wouldn't be
> able to set the value with the problematic config name. We don't document
> this, as it'll be done automatically.
>
> That said, I'd propose keeping the migration logic for the Spark 4.0
> version line (at minimum; 4.1 is debatable). This gives users a safer and
> less burdensome migration path, at the cost of retaining only a
> problematic "string" (again, not a config).
>
> I'd love to hear the community's voice on this. I'd like to remind you,
> this is a blocker for Spark 4.0.0.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
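
For anyone skimming the thread, the migration logic described in the quoted
message could be sketched roughly as below. This is only an illustration in
plain Python, not the actual Spark code: the two config names and the
`migrate_persisted_confs` helper are hypothetical (the thread does not name
them), and the offset log entry is modeled as a simple dict of persisted
confs.

```python
# Hypothetical config names, for illustration only; the thread does not name them.
OLD_KEY = "spark.databricks.example.someFlag"
NEW_KEY = "spark.sql.example.someFlag"


def migrate_persisted_confs(persisted: dict[str, str]) -> dict[str, str]:
    """Return the confs to use on restart, given confs read from the offset log."""
    migrated = dict(persisted)  # do not mutate what was read from the log
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        # Keep an already-persisted new key; otherwise carry the old value over.
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


# Offset log entry written by an older release: only the old name is present.
old_entry = {OLD_KEY: "false", "spark.sql.shuffle.partitions": "200"}

# On restart, the old name is treated as a plain string and mapped to the new one.
restarted = migrate_persisted_confs(old_entry)
print(restarted)
```

In this sketch the old name appears only as a string constant inside the
migration step, so it never has to be a settable config. Once newer batches
have been written with the new name, running the step again is a no-op.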
