I think different use cases benefit from, or even require, different
solutions. Enabling options in Spark SQL is helpful, but so is allowing
some configurations to be set in SQLConf.
For Cheng Pan's use case (to disable locality), I think providing a conf
(which can be added to spark-defaults.conf by a cluster admin) is useful.
For my customer's use case (https://github.com/apache/iceberg/pull/7790),
being able to set the write mode per Spark job (right now it can only
be set as a table property) is useful. Allowing this to be done in SQL
with an option/hint could also work, but as I understand it, Szehon's PR
(https://github.com/apache/spark/pull/416830) applies only to reads,
not writes.
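
To make the layering concrete, here is a minimal sketch in plain Python
(not Iceberg code) of how a setting could be resolved when it may come from
a per-job option, a session/cluster conf, or a table property. The key
names and the precedence order (option, then conf, then table property,
then default) are assumptions for illustration only.

```python
from typing import Mapping

# Hypothetical key names, used only for illustration.
OPTION_KEY = "write-mode"
SESSION_CONF_KEY = "spark.sql.iceberg.write-mode"
TABLE_PROP_KEY = "write.mode"
DEFAULT_MODE = "copy-on-write"


def resolve_write_mode(
    options: Mapping[str, str],
    session_conf: Mapping[str, str],
    table_props: Mapping[str, str],
) -> str:
    """Return the first value found, checking the most specific source first."""
    for source, key in (
        (options, OPTION_KEY),             # per-job option or hint
        (session_conf, SESSION_CONF_KEY),  # e.g. set via spark-defaults.conf
        (table_props, TABLE_PROP_KEY),     # table-level setting
    ):
        value = source.get(key)
        if value is not None:
            return value
    return DEFAULT_MODE
```

With this order, a cluster admin's conf overrides the table property, and a
job-level option still overrides both:
resolve_write_mode({}, {SESSION_CONF_KEY: "merge-on-read"},
{TABLE_PROP_KEY: "copy-on-write"}) returns "merge-on-read".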

- Wing Yew


On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:

> Ryan, I understand that options should be job-specific, and that introducing
> an OPTIONS hint would let Spark SQL achieve capabilities similar to those of
> the DataFrame API.
>
> My point is, some of the Iceberg options should not be job-specific.
>
> For example, Iceberg has an option “locality” that can only be set at
> the job level, while Spark has a configuration
> “spark.shuffle.reduceLocality.enabled” that can be set at the cluster
> level. This gap blocks Spark administrators from migrating to Iceberg
> because they cannot disable locality at the cluster level.
>
> So, what is the principle in Iceberg for classifying a configuration
> as SQLConf or OPTION?
>
> Thanks,
> Cheng Pan
>
>
>
>
> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
> >
> > I would argue that the SQLConf way is more in line with Spark
> > user/administrator habits.
> >
> > It’s common practice for Spark administrators to set configurations in
> > spark-defaults.conf at the cluster level, and when users have issues
> > with their Spark SQL/jobs, the first question they ask is usually: can it
> > be fixed by adding a Spark configuration?
> >
> > The OPTIONS way adds learning effort for Spark users, and how can Spark
> > administrators set options at the cluster level?
> >
> > Thanks,
> > Cheng Pan
> >
> >
> >
> >
> >> On Jun 17, 2023, at 04:01, Wing Yew Poon <wyp...@cloudera.com.INVALID>
> >> wrote:
> >>
> >> Hi,
> >> I recently put up a PR, https://github.com/apache/iceberg/pull/7790,
> >> to allow the write mode (copy-on-write/merge-on-read) to be specified in
> >> SQLConf. The use case is explained in the PR.
> >> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733,
> >> to allow locality to be specified in SQLConf.
> >> In the recent past, https://github.com/apache/iceberg/pull/6838/ was a
> >> PR to allow the write distribution mode to be specified in SQLConf. This
> >> was merged.
> >> Cheng Pan asks if there is any guidance on when we should allow configs
> >> to be specified in SQLConf.
> >> Thanks,
> >> Wing Yew
> >>
> >> ps. The above open PRs could use reviews by committers.
> >>
> >
>
>
