I think different use cases benefit from, or even require, different solutions. Enabling options in Spark SQL is helpful, but so is allowing some configurations to be set in SQLConf. For Cheng Pan's use case (disabling locality), a conf that a cluster admin can add to spark-defaults.conf is useful. For my customer's use case (https://github.com/apache/iceberg/pull/7790), it is useful to be able to set the write mode per Spark job, where right now it can only be set as a table property. Allowing this to be done in SQL with an option/hint could also work, but as I understand it, Szehon's PR (https://github.com/apache/spark/pull/416830) applies only to reads, not writes.
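To make the layering concrete, here is a rough sketch (plain Python, not Iceberg's actual code) of a lookup that checks a per-job option first, then a session-level conf, then a table property. The precedence order is my assumption for illustration, and the function name and all keys below are hypothetical placeholders, not real Iceberg or Spark settings.

```python
from typing import Mapping, Optional


def resolve_conf(
    option_key: str,
    session_conf_key: str,
    table_property_key: str,
    options: Mapping[str, str],
    session_conf: Mapping[str, str],
    table_properties: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first value found, checking the narrowest scope first.

    Assumed order (illustrative only): job option, session conf,
    table property, then the default.
    """
    for source, key in (
        (options, option_key),
        (session_conf, session_conf_key),
        (table_properties, table_property_key),
    ):
        if key in source:
            return source[key]
    return default


# Hypothetical keys, used only for this example.
OPTION_KEY = "example-write-mode"
SESSION_KEY = "spark.sql.example.write-mode"
TABLE_KEY = "write.example.mode"

# The table says copy-on-write, but the job's session conf overrides it.
mode = resolve_conf(
    OPTION_KEY,
    SESSION_KEY,
    TABLE_KEY,
    options={},
    session_conf={SESSION_KEY: "merge-on-read"},
    table_properties={TABLE_KEY: "copy-on-write"},
    default="copy-on-write",
)
print(mode)
```

With a layering like this, a cluster admin could set the session-level key once in spark-defaults.conf, while a single job could still override it.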
- Wing Yew

On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:

> Ryan, I understand that option should be job-specific, and introducing an
> OPTIONS HINT can make Spark SQL achieves similar capabilities as DataFrame
> API does.
>
> My point is, some of the Iceberg options should not be job-specific.
>
> For example, Iceberg has an option “locality” which only allows setting at
> the job level, but Spark has a configuration
> “spark.shuffle.reduceLocality.enabled” which allows setting at the cluster
> level, this is a gap block Spark administers migrate to Iceberg because
> they can not disable it at the cluster level.
>
> So, what’s the principle in the Iceberg of classifying a configuration
> into SQLConf or OPTION?
>
> Thanks,
> Cheng Pan
>
> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
> >
> > I would argue that the SQLConf way is more in line with Spark
> > user/administrator habits.
> >
> > It’s a common practice that Spark administrators set configurations in
> > spark-defaults.conf at the cluster level, and when the user has issues
> > with their Spark SQL/Jobs, the first question they asked mostly is: can it
> > be fixed by adding a spark configuration?
> >
> > The OPTIONS way brings additional learning efforts to Spark users and
> > how can Spark administrators set them at cluster level?
> >
> > Thanks,
> > Cheng Pan
> >
> >> On Jun 17, 2023, at 04:01, Wing Yew Poon <wyp...@cloudera.com.INVALID>
> >> wrote:
> >>
> >> Hi,
> >> I recently put up a PR, https://github.com/apache/iceberg/pull/7790,
> >> to allow the write mode (copy-on-write/merge-on-read) to be specified in
> >> SQLConf. The use case is explained in the PR.
> >> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733,
> >> to allow locality to be specified in SQLConf.
> >> In the recent past, https://github.com/apache/iceberg/pull/6838/ was a
> >> PR to allow the write distribution mode to be specified in SQLConf. This
> >> was merged.
> >> Cheng Pan asks if there is any guidance on when we should allow configs
> >> to be specified in SQLConf.
> >> Thanks,
> >> Wing Yew
> >>
> >> ps. The above open PRs could use reviews by committers.