Also, in the case of the write modes (i.e., write.delete.mode, write.update.mode, write.merge.mode), these cannot currently be set as options; they can only be set as table properties.
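To illustrate the current situation, a sketch of how those write modes have to be configured today, as table properties via DDL (the table name `db.events` is hypothetical; the property names and the copy-on-write/merge-on-read values are the ones discussed in this thread):

```sql
-- Today the write modes are table-scoped, not job-scoped:
-- every job writing to this table gets merge-on-read deletes/updates/merges.
ALTER TABLE db.events SET TBLPROPERTIES (
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode'  = 'merge-on-read'
);
```

There is no per-job equivalent, which is what https://github.com/apache/iceberg/pull/7790 proposes to add via SQLConf.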
On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon <wyp...@cloudera.com> wrote:

> I think that different use cases benefit from or even require different
> solutions. Enabling options in Spark SQL is helpful, but allowing some
> configurations to be done in SQLConf is also helpful.
> For Cheng Pan's use case (to disable locality), I think providing a conf
> (which can be added to spark-defaults.conf by a cluster admin) is useful.
> For my customer's use case (https://github.com/apache/iceberg/pull/7790),
> being able to set the write mode per Spark job (where right now it can
> only be set as a table property) is useful. Allowing this to be done in
> the SQL with an option/hint could also work, but as I understand it,
> Szehon's PR (https://github.com/apache/spark/pull/416830) is only
> applicable to reads, not writes.
>
> - Wing Yew
>
>
> On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:
>
>> Ryan, I understand that options should be job-specific, and that
>> introducing an OPTIONS hint can give Spark SQL capabilities similar to
>> those of the DataFrame API.
>>
>> My point is, some of the Iceberg options should not be job-specific.
>>
>> For example, Iceberg has an option "locality" which can only be set at
>> the job level, but Spark has a configuration
>> "spark.shuffle.reduceLocality.enabled" which can be set at the cluster
>> level. This gap blocks Spark administrators from migrating to Iceberg,
>> because they cannot disable locality at the cluster level.
>>
>> So, what is the principle in Iceberg for classifying a configuration as
>> a SQLConf or an OPTION?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
>> >
>> > I would argue that the SQLConf way is more in line with Spark
>> > user/administrator habits.
>> >
>> > It is a common practice for Spark administrators to set configurations
>> > in spark-defaults.conf at the cluster level, and when users have issues
>> > with their Spark SQL jobs, the first question they ask is usually: can
>> > it be fixed by adding a Spark configuration?
>> >
>> > The OPTIONS way brings additional learning effort for Spark users, and
>> > how can Spark administrators set them at the cluster level?
>> >
>> > Thanks,
>> > Cheng Pan
>> >
>> >
>> >> On Jun 17, 2023, at 04:01, Wing Yew Poon <wyp...@cloudera.com.INVALID>
>> >> wrote:
>> >>
>> >> Hi,
>> >> I recently put up a PR, https://github.com/apache/iceberg/pull/7790,
>> >> to allow the write mode (copy-on-write/merge-on-read) to be specified
>> >> in SQLConf. The use case is explained in the PR.
>> >> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733,
>> >> to allow locality to be specified in SQLConf.
>> >> In the recent past, https://github.com/apache/iceberg/pull/6838/ was a
>> >> PR to allow the write distribution mode to be specified in SQLConf.
>> >> This was merged.
>> >> Cheng Pan asks if there is any guidance on when we should allow
>> >> configs to be specified in SQLConf.
>> >> Thanks,
>> >> Wing Yew
>> >>
>> >> ps. The above open PRs could use reviews by committers.
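For contrast with the per-job options discussed above, the cluster-level knob Cheng Pan mentions is an ordinary Spark conf that an administrator can set once for everyone; a sketch of what that looks like (the Iceberg "locality" setting, by contrast, can currently only be supplied per job):

```
# spark-defaults.conf (cluster level, set once by an administrator;
# applies to every job unless overridden per session/job)
spark.shuffle.reduceLocality.enabled=false
```

The thread's open question is essentially which Iceberg settings deserve this kind of admin-settable, cluster-wide home (a SQLConf) versus remaining job-scoped options.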