Also, regarding the write modes (that is, write.delete.mode,
write.update.mode, and write.merge.mode): these cannot currently be set as
options; they can only be set as table properties.
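To make the limitation concrete, here is a sketch (the table name is hypothetical): today the only way to choose copy-on-write vs. merge-on-read is via table properties, which apply to every job writing to the table, e.g.:

```sql
-- Table-level setting: affects all writers of this table.
ALTER TABLE db.events SET TBLPROPERTIES (
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode'  = 'copy-on-write'
);
```

There is no per-job equivalent; passing the same keys through a writer option has no effect for these settings today.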

On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon <wyp...@cloudera.com> wrote:

> I think that different use cases benefit from or even require different
> solutions. I think enabling options in Spark SQL is helpful, but allowing
> some configurations to be done in SQLConf is also helpful.
> For Cheng Pan's use case (to disable locality), I think providing a conf
> (which can be added to spark-defaults.conf by a cluster admin) is useful.
> For my customer's use case (https://github.com/apache/iceberg/pull/7790),
> being able to set the write mode per Spark job (where right now it can only
> be set as a table property) is useful. Allowing this to be done in the SQL
> with an option/hint could also work, but as I understand it, Szehon's PR (
> https://github.com/apache/spark/pull/416830) is only applicable to reads,
> not writes.
>
> - Wing Yew
>
>
> On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:
>
>> Ryan, I understand that options should be job-specific, and that
>> introducing an OPTIONS hint can give Spark SQL capabilities similar to
>> what the DataFrame API offers.
>>
>> My point is, some of the Iceberg options should not be job-specific.
>>
>> For example, Iceberg has an option “locality” which can only be set at
>> the job level, while Spark has a configuration
>> “spark.shuffle.reduceLocality.enabled” which can be set at the cluster
>> level. This gap blocks Spark administrators from migrating to Iceberg,
>> because they cannot disable locality at the cluster level.
>>
>> So, what is Iceberg’s principle for classifying a configuration as a
>> SQLConf or an option?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>>
>>
>> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
>> >
>> > I would argue that the SQLConf way is more in line with Spark
>> user/administrator habits.
>> >
>> > It’s a common practice for Spark administrators to set configurations in
>> spark-defaults.conf at the cluster level, and when users have issues with
>> their Spark SQL jobs, the first question they ask is usually: can it be
>> fixed by adding a Spark configuration?
>> >
>> > The OPTIONS way adds a learning burden for Spark users, and how would
>> Spark administrators set such options at the cluster level?
>> >
>> > Thanks,
>> > Cheng Pan
>> >
>> >
>> >
>> >
>> >> On Jun 17, 2023, at 04:01, Wing Yew Poon <wyp...@cloudera.com.INVALID>
>> wrote:
>> >>
>> >> Hi,
>> >> I recently put up a PR, https://github.com/apache/iceberg/pull/7790,
>> to allow the write mode (copy-on-write/merge-on-read) to be specified in
>> SQLConf. The use case is explained in the PR.
>> >> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733,
>> to allow locality to be specified in SQLConf.
>> >> In the recent past, https://github.com/apache/iceberg/pull/6838/ was
>> a PR to allow the write distribution mode to be specified in SQLConf. This
>> was merged.
>> >> Cheng Pan asks if there is any guidance on when we should allow
>> configs to be specified in SQLConf.
>> >> Thanks,
>> >> Wing Yew
>> >>
>> >> ps. The above open PRs could use reviews by committers.
>> >>
>> >
>>
>>
