Re: allowing configs to be specified in SQLConf for Spark reads/writes

Cheng Pan Thu, 13 Jul 2023 01:04:42 -0700

Ryan, I understand that options should be job-specific, and that introducing an
OPTIONS hint would give Spark SQL capabilities similar to those of the
DataFrame API.


My point is that some Iceberg options should not be job-specific.

For example, Iceberg has an option “locality” that can only be set at the job
level, whereas Spark has a configuration “spark.shuffle.reduceLocality.enabled”
that can be set at the cluster level. This gap blocks Spark administrators from
migrating to Iceberg, because they cannot disable locality at the cluster
level.

So, what principle does Iceberg follow when deciding whether a configuration
belongs in SQLConf or in OPTIONS?
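One way to support both needs is a layered lookup: a per-job option wins if
present, otherwise a session-level SQLConf applies (which administrators could
set cluster-wide, e.g. in spark-defaults.conf), otherwise a built-in default.
The sketch below assumes that precedence. The SQLConf key
`spark.sql.iceberg.locality.enabled` and the helper names are hypothetical and
not Iceberg's actual implementation; only the `locality` option name comes from
this thread.

```python
from typing import Mapping, Optional

# Hypothetical session/cluster-level key; the real Iceberg name (if any) may differ.
LOCALITY_SQL_CONF = "spark.sql.iceberg.locality.enabled"
# Per-job read option, as set through the DataFrame API.
LOCALITY_OPTION = "locality"


def parse_bool(value: str) -> bool:
    """Parse a Spark-style boolean string ("true"/"false", case-insensitive)."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {value!r}")


def resolve_locality(
    read_options: Mapping[str, str],
    session_conf: Mapping[str, str],
    default: bool = True,
) -> bool:
    """Job option first, then session (cluster-wide) conf, then the default."""
    for source, key in ((read_options, LOCALITY_OPTION), (session_conf, LOCALITY_SQL_CONF)):
        raw: Optional[str] = source.get(key)
        if raw is not None:
            return parse_bool(raw)
    return default


# An administrator disables locality for the whole cluster...
cluster_conf = {LOCALITY_SQL_CONF: "false"}
print(resolve_locality({}, cluster_conf))                    # False
# ...while a single job can still opt back in with the per-read option.
print(resolve_locality({"locality": "true"}, cluster_conf))  # True
# With neither set, the built-in default applies.
print(resolve_locality({}, {}))                              # True
```

With this ordering, the job-specific option keeps its meaning, and the SQLConf
only supplies a default that administrators control.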

Thanks,
Cheng Pan




> On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
> 
> I would argue that the SQLConf way is more in line with Spark 
> user/administrator habits.
> 
> It’s common practice for Spark administrators to set configurations in
> spark-defaults.conf at the cluster level, and when users have issues with
> their Spark SQL queries or jobs, the first question they usually ask is: can
> it be fixed by adding a Spark configuration?
> 
> The OPTIONS approach adds a learning burden for Spark users. And how can
> Spark administrators set these options at the cluster level?
> 
> Thanks,
> Cheng Pan
> 
> 
> 
> 
>> On Jun 17, 2023, at 04:01, Wing Yew Poon <wyp...@cloudera.com.INVALID> wrote:
>> 
>> Hi,
>> I recently put up a PR, https://github.com/apache/iceberg/pull/7790, to 
>> allow the write mode (copy-on-write/merge-on-read) to be specified in 
>> SQLConf. The use case is explained in the PR.
>> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733, to 
>> allow locality to be specified in SQLConf.
>> In the recent past, https://github.com/apache/iceberg/pull/6838/ was a PR to 
>> allow the write distribution mode to be specified in SQLConf. This was 
>> merged.
>> Cheng Pan asks if there is any guidance on when we should allow configs to 
>> be specified in SQLConf.
>> Thanks,
>> Wing Yew
>> 
>> P.S. The open PRs above could use reviews from committers.
>> 
>
