Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-05 Thread Ryan Blue
Cheng, that's true of certain options that are targeted at administrators. But the DataFrameReader or DataFrameWriter options are job-specific, which is why a hint makes the most sense. On Wed, Jul 5, 2023 at 1:26 AM Cheng Pan wrote: > I would argue that the SQLConf way is more in line with Spar

[PROPOSAL] Preparing first Apache Iceberg Summit

2023-07-05 Thread Jean-Baptiste Onofré
Hi everyone, I started a discussion on the private mailing list, and, as there are no objections from the PMC members, I'm moving the thread to the dev mailing list. I propose to organize the first Apache Iceberg Summit \o/ For the format, I think the best option is a virtual event with a mix of

Re: Ad-hoc partition bucketing

2023-07-05 Thread russell . spitzer
We have been discussing something like this as well, either an arbitrary partitioning scheme or just a more extensive and customizable transform. An example I’m interested in is a geo hash index where we store offsets on a large grid to denote partitions. The total offset file for the whole plan

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-05 Thread Cheng Pan
I would argue that the SQLConf way is more in line with Spark user/administrator habits. It’s a common practice that Spark administrators set configurations in spark-defaults.conf at the cluster level, and when the user has issues with their Spark SQL/Jobs, the first question they asked mostly