Sure, the PRs are https://github.com/apache/spark/pull/44119 (merge),
https://github.com/apache/spark/pull/47233 (update), and delete is in progress.
Thanks
Szehon
On Tue, Jul 9, 2024 at 10:27 PM Wing Yew Poon wrote:
Hi Szehon,
Thanks for the update.
Can you please point me to the work on supporting DELETE/UPDATE/MERGE in
the DataFrame API?
Thanks,
Wing Yew
On Tue, Jul 9, 2024 at 10:05 PM Szehon Ho wrote:
Hi,
Just FYI, good news: this change is merged on the Spark side:
https://github.com/apache/spark/pull/46707 (it's the third effort!). In the
next version of Spark, we will be able to pass read properties via SQL to a
particular Iceberg table, such as
SELECT * FROM iceberg.db.table1 WITH (`locality` ...
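A rough sketch of how that could look from a Spark job once it lands; the
exact shape of the WITH clause and the `locality` value are assumptions
based on the fragment above, not confirmed against the merged PR:

    // Sketch only: assumes the per-table WITH (...) option syntax from
    // https://github.com/apache/spark/pull/46707; the `locality` key and
    // its value here are illustrative assumptions.
    val df = spark.sql(
      "SELECT * FROM iceberg.db.table1 WITH (`locality` = 'false')")
    df.show()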
We are talking about DELETE/UPDATE/MERGE operations. There is only SQL
support for these operations. There is no DataFrame API support for them.
Therefore write options are not applicable, and SQLConf is the only
available mechanism I can use to override the table property.
For reference, we curr...
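As a sketch of the mechanism being discussed (the conf key below is
hypothetical, standing in for whatever name the PR actually introduces):

    // SQLConf is session-wide: it applies to every qualifying operation
    // in the session, unlike a per-job write option. Key is hypothetical.
    spark.conf.set("spark.sql.iceberg.merge-mode", "merge-on-read")
    spark.sql(
      """MERGE INTO db.target t
        |USING db.updates u
        |ON t.id = u.id
        |WHEN MATCHED THEN UPDATE SET *
        |WHEN NOT MATCHED THEN INSERT *""".stripMargin)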
I think we should aim to have the same behavior across properties that are
set in SQL conf, table config, and write options. Having SQL conf override
table config for this doesn't make sense to me. If the need is to override
table configuration, then write options are the right way to do it.
I was on vacation.
Currently, write modes (copy-on-write/merge-on-read) can only be set as
table properties, and default to copy-on-write. We have a customer who
wants to use copy-on-write for certain Spark jobs that write to some
Iceberg table and merge-on-read for other Spark jobs writing to the ...
Yes, I agree that there is value for administrators in having some things
exposed as Spark SQL configuration. That gets much harder when you want to
use the SQLConf for table-level settings, though. For example, the target
split size is something that was an engine setting in the Hadoop world,
ev...
Also, in the case of write mode (I mean write.delete.mode,
write.update.mode, write.merge.mode), these cannot be set as options
currently; they are only settable as table properties.
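For concreteness, here is roughly how those are set today; these are the
standard Iceberg table property names, with an example table name:

    // Write modes are per-table properties (copy-on-write is the default),
    // so every job writing to this table gets the same behavior.
    spark.sql(
      """ALTER TABLE db.events SET TBLPROPERTIES (
        |  'write.delete.mode' = 'merge-on-read',
        |  'write.update.mode' = 'merge-on-read',
        |  'write.merge.mode'  = 'copy-on-write'
        |)""".stripMargin)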
On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon wrote:
I think that different use cases benefit from or even require different
solutions. I think enabling options in Spark SQL is helpful, but allowing
some configurations to be done in SQLConf is also helpful.
For Cheng Pan's use case (to disable locality), I think providing a conf
(which can be added t...
Ryan, I understand that options should be job-specific, and introducing an
OPTIONS hint can give Spark SQL capabilities similar to what the DataFrame
API provides.
My point is, some of the Iceberg options should not be job-specific.
For example, Iceberg has an option “locality” which only allows set...
Also +1 for OPTIONS hints. It is useful to allow some options to be
specified in SQLConf.
On Thu, Jul 6, 2023 at 1:05 AM Ryan Blue wrote:
Cheng, that's true of certain options that are targeted at administrators.
But the DataFrameReader or DataFrameWriter options are job-specific, which
is why a hint makes the most sense.
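A sketch of that job-specific shape with Iceberg's existing read options
(`locality` and `split-size` are documented Iceberg read options; the table
name and values are illustrative):

    // DataFrameReader options are scoped to this one job, which is why a
    // per-query hint is the closest SQL analogue.
    val df = spark.read
      .option("locality", "false")        // disable locality for this read
      .option("split-size", "134217728")  // 128 MB target split size
      .table("iceberg.db.table1")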
On Wed, Jul 5, 2023 at 1:26 AM Cheng Pan wrote:
I would argue that the SQLConf way is more in line with Spark
user/administrator habits.
It's a common practice for Spark administrators to set configurations in
spark-defaults.conf at the cluster level, and when users have issues with
their Spark SQL/jobs, the first question they ask is mostly ...
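For example, a cluster-level default in spark-defaults.conf might look like
this (the Iceberg conf key shown is hypothetical, just to illustrate the
shape):

    # spark-defaults.conf -- cluster-wide defaults set by the administrator
    # (hypothetical key name for an Iceberg locality toggle)
    spark.sql.iceberg.locality.enabled   false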
+1 for adding OPTIONS hints. If we can do that in SQL extensions then that
makes sense to me for the existing Spark versions that don't support it.
On Mon, Jun 26, 2023 at 11:18 AM Szehon Ho wrote:
Hi,
Yea, that sounds good to me.
Btw, that being said, I'm not opposed to configuring some of the options in
this thread, especially write options, as SQL conf either. (Not sure this
mechanism can support write conf without some changes to the parser.) And in
any case, it could be cascading: sql_dynamic_...
If the Spark community doesn’t accept this solution, how about adding it as
an extension in Iceberg? I’m also wondering what people here think about it.
Thanks for reviving the effort.
Manu
On Thu, Jun 22, 2023 at 00:45, Szehon Ho wrote:
Hi,
Yea, it's definitely an issue.
Fwiw, I was looking at reviving the old effort in Spark to pass in configs
dynamically in a Spark SQL statement, which is probably the cleanest
solution. (https://github.com/apache/spark/pull/34072 was the old effort,
and I made https://github.com/apache/spark/pul ...
Hi,
I recently put up a PR, https://github.com/apache/iceberg/pull/7790, to
allow the write mode (copy-on-write/merge-on-read) to be specified in
SQLConf. The use case is explained in the PR.
Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733, to
allow locality to be specified in ...