I would also appreciate some material that describes the differences between Spark native tables and Hive tables, and why each should be used...
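To make the question concrete, here is a rough sketch of the three `CREATE TABLE` forms being compared (made-up table names; assumes a Hive-enabled spark-sql session):

  -- Spark native (data source) table: the provider is stated explicitly.
  CREATE TABLE sales_native (id INT, amount DOUBLE) USING PARQUET;

  -- Hive table: classic Hive DDL with a Hive storage format / SerDe.
  CREATE TABLE sales_hive (id INT, amount DOUBLE) STORED AS ORC;

  -- The form the legacy flag governs: neither USING nor STORED AS.
  -- Today this creates a Hive table; under SPARK-46122 it would follow
  -- spark.sql.sources.default instead.
  CREATE TABLE sales_plain (id INT, amount DOUBLE);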
Thanks,
Nimrod

On Thu, 25 Apr 2024 at 14:27, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I see a statement made as below, and I quote:
>
> "The proposal of SPARK-46122 is to switch the default value of this
> configuration from `true` to `false` to use Spark native tables because
> we support better."
>
> Can you please elaborate on the above, specifically with regard to the
> phrase "... because we support better"?
>
> Are you referring to the performance of the Spark catalog (I believe it
> is internal) or to integration with Spark?
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer | Generative AI | FinCrime
> London
> United Kingdom
>
> View my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1
>>
>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <y...@apache.org> wrote:
>>
>>> +1
>>>
>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>>
>>> Thanks,
>>> Kent Yao
>>>
>>> On Thu, 25 Apr 2024 at 14:39, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>> >
>>> > Hi, All.
>>> >
>>> > It's great to see the community's activities to polish 4.0.0 more and
>>> > more. Thank you all.
>>> >
>>> > I'd like to bring SPARK-46122 (another SQL topic) to you from the
>>> > subtasks of SPARK-44444 (Prepare Apache Spark 4.0.0):
>>> >
>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>> >   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>>> >
>>> > This legacy configuration is about the `CREATE TABLE` SQL syntax without
>>> > `USING` and `STORED AS`, which is currently mapped to a `Hive` table.
>>> > The proposal of SPARK-46122 is to switch the default value of this
>>> > configuration from `true` to `false` to use Spark native tables because
>>> > we support better.
>>> >
>>> > In other words, Spark will use the value of `spark.sql.sources.default`
>>> > as the table provider instead of `Hive`, like the other Spark APIs. Of
>>> > course, users can get all of the legacy behavior back by setting the
>>> > configuration to `true` again.
>>> >
>>> > Historically, this behavior change was merged once during the Apache
>>> > Spark 3.0.0 preparation via SPARK-30098, but was reverted during the
>>> > 3.0.0 RC period.
>>> >
>>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
>>> > provider for CREATE TABLE command
>>> >
>>> > At Apache Spark 3.1.0, we had another discussion about this and defined
>>> > it as a legacy behavior behind this configuration, reusing the ID
>>> > SPARK-30098.
>>> >
>>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource as
>>> > provider for CREATE TABLE command
>>> >
>>> > Last year, we received two additional requests to switch this, because
>>> > Apache Spark 4.0.0 is a good time to make a decision for the future
>>> > direction.
>>> >
>>> > 2023-02-27: SPARK-42603 as an independent idea
>>> > 2023-11-27: SPARK-46122 as a part of the Apache Spark 4.0.0 idea
>>> >
>>> > WDYT? The technical scope is defined in the following PR, which is one
>>> > line of main code, one line of migration guide, and a few lines of test
>>> > code.
>>> >
>>> > - https://github.com/apache/spark/pull/46207
>>> >
>>> > Dongjoon.
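A rough sketch of what the proposed default switch would mean in practice, assuming the legacy flag can be toggled with SET in the session, `spark.sql.sources.default` is left at its `parquet` default, and made-up table names:

  -- Current default (legacy flag true): the bare syntax creates a Hive table.
  SET spark.sql.legacy.createHiveTableByDefault=true;
  CREATE TABLE t_legacy (id INT);
  DESCRIBE TABLE EXTENDED t_legacy;   -- Provider: hive

  -- Proposed default (false): the bare syntax follows spark.sql.sources.default.
  SET spark.sql.legacy.createHiveTableByDefault=false;
  CREATE TABLE t_native (id INT);
  DESCRIBE TABLE EXTENDED t_native;   -- Provider: parquet

Either way, `CREATE TABLE ... USING ...` and `CREATE TABLE ... STORED AS ...` keep their explicit providers; only the bare form without `USING` and `STORED AS` changes behavior.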