Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

Wenchen Fan Mon, 29 Apr 2024 19:47:12 -0700

@Mich Talebzadeh <mich.talebza...@gmail.com> there seems to be a
misunderstanding here. The Spark native data source table is still stored
in the Hive metastore; Spark just uses a different (and faster)
reader/writer for it. `hive-site.xml` should work as it does today.


On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon <gurwls...@apache.org> wrote:

> +1
>
> It's a legacy conf that we should eventually remove. Spark should
> create Spark tables by default, not Hive tables.
>
> Mich, for your workload, you can simply switch that conf off if it
> concerns you. We also enabled ANSI (which you agreed to). It's a bit
> awkward to stop midway for this compatibility reason while we are making
> Spark sound. The compatibility has been tested in production for a long
> time, so I don't see any particular issue with the compatibility case you
> mentioned.
>
> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>>
>> Hi @Wenchen Fan <cloud0...@gmail.com>
>>
>> Thanks for your response. I believe we have not had enough time to
>> "DISCUSS" this matter.
>>
>> Currently, in order to make Spark take advantage of Hive, I create a soft
>> link in $SPARK_HOME/conf. FYI, my Spark version is 3.4.0 and Hive is 3.1.1.
>>
>>  /opt/spark/conf/hive-site.xml ->
>> /data6/hduser/hive-3.1.1/conf/hive-site.xml
>>
>> This works fine for me in my lab. So in the future, if we opt to set
>> "spark.sql.legacy.createHiveTableByDefault" to false, will there no
>> longer be a need for this soft link?
>> On the face of it, this looks fine, but in real life it may require a
>> number of changes to old scripts. Hence my concern.
>> As a matter of interest, has anyone liaised with the Hive team to ensure
>> they have introduced the additional changes you outlined?
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>> London
>> United Kingdom
>>
>> On Sun, 28 Apr 2024 at 09:34, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> @Mich Talebzadeh <mich.talebza...@gmail.com> thanks for sharing your
>>> concern!
>>>
>>> Note: Spark native data source tables are usually Hive compatible as
>>> well, unless we use features that Hive does not support (TIMESTAMP NTZ,
>>> ANSI INTERVAL, etc.). I think it's a better default to create a Spark
>>> native table in this case, instead of creating a Hive table and failing.
>>>
>>> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan <pan3...@gmail.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Thanks,
>>>> Cheng Pan
>>>>
>>>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau <holden.ka...@gmail.com>
>>>> wrote:
>>>> >
>>>> > +1
>>>> >
>>>> >
>>>> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>>> >>
>>>> >> +1
>>>> >>
>>>> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun <dongj...@apache.org> wrote:
>>>> >> >
>>>> >> > I'll start with my +1.
>>>> >> >
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>>>> >> > > Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
>>>> >> > > to `false` by default. The technical scope is defined in the following PR.
>>>> >> > >
>>>> >> > > - DISCUSSION:
>>>> >> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>>>> >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>>>> >> > > - PR: https://github.com/apache/spark/pull/46207
>>>> >> > >
>>>> >> > > The vote is open until April 30th 1AM (PST) and passes
>>>> >> > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>> >> > >
>>>> >> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
>>>> >> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
>>>> >> > >
>>>> >> > > Thank you in advance.
>>>> >> > >
>>>> >> > > Dongjoon
>>>> >> > >
>>>> >> >
>>>> >> >
>>>> >> > ---------------------------------------------------------------------
>>>> >> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> >> >
