Hi all,

I'd like to bring this up again to share the status and get more feedback.
At this point, we all agree to unify the CREATE TABLE syntax by merging the
native and Hive-style syntaxes.

The unified CREATE TABLE syntax will become the native syntax, and there will
no longer be a separate Hive-style syntax. This brings several changes:
1. support PARTITIONED BY (col type, ...). This can't co-exist with
PARTITIONED BY (col, ...), and it simply appends the partition columns to
the end of the table schema.
2. support SKEWED BY, which is parsed but always fails
3. support STORED AS/BY, which can't co-exist with USING provider
4. support EXTERNAL as well
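To make the interplay of these clauses concrete, here is a rough Python model of the rules above. It is only a sketch under my own assumptions, not Spark's parser or analyzer: `CreateTableClauses` and `validate` are hypothetical names, and the clause representation is simplified.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CreateTableClauses:
    """Hypothetical container for the clauses of one CREATE TABLE statement."""
    columns: List[Tuple[str, str]] = field(default_factory=list)          # (name, type)
    partition_cols_typed: List[Tuple[str, str]] = field(default_factory=list)  # PARTITIONED BY (col type, ...)
    partition_refs: List[str] = field(default_factory=list)               # PARTITIONED BY (col, ...)
    skewed_by: bool = False                                               # SKEWED BY ...
    provider: Optional[str] = None                                        # USING provider
    stored_as: Optional[str] = None                                       # STORED AS/BY ...
    external: bool = False                                                # EXTERNAL


def validate(c: CreateTableClauses) -> List[Tuple[str, str]]:
    """Check the clause-compatibility rules and return the resulting table schema."""
    if c.partition_cols_typed and c.partition_refs:
        raise ValueError("PARTITIONED BY (col type, ...) can't co-exist with PARTITIONED BY (col, ...)")
    if c.skewed_by:
        raise ValueError("SKEWED BY is parsed but not supported")
    if c.provider is not None and c.stored_as is not None:
        raise ValueError("STORED AS/BY can't co-exist with USING provider")
    # Typed partition columns are appended to the end of the schema.
    return c.columns + c.partition_cols_typed
```

For example, validate(CreateTableClauses(columns=[("id", "INT")], partition_cols_typed=[("dt", "STRING")])) returns [("id", "INT"), ("dt", "STRING")], while combining provider="parquet" with stored_as="ORC" raises ValueError.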

For the built-in catalog, all behaviors will remain the same as before.
However, the native CREATE TABLE syntax also needs to support the v2
CreateTable command, so we need to translate the new syntax features into
catalog plugin API calls. We are still working toward an agreement on how
to do that.

To unblock 3.0, I think there are two choices:
1. Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default,
which effectively reverts SPARK-30098. The CREATE TABLE syntax remains
confusing, but it behaves the same as in 2.4.
2. Do not support the v2 CreateTable command if STORED AS/BY or EXTERNAL is
specified. This gives us more time to figure out how to do it in 3.1.
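Roughly, the two choices differ in which provider a plain CREATE TABLE gets and in which statements reach the v2 CreateTable command. Below is a simplified Python sketch of that. `CreateTableClauses`, `resolve_provider` and `allows_v2_create_table` are hypothetical names; `create_hive_by_default` stands in for spark.sql.legacy.createHiveTableByDefault.enabled, `default_source` stands in for spark.sql.sources.default (parquet by default), and other Hive-only clauses such as ROW FORMAT are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateTableClauses:
    """Hypothetical, simplified view of a CREATE TABLE statement."""
    provider: Optional[str] = None   # USING provider
    stored_as: Optional[str] = None  # STORED AS/BY ...
    external: bool = False           # EXTERNAL


def resolve_provider(c: CreateTableClauses, create_hive_by_default: bool,
                     default_source: str = "parquet") -> str:
    """Choice 1 (simplified): which provider the table ends up with."""
    if c.provider is not None:
        return c.provider            # explicit USING always wins
    if c.stored_as is not None or create_hive_by_default:
        return "hive"                # Hive-style table, as in 2.4
    return default_source            # behavior introduced by SPARK-30098


def allows_v2_create_table(c: CreateTableClauses) -> bool:
    """Choice 2: only statements without STORED AS/BY or EXTERNAL reach v2 CreateTable."""
    return c.stored_as is None and not c.external
```

With the flag on, a bare CREATE TABLE resolves to "hive" again (the 2.4 behavior); with it off, it resolves to the default source. Choice 2 leaves provider resolution alone and only narrows what the v2 path has to handle for now.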

If you have other ideas, please reply to this thread.

Thanks,
Wenchen

On Thu, Mar 26, 2020 at 7:28 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Thanks, filed SPARK-31257
> <https://issues.apache.org/jira/browse/SPARK-31257>. Thanks again for
> looking into this. I'll take a look as soon as I get time.
>
> On Thu, Mar 26, 2020 at 8:06 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> Feel free to open another issue, I just used that one since it describes
>> this and doesn't appear to be done.
>>
>> On Wed, Mar 25, 2020 at 4:03 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> UPDATE: Sorry, I had missed the PR (
>>> https://github.com/apache/spark/pull/28026). I still think it'd be better
>>> not to reuse a JIRA issue that was already resolved. Shall we file a new
>>> JIRA issue linked to SPARK-30098 and set a proper priority?
>>>
>>> On Thu, Mar 26, 2020 at 7:59 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Would it be better to prioritize this to make sure the change is
>>>> included in Spark 3.0? (Maybe by filing an issue and setting it as a
>>>> blocker.)
>>>>
>>>> It looks like there's consensus that SPARK-30098 introduced an
>>>> ambiguity that should be fixed (though opinions on its severity seem to
>>>> differ). Now that we know about the issue, it would be really odd to
>>>> publish it as is and try to fix it later (the fix might not even make it
>>>> into 3.0.x, since it could bring a behavioral change).
>>>>
>>>> On Tue, Mar 24, 2020 at 3:37 PM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ryan,
>>>>>
>>>>> It's great to hear that you are cleaning up this long-standing mess.
>>>>> Please let me know if you hit any problems that I can help with.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
>>>>>
>>>>> On Sat, Mar 21, 2020 at 3:16 AM Nicholas Chammas <
>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>>> On Thu, Mar 19, 2020 at 3:46 AM Wenchen Fan <cloud0...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> 2. PARTITIONED BY colTypeList: I think we can support it in the
>>>>>>> unified syntax. Just make sure it doesn't appear together with
>>>>>>> PARTITIONED BY transformList.
>>>>>>>
>>>>>>
>>>>>> Another side note: Perhaps as part of (or after) unifying the CREATE
>>>>>> TABLE syntax, we can also update Catalog.createTable() to support
>>>>>> creating partitioned tables
>>>>>> <https://issues.apache.org/jira/browse/SPARK-31001>.
>>>>>>
>>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
