Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

Jungtaek Lim Tue, 12 May 2020 01:34:01 -0700

It's not only for end users, but also for us. Spark itself uses the config
"true" and "false" in tests and it still brings confusion. We still have to
deal with both situations.


I'm wondering how long days it would be needed to revert it cleanly, but if
we worry about the amount of code change just around the new RC, what about
make the code dirty (should be fixed soon) but less headache via applying
traditional (and bad) way?

Let's just remove the config so that the config cannot be used in any way
(even in Spark codebase), and set corresponding field in parser to the
constant value so that no one can modify in any way. This would make the
dead code by intention which should be cleaned it up later, so let's add
FIXME comment there so that anyone can take it up for cleaning up the code
later. (If no one volunteers then I'll probably pick up.)

That is a bad pattern, but still better as we prevent end users (even early
adopters) go through the undocumented path in any way, and that will be
explicitly marked as "should be fixed". This is different from retaining
config - I don't expect unified create table syntax will be landed in
bugfix version, so even unified create table syntax can be landed in 3.1.0
(this is also not guaranteed) the config will live in 3.0.x in any way. If
we temporarily go dirty way then we can clean up the code in any version,
even from bugfix version, maybe within a couple of weeks just after 3.0.0
is released.

Does it sound valid?

On Tue, May 12, 2020 at 2:35 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> SPARK-30098 was merged about 6 months ago. It's not a clean revert and we
> may need to spend quite a bit of time to resolve conflicts and fix tests.
>
> I don't see why it's still a problem if a feature is disabled and hidden
> from end-users (it's undocumented, the config is internal). The related
> code will be replaced in the master branch sooner or later, when we unify
> the syntaxes.
>
>
>
> On Tue, May 12, 2020 at 6:16 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> I'm all for getting the unified syntax into master. The only issue
>> appears to be whether or not to pass the presence of the EXTERNAL keyword
>> through to a catalog in v2. Maybe it's time to start a discuss thread for
>> that issue so we're not stuck for another 6 weeks on it.
>>
>> On Mon, May 11, 2020 at 3:13 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Btw another wondering here is, is it good to retain the flag on master
>>> as an intermediate step? Wouldn't it be better for us to start "unified
>>> create table syntax" from scratch?
>>>
>>>
>>> On Tue, May 12, 2020 at 6:50 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> I'm sorry, but I have to agree with Ryan and Russell. I chose the
>>>> option 1 because it's less worse than option 2, but it doesn't mean I fully
>>>> agree with option 1.
>>>>
>>>> Let's make below things clear if we really go with option 1, otherwise
>>>> please consider reverting it.
>>>>
>>>> * Do you fully indicate about "all" the paths where the second create
>>>> table syntax is taken?
>>>> * Could you explain "why" to end users without any confusion? Do you
>>>> think end users will understand it easily?
>>>> * Do you have an actual end users to guide to turn this on? Or do you
>>>> have a plan to turn this on for your team/customers and deal with
>>>> the ambiguity?
>>>> * Could you please document about how things will change if the flag is
>>>> turned on?
>>>>
>>>> I guess the option 1 is to leave a flag as "undocumented" one and
>>>> forget about the path to turn on, but I think that would lead to make the
>>>> feature be "broken window" even we are not able to touch.
>>>>
>>>> On Tue, May 12, 2020 at 6:45 AM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> I think reverting 30098 is the right decision here if we want to
>>>>> unblock 3.0. We shouldn't ship with features which we know do not function
>>>>> in the way we intend, regardless of how little exposure most users have to
>>>>> them. Even if it's off my default, we should probably work to avoid
>>>>> switches that cause things to behave unpredictably or require a flow chart
>>>>> to actually determine what will happen.
>>>>>
>>>>> On Mon, May 11, 2020 at 3:07 PM Ryan Blue <rb...@netflix.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> I'm all for fixing behavior in master by turning this off as an
>>>>>> intermediate step, but I don't think that Spark 3.0 can safely include
>>>>>> SPARK-30098.
>>>>>>
>>>>>> The problem is that SPARK-30098 introduces strange behavior, as
>>>>>> Jungtaek pointed out. And that behavior is not fully understood. While
>>>>>> working on a unified CREATE TABLE syntax, I hit additional test
>>>>>> failures
>>>>>> <https://github.com/apache/spark/pull/28026#issuecomment-606967363>
>>>>>> where the wrong create path was being used.
>>>>>>
>>>>>> Unless we plan to NOT support the behavior
>>>>>> when spark.sql.legacy.createHiveTableByDefault.enabled is disabled, we
>>>>>> should not ship Spark 3.0 with SPARK-30098. Otherwise, we will have to 
>>>>>> deal
>>>>>> with this problem for years to come.
>>>>>>
>>>>>> On Mon, May 11, 2020 at 1:06 AM JackyLee <qcsd2...@163.com> wrote:
>>>>>>
>>>>>>> +1. Agree with Xiao Li and Jungtaek Lim.
>>>>>>>
>>>>>>> This seems to be controversial, and can not be done in a short time.
>>>>>>> It is
>>>>>>> necessary to choose option 1 to unblock Spark 3.0 and support it in
>>>>>>> 3.1.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from:
>>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

Reply via email to