Re: [Discuss] Follow ANSI SQL on table insertion

2019-08-05 Thread Wenchen Fan
roduce invalid results. My intuition is yes, because different users have different levels of tolerance for different kinds of errors. I’d expect these sorts of configurations to be set up at an infrastructure level, e.g. to maintain consistent standards throughout

Re: [Discuss] Follow ANSI SQL on table insertion

2019-08-05 Thread Gengliang Wang
situations that could produce invalid results. My intuition is yes, because different users have different levels of tolerance for different kinds of errors. I’d expect these sorts of configurations to be set up at an infrastructure level, e.g. to maintain co

Re: [Discuss] Follow ANSI SQL on table insertion

2019-08-05 Thread Ryan Blue
le organization. >> >> >> >> *From: *Gengliang Wang >> *Date: *Thursday, August 1, 2019 at 3:07 AM >> *To: *Marco Gaido >> *Cc: *Wenchen Fan , Hyukjin Kwon < >> gurwls...@gmail.com>, Russell Spitzer , Ryan >> Blue , Reynold Xin , Matt Che

Re: [Discuss] Follow ANSI SQL on table insertion

2019-08-05 Thread Wenchen Fan
m: *Gengliang Wang > *Date: *Thursday, August 1, 2019 at 3:07 AM > *To: *Marco Gaido > *Cc: *Wenchen Fan , Hyukjin Kwon , > Russell Spitzer , Ryan Blue , > Reynold Xin , Matt Cheah , > Takeshi Yamamuro , Spark dev list < > dev@spark.apache.org> > *Subject: *Re: [Discuss] Fo

Re: [Discuss] Follow ANSI SQL on table insertion

2019-08-02 Thread Matt Cheah
ido Cc: Wenchen Fan , Hyukjin Kwon , Russell Spitzer , Ryan Blue , Reynold Xin , Matt Cheah , Takeshi Yamamuro , Spark dev list Subject: Re: [Discuss] Follow ANSI SQL on table insertion Hi all, Let me explain a little bit on the proposal. By default, we follow the store assignmen

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-31 Thread Ryan Blue
arranted to do so. >> >> >> >> -Matt Cheah >> >> >> >> *From: *Reynold Xin >> *Date: *Wednesday, July 31, 2019 at 9:58 AM >> *To: *Matt Cheah >> *Cc: *Russell Spitzer , Takeshi Yamamuro < >> linguin@gmail.com>, Ge

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-31 Thread Reynold Xin
w...@databricks.com >, Ryan Blue < > >rb...@netflix.com > >, Spark dev list < dev@spark.apache.org >, Hyukjin Kwon < gurwls...@gmail.com > >, Wenchen Fan < cloud0...@gmail.com > > *Subject:* Re: [Discuss] Follow ANSI SQL on table insertion > > >

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-31 Thread Matt Cheah
Date: Wednesday, July 31, 2019 at 9:58 AM To: Matt Cheah Cc: Russell Spitzer , Takeshi Yamamuro , Gengliang Wang , Ryan Blue , Spark dev list , Hyukjin Kwon , Wenchen Fan Subject: Re: [Discuss] Follow ANSI SQL on table insertion Matt what do you mean by maximizing 3, while allowing not

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-31 Thread Reynold Xin
n Fan < cloud0...@gmail.com > > *Cc:* Russell Spitzer < russell.spit...@gmail.com >, Takeshi Yamamuro < > linguin@gmail.com > >, Gengliang Wang < gengliang.w...@databricks.com >, Ryan Blue < > >rb...@netflix.com > >, Spark dev list < dev@

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-31 Thread Matt Cheah
perhaps the behavior can be flagged by the destination writer at write time. -Matt Cheah From: Hyukjin Kwon Date: Monday, July 29, 2019 at 11:33 PM To: Wenchen Fan Cc: Russell Spitzer , Takeshi Yamamuro , Gengliang Wang , Ryan Blue , Spark dev list Subject: Re: [Discuss] Follow ANSI SQL

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Hyukjin Kwon
From my look, +1 on the proposal, considering ANSI and other DBMSes in general. On Tue, Jul 30, 2019 at 3:21 PM, Wenchen Fan wrote: > We can add a config for a certain behavior if it makes sense, but the most > important thing we want to reach an agreement on here is: what should be the > default behavior

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Wenchen Fan
We can add a config for a certain behavior if it makes sense, but the most important thing we want to reach an agreement on here is: what should be the default behavior? Let's explore the solution space of table insertion behavior first: At compile time, 1. always add cast 2. add cast following the A

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Russell Spitzer
I understand Spark is making the decisions; I'm saying the actual final effect of the null decision would be different depending on the insertion target if the target has different behaviors for null. On Mon, Jul 29, 2019 at 5:26 AM Wenchen Fan wrote: > > I'm a big -1 on null values for invalid cas

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-29 Thread Wenchen Fan
> I'm a big -1 on null values for invalid casts. This is why we want to introduce the ANSI mode, so that an invalid cast fails at runtime. But we have to keep the null behavior for a while, to keep backward compatibility. Spark has returned null for invalid casts since the first day of Spark SQL; we can't

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-27 Thread Russell Spitzer
I'm a big -1 on null values for invalid casts. This can lead to a lot of even more unexpected errors and runtime behavior since null is 1. Not allowed in all schemas (Leading to a runtime error anyway) 2. Is the same as delete in some systems (leading to data loss) And this would be dependent on

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-27 Thread Takeshi Yamamuro
Hi, all. +1 for implementing this new store cast mode. From the viewpoint of DBMS users, this cast is pretty common for INSERTs and I think this functionality could promote migrations from existing DBMSs to Spark. The most important thing for DBMS users is that they could optionally choose this mod

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-27 Thread Gengliang Wang
Hi Ryan, Thanks for the suggestions on the proposal and doc. Currently, there is no data type validation in table insertion of V1. We are on the same page that we should improve it. But using UpCast goes from one extreme to the other. It is possible that many queries are broken after upgrading to Spar

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-26 Thread Wenchen Fan
I don't agree with handling literal values specially. Although Postgres does it, I can't find anything about it in the SQL standard. And it introduces inconsistent behaviors which may be strange to users: * What about something like "INSERT INTO t SELECT float_col + 1.1"? * The same insert with a d

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-26 Thread Ryan Blue
I don’t think this is a good idea. Following the ANSI standard is usually fine, but here it would *silently corrupt data*. From your proposal doc, ANSI allows implicitly casting from long to int (any numeric type to any other numeric type) and inserts NULL when a value overflows. That would drop

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-25 Thread Wenchen Fan
I have heard many complaints about the old table insertion behavior. Blindly casting everything will leak the user's mistake to a late stage of the data pipeline and make it very hard to debug. When a user writes string values to an int column, it's probably a mistake and the columns are misor