Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Ryan Blue
I don’t think Spark ever claims to be 100% Hive compatible. I just found some relevant documentation on this, where Databricks claims that “Apache Spark SQL in Databricks is designed to be compatible …”

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Ryan Blue
I don’t think Spark ever claims to be 100% Hive compatible. By accepting the EXTERNAL keyword in some circumstances, Spark is providing compatibility with Hive DDL. Yes, there are places where it breaks. The question is whether we should deliberately break what a Hive catalog could implement, when …

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Wenchen Fan
> I have some Hive queries that I want to run on Spark. Spark is not compatible with Hive in many places. Decoupling EXTERNAL and LOCATION can't help you much here. If you do have this use case, we need a much wider discussion about how to achieve it. For this particular topic, we need concrete …

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Ryan Blue
> how about LOCATION without EXTERNAL? Currently Spark treats it as an external table. I think there is some confusion about what Spark has to handle. Regardless of what Spark allows as DDL, these tables can exist in a Hive MetaStore that Spark connects to, and the general expectation is that Spark …
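
A minimal sketch of the behavior described above, assuming Spark's built-in catalog; the table and path names are illustrative:

    -- Native syntax with LOCATION but no EXTERNAL keyword:
    CREATE TABLE events (id BIGINT) USING parquet LOCATION '/data/events';
    -- Spark records this as an EXTERNAL table in the metastore, so
    -- DROP TABLE removes only the metadata, not the files at the path.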

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Holden Karau
On Wed, Oct 7, 2020 at 9:57 AM Wenchen Fan wrote: > I don't think Hive compatibility itself is a "use case". OK, let's add on top of this: I have some Hive queries that I want to run on Spark. I believe that makes it a use case. > The Nessie example you mentioned …

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Wenchen Fan
I don't think Hive compatibility itself is a "use case". The Nessie example you mentioned is a reasonable use case to me: some frameworks/applications want to create external tables without user-specified location, so that they can manage the table directory
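
A sketch of that use case; the table name is illustrative, and the comment reflects the proposal rather than current behavior:

    -- No LOCATION clause: the catalog (e.g. a Nessie-backed one) would
    -- pick and manage the table directory itself, while the table stays
    -- external. Spark's built-in catalog rejects this form today.
    CREATE EXTERNAL TABLE events (id BIGINT);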

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Ryan Blue
Wenchen, why are you ignoring Hive as a “reasonable use case”? The keyword came from Hive, and we all agree that a Hive catalog with Hive behavior can’t be implemented if Spark chooses to couple this with LOCATION. Why is this use case not a justification? Also, the option to keep behavior the same …

Re: Official support of CREATE EXTERNAL TABLE

2020-10-07 Thread Wenchen Fan
> As someone who's had the job of porting different SQL dialects to Spark, I'm also very much in favor of keeping EXTERNAL. Just to be clear: no one is proposing to remove EXTERNAL. The two options we are discussing are: 1. Keep the behavior the same as before, i.e. EXTERNAL must co-exist with LOCATION …
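
Concretely, the two options disagree on the statements below; names are illustrative:

    -- Option 1 (status quo): EXTERNAL is only accepted together with LOCATION.
    CREATE EXTERNAL TABLE t (id INT);                      -- rejected
    CREATE EXTERNAL TABLE t (id INT) LOCATION '/data/t';   -- accepted
    -- Option 2: pass EXTERNAL through to the catalog, which decides
    -- whether an external table needs an explicit LOCATION.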

Re: Official support of CREATE EXTERNAL TABLE

2020-10-06 Thread Russell Spitzer
I don't feel differently than I did on the thread linked above; I think treating "External" as a table option is still the safest way to go about things. For the Cassandra catalog this option wouldn't appear on our whitelist of allowed options, the same as "path" and other options that don't apply …
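
A sketch of this suggestion, with EXTERNAL surfaced as an ordinary table option; the option name 'external' is hypothetical:

    -- EXTERNAL becomes a regular option that each catalog may accept or reject:
    CREATE TABLE t (id INT) USING parquet OPTIONS (external 'true');
    -- A catalog with no notion of external tables (e.g. the Cassandra
    -- catalog) would reject 'external', just as it rejects 'path'.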

Re: Official support of CREATE EXTERNAL TABLE

2020-10-06 Thread Holden Karau
As someone who's had the job of porting different SQL dialects to Spark, I'm also very much in favor of keeping EXTERNAL, and I think Ryan's suggestion of leaving it up to the catalogs on how to handle this makes sense. On Tue, Oct 6, 2020 at 1:54 PM Ryan Blue wrote: > I would summarize both the …

Re: Official support of CREATE EXTERNAL TABLE

2020-10-06 Thread Ryan Blue
I would summarize both the problem and the current state differently. Currently, Spark parses the EXTERNAL keyword for compatibility with Hive SQL, but Spark’s built-in catalog doesn’t allow creating a table with EXTERNAL unless LOCATION is also present. This “hidden feature” breaks compatibility …
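
The compatibility break in question, sketched; Hive's default-location behavior is as documented for Hive, and names are illustrative:

    -- Valid Hive DDL: an external table whose location defaults to the
    -- warehouse directory; DROP TABLE still leaves the data in place.
    CREATE EXTERNAL TABLE t (id INT) STORED AS PARQUET;
    -- Spark's built-in catalog rejects the statement above and requires:
    CREATE EXTERNAL TABLE t (id INT) STORED AS PARQUET LOCATION '/data/t';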

Official support of CREATE EXTERNAL TABLE

2020-10-06 Thread Wenchen Fan
Hi all, I'd like to start a discussion thread about this topic, as it blocks an important feature that we target for Spark 3.1: unifying the CREATE TABLE SQL syntax. A bit more background for CREATE EXTERNAL TABLE: it's kind of a hidden feature in Spark for Hive compatibility. When you write native CREATE TABLE syntax …
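
For context, a sketch of the two CREATE TABLE syntax families being unified, under the behavior described in this thread:

    -- Native Spark syntax: the EXTERNAL keyword is not accepted here.
    CREATE TABLE t (id INT) USING parquet;
    -- Hive-compatible syntax: EXTERNAL parses, but the built-in catalog
    -- accepts it only together with LOCATION.
    CREATE EXTERNAL TABLE t (id INT) STORED AS PARQUET LOCATION '/data/t';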