I agree it sucks. We started with a decision that might have made sense back
in 2013 (let's use Hive as the default source, and guess what, pick the slowest
possible serde by default). We have been paying that debt ever since.
Thanks for bringing this thread up though. We don't have a clear solutio
Technically, I have been suffering from (1) `CREATE TABLE` and its many
differences for a long time (since 2017). So I made a wrong assumption about
the implications of "(2) FYI: SPARK-30098 Use default datasource as
provider for CREATE TABLE syntax", Reynold. I admit that. You may not feel
in the si
You are joking when you say "informed widely and discussed in many ways
twice", right?
This thread doesn't even talk about char/varchar:
https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
(Yes it talked about changing the
+1 for Wenchen's suggestion.
I believe that the difference and effects are informed widely and discussed
in many ways twice.
First, this was shared last December.
"FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
syntax", 2019/12/06
https://lists.apache.org/thread.htm
Anything would be OK if the create table DDL provides a "clear way" for users
to know the table provider "before" they run the query. Great news that it
doesn't require major rework - looking forward to the PR.
Thanks again for jumping in and sorting this out.
- Jungtaek Lim (HeartSaVioR)
On Fri, Mar 20, 2020
I tested it and cannot reproduce the issue.
I built Spark 3.1.0 and Spark 2.3.1.
After many tests, I found little difference between them; each one wins some
runs and loses others.
And judging from the event timeline, Spark 3.1.0 looks more accurate.
I have an update to the parser that unifies the CREATE TABLE rules. It took
surprisingly little work to get the parser updated to produce
CreateTableStatement and CreateTableAsSelectStatement with the Hive info.
And the only fields I needed to add to those statements were serde: SerdeInfo
and externa
Big +1 to having one single unified CREATE TABLE syntax.
In general, we can say there are 2 ways to specify the table provider: the
USING clause and the ROW FORMAT/STORED AS clause. These 2 ways are mutually
exclusive. If neither is specified, it implicitly indicates USING defaultSource.
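To make the rule above concrete, here is a minimal sketch in plain Python. It
is not Spark's parser code: resolve_provider, its parameters, and the
DEFAULT_SOURCE value are hypothetical names chosen for illustration, and
"parquet" is only assumed as a stand-in for whatever the default source is
configured to be.

```python
from typing import Optional

# Assumption: stand-in for the configured default data source.
DEFAULT_SOURCE = "parquet"


def resolve_provider(using: Optional[str] = None,
                     row_format: Optional[str] = None,
                     stored_as: Optional[str] = None) -> str:
    """Return the table provider implied by the clauses of a CREATE TABLE."""
    # ROW FORMAT and STORED AS together form the Hive-style way.
    hive_style = row_format is not None or stored_as is not None

    # The two ways are mutually exclusive.
    if using is not None and hive_style:
        raise ValueError("USING and ROW FORMAT/STORED AS are mutually exclusive")

    if using is not None:
        return using            # explicit USING clause
    if hive_style:
        return "hive"           # Hive-style clauses imply a Hive table
    return DEFAULT_SOURCE       # neither given: implicit USING defaultSource
```

With this sketch, resolve_provider(using="orc") returns "orc",
resolve_provider(stored_as="textfile") returns "hive", and a call with no
clauses falls back to DEFAULT_SOURCE, so the provider is known from the
statement text alone.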
I'm fine with a few