Re: Official support of CREATE EXTERNAL TABLE

2020-10-12 Thread 大啊
Personally, I think EXTERNAL is a special feature supported by Hive.
If Spark SQL wants to support it, it should only be considered for Hive.
We could simply unify `CREATE EXTERNAL TABLE` in the parser and add a check that 
rejects data sources that do not support it.
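
As a rough sketch of that idea (hypothetical code, not Spark's actual
implementation; the method name and exception are made up for illustration):

  // Hypothetical sketch: reject EXTERNAL for providers that don't support it.
  def checkExternalSupported(provider: String, isExternal: Boolean): Unit = {
    if (isExternal && !provider.equalsIgnoreCase("hive")) {
      throw new IllegalArgumentException(
        s"CREATE EXTERNAL TABLE is not supported for data source '$provider'")
    }
  }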


At 2020-10-06 22:06:28, "Wenchen Fan"  wrote:

Hi all,



I'd like to start a discussion thread about this topic, as it blocks an 
important feature that we are targeting for Spark 3.1: unifying the CREATE TABLE 
SQL syntax.


A bit more background for CREATE EXTERNAL TABLE: it's kind of a hidden feature 
in Spark for Hive compatibility.


When you write native CREATE TABLE syntax such as `CREATE EXTERNAL TABLE ... 
USING parquet`, the parser fails and tells you that EXTERNAL can't be specified.


When you write Hive CREATE TABLE syntax, EXTERNAL can only be specified if a 
LOCATION clause or path option is present. For example, `CREATE EXTERNAL TABLE 
... STORED AS parquet` is not allowed when there is no LOCATION clause or path 
option. This is not 100% Hive compatible.
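
Concretely, the current behavior looks roughly like this (an illustrative sketch
assuming a Hive-enabled SparkSession named `spark`; table names, paths, and
error wording are made up):

  // Native syntax: the parser rejects EXTERNAL outright.
  spark.sql("CREATE EXTERNAL TABLE t1 (id INT) USING parquet")
  // => parse error: EXTERNAL can't be specified here

  // Hive syntax: EXTERNAL is accepted only together with LOCATION (or a path option).
  spark.sql("CREATE EXTERNAL TABLE t2 (id INT) STORED AS parquet LOCATION '/tmp/t2'")  // OK
  spark.sql("CREATE EXTERNAL TABLE t3 (id INT) STORED AS parquet")  // rejected: no LOCATION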


As we are unifying the CREATE TABLE SQL syntax, one problem is how to deal with 
CREATE EXTERNAL TABLE. We can keep it as a hidden feature as it was, or we can 
officially support it.


Please let us know your thoughts:
1. As an end-user, what do you expect CREATE EXTERNAL TABLE to do? Have you 
used it in production before? For what use cases?
2. As a catalog developer, how are you going to implement EXTERNAL TABLE? It 
seems to me that it only makes sense for file sources, as the table directory 
can be managed (a rough sketch of this interpretation follows below). I'm not 
sure how to interpret EXTERNAL in catalogs like JDBC, Cassandra, etc.
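
As a purely hypothetical sketch of that file-source interpretation (not Spark's
actual catalog API; `TableMeta`, `dropTable`, and the flag are invented names):
dropping a managed table would delete its directory, while dropping an external
table would leave the data in place.

  // Hypothetical sketch only: how a file-based catalog might honor EXTERNAL on DROP TABLE.
  import org.apache.hadoop.fs.{FileSystem, Path}

  case class TableMeta(name: String, location: java.net.URI, isExternal: Boolean)

  def dropTable(meta: TableMeta, fs: FileSystem): Unit = {
    // Removing the catalog entry itself is omitted here.
    if (!meta.isExternal) {
      // Managed table: the catalog owns the directory, so delete the data too.
      fs.delete(new Path(meta.location), true)  // recursive delete
    }
    // External table: only the metadata goes away; the files stay in place.
  }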


For more details, please refer to the long discussion in 
https://github.com/apache/spark/pull/28026


Thanks,
Wenchen

-Phadoop-provided still includes hadoop jars

2020-10-12 Thread Kimahriman
When I try to build a distribution with either -Phive or -Phadoop-cloud along
with -Phadoop-provided, I still end up with hadoop jars in the distribution.

Specifically, with -Phive and -Phadoop-provided, you end up with
hadoop-annotations, hadoop-auth, and hadoop-common included in the Spark
jars, and with -Phadoop-cloud and -Phadoop-provided, you end up with
hadoop-annotations, as well as the hadoop-{aws,azure,openstack} jars. Is
this supposed to be the case or is there something I'm doing wrong? I just
want the spark-hive and spark-hadoop-cloud jars without the hadoop
dependencies, and right now I just have to delete the hadoop jars after the
fact.
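
For context, the kind of invocation that reproduces this would be something like
the following (an assumed example using the standard distribution script; the
--name value is arbitrary):

  ./dev/make-distribution.sh --name hadoop-provided-test --tgz -Phive -Phadoop-provided
  ./dev/make-distribution.sh --name hadoop-provided-test --tgz -Phadoop-cloud -Phadoop-provided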






Re: -Phadoop-provided still includes hadoop jars

2020-10-12 Thread Sean Owen
I don't have a good answer; Steve may know more. But from looking at
dependency:tree, it looks like it's mostly hadoop-common that's at issue.
Without -Phive it remains 'provided' in the assembly/ module, but -Phive
causes it to come back in. Either there's some good reason for that, or maybe
we need to explicitly manage the scope of hadoop-common along with everything
else from Hadoop, even though Spark doesn't reference it directly.
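
Explicitly managing that scope could look roughly like the following (a
hypothetical sketch, not the actual Spark pom; placing it in the hadoop-provided
profile and reusing the ${hadoop.version} property are assumptions):

  <profile>
    <id>hadoop-provided</id>
    <dependencyManagement>
      <dependencies>
        <!-- Pin hadoop-common to provided scope so it stays out of the assembly,
             even when -Phive or -Phadoop-cloud pulls it in transitively. -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>${hadoop.version}</version>
          <scope>provided</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>
  </profile>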

On Mon, Oct 12, 2020 at 12:38 PM Kimahriman  wrote:

> When I try to build a distribution with either -Phive or -Phadoop-cloud
> along
> with -Phadoop-provided, I still end up with hadoop jars in the
> distribution.
>
> Specifically, with -Phive and -Phadoop-provided, you end up with
> hadoop-annotations, hadoop-auth, and hadoop-common included in the Spark
> jars, and with -Phadoop-cloud and -Phadoop-provided, you end up with
> hadoop-annotations, as well as the hadoop-{aws,azure,openstack} jars. Is
> this supposed to be the case or is there something I'm doing wrong? I just
> want the spark-hive and spark-hadoop-cloud jars without the hadoop
> dependencies, and right now I just have to delete the hadoop jars after the
> fact.
>


[UPDATE] Apache Spark 3.1.0 Release Window

2020-10-12 Thread Dongjoon Hyun
Hi, All.

The Apache Spark 3.1.0 release window has been adjusted today as follows.
Please check the latest information on the official website.

-
https://github.com/apache/spark-website/commit/0cd0bdc80503882b4737db7e77cc8f9d17ec12ca
- https://spark.apache.org/versioning-policy.html

Bests,
Dongjoon.


Re: [UPDATE] Apache Spark 3.1.0 Release Window

2020-10-12 Thread Xiao Li
Thank you, Dongjoon

Xiao

On Mon, Oct 12, 2020 at 4:19 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> The Apache Spark 3.1.0 release window has been adjusted today as follows.
> Please check the latest information on the official website.
>
> -
> https://github.com/apache/spark-website/commit/0cd0bdc80503882b4737db7e77cc8f9d17ec12ca
> - https://spark.apache.org/versioning-policy.html
>
> Bests,
> Dongjoon.
>



Unit test failure in spark-core

2020-10-12 Thread Stephen Coy
Hi all,

When trying to build current master with a simple:

mvn clean install

I get a consistent unit test failure in core:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.403 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.015 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:274)

I believe the applicable messages from the unit-tests.log file are:

20/10/13 12:20:35.875 spark-app-1: '' WARN InProcessAppHandle: 
Application failed with exception.
org.apache.spark.SparkException: Failed to get main class in JAR with error 
'File spark-internal does not exist'.  Please specify one with --class.
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:942)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:457)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.InProcessSparkSubmit$.main(SparkSubmit.scala:954)
at org.apache.spark.deploy.InProcessSparkSubmit.main(SparkSubmit.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.launcher.InProcessAppHandle.lambda$start$0(InProcessAppHandle.java:72)
at java.lang.Thread.run(Thread.java:748)


org.apache.spark.launcher.SparkLauncherSuite#testSparkLauncherGetError is the 
failing test, so I improved the failing assertion by changing it from:

  
assertTrue(handle.getError().get().getMessage().contains(EXCEPTION_MESSAGE));

to:

  assertThat(handle.getError().get().getMessage(), 
containsString(EXCEPTION_MESSAGE));

This yields:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.155 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.02 s  <<< FAILURE!
java.lang.AssertionError:

Expected: a string containing "dummy-exception"
 but: was "Error: Failed to load class 
org.apache.spark.launcher.SparkLauncherSuite$ErrorInProcessTestApp."
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:276)


This loosely correlates with the error in unit-tests.log.

Any ideas?

Thanks,

Steve C





Re: Unit test failure in spark-core

2020-10-12 Thread Stephen Coy
Sorry, I forgot:

[scoy@Steves-Core-i9-2 core]$ java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.262-b10, mixed mode)

which is on macOS 10.15.7

On 13 Oct 2020, at 12:47 pm, Stephen Coy <s...@infomedia.com.au.INVALID> wrote:

Hi all,

When trying to build current master with a simple:

mvn clean install

I get a consistent unit test failure in core:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.403 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.015 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:274)

I believe the applicable messages from the unit-tests.log file are:

20/10/13 12:20:35.875 spark-app-1: '' WARN InProcessAppHandle: 
Application failed with exception.
org.apache.spark.SparkException: Failed to get main class in JAR with error 
'File spark-internal does not exist'.  Please specify one with --class.
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:942)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:457)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.InProcessSparkSubmit$.main(SparkSubmit.scala:954)
at org.apache.spark.deploy.InProcessSparkSubmit.main(SparkSubmit.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.launcher.InProcessAppHandle.lambda$start$0(InProcessAppHandle.java:72)
at java.lang.Thread.run(Thread.java:748)


org.apache.spark.launcher.SparkLauncherSuite#testSparkLauncherGetError is the 
failing test, so I improved the failing assertion by changing it from:

  
assertTrue(handle.getError().get().getMessage().contains(EXCEPTION_MESSAGE));

to:

  assertThat(handle.getError().get().getMessage(), 
containsString(EXCEPTION_MESSAGE));

This yields:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.155 s 
<<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite) 
 Time elapsed: 2.02 s  <<< FAILURE!
java.lang.AssertionError:

Expected: a string containing "dummy-exception"
 but: was "Error: Failed to load class 
org.apache.spark.launcher.SparkLauncherSuite$ErrorInProcessTestApp."
at 
org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:276)


This loosely correlates with the error in unit-tests.log.

Any ideas?

Thanks,

Steve C


