Re: Official support of CREATE EXTERNAL TABLE
Personally, I think EXTERNAL is a special feature supported by Hive. If Spark SQL wants to support it, we should only consider it for Hive: unify `CREATE EXTERNAL TABLE` in the parser and add a check that rejects it for unsupported data sources.

At 2020-10-06 22:06:28, "Wenchen Fan" wrote:

Hi all,

I'd like to start a discussion thread about this topic, as it blocks an important feature that we target for Spark 3.1: unifying the CREATE TABLE SQL syntax.

A bit more background on CREATE EXTERNAL TABLE: it's kind of a hidden feature in Spark for Hive compatibility. When you write the native CREATE TABLE syntax such as `CREATE EXTERNAL TABLE ... USING parquet`, the parser fails and tells you that EXTERNAL can't be specified. When you write the Hive CREATE TABLE syntax, EXTERNAL can only be specified if a LOCATION clause or path option is present. For example, `CREATE EXTERNAL TABLE ... STORED AS parquet` is not allowed, as there is no LOCATION clause or path option. This is not 100% Hive compatible.

As we are unifying the CREATE TABLE SQL syntax, one problem is how to deal with CREATE EXTERNAL TABLE. We can keep it as a hidden feature as it was, or we can officially support it. Please let us know your thoughts:

1. As an end-user, what do you expect CREATE EXTERNAL TABLE to do? Have you used it in production before? For what use cases?

2. As a catalog developer, how are you going to implement EXTERNAL TABLE? It seems to me that it only makes sense for file sources, as the table directory can be managed. I'm not sure how to interpret EXTERNAL in catalogs like JDBC, Cassandra, etc.

For more details, please refer to the long discussion in https://github.com/apache/spark/pull/28026

Thanks,
Wenchen
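For concreteness, a sketch of the statements being discussed above (the table name and location are placeholders, not taken from the thread):

  -- Native syntax: the parser rejects EXTERNAL outright
  CREATE EXTERNAL TABLE t (id INT) USING parquet LOCATION '/tmp/t';

  -- Hive syntax: EXTERNAL is accepted only when a location is given
  CREATE EXTERNAL TABLE t (id INT) STORED AS parquet LOCATION '/tmp/t';

  -- Hive syntax without LOCATION or a path option: rejected, unlike in Hive itself
  CREATE EXTERNAL TABLE t (id INT) STORED AS parquet;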
-Phadoop-provided still includes hadoop jars
When I try to build a distribution with either -Phive or -Phadoop-cloud along with -Phadoop-provided, I still end up with Hadoop jars in the distribution.

Specifically, with -Phive and -Phadoop-provided, you end up with hadoop-annotations, hadoop-auth, and hadoop-common included in the Spark jars, and with -Phadoop-cloud and -Phadoop-provided, you end up with hadoop-annotations as well as the hadoop-{aws,azure,openstack} jars.

Is this supposed to be the case, or is there something I'm doing wrong? I just want the spark-hive and spark-hadoop-cloud jars without the Hadoop dependencies, and right now I have to delete the Hadoop jars after the fact.
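For reference, a build invocation along the lines described above (a sketch; the distribution name is arbitrary and it assumes the standard dev/make-distribution.sh script in the Spark source tree):

  # build a distribution with Hive and cloud support, with Hadoop expected to be provided at runtime
  ./dev/make-distribution.sh --name hadoop-provided --tgz \
      -Phive -Phadoop-cloud -Phadoop-provided

  # inspect which Hadoop artifacts actually landed in the distribution
  ls dist/jars | grep hadoop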
Re: -Phadoop-provided still includes hadoop jars
I don't have a good answer, and Steve may know more, but from looking at dependency:tree it looks mostly like it's hadoop-common that's at issue. Without -Phive it remains 'provided' in the assembly/ module, but -Phive causes it to come back in. Either there's some good reason for that, or maybe we need to explicitly manage the scope of hadoop-common along with everything else Hadoop, even though Spark doesn't reference it directly.

On Mon, Oct 12, 2020 at 12:38 PM Kimahriman wrote:

> When I try to build a distribution with either -Phive or -Phadoop-cloud
> along with -Phadoop-provided, I still end up with hadoop jars in the
> distribution.
>
> Specifically, with -Phive and -Phadoop-provided, you end up with
> hadoop-annotations, hadoop-auth, and hadoop-common included in the Spark
> jars, and with -Phadoop-cloud and -Phadoop-provided, you end up with
> hadoop-annotations, as well as the hadoop-{aws,azure,openstack} jars. Is
> this supposed to be the case or is there something I'm doing wrong? I just
> want the spark-hive and spark-hadoop-cloud jars without the hadoop
> dependencies, and right now I just have to delete the hadoop jars after
> the fact.
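One way to reproduce that dependency:tree check (a sketch; it assumes the modules have been built or installed at least once so the reactor artifacts resolve):

  # hadoop-common should show up with 'provided' scope here
  ./build/mvn -Phadoop-provided -pl assembly dependency:tree | grep hadoop-common

  # adding -Phive appears to pull it back into compile scope
  ./build/mvn -Phive -Phadoop-provided -pl assembly dependency:tree | grep hadoop-common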
[UPDATE] Apache Spark 3.1.0 Release Window
Hi, All.

The Apache Spark 3.1.0 release window was adjusted today as follows. Please check the latest information on the official website.

- https://github.com/apache/spark-website/commit/0cd0bdc80503882b4737db7e77cc8f9d17ec12ca
- https://spark.apache.org/versioning-policy.html

Bests,
Dongjoon.
Re: [UPDATE] Apache Spark 3.1.0 Release Window
Thank you, Dongjoon.

Xiao

On Mon, Oct 12, 2020 at 4:19 PM Dongjoon Hyun wrote:

> Hi, All.
>
> The Apache Spark 3.1.0 release window was adjusted today as follows.
> Please check the latest information on the official website.
>
> - https://github.com/apache/spark-website/commit/0cd0bdc80503882b4737db7e77cc8f9d17ec12ca
> - https://spark.apache.org/versioning-policy.html
>
> Bests,
> Dongjoon.
Unit test failure in spark-core
Hi all,

When trying to build current master with a simple:

  mvn clean install

I get a consistent unit test failure in core:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.403 s <<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite)  Time elapsed: 2.015 s  <<< FAILURE!
java.lang.AssertionError
    at org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:274)

I believe the applicable messages from the unit-tests.log file are:

20/10/13 12:20:35.875 spark-app-1: '' WARN InProcessAppHandle: Application failed with exception.
org.apache.spark.SparkException: Failed to get main class in JAR with error 'File spark-internal does not exist'. Please specify one with --class.
    at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:942)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:457)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.InProcessSparkSubmit$.main(SparkSubmit.scala:954)
    at org.apache.spark.deploy.InProcessSparkSubmit.main(SparkSubmit.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.launcher.InProcessAppHandle.lambda$start$0(InProcessAppHandle.java:72)
    at java.lang.Thread.run(Thread.java:748)

org.apache.spark.launcher.SparkLauncherSuite#testSparkLauncherGetError is the failing test, so I improved the failing assertion by changing it from:

  assertTrue(handle.getError().get().getMessage().contains(EXCEPTION_MESSAGE));

to:

  assertThat(handle.getError().get().getMessage(), containsString(EXCEPTION_MESSAGE));

This yields:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.155 s <<< FAILURE! - in org.apache.spark.launcher.SparkLauncherSuite
[ERROR] testSparkLauncherGetError(org.apache.spark.launcher.SparkLauncherSuite)  Time elapsed: 2.02 s  <<< FAILURE!
java.lang.AssertionError:
Expected: a string containing "dummy-exception"
     but: was "Error: Failed to load class org.apache.spark.launcher.SparkLauncherSuite$ErrorInProcessTestApp."
    at org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError(SparkLauncherSuite.java:276)

which loosely correlates with the error in unit-tests.log.

Any ideas?

Thanks,

Steve C
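For anyone reproducing the assertion change above, the Hamcrest form needs the following static imports (a sketch; handle and EXCEPTION_MESSAGE are the existing fields in SparkLauncherSuite):

  import static org.hamcrest.CoreMatchers.containsString;
  import static org.hamcrest.MatcherAssert.assertThat;

  // unlike assertTrue, this reports the actual message when the assertion fails
  assertThat(handle.getError().get().getMessage(), containsString(EXCEPTION_MESSAGE));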
Re: Unit test failure in spark-core
Sorry, I forgot:

[scoy@Steves-Core-i9-2 core]$ java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.262-b10, mixed mode)

which is on macOS 10.15.7.