Hi Josh,

Appreciate the response! Also, Steve - we meet again :) At any rate, here's the output (a lot of it, anyway) of running spark-sql with the --verbose option so that you can get a sense of the settings and the classpath. Does anything stand out?
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.port.maxRetries=999
Adding default property: spark.broadcast.port=45200
Adding default property: spark.executor.extraJavaOptions=-XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Djava.library.path=/opt/hadoop/lib/native/
Adding default property: spark.history.fs.logDirectory=hdfs:///logs/spark-history
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.ui.port=45100
Adding default property: spark.driver.port=45055
Adding default property: spark.executor.port=45250
Adding default property: spark.logConf=true
Adding default property: spark.replClassServer.port=45070
Adding default property: spark.blockManager.port=45300
Adding default property: spark.fileserver.port=45090
Adding default property: spark.history.retainedApplications=9999999
Adding default property: spark.eventLog.dir=hdfs:///logs/spark-history
Adding default property: spark.history.ui.port=18080
Adding default property: spark.shuffle.consolidateFiles=true
Parsed arguments:
  master              yarn
  deployMode          client
  executorMemory      1G
  executorCores       2
  totalExecutorCores  null
  propertiesFile      /opt/spark/conf/spark-defaults.conf
  driverMemory        1G
  driverCores         null
  driverExtraClassPath
hive-site.xml:/opt/spark/sql/hive/target/spark-hive_2.10-1.6.1.jar:/opt/spark/sql/hive-thriftserver/target/spark-hive-thriftserver_2.10-1.6.1.jar:/opt/hive/lib/plexus-utils-1.5.6.jar:/opt/hive/lib/apache-log4j-extras-1.2.17.jar:/opt/hive/lib/hive-testutils-1.2.0.jar:/opt/hive/lib/jta-1.1.jar:/opt/hive/lib/oro-2.0.8.jar:/opt/hive/lib/commons-httpclient-3.0.1.jar:/opt/hive/lib/antlr-2.7.7.jar:/opt/hive/lib/hive-metastore-1.2.0.jar:/opt/hive/lib/antlr-runtime-3.4.jar:/opt/hive/lib/asm-tree-3.1.jar:/opt/hive/lib/libfb303-0.9.2.jar:/opt/hive/lib/netty-3.7.0.Final.jar:/opt/hive/lib/parquet-hadoop-bundle-1.6.0.jar:/opt/hive/lib/commons-beanutils-core-1.8.0.jar:/opt/hive/lib/calcite-core-1.2.0-incubating.jar:/opt/hive/lib/janino-2.7.6.jar:/opt/hive/lib/hive-shims-0.20S-1.2.0.jar:/opt/hive/lib/stringtemplate-3.2.1.jar:/opt/hive/lib/guava-14.0.1.jar:/opt/hive/lib/hive-exec-1.2.0.jar:/opt/hive/lib/bonecp-0.8.0.RELEASE.jar:/opt/hive/lib/opencsv-2.3.jar:/opt/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar:/opt/hive/lib/accumulo-core-1.6.0.jar:/opt/hive/lib/commons-math-2.1.jar:/opt/hive/lib/jetty-all-server-7.6.0.v20120127.jar:/opt/hive/lib/hive-ant-1.2.0.jar:/opt/hive/lib/avro-1.7.5.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar:/opt/hive/lib/hive-beeline-1.2.0.jar:/opt/hive/lib/jcommander-1.32.jar:/opt/hive/lib/commons-cli-1.2.jar:/opt/hive/lib/hive-cli-1.2.0.jar:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/hive-shims-1.2.0.jar:/opt/hive/lib/xz-1.0.jar:/opt/hive/lib/commons-beanutils-1.7.0.jar:/opt/hive/lib/commons-dbcp-1.4.jar:/opt/hive/lib/maven-scm-provider-svnexe-1.4.jar:/opt/hive/lib/hive-shims-scheduler-1.2.0.jar:/opt/hive/lib/hive-service-1.2.0.jar:/opt/hive/lib/commons-collections-3.2.1.jar:/opt/hive/lib/jsr305-3.0.0.jar:/opt/hive/lib/hive-shims-0.23-1.2.0.jar:/opt/hive/lib/maven-scm-provider-svn-commons-1.4.jar:/opt/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar:/opt/hive/lib/curator-framework-2.6.0.jar:/opt/hive/lib/libthrift-0.9.2.jar:/opt/hive/lib/json-200902
11.jar:/opt/hive/lib/commons-configuration-1.6.jar:/opt/hive/lib/servlet-api-2.5.jar:/opt/hive/lib/jline-2.12.jar:/opt/hive/lib/joda-time-2.5.jar:/opt/hive/lib/derby-10.11.1.1.jar:/opt/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar:/opt/hive/lib/httpcore-4.4.jar:/opt/hive/lib/junit-4.11.jar:/opt/hive/lib/curator-recipes-2.6.0.jar:/opt/hive/lib/hive-hbase-handler-1.2.0.jar:/opt/hive/lib/accumulo-trace-1.6.0.jar:/opt/hive/lib/accumulo-fate-1.6.0.jar:/opt/hive/lib/curator-client-2.6.0.jar:/opt/hive/lib/tempus-fugit-1.1.jar:/opt/hive/lib/commons-pool-1.5.4.jar:/opt/hive/lib/commons-vfs2-2.0.jar:/opt/hive/lib/ant-1.9.1.jar:/opt/hive/lib/snappy-java-1.0.5.jar:/opt/hive/lib/stax-api-1.0.1.jar:/opt/hive/lib/jetty-all-7.6.0.v20120127.jar:/opt/hive/lib/jdo-api-3.0.1.jar:/opt/hive/lib/groovy-all-2.1.6.jar:/opt/hive/lib/hive-hwi-1.2.0.jar:/opt/hive/lib/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:/opt/hive/lib/hive-common-1.2.0.jar:/opt/hive/lib/maven-scm-api-1.4.jar:/opt/hive/lib/calcite-linq4j-1.2.0-incubating.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/jpam-1.1.jar:/opt/hive/lib/velocity-1.5.jar:/opt/hive/lib/activation-1.1.jar:/opt/hive/lib/hive-accumulo-handler-1.2.0.jar:/opt/hive/lib/ant-launcher-1.9.1.jar:/opt/hive/lib/hive-jdbc-1.2.0.jar:/opt/hive/lib/commons-compress-1.4.1.jar:/opt/hive/lib/commons-logging-1.1.3.jar:/opt/hive/lib/hive-serde-1.2.0.jar:/opt/hive/lib/zookeeper-3.4.6.jar:/opt/hive/lib/accumulo-start-1.6.0.jar:/opt/hive/lib/hive-contrib-1.2.0.jar:/opt/hive/lib/log4j-1.2.16.jar:/opt/hive/lib/commons-compiler-2.7.6.jar:/opt/hive/lib/ST4-4.0.4.jar:/opt/hive/lib/calcite-avatica-1.2.0-incubating.jar:/opt/hive/lib/httpclient-4.4.jar:/opt/hive/lib/commons-codec-1.4.jar:/opt/hive/lib/commons-io-2.4.jar:/opt/hive/lib/commons-digester-1.8.jar:/opt/hive/lib/regexp-1.3.jar:/opt/hive/lib/ivy-2.4.0.jar:/opt/hive/lib/eigenbase-properties-1.1.5.jar:/opt/hive/lib/paranamer-2.3.jar:/opt/hive/lib/mail-1.4.1.jar:/opt/hive/lib/asm-commons-3.1.jar:/opt/hive/lib/commo
ns-lang-2.6.jar:/opt/hive/lib/hive-jdbc-1.2.0-standalone.jar:/opt/hive/lib/hive-shims-common-1.2.0.jar:/opt/hive/lib/hamcrest-core-1.1.jar:/opt/hive/lib/super-csv-2.2.0.jar: driverExtraLibraryPath null driverExtraJavaOptions null supervise false queue research numExecutors 4 files null pyFiles null archives null mainClass org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver primaryResource spark-internal name org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver childArgs [] jars null packages null packagesExclusions null repositories null verbose true Spark properties used, including those specified through --conf and those from the properties file /opt/spark/conf/spark-defaults.conf: spark.blockManager.port -> 45300 spark.broadcast.port -> 45200 spark.driver.memory -> 1G spark.logConf -> true spark.replClassServer.port -> 45070 spark.eventLog.enabled -> true spark.yarn.dist.files -> /etc/spark/hive-site.xml,/opt/spark/sql/hive/target/spark-hive_2.10-1.6.1.jar,/opt/spark/sql/hive-thriftserver/target/spark-hive-thriftserver_2.10-1.6.1.jar,/opt/hive/lib/plexus-utils-1.5.6.jar,/opt/hive/lib/apache-log4j-extras-1.2.17.jar,/opt/hive/lib/hive-testutils-1.2.0.jar,/opt/hive/lib/jta-1.1.jar,/opt/hive/lib/oro-2.0.8.jar,/opt/hive/lib/commons-httpclient-3.0.1.jar,/opt/hive/lib/antlr-2.7.7.jar,/opt/hive/lib/hive-metastore-1.2.0.jar,/opt/hive/lib/antlr-runtime-3.4.jar,/opt/hive/lib/asm-tree-3.1.jar,/opt/hive/lib/libfb303-0.9.2.jar,/opt/hive/lib/netty-3.7.0.Final.jar,/opt/hive/lib/parquet-hadoop-bundle-1.6.0.jar,/opt/hive/lib/commons-beanutils-core-1.8.0.jar,/opt/hive/lib/calcite-core-1.2.0-incubating.jar,/opt/hive/lib/janino-2.7.6.jar,/opt/hive/lib/hive-shims-0.20S-1.2.0.jar,/opt/hive/lib/stringtemplate-3.2.1.jar,/opt/hive/lib/guava-14.0.1.jar,/opt/hive/lib/hive-exec-1.2.0.jar,/opt/hive/lib/bonecp-0.8.0.RELEASE.jar,/opt/hive/lib/opencsv-2.3.jar,/opt/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar,/opt/hive/lib/accumulo-core-1.6.0.jar,/opt/hive/lib/commons-math-2.1.jar,/opt/hive/
lib/jetty-all-server-7.6.0.v20120127.jar,/opt/hive/lib/hive-ant-1.2.0.jar,/opt/hive/lib/avro-1.7.5.jar,/opt/hive/lib/datanucleus-rdbms-3.2.9.jar,/opt/hive/lib/hive-beeline-1.2.0.jar,/opt/hive/lib/jcommander-1.32.jar,/opt/hive/lib/commons-cli-1.2.jar,/opt/hive/lib/hive-cli-1.2.0.jar,/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar,/opt/hive/lib/hive-shims-1.2.0.jar,/opt/hive/lib/xz-1.0.jar,/opt/hive/lib/commons-beanutils-1.7.0.jar,/opt/hive/lib/commons-dbcp-1.4.jar,/opt/hive/lib/maven-scm-provider-svnexe-1.4.jar,/opt/hive/lib/hive-shims-scheduler-1.2.0.jar,/opt/hive/lib/hive-service-1.2.0.jar,/opt/hive/lib/commons-collections-3.2.1.jar,/opt/hive/lib/jsr305-3.0.0.jar,/opt/hive/lib/hive-shims-0.23-1.2.0.jar,/opt/hive/lib/maven-scm-provider-svn-commons-1.4.jar,/opt/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar,/opt/hive/lib/curator-framework-2.6.0.jar,/opt/hive/lib/libthrift-0.9.2.jar,/opt/hive/lib/json-20090211.jar,/opt/hive/lib/commons-configuration-1.6.jar,/opt/hive/lib/servlet-api-2.5.jar,/opt/hive/lib/jline-2.12.jar,/opt/hive/lib/joda-time-2.5.jar,/opt/hive/lib/derby-10.11.1.1.jar,/opt/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar,/opt/hive/lib/httpcore-4.4.jar,/opt/hive/lib/junit-4.11.jar,/opt/hive/lib/curator-recipes-2.6.0.jar,/opt/hive/lib/hive-hbase-handler-1.2.0.jar,/opt/hive/lib/accumulo-trace-1.6.0.jar,/opt/hive/lib/accumulo-fate-1.6.0.jar,/opt/hive/lib/curator-client-2.6.0.jar,/opt/hive/lib/tempus-fugit-1.1.jar,/opt/hive/lib/commons-pool-1.5.4.jar,/opt/hive/lib/commons-vfs2-2.0.jar,/opt/hive/lib/ant-1.9.1.jar,/opt/hive/lib/snappy-java-1.0.5.jar,/opt/hive/lib/stax-api-1.0.1.jar,/opt/hive/lib/jetty-all-7.6.0.v20120127.jar,/opt/hive/lib/jdo-api-3.0.1.jar,/opt/hive/lib/groovy-all-2.1.6.jar,/opt/hive/lib/hive-hwi-1.2.0.jar,/opt/hive/lib/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,/opt/hive/lib/hive-common-1.2.0.jar,/opt/hive/lib/maven-scm-api-1.4.jar,/opt/hive/lib/calcite-linq4j-1.2.0-incubating.jar,/opt/hive/lib/datanucleus-core-3.2.10.jar,/opt/hive/lib/jpam-1.1.jar
,/opt/hive/lib/velocity-1.5.jar,/opt/hive/lib/activation-1.1.jar,/opt/hive/lib/hive-accumulo-handler-1.2.0.jar,/opt/hive/lib/ant-launcher-1.9.1.jar,/opt/hive/lib/hive-jdbc-1.2.0.jar,/opt/hive/lib/commons-compress-1.4.1.jar,/opt/hive/lib/commons-logging-1.1.3.jar,/opt/hive/lib/hive-serde-1.2.0.jar,/opt/hive/lib/zookeeper-3.4.6.jar,/opt/hive/lib/accumulo-start-1.6.0.jar,/opt/hive/lib/hive-contrib-1.2.0.jar,/opt/hive/lib/log4j-1.2.16.jar,/opt/hive/lib/commons-compiler-2.7.6.jar,/opt/hive/lib/ST4-4.0.4.jar,/opt/hive/lib/calcite-avatica-1.2.0-incubating.jar,/opt/hive/lib/httpclient-4.4.jar,/opt/hive/lib/commons-codec-1.4.jar,/opt/hive/lib/commons-io-2.4.jar,/opt/hive/lib/commons-digester-1.8.jar,/opt/hive/lib/regexp-1.3.jar,/opt/hive/lib/ivy-2.4.0.jar,/opt/hive/lib/eigenbase-properties-1.1.5.jar,/opt/hive/lib/paranamer-2.3.jar,/opt/hive/lib/mail-1.4.1.jar,/opt/hive/lib/asm-commons-3.1.jar,/opt/hive/lib/commons-lang-2.6.jar,/opt/hive/lib/hive-jdbc-1.2.0-standalone.jar,/opt/hive/lib/hive-shims-common-1.2.0.jar,/opt/hive/lib/hamcrest-core-1.1.jar,/opt/hive/lib/super-csv-2.2.0.jar, spark.history.ui.port -> 18080 spark.fileserver.port -> 45090 spark.history.retainedApplications -> 9999999 spark.ui.port -> 45100 spark.shuffle.consolidateFiles -> true spark.executor.extraJavaOptions -> -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Djava.library.path=/opt/hadoop/lib/native/ spark.history.fs.logDirectory -> hdfs:///logs/spark-history spark.eventLog.dir -> hdfs:///logs/spark-history/alti_soam/ spark.executor.extraClassPath -> spark-hive_2.10-1.6.1.jar:spark-hive-thriftserver_2.10-1.6.1.jar spark.driver.port -> 45055 spark.port.maxRetries -> 999 spark.executor.port -> 45250 spark.driver.extraClassPath -> ....... On Wed, Apr 6, 2016 at 6:59 PM, Josh Rosen <joshro...@databricks.com> wrote: > > Spark is compiled against a custom fork of Hive 1.2.1 which added shading > of Protobuf and removed shading of Kryo. 
> I think what's happening here is that stock Hive 1.2.1 is taking
> precedence, so the Kryo instance that it's returning is an instance of the
> shaded/relocated Hive version rather than the unshaded, stock Kryo that
> Spark is expecting here.
>
> I just so happen to have a patch which reintroduces the shading of Kryo
> (motivated by other factors): https://github.com/apache/spark/pull/12215;
> there's a chance that a backport of this patch might fix this problem.
>
> However, I'm a bit curious about how your classpath is set up and why
> stock 1.2.1's shaded Kryo is being used here.
>
> /cc +Marcelo Vanzin <van...@cloudera.com> and +Steve Loughran
> <ste...@hortonworks.com>, who may know more.
>
> On Wed, Apr 6, 2016 at 6:08 PM Soam Acharya <s...@altiscale.com> wrote:
>
>> Hi folks,
>>
>> I have a build of Spark 1.6.1 on which Spark SQL seems to be functional
>> outside of windowing functions. For example, I can create a simple
>> external table via Hive:
>>
>> CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd string)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY ','
>> LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE
>> LOCATION '/user/test/ps';
>>
>> Ensure that the table is pointing to some valid data, set up Spark SQL to
>> point to the Hive metastore (we're running Hive 1.2.1), and run a basic test:
>>
>> spark-sql> select * from PSTable;
>> 7239 pts/0 00:24:31 java
>> 9993 pts/9 00:00:00 ps
>> 9994 pts/9 00:00:00 tail
>> 9995 pts/9 00:00:00 sed
>> 9996 pts/9 00:00:00 sed
>>
>> But when I try to run a windowing function which I know runs on Hive, I get:
>>
>> spark-sql> select a.pid, a.time, a.cmd, min(a.time) over (partition by
>> a.cmd order by a.time) from PSTable a;
>> org.apache.spark.SparkException: Task not serializable
>>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
>>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>>     at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
>>     :
>>     :
>> Caused by: java.lang.ClassCastException:
>> org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to
>> com.esotericsoftware.kryo.Kryo
>>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
>>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
>>     at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
>>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>
>> Any thoughts or ideas would be appreciated!
>>
>> Regards,
>>
>> Soam
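[Editor's note: the ClassCastException in the trace above is the textbook symptom of a shading/relocation clash: to the JVM, `org.apache.hive.com.esotericsoftware.kryo.Kryo` and `com.esotericsoftware.kryo.Kryo` are entirely unrelated types, even though their bytecode is nearly identical. The following minimal sketch uses stand-in classes (not the real Kryo types) to reproduce the failure mode:]

```java
// Stand-in illustration, NOT Spark or Kryo code: a relocated/shaded copy of a
// class is a completely different type to the JVM, so passing it to code
// compiled against the original type fails at the cast -- exactly what
// HiveShim$HiveFunctionWrapper.serializePlan hits in the trace above.

class Kryo {}        // stands in for com.esotericsoftware.kryo.Kryo (what Spark expects)
class ShadedKryo {}  // stands in for org.apache.hive.com.esotericsoftware.kryo.Kryo

public class ShadingDemo {
    public static void main(String[] args) {
        Object fromHive = new ShadedKryo();  // what stock Hive 1.2.1 hands back
        try {
            Kryo k = (Kryo) fromHive;        // the cast Spark's shim effectively performs
            System.out.println("cast succeeded");
        } catch (ClassCastException e) {
            System.out.println("caught ClassCastException");
        }
    }
}
```

In other words, the failure goes away only when both sides agree on a single relocation of Kryo (what the linked PR works toward) or when stock Hive's hive-exec jar, which bundles the shaded copy, stops taking precedence over Spark's own Hive classes on the classpath.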