Hi Josh,

Appreciate the response! Also, Steve - we meet again :) At any rate, here's
the output (a lot of it, anyway) of running spark-sql with the --verbose
option so that you can get a sense of the settings and the classpath. Does
anything stand out?

Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.port.maxRetries=999
Adding default property: spark.broadcast.port=45200
Adding default property:
spark.executor.extraJavaOptions=-XX:+PrintReferenceGC -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-Djava.library.path=/opt/hadoop/lib/native/
Adding default property:
spark.history.fs.logDirectory=hdfs:///logs/spark-history
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.ui.port=45100
Adding default property: spark.driver.port=45055
Adding default property: spark.executor.port=45250
Adding default property: spark.logConf=true
Adding default property: spark.replClassServer.port=45070
Adding default property: spark.blockManager.port=45300
Adding default property: spark.fileserver.port=45090
Adding default property: spark.history.retainedApplications=9999999
Adding default property: spark.eventLog.dir=hdfs:///logs/spark-history
Adding default property: spark.history.ui.port=18080
Adding default property: spark.shuffle.consolidateFiles=true
Parsed arguments:
  master                  yarn
  deployMode              client
  executorMemory          1G
  executorCores           2
  totalExecutorCores      null
  propertiesFile          /opt/spark/conf/spark-defaults.conf
  driverMemory            1G
  driverCores             null
  driverExtraClassPath
 
hive-site.xml:/opt/spark/sql/hive/target/spark-hive_2.10-1.6.1.jar:/opt/spark/sql/hive-thriftserver/target/spark-hive-thriftserver_2.10-1.6.1.jar:/opt/hive/lib/plexus-utils-1.5.6.jar:/opt/hive/lib/apache-log4j-extras-1.2.17.jar:/opt/hive/lib/hive-testutils-1.2.0.jar:/opt/hive/lib/jta-1.1.jar:/opt/hive/lib/oro-2.0.8.jar:/opt/hive/lib/commons-httpclient-3.0.1.jar:/opt/hive/lib/antlr-2.7.7.jar:/opt/hive/lib/hive-metastore-1.2.0.jar:/opt/hive/lib/antlr-runtime-3.4.jar:/opt/hive/lib/asm-tree-3.1.jar:/opt/hive/lib/libfb303-0.9.2.jar:/opt/hive/lib/netty-3.7.0.Final.jar:/opt/hive/lib/parquet-hadoop-bundle-1.6.0.jar:/opt/hive/lib/commons-beanutils-core-1.8.0.jar:/opt/hive/lib/calcite-core-1.2.0-incubating.jar:/opt/hive/lib/janino-2.7.6.jar:/opt/hive/lib/hive-shims-0.20S-1.2.0.jar:/opt/hive/lib/stringtemplate-3.2.1.jar:/opt/hive/lib/guava-14.0.1.jar:/opt/hive/lib/hive-exec-1.2.0.jar:/opt/hive/lib/bonecp-0.8.0.RELEASE.jar:/opt/hive/lib/opencsv-2.3.jar:/opt/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar:/opt/hive/lib/accumulo-core-1.6.0.jar:/opt/hive/lib/commons-math-2.1.jar:/opt/hive/lib/jetty-all-server-7.6.0.v20120127.jar:/opt/hive/lib/hive-ant-1.2.0.jar:/opt/hive/lib/avro-1.7.5.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar:/opt/hive/lib/hive-beeline-1.2.0.jar:/opt/hive/lib/jcommander-1.32.jar:/opt/hive/lib/commons-cli-1.2.jar:/opt/hive/lib/hive-cli-1.2.0.jar:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/hive-shims-1.2.0.jar:/opt/hive/lib/xz-1.0.jar:/opt/hive/lib/commons-beanutils-1.7.0.jar:/opt/hive/lib/commons-dbcp-1.4.jar:/opt/hive/lib/maven-scm-provider-svnexe-1.4.jar:/opt/hive/lib/hive-shims-scheduler-1.2.0.jar:/opt/hive/lib/hive-service-1.2.0.jar:/opt/hive/lib/commons-collections-3.2.1.jar:/opt/hive/lib/jsr305-3.0.0.jar:/opt/hive/lib/hive-shims-0.23-1.2.0.jar:/opt/hive/lib/maven-scm-provider-svn-commons-1.4.jar:/opt/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar:/opt/hive/lib/curator-framework-2.6.0.jar:/opt/hive/lib/libthrift-0.9.2.jar:/opt/hive/lib/json-200902
11.jar:/opt/hive/lib/commons-configuration-1.6.jar:/opt/hive/lib/servlet-api-2.5.jar:/opt/hive/lib/jline-2.12.jar:/opt/hive/lib/joda-time-2.5.jar:/opt/hive/lib/derby-10.11.1.1.jar:/opt/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar:/opt/hive/lib/httpcore-4.4.jar:/opt/hive/lib/junit-4.11.jar:/opt/hive/lib/curator-recipes-2.6.0.jar:/opt/hive/lib/hive-hbase-handler-1.2.0.jar:/opt/hive/lib/accumulo-trace-1.6.0.jar:/opt/hive/lib/accumulo-fate-1.6.0.jar:/opt/hive/lib/curator-client-2.6.0.jar:/opt/hive/lib/tempus-fugit-1.1.jar:/opt/hive/lib/commons-pool-1.5.4.jar:/opt/hive/lib/commons-vfs2-2.0.jar:/opt/hive/lib/ant-1.9.1.jar:/opt/hive/lib/snappy-java-1.0.5.jar:/opt/hive/lib/stax-api-1.0.1.jar:/opt/hive/lib/jetty-all-7.6.0.v20120127.jar:/opt/hive/lib/jdo-api-3.0.1.jar:/opt/hive/lib/groovy-all-2.1.6.jar:/opt/hive/lib/hive-hwi-1.2.0.jar:/opt/hive/lib/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:/opt/hive/lib/hive-common-1.2.0.jar:/opt/hive/lib/maven-scm-api-1.4.jar:/opt/hive/lib/calcite-linq4j-1.2.0-incubating.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/jpam-1.1.jar:/opt/hive/lib/velocity-1.5.jar:/opt/hive/lib/activation-1.1.jar:/opt/hive/lib/hive-accumulo-handler-1.2.0.jar:/opt/hive/lib/ant-launcher-1.9.1.jar:/opt/hive/lib/hive-jdbc-1.2.0.jar:/opt/hive/lib/commons-compress-1.4.1.jar:/opt/hive/lib/commons-logging-1.1.3.jar:/opt/hive/lib/hive-serde-1.2.0.jar:/opt/hive/lib/zookeeper-3.4.6.jar:/opt/hive/lib/accumulo-start-1.6.0.jar:/opt/hive/lib/hive-contrib-1.2.0.jar:/opt/hive/lib/log4j-1.2.16.jar:/opt/hive/lib/commons-compiler-2.7.6.jar:/opt/hive/lib/ST4-4.0.4.jar:/opt/hive/lib/calcite-avatica-1.2.0-incubating.jar:/opt/hive/lib/httpclient-4.4.jar:/opt/hive/lib/commons-codec-1.4.jar:/opt/hive/lib/commons-io-2.4.jar:/opt/hive/lib/commons-digester-1.8.jar:/opt/hive/lib/regexp-1.3.jar:/opt/hive/lib/ivy-2.4.0.jar:/opt/hive/lib/eigenbase-properties-1.1.5.jar:/opt/hive/lib/paranamer-2.3.jar:/opt/hive/lib/mail-1.4.1.jar:/opt/hive/lib/asm-commons-3.1.jar:/opt/hive/lib/commo
ns-lang-2.6.jar:/opt/hive/lib/hive-jdbc-1.2.0-standalone.jar:/opt/hive/lib/hive-shims-common-1.2.0.jar:/opt/hive/lib/hamcrest-core-1.1.jar:/opt/hive/lib/super-csv-2.2.0.jar:
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   research
  numExecutors            4
  files                   null
  pyFiles                 null
  archives                null
  mainClass
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
  primaryResource         spark-internal
  name
 org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file
/opt/spark/conf/spark-defaults.conf:
  spark.blockManager.port -> 45300
  spark.broadcast.port -> 45200
  spark.driver.memory -> 1G
  spark.logConf -> true
  spark.replClassServer.port -> 45070
  spark.eventLog.enabled -> true
  spark.yarn.dist.files ->
/etc/spark/hive-site.xml,/opt/spark/sql/hive/target/spark-hive_2.10-1.6.1.jar,/opt/spark/sql/hive-thriftserver/target/spark-hive-thriftserver_2.10-1.6.1.jar,/opt/hive/lib/plexus-utils-1.5.6.jar,/opt/hive/lib/apache-log4j-extras-1.2.17.jar,/opt/hive/lib/hive-testutils-1.2.0.jar,/opt/hive/lib/jta-1.1.jar,/opt/hive/lib/oro-2.0.8.jar,/opt/hive/lib/commons-httpclient-3.0.1.jar,/opt/hive/lib/antlr-2.7.7.jar,/opt/hive/lib/hive-metastore-1.2.0.jar,/opt/hive/lib/antlr-runtime-3.4.jar,/opt/hive/lib/asm-tree-3.1.jar,/opt/hive/lib/libfb303-0.9.2.jar,/opt/hive/lib/netty-3.7.0.Final.jar,/opt/hive/lib/parquet-hadoop-bundle-1.6.0.jar,/opt/hive/lib/commons-beanutils-core-1.8.0.jar,/opt/hive/lib/calcite-core-1.2.0-incubating.jar,/opt/hive/lib/janino-2.7.6.jar,/opt/hive/lib/hive-shims-0.20S-1.2.0.jar,/opt/hive/lib/stringtemplate-3.2.1.jar,/opt/hive/lib/guava-14.0.1.jar,/opt/hive/lib/hive-exec-1.2.0.jar,/opt/hive/lib/bonecp-0.8.0.RELEASE.jar,/opt/hive/lib/opencsv-2.3.jar,/opt/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar,/opt/hive/lib/accumulo-core-1.6.0.jar,/opt/hive/lib/commons-math-2.1.jar,/opt/hive/lib/jetty-all-server-7.6.0.v20120127.jar,/opt/hive/lib/hive-ant-1.2.0.jar,/opt/hive/lib/avro-1.7.5.jar,/opt/hive/lib/datanucleus-rdbms-3.2.9.jar,/opt/hive/lib/hive-beeline-1.2.0.jar,/opt/hive/lib/jcommander-1.32.jar,/opt/hive/lib/commons-cli-1.2.jar,/opt/hive/lib/hive-cli-1.2.0.jar,/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar,/opt/hive/lib/hive-shims-1.2.0.jar,/opt/hive/lib/xz-1.0.jar,/opt/hive/lib/commons-beanutils-1.7.0.jar,/opt/hive/lib/commons-dbcp-1.4.jar,/opt/hive/lib/maven-scm-provider-svnexe-1.4.jar,/opt/hive/lib/hive-shims-scheduler-1.2.0.jar,/opt/hive/lib/hive-service-1.2.0.jar,/opt/hive/lib/commons-collections-3.2.1.jar,/opt/hive/lib/jsr305-3.0.0.jar,/opt/hive/lib/hive-shims-0.23-1.2.0.jar,/opt/hive/lib/maven-scm-provider-svn-commons-1.4.jar,/opt/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar,/opt/hive/lib/curator-framework-2.6.0.jar,/opt/hive/lib/libthrift-0.9.2.jar,/opt/hive/lib/
json-20090211.jar,/opt/hive/lib/commons-configuration-1.6.jar,/opt/hive/lib/servlet-api-2.5.jar,/opt/hive/lib/jline-2.12.jar,/opt/hive/lib/joda-time-2.5.jar,/opt/hive/lib/derby-10.11.1.1.jar,/opt/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar,/opt/hive/lib/httpcore-4.4.jar,/opt/hive/lib/junit-4.11.jar,/opt/hive/lib/curator-recipes-2.6.0.jar,/opt/hive/lib/hive-hbase-handler-1.2.0.jar,/opt/hive/lib/accumulo-trace-1.6.0.jar,/opt/hive/lib/accumulo-fate-1.6.0.jar,/opt/hive/lib/curator-client-2.6.0.jar,/opt/hive/lib/tempus-fugit-1.1.jar,/opt/hive/lib/commons-pool-1.5.4.jar,/opt/hive/lib/commons-vfs2-2.0.jar,/opt/hive/lib/ant-1.9.1.jar,/opt/hive/lib/snappy-java-1.0.5.jar,/opt/hive/lib/stax-api-1.0.1.jar,/opt/hive/lib/jetty-all-7.6.0.v20120127.jar,/opt/hive/lib/jdo-api-3.0.1.jar,/opt/hive/lib/groovy-all-2.1.6.jar,/opt/hive/lib/hive-hwi-1.2.0.jar,/opt/hive/lib/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,/opt/hive/lib/hive-common-1.2.0.jar,/opt/hive/lib/maven-scm-api-1.4.jar,/opt/hive/lib/calcite-linq4j-1.2.0-incubating.jar,/opt/hive/lib/datanucleus-core-3.2.10.jar,/opt/hive/lib/jpam-1.1.jar,/opt/hive/lib/velocity-1.5.jar,/opt/hive/lib/activation-1.1.jar,/opt/hive/lib/hive-accumulo-handler-1.2.0.jar,/opt/hive/lib/ant-launcher-1.9.1.jar,/opt/hive/lib/hive-jdbc-1.2.0.jar,/opt/hive/lib/commons-compress-1.4.1.jar,/opt/hive/lib/commons-logging-1.1.3.jar,/opt/hive/lib/hive-serde-1.2.0.jar,/opt/hive/lib/zookeeper-3.4.6.jar,/opt/hive/lib/accumulo-start-1.6.0.jar,/opt/hive/lib/hive-contrib-1.2.0.jar,/opt/hive/lib/log4j-1.2.16.jar,/opt/hive/lib/commons-compiler-2.7.6.jar,/opt/hive/lib/ST4-4.0.4.jar,/opt/hive/lib/calcite-avatica-1.2.0-incubating.jar,/opt/hive/lib/httpclient-4.4.jar,/opt/hive/lib/commons-codec-1.4.jar,/opt/hive/lib/commons-io-2.4.jar,/opt/hive/lib/commons-digester-1.8.jar,/opt/hive/lib/regexp-1.3.jar,/opt/hive/lib/ivy-2.4.0.jar,/opt/hive/lib/eigenbase-properties-1.1.5.jar,/opt/hive/lib/paranamer-2.3.jar,/opt/hive/lib/mail-1.4.1.jar,/opt/hive/lib/asm-commons-3.1.jar,/opt/hiv
e/lib/commons-lang-2.6.jar,/opt/hive/lib/hive-jdbc-1.2.0-standalone.jar,/opt/hive/lib/hive-shims-common-1.2.0.jar,/opt/hive/lib/hamcrest-core-1.1.jar,/opt/hive/lib/super-csv-2.2.0.jar,
  spark.history.ui.port -> 18080
  spark.fileserver.port -> 45090
  spark.history.retainedApplications -> 9999999
  spark.ui.port -> 45100
  spark.shuffle.consolidateFiles -> true
  spark.executor.extraJavaOptions -> -XX:+PrintReferenceGC -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-Djava.library.path=/opt/hadoop/lib/native/
  spark.history.fs.logDirectory -> hdfs:///logs/spark-history
  spark.eventLog.dir -> hdfs:///logs/spark-history/alti_soam/
  spark.executor.extraClassPath ->
spark-hive_2.10-1.6.1.jar:spark-hive-thriftserver_2.10-1.6.1.jar
  spark.driver.port -> 45055
  spark.port.maxRetries -> 999
  spark.executor.port -> 45250
  spark.driver.extraClassPath -> .......
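
In case it helps narrow things down, here's a rough sketch of a script I
could run to see which jars on that classpath bundle a shaded
(org.apache.hive.*) versus unshaded Kryo class. The /opt/hive/lib path just
reflects my layout, and the relocation prefix is my reading of what stock
Hive 1.2.1 does - adjust as needed:

```python
# Sketch: scan a directory of jars for shaded vs. unshaded Kryo classes.
# Assumptions: stock Hive 1.2.1 relocates Kryo under
# org.apache.hive.com.esotericsoftware.kryo, while Spark expects the plain
# com.esotericsoftware.kryo package. /opt/hive/lib is just my install path.
import glob
import os
import zipfile

SHADED = "org/apache/hive/com/esotericsoftware/kryo/Kryo.class"
UNSHADED = "com/esotericsoftware/kryo/Kryo.class"

def scan_jars(lib_dir):
    """Map each jar under lib_dir to the Kryo variants it bundles."""
    hits = {}
    for jar in sorted(glob.glob(os.path.join(lib_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            continue  # skip corrupt or non-zip files
        found = [label for label, cls in (("shaded", SHADED),
                                          ("unshaded", UNSHADED))
                 if cls in names]
        if found:
            hits[jar] = found
    return hits

if __name__ == "__main__":
    for jar, kinds in scan_jars("/opt/hive/lib").items():
        print(jar, "->", ", ".join(kinds))
```

If both variants show up ahead of Spark's own jars, that would line up with
the ClassCastException below.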



On Wed, Apr 6, 2016 at 6:59 PM, Josh Rosen <joshro...@databricks.com> wrote:

>
> Spark is compiled against a custom fork of Hive 1.2.1 which added shading
> of Protobuf and removed shading of Kryo. What I think is happening here is
> that stock Hive 1.2.1 is taking precedence, so the Kryo instance it's
> returning is an instance of the shaded/relocated Hive version rather than
> the unshaded, stock Kryo that Spark is expecting here.
>
> I just so happen to have a patch which reintroduces the shading of Kryo
> (motivated by other factors): https://github.com/apache/spark/pull/12215;
> there's a chance that a backport of this patch might fix this problem.
>
> However, I'm a bit curious about how your classpath is set up and why
> stock Hive 1.2.1's shaded Kryo is being used here.
>
> /cc +Marcelo Vanzin <van...@cloudera.com> and +Steve Loughran
> <ste...@hortonworks.com>, who may know more.
>
> On Wed, Apr 6, 2016 at 6:08 PM Soam Acharya <s...@altiscale.com> wrote:
>
>> Hi folks,
>>
>> I have a build of Spark 1.6.1 on which Spark SQL seems to be functional
>> except for windowing functions. For example, I can create a simple external
>> table via Hive:
>>
>> CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd
>> string)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY ','
>> LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE
>> LOCATION '/user/test/ps';
>>
>> Ensure that the table points to some valid data, set up Spark SQL to
>> point to the Hive metastore (we're running Hive 1.2.1), and run a basic test:
>>
>> spark-sql> select * from PSTable;
>> 7239    pts/0   00:24:31        java
>> 9993    pts/9   00:00:00        ps
>> 9994    pts/9   00:00:00        tail
>> 9995    pts/9   00:00:00        sed
>> 9996    pts/9   00:00:00        sed
>>
>> But when I try to run a windowing function that I know runs on Hive, I
>> get:
>>
>> spark-sql> select a.pid ,a.time, a.cmd, min(a.time) over (partition by
>> a.cmd order by a.time ) from PSTable a;
>> org.apache.spark.SparkException: Task not serializable
>>         at
>> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
>>         at
>> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>>         at
>> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>>         at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
>> :
>> :
>> Caused by: java.lang.ClassCastException:
>> org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to
>> com.esotericsoftware.kryo.Kryo
>>         at
>> org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
>>         at
>> org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
>>         at
>> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
>>         at
>> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>>         at
>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>
>> Any thoughts or ideas would be appreciated!
>>
>> Regards,
>>
>> Soam
>>
>
