Hi,

I'm using Spark SQL with a HiveContext, Spark version 1.3.1.
When running Spark locally everything works fine. When running on a Spark
cluster I get a ClassNotFoundError for org.apache.hadoop.hive.shims.Hadoop23Shims.
This class belongs to hive-shims-0.23 and is a runtime dependency of
spark-hive:

[INFO] org.apache.spark:spark-hive_2.10:jar:1.3.1
[INFO] +- org.spark-project.hive:hive-metastore:jar:0.13.1a:compile
[INFO] |  +- org.spark-project.hive:hive-shims:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20:jar:0.13.1a:runtime
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common-secure:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20S:jar:0.13.1a:runtime
[INFO] |  |  \- org.spark-project.hive.shims:hive-shims-0.23:jar:0.13.1a:runtime



My Spark distribution was built with:
make-distribution.sh --tgz  -Phive -Phive-thriftserver -DskipTests
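
As a sanity check, listing the assembly contents should show whether the shims
class was packaged at all (the jar path below is a guess for this build, not
something I've confirmed):

jar tf dist/lib/spark-assembly-*.jar | grep Hadoop23Shims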


If I try to add this dependency to my driver project, the exception
disappears, but then the job gets stuck when registering an RDD as a table
(I get a timeout after 30 seconds). I should emphasize that the first RDD I
register as a table is a very small one (about 60K rows), and as I said, it
runs swiftly in local mode.
I suspect other dependencies may also be missing, but fail silently.
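
For concreteness, a stripped-down version of the flow that hangs looks roughly
like the sketch below (the class, app, and table names are placeholders for
illustration, not my actual code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ShimsRepro {
  // Placeholder schema, similar in size to my real rows
  case class SmallRow(id: Int, value: String)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-shims-repro"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // About 60K rows, like the first table I register
    val rdd = sc.parallelize(1 to 60000).map(i => SmallRow(i, s"row-$i"))

    // In 1.3.x the RDD of case classes converts to a DataFrame via implicits;
    // registerTempTable is the step that appears to get stuck on the cluster
    rdd.toDF().registerTempTable("small_table")

    hiveContext.sql("SELECT COUNT(*) FROM small_table").show()
    sc.stop()
  }
}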

I'd be grateful if anyone knows how to solve this.

Lior
