SPARK-2420 is fixed. I don't think it will be in 1.1, though - might be too risky at this point.
I'm not familiar with spark-sql.

On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee <alee...@hotmail.com> wrote:
> Hopefully there can be some progress on SPARK-2420. It looks like shading
> may be the preferred solution over downgrading.
>
> Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark
> 1.1.2?
>
> By the way, regarding bin/spark-sql: is this more of a debugging tool for
> Spark jobs that integrate with Hive? How do people use spark-sql? I'm
> trying to understand the rationale and motivation behind this script.
> Any ideas?
>
>> Date: Thu, 21 Aug 2014 16:31:08 -0700
>> Subject: Re: Hive From Spark
>> From: van...@cloudera.com
>> To: l...@yahoo-inc.com.invalid
>> CC: user@spark.apache.org; u...@spark.incubator.apache.org; pwend...@gmail.com
>>
>> Hi Du,
>>
>> I don't believe the Guava change has made it to the 1.1 branch. The
>> Guava doc says "hashInt" was added in 12.0, so what's probably
>> happening is that you have an old version of Guava in your classpath
>> before the Spark jars. (Hadoop ships with Guava 11, so that may be the
>> source of your problem.)
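
One quick way to confirm which Guava actually wins on the classpath is to ask the JVM directly. This is only a minimal diagnostic sketch (plain Java reflection, nothing Spark-specific); run it from spark-shell or any process launched with the same classpath as the failing job:

  // Which jar did Guava's HashFunction come from?
  val hashFnClass = Class.forName("com.google.common.hash.HashFunction")
  println(hashFnClass.getProtectionDomain.getCodeSource.getLocation)
  // hashInt(int) exists in Guava 12.0+ but not in Guava 11, so this shows
  // whether the Guava that was loaded is new enough for Spark.
  println(hashFnClass.getMethods.exists(_.getName == "hashInt"))

If the printed location points at a Hadoop-provided guava-11.x jar, that is exactly the classpath-ordering problem described above.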

>> On Thu, Aug 21, 2014 at 4:23 PM, Du Li <l...@yahoo-inc.com.invalid> wrote:
>> > Hi,
>> >
>> > This Guava dependency conflict should have been fixed as of yesterday,
>> > according to https://issues.apache.org/jira/browse/SPARK-2420
>> >
>> > However, I still get
>> > java.lang.NoSuchMethodError:
>> > com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>> > from the following code snippet when running "mvn3 test" on a Mac. I built
>> > the latest version of Spark (1.1.0-SNAPSHOT) and installed the jar files to
>> > the local Maven repo. In my pom file I explicitly excluded Guava from almost
>> > all possible dependencies, such as spark-hive_2.10-1.1.0.SNAPSHOT and
>> > hadoop-client. This snippet is abstracted from a larger project, so the
>> > pom.xml includes many dependencies, although not all are required by this
>> > snippet. The pom.xml is attached.
>> >
>> > Does anybody know how to fix it?
>> >
>> > Thanks,
>> > Du
>> > -------
>> >
>> > package com.myself.test
>> >
>> > import org.scalatest._
>> > import org.apache.hadoop.io.{NullWritable, BytesWritable}
>> > import org.apache.spark.{SparkContext, SparkConf}
>> > import org.apache.spark.SparkContext._
>> >
>> > class MyRecord(name: String) extends Serializable {
>> >   def getWritable(): BytesWritable = {
>> >     new BytesWritable(Option(name).getOrElse("\\N").toString.getBytes("UTF-8"))
>> >   }
>> >
>> >   final override def equals(that: Any): Boolean = {
>> >     if (!that.isInstanceOf[MyRecord])
>> >       false
>> >     else {
>> >       val other = that.asInstanceOf[MyRecord]
>> >       this.getWritable == other.getWritable
>> >     }
>> >   }
>> > }
>> >
>> > class MyRecordTestSuite extends FunSuite {
>> >   // construct a MyRecord by Consumer.schema
>> >   val rec: MyRecord = new MyRecord("James Bond")
>> >
>> >   test("generated SequenceFile should be readable from spark") {
>> >     val path = "./testdata/"
>> >
>> >     val conf = new SparkConf(false).setMaster("local")
>> >       .setAppName("test data exchange with Hive")
>> >     conf.set("spark.driver.host", "localhost")
>> >     val sc = new SparkContext(conf)
>> >     val rdd = sc.makeRDD(Seq(rec))
>> >     rdd.map((x: MyRecord) => (NullWritable.get(), x.getWritable()))
>> >       .saveAsSequenceFile(path)
>> >
>> >     val bytes = sc.sequenceFile(path, classOf[NullWritable],
>> >       classOf[BytesWritable]).first._2
>> >     assert(rec.getWritable() == bytes)
>> >
>> >     sc.stop()
>> >     System.clearProperty("spark.driver.port")
>> >   }
>> > }
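
For what it's worth, the exclusion pattern Du describes looks roughly like this in sbt form. This is only an illustrative sketch, not a reconstruction of the attached pom.xml; the coordinates, versions, and the explicit Guava pin below are assumptions:

  // build.sbt sketch: exclude the transitive Guava pulled in by the Spark and
  // Hadoop dependencies, then pin the Guava version Spark itself is built against.
  libraryDependencies ++= Seq(
    ("org.apache.spark" %% "spark-hive" % "1.1.0-SNAPSHOT")
      .exclude("com.google.guava", "guava"),
    ("org.apache.hadoop" % "hadoop-client" % "2.2.0")
      .exclude("com.google.guava", "guava"),
    "com.google.guava" % "guava" % "14.0.1"
  )

In Maven the same idea is an <exclusions> block on each dependency plus an explicit guava dependency at the version Spark expects. Either way, the exclusions only help if nothing else at runtime (for example a Hadoop-provided classpath) puts an older Guava ahead of it.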

>> > From: Andrew Lee <alee...@hotmail.com>
>> > Reply-To: "user@spark.apache.org" <user@spark.apache.org>
>> > Date: Monday, July 21, 2014 at 10:27 AM
>> > To: "user@spark.apache.org" <user@spark.apache.org>,
>> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
>> > Subject: RE: Hive From Spark
>> >
>> > Hi All,
>> >
>> > Currently, if you are running the Spark HiveContext API with Hive 0.12, it
>> > won't work, because two libraries (Guava and protobuf) are not consistent
>> > with Hive 0.12 and Hadoop. (Hive libs align with Hadoop libs, and as a
>> > common practice they should be kept consistent to interoperate.)
>> >
>> > These are under discussion in two JIRA tickets:
>> >
>> > https://issues.apache.org/jira/browse/HIVE-7387
>> >
>> > https://issues.apache.org/jira/browse/SPARK-2420
>> >
>> > After tweaking the classpath and build for Spark 1.0.1-rc3, I was able to
>> > create a table through HiveContext; however, when I fetch the data it
>> > breaks on an incompatible API call in Guava. This is critical since it is
>> > needed to map the columns to the RDD schema.
>> >
>> > Hive and Hadoop use an older version of the Guava libraries (11.0.1),
>> > while Spark's Hive support uses Guava 14.0.1+.
>> > The community isn't willing to downgrade to 11.0.1, which is the current
>> > version for Hadoop 2.2 and Hive 0.12.
>> > Be aware of the protobuf version as well in Hive 0.12 (it uses protobuf 2.4).
>> >
>> > scala> import org.apache.spark.SparkContext
>> > import org.apache.spark.SparkContext
>> >
>> > scala> import org.apache.spark.sql.hive._
>> > import org.apache.spark.sql.hive._
>> >
>> > scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> > hiveContext: org.apache.spark.sql.hive.HiveContext =
>> > org.apache.spark.sql.hive.HiveContext@34bee01a
>> >
>> > scala> hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>> > res0: org.apache.spark.sql.SchemaRDD =
>> > SchemaRDD[0] at RDD at SchemaRDD.scala:104
>> > == Query Plan ==
>> > <Native command: executed by Hive>
>> >
>> > scala> hiveContext.hql("LOAD DATA LOCAL INPATH
>> > 'examples/src/main/resources/kv1.txt' INTO TABLE src")
>> > res1: org.apache.spark.sql.SchemaRDD =
>> > SchemaRDD[3] at RDD at SchemaRDD.scala:104
>> > == Query Plan ==
>> > <Native command: executed by Hive>
>> >
>> > scala> // Queries are expressed in HiveQL
>> > scala> hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)
>> > java.lang.NoSuchMethodError:
>> > com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>> > at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>> > at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>> > at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>> > at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>> > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>> > at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>> > at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>> > at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>> > at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>> > at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
>> > at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
>> > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
>> > at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
>> > at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
>> > at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
>> > at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
>> > at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
>> > at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>> > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
>> > at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:60)
>> > at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:70)
>> > at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>> > at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>> > at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:280)
>> > at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:69)
>> > at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> > at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> > at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:316)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:316)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:319)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:319)
>> > at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:420)
>> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>> > at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>> > at $iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>> > at $iwC$$iwC$$iwC.<init>(<console>:28)
>> > at $iwC$$iwC.<init>(<console>:30)
>> > at $iwC.<init>(<console>:32)
>> > at <init>(<console>:34)
>> > at .<init>(<console>:38)
>> > at .<clinit>(<console>)
>> > at .<init>(<console>:7)
>> > at .<clinit>(<console>)
>> > at $print(<console>)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:606)
>> > at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
>> > at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
>> > at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
>> > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
>> > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
>> > at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
>> > at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
>> > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
>> > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
>> > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
>> > at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
>> > at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
>> > at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>> > at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>> > at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>> > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
>> > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
>> > at org.apache.spark.repl.Main$.main(Main.scala:31)
>> > at org.apache.spark.repl.Main.main(Main.scala)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:606)
>> > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>> > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> >
>> >
>> >> From: hao.ch...@intel.com
>> >> To: user@spark.apache.org; u...@spark.incubator.apache.org
>> >> Subject: RE: Hive From Spark
>> >> Date: Mon, 21 Jul 2014 01:14:19 +0000
>> >>
>> >> JiaJia, I checked out the latest 1.0 branch and then did the following steps:
>> >> SPARK_HIVE=true sbt/sbt clean assembly
>> >> cd examples
>> >> ../bin/run-example sql.hive.HiveFromSpark
>> >>
>> >> It works well locally for me.
>> >>
>> >> Your log output shows "Invalid method name: 'get_table'", which suggests an
>> >> incompatible jar version or a mismatch between the Hive metastore service
>> >> and client. Can you double-check the jar versions of the Hive metastore
>> >> service and Thrift?
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: JiajiaJing [mailto:jj.jing0...@gmail.com]
>> >> Sent: Saturday, July 19, 2014 7:29 AM
>> >> To: u...@spark.incubator.apache.org
>> >> Subject: RE: Hive From Spark
>> >>
>> >> Hi Cheng Hao,
>> >>
>> >> Thank you very much for your reply.
>> >>
>> >> Basically, the program runs on Spark 1.0.0 and Hive 0.12.0.
>> >>
>> >> Some environment setup was done by running "SPARK_HIVE=true sbt/sbt
>> >> assembly/assembly", including the assembly jar on all the workers and
>> >> copying hive-site.xml to Spark's conf dir.
>> >>
>> >> The program is then run as: "./bin/run-example
>> >> org.apache.spark.examples.sql.hive.HiveFromSpark"
>> >>
>> >> It's good to know that this example runs well on your machine. Could you
>> >> please give me some insight into what you have done as well?
>> >>
>> >> Thank you very much!
>> >>
>> >> Jiajia
>> >>
>> >> --
>> >> View this message in context:
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html
>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> --
>> Marcelo

--
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org