SPARK-2420 is fixed. I don't think it will be in 1.1, though - might be too risky at this point.
I'm not familiar with spark-sql.

On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee <alee...@hotmail.com> wrote:
> Hopefully there can be some progress on SPARK-2420. It looks like shading
> may be the preferred solution over downgrading.
>
> Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark
> 1.1.2?
>
> By the way, regarding bin/spark-sql: is this more of a debugging tool for
> Spark jobs that integrate with Hive? How do people use spark-sql? I'm
> trying to understand the rationale and motivation behind this script.
> Any ideas?
>
>> Date: Thu, 21 Aug 2014 16:31:08 -0700
>> Subject: Re: Hive From Spark
>> From: van...@cloudera.com
>> To: l...@yahoo-inc.com.invalid
>> CC: user@spark.apache.org; u...@spark.incubator.apache.org; pwend...@gmail.com
>>
>> Hi Du,
>>
>> I don't believe the Guava change has made it to the 1.1 branch. The
>> Guava doc says "hashInt" was added in 12.0, so what's probably
>> happening is that you have an old version of Guava in your classpath
>> before the Spark jars. (Hadoop ships with Guava 11, so that may be the
>> source of your problem.)
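
One quick way to confirm which Guava actually wins on the classpath is to ask the JVM directly. This is only a minimal diagnostic sketch (plain Java reflection, nothing Spark-specific); run it from spark-shell or any process launched with the same classpath as the failing job:

  // Which jar did Guava's HashFunction come from?
  val hashFnClass = Class.forName("com.google.common.hash.HashFunction")
  println(hashFnClass.getProtectionDomain.getCodeSource.getLocation)
  // hashInt(int) exists in Guava 12.0+ but not in Guava 11, so this shows
  // whether the Guava that was loaded is new enough for Spark.
  println(hashFnClass.getMethods.exists(_.getName == "hashInt"))

If the printed location points at a Hadoop-provided guava-11.x jar, that is exactly the classpath-ordering problem described above.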

>> On Thu, Aug 21, 2014 at 4:23 PM, Du Li <l...@yahoo-inc.com.invalid> wrote:
>> > Hi,
>> >
>> > This Guava dependency conflict should have been fixed as of yesterday,
>> > according to https://issues.apache.org/jira/browse/SPARK-2420
>> >
>> > However, I still get
>> > java.lang.NoSuchMethodError:
>> > com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>> > from the following code snippet when running "mvn3 test" on a Mac. I built
>> > the latest version of Spark (1.1.0-SNAPSHOT) and installed the jar files to
>> > the local Maven repo. In my pom file I explicitly excluded Guava from almost
>> > all possible dependencies, such as spark-hive_2.10-1.1.0.SNAPSHOT and
>> > hadoop-client. This snippet is abstracted from a larger project, so the
>> > pom.xml includes many dependencies, although not all are required by this
>> > snippet. The pom.xml is attached.
>> >
>> > Does anybody know how to fix it?
>> >
>> > Thanks,
>> > Du
>> > -------
>> >
>> > package com.myself.test
>> >
>> > import org.scalatest._
>> > import org.apache.hadoop.io.{NullWritable, BytesWritable}
>> > import org.apache.spark.{SparkContext, SparkConf}
>> > import org.apache.spark.SparkContext._
>> >
>> > class MyRecord(name: String) extends Serializable {
>> >   def getWritable(): BytesWritable = {
>> >     new BytesWritable(Option(name).getOrElse("\\N").toString.getBytes("UTF-8"))
>> >   }
>> >
>> >   final override def equals(that: Any): Boolean = {
>> >     if (!that.isInstanceOf[MyRecord])
>> >       false
>> >     else {
>> >       val other = that.asInstanceOf[MyRecord]
>> >       this.getWritable == other.getWritable
>> >     }
>> >   }
>> > }
>> >
>> > class MyRecordTestSuite extends FunSuite {
>> >   // construct a MyRecord by Consumer.schema
>> >   val rec: MyRecord = new MyRecord("James Bond")
>> >
>> >   test("generated SequenceFile should be readable from spark") {
>> >     val path = "./testdata/"
>> >
>> >     val conf = new SparkConf(false).setMaster("local")
>> >       .setAppName("test data exchange with Hive")
>> >     conf.set("spark.driver.host", "localhost")
>> >     val sc = new SparkContext(conf)
>> >     val rdd = sc.makeRDD(Seq(rec))
>> >     rdd.map((x: MyRecord) => (NullWritable.get(), x.getWritable()))
>> >       .saveAsSequenceFile(path)
>> >
>> >     val bytes = sc.sequenceFile(path, classOf[NullWritable],
>> >       classOf[BytesWritable]).first._2
>> >     assert(rec.getWritable() == bytes)
>> >
>> >     sc.stop()
>> >     System.clearProperty("spark.driver.port")
>> >   }
>> > }
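
For what it's worth, the exclusion pattern Du describes looks roughly like this in sbt form. This is only an illustrative sketch, not a reconstruction of the attached pom.xml; the coordinates, versions, and the explicit Guava pin below are assumptions:

  // build.sbt sketch: exclude the transitive Guava pulled in by the Spark and
  // Hadoop dependencies, then pin the Guava version Spark itself is built against.
  libraryDependencies ++= Seq(
    ("org.apache.spark" %% "spark-hive" % "1.1.0-SNAPSHOT")
      .exclude("com.google.guava", "guava"),
    ("org.apache.hadoop" % "hadoop-client" % "2.2.0")
      .exclude("com.google.guava", "guava"),
    "com.google.guava" % "guava" % "14.0.1"
  )

In Maven the same idea is an <exclusions> block on each dependency plus an explicit guava dependency at the version Spark expects. Either way, the exclusions only help if nothing else at runtime (for example a Hadoop-provided classpath) puts an older Guava ahead of it.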

>> > From: Andrew Lee <alee...@hotmail.com>
>> > Reply-To: "user@spark.apache.org" <user@spark.apache.org>
>> > Date: Monday, July 21, 2014 at 10:27 AM
>> > To: "user@spark.apache.org" <user@spark.apache.org>,
>> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
>> > Subject: RE: Hive From Spark
>> >
>> > Hi All,
>> >
>> > Currently, if you are running the Spark HiveContext API with Hive 0.12, it
>> > won't work, because two libraries (Guava and protobuf) are not consistent
>> > with Hive 0.12 and Hadoop. (Hive libs align with Hadoop libs, and as a
>> > common practice they should be kept consistent to interoperate.)
>> >
>> > These are under discussion in two JIRA tickets:
>> >
>> > https://issues.apache.org/jira/browse/HIVE-7387
>> >
>> > https://issues.apache.org/jira/browse/SPARK-2420
>> >
>> > After tweaking the classpath and build for Spark 1.0.1-rc3, I was able to
>> > create a table through HiveContext; however, when I fetch the data it
>> > breaks on an incompatible API call in Guava. This is critical since it is
>> > needed to map the columns to the RDD schema.
>> >
>> > Hive and Hadoop use an older version of the Guava libraries (11.0.1),
>> > while Spark's Hive support uses Guava 14.0.1+.
>> > The community isn't willing to downgrade to 11.0.1, which is the current
>> > version for Hadoop 2.2 and Hive 0.12.
>> > Be aware of the protobuf version as well in Hive 0.12 (it uses protobuf 2.4).
>> >
>> > scala> import org.apache.spark.SparkContext
>> > import org.apache.spark.SparkContext
>> >
>> > scala> import org.apache.spark.sql.hive._
>> > import org.apache.spark.sql.hive._
>> >
>> > scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> > hiveContext: org.apache.spark.sql.hive.HiveContext =
>> > org.apache.spark.sql.hive.HiveContext@34bee01a
>> >
>> > scala> hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>> > res0: org.apache.spark.sql.SchemaRDD =
>> > SchemaRDD[0] at RDD at SchemaRDD.scala:104
>> > == Query Plan ==
>> > <Native command: executed by Hive>
>> >
>> > scala> hiveContext.hql("LOAD DATA LOCAL INPATH
>> > 'examples/src/main/resources/kv1.txt' INTO TABLE src")
>> > res1: org.apache.spark.sql.SchemaRDD =
>> > SchemaRDD[3] at RDD at SchemaRDD.scala:104
>> > == Query Plan ==
>> > <Native command: executed by Hive>
>> >
>> > scala> // Queries are expressed in HiveQL
>> > scala> hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)
>> > java.lang.NoSuchMethodError:
>> > com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>> > at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>> > at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>> > at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>> > at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>> > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>> > at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>> > at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>> > at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>> > at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>> > at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
>> > at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
>> > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
>> > at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
>> > at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
>> > at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
>> > at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
>> > at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
>> > at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>> > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
>> > at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:60)
>> > at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:70)
>> > at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>> > at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>> > at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:280)
>> > at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:69)
>> > at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> > at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> > at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:316)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:316)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:319)
>> > at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:319)
>> > at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:420)
>> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>> > at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>> > at $iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>> > at $iwC$$iwC$$iwC.<init>(<console>:28)
>> > at $iwC$$iwC.<init>(<console>:30)
>> > at $iwC.<init>(<console>:32)
>> > at <init>(<console>:34)
>> > at .<init>(<console>:38)
>> > at .<clinit>(<console>)
>> > at .<init>(<console>:7)
>> > at .<clinit>(<console>)
>> > at $print(<console>)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:606)
>> > at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
>> > at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
>> > at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
>> > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
>> > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
>> > at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
>> > at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
>> > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
>> > at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
>> > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
>> > at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
>> > at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
>> > at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>> > at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>> > at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>> > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
>> > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
>> > at org.apache.spark.repl.Main$.main(Main.scala:31)
>> > at org.apache.spark.repl.Main.main(Main.scala)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:606)
>> > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>> > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> >
>> >
>> >> From: hao.ch...@intel.com
>> >> To: user@spark.apache.org; u...@spark.incubator.apache.org
>> >> Subject: RE: Hive From Spark
>> >> Date: Mon, 21 Jul 2014 01:14:19 +0000
>> >>
>> >> JiaJia, I checked out the latest 1.0 branch and then did the following steps:
>> >> SPARK_HIVE=true sbt/sbt clean assembly
>> >> cd examples
>> >> ../bin/run-example sql.hive.HiveFromSpark
>> >>
>> >> It works well locally for me.
>> >>
>> >> Your log output shows "Invalid method name: 'get_table'", which suggests an
>> >> incompatible jar version or a mismatch between the Hive metastore service
>> >> and client. Can you double-check the jar versions of the Hive metastore
>> >> service and Thrift?
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: JiajiaJing [mailto:jj.jing0...@gmail.com]
>> >> Sent: Saturday, July 19, 2014 7:29 AM
>> >> To: u...@spark.incubator.apache.org
>> >> Subject: RE: Hive From Spark
>> >>
>> >> Hi Cheng Hao,
>> >>
>> >> Thank you very much for your reply.
>> >>
>> >> Basically, the program runs on Spark 1.0.0 and Hive 0.12.0.
>> >>
>> >> Some environment setup was done by running "SPARK_HIVE=true sbt/sbt
>> >> assembly/assembly", including the assembly jar on all the workers and
>> >> copying hive-site.xml to Spark's conf dir.
>> >>
>> >> The program is then run as: "./bin/run-example
>> >> org.apache.spark.examples.sql.hive.HiveFromSpark"
>> >>
>> >> It's good to know that this example runs well on your machine. Could you
>> >> please give me some insight into what you have done as well?
>> >>
>> >> Thank you very much!
>> >>
>> >> Jiajia
>> >>
>> >> --
>> >> View this message in context:
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html
>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> --
>> Marcelo

--
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org