Hi Akhil,

This is for version 1.2.1. The other thread you reference was my attempt with 1.3.0, to see whether the issue was specific to 1.2.1. I did not build Spark myself; I used the 1.2.1 "Pre-built for Hadoop 2.4 and later" package from the Spark download site.
Since I get the error in both 1.2.1 and 1.3.0:

    15/04/01 14:41:49 INFO ParseDriver: Parse Completed
    Exception in thread "main" java.lang.ClassNotFoundException: json_tuple
        at java.net.URLClassLoader$1.run(

it looks like I just don't have the jar. Even including all jars from the $HIVE/lib directory did not seem to work. When looking in $HIVE/lib for 0.13.1, I do not see any json serde or jackson files, but I do see that hive-exec.jar contains the org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple class. Do you know if another jar is required, or should it work just by including all jars from $HIVE/lib? I can build Spark locally, but did not think that was required given the version I downloaded; is that not the case?

Thanks for the assistance.

-Todd
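A quick way to narrow this down is a classpath probe along these lines (a minimal sketch, not from the thread; it assumes only the running spark-shell session described above):

    import scala.util.{Try, Success, Failure}

    // If this prints "not found", hive-exec.jar never made it onto the driver
    // classpath. If it prints "found", the jar is present and the
    // ClassNotFoundException for "json_tuple" points at Spark's Hive function
    // lookup rather than a missing jar.
    Try(Class.forName("org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple")) match {
      case Success(cls) => println(s"found: ${cls.getName}")
      case Failure(err) => println(s"not found: $err")
    }

Worth noting, too, that --jars expects a comma-separated list of jars, so a bare shell glob like /opt/hive/0.13.1/lib/* expands to space-separated paths and likely binds only the first one to the option.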
On Fri, Apr 3, 2015 at 2:06 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> How did you build Spark? Which version of Spark are you running? Doesn't
> this thread already explain it?
> https://www.mail-archive.com/user@spark.apache.org/msg25505.html
>
> Thanks
> Best Regards
>
> On Thu, Apr 2, 2015 at 11:10 PM, Todd Nist <tsind...@gmail.com> wrote:
>
>> Hi Akhil,
>>
>> I tried your suggestion, to no avail. I actually do not see any "jackson"
>> or "json serde" jars in the $HIVE/lib directory. This is Hive 0.13.1 and
>> Spark 1.2.1.
>>
>> Here is what I did:
>>
>> I added the lib folder to the --jars option when starting the spark-shell,
>> but the job fails. The hive-site.xml is in the $SPARK_HOME/conf directory.
>>
>> I start the spark-shell as follows:
>>
>> ./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2
>>   --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar
>>
>> and like this:
>>
>> ./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2
>>   --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar
>>   --jars /opt/hive/0.13.1/lib/*
>>
>> I'm just doing this in the spark-shell now:
>>
>> import org.apache.spark.sql.hive._
>> val sqlContext = new HiveContext(sc)
>> import sqlContext._
>>
>> case class MetricTable(path: String, pathElements: String, name: String, value: String)
>>
>> val mt = new MetricTable(
>>   """path": "/DC1/HOST1/""",
>>   """pathElements": [{"node": "DataCenter","value": "DC1"},{"node": "host","value": "HOST1"}]""",
>>   """name": "Memory Usage (%)""",
>>   """value": 29.590943279257175""")
>>
>> val rdd1 = sc.makeRDD(List(mt))
>> rdd1.printSchema()
>> rdd1.registerTempTable("metric_table")
>>
>> sql(
>>   """SELECT path, name, value, v1.peValue, v1.peName
>>        FROM metric_table
>>        lateral view json_tuple(pathElements, 'name', 'value') v1
>>          as peName, peValue
>>   """)
>>   .collect.foreach(println(_))
>>
>> It results in the same error:
>>
>> 15/04/02 12:33:59 INFO ParseDriver: Parsing command: SELECT path, name, value, v1.peValue, v1.peName
>>          FROM metric_table
>>          lateral view json_tuple(pathElements, 'name', 'value') v1
>>            as peName, peValue
>> 15/04/02 12:34:00 INFO ParseDriver: Parse Completed
>> res2: org.apache.spark.sql.SchemaRDD =
>> SchemaRDD[5] at RDD at SchemaRDD.scala:108
>> == Query Plan ==
>> == Physical Plan ==
>> java.lang.ClassNotFoundException: json_tuple
>>
>> Any other suggestions, or am I doing something else wrong here?
>>
>> -Todd
>>
>> On Thu, Apr 2, 2015 at 2:00 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Try adding all the jars in your $HIVE/lib directory. If you want the
>>> specific jar, you could look for jackson or json serde in it.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Thu, Apr 2, 2015 at 12:49 AM, Todd Nist <tsind...@gmail.com> wrote:
>>>
>>>> I have a feeling I'm missing a jar that provides the support, or this
>>>> may be related to https://issues.apache.org/jira/browse/SPARK-5792.
>>>> If it is a jar, where would I find it? I would have thought in the
>>>> $HIVE/lib folder, but I'm not sure which jar contains it.
>>>>
>>>> Error:
>>>>
>>>> Create Metric Temporary Table for querying
>>>> 15/04/01 14:41:44 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>> 15/04/01 14:41:44 INFO ObjectStore: ObjectStore, initialize called
>>>> 15/04/01 14:41:45 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>> 15/04/01 14:41:45 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
>>>> 15/04/01 14:41:45 INFO BlockManager: Removing broadcast 0
>>>> 15/04/01 14:41:45 INFO BlockManager: Removing block broadcast_0
>>>> 15/04/01 14:41:45 INFO MemoryStore: Block broadcast_0 of size 1272 dropped from memory (free 278018571)
>>>> 15/04/01 14:41:45 INFO BlockManager: Removing block broadcast_0_piece0
>>>> 15/04/01 14:41:45 INFO MemoryStore: Block broadcast_0_piece0 of size 869 dropped from memory (free 278019440)
>>>> 15/04/01 14:41:45 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.1.5:63230 in memory (size: 869.0 B, free: 265.1 MB)
>>>> 15/04/01 14:41:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
>>>> 15/04/01 14:41:45 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.1.5:63278 in memory (size: 869.0 B, free: 530.0 MB)
>>>> 15/04/01 14:41:45 INFO ContextCleaner: Cleaned broadcast 0
>>>> 15/04/01 14:41:46 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>> 15/04/01 14:41:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/04/01 14:41:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/04/01 14:41:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/04/01 14:41:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/04/01 14:41:47 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
>>>> 15/04/01 14:41:47 INFO ObjectStore: Initialized ObjectStore
>>>> 15/04/01 14:41:47 INFO HiveMetaStore: Added admin role in metastore
>>>> 15/04/01 14:41:47 INFO HiveMetaStore: Added public role in metastore
>>>> 15/04/01 14:41:48 INFO HiveMetaStore: No user is added in admin role, since config is empty
>>>> 15/04/01 14:41:48 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
>>>> 15/04/01 14:41:49 INFO ParseDriver: Parsing command: SELECT path, name, value, v1.peValue, v1.peName
>>>>          FROM metric
>>>>          lateral view json_tuple(pathElements, 'name', 'value') v1
>>>>            as peName, peValue
>>>> 15/04/01 14:41:49 INFO ParseDriver: Parse Completed
>>>> Exception in thread "main" java.lang.ClassNotFoundException: json_tuple
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>     at org.apache.spark.sql.hive.HiveFunctionWrapper.createFunction(Shim13.scala:141)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.function$lzycompute(hiveUdfs.scala:261)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.function(hiveUdfs.scala:261)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.outputInspector$lzycompute(hiveUdfs.scala:267)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.outputInspector(hiveUdfs.scala:267)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.outputDataTypes$lzycompute(hiveUdfs.scala:272)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.outputDataTypes(hiveUdfs.scala:272)
>>>>     at org.apache.spark.sql.hive.HiveGenericUdtf.makeOutput(hiveUdfs.scala:278)
>>>>     at org.apache.spark.sql.catalyst.expressions.Generator.output(generators.scala:60)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.Generate$$anonfun$1.apply(basicOperators.scala:50)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.Generate$$anonfun$1.apply(basicOperators.scala:50)
>>>>     at scala.Option.map(Option.scala:145)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.Generate.generatorOutput(basicOperators.scala:50)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.Generate.output(basicOperators.scala:60)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveChildren$1.apply(LogicalPlan.scala:118)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveChildren$1.apply(LogicalPlan.scala:118)
>>>>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>>>>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>>>>     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>>>>     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:118)
>>>>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$6$$anonfun$applyOrElse$1.applyOrElse(Analyzer.scala:159)
>>>>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$6$$anonfun$applyOrElse$1.applyOrElse(Analyzer.scala:156)
>>>>     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>>>>     at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1(QueryPlan.scala:71)
>>>>     at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1$$anonfun$apply$1.apply(QueryPlan.scala:85)
>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>>     at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:84)
>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>     at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>     at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:89)
>>>>     at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:60)
>>>>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$6.applyOrElse(Analyzer.scala:156)
>>>>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$6.applyOrElse(Analyzer.scala:153)
>>>>     at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:206)
>>>>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:153)
>>>>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:152)
>>>>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>     at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>>>>     at scala.collection.immutable.List.foldLeft(List.scala:84)
>>>>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>>>>     at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
>>>>     at com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite$.main(ElasticSearchReadWrite.scala:119)
>>>>     at com.opsdatastore.elasticsearch.spark.ElasticSearchReadWrite.main(ElasticSearchReadWrite.scala)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> Json:
>>>>
>>>> "metric": {
>>>>     "path": "/PA/Pittsburgh/12345 Westbrook Drive/main/theromostat-1",
>>>>     "pathElements": [
>>>>         { "node": "State",  "value": "PA" },
>>>>         { "node": "City",   "value": "Pittsburgh" },
>>>>         { "node": "Street", "value": "12345 Westbrook Drive" },
>>>>         { "node": "level",  "value": "main" },
>>>>         { "node": "device", "value": "thermostat" }
>>>>     ],
>>>>     "name": "Current Temperature",
>>>>     "value": 29.590943279257175,
>>>>     "timestamp": "2015-03-27T14:53:46+0000"
>>>> }
>>>>
>>>> Here is the code that produces the error:
>>>>
>>>> // Spark imports
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>> import org.apache.spark.SparkContext._
>>>> import org.apache.spark.rdd.RDD
>>>> import org.apache.spark.sql.{SchemaRDD, SQLContext}
>>>> import org.apache.spark.sql.hive._
>>>>
>>>> // ES imports
>>>> import org.elasticsearch.spark._
>>>> import org.elasticsearch.spark.sql._
>>>>
>>>> object ElasticSearchReadWrite {
>>>>
>>>>   def main(args: Array[String]) {
>>>>     // sparkInit is a helper defined elsewhere in the project
>>>>     val sc = sparkInit
>>>>
>>>>     @transient
>>>>     val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>
>>>>     import hiveContext._
>>>>
>>>>     val start = System.currentTimeMillis()
>>>>
>>>>     /*
>>>>      * Read from ES and provide some insights with SparkSQL
>>>>      */
>>>>     val esData = sc.esRDD(s"${ElasticSearch.Index}/${ElasticSearch.Type}")
>>>>
>>>>     esData.collect.foreach(println(_))
>>>>
>>>>     val end = System.currentTimeMillis()
>>>>     println(s"Total time: ${end-start} ms")
>>>>
>>>>     println("Create Metric Temporary Table for querying")
>>>>
>>>>     val schemaRDD = hiveContext.sql(
>>>>       "CREATE TEMPORARY TABLE metric " +
>>>>       "USING org.elasticsearch.spark.sql " +
>>>>       "OPTIONS (resource 'device/metric')" )
>>>>
>>>>     hiveContext.sql(
>>>>       """SELECT path, name, value, v1.peValue, v1.peName
>>>>            FROM metric
>>>>            lateral view json_tuple(pathElements, 'name', 'value') v1
>>>>              as peName, peValue
>>>>       """)
>>>>       .collect.foreach(println(_))
>>>>   }
>>>> }
>>>>
>>>> More than likely I'm missing a jar, but not sure which one it would be.
>>>>
>>>> -Todd
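For comparison, here is a self-contained way to exercise json_tuple without the Elasticsearch table. This is a hypothetical sketch, not from the thread: it assumes a Spark 1.2/1.3 spark-shell (so sc already exists) with hive-exec.jar on the classpath, and it hands json_tuple a JSON object string, since json_tuple extracts top-level keys from an object rather than from an array:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // One row whose pathElement column holds a JSON *object* string, so that
    // json_tuple(pathElement, 'node', 'value') has top-level keys to pull out.
    val rows = sc.parallelize(Seq(
      """{"path": "/DC1/HOST1/", "name": "Memory Usage (%)", "value": 29.59, "pathElement": "{\"node\": \"host\", \"value\": \"HOST1\"}"}"""))
    hiveContext.jsonRDD(rows).registerTempTable("metric_check")

    hiveContext.sql(
      """SELECT path, name, value, v1.peName, v1.peValue
           FROM metric_check
           LATERAL VIEW json_tuple(pathElement, 'node', 'value') v1
             AS peName, peValue
      """)
      .collect.foreach(println(_))

If even this minimal form fails with ClassNotFoundException: json_tuple, that would point back at Spark's Hive function lookup (consistent with SPARK-5792) rather than at anything in the Elasticsearch mapping or table definition.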