Thanks for the suggestion, Moon. Unfortunately, I got the InvocationTargetException:
*sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")* *sqlContext.sql("select * from test3").registerTempTable("test1")* *%sql select * from test1* *java.lang.reflect.InvocationTargetException* To clarify, my data is in a csv format within the directory /user/flume/ so: user/flume/csv1.csv user/flume/csv2.csv The reason I created HiveContext was because when I do: *hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")* *val results = hc.sql("select * from test3")* *results.take(10) // I am able to get the results, but when I replace hc with sqlContext, I can't do results.take()* *This last line results an Array of rows* Please let me know if I am doing anything wrong, thanks! On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote: > Hi, > > Please try not create hc, sqlContext manually, and use zeppelin created > sqlContext. After run > > sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y > string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED > BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'") > > it supposed to access table 'test1' in your query. > > You can also do registerTempTable("test2"), for accessing table 'test2', > but it supposed from valid dataframe. So not > > sqlContext("CREATE EXTERNAL TABLE .... ").registerTempTable("test2") > > but like this > > sqlContext("select * from test1").registerTempTable("test1") > > Tell me if it helps. > > Best, > moon > > > On Mon, Jun 29, 2015 at 12:05 PM Su She <suhsheka...@gmail.com> wrote: > >> Hey Moon/All, >> >> sorry for the late reply. >> >> This is the problem I'm encountering when trying to register Hive as a >> temptable. It seems that it cannot find a table, I have bolded this in the >> error message that I've c/p below. Please let me know if this is the best >> way for doing this. My end goal is to execute: >> >> *z.show(hc.sql("select * from test1"))* >> >> Thank you for the help! 
>>
>> //Code:
>> import sys.process._
>> import org.apache.spark.sql.hive._
>> val hc = new HiveContext(sc)
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>
>> hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")
>>
>> val results = hc.sql("select * from test2 limit 100") // have also tried test1
>>
>> // everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
>> results.map(t => "Name: " + t(0)).collect().foreach(println)
>>
>> results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
>> == Query Plan ==
>> == Physical Plan ==
>> Limit 100
>>  !Project [result#105]
>>   NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
>>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
>>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
>>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>> *Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
>>   at scala.sys.package$.error(package.scala:27)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
>>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
>>   ... 33 more
>>
>> Thank you!
>>
>> On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> Yes, the %sql function is only for tables that have been registered. Using DataFrame is basically similar to what you're currently doing; it still needs registerTempTable.
>>>
>>> Could you share a little bit about the problem you hit when registering tables?
>>>
>>> And really appreciate you reporting a bug!
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org> wrote:
>>>
>>>> Yes, you can change the number of records. The default value is 1000.
>>>>
>>>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:
>>>>
>>>>> Hi Su,
>>>>>
>>>>> As per my understanding, you can change the 1000-record limit from the interpreter section by setting the value of the variable "zeppelin.spark.maxResult". Moon, could you please confirm my understanding?
>>>>>
>>>>> Regards,
>>>>> Nihal
>>>>>
>>>>>
>>>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com> wrote:
>>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> Excited to be making progress, and thanks to the community for providing help along the way. This stuff is all really cool.
>>>>>
>>>>> *Questions:*
>>>>>
>>>>> *1)* I noticed that the limit for the visual representation is 1000 results. Are there any short-term plans to expand the limit? It seems a little on the low side, as many of the reasons for working with Spark/Hadoop involve large datasets.
>>>>>
>>>>> *2)* When can I use the %sql function? Is it only on tables that have been registered? I have been having trouble registering tables unless I do:
>>>>>
>>>>> // Apply the schema to the RDD.
>>>>> val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>>>>> // Register the SchemaRDD as a table.
>>>>> peopleSchemaRDD.registerTempTable("people")
>>>>>
>>>>> I am having lots of trouble registering tables through HiveContext, or even duplicating the Zeppelin tutorial. Is this issue mitigated by using DataFrames (I am planning to move to 1.3 very soon)?
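(To make the applySchema route above concrete, here is a self-contained sketch against the Spark 1.2 SchemaRDD API used in this thread. It reads the CSV files directly instead of going through a Hive external table; the HDFS path is a placeholder, and the column names and types mirror the table definition earlier in the thread but are only illustrative.)

    import org.apache.spark.sql._

    // Schema built by hand, Spark 1.2-style.
    val schema = StructType(Seq(
      StructField("date",      IntegerType, true),
      StructField("date_time", StringType,  true),
      StructField("time",      StringType,  true),
      StructField("sensor",    IntegerType, true),
      StructField("value",     IntegerType, true)))

    // Parse the comma-separated lines into Rows that match the schema.
    val rowRDD = sc.textFile("hdfs://<namenode>:8020/user/flume/*.csv")   // placeholder path
      .map(_.split(","))
      .map(p => Row(p(0).trim.toInt, p(1).trim, p(2).trim, p(3).trim.toInt, p(4).trim.toInt))

    // Apply the schema and register the result so %sql and z.show can see it.
    val sensorSchemaRDD = sqlContext.applySchema(rowRDD, schema)
    sensorSchemaRDD.registerTempTable("sensortable")

After this, a query like `%sql select * from sensortable limit 100` runs against the registered temp table rather than the Hive metastore.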
>>>>>
>>>>> *Bug:*
>>>>>
>>>>> When I do this:
>>>>>
>>>>> z.show(sqlContext.sql("select * from sensortable limit 100"))
>>>>>
>>>>> I get the table, but I also get text results at the bottom; please see the attached image. In case the image doesn't go through: I basically get the table, and everything works well, but the select statement also returns plain text output (regardless of whether it's 100 results or all of them).
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best,
>>>>>
>>>>> Su
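(Since the question above asks whether moving to Spark 1.3 and DataFrames changes things: the registration step itself stays essentially the same, only the API names move, with applySchema becoming createDataFrame and the type classes living in org.apache.spark.sql.types. A brief sketch under the same placeholder assumptions as the 1.2 example above:)

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("sensor", IntegerType, nullable = true),
      StructField("value",  IntegerType, nullable = true)))

    val rowRDD = sc.textFile("hdfs://<namenode>:8020/user/flume/*.csv")   // placeholder path
      .map(_.split(","))
      .map(p => Row(p(0).trim.toInt, p(1).trim.toInt))

    // Spark 1.3+: createDataFrame replaces applySchema; registerTempTable is unchanged.
    val df = sqlContext.createDataFrame(rowRDD, schema)
    df.registerTempTable("sensortable")

    // Then, in other paragraphs:
    //   %sql select * from sensortable limit 100
    //   z.show(df.limit(100))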