Okay, I'm really confused, haha. Some things are working, but I'm not sure why.
So this works:

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/'")
    val results = hc.sql("select * from test3")
    val schemaString = "x, y, z, a, value"
    import org.apache.spark.sql._
    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
    val wikiSchemaRDD = sqlContext.applySchema(results, schema)
    wikiSchemaRDD.registerTempTable("test4")
    z.show(sqlContext.sql("select * from test4 limit 1001"))

Why does this last line work, but "%sql select * from test4" does not? Zeppelin is able to recognize test4, which was only registered as a temp table, so why can I not use %sql on it? And why can't I do

    z.show(sqlContext.sql("select * from results limit 1001"))

The only difference I see between test4 and results is that test4 had a schema applied to it (though results is of type org.apache.spark.sql.Row, not catalyst.Row). Sorry for all the questions, thanks for the answers!
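For reference, a minimal sketch of the pattern moon suggests further down the thread: do everything through the sqlContext (and z) that Zeppelin itself provides, and register a SELECT rather than the DDL command. This assumes the Zeppelin-created sqlContext is Hive-enabled so the external-table DDL is understood; the host, path, table and column names are just the placeholders used elsewhere in this thread.

    // use the Zeppelin-provided sqlContext; do not create hc or a second SQLContext by hand
    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")

    // register a query result (a SchemaRDD/DataFrame), not the CREATE TABLE command itself
    sqlContext.sql("select * from test3").registerTempTable("test4")

    // both of the following now go through the same context, so both can resolve test4:
    z.show(sqlContext.sql("select * from test4 limit 1000"))
    // and, in a separate paragraph:  %sql select * from test4 limit 1000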
On Tue, Jun 30, 2015 at 6:38 PM, Su She <suhsheka...@gmail.com> wrote:

> Thanks for the suggestion Moon; unfortunately, I got the invocation exception:
>
>     sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")
>     sqlContext.sql("select * from test3").registerTempTable("test1")
>     %sql select * from test1
>
>     java.lang.reflect.InvocationTargetException
>
> To clarify, my data is in CSV format within the directory /user/flume/, so:
>
>     user/flume/csv1.csv
>     user/flume/csv2.csv
>
> The reason I created a HiveContext was because when I do:
>
>     hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")
>     val results = hc.sql("select * from test3")
>     results.take(10) // I am able to get the results, but when I replace hc with sqlContext, I can't do results.take()
>
> this last line returns an Array of Rows.
>
> Please let me know if I am doing anything wrong, thanks!
>
> On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi,
>>
>> Please try not to create hc and sqlContext manually; use the sqlContext that Zeppelin creates. After you run
>>
>>     sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")
>>
>> you should be able to access table 'test1' in your query.
>>
>> You can also do registerTempTable("test2") to access a table 'test2', but it has to be called on a valid DataFrame. So not
>>
>>     sqlContext("CREATE EXTERNAL TABLE .... ").registerTempTable("test2")
>>
>> but like this:
>>
>>     sqlContext("select * from test1").registerTempTable("test1")
>>
>> Tell me if it helps.
>>
>> Best,
>> moon
>>
>> On Mon, Jun 29, 2015 at 12:05 PM Su She <suhsheka...@gmail.com> wrote:
>>
>>> Hey Moon/All,
>>>
>>> Sorry for the late reply.
>>>
>>> This is the problem I'm encountering when trying to register a Hive table as a temp table. It seems that it cannot find the table; I have bolded this in the error message that I've copy/pasted below. Please let me know if this is the best way of doing this.
>>>
>>> My end goal is to execute:
>>>
>>>     z.show(hc.sql("select * from test1"))
>>>
>>> Thank you for the help!
>>>
>>> // Code:
>>>     import sys.process._
>>>     import org.apache.spark.sql.hive._
>>>     val hc = new HiveContext(sc)
>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>
>>>     hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")
>>>
>>>     val results = hc.sql("select * from test2 limit 100") // have also tried test1
>>>
>>>     // everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
>>>     results.map(t => "Name: " + t(0)).collect().foreach(println)
>>>
>>> The output/error:
>>>
>>> results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
>>> == Query Plan ==
>>> == Physical Plan ==
>>> Limit 100
>>>  !Project [result#105]
>>>   NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]
>>>
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
>>>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
>>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
>>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
>>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>>>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
>>>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
>>>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>>>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> *Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
>>>   at scala.sys.package$.error(package.scala:27)
>>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
>>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
>>>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
>>>   ... 33 more
>>>
>>> Thank you!
>>>
>>> On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, the %sql function is only for tables that have been registered. Using DataFrames is basically similar to what you're currently doing; it still needs registerTempTable.
>>>>
>>>> Could you share a little bit about the problem you hit when registering tables?
>>>>
>>>> And I really appreciate you reporting a bug!
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org> wrote:
>>>>
>>>>> Yes, you can change the number of records. The default value is 1000.
>>>>>
>>>>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:
>>>>>
>>>>>> Hi Su,
>>>>>>
>>>>>> As per my understanding, you can change the 1000-record limit from the interpreter section by setting the value of the variable "zeppelin.spark.maxResult".
>>>>>> Moon, could you please confirm my understanding?
>>>>>>
>>>>>> Regards,
>>>>>> Nihal
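As a concrete example of the setting Nihal mentions (the property name comes from his message; the value below is only illustrative): open the Interpreter menu, edit the spark interpreter, change

    zeppelin.spark.maxResult = 10000

and save, then restart the spark interpreter so the new limit takes effect.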
>>>>>>
>>>>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com> wrote:
>>>>>>
>>>>>> Hello Everyone,
>>>>>>
>>>>>> Excited to be making progress, and thanks to the community for providing help along the way. This stuff is all really cool.
>>>>>>
>>>>>> Questions:
>>>>>>
>>>>>> 1) I noticed that the limit for the visual representation is 1000 results. Are there any short-term plans to expand the limit? It seemed a little on the low side, as many of the reasons for working with Spark/Hadoop involve large datasets.
>>>>>>
>>>>>> 2) When can I use the %sql function? Is it only on tables that have been registered? I have been having trouble registering tables unless I do:
>>>>>>
>>>>>>     // Apply the schema to the RDD.
>>>>>>     val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>>>>>>     // Register the SchemaRDD as a table.
>>>>>>     peopleSchemaRDD.registerTempTable("people")
>>>>>>
>>>>>> I am having lots of trouble registering tables through HiveContext, or even duplicating the Zeppelin tutorial. Is this issue mitigated by using DataFrames (I am planning to move to 1.3 very soon)?
>>>>>>
>>>>>> Bug:
>>>>>>
>>>>>> When I do this:
>>>>>>
>>>>>>     z.show(sqlContext.sql("select * from sensortable limit 100"))
>>>>>>
>>>>>> I get the table, but I also get text results at the bottom; please see the attached image. In case the image doesn't go through: I basically get the table and everything works well, but for some reason the select statement also returns text (regardless of whether it's 100 results or all of them).
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Su
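For reference, a self-contained sketch of the programmatic-schema route quoted in the last message above, written against the Spark 1.2-era API used throughout this thread (applySchema; in Spark 1.3 the equivalent call is createDataFrame). The file path, column names and table name are only illustrative, and it assumes sqlContext is the one Zeppelin provides so that a %sql paragraph can see the registered table.

    import org.apache.spark.sql._   // Row, StructType, StructField, StringType in Spark 1.2

    // an RDD of raw CSV lines; the path is a placeholder
    val people = sc.textFile("hdfs:///user/flume/people.csv")

    // build the schema programmatically from a string of column names
    val schemaString = "name age"
    val schema = StructType(
      schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // turn each CSV line into a Row with the same number of fields as the schema
    val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))

    // apply the schema to the RDD of Rows and register the result as a temp table
    val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
    peopleSchemaRDD.registerTempTable("people")

    // query it from Scala ...
    z.show(sqlContext.sql("select * from people limit 100"))
    // ... or from a separate paragraph:  %sql select * from people limit 100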