Hi, thanks for the questions.
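To make the expected flow concrete, here is a minimal sketch. It assumes you use the sqlContext that Zeppelin creates for the notebook (not one built with new SQLContext(sc)), and that this context can see the external table test3 from your snippet below:

val rows = sqlContext.sql("select * from test3")
// register the result so both z.show() and %sql can see it
rows.registerTempTable("test4")
z.show(sqlContext.sql("select * from test4 limit 100"))

and then, in a separate paragraph,

%sql select * from test4 limit 100

Both of those last two steps go through the same sqlContext, so they should return the same rows. In more detail: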
Once you call yourRDD.registerTempTable("table"), both *z.show(sqlContext.sql("select * from table"))* and *%sql select * from table* are supposed to work, and the results should be identical. Could you double-check that you have not created sqlContext manually? If you have created sqlContext manually, then %sql will not recognize the table created with registerTempTable(). Thanks, moon On Tue, Jun 30, 2015 at 7:16 PM Su She <suhsheka...@gmail.com> wrote: > Okay, I'm really confused haha, some things are working, but I'm not sure > why.. > > > So this works: > > hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS *test3*(date int, date_time > string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS > TERMINATED BY ',' LOCATION > 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/'") > > val results = hc.sql("select * from *test3*") > > val schemaString = "x, y, z, a, value" > > import org.apache.spark.sql._ > > val schema = StructType(schemaString.split(" ").map(fieldName => > StructField(fieldName, StringType, true))) > > val wikiSchemaRDD = sqlContext.applySchema(results, schema) > > wikiSchemaRDD.registerTempTable("*test4*") > > > *z.show(sqlContext.sql("select * from test4 limit 1001"))* > > Why does this last line work, but %sql select * from test4 does not? > Zeppelin is able to recognize test4, which was only registered as a temp > table, so why can I not use %sql on this? > > Why can't I do *z.show(sqlContext.sql("select * from results limit > 1001"))* > > All I see as the difference between test4 and results is that test4 was > applied a schema (but, results is of type org.apache.spark.sql.Row (not > catalyst.Row)). > > Sorry for all the questions, thanks for the answers! > > > On Tue, Jun 30, 2015 at 6:38 PM, Su She <suhsheka...@gmail.com> wrote: > >> Thanks for the suggestion Moon, unfortunately, I got the Invocation >> exception: >> >> *sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w >> string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY >> ',' LOCATION 'hdfs://ip:8020/user/flume/'")* >> >> >> *sqlContext.sql("select * from test3").registerTempTable("test1")* >> >> >> *%sql select * from test1* >> >> *java.lang.reflect.InvocationTargetException* >> >> To clarify, my data is in a csv format within the directory /user/flume/ >> >> so: >> >> user/flume/csv1.csv >> user/flume/csv2.csv >> >> The reason I created HiveContext was because when I do: >> >> *hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time >> string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS >> TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")* >> >> *val results = hc.sql("select * from test3")* >> >> *results.take(10) // I am able to get the results, but when I replace hc >> with sqlContext, I can't do results.take()* >> >> *This last line results in an Array of rows* >> >> Please let me know if I am doing anything wrong, thanks! >> >> >> >> >> On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote: >> >>> Hi, >>> >>> Please try not create hc, sqlContext manually, and use zeppelin created >>> sqlContext. After run >>> >>> sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y >>> string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED >>> BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'") >>> >>> it supposed to access table 'test1' in your query. 
>>> >>> You can also do registerTempTable("test2"), for accessing table 'test2', >>> but it supposed from valid dataframe. So not >>> >>> sqlContext("CREATE EXTERNAL TABLE .... ").registerTempTable("test2") >>> >>> but like this >>> >>> sqlContext("select * from test1").registerTempTable("test1") >>> >>> Tell me if it helps. >>> >>> Best, >>> moon >>> >>> >>> On Mon, Jun 29, 2015 at 12:05 PM Su She <suhsheka...@gmail.com> wrote: >>> >>>> Hey Moon/All, >>>> >>>> sorry for the late reply. >>>> >>>> This is the problem I'm encountering when trying to register Hive as a >>>> temptable. It seems that it cannot find a table, I have bolded this in the >>>> error message that I've c/p below. Please let me know if this is the best >>>> way for doing this. My end goal is to execute: >>>> >>>> *z.show(hc.sql("select * from test1"))* >>>> >>>> Thank you for the help! >>>> >>>> *//Code:* >>>> import sys.process._ >>>> import org.apache.spark.sql.hive._ >>>> val hc = new HiveContext(sc) >>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc) >>>> >>>> hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, >>>> time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' >>>> LOCATION >>>> 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2") >>>> >>>> val results = hc.sql("select * from test2 limit 100") //have also tried >>>> test1 >>>> >>>> *//everything works fine upto here, but due to lazy evaluation, i guess >>>> that doesn't mean much* >>>> results.map(t => "Name: " + t(0)).collect().foreach(println) >>>> >>>> results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at >>>> SchemaRDD.scala:108 == Query Plan == == Physical Plan == Limit 100 !Project >>>> [result#105] NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date >>>> int, date_time string, time string, sensor int, value int) ROW FORMAT >>>> DELIMITED FIELDS TERMINATED BY ',' LOCATION >>>> 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', >>>> [result#112] org.apache.spark.SparkException: Job aborted due to stage >>>> failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task >>>> 0.0 in stage 7.0 (TID 4, localhost): >>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding >>>> attribute, tree: result#105 at >>>> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46) >>>> at >>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) >>>> at >>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54) >>>> at >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> at >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> at >>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >>>> at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at >>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at >>>> scala.collection.AbstractTraversable.map(Traversable.scala:105) at >>>> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54) >>>> at >>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105) >>>> at >>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105) >>>> at >>>> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44) >>>> at >>>> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43) >>>> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618) at >>>> org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618) at >>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at >>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at >>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at >>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at >>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at >>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at >>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) >>>> at >>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) >>>> at org.apache.spark.scheduler.Task.run(Task.scala:56) at >>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:745) *Caused by: >>>> java.lang.RuntimeException: Couldn't find result#105 in [result#112]* >>>> at scala.sys.package$.error(package.scala:27) at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47) >>>> at >>>> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46) >>>> ... 33 more >>>> >>>> Thank you! >>>> >>>> On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote: >>>> >>>>> Hi, >>>>> >>>>> Yes, %sql function is only for the tables that has been registered. >>>>> Using DataFrame is basically similar to what currently you're doing. >>>>> It needs registerTempTable. >>>>> >>>>> Could you share little bit about your problem when registering tables? >>>>> >>>>> And really appreciate for reporting a bug! >>>>> >>>>> Thanks, >>>>> moon >>>>> >>>>> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org> >>>>> wrote: >>>>> >>>>>> Yes, you can change the number of records. The default value is 1000 >>>>>> >>>>>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani < >>>>>> nihal_bhagchand...@yahoo.com> wrote: >>>>>> >>>>>>> Hi Su, >>>>>>> >>>>>>> as per my understanding you can change the limit of 1000record from >>>>>>> the interpreter section by setting up the value for variable >>>>>>> "zeppelin.spark.maxResult", >>>>>>> moon could you please confirm my understanding? 
>>>>>>> >>>>>>> Regards >>>>>>> Nihal >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hello Everyone, >>>>>>> >>>>>>> Excited to be making progress, and thanks to the community for >>>>>>> providing help along the way. This stuff is all really cool. >>>>>>> >>>>>>> >>>>>>> *Questions:* >>>>>>> >>>>>>> *1) *I noticed that the limit for the visual representation is 1000 >>>>>>> results. Are there any short term plans to expand the limit? It seemed a >>>>>>> little on the low side, as one of the main reasons for working with >>>>>>> Spark/Hadoop >>>>>>> is to work with large datasets. >>>>>>> >>>>>>> *2) *When can I use the %sql function? Is it only on tables that >>>>>>> have been registered? I have been having trouble registering tables >>>>>>> unless >>>>>>> I do: >>>>>>> >>>>>>> // Apply the schema to the RDD. >>>>>>> val peopleSchemaRDD = >>>>>>> sqlContext.applySchema(rowRDD, schema) >>>>>>> // Register the SchemaRDD as a table. >>>>>>> peopleSchemaRDD.registerTempTable("people") >>>>>>> >>>>>>> >>>>>>> I am having lots of trouble registering tables through HiveContext >>>>>>> or even duplicating the Zeppelin tutorial, is this issue mitigated by >>>>>>> using >>>>>>> DataFrames (I am planning to move to 1.3 very soon)? >>>>>>> >>>>>>> >>>>>>> *Bug:* >>>>>>> >>>>>>> When I do this: >>>>>>> z.show(sqlContext.sql("select * from sensortable limit 100")) >>>>>>> >>>>>>> I get the table, but I also get text results in the bottom, please >>>>>>> see attached image. For some reason, if the image doesn't go through, I >>>>>>> basically get the table, and everything works well, but the select >>>>>>> statement also returns text (regardless of whether it's 100 results or all of them) >>>>>>> >>>>>>> >>>>>>> Thank you! >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Su >>>>>>> >>>>>>> >>>>>> >>>> >> >
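P.S. On the display-limit question further down the thread: as Nihal and Corneau discussed, the number of rows shown by z.show() and %sql is capped by the Spark interpreter property zeppelin.spark.maxResult (default 1000). If you need more rows in the visualization, you can raise that value in the Interpreter settings, for example (illustrative value):

zeppelin.spark.maxResult = 10000

Note this only changes how many rows are displayed, not how many rows your query actually processes.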