Ahh okay, so when I don't create a HiveContext myself, it now works. It seems, though, that I have to apply the schema to val results = sqlContext.sql("select * from table") before I can register the table in a way that sqlContext is able to see it.
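(For reference, the flow described above boils down to the sketch below, assuming Spark 1.2.x and the sqlContext that Zeppelin provides rather than one created by hand; the table, column, and path names are illustrative.)

    import org.apache.spark.sql._

    // Rows from the external table; a SchemaRDD is an RDD[Row].
    val results = sqlContext.sql("select * from test3")

    // Build a StructType whose field names become the temp table's columns.
    val schema = StructType(
      "x y z a value".split(" ").map(name => StructField(name, StringType, true)))

    // Attach the schema, then register; %sql reads this same context's catalog.
    val withSchema = sqlContext.applySchema(results, schema)
    withSchema.registerTempTable("test4")

    // Both of these should now resolve "test4":
    //   z.show(sqlContext.sql("select * from test4 limit 1001"))
    //   %sql select * from test4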
Thanks for the answers Moon!

On Tue, Jun 30, 2015 at 7:27 PM, moon soo Lee <m...@apache.org> wrote:

Hi,

Thanks for asking questions.

Once you do

    yourRDD.registerTempTable("table")

both of these are supposed to work, with identical results:

    z.show(sqlContext.sql("select * from table"))
    %sql select * from table

Could you double-check that you haven't created sqlContext manually? If you have created sqlContext manually, then %sql will not recognize tables created with registerTempTable().

Thanks,
moon

On Tue, Jun 30, 2015 at 7:16 PM, Su She <suhsheka...@gmail.com> wrote:

Okay, I'm really confused haha; some things are working, but I'm not sure why.

So this works:

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/'")

    val results = hc.sql("select * from test3")

    val schemaString = "x y z a value"

    import org.apache.spark.sql._

    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    val wikiSchemaRDD = sqlContext.applySchema(results, schema)

    wikiSchemaRDD.registerTempTable("test4")

    z.show(sqlContext.sql("select * from test4 limit 1001"))

Why does this last line work, but %sql select * from test4 does not? Zeppelin is able to recognize test4, which was only registered as a temp table, so why can I not use %sql on it?

Why can't I do z.show(sqlContext.sql("select * from results limit 1001"))?

The only difference I see between test4 and results is that test4 had a schema applied to it (results is of type org.apache.spark.sql.Row, not catalyst.Row).

Sorry for all the questions, thanks for the answers!

On Tue, Jun 30, 2015 at 6:38 PM, Su She <suhsheka...@gmail.com> wrote:

Thanks for the suggestion Moon; unfortunately, I got the invocation exception:

    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")

    sqlContext.sql("select * from test3").registerTempTable("test1")

    %sql select * from test1

    java.lang.reflect.InvocationTargetException

To clarify, my data is in CSV format within the directory /user/flume/, so:

    user/flume/csv1.csv
    user/flume/csv2.csv

The reason I created a HiveContext was that when I do:

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")

    val results = hc.sql("select * from test3")

    results.take(10)

the last line returns an Array of Rows, but when I replace hc with sqlContext, I can't do results.take().

Please let me know if I am doing anything wrong, thanks!
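(A minimal sketch of the pitfall moon points out above: each SQLContext keeps its own registry of temp tables, and the %sql interpreter only queries the context Zeppelin injects. The context and table names here are hypothetical; the createSchemaRDD import is the Spark 1.2 implicit that lets an RDD of tuples or case classes be registered directly.)

    // A manually created context, separate from Zeppelin's "sqlContext".
    val manualCtx = new org.apache.spark.sql.SQLContext(sc)
    import manualCtx.createSchemaRDD

    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
    rdd.registerTempTable("mine")               // lands only in manualCtx's catalog

    manualCtx.sql("select * from mine").count() // works: same context
    // %sql select * from mine                  // fails: %sql looks in Zeppelin's
    //                                          // sqlContext, which has no "mine"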
On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote:

Hi,

Please try not to create hc or sqlContext manually; use the Zeppelin-created sqlContext. After you run

    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")

you should be able to access table 'test1' in your query.

You can also use registerTempTable("test2") to access a table 'test2', but it has to come from a valid DataFrame. So not

    sqlContext.sql("CREATE EXTERNAL TABLE .... ").registerTempTable("test2")

but like this:

    sqlContext.sql("select * from test1").registerTempTable("test2")

Tell me if it helps.

Best,
moon
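(To make moon's distinction concrete: a DDL statement in Spark SQL returns a command result with a single "result" column rather than the new table's rows, so registering that return value produces a temp table with the wrong schema; the "Couldn't find result#105" failure in the message below is what that looks like. A sketch, assuming a Hive-enabled Zeppelin sqlContext; the HDFS location is illustrative.)

    // The DDL's return value is a command result, not the table's rows.
    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
      "LOCATION 'hdfs://namenode:8020/user/flume/'")

    // Register a SELECT over the external table, not the DDL's return value.
    sqlContext.sql("select * from test1").registerTempTable("test2")
    // %sql select * from test2   -- should now resolve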
On Mon, Jun 29, 2015 at 12:05 PM, Su She <suhsheka...@gmail.com> wrote:

Hey Moon/All,

Sorry for the late reply.

This is the problem I'm encountering when trying to register a Hive table as a temp table. It seems that it cannot find the table; I have bolded the key line in the error message that I've copy/pasted below. Please let me know if this is the best way of doing this. My end goal is to execute:

    z.show(hc.sql("select * from test1"))

Thank you for the help!

// Code:

    import sys.process._
    import org.apache.spark.sql.hive._
    val hc = new HiveContext(sc)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")

    val results = hc.sql("select * from test2 limit 100") // have also tried test1

    // everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
    results.map(t => "Name: " + t(0)).collect().foreach(println)

// Error:

    results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
    == Query Plan ==
    == Physical Plan ==
    Limit 100
     !Project [result#105]
      NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
        at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
        at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
        at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
        at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    *Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
        ... 33 more

Thank you!

On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:

Hi,

Yes, the %sql function is only for tables that have been registered. Using DataFrame is basically similar to what you're currently doing: it needs registerTempTable.

Could you share a little bit about your problem when registering tables?

And I really appreciate you reporting the bug!

Thanks,
moon

On Wed, Jun 24, 2015 at 11:28 PM, Corneau Damien <cornead...@apache.org> wrote:

Yes, you can change the number of records. The default value is 1000.
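(Concretely, the knob Nihal refers to below is a property of the spark interpreter, editable under the Interpreter menu in the Zeppelin UI. A hedged example; the value 10000 is arbitrary, and the property caps how many rows Zeppelin renders, not how many rows Spark computes:)

    zeppelin.spark.maxResult = 10000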
On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

Hi Su,

As per my understanding, you can change the 1000-record limit from the interpreter section by setting the value of the variable "zeppelin.spark.maxResult". Moon, could you please confirm my understanding?

Regards,
Nihal

On Thursday, 25 June 2015 at 10:00 AM, Su She <suhsheka...@gmail.com> wrote:

Hello Everyone,

Excited to be making progress, and thanks to the community for providing help along the way. This stuff is all really cool.

Questions:

1) I noticed that the limit for the visual representation is 1000 results. Are there any short-term plans to expand the limit? It seems a little on the low side, since one of the main reasons for working with Spark/Hadoop is to work with large datasets.

2) When can I use the %sql function? Is it only on tables that have been registered? I have been having trouble registering tables unless I do:

    // Apply the schema to the RDD.
    val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
    // Register the SchemaRDD as a table.
    peopleSchemaRDD.registerTempTable("people")

I am having lots of trouble registering tables through HiveContext, or even duplicating the Zeppelin tutorial. Is this issue mitigated by using DataFrames (I am planning to move to 1.3 very soon)?

Bug:

When I do this:

    z.show(sqlContext.sql("select * from sensortable limit 100"))

I get the table, but I also get text results at the bottom; please see the attached image. In case the image doesn't go through: I basically get the table and everything works well, but the select statement also returns text (regardless of whether it's 100 results or all of them).

Thank you!

Best,

Su
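(On question 2 above: in Spark 1.3 the same pattern moves to the DataFrame API, with createDataFrame replacing the 1.2 applySchema call. A sketch, assuming Spark 1.3+ and the Zeppelin-provided sqlContext; the data and names are illustrative.)

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType, StructField, StringType}

    // An RDD[Row] plus an explicit schema, as in the 1.2 applySchema pattern.
    val rowRDD = sc.parallelize(Seq(Row("alice", "1"), Row("bob", "2")))
    val schema = StructType(Seq(
      StructField("name", StringType, true),
      StructField("id", StringType, true)))

    // createDataFrame is the 1.3 replacement for applySchema.
    val people = sqlContext.createDataFrame(rowRDD, schema)
    people.registerTempTable("people")
    // %sql select * from people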