Hi, thanks for the questions.
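To make the expected flow concrete, here is a minimal sketch. It assumes you use the sqlContext that Zeppelin creates for the notebook (not one built with new SQLContext(sc)), and that this context can see the external table test3 from your snippet below:

val rows = sqlContext.sql("select * from test3")
// register the result so both z.show() and %sql can see it
rows.registerTempTable("test4")
z.show(sqlContext.sql("select * from test4 limit 100"))

and then, in a separate paragraph,

%sql select * from test4 limit 100

Both of those last two steps go through the same sqlContext, so they should return the same rows. In more detail: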
Once you call yourRDD.registerTempTable("table"), both *z.show(sqlContext.sql("select * from table"))* and *%sql select * from table* are supposed to work, and the results should be identical. Could you double-check that you have not created sqlContext manually? If you have created sqlContext manually, then %sql will not recognize the table created with registerTempTable(). Thanks, moon On Tue, Jun 30, 2015 at 7:16 PM Su She <suhsheka...@gmail.com> wrote: > Okay, I'm really confused haha, some things are working, but I'm not sure > why.. > > > So this works: > > hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS *test3*(date int, date_time > string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS > TERMINATED BY ',' LOCATION > 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/'") > > val results = hc.sql("select * from *test3*") > > val schemaString = "x, y, z, a, value" > > import org.apache.spark.sql._ > > val schema = StructType(schemaString.split(" ").map(fieldName => > StructField(fieldName, StringType, true))) > > val wikiSchemaRDD = sqlContext.applySchema(results, schema) > > wikiSchemaRDD.registerTempTable("*test4*") > > > *z.show(sqlContext.sql("select * from test4 limit 1001"))* > > Why does this last line work, but %sql select * from test4 does not? > Zeppelin is able to recognize test4, which was only registered as a temp > table, so why can I not use %sql on this? > > Why can't I do *z.show(sqlContext.sql("select * from results limit > 1001"))* > > All I see as the difference between test4 and results is that test4 was > applied a schema (but, results is of type org.apache.spark.sql.Row (not > catalyst.Row)). > > Sorry for all the questions, thanks for the answers! > > > On Tue, Jun 30, 2015 at 6:38 PM, Su She <suhsheka...@gmail.com> wrote: > >> Thanks for the suggestion Moon, unfortunately, I got the Invocation >> exception: >> >> *sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w >> string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY >> ',' LOCATION 'hdfs://ip:8020/user/flume/'")* >> >> >> *sqlContext.sql("select * from test3").registerTempTable("test1")* >> >> >> *%sql select * from test1* >> >> *java.lang.reflect.InvocationTargetException* >> >> To clarify, my data is in a csv format within the directory /user/flume/ >> >> so: >> >> user/flume/csv1.csv >> user/flume/csv2.csv >> >> The reason I created HiveContext was because when I do: >> >> *hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time >> string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS >> TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")* >> >> *val results = hc.sql("select * from test3")* >> >> *results.take(10) // I am able to get the results, but when I replace hc >> with sqlContext, I can't do results.take()* >> >> *This last line results in an Array of rows* >> >> Please let me know if I am doing anything wrong, thanks! >> >> >> >> >> On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote: >> >>> Hi, >>> >>> Please try not create hc, sqlContext manually, and use zeppelin created >>> sqlContext. After run >>> >>> sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y >>> string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED >>> BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'") >>> >>> it supposed to access table 'test1' in your query. 
>>> >>> You can also do registerTempTable("test2"), for accessing table 'test2', >>> but it supposed from valid dataframe. So not >>> >>> sqlContext("CREATE EXTERNAL TABLE .... ").registerTempTable("test2") >>> >>> but like this >>> >>> sqlContext("select * from test1").registerTempTable("test1") >>> >>> Tell me if it helps. >>> >>> Best, >>> moon >>> >>> >>> On Mon, Jun 29, 2015 at 12:05 PM Su She <suhsheka...@gmail.com> wrote: >>> >>>> Hey Moon/All, >>>> >>>> sorry for the late reply. >>>> >>>> This is the problem I'm encountering when trying to register Hive as a >>>> temptable. It seems that it cannot find a table, I have bolded this in the >>>> error message that I've c/p below. Please let me know if this is the best >>>> way for doing this. My end goal is to execute: >>>> >>>> *z.show(hc.sql("select * from test1"))* >>>> >>>> Thank you for the help! >>>> >>>> *//Code:* >>>> import sys.process._ >>>> import org.apache.spark.sql.hive._ >>>> val hc = new HiveContext(sc) >>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc) >>>> >>>> hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, >>>> time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' >>>> LOCATION >>>> 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2") >>>> >>>> val results = hc.sql("select * from test2 limit 100") //have also tried >>>> test1 >>>> >>>> *//everything works fine upto here, but due to lazy evaluation, i guess >>>> that doesn't mean much* >>>> results.map(t => "Name: " + t(0)).collect().foreach(println) >>>> >>>> results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at >>>> SchemaRDD.scala:108 == Query Plan == == Physical Plan == Limit 100 !Project >>>> [result#105] NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date >>>> int, date_time string, time string, sensor int, value int) ROW FORMAT >>>> DELIMITED FIELDS TERMINATED BY ',' LOCATION >>>> 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', >>>> [result#112] org.apache.spark.SparkException: Job aborted due to stage >>>> failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task >>>> 0.0 in stage 7.0 (TID 4, localhost): >>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding >>>> attribute, tree: result#105 at >>>> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46) >>>> at >>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) >>>> at >>>> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54) >>>> at >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> at >>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) >>>> at >>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >>>> at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at >>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at >>>> scala.collection.AbstractTraversable.map(Traversable.scala:105) at >>>> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54) >>>> at >>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105) >>>> at >>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105) >>>> at >>>> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44) >>>> at >>>> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43) >>>> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618) at >>>> org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618) at >>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at >>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at >>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at >>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at >>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at >>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at >>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) >>>> at >>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) >>>> at org.apache.spark.scheduler.Task.run(Task.scala:56) at >>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:745) *Caused by: >>>> java.lang.RuntimeException: Couldn't find result#105 in [result#112]* >>>> at scala.sys.package$.error(package.scala:27) at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53) >>>> at >>>> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47) >>>> at >>>> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46) >>>> ... 33 more >>>> >>>> Thank you! >>>> >>>> On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote: >>>> >>>>> Hi, >>>>> >>>>> Yes, %sql function is only for the tables that has been registered. >>>>> Using DataFrame is basically similar to what currently you're doing. >>>>> It needs registerTempTable. >>>>> >>>>> Could you share little bit about your problem when registering tables? >>>>> >>>>> And really appreciate for reporting a bug! >>>>> >>>>> Thanks, >>>>> moon >>>>> >>>>> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org> >>>>> wrote: >>>>> >>>>>> Yes, you can change the number of records. The default value is 1000 >>>>>> >>>>>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani < >>>>>> nihal_bhagchand...@yahoo.com> wrote: >>>>>> >>>>>>> Hi Su, >>>>>>> >>>>>>> as per my understanding you can change the limit of 1000record from >>>>>>> the interpreter section by setting up the value for variable >>>>>>> "zeppelin.spark.maxResult", >>>>>>> moon could you please confirm my understanding? 
>>>>>>> >>>>>>> Regards >>>>>>> Nihal >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hello Everyone, >>>>>>> >>>>>>> Excited to be making progress, and thanks to the community for >>>>>>> providing help along the way. This stuff is all really cool. >>>>>>> >>>>>>> >>>>>>> *Questions:* >>>>>>> >>>>>>> *1) *I noticed that the limit for the visual representation is 1000 >>>>>>> results. Are there any short term plans to expand the limit? It seemed a >>>>>>> little on the low side, as one of the main reasons for working with >>>>>>> Spark/Hadoop >>>>>>> is to work with large datasets. >>>>>>> >>>>>>> *2) *When can I use the %sql function? Is it only on tables that >>>>>>> have been registered? I have been having trouble registering tables >>>>>>> unless >>>>>>> I do: >>>>>>> >>>>>>> // Apply the schema to the RDD. >>>>>>> val peopleSchemaRDD = >>>>>>> sqlContext.applySchema(rowRDD, schema) >>>>>>> // Register the SchemaRDD as a table. >>>>>>> peopleSchemaRDD.registerTempTable("people") >>>>>>> >>>>>>> >>>>>>> I am having lots of trouble registering tables through HiveContext >>>>>>> or even duplicating the Zeppelin tutorial, is this issue mitigated by >>>>>>> using >>>>>>> DataFrames (I am planning to move to 1.3 very soon)? >>>>>>> >>>>>>> >>>>>>> *Bug:* >>>>>>> >>>>>>> When I do this: >>>>>>> z.show(sqlContext.sql("select * from sensortable limit 100")) >>>>>>> >>>>>>> I get the table, but I also get text results in the bottom, please >>>>>>> see attached image. For some reason, if the image doesn't go through, I >>>>>>> basically get the table, and everything works well, but the select >>>>>>> statement also returns text (regardless of whether it's 100 results or all of them) >>>>>>> >>>>>>> >>>>>>> Thank you! >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Su >>>>>>> >>>>>>> >>>>>> >>>> >> >
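P.S. On the display-limit question further down the thread: as Nihal and Corneau discussed, the number of rows shown by z.show() and %sql is capped by the Spark interpreter property zeppelin.spark.maxResult (default 1000). If you need more rows in the visualization, you can raise that value in the Interpreter settings, for example (illustrative value):

zeppelin.spark.maxResult = 10000

Note this only changes how many rows are displayed, not how many rows your query actually processes.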