Ahh okay, so when I don't create a HiveContext myself, it now works. It seems, though, that I have to apply the schema to val results = sqlContext.sql("select * from table") before I can register the table in a way that sqlContext is able to see it.
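(For reference, the flow described above boils down to the sketch below, assuming Spark 1.2.x and the sqlContext that Zeppelin provides rather than one created by hand; the table, column, and path names are illustrative.)

    import org.apache.spark.sql._

    // Rows from the external table; a SchemaRDD is an RDD[Row].
    val results = sqlContext.sql("select * from test3")

    // Build a StructType whose field names become the temp table's columns.
    val schema = StructType(
      "x y z a value".split(" ").map(name => StructField(name, StringType, true)))

    // Attach the schema, then register; %sql reads this same context's catalog.
    val withSchema = sqlContext.applySchema(results, schema)
    withSchema.registerTempTable("test4")

    // Both of these should now resolve "test4":
    //   z.show(sqlContext.sql("select * from test4 limit 1001"))
    //   %sql select * from test4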
Thanks for the answers Moon!

On Tue, Jun 30, 2015 at 7:27 PM, moon soo Lee <m...@apache.org> wrote:

Hi,

Thanks for asking questions.

Once you do

    yourRDD.registerTempTable("table")

both of these are supposed to work, with identical results:

    z.show(sqlContext.sql("select * from table"))
    %sql select * from table

Could you double-check that you haven't created sqlContext manually? If you have created sqlContext manually, then %sql will not recognize tables created with registerTempTable().

Thanks,
moon

On Tue, Jun 30, 2015 at 7:16 PM, Su She <suhsheka...@gmail.com> wrote:

Okay, I'm really confused haha; some things are working, but I'm not sure why.

So this works:

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/'")

    val results = hc.sql("select * from test3")

    val schemaString = "x y z a value"

    import org.apache.spark.sql._

    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    val wikiSchemaRDD = sqlContext.applySchema(results, schema)

    wikiSchemaRDD.registerTempTable("test4")

    z.show(sqlContext.sql("select * from test4 limit 1001"))

Why does this last line work, but %sql select * from test4 does not? Zeppelin is able to recognize test4, which was only registered as a temp table, so why can I not use %sql on it?

Why can't I do z.show(sqlContext.sql("select * from results limit 1001"))?

The only difference I see between test4 and results is that test4 had a schema applied to it (results is of type org.apache.spark.sql.Row, not catalyst.Row).

Sorry for all the questions, thanks for the answers!

On Tue, Jun 30, 2015 at 6:38 PM, Su She <suhsheka...@gmail.com> wrote:

Thanks for the suggestion Moon; unfortunately, I got the invocation exception:

    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")

    sqlContext.sql("select * from test3").registerTempTable("test1")

    %sql select * from test1

    java.lang.reflect.InvocationTargetException

To clarify, my data is in CSV format within the directory /user/flume/, so:

    user/flume/csv1.csv
    user/flume/csv2.csv

The reason I created a HiveContext was that when I do:

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")

    val results = hc.sql("select * from test3")

    results.take(10)

the last line returns an Array of Rows, but when I replace hc with sqlContext, I can't do results.take().

Please let me know if I am doing anything wrong, thanks!
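(A minimal sketch of the pitfall moon points out above: each SQLContext keeps its own registry of temp tables, and the %sql interpreter only queries the context Zeppelin injects. The context and table names here are hypothetical; the createSchemaRDD import is the Spark 1.2 implicit that lets an RDD of tuples or case classes be registered directly.)

    // A manually created context, separate from Zeppelin's "sqlContext".
    val manualCtx = new org.apache.spark.sql.SQLContext(sc)
    import manualCtx.createSchemaRDD

    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
    rdd.registerTempTable("mine")               // lands only in manualCtx's catalog

    manualCtx.sql("select * from mine").count() // works: same context
    // %sql select * from mine                  // fails: %sql looks in Zeppelin's
    //                                          // sqlContext, which has no "mine"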
On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote:

Hi,

Please try not to create hc or sqlContext manually; use the Zeppelin-created sqlContext. After you run

    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")

you should be able to access table 'test1' in your query.

You can also use registerTempTable("test2") to access a table 'test2', but it has to come from a valid DataFrame. So not

    sqlContext.sql("CREATE EXTERNAL TABLE .... ").registerTempTable("test2")

but like this:

    sqlContext.sql("select * from test1").registerTempTable("test2")

Tell me if it helps.

Best,
moon
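(To make moon's distinction concrete: a DDL statement in Spark SQL returns a command result with a single "result" column rather than the new table's rows, so registering that return value produces a temp table with the wrong schema; the "Couldn't find result#105" failure in the message below is what that looks like. A sketch, assuming a Hive-enabled Zeppelin sqlContext; the HDFS location is illustrative.)

    // The DDL's return value is a command result, not the table's rows.
    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
      "LOCATION 'hdfs://namenode:8020/user/flume/'")

    // Register a SELECT over the external table, not the DDL's return value.
    sqlContext.sql("select * from test1").registerTempTable("test2")
    // %sql select * from test2   -- should now resolve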
On Mon, Jun 29, 2015 at 12:05 PM, Su She <suhsheka...@gmail.com> wrote:

Hey Moon/All,

Sorry for the late reply.

This is the problem I'm encountering when trying to register a Hive table as a temp table. It seems that it cannot find the table; I have bolded the key line in the error message that I've copy/pasted below. Please let me know if this is the best way of doing this. My end goal is to execute:

    z.show(hc.sql("select * from test1"))

Thank you for the help!

// Code:

    import sys.process._
    import org.apache.spark.sql.hive._
    val hc = new HiveContext(sc)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")

    val results = hc.sql("select * from test2 limit 100") // have also tried test1

    // everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
    results.map(t => "Name: " + t(0)).collect().foreach(println)

// Error:

    results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
    == Query Plan ==
    == Physical Plan ==
    Limit 100
     !Project [result#105]
      NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
        at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
        at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
        at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
        at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    *Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
        ... 33 more

Thank you!

On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:

Hi,

Yes, the %sql function is only for tables that have been registered. Using DataFrame is basically similar to what you're currently doing: it needs registerTempTable.

Could you share a little bit about your problem when registering tables?

And I really appreciate you reporting the bug!

Thanks,
moon

On Wed, Jun 24, 2015 at 11:28 PM, Corneau Damien <cornead...@apache.org> wrote:

Yes, you can change the number of records. The default value is 1000.
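(Concretely, the knob Nihal refers to below is a property of the spark interpreter, editable under the Interpreter menu in the Zeppelin UI. A hedged example; the value 10000 is arbitrary, and the property caps how many rows Zeppelin renders, not how many rows Spark computes:)

    zeppelin.spark.maxResult = 10000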
On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

Hi Su,

As per my understanding, you can change the 1000-record limit from the interpreter section by setting the value of the variable "zeppelin.spark.maxResult". Moon, could you please confirm my understanding?

Regards,
Nihal

On Thursday, 25 June 2015 at 10:00 AM, Su She <suhsheka...@gmail.com> wrote:

Hello Everyone,

Excited to be making progress, and thanks to the community for providing help along the way. This stuff is all really cool.

Questions:

1) I noticed that the limit for the visual representation is 1000 results. Are there any short-term plans to expand the limit? It seems a little on the low side, since one of the main reasons for working with Spark/Hadoop is to work with large datasets.

2) When can I use the %sql function? Is it only on tables that have been registered? I have been having trouble registering tables unless I do:

    // Apply the schema to the RDD.
    val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
    // Register the SchemaRDD as a table.
    peopleSchemaRDD.registerTempTable("people")

I am having lots of trouble registering tables through HiveContext, or even duplicating the Zeppelin tutorial. Is this issue mitigated by using DataFrames (I am planning to move to 1.3 very soon)?

Bug:

When I do this:

    z.show(sqlContext.sql("select * from sensortable limit 100"))

I get the table, but I also get text results at the bottom; please see the attached image. In case the image doesn't go through: I basically get the table and everything works well, but the select statement also returns text (regardless of whether it's 100 results or all of them).

Thank you!

Best,

Su
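(On question 2 above: in Spark 1.3 the same pattern moves to the DataFrame API, with createDataFrame replacing the 1.2 applySchema call. A sketch, assuming Spark 1.3+ and the Zeppelin-provided sqlContext; the data and names are illustrative.)

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType, StructField, StringType}

    // An RDD[Row] plus an explicit schema, as in the 1.2 applySchema pattern.
    val rowRDD = sc.parallelize(Seq(Row("alice", "1"), Row("bob", "2")))
    val schema = StructType(Seq(
      StructField("name", StringType, true),
      StructField("id", StringType, true)))

    // createDataFrame is the 1.3 replacement for applySchema.
    val people = sqlContext.createDataFrame(rowRDD, schema)
    people.registerTempTable("people")
    // %sql select * from people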