Thanks for the suggestion, Moon. Unfortunately, I got the InvocationTargetException:
*sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(b int, w string, a string, x int, y int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")* *sqlContext.sql("select * from test3").registerTempTable("test1")* *%sql select * from test1* *java.lang.reflect.InvocationTargetException* To clarify, my data is in a csv format within the directory /user/flume/ so: user/flume/csv1.csv user/flume/csv2.csv The reason I created HiveContext was because when I do: *hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test3(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip:8020/user/flume/'")* *val results = hc.sql("select * from test3")* *results.take(10) // I am able to get the results, but when I replace hc with sqlContext, I can't do results.take()* *This last line results an Array of rows* Please let me know if I am doing anything wrong, thanks! On Tue, Jun 30, 2015 at 11:07 AM, moon soo Lee <m...@apache.org> wrote: > Hi, > > Please try not create hc, sqlContext manually, and use zeppelin created > sqlContext. After run > > sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y > string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED > BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'") > > it supposed to access table 'test1' in your query. > > You can also do registerTempTable("test2"), for accessing table 'test2', > but it supposed from valid dataframe. So not > > sqlContext("CREATE EXTERNAL TABLE .... ").registerTempTable("test2") > > but like this > > sqlContext("select * from test1").registerTempTable("test1") > > Tell me if it helps. > > Best, > moon > > > On Mon, Jun 29, 2015 at 12:05 PM Su She <suhsheka...@gmail.com> wrote: > >> Hey Moon/All, >> >> sorry for the late reply. >> >> This is the problem I'm encountering when trying to register Hive as a >> temptable. It seems that it cannot find a table, I have bolded this in the >> error message that I've c/p below. Please let me know if this is the best >> way for doing this. My end goal is to execute: >> >> *z.show(hc.sql("select * from test1"))* >> >> Thank you for the help! 
>>
>> //Code:
>> import sys.process._
>> import org.apache.spark.sql.hive._
>> val hc = new HiveContext(sc)
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>
>> hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")
>>
>> val results = hc.sql("select * from test2 limit 100") // have also tried test1
>>
>> // everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
>> results.map(t => "Name: " + t(0)).collect().foreach(println)
>>
>> results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
>> == Query Plan ==
>> == Physical Plan ==
>> Limit 100
>>  !Project [result#105]
>>   NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
>>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
>>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
>>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>> *Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
>>   at scala.sys.package$.error(package.scala:27)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
>>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
>>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
>>   ... 33 more
>>
>> Thank you!
>>
>> On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> Yes, the %sql function is only for tables that have been registered. Using DataFrame is basically similar to what you're currently doing; it still needs registerTempTable.
>>>
>>> Could you share a little bit about the problem you hit when registering tables?
>>>
>>> And really appreciate you reporting a bug!
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org> wrote:
>>>
>>>> Yes, you can change the number of records. The default value is 1000.
>>>>
>>>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:
>>>>
>>>>> Hi Su,
>>>>>
>>>>> As per my understanding, you can change the 1000-record limit from the interpreter section by setting the value of the variable "zeppelin.spark.maxResult". Moon, could you please confirm my understanding?
>>>>>
>>>>> Regards,
>>>>> Nihal
>>>>>
>>>>>
>>>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com> wrote:
>>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> Excited to be making progress, and thanks to the community for providing help along the way. This stuff is all really cool.
>>>>>
>>>>> *Questions:*
>>>>>
>>>>> *1)* I noticed that the limit for the visual representation is 1000 results. Are there any short-term plans to expand the limit? It seems a little on the low side, as many of the reasons for working with Spark/Hadoop involve large datasets.
>>>>>
>>>>> *2)* When can I use the %sql function? Is it only on tables that have been registered? I have been having trouble registering tables unless I do:
>>>>>
>>>>> // Apply the schema to the RDD.
>>>>> val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>>>>> // Register the SchemaRDD as a table.
>>>>> peopleSchemaRDD.registerTempTable("people")
>>>>>
>>>>> I am having lots of trouble registering tables through HiveContext, or even duplicating the Zeppelin tutorial. Is this issue mitigated by using DataFrames (I am planning to move to 1.3 very soon)?
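(To make the applySchema route above concrete, here is a self-contained sketch against the Spark 1.2 SchemaRDD API used in this thread. It reads the CSV files directly instead of going through a Hive external table; the HDFS path is a placeholder, and the column names and types mirror the table definition earlier in the thread but are only illustrative.)

    import org.apache.spark.sql._

    // Schema built by hand, Spark 1.2-style.
    val schema = StructType(Seq(
      StructField("date",      IntegerType, true),
      StructField("date_time", StringType,  true),
      StructField("time",      StringType,  true),
      StructField("sensor",    IntegerType, true),
      StructField("value",     IntegerType, true)))

    // Parse the comma-separated lines into Rows that match the schema.
    val rowRDD = sc.textFile("hdfs://<namenode>:8020/user/flume/*.csv")   // placeholder path
      .map(_.split(","))
      .map(p => Row(p(0).trim.toInt, p(1).trim, p(2).trim, p(3).trim.toInt, p(4).trim.toInt))

    // Apply the schema and register the result so %sql and z.show can see it.
    val sensorSchemaRDD = sqlContext.applySchema(rowRDD, schema)
    sensorSchemaRDD.registerTempTable("sensortable")

After this, a query like `%sql select * from sensortable limit 100` runs against the registered temp table rather than the Hive metastore.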
>>>>>
>>>>> *Bug:*
>>>>>
>>>>> When I do this:
>>>>>
>>>>> z.show(sqlContext.sql("select * from sensortable limit 100"))
>>>>>
>>>>> I get the table, but I also get text results at the bottom; please see the attached image. In case the image doesn't go through: I basically get the table, and everything works well, but the select statement also returns plain text output (regardless of whether it's 100 results or all of them).
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best,
>>>>>
>>>>> Su
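(Since the question above asks whether moving to Spark 1.3 and DataFrames changes things: the registration step itself stays essentially the same, only the API names move, with applySchema becoming createDataFrame and the type classes living in org.apache.spark.sql.types. A brief sketch under the same placeholder assumptions as the 1.2 example above:)

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("sensor", IntegerType, nullable = true),
      StructField("value",  IntegerType, nullable = true)))

    val rowRDD = sc.textFile("hdfs://<namenode>:8020/user/flume/*.csv")   // placeholder path
      .map(_.split(","))
      .map(p => Row(p(0).trim.toInt, p(1).trim.toInt))

    // Spark 1.3+: createDataFrame replaces applySchema; registerTempTable is unchanged.
    val df = sqlContext.createDataFrame(rowRDD, schema)
    df.registerTempTable("sensortable")

    // Then, in other paragraphs:
    //   %sql select * from sensortable limit 100
    //   z.show(df.limit(100))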