Hi Su,
I have already switched to Spark 1.4.0. Starting with Spark 1.3.0 the concept of 
a DataFrame was introduced, which gives more flexibility to manage data in 
different formats. Is there any possibility that you could move your Zeppelin to 
Spark 1.4.0? You can build your Zeppelin by running the following command:
$ sudo mvn clean package -Pspark-1.4 -Dhadoop.version=2.2.0 -Phadoop-2.2 
-DskipTests 
for more details: apache/incubator-zeppelin
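For example, with 1.4 your applySchema code would look roughly like this (rough, 
untested sketch; note that StructType/StructField/StringType moved to 
org.apache.spark.sql.types in 1.3):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType}

// reuses the wiki RDD and schemaString from your notebook
val schema = StructType(schemaString.split(" ").map(f => StructField(f, StringType, true)))
val rowRDD = wiki.map(_.split(" ")).map(l => Row(l(0).substring(0, 8), l(1), l(2), l(3)))
val wikiDF = sqlContext.createDataFrame(rowRDD, schema) // createDataFrame replaces applySchema
wikiDF.registerTempTable("people")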


Regards,
Nihal 
 


     On Wednesday, 17 June 2015 1:41 PM, Su She <suhsheka...@gmail.com> wrote:
   

A couple of clarifications:
1) I was able to use sqlContext.sql when following "Programmatically Specifying 
the Schema" in this documentation: 
https://spark.apache.org/docs/1.2.0/sql-programming-guide.html
2) Here is the notebook I ran using this; I was able to run sqlContext.sql 
commands, but not %sql commands:
import sys.process._
import org.apache.spark.sql._

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val wiki = sc.textFile("data/wiki.csv")
val schemaString = "date language title pagecounts"

val schema = StructType(schemaString.split(" ").map(fieldName =>
  StructField(fieldName, StringType, true)))

val rowRDD = wiki.map(_.split(" ")).map(line =>
  Row(line(0).substring(0, 8), line(1), line(2), line(3)))

val wikiSchemaRDD = sqlContext.applySchema(rowRDD, schema)
wikiSchemaRDD.registerTempTable("people")

val results = sqlContext.sql("SELECT * FROM people")
results.take(10)
So results returns the correct rows. However, when I try:
%sql select date from people 

java.lang.reflect.InvocationTargetException
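One thing I suspect (untested): the new SQLContext I create shadows the one 
Zeppelin injects, so "people" gets registered in a context %sql never sees. A 
sketch of what I'll try next, reusing Zeppelin's own sqlContext:

// untested sketch -- reuses the schema and imports from the paragraph above;
// skip "val sqlContext = new SQLContext(sc)" so the temp table is registered
// on the sqlContext Zeppelin injects (the one %sql should query)
val wiki = sc.textFile("data/wiki.csv")
val rowRDD = wiki.map(_.split(" ")).map(line =>
  Row(line(0).substring(0, 8), line(1), line(2), line(3)))
val wikiSchemaRDD = sqlContext.applySchema(rowRDD, schema)
wikiSchemaRDD.registerTempTable("people")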

Hope this adds clarity to my issues, thank you
Best,
Su
On Wed, Jun 17, 2015 at 12:47 AM, Su She <suhsheka...@gmail.com> wrote:

Hello Nihal,
This is what I got:
sc.version: 1.2.1
I couldn't get the names of the tables. I tried it both with this line in the 
code and with it commented out:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

sqlContext.tableNames().foreach(println)
// error: value tableNames is not a member of org.apache.spark.sql.SQLContext

However, I don't think the table is registered with sqlContext. For example, if 
you check the Zeppelin tutorial, you cannot run:
val results = sqlContext.sql("select * from bank") // error: table bank not found
However, you can run: %sql select * from bank
When I followed this: 
https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, I was able to 
use sqlContext.sql to query results, but I couldn't use %sql in that case :(
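
This makes me wonder whether %sql and the sqlContext variable in my paragraphs 
point at two different contexts. A quick (untested) check I could run in a 
paragraph:

// just a guess: see which class Zeppelin actually bound to sqlContext
println(sqlContext.getClass.getName) // HiveContext vs SQLContext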

Thanks again for the help, and please let me know how I can proceed :)
Thanks,
Su
On Wed, Jun 17, 2015 at 12:10 AM, Nihal Bhagchandani 
<nihal_bhagchand...@yahoo.com> wrote:

Hi Su,
could you please check if your bank1 gets registered as a table?

-Nihal
 


     On Wednesday, 17 June 2015 11:54 AM, Su She <suhsheka...@gmail.com> wrote:
   

 Thanks Nihal for the suggestion; I think I've realized what the problem is. 
Zeppelin will use HiveContext unless that option is set to false, so I set it to 
false in zeppelin-env.sh. Now the 3 charts at the bottom of the tutorial work, 
since SQLContext becomes the default instead of HiveContext.

However, I am having trouble running my own version of this notebook.
Since I was having problems with the notebook, I copy/pasted the code from the 
tutorial, and instead of bank-full.csv I used wiki.csv. I followed the same 
format as the tutorial and kept getting errors. I kept trying to simplify the 
code, and this is where I ended up:

PARA1:
val wiki = bankText.map(s => s.split(" ")).map(s =>
  Bank(s(3).toInt,
    "secondary",
    "third",
    "fourth",
    s(4).toInt))
wiki.registerTempTable("bank1")
PARA2:
wiki.take(10)

Result:
res213: Array[Bank] = Array(Bank(2,secondary,third,fourth,9980), 
Bank(1,secondary,third,fourth,465), Bank(1,secondary,third,fourth,16086),

COMPARE THIS TO bank.take(10) from the tutorial:
res188: Array[Bank] = Array(Bank(58,management,married,tertiary,2143), 
Bank(44,technician,single,secondary,29), 
Bank(33,entrepreneur,married,secondary,2), 
Bank(47,blue-collar,married,unknown,1506), Bank(33,unknown,single,unknown,1)

PARA3:

%sql select age, count(1) value from bank1
where age < 33 group by age order by age
java.lang.reflect.InvocationTargetException

I'm not sure what I'm doing wrong. The new array has the same data format, but 
different values. It doesn't seem like there are any extra spaces and such. 
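
One thing I still want to rule out: since RDDs are lazy, wiki.take(10) only 
parses the first 10 rows, so a header line or a malformed row further down could 
still blow up s(3).toInt when the %sql query scans the whole file (and maybe 
surface as the InvocationTargetException). A guarded version I plan to try 
(untested; assumes space-separated fields and the tutorial's Bank case class):

val wiki = bankText.map(s => s.split(" "))
  .filter(s => s.length > 4 && s(3).matches("\\d+") && s(4).matches("\\d+")) // skip header/bad rows
  .map(s => Bank(s(3).toInt, "secondary", "third", "fourth", s(4).toInt))
wiki.registerTempTable("bank1")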
On Tue, Jun 16, 2015 at 10:55 PM, Nihal Bhagchandani 
<nihal_bhagchand...@yahoo.com> wrote:

Hi Su,
it seems like your table is not getting registered.
Can you try the following? If you have used the line

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

I would suggest commenting it out, as Zeppelin creates sqlContext by default.
If you didn't have the above line, write the following lines at the end of the 
paragraph and run:

sqlContext.tableNames().foreach(println) // this should print all the tables 
registered with the current sqlContext in the output section

You can also check your Spark version by running the following command: sc.version
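
i.e., roughly this all together in one paragraph (note: tableNames() only exists 
from Spark 1.3 onwards, so it will fail on older versions):

// print the Spark version Zeppelin is bound to
println(sc.version)
// list the tables registered with Zeppelin's sqlContext (Spark 1.3+ only)
sqlContext.tableNames().foreach(println)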
-Nihal


 


     On Wednesday, 17 June 2015 10:01 AM, Su She <suhsheka...@gmail.com> wrote:
   

 Hello,

Excited to get Zeppelin up and running!

1) I was not able to get through the Zeppelin tutorial notebook. I did
remove toDF, which made that paragraph work, but the 3 graphs at the
bottom all returned InvocationTargetException.

2) From a couple of other threads on the archive, it seems this error
means Zeppelin isn't connected to Spark:

a) I am running it locally

b) I created a new notebook and was able to run Spark commands, create
a table using sqlContext, and query it, so that means it is connected
to Spark, right?

c) I am able to do:

val results = sqlContext.sql("SELECT * FROM wiki")

but I can't do:

%sql select pagecounts, count(1) from wiki

3) I am a bit confused about how to get the visualizations. I understand
the %table command, but do I use %table when running Spark jobs, or do
I use %sql?
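
(From what I can tell from the display-system docs, a Scala paragraph can also 
print a string prefixed with %table to get a chart, though I haven't tried it:

// untested: emit tab-separated values as a Zeppelin table from Scala
println("%table name\tvalue\nfoo\t1\nbar\t2")

Is that the intended way?)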

Thanks!

-Su