Hi Su,

I have already switched to Spark 1.4.0. Starting with Spark 1.3.0, the DataFrame concept was introduced, which gives more flexibility for managing data in different formats. Is there any possibility that you could move your Zeppelin to Spark 1.4.0? You can build Zeppelin by running the following command:

$ sudo mvn clean package -Pspark-1.4 -Dhadoop.version=2.2.0 -Phadoop-2.2 -DskipTests

For more details: apache/incubator-zeppelin on github.com (Mirror of Apache Zeppelin (Incubating))
Regards,
Nihal

On Wednesday, 17 June 2015 1:41 PM, Su She <suhsheka...@gmail.com> wrote:

A couple of clarifications:

1) I was able to use sqlContext.sql when "programmatically specifying the schema" as described in this documentation: https://spark.apache.org/docs/1.2.0/sql-programming-guide.html

2) Here is the notebook I ran. With it, I was able to run SQL commands through sqlContext, but not %sql commands:

import sys.process._
import org.apache.spark.sql._

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val wiki = sc.textFile("data/wiki.csv")
val schemaString = "date language title pagecounts"

val schema = StructType(
  schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

val rowRDD = wiki.map(_.split(" ")).map(line =>
  Row(line(0).substring(0, 8), line(1), line(2), line(3)))

val wikiSchemaRDD = sqlContext.applySchema(rowRDD, schema)
wikiSchemaRDD.registerTempTable("people")

val results = sqlContext.sql("SELECT * FROM people")
results.take(10)

So results returns the correct rows. However, when I try:

%sql select date from people

I get:

java.lang.reflect.InvocationTargetException

Hope this adds clarity to my issues. Thank you!

Best,
Su

On Wed, Jun 17, 2015 at 12:47 AM, Su She <suhsheka...@gmail.com> wrote:

Hello Nihal,

This is what I got:

sc.version: 1.2.1

I couldn't get the names of the tables. I tried it both with this line in the code and with it commented out:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

error: value tableNames is not a member of org.apache.spark.sql.SQLContext
       sqlContext.tableNames().foreach(println)

However, I don't think the table is registered with sqlContext.
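As an aside on the notebook above: the schema-building and row-mapping steps are plain string manipulation before any Spark types are involved, so they can be sanity-checked without a cluster. A minimal, Spark-free sketch of just those two steps (the sample input line is made up; in the real notebook each field name is wrapped in StructField(name, StringType, true)):

```scala
// Spark-free sketch of the notebook's schema and row-mapping steps.
val schemaString = "date language title pagecounts"
val fieldNames = schemaString.split(" ").toList

// Hypothetical input line, parsed the same way as the rowRDD mapping:
val sampleLine = "20150617-000000 en Main_Page 42"
val cols = sampleLine.split(" ")
val row = List(cols(0).substring(0, 8), cols(1), cols(2), cols(3))

println(fieldNames.mkString(","))  // date,language,title,pagecounts
println(row.mkString(","))         // 20150617,en,Main_Page,42
```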
For example, if you check the Zeppelin tutorial, you cannot run:

val results = sqlContext.sql("select * from bank") // error: table bank not found

however, you can run:

%sql select * from bank

When I followed this guide: https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, I was able to use sqlContext.sql to query results, but I couldn't use %sql in that case :(

Thanks again for the help, and please let me know how I can proceed :)

Thanks,
Su

On Wed, Jun 17, 2015 at 12:10 AM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

Hi Su,

Could you please check whether your bank1 gets registered as a table?

-Nihal

On Wednesday, 17 June 2015 11:54 AM, Su She <suhsheka...@gmail.com> wrote:

Thanks Nihal for the suggestion; I think I've realized what the problem is. Zeppelin will use the HiveContext unless that is set to false. So I set it to false in zeppelin-env.sh, and the 3 charts at the bottom of the tutorial now work, since SQLContext becomes the default instead of HiveContext.

However, I am having trouble running my own version of this notebook. Since I was having problems with my notebook, I copy/pasted the code from the tutorial and used wiki.csv instead of bank-full.csv. I followed the same format as the tutorial, but I kept getting errors.
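For anyone following along, the HiveContext-vs-SQLContext switch mentioned above is done in conf/zeppelin-env.sh. A sketch, assuming the variable name from the zeppelin-env.sh template of that era; verify against your own template before relying on it:

```shell
# conf/zeppelin-env.sh
# Make Zeppelin's Spark interpreter create a plain SQLContext instead of a
# HiveContext. (Variable name taken from the zeppelin-env.sh.template of
# this era; check your own version's template.)
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
```

Restart the Zeppelin daemon after editing the file so the interpreter picks up the change.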
I kept trying to simplify the code, and this is where I ended up:

PARA1:

val wiki = bankText.map(s => s.split(" ")).map(s =>
  Bank(s(3).toInt, "secondary", "third", "fourth", s(4).toInt))
wiki.registerTempTable("bank1")

PARA2:

wiki.take(10)

Result:

res213: Array[Bank] = Array(Bank(2,secondary,third,fourth,9980), Bank(1,secondary,third,fourth,465), Bank(1,secondary,third,fourth,16086), ...

COMPARE THIS TO bank.take(10) from the tutorial:

res188: Array[Bank] = Array(Bank(58,management,married,tertiary,2143), Bank(44,technician,single,secondary,29), Bank(33,entrepreneur,married,secondary,2), Bank(47,blue-collar,married,unknown,1506), Bank(33,unknown,single,unknown,1), ...

PARA3:

%sql select age, count(1) value from bank1
where age < 33 group by age order by age

java.lang.reflect.InvocationTargetException

I'm not sure what I'm doing wrong. The new array has the same data format, just different values. It doesn't seem like there are any extra spaces or anything like that.

On Tue, Jun 16, 2015 at 10:55 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

Hi Su,

It seems like your table is not getting registered. Can you try the following?

If you have used the line

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

I would suggest commenting it out, as Zeppelin creates sqlContext by default.

If you didn't have the above line, write the following line at the end of the paragraph and run it:

sqlContext.tableNames().foreach(println) // this should print all the tables registered with the current sqlContext in the output section

You can also check your Spark version by running the following command:

sc.version

-Nihal

On Wednesday, 17 June 2015 10:01 AM, Su She <suhsheka...@gmail.com> wrote:

Hello, excited to get Zeppelin up and running!

1) I was not able to get through the Zeppelin tutorial notebook.
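As an aside on the PARA1 snippet earlier in this thread: the Bank mapping itself can be checked without Spark, since it is just a split plus a case-class constructor. A Spark-free sketch (Bank redefined locally, sample line made up so that fields 3 and 4 match the first element of Su's result array):

```scala
// Plain-Scala model of the PARA1 mapping, no Spark required.
// Bank mirrors the case class from the Zeppelin tutorial.
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

val sampleLine = "a b c 2 9980"  // made-up line: only fields s(3) and s(4) are read
val s = sampleLine.split(" ")
val b = Bank(s(3).toInt, "secondary", "third", "fourth", s(4).toInt)

println(b)  // Bank(2,secondary,third,fourth,9980)
```

If this step works but %sql still fails, the problem is more likely in how the table gets registered with the interpreter's context than in the mapping itself.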
I did remove toDF, which made that paragraph work, but the 3 graphs at the bottom all returned the InvocationTargetException.

2) From a couple of other threads on the archive, it seems this error can mean that Zeppelin isn't connected to Spark:

a) I am running it locally.

b) I created a new notebook and was able to run Spark commands, create a table using sqlContext, and query it, so this means it is connected to Spark, right?

c) I am able to do:

val results = sqlContext.sql("SELECT * FROM wiki")

but I can't do:

%sql select pagecounts, count(1) from wiki

3) I am a bit confused about how to get the visualizations. I understand the %table command, but do I use %table when running Spark jobs, or do I use %sql?

Thanks!

-Su
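On question 3, for what it's worth: besides %sql (which charts query results automatically), Zeppelin's display system also renders printed output that begins with "%table" from an ordinary Spark paragraph. A sketch with made-up values; the exact header placement may vary by Zeppelin version, so check the display-system docs for your release:

```scala
// In a Zeppelin %spark paragraph, printed output starting with "%table" is
// rendered as a chartable table: columns tab-separated, rows newline-separated.
// The values below are invented for illustration.
val rows = Seq(("20150617", 42), ("20150618", 57))
val body = rows.map { case (d, c) => s"$d\t$c" }.mkString("\n")
print("%table date\tpagecounts\n" + body)
```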