Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Ted Yu
Looking at SQLContext.scala (in the master branch), jsonFile() returns a DataFrame directly:

def jsonFile(path: String, samplingRatio: Double): DataFrame =

FYI

On Sun, May 3, 2015 at 2:14 AM, ayan guha wrote:
> Yes it is possible. You need to use jsonfile method on SQL context and
> then create a d
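For reference, the overload Ted quotes can be invoked like this. This is a sketch only, assuming a running Spark 1.x application with a `SQLContext` named `sqlContext`; the path and sampling ratio are illustrative assumptions, not from the thread:

```scala
// Sketch: requires a live Spark 1.x application providing `sqlContext`.
// The path and 0.5 sampling ratio are hypothetical.
import org.apache.spark.sql.DataFrame

// Infer the JSON schema from a 50% sample of the input instead of a full scan.
val df: DataFrame = sqlContext.jsonFile("/data/events.json", 0.5)
df.printSchema()
```

`samplingRatio` controls how much of the data is scanned during schema inference; the no-argument overload scans all of it.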

Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Dean Wampler
Note that each JSON object has to be on a single line in the files.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
Typesafe
@deanwampler
http://polyglotprogramming.com
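Dean's point is that Spark reads JSON input line by line, so each record must be a complete JSON object on one line (the "JSON Lines" layout); a pretty-printed, multi-line object will not parse. This pure-Scala sketch (no Spark required) contrasts the two layouts:

```scala
// One complete JSON object per line: the layout jsonFile expects.
val goodFile =
  """{"name": "alice", "age": 30}
    |{"name": "bob", "age": 25}""".stripMargin

// A single pretty-printed object spread over several lines: this breaks.
val badFile =
  """{
    |  "name": "alice",
    |  "age": 30
    |}""".stripMargin

// Simulate how Spark splits the file into records: one line = one record.
def records(fileContents: String): Array[String] = fileContents.split("\n")

// In the good layout every record is a complete JSON object...
val goodOk = records(goodFile).forall(l => l.startsWith("{") && l.endsWith("}"))
// ...but in the pretty-printed layout no single line is a complete object.
val badOk = records(badFile).forall(l => l.startsWith("{") && l.endsWith("}"))
```

The start/end-brace check is only a crude stand-in for JSON parsing, but it shows why per-line splitting fails on pretty-printed files.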

Re: Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread ayan guha
Yes it is possible. You need to use the jsonFile method on the SQLContext and then create a DataFrame from the RDD. Then register it as a table. Should be 3 lines of code, thanks to Spark. You may see a few YouTube videos, esp. for unifying pipelines.

On 3 May 2015 19:02, "Jai" wrote:
> Hi,
>
> I am noob to
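The three steps described here (load the JSON, register it as a table, query it with SQL) can be sketched as follows. This is a sketch only, assuming a running Spark 1.x application where `sc` is the SparkContext; the path, table name, and query are illustrative assumptions:

```scala
// Sketch: requires a live Spark 1.x application providing `sc`.
// Path, table name, and query are hypothetical.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.jsonFile("/data/people.json")  // returns a DataFrame directly
df.registerTempTable("people")                     // expose it to SQL
val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
adults.show()
```

Note that, per Ted's follow-up in this thread, `jsonFile` already returns a DataFrame, so no separate RDD-to-DataFrame conversion step is needed.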

Spark distributed SQL: JSON Data set on all worker node

2015-05-03 Thread Jai
Hi, I am a noob to Spark and related technology. I have JSON stored at the same location on all worker clients (Spark cluster). I am looking to load this JSON data set on these clients and run SQL queries against it, like distributed SQL. Is it possible to achieve this? Right now, the master submits the task to one node only. Thank