> > sqlContext.jsonFile("data.json")  <---- Is this already available in the
> master branch???
>
Yes, and it will be available in the upcoming 1.0.1 release.

> But the question about the use of a combination of resources (Memory
> processing & Disk processing) still remains.
>

This code should work just fine off of disk. I would not recommend
trying to cache the JSON data in memory: it is heavily nested, and that
is a case the columnar storage code does not handle well.

Instead, maybe try converting it to Parquet and reading that data from
disk:

    tweets.saveAsParquetFile(...)
    sqlContext.parquetFile(...).registerAsTable(...)

You should see improved compression and much better performance for
queries that only read some of the columns.

You could also just pull out the relevant columns and cache only that
data in memory:

    sqlContext.jsonFile("data.json").registerAsTable("allTweets")
    sql("SELECT text FROM allTweets").registerAsTable("tweetText")
    sqlContext.cacheTable("tweetText")
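
For reference, here is a rough end-to-end sketch that combines the two
suggestions above: Parquet on disk for the full data set, plus an
in-memory cache of a single column. It assumes Spark 1.0.1 with the
SchemaRDD API. The object name, the local master setting, the
"tweets.parquet" output path, and the final COUNT query are placeholders
of mine, not anything from your setup, so adjust them as needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; the name is illustrative only.
object TweetsOnDiskSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local run; point setMaster at your cluster instead.
    val conf = new SparkConf().setAppName("tweets-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext._ // brings sql(...) into scope

    // Read the nested JSON once, straight from disk (no caching here).
    val tweets = sqlContext.jsonFile("data.json")

    // Write it out as Parquet. "tweets.parquet" is a placeholder path,
    // and the save fails if the path already exists.
    tweets.saveAsParquetFile("tweets.parquet")

    // Query the Parquet copy from disk; only the columns a query
    // touches are read.
    sqlContext.parquetFile("tweets.parquet").registerAsTable("allTweets")

    // Cache only the column of interest in memory.
    sql("SELECT text FROM allTweets").registerAsTable("tweetText")
    sqlContext.cacheTable("tweetText")

    // Example query against the cached table (placeholder query).
    sql("SELECT COUNT(*) FROM tweetText").collect().foreach(println)

    sc.stop()
  }
}
```

The Parquet conversion only needs to happen once; later jobs can skip
the jsonFile/saveAsParquetFile steps and start from parquetFile.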