Re: Using Spark to analyze complex JSON

2014-05-25 Thread Michael Armbrust
On Sat, May 24, 2014 at 11:47 PM, Mayur Rustagi wrote:
> Is the in-memory columnar store planned as part of SparkSQL ?
This has already been ported from Shark, and is used when you run cacheTable.
> Also will both HiveQL & SQLParser be kept updated?
Yeah, we need to figure out exactly what…
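
For readers following along, a minimal sketch of the cacheTable call Michael mentions, assuming a Spark 1.0-style SQLContext as in the Spark shell; the table name and sample data are purely illustrative:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext("local", "cache-example")
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD   // implicit RDD[case class] -> SchemaRDD

    case class Event(id: Int, kind: String)
    val events = sc.parallelize(Seq(Event(1, "click"), Event(2, "view")))
    events.registerAsTable("events")

    sqlContext.cacheTable("events")     // later scans of "events" use the in-memory columnar store
    sqlContext.sql("SELECT kind, COUNT(*) FROM events GROUP BY kind").collect()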

Re: Using Spark to analyze complex JSON

2014-05-24 Thread Mayur Rustagi
Hi Michael,
Is the in-memory columnar store planned as part of SparkSQL? Also will both HiveQL & SQLParser be kept updated?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi

On Sun, May 25, 2014 at 2:44 AM, Michael Armbrust…

Re: Using Spark to analyze complex JSON

2014-05-24 Thread Michael Armbrust
> But going back to your presented pattern, I have a question. Say your data does have a fixed structure, but some of the JSON values are lists. How would you map that to a SchemaRDD? (I didn’t notice any list values in the CandyCrush example.)
Take the likes field from my original example: …
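
A minimal sketch of the idea: a JSON list can be modelled as a Seq field in the case class, which Spark SQL maps to an array-typed column. Names are illustrative, and this assumes the same shell setup (sc, sqlContext, createSchemaRDD) as the sketch above:

    case class Person(name: String, likes: Seq[String])

    val people = sc.parallelize(Seq(Person("Nick", Seq("ice cream", "dogs"))))
    people.registerAsTable("people")    // "likes" shows up as an array column
    sqlContext.sql("SELECT name, likes FROM people").collect()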

Re: Using Spark to analyze complex JSON

2014-05-23 Thread Nicholas Chammas
Michael, What an excellent example! Thank you for posting such a detailed explanation and sample code. So I see what you’re doing and it looks like it works very well as long as your source data has a well-known and fixed structure. I’m looking for a pattern that can be used to expose JSON data f…

Re: Using Spark to analyze complex JSON

2014-05-22 Thread Michael Cutler
I am not 100% sure of the functionality in Catalyst; probably the easiest way to see what it supports is to look at SqlParser.scala in Git. Straight away I can see "LIKE", "RLIKE"…
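
For example, against a table registered as in the earlier sketches, the two operators Michael points out would be used like this (table and column names are illustrative):

    sqlContext.sql("SELECT name FROM people WHERE name LIKE 'Ni%'")
    sqlContext.sql("SELECT name FROM people WHERE name RLIKE '^N.*k$'")

LIKE does SQL-style wildcard matching, while RLIKE takes a regular expression; neither is a similarity/fuzzy match in the Lucene sense.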

Re: Using Spark to analyze complex JSON

2014-05-22 Thread Flavio Pompermaier
Is there a way to query fields by similarity (like Lucene or using a similarity metric) to be able to query something like WHERE language LIKE "it~0.5" ?
Best,
Flavio

On Thu, May 22, 2014 at 8:56 AM, Michael Cutler wrote:
> Hi Nick,
> Here is an illustrated example which extracts certain fields…

Re: Using Spark to analyze complex JSON

2014-05-21 Thread Michael Cutler
Hi Nick,
Here is an illustrated example which extracts certain fields from Facebook messages; each one is a JSON object, and they are serialised into files with one complete JSON object per line. Example of one such message: CandyCrush.json. You…
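
The general shape of that pattern, sketched with lift-json; the file path and field names below are stand-ins, not the actual CandyCrush schema:

    import net.liftweb.json._

    case class Message(user: String, game: String, score: Long)

    val lines = sc.textFile("messages.json")          // one complete JSON object per line
    val messages = lines.flatMap { line =>
      implicit val formats = DefaultFormats           // lift-json needs Formats in scope for extractOpt
      parse(line).extractOpt[Message]                 // silently drops lines that don't deserialise
    }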

Re: Using Spark to analyze complex JSON

2014-05-21 Thread Nicholas Chammas
That's a good idea. So you're saying create a SchemaRDD by applying a function that deserializes the JSON and transforms it into a relational structure, right? The end goal for my team would be to expose some JDBC endpoint for analysts to query from, so once Shark is updated to use Spark SQL that…

Re: Using Spark to analyze complex JSON

2014-05-21 Thread Tobias Pfeiffer
Hi, as far as I understand, if you create an RDD with a relational structure from your JSON, you should be able to do much of that already today. For example, take lift-json's deserializer and do something like

    val json_table: RDD[MyCaseClass] = json_data.flatMap(json => json.extractOpt[MyCaseClass])

…
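
A slightly fuller sketch of that idea, assuming json_data starts as an RDD of raw JSON strings (one object per string) and a shell-style sqlContext with createSchemaRDD imported; the class, path, and query are illustrative:

    import net.liftweb.json._
    import org.apache.spark.rdd.RDD

    case class MyCaseClass(name: String, language: String)

    val json_data: RDD[String] = sc.textFile("data.json")
    val json_table: RDD[MyCaseClass] = json_data.flatMap { line =>
      implicit val formats = DefaultFormats
      parse(line).extractOpt[MyCaseClass]             // keep only lines that fit the case class
    }

    json_table.registerAsTable("json_table")          // via the createSchemaRDD implicit
    sqlContext.sql("SELECT name FROM json_table WHERE language = 'it'").collect()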

Re: Using Spark to analyze complex JSON

2014-05-21 Thread Nicholas Chammas
Looking forward to that update! Given a table of JSON objects like this one:

{
  "name": "Nick",
  "location": { "x": 241.6, "y": -22.5 },
  "likes": ["ice cream", "dogs", "Vanilla Ice"]
}

It would be SUPER COOL if we could query that table in a way that is as natural as follows…
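
One way to get close to that today, sketched with the case-class approach discussed elsewhere in the thread; the query itself is my illustration of the kind of "natural" access being asked for, and dotted access to nested fields may depend on the Spark SQL version:

    case class Location(x: Double, y: Double)
    case class Profile(name: String, location: Location, likes: Seq[String])

    val profiles = sc.parallelize(Seq(
      Profile("Nick", Location(241.6, -22.5), Seq("ice cream", "dogs", "Vanilla Ice"))))
    profiles.registerAsTable("profiles")

    sqlContext.sql("SELECT name, likes FROM profiles WHERE location.x > 200").collect()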

Re: Using Spark to analyze complex JSON

2014-05-21 Thread Michael Armbrust
You can already extract fields from JSON data using Hive UDFs. We have an intern working on better native support this summer. We will be sure to post updates once there is a working prototype.
Michael

On Tue, May 20, 2014 at 6:46 PM, Nick Chammas wrote:
> The Apache Drill…
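
A sketch of the Hive-UDF route Michael refers to, using get_json_object; it assumes a HiveContext and a Hive table raw_json with a string column json holding one document per row (both are assumptions for illustration, not details from the thread):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    hiveContext.hql("""
      SELECT get_json_object(json, '$.name')       AS name,
             get_json_object(json, '$.location.x') AS x
      FROM raw_json
    """).collect()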