Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Sam Elamin

Ah, if that's the case then you might need to define the schema beforehand. Either that, or if you want to infer it, ensure a JSON file exists with the right schema so Spark infers the right columns, essentially making both files one DataFrame, if that makes sense. On Tue, Feb 14, 2017 at 3:04 PM,

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Aseem Bansal

Sorry if I trivialized the example. It is the same kind of file, and sometimes it could have "a", sometimes "b", sometimes both. I just don't know. That is what I meant by missing columns. It would be good if I could read any of the JSON files, run Spark SQL on it, and have it give me for json1.json a | b 1 | nul

Re: Dealing with missing columns in SPARK SQL in JSON

2017-02-14 Thread Sam Elamin

I may be missing something super obvious here, but can't you combine them into a single DataFrame? Left join perhaps? Try writing it in SQL, "select a from json1 and b from json2", then run explain to give you a hint on how to do it in code. Regards Sam On Tue, 14 Feb 2017 at 14:30, Aseem Bansal wro