Re: Spark SQL Nested Array of JSON with empty field

Christian Hellström Fri, 03 Jun 2016 12:46:01 -0700

If that's your JSON file, then the first problem is that it's incorrectly
formatted.

Apart from that you can just read the JSON into a DataFrame with
sqlContext.read.json() and then select directly on the DataFrame without
having to register a temporary table: jsonDF.select("firstname",
"address.state", ...). Works for me (with a properly formatted JSON
document). To make sure that your JSON is read correctly, check
jsonDF.printSchema. If there is an entry with corrupt records (or similar),
you know there's a problem with the JSON structure.

On 3 June 2016 at 21:31, Jerry Wong <[email protected]> wrote:

> Hi,
>
> I met a problem of empty field in the nested JSON file with Spark SQL. For
> instance,
> There are two lines of JSON file as follows,
>
> {
> "firstname": "Jack",
> "lastname": "Nelson",
> "address": {
> "state": "New York",
> "city": "New York"
> }
> }{
> "firstname": "Landy",
> "middlename": "Ken",
> "lastname": "Yong",
> "address": {
> "state": "California",
> "city": "Los Angles"
> }
> }
>
> I use Spark SQL to get the files like,
> val row = sqlContext.sql("SELECT firstname, middlename, lastname,
> address.state, address.city FROM jsontable")
> The compile will tell me the error of line1: no "middlename".
> How do I handle this case in the SQL sql?
>
> Many thanks in advance!
> Jerry
>
>
>

Re: Spark SQL Nested Array of JSON with empty field

Reply via email to