Burak, you can configure what happens with corrupt records for the
data source using the parse mode.  Parsing will still fail, so we can't
get any data out of the record, but we do leave the raw JSON in another
column for you to inspect.
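
For reference, here is roughly what that looks like with the data source
today (the path and column names below are just for illustration):

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types._

    // PERMISSIVE mode keeps corrupt records instead of dropping them; the
    // raw text of each bad record lands in the column named by
    // columnNameOfCorruptRecord, provided that column is in the schema.
    val schema = new StructType()
      .add("id", LongType)
      .add("name", StringType)
      .add("_corrupt_record", StringType)

    val df = spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/path/to/data.json")

    // Rows that failed to parse keep their original JSON here.
    df.filter(col("_corrupt_record").isNotNull).show()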

In the case of this function, we'll just return null if it's unparsable.  You
could filter for rows where the function returns null and inspect the input
if you want to see what's going wrong.
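
Sketching that against the API in the PR (the input DataFrame and column
names here are made up):

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    val schema = new StructType().add("id", LongType).add("name", StringType)

    val parsed = input.withColumn("parsed", from_json(col("json"), schema))

    // from_json yields null when the string can't be parsed, so the bad
    // inputs are easy to pull out and inspect.
    parsed.filter(col("parsed").isNull).select("json").show(truncate = false)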

> When you talk about ‘user specified schema’ do you mean for the user to
> supply an additional schema, or that you’re using the schema that’s
> described by the JSON string?


I mean we don't do schema inference (which we might consider adding, but
that would be a much larger change than this PR).  You need to construct a
StructType that says what columns you want to extract from the JSON column
and pass that in.  I imagine in many cases the user will run schema
inference ahead of time and then encode the inferred schema into their
program.
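
One way that could look (paths and column names here are hypothetical):

    // One-off, ahead of time: let the data source infer a schema from a sample.
    val inferred = spark.read.json("/path/to/sample.json").schema
    println(inferred.prettyJson) // copy/translate this into the program

    // In the job itself: build the StructType by hand and pass it to from_json.
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    val schema = new StructType()
      .add("id", LongType)
      .add("payload", new StructType().add("value", DoubleType))

    val events = input.withColumn("data", from_json(col("json"), schema))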


On Wed, Sep 28, 2016 at 11:04 AM, Burak Yavuz <brk...@gmail.com> wrote:

> I would really love something like this! It would be great if it didn't
> throw away corrupt_records like the Data Source.
>
> On Wed, Sep 28, 2016 at 11:02 AM, Nathan Lande <nathanla...@gmail.com>
> wrote:
>
>> We are currently pulling out the JSON columns, passing them through
>> read.json, and then joining them back onto the initial DF, so something like
>> from_json would be a nice quality-of-life improvement for us.
>>
>> On Wed, Sep 28, 2016 at 10:52 AM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> Spark SQL has great support for reading text files that contain JSON
>>> data. However, in many cases the JSON data is just one column amongst
>>> others. This is particularly true when reading from sources such as Kafka.
>>> This PR <https://github.com/apache/spark/pull/15274> adds a new function,
>>> from_json, that converts a string column into a nested StructType with a
>>> user-specified schema, using the same internal logic as the json Data
>>> Source.
>>>
>>> Would love to hear any comments / suggestions.
>>>
>>> Michael
>>>
>>
>>
>