Totally agree that specifying the schema manually should be the
baseline. LGTM, thanks for working on it. Seems like it looks good
to others too, judging by the comment on the PR that it's getting
merged to master :)
On Thu, Sep 29, 2016 at 2:13 PM, Michael Armbrust
wrote:
>> Will this be able to handle projection pushdown if a given job doesn't
>> utilize all the columns in the schema? Or should people have a per-job
>> schema?
>
> As currently written, we will do a little bit of extra work to pull out
> fields that aren't needed. I think it would be pretty straightforward ...
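(For concreteness, a "per-job schema" could just be a narrower StructType
that lists only the fields a given job reads, so the parser has less to
pull out. The field names below are made up for illustration.)

    import org.apache.spark.sql.types._

    // Hypothetical full schema of the embedded JSON.
    val fullSchema = StructType(Seq(
      StructField("user_id", LongType),
      StructField("event", StringType),
      StructField("properties", StringType),
      StructField("ts", TimestampType)))

    // Per-job variant: only the fields this particular job actually uses.
    val jobSchema = StructType(Seq(
      StructField("user_id", LongType),
      StructField("event", StringType)))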
Will this be able to handle projection pushdown if a given job doesn't
utilize all the columns in the schema? Or should people have a
per-job schema?
On Wed, Sep 28, 2016 at 2:17 PM, Michael Armbrust
wrote:
Burak, you can configure what happens with corrupt records for the
datasource using the parse mode. The parse will still fail, so we can't
get any data out of it, but we do leave the JSON in another column for you
to inspect.
In the case of this function, we'll just return null if it's unparseable.
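(For reference, the data source behaviour described above looks roughly
like this; the path, field names, and schema are made up. Note that the
corrupt-record column has to be included in the user-specified schema for
the raw JSON to show up.)

    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("a", IntegerType),
      StructField("b", StringType),
      StructField("_corrupt_record", StringType)))

    val df = spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")                            // keep unparseable rows
      .option("columnNameOfCorruptRecord", "_corrupt_record")  // stash the raw JSON here
      .json("/path/to/events.json")

    // Rows that fail to parse come back with null data columns and the raw
    // string in _corrupt_record; the function proposed here would instead
    // just return null for that value.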
Silly question?
When you talk about ‘user specified schema’ do you mean for the user to supply
an additional schema, or that you’re using the schema that’s described by the
JSON string?
(or both? [either/or] )
Thx
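(For concreteness, the two readings would look roughly like this; field
names are made up.)

    import org.apache.spark.sql.types._

    // Reading 1: the caller supplies an explicit schema up front.
    val userSchema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", IntegerType)))

    // Reading 2: no schema is supplied and Spark infers one by scanning the
    // JSON strings, the way spark.read.json does when .schema(...) is omitted.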
On Sep 28, 2016, at 12:52 PM, Michael Armbrust <mich...@databricks.com>
wrote:
I would really love something like this! It would be great if it didn't
throw away corrupt_records the way the Data Source does.
On Wed, Sep 28, 2016 at 11:02 AM, Nathan Lande
wrote:
We are currently pulling out the JSON columns, passing them through
read.json, and then joining them back onto the initial DF, so something
like from_json would be a nice quality-of-life improvement for us.
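(A rough sketch of that pattern, with made-up column names; the fiddly part
is keeping a join key attached while the strings go through read.json, e.g.
by repeating the key inside the JSON itself.)

    import spark.implicits._

    // df has (id, json_col), where json_col holds a JSON string that also
    // carries the id, so the parsed rows can be matched back up.
    val parsed = spark.read.json(df.select($"json_col").as[String].rdd)

    val rejoined = df.join(parsed, Seq("id"))

    // With something like from_json this would collapse to roughly
    //   df.withColumn("parsed", from_json($"json_col", jsonSchema))
    // with no separate read or join.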
On Wed, Sep 28, 2016 at 10:52 AM, Michael Armbrust
wrote:
> Spark SQL has great support fo