Yeah, turning it into an RDD should preserve the incremental planning.

On Tue, Jun 28, 2016 at 6:30 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> Ok, that makes sense (the JIRA where the restriction note was added didn't
> have a lot of details). So for now, would converting to an RDD inside of a
> custom Sink and then doing your operations on that be a reasonable work
> around?
>
>
> On Tuesday, June 28, 2016, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> This is not too broadly worded, and in general I would caution that any
>> interface in org.apache.spark.sql.catalyst or
>> org.apache.spark.sql.execution is considered internal and likely to change
>> in between releases.  We do plan to open a stable source/sink API in a
>> future release.
>>
>> The problem here is that the DataFrame is constructed using an
>> incrementalized physical query plan.  If you call any operations on the
>> Dataframe that change the logical plan, you will loose prior state and the
>> DataFrame will return an incorrect result.  Since this was discovered late
>> in the release process we decided it was better to document the current
>> behavior, rather than do a large refactoring.
>>
>> On Tue, Jun 28, 2016 at 12:59 PM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> Looking at the Sink in 2.0 there is a warning (added in SPARK-16020
>>> without a lot of details) that says "Note: You cannot apply any operators
>>> on `data` except consuming it (e.g., `collect/foreach`)." but I'm wondering
>>> if this restriction is perhaps too broadly worded? Provided that we consume
>>> the data in a blocking fashion could we apply some other transformation
>>> beforehand? Or is there a better way to get equivalent foreachRDD
>>> functionality with the structured streaming API?
>>>
>>> On somewhat of tangent - would it maybe make sense to mark
>>> transformations on Datasets which are not supported for Streaming use (e.g.
>>> toJson etc.)?
>>>
>>> Cheers,
>>>
>>> Holden :)
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
>

Reply via email to