Re: Structured Streaming Sink in 2.0 collect/foreach restrictions added in SPARK-16020

2016-06-28 Thread Michael Armbrust
Yeah, turning it into an RDD should preserve the incremental planning.

On Tue, Jun 28, 2016 at 6:30 PM, Holden Karau wrote:
> Ok, that makes sense (the JIRA where the restriction note was added didn't
> have a lot of details). So for now, would converting to an RDD inside of a
> custom Sink and
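The workaround Michael confirms above can be sketched roughly as follows. This is a hedged illustration only: `Sink` lives in `org.apache.spark.sql.execution.streaming`, which the thread itself warns is internal and likely to change between releases, and the class name `RddWorkaroundSink` plus the write logic are hypothetical, not from the thread.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical custom Sink (internal Spark 2.0 API, subject to change) that
// avoids applying DataFrame operators to `data` by dropping to the RDD first.
class RddWorkaroundSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Per the thread: applying operators to `data` would re-plan the query,
    // while converting it to an RDD should preserve the incremental planning.
    // `queryExecution.toRdd` is an internal accessor yielding InternalRows.
    val rows = data.queryExecution.toRdd
    rows.foreachPartition { iter =>
      iter.foreach { row =>
        // write `row` to the external system (placeholder)
      }
    }
  }
}
```

Because the returned RDD contains `InternalRow`s, a real sink would still need to convert rows (or rebuild a DataFrame from the RDD) before writing; the thread does not spell out that step.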

Re: Structured Streaming Sink in 2.0 collect/foreach restrictions added in SPARK-16020

2016-06-28 Thread Holden Karau
Ok, that makes sense (the JIRA where the restriction note was added didn't have a lot of details). So for now, would converting to an RDD inside of a custom Sink and then doing your operations on that be a reasonable workaround?

On Tuesday, June 28, 2016, Michael Armbrust wrote:
> This is not t

Re: Structured Streaming Sink in 2.0 collect/foreach restrictions added in SPARK-16020

2016-06-28 Thread Michael Armbrust
This is not too broadly worded, and in general I would caution that any interface in org.apache.spark.sql.catalyst or org.apache.spark.sql.execution is considered internal and likely to change between releases. We do plan to open a stable source/sink API in a future release. The problem here i

Structured Streaming Sink in 2.0 collect/foreach restrictions added in SPARK-16020

2016-06-28 Thread Holden Karau
Looking at the Sink in 2.0 there is a warning (added in SPARK-16020 without a lot of details) that says "Note: You cannot apply any operators on `data` except consuming it (e.g., `collect/foreach`)." but I'm wondering if this restriction is perhaps too broadly worded? Provided that we consume the d
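The restriction quoted above can be illustrated with a minimal sketch, assuming the internal Spark 2.0 `Sink` trait; the class name `ExampleSink` is hypothetical and not from the thread.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical Sink showing what the SPARK-16020 note does and does not allow.
class ExampleSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Allowed per the note: simply consume the incrementally planned result.
    data.collect().foreach(println)

    // Discouraged per the note: applying an operator such as `filter` to
    // `data` triggers a fresh plan instead of reusing the incremental one.
    // data.filter("value IS NOT NULL").collect()
  }
}
```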