Yeah, turning it into an RDD should preserve the incremental planning.
On Tue, Jun 28, 2016 at 6:30 PM, Holden Karau wrote:
Ok, that makes sense (the JIRA where the restriction note was added didn't
have a lot of details). So for now, would converting to an RDD inside of a
custom Sink and then doing your operations on that be a reasonable
workaround?
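Something like this is what I have in mind - just a sketch against the
internal Sink trait in 2.0, with the class name and the trailing
map/foreach standing in for whatever operations you'd actually run:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical sink: drop down to the RDD before doing any work, so no
// DataFrame operators are ever applied to `data` itself.
class RddWorkaroundSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Convert once; everything after this point runs as RDD
    // transformations rather than new DataFrame operators on `data`.
    val rows = data.rdd
    // Placeholder for the real per-batch logic.
    rows.map(_.mkString(",")).foreach(line => println(line))
  }
}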
On Tuesday, June 28, 2016, Michael Armbrust wrote:
This is not too broadly worded, and in general I would caution that any
interface in org.apache.spark.sql.catalyst or
org.apache.spark.sql.execution is considered internal and likely to change
in between releases. We do plan to open a stable source/sink API in a
future release.
The problem here is that applying operators to `data` kicks off a new
round of query planning, which does not preserve the incremental
execution that produced the batch.
Looking at the Sink in 2.0 there is a warning (added in SPARK-16020 without
a lot of details) that says "Note: You cannot apply any operators on `data`
except consuming it (e.g., `collect/foreach`)." but I'm wondering if this
restriction is perhaps too broadly worded? Provided that we consume the
data right away inside of the Sink, could we safely apply some operators
to it first?
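For reference, here is the kind of sink I have in mind - a sketch only,
where ConsumeOnlySink and the commented-out operator are just
illustrations of what the note allows and forbids:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

// Hypothetical sink that only ever consumes `data`, as the note requires.
class ConsumeOnlySink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Fine per the note: consume the batch as-is.
    data.collect().foreach(println)
    // Forbidden per the note: applying operators first, e.g.
    // data.filter("value > 0").collect()
  }
}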