Hi Jaro, I think this discussion would be more visible to the DataFusion
developers if you filed a ticket or discussion in the repository [1].

[1]: https://github.com/apache/arrow-datafusion

On Thu, Sep 21, 2023 at 4:47 AM Jaroslaw Nowosad <yare...@gmail.com> wrote:

> Hi,
>
> Looking for comments/your view:
>
> Would it be possible to:
> 1. patch DataFusion's DataFrame to make df.state public
> 2. patch DataFusion to add a method to DataFrame, i.e.
> df.transform_logical_plan(mut self, new_plan) -> DataFrame, where the
> original plan could be modified or injected with a new plan node
> (a UserDefinedLogicalNode).
>
> Reason:
> I'm working on a "writer to Kafka topic" on top of DataFusion, using
> Ballista. To distribute the work properly, I need to change the
> DataFrame so that its output is processed and sent from each executor.
> To do this, I currently need access to both the DataFrame and the
> context: I need the state to rewrite the DataFrame on the fly and
> inject my own UserDefinedLogicalNode.
>
> The current code works, but looks a little "messy":
> df.write(ballista_ctx, "kafka://topic:port?brokers", Format::JSON);
>
> If I had public access to df.state, it would look like:
> df.write_json("kafka://topic:port?brokers");
>
>
> Cheers,
> Jaro
>
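
The two call styles in the message above can be illustrated with a minimal, std-only Rust sketch. It is a toy model, not DataFusion or Ballista code: `DataFrame`, `SessionState`, `LogicalPlan`, `UserDefinedNode`, `KafkaSinkNode`, `KafkaWriteExt`, `describe` and `display` are all hypothetical stand-ins (the real trait is DataFusion's `UserDefinedLogicalNode`, whose API is much richer). It assumes the proposed `transform_logical_plan` exists and shows how an extension trait could then offer `df.write_json(url)` by wrapping the current plan in a sink node, with no separate context argument.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `UserDefinedLogicalNode` trait.
trait UserDefinedNode {
    fn describe(&self) -> String;
}

/// Toy logical plan: a table scan, or an extension node wrapping an input plan.
#[derive(Clone)]
enum LogicalPlan {
    Scan { table: String },
    Extension { node: Arc<dyn UserDefinedNode>, input: Box<LogicalPlan> },
}

impl LogicalPlan {
    /// Render the plan as text, parent node first, child indented below it.
    fn display(&self) -> String {
        match self {
            LogicalPlan::Scan { table } => format!("Scan: {table}"),
            LogicalPlan::Extension { node, input } => {
                format!("{}\n  {}", node.describe(), input.display())
            }
        }
    }
}

/// Toy session state (a real one would hold config, catalogs, optimizer rules).
struct SessionState {
    session_id: String,
}

/// Toy DataFrame holding its state and plan, both private to this module.
struct DataFrame {
    state: SessionState,
    plan: LogicalPlan,
}

impl DataFrame {
    fn new(state: SessionState, plan: LogicalPlan) -> Self {
        DataFrame { state, plan }
    }

    fn logical_plan(&self) -> &LogicalPlan {
        &self.plan
    }

    /// Hypothetical method from proposal 2: swap in a new plan, keep the state.
    fn transform_logical_plan(mut self, new_plan: LogicalPlan) -> DataFrame {
        self.plan = new_plan;
        self
    }
}

/// Hypothetical sink node; in a real system each executor would write its
/// partitions to Kafka when the physical plan for this node runs.
struct KafkaSinkNode {
    url: String,
    format: &'static str,
}

impl UserDefinedNode for KafkaSinkNode {
    fn describe(&self) -> String {
        format!("KafkaSink: url={}, format={}", self.url, self.format)
    }
}

/// Hypothetical extension trait giving the one-argument call style.
trait KafkaWriteExt {
    fn write_json(self, url: &str) -> DataFrame;
}

impl KafkaWriteExt for DataFrame {
    fn write_json(self, url: &str) -> DataFrame {
        let sink: Arc<dyn UserDefinedNode> =
            Arc::new(KafkaSinkNode { url: url.to_string(), format: "json" });
        // Wrap the existing plan in the sink node; no context argument is
        // needed because the DataFrame carries its own state.
        let new_plan = LogicalPlan::Extension {
            node: sink,
            input: Box::new(self.logical_plan().clone()),
        };
        self.transform_logical_plan(new_plan)
    }
}

fn main() {
    let state = SessionState { session_id: "session-1".to_string() };
    let df = DataFrame::new(state, LogicalPlan::Scan { table: "events".to_string() });
    let df = df.write_json("kafka://topic:port?brokers");
    println!("session: {}", df.state.session_id);
    println!("{}", df.logical_plan().display());
}
```

Because the state stays inside the DataFrame, an extension trait like this only needs a public plan-swapping method rather than public access to `df.state`. In this toy model everything lives in one module, so Rust's privacy rules are not actually exercised.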
