je-ik commented on issue #18479:
URL: https://github.com/apache/beam/issues/18479#issuecomment-3864109181
Hi @junaiddshaukat, great to hear about your interest. It's also great that
you have such deep insight into the Beam ecosystem. Maybe we can write the
design document for the GSoC together? Generally speaking, my point of view
is that:
1) the skeleton should target the FnAPI so that it fully supports all SDKs
(the translation can be optimized for the Java SDK later)
2) the skeleton should be "useful", i.e. it should be able to run at
least some basic Pipelines, which implies we should aim to implement at
least:
- Read
- stateless ParDo
- GBK
- CBK
- Window
- optional: stateful ParDo, later splittable DoFn
3) this is a crucial question that needs deeper analysis to make sure that
the DSL aligns correctly with the Apache Beam model. My intuition here is
that the Processor API would be a better choice, because it is flexible enough
to support the model and yet should not force us to manually manage state,
etc.
We would also need to define the design of watermarks, bundles (we must pay
attention to Beam's guarantees and make the bundling compatible), and timers
(if we support stateful ParDo).
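To make the watermark part a bit more concrete, here is a minimal sketch of
the kind of bookkeeping the design would have to pin down: an operator with
several inputs reports the minimum of its input watermarks and never moves
backwards. This is only an illustration under my own assumptions. The class
`WatermarkTracker` and its methods are hypothetical names, not part of Beam
or Kafka Streams, and the real design may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict

# Smallest possible timestamp; stands in for "no watermark seen yet".
MIN_TS = float("-inf")


@dataclass
class WatermarkTracker:
    """Hypothetical helper: tracks per-input watermarks for one operator."""

    inputs: Dict[str, float] = field(default_factory=dict)
    current: float = MIN_TS

    def register(self, input_id: str) -> None:
        # A registered input holds the operator watermark back until it reports.
        self.inputs.setdefault(input_id, MIN_TS)

    def update(self, input_id: str, watermark: float) -> float:
        # Each input watermark is kept monotonic: late updates are ignored.
        previous = self.inputs.get(input_id, MIN_TS)
        self.inputs[input_id] = max(previous, watermark)
        # The operator watermark is the minimum over all inputs,
        # and it is never allowed to regress.
        self.current = max(self.current, min(self.inputs.values()))
        return self.current
```

For example, with two registered inputs "a" and "b", the operator watermark
stays at the minimum timestamp until both have reported, and afterwards
follows whichever input is furthest behind.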
Would you like to try sketching a design document we could iterate on? Once
we have a basic doc, we can share it on the dev@ list to get more
feedback and ensure we don't hit a wall during implementation.