Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-26 Thread Robert Burke
I'm all aboard for an improved local single-machine experience, via my work on Prism. Having a consistent, simple-to-start-up single program for iterating on SDK or transform development will help ensure all Beam SDKs provide a consistent experience, vs each one having various levels of "Direct Runner" supp

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-26 Thread Steven van Rossum via dev
+1 on structured data. It opens up possibilities for additional optimizations in encoding (e.g. RLE per stream of a field) and processing (e.g. running ComposedCombineFn components per stream of a field instead of looping through component combiners for a record stream). This change would lay the f
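A minimal Python sketch of the "RLE per stream of a field" idea, assuming records are plain dicts sharing a fixed set of field names. It splits row-oriented records into one stream per field, run-length encodes each stream, and aggregates a numeric field directly from its runs rather than looping over records. The helper names (`rle_encode`, `encode_by_field`, `decode_by_field`, `sum_runs`) are hypothetical illustrations, not part of any Beam coder or API.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Runs = List[Tuple[Any, int]]


def rle_encode(values: List[Any]) -> Runs:
    """Run-length encode a sequence as (value, run_length) pairs."""
    # groupby collapses consecutive equal elements into one group.
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def encode_by_field(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, Runs]:
    """Turn row-oriented records into one RLE-encoded stream per field."""
    return {f: rle_encode([r[f] for r in records]) for f in fields}


def decode_by_field(encoded: Dict[str, Runs], fields: List[str]) -> List[Dict[str, Any]]:
    """Expand the per-field runs back into row-oriented records."""
    columns = {f: [v for v, n in encoded[f] for _ in range(n)] for f in fields}
    count = len(columns[fields[0]]) if fields else 0
    return [{f: columns[f][i] for f in fields} for i in range(count)]


def sum_runs(runs: Runs) -> Any:
    """Aggregate a numeric field from its runs: one multiply per run, not per record."""
    return sum(v * n for v, n in runs)
```

For example, four records with `country` values US, US, US, DE encode that field as `[("US", 3), ("DE", 1)]`, so a per-field aggregation touches two runs instead of four records.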

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-23 Thread Kenneth Knowles
With regards to the process of approaching Beam 3.0: A lot of what we describe would just be new stuff that goes into Beam 2.XX as well. This is all good as far as I'm concerned. If there was something where we want to change the default, we could release it early under a `--preview-3.0` flag or s

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-23 Thread Danny McCormick via dev
I'm generally +1 on doing this as well. Things I'm interested in are: - Expanded turnkey transform support (especially ML). I think moving Beam beyond just being a core "here's some pieces, build it yourself" SDK to a tool that can solve business problems is useful. --- Corollary - if we're increa

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread Ahmet Altay via dev
It is excellent to have this discussion and excitement :) I admit I only glanced at the email threads. I apologize if I am repeating some existing ideas. I wanted to share my thoughts: - Focus on the future: Instead of going back to stuff we have not implemented, we can think about what the users

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread XQ Hu via dev
Thanks a lot for these discussions so far! I really like all of the thoughts. If you have some time, please add these thoughts to this public doc: https://docs.google.com/document/d/13r4NvuvFdysqjCTzMHLuUUXjKTIEY3d7oDNIHT6guww/ Everyone should have write permission. Feel free to add/edit theme

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread Valentyn Tymofieiev via dev
> Key to this will be a push to producing/consuming structured data (as has been mentioned) and also well-structured, language-agnostic configuration. > Unstructured data (aka "everything is bytes with coders") is overrated and should be an exception not the default. Structured data everywhere, w

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread Chamikara Jayalath via dev
Love this idea. I think we kind of have many of these pieces already (or have the capability to implement them) but from a user perspective, they are buried too deep in the APIs to be readily usable. Beam transforms are the core pieces of logic that users would want to apply on their data. So I thin

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread Robert Bradshaw via dev
Echoing many of the comments here, but organizing them under a single theme, I would say a good focus for Beam 3.0 could be centering around being more "transform-centric." Specifically: - Make it easy to mix and match transforms across pipelines and environments (SDKs). Key to this will be a push

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread Kenneth Knowles
I think this is a good idea. Fun fact - I think the first time we talked about "3.0" was 2018. I don't want to break users with 3.0 TBH, despite that being what a major version bump suggests. But I also don't want a triple-digit minor version. I think 3.0 is worthwhile if we have a new emphasis th

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-20 Thread Jan Lukavský
Formatting and coloring. :) Hi XQ, thanks for starting this discussion! I agree we are getting to a point where discussing a major update of Apache Beam might be a good idea. Because such a window of opportunity happens only once in (quite many) years, I think we should try to use our curre

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-20 Thread Jan Lukavský
Hi XQ, thanks for starting this discussion! I agree we are getting to a point where discussing a major update of Apache Beam might be a good idea. Because such a window of opportunity happens only once in (quite many) years, I think we should try to use our current experience with the Beam model i

[DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-19 Thread XQ Hu via dev
Hi Beam Community, Lately, I have been thinking about the future of Beam and the potential roadmap towards Beam 3.0. After discussing this with my colleagues at Google, I would like to open a discussion about the path for us to move towards Beam 3.0. As we continue to enhance Beam 2 with new featu