[DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

XQ Hu via dev Mon, 19 Aug 2024 14:17:55 -0700

Hi Beam Community,

Lately, I have been thinking about the future of Beam and the potential
roadmap towards Beam 3.0. After discussing this with my colleagues at
Google, I would like to open a discussion about the path for us to move
towards Beam 3.0. As we continue to enhance Beam 2 with new features and
improvements, it's important to look ahead and consider the long-term
vision for the project.


Why Beam 3.0?

I think there are several compelling reasons to start planning for Beam 3.0:

   - Opportunity for Major Enhancements: We can introduce significant
     improvements and innovations.

   - Mature Beam Primitives: We can re-evaluate and refine the core
     primitives, ensuring their maturity, stability, and ease of use for
     developers.

   - Enhanced User Experience: We can introduce new features and APIs that
     significantly improve the developer experience and cater to evolving
     use cases, particularly in the machine learning domain.


Potential Vision for Beam 3

   - Best-in-Class for ML: Empower machine learning users with intuitive
     Python interfaces for data processing, model deployment, and
     evaluation.

   - Rich, Portable Transforms: A cross-language library of standardized
     transforms, easily configured and managed via YAML.

   - Streamlined Core: Simplified Beam primitives with clear semantics for
     easier development and maintenance.

   - Turnkey Solutions: A curated set of powerful transforms for common
     data and ML tasks, including use-case-specific solutions.

   - Simplified Streaming: Intuitive interfaces for streaming data with
     robust support for time-sorted input, metrics, and notifications.

   - Enhanced Single Runner Capabilities: For use cases where a single
     large machine, kept effectively busy, can meet the user's needs.

Key Themes

   - User-Centric Design: Enhance the overall developer experience with
     simplified APIs and streamlined workflows.

   - Runner Consistency: Ensure identical functionality between local and
     remote runners for seamless development and deployment.

   - Ubiquitous Data Schema: Standardize data schemas for improved
     interoperability and robustness.

   - Expanded SDK Capabilities: Enrich SDKs with powerful new features like
     splittable DataFrames, stable input guarantees, and time-sorted input
     processing.

   - Thriving Transform Ecosystem: Foster a rich ecosystem of portable,
     managed turnkey transforms, available across all SDKs.

   - Minimized Operational Overhead: Reduce complexity and maintenance
     burden by splitting Beam into smaller, more focused repositories.

Next Steps:

I propose we start by discussing the following:

   - High-Level Goals/Vision/Themes: What are the most important goals and
     priorities for Beam 3.0?

   - Potential Challenges: What are the biggest challenges we might face
     during the transition to Beam 3.0?

   - Timeline: What would be a realistic timeline for planning, developing,
     and releasing Beam 3.0?

This email thread is primarily meant to spark conversation about the
anticipated features of Beam 3.0; there is currently no official timeline
commitment. To facilitate the discussion, I created a public doc
<https://docs.google.com/document/d/13r4NvuvFdysqjCTzMHLuUUXjKTIEY3d7oDNIHT6guww/edit>
that we can collaborate on.

I am excited to work with all of you to shape the future of Beam and make
it an even more powerful and user-friendly data processing framework!

Meanwhile, I hope to see many of you at Beam Summit 2024
(https://beamsummit.org/), where we can have more in-depth conversations
about the future of Beam.

Thanks,

XQ Hu (GitHub: liferoad <https://github.com/liferoad>)
Public Doc for gathering feedback: [Public] Beam 3.0: a discussion doc
<https://docs.google.com/document/d/13r4NvuvFdysqjCTzMHLuUUXjKTIEY3d7oDNIHT6guww/edit>
(PTAL)
