Hi Beam community,

With my GSoC 2025 project concluded, I recently wrote a blog post about my experience working on the project:
https://beam.apache.org/blog/gsoc-25-yaml-user-accessibility/

The work includes example pipelines and workflows for ML use cases with Kafka and Iceberg data sources, using the YAML SDK:

- *Streaming Classification Inference <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/sentiment_analysis>*: A pipeline that performs sentiment analysis on a stream of YouTube comments read from Kafka. The overall workflow also includes deploying and serving a DistilBERT model on Google Cloud Vertex AI, which the pipeline calls for remote inference.
- *Streaming Regression Inference <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/taxi_fare>*: A pipeline that predicts taxi fare amounts on a stream of taxi rides read from Kafka. The overall workflow also includes deploying and serving a custom model on Google Cloud Vertex AI, which the pipeline calls for remote inference.
- *Batch Anomaly Detection <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/log_analysis>*: A workflow containing model training and several pipelines that use Iceberg for storing results, BigQuery for storing vector embeddings, and MLTransform for computing embeddings, demonstrating an end-to-end anomaly detection task on a dataset of system logs.
- *Feature Engineering & Model Evaluation <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/fraud_detection>*: A workflow containing model training and several pipelines, showcasing an end-to-end fraud detection MLOps solution that generates features and evaluates models to detect fraudulent credit card transactions.

I hope these illustrative pipelines and workflows will be a useful addition to Beam, especially with Beam 3.0 coming up. I'm also very glad to have contributed to the larger goal of democratizing data processing for everyone.

And as always, a huge thank you to my mentor Chamikara Jayalath and the larger Beam community for your support throughout this project!

Best,
Charles
