Hi Beam community,

With my GSoC 2025 project concluded, I recently wrote a blog post
<https://beam.apache.org/blog/gsoc-25-yaml-user-accessibility/> about my
experience working on the project.

The work includes example pipelines and workflows for ML use cases with
Kafka and Iceberg data sources, using the YAML SDK:

   - *Streaming Classification Inference
   
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/sentiment_analysis>*:
   A pipeline that performs sentiment analysis on a stream of YouTube
   comments read from Kafka. The overall workflow also includes deploying
   and serving a DistilBERT model on Google Cloud Vertex AI, which the
   pipeline calls for remote inference.

   - *Streaming Regression Inference
   
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/taxi_fare>*:
   A pipeline that performs taxi fare amount predictions on a stream of taxi
   rides read from Kafka. The overall workflow also includes deploying and
   serving a custom model on Google Cloud Vertex AI, which the pipeline
   calls for remote inference.

   - *Batch Anomaly Detection
   
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/log_analysis>*:
   A workflow containing model training and several pipelines that leverage
   Iceberg for storing results, BigQuery for storing vector embeddings and
   MLTransform for computing embeddings to demonstrate an end-to-end anomaly
   detection task on a dataset of system logs.

   - *Feature Engineering & Model Evaluation
   
<https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/ml/fraud_detection>*:
   A workflow containing model training and several pipelines, showcasing an
   end-to-end fraud detection MLOps solution that generates features and
   evaluates models to detect credit card transaction fraud.

These illustrative pipelines and workflows should be a useful addition to
Beam, especially with Beam 3.0 coming up. I'm also very glad to have
contributed to the larger goal of democratizing data processing for everyone.
And as always, a huge thank you to my mentor Chamikara Jayalath and the
larger Beam community for your support throughout this project!

Best,
Charles
