[GSOC] Build out Beam Use Cases

Nivaldo Tokuda Wed, 28 Feb 2024 06:12:26 -0800

Hello!

My name is Nivaldo, and I'd like to express my interest in joining this
year's GSOC to add real-world use cases for Beam's MLTransform and Enrichment
transforms <https://issues.apache.org/jira/browse/GSOC-259>.

***About me***
I am a Senior Data Engineer from Brazil with 6-7 years of experience helping
companies make the most of their data. I've contributed to Beam in the past
(see [1] and [2]), but I believe I still meet the criteria for being a
beginner in open source development (using [3] as a reference).

Most notably, I spent 2-3 months contributing to the creation of a Rust SDK
for Beam, but due to unfortunate events, I stopped contributing abruptly. I
was happy to see that some amazing members of the community were able to
fork the code I wrote and continue from there. Part of my motivation for
that contribution was to prepare for a career transition into Software
Engineering, but I also had to put that goal on hold at the time. Recently,
my circumstances have changed, and I have been preparing to pursue a more
domain-specific version of this goal, focused on machine learning. Working
on this project would be an excellent way to expand my portfolio, learn
relevant skills, and contribute to the Beam community.

I learned a lot about Beam's internals and fundamental concepts while
working on the Rust SDK (see [4] for my commits), and I think this knowledge
would give me a good head start with the ML transforms. I also have some
professional experience with Beam (see [5]), and I hold two official Google
Cloud certifications (Professional Data Engineer and Professional Machine
Learning Engineer). I have a bachelor's degree in CS, and I might start a
Master's program in CS/AI this summer or fall (pending university
decisions).

***Questions***
1. Would I actually be eligible to apply to GSOC for this project, or do I
no longer count as an open source beginner? As far as I'm aware, the total
number of PRs and issues I've ever opened on GitHub is below 10. I've never
worked formally as a Software Engineer, so I would have a lot to learn from
a mentor and would look forward to that.

2. I'd like to understand the scope and exact purpose of the use cases a
bit better. Are they meant to be standalone tutorials using purely mock
data, or reusable, adaptable examples into which users can plug their own
data? Additionally, is my understanding correct that the work would consist
mainly of code, tests, and documentation?

3. Could you clarify what exactly counts as a "slowly changing source" for
the purposes of the Enrichment use cases to be implemented?

4. Regarding the implementation of one or more additional Enrichment
handlers for currently unsupported sources: would that mean adding
something like a BigQueryEnrichmentHandler, for instance?

Thank you for reading this.

***References***

[1]: https://github.com/apache/beam/pull/23879

[2]: https://github.com/apache/beam/issues/21089

[3]: https://developers.google.com/open-source/gsoc/faq#how_do_i_know_if_i_am_considered_a_beginner_in_open_source_development

[4]: https://github.com/apache/beam/compare/master...nivaldoh:beam:rust_sdk

[5]: https://github.com/google/megalista/pull/12
