Re: A Declarative API for Apache Beam

2022-12-16 Thread Byron Ellis via dev
So, I picked dbt for the simple reason that it's what all the data practitioners I know seem to be using and/or complaining about. Hell, I even found articles about people moving

Re: A Declarative API for Apache Beam

2022-12-16 Thread Sachin Agarwal via dev
While dbt and Dataform clearly can solve some of the same problems, there is a large dbt community that Beam could serve. If the Beam community thinks joining forces with the dbt community to bring dbt to ETL use cases beyond just data warehouses where dbt is used for ELT, that is great for everyon

Re: A Declarative API for Apache Beam

2022-12-16 Thread Austin Bennett
Seems a worthwhile addition which can expand the community by making Beam increasingly accessible to additional users and for more use-cases. A bit of a tangent, since commenting on @Byron Ellis's part, but ... Ensuring some have also seen Dataform [ ex: https://cloud.google.com/dataform/docs/ov

Re: A Declarative API for Apache Beam

2022-12-15 Thread Ahmet Altay via dev
+1 to both of these proposals. In the past 12 months I have heard of at least 3 YAML implementations built on top of Beam in large production systems. Unfortunately, none of those were open sourced. Having these out of the box would be great, and it will clearly have user demand. Thank you all! On

Re: A Declarative API for Apache Beam

2022-12-15 Thread Robert Bradshaw via dev
On Thu, Dec 15, 2022 at 3:37 AM Steven van Rossum wrote: > > This is great! I developed a similar template a year or two ago as a > reference for a customer to speed up their development process and > unsurprisingly it did speed up their development. > Here's an example of the config layout I ca

Re: A Declarative API for Apache Beam

2022-12-15 Thread Steven van Rossum via dev
This is great! I developed a similar template a year or two ago as a reference for a customer to speed up their development process and unsurprisingly it did speed up their development. Here's an example of the config layout I came up with at the time: options: runner: DirectRunner pipeline: #

Re: A Declarative API for Apache Beam

2022-12-14 Thread Chamikara Jayalath via dev
+1 for these proposals and agree that these will simplify and demystify Beam for many new users. I think when combined with the x-lang/Schema-Aware transform binding, these might end up being adequate solutions for many production use-cases as well (unless users need to define custom composites, I/

Re: A Declarative API for Apache Beam

2022-12-14 Thread Sachin Agarwal via dev
To build on Kenn's point, if we leverage existing stuff like dbt we get access to a ready-made community which can help drive both adoption and incremental innovation by bringing more folks to Beam. On Wed, Dec 14, 2022 at 2:57 PM Kenneth Knowles wrote: > 1. I love the idea. Back in the early day

Re: A Declarative API for Apache Beam

2022-12-14 Thread Robert Burke
I like the idea of a common spec for something like this so we can actually cross validate all the SDK behaviours. It would make testing significantly easier. On Wed, Dec 14, 2022, 2:57 PM Kenneth Knowles wrote: > 1. I love the idea. Back in the early days people talked about an "XML > SDK" or "

Re: A Declarative API for Apache Beam

2022-12-14 Thread Kenneth Knowles
1. I love the idea. Back in the early days people talked about an "XML SDK" or "JSON SDK" or "YAML SDK" and it didn't really make sense at the time. Portability and specifically cross-language schema transforms give the right infrastructure so this is the perfect time: unique names (URNs) for tran

Re: A Declarative API for Apache Beam

2022-12-14 Thread Byron Ellis via dev
And I guess also a PR for completeness to make it easier to find going forward instead of my random repo: https://github.com/apache/beam/pull/24670 On Wed, Dec 14, 2022 at 2:37 PM Byron Ellis wrote: > Since Robert opened that can of worms (and we happened to talk about it > yesterday)... :-) > >

Re: A Declarative API for Apache Beam

2022-12-14 Thread Byron Ellis via dev
Since Robert opened that can of worms (and we happened to talk about it yesterday)... :-) I figured I'd also share my start on a "port" of dbt to the Beam SDK. This would be complementary as it doesn't really provide a way of specifying a pipeline, more orchestrating and packaging a complex pipeli

Re: A Declarative API for Apache Beam

2022-12-14 Thread Damon Douglas via dev
Hello Robert, I'm replying to say that I've been waiting for something like this ever since I started learning Beam and I'm grateful you are pushing this forward. Best, Damon On Wed, Dec 14, 2022 at 2:05 PM Robert Bradshaw wrote: > While Beam provides powerful APIs for authoring sophisticated