Ah, thanks for your reply! So done manually/outside - I was hoping to have this in the pipeline, really, so that DDLs would be done in a transaction along with the data ingest. And even if not quite that, it would be great to have all the stages/code defined and run by the same thing, and not have something running outside of the pipeline. (Rough sketch of what I mean just below.)
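Roughly what I'm imagining, as a purely hypothetical sketch - a hand-rolled DoFn using SQLAlchemy. Nothing here is an existing Beam sink, and db_url / ddl / insert_sql are all made-up placeholders:

import apache_beam as beam
import sqlalchemy as sa


class DdlAndIngest(beam.DoFn):
    # Hypothetical: runs the DDL and the inserts in one transaction,
    # inside the pipeline. PostgreSQL has transactional DDL, so the
    # ALTER TABLE and the INSERTs commit or roll back together.
    def __init__(self, db_url, ddl, insert_sql):
        self.db_url = db_url
        self.ddl = ddl
        self.insert_sql = insert_sql

    def setup(self):
        # One engine per worker; SQLAlchemy pools the connections
        self.engine = sa.create_engine(self.db_url)

    def process(self, batch):
        # engine.begin() commits on success and rolls back on any
        # exception, so the DDL and the data land (or don't) together
        with self.engine.begin() as conn:
            conn.execute(sa.text(self.ddl))
            conn.execute(sa.text(self.insert_sql), batch)
        yield len(batch)

(With rows batched upstream, e.g. via beam.BatchElements, and the DDL written to be re-runnable - e.g. ADD COLUMN IF NOT EXISTS - since it would run once per batch.)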
The automatic migration thing... will have a nose around. That's close to what I'm looking for, although maybe not quite. I think what I would like is some sort of sink transform that accepts as a parameter a SQLAlchemy table definition, and automatically migrates any existing table to match it (where it can). Although I don't know Beam well enough yet to know if that's really what I want... (A very rough sketch of the kind of transform I mean is at the bottom of this mail, below the quoted thread.)

(Note also - the word "application" here... there isn't really an application as such - I'm using PostgreSQL as, I guess, a data warehouse.)

Michal

On Sat, May 6, 2023 at 3:18 PM Pavel Solomin <p.o.solo...@gmail.com> wrote:

> Hello!
>
> Usually DDLs (create table / alter table) live outside of the
> application. From my experience, I've seen that sort of task done either
> manually or via automations like Liquibase / Flyway. This is not specific
> to Beam; it is a common pattern in backend / data engineering app
> development.
>
> Some applications may support the simplest, conflict-free DDLs - like
> adding a nullable column - without restarting the app itself.
>
> I remember seeing some examples in Java and Python of Beam apps which
> supported automatic schema migrations. Example:
>
> https://medium.com/inside-league/streaming-data-to-bigquery-with-dataflow-and-real-time-schema-updating-c7a3deba3bad
>
> I am not aware of automatic solutions for arbitrary schema changes,
> though.
>
> On Saturday, 6 May 2023, Michal Charemza <mic...@charemza.name> wrote:
>
> I'm looking into using Beam to ingest from various sources into a
> PostgreSQL database, but there is something that I don't quite know how to
> fit into the Beam model: how to deal with "non data" tasks that need to
> happen before or after the pipeline proper?
>
> For example, creation of tables, renames of tables, migrations on
> existing tables. Where should all this sort of code/logic live if the
> fetch/ingestion of data is via Beam? Or is this entirely outside of the
> Beam model - should it happen before the pipeline, or after the pipeline,
> but not as part of the pipeline?
>
> --
> Best Regards,
> Pavel Solomin
>
> Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
> <https://www.linkedin.com/in/pavelsolomin>
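P.S. The sink transform mentioned above, as a very rough sketch. To be clear, none of this exists in Beam - WriteToPostgres is the transform I wish existed - and the migration logic is hand-waved, only attempting the conflict-free case of adding missing columns:

import apache_beam as beam
import sqlalchemy as sa


def migrate(conn, table):
    # Create the table if missing; otherwise add any columns the
    # existing table lacks (the only case this sketch attempts)
    inspector = sa.inspect(conn)
    if not inspector.has_table(table.name):
        table.create(conn)
        return
    existing = {c["name"] for c in inspector.get_columns(table.name)}
    for col in table.columns:
        if col.name not in existing:
            col_type = col.type.compile(conn.dialect)
            conn.execute(sa.text(
                f'ALTER TABLE {table.name} ADD COLUMN {col.name} {col_type}'
            ))


class _WriteBatch(beam.DoFn):
    def __init__(self, db_url, table):
        self.db_url = db_url
        self.table = table

    def setup(self):
        self.engine = sa.create_engine(self.db_url)

    def process(self, batch):
        # Migration and inserts in the same transaction, so a failed
        # ingest doesn't leave a half-migrated table behind
        with self.engine.begin() as conn:
            migrate(conn, self.table)
            conn.execute(self.table.insert(), batch)
        yield len(batch)


class WriteToPostgres(beam.PTransform):
    # Hypothetical: pass a sqlalchemy.Table and get table creation /
    # migration plus the write, all inside the pipeline
    def __init__(self, db_url, table):
        self.db_url = db_url
        self.table = table

    def expand(self, rows):
        return (rows
                | beam.BatchElements()
                | beam.ParDo(_WriteBatch(self.db_url, self.table)))

(Concurrent batches would race on the ALTER TABLEs, of course - presumably a real version would migrate once up front, which is exactly the "before the pipeline proper" part I don't know how to express in Beam yet.)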