Ah, thanks for your reply!

So done manually/outside - I was hoping to have this in the pipeline,
really, so that DDLs would run in a transaction along with the data ingest.
And even if not quite that, it would be great to have all the stages/code
defined and run by the same thing, rather than having something running
outside the pipeline.
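
To make that concrete, roughly this is what I mean, outside of Beam for
now - a minimal sketch assuming SQLAlchemy, where the connection string
and the "events" table are made up. PostgreSQL, unlike some databases,
happily runs DDL and DML in the same transaction:

    import sqlalchemy as sa

    # Hypothetical connection string and table, just for illustration
    engine = sa.create_engine("postgresql://user:pass@localhost/warehouse")

    # engine.begin() opens one transaction: the ALTER TABLE and the INSERT
    # commit together, or roll back together on error
    with engine.begin() as conn:
        conn.execute(sa.text(
            "ALTER TABLE events ADD COLUMN IF NOT EXISTS source_id text"))
        conn.execute(
            sa.text("INSERT INTO events (id, source_id) VALUES (:id, :src)"),
            {"id": 1, "src": "feed-a"},
        )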

The automatic migration thing... I'll have a nose around. That's close to
what I'm looking for, although maybe not quite. I think what I would like
is some sort of sink transform that accepts as a parameter a SQLAlchemy
table definition, and automatically migrates any existing table to match it
(where it can). Although I don't know Beam well enough yet to know if
that's really what I want...
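
Something of roughly this shape is what I'm picturing. To be clear, this
is just a sketch, not a real Beam API: the class name is made up, writing
one row per transaction is naive, and it hand-waves away type changes and
concurrent workers racing on the ALTERs:

    import apache_beam as beam
    import sqlalchemy as sa

    class WriteToPostgresMigrating(beam.DoFn):  # made-up name
        """Writes dict rows to `table`, first adding any columns the
        live table is missing (additive migrations only)."""

        def __init__(self, db_url, table):
            self.db_url = db_url  # e.g. "postgresql://..."
            self.table = table    # a sa.Table definition

        def setup(self):
            self.engine = sa.create_engine(self.db_url)
            with self.engine.begin() as conn:
                # Create the table if it doesn't exist at all
                self.table.metadata.create_all(
                    conn, tables=[self.table], checkfirst=True)
                # Add any columns in the definition but not in the database
                existing = {
                    c["name"]
                    for c in sa.inspect(conn).get_columns(self.table.name)
                }
                for col in self.table.columns:
                    if col.name not in existing:
                        conn.execute(sa.text(
                            f"ALTER TABLE {self.table.name} "
                            f"ADD COLUMN {col.name} "
                            f"{col.type.compile(conn.dialect)}"))

        def process(self, row):
            # One transaction per element - fine for a sketch; batching
            # per bundle would be the real way to do it
            with self.engine.begin() as conn:
                conn.execute(self.table.insert().values(**row))

    # Usage would then be something like:
    #   rows | beam.ParDo(WriteToPostgresMigrating(DB_URL, events_table))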

(Note also - the word "application" here... there isn't really an
application in this case - I'm using PostgreSQL as, I guess, a data
warehouse.)

Michal

On Sat, May 6, 2023 at 3:18 PM Pavel Solomin <p.o.solo...@gmail.com> wrote:

> Hello!
>
> Usually DDLs (create table / alter table) live outside of the
> application. From my experience, I've seen that sort of task done either
> manually or via automations like Liquibase / Flyway. This is not specific
> to Beam; it is a common pattern in backend / data engineering app
> development.
>
> Some applications may support the simplest, conflict-free DDLs, like
> adding a nullable column, without restarting the app itself.
>
> I remember seeing some examples in Java and Python of Beam apps which
> supported automatic schema migrations. Example:
>
> https://medium.com/inside-league/streaming-data-to-bigquery-with-dataflow-and-real-time-schema-updating-c7a3deba3bad
>
> I am not aware of automatic solutions for arbitrary schema changes though.
>
> On Saturday, 6 May 2023, Michal Charemza <mic...@charemza.name> wrote:
> > I'm looking into using Beam to ingest from various sources into a
> > PostgreSQL database, but there is something that I don't quite know how
> > to fit into the Beam model: how to deal with "non-data" tasks that need
> > to happen before or after the pipeline proper?
> > For example, creation of tables, renames of tables, migrations on
> > existing tables. Where should all this sort of code/logic live if the
> > fetching/ingestion of data is via Beam? Or is this entirely outside of
> > the Beam model - should it happen before the pipeline, or after the
> > pipeline, but not as part of the pipeline?
>
> --
> Best Regards,
> Pavel Solomin
>
> Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
> <https://www.linkedin.com/in/pavelsolomin>
