Hello!
I'm trying to get into Beam, but it seems rather puzzling.
I started the Flink job server:
> bin\java -jar \...\...\beam-runners-flink-1.14-job-server-2.46.0.jar
The logs seem OK. Then I built the Java starter with
beam-sdks-java-core:2.46.0 and beam-runners-portability-java:2.46.0, and ran
it:
> bin\java -jar build\pipeline.jar -
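For reference, the starter's entry point looks roughly like this (a minimal
sketch; StarterPipeline is my own class name, and the job server's default
endpoint localhost:8099 is assumed):

    // Minimal sketch of a starter pipeline submitted to the portable job
    // server. Typical flags: --runner=PortableRunner
    //   --jobEndpoint=localhost:8099 --defaultEnvironmentType=LOOPBACK
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class StarterPipeline {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);
        p.apply("Seed", Create.of("hello", "beam"));
        p.run().waitUntilFinish();
      }
    }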
I'm looking into using Beam to ingest from various sources into a
PostgreSQL database, but there is something that I don't quite know how to
fit into the Beam model: how do I deal with "non-data" tasks that need to
happen before or after the pipeline proper? For example, creation of
tables, re…
Hello!
Usually DDLs (CREATE TABLE / ALTER TABLE) live outside of the application.
From my experience, that sort of task is done either manually or via
automations like Liquibase / Flyway. This is not specific to Beam; it is a
common pattern in backend / data engineering app development.
Ah, thanks for your reply!
So it's done manually/outside - I was hoping to have this in the pipeline,
really, so that the DDLs would run in a transaction along with the data
ingest. And even if not quite that, it would be great to have all the
stages/code defined and run by the same thing, and not have something…
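Roughly what I have in mind is the sketch below: run the DDL from the same
main() that launches the pipeline (connection details are placeholders, and
I realize this still would not share a transaction with the distributed
writes):

    // Sketch: run DDL via plain JDBC before launching the Beam pipeline.
    // The DDL and the pipeline's writes are separate transactions.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class IngestJob {
      public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/mydb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
          stmt.execute("CREATE TABLE IF NOT EXISTS events (id BIGINT, payload TEXT)");
        }
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        // ... build the ingest pipeline here, e.g. with JdbcIO.write() ...
        p.run().waitUntilFinish();
      }
    }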
You could create a batch pipeline that reads from GCS and writes to
BigQuery, using this template:
https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-storage-to-bigquery
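If you would rather express it directly in the Java SDK, a rough sketch
(the bucket, table, and single "raw" column are placeholders for
illustration; the target table is assumed to already exist):

    // Sketch: read lines from GCS with TextIO, wrap each line in a
    // TableRow, and append to an existing BigQuery table.
    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.TypeDescriptor;

    public class GcsToBigQuery {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply(TextIO.read().from("gs://my-bucket/input/*.txt"))
         .apply(MapElements.into(TypeDescriptor.of(TableRow.class))
             .via((String line) -> new TableRow().set("raw", line)))
         .apply(BigQueryIO.writeTableRows()
             .to("my-project:my_dataset.my_table")
             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
             .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
        p.run().waitUntilFinish();
      }
    }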
On Sat, May 6, 2023 at 1:10 AM Utkarsh Parekh wrote:
> Hi,
>
> I'm writing a simple streaming Beam application…