Thank you, Reza Ardeshir Rokni,



This post saved my day:
https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs




At 2020-05-24 11:36:46, "Reza Ardeshir Rokni" <raro...@gmail.com> wrote:

If things fit in memory please have a look at the following pattern:

https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs



Note that a nicer API is coming for this pattern:

https://issues.apache.org/jira/browse/BEAM-9650
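For reference, the pattern linked above boils down to a periodically refreshed side input: a GenerateSequence ticker triggers a re-read of the table, and each refresh is re-published into the global window, replacing the previous value. A minimal Java sketch, where readBigQueryTable() is a hypothetical helper that queries the table and keys each row by the join column:

```java
import java.util.Map;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;

// A side input that is re-read once a day; each refresh replaces the value.
PCollectionView<Map<String, TableRow>> lookupView =
    pipeline
        // One tick per day drives the refresh.
        .apply(GenerateSequence.from(0).withRate(1, Duration.standardDays(1)))
        .apply("RefreshLookup", ParDo.of(new DoFn<Long, Map<String, TableRow>>() {
          @ProcessElement
          public void process(ProcessContext c) {
            // readBigQueryTable() is a hypothetical helper: query the table
            // with the BigQuery client and key each TableRow by the join column.
            c.output(readBigQueryTable());
          }
        }))
        // Re-publish each refresh into the global window so consumers always
        // see the latest pane.
        .apply(Window.<Map<String, TableRow>>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
            .discardingFiredPanes())
        .apply(View.asSingleton());
```

Consumers then access the current map via c.sideInput(lookupView) inside a DoFn registered with .withSideInputs(lookupView); each daily tick replaces the singleton value.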




 


On Sun, 24 May 2020 at 11:25, 杨胜 <liff...@163.com> wrote:



How large is the BigQuery table? Does it fit in memory?

10 columns (each column's data is small) and 800,000 rows. I believe this data 
should easily fit in memory.




At 2020-05-24 11:13:04, "Reuven Lax" <re...@google.com> wrote:

How large is the BigQuery table? Does it fit in memory?


On Sat, May 23, 2020 at 7:01 PM 杨胜 <liff...@163.com> wrote:

Hi everyone,


I am new to Apache Beam, but I have experience with Spark Streaming.


I have a daily-updated BigQuery table that I want to use as a lookup table: read 
it into Beam as a bounded PCollection<TableRow> (I named this variable 
bigqueryTableRows) and refresh this collection within Beam on a daily basis. I 
also have a Pub/Sub topic whose messages I want to read as an unbounded 
PCollection<TableRow> (named pubsubTableRows). I then want to join 
bigqueryTableRows with pubsubTableRows and finally write the result to BigQuery.
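For what it's worth, assuming the lookup table is published as a slowly-refreshing side input (a PCollectionView<Map<String, TableRow>> named lookupView, keyed by a hypothetical "key" column), the streaming side of such a pipeline might be sketched as below; the topic, table names, and parse step are placeholders, not real identifiers from this thread:

```java
import java.util.Map;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

// Unbounded stream of events from Pub/Sub, parsed into TableRows.
PCollection<TableRow> pubsubTableRows =
    pipeline
        .apply(PubsubIO.readStrings().fromTopic("projects/my-project/topics/my-topic"))
        .apply("ParseMessage", ParDo.of(new ParseToTableRowFn())); // hypothetical parser

// Join each event against the in-memory lookup map via the side input.
PCollection<TableRow> joined =
    pubsubTableRows.apply("JoinWithLookup",
        ParDo.of(new DoFn<TableRow, TableRow>() {
          @ProcessElement
          public void process(ProcessContext c) {
            Map<String, TableRow> lookup = c.sideInput(lookupView);
            TableRow event = c.element();
            TableRow match = lookup.get((String) event.get("key"));
            if (match != null) {
              TableRow out = event.clone();
              out.putAll(match); // merge lookup columns into the event row
              c.output(out);
            }
          }
        }).withSideInputs(lookupView));

// Write the enriched rows back to BigQuery.
joined.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.output_table")
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
```

Because the side input lives in the global window with a repeated trigger, it can be consumed from any window on the unbounded main input, which is what makes this pairing work for a daily-refreshed lookup.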


I have checked all the examples in Beam's GitHub repository: 
https://github.com/apache/beam/tree/d906270f243bb4de20a7f0baf514667590c8c494/examples/java/src/main/java/org/apache/beam/examples, 
but none matches my case.


Any suggestion on how I should implement my pipeline?


Many Thanks,
Steven
