If the data fits in memory, please have a look at the following pattern: https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs
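For your case that pattern would look roughly like the sketch below: a daily impulse re-reads the table into a Map side input, and the streaming ParDo joins each Pub/Sub row against the latest snapshot. Note this is only a sketch under assumptions: the join column "id", the class/step names, the one-minute windowing, and the readAllRows() helper (a stand-in for a BigQuery client call, not a Beam API) are all illustrative and need adjusting to your schema.

import java.util.List;
import java.util.Map;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;

public class DailyLookupJoin {

  public static PCollection<TableRow> buildJoin(
      Pipeline pipeline, PCollection<TableRow> pubsubTableRows) {

    // Re-read the lookup table once per day and publish it as a Map side
    // input inside the global window, per the slowly-updating pattern.
    PCollectionView<Map<String, TableRow>> lookupView =
        pipeline
            .apply("DailyImpulse",
                GenerateSequence.from(0).withRate(1, Duration.standardDays(1)))
            .apply("GlobalWindowWithDailyTrigger",
                Window.<Long>into(new GlobalWindows())
                    .triggering(Repeatedly.forever(
                        AfterProcessingTime.pastFirstElementInPane()))
                    .discardingFiredPanes())
            .apply("ReadLookupTable",
                ParDo.of(new DoFn<Long, KV<String, TableRow>>() {
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    // Each daily impulse triggers a fresh full read, keyed
                    // by the (assumed) join column "id".
                    for (TableRow row : readAllRows()) {
                      c.output(KV.of((String) row.get("id"), row));
                    }
                  }
                }))
            .apply("AsMap", View.asMap());

    // Window the stream, then join each message against the latest snapshot.
    return pubsubTableRows
        .apply("FixedWindows",
            Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply("EnrichWithLookup",
            ParDo.of(new DoFn<TableRow, TableRow>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                Map<String, TableRow> lookup = c.sideInput(lookupView);
                TableRow match = lookup.get((String) c.element().get("id"));
                if (match != null) {
                  // Merge the lookup row's fields into the streamed row.
                  TableRow joined = new TableRow();
                  joined.putAll(c.element());
                  joined.putAll(match);
                  c.output(joined);
                }
              }
            }).withSideInputs(lookupView));
  }

  // Hypothetical helper, not a Beam API: fetch all rows of the lookup table
  // with the BigQuery client library and return them as TableRows.
  private static List<TableRow> readAllRows() {
    throw new UnsupportedOperationException("replace with a BigQuery client call");
  }
}

With ~800,000 small rows the Map side input should fit comfortably in worker memory, and because the panes are discarding and each firing produces a complete re-read, every daily trigger replaces the previous snapshot. The output PCollection can then go straight into a BigQueryIO write.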
Note there is a nicer API coming for this pattern: https://issues.apache.org/jira/browse/BEAM-9650

On Sun, 24 May 2020 at 11:25, 杨胜 <liff...@163.com> wrote:
>
> How large is the BigQuery table? Does it fit in memory?
>
> 10 columns (each column's data is small), 800,000 rows of data. I believe
> this data should easily fit into memory.
>
> At 2020-05-24 11:13:04, "Reuven Lax" <re...@google.com> wrote:
>
> How large is the BigQuery table? Does it fit in memory?
>
> On Sat, May 23, 2020 at 7:01 PM 杨胜 <liff...@163.com> wrote:
>
>> Hi everyone,
>>
>> I am new to Apache Beam, but I have experience with Spark Streaming.
>>
>> I have a daily-updated BigQuery table that I want to use as a lookup
>> table: read it into Beam as a bounded PCollection<TableRow> and refresh
>> that collection within Beam on a daily basis; I named this variable
>> *bigqueryTableRows*. I also have a Pub/Sub topic whose messages I want
>> to read as an unbounded PCollection<TableRow>; I named this variable
>> *pubsubTableRows*. I then want to join *bigqueryTableRows* with
>> *pubsubTableRows*, and finally write the result to BigQuery.
>>
>> I have checked all the examples under Beam's GitHub repository:
>> https://github.com/apache/beam/tree/d906270f243bb4de20a7f0baf514667590c8c494/examples/java/src/main/java/org/apache/beam/examples.
>> But none matches my case.
>>
>> Any suggestions on how I should implement my pipeline?
>>
>> Many thanks,
>> Steven