BigQuery query caching?

2020-07-02 Thread Matt Terwilliger
Hello, I'm writing a Beam pipeline that does some relatively expensive reads from BigQuery. I want to be able to run the pipeline in a development loop without racking up a huge bill. I know BigQuery has support for query caching, but from the docs, that only works if you don't specify a destinat

Re: BigQuery query caching?

2020-07-02 Thread Jeff Klukas
It sounds like your pipeline is issuing a query rather than reading a whole table. Are you using Java or Python? I'm only familiar with the Java SDK so my answer may be Java-biased. I would recommend materializing the query results to a table, and then configuring your pipeline to read that table
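As a rough sketch of the suggestion above, a development run could switch between a pre-materialized table and the original query through one small helper. Everything here is an assumption for illustration: the class name `BigQuerySourceChoice`, the table spec, and the query text are hypothetical, and the sketch uses only the JDK so it can run on its own. In a Beam Java pipeline the two outcomes would correspond to `BigQueryIO.readTableRows().from(...)` and `BigQueryIO.readTableRows().fromQuery(...)`.

```java
// Sketch only: decides whether a run should read a pre-materialized dev table
// or issue the original query. All names and values here are hypothetical.
final class BigQuerySourceChoice {
    enum Kind { TABLE, QUERY }

    final Kind kind;
    final String value; // table spec for TABLE, SQL text for QUERY

    private BigQuerySourceChoice(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    /** Returns the materialized table when one is configured, else the query. */
    static BigQuerySourceChoice choose(String devTableSpec, String query) {
        if (devTableSpec != null && !devTableSpec.trim().isEmpty()) {
            return new BigQuerySourceChoice(Kind.TABLE, devTableSpec.trim());
        }
        return new BigQuerySourceChoice(Kind.QUERY, query);
    }

    public static void main(String[] args) {
        String query = "SELECT id, total FROM `my-project.sales.orders`";

        // In a Beam pipeline, TABLE would map to
        //   BigQueryIO.readTableRows().from(value)
        // and QUERY to
        //   BigQueryIO.readTableRows().fromQuery(value).usingStandardSql()
        BigQuerySourceChoice dev = choose("my-project:dev.orders_snapshot", query);
        BigQuerySourceChoice full = choose(null, query);

        System.out.println(dev.kind + " " + dev.value);
        System.out.println(full.kind + " " + full.value);
    }
}
```

Keeping the decision in one place means the expensive query only runs when no dev table is configured, for example in production or when the snapshot is being refreshed.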

Re: BigQuery query caching?

2020-07-02 Thread Matthew Terwilliger
Hi Jeff, Using Java. Yeah - we are issuing a query rather than reading a table. Materializing the results myself and reading them back seems simple enough. I will give that a try! Thanks, Matt

Re: Concurrency issue with KafkaIO

2020-07-02 Thread Alexey Romanenko
KafkaUnboundedReader is not thread-safe and, though I may be wrong, I don’t think it is meant to be, since each KafkaUnboundedReader is supposed to read from its split, represented by a KafkaUnboundedSource, independently. However, in the KafkaIO case, if the total number of splits is less than numb

Re: BigQuery query caching?

2020-07-02 Thread Matthew Terwilliger
How are you materializing the query results for the first time and keeping the table up to date as your queries change? Thanks, Matt

Re: BigQuery query caching?

2020-07-02 Thread Jeff Klukas
I don't have any particularly exciting recommendations for keeping the query up to date. It would be up to the developer to remember to rerun the query to populate the dev results table as they make changes to the query. I'd likely be copying and pasting the query into the BQ console, wrapping it
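The message above is cut off, so the exact wrapper Jeff had in mind is not known. One plausible way to do the manual refresh, assuming BigQuery's `CREATE OR REPLACE TABLE ... AS` DDL statement, is to generate the wrapped statement from the pipeline's query and paste it into the console. The class `DevTableSql`, the table name, and the query below are hypothetical.

```java
// Sketch only: builds a statement that materializes a query into a dev table.
// Assumes BigQuery standard SQL DDL (CREATE OR REPLACE TABLE ... AS).
final class DevTableSql {

    /** Wraps a SELECT in a CREATE OR REPLACE TABLE ... AS statement. */
    static String materialize(String table, String selectQuery) {
        String q = selectQuery.trim();
        // Drop one trailing semicolon so the wrapped statement stays valid.
        if (q.endsWith(";")) {
            q = q.substring(0, q.length() - 1).trim();
        }
        return "CREATE OR REPLACE TABLE `" + table + "` AS\n" + q + ";";
    }

    public static void main(String[] args) {
        // Hypothetical table and query, for illustration only.
        String sql = materialize(
            "my-project.dev.orders_snapshot",
            "SELECT id, total FROM `my-project.sales.orders`;");
        System.out.println(sql);
    }
}
```

Rerunning the generated statement whenever the query changes keeps the dev table in step with the pipeline, at the cost of one full query execution per refresh.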

Querying Metrics when using Spark Runner

2020-07-02 Thread Truebody, Kyle
Hi, We have recently upgraded to the latest version of Apache Beam, 2.22.0. We were previously using version 2.13.0. We are using the SparkRunner. I noticed that after the upgrade the Metrics query has stopped producing values. Through debugging I can see that the metrics and distribution a

Scio v0.9.2 release

2020-07-02 Thread Filipe Regadas
Hi all, We just released scio v0.9.2. If you are upgrading from 0.8.x see our Migration Guide for detailed instructions. I’ll take this opportunity to announce that we moved from Gitter to Spotify FOSS Slack, you can find

Re: Concurrency issue with KafkaIO

2020-07-02 Thread wang Wu
Thank you for the information. Here is our Kafka client version:
[INFO] +- org.apache.kafka:kafka-clients:jar:2.3.0:compile
[INFO] |  +- com.github.luben:zstd-jni:jar:1.4.0-1:compile
[INFO] |  +- org.lz4:lz4-java:jar:1.6.0:compile
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.1.7.3:compile
I a