Hi Evan,
Just to give the full picture:
- "provided" should be used when you expect the target cluster
environment to have the package of interest installed, so you do not have
to include it in the pipeline jar (this keeps the jar more lightweight
and easier to keep coherent with the target JRE en
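For context, a dependency in Maven's "provided" scope (Gradle's rough analogue is `compileOnly`) is available at compile time but not bundled into the jar, so the class must already be on the runner's classpath at execution time; `implementation` bundles it into the pipeline jar. A sketch of the two Gradle declarations (the artifact and version are just the ones discussed in this thread):

```groovy
dependencies {
    // Compile against the IO module but expect the cluster to supply it at
    // runtime (the Gradle analogue of Maven's "provided" scope):
    compileOnly "org.apache.beam:beam-sdks-java-io-parquet:2.46.0"

    // Or bundle it into the pipeline jar so the worker never throws
    // ClassNotFoundException for these classes:
    implementation "org.apache.beam:beam-sdks-java-io-parquet:2.46.0"
}
```

Note that `compileOnly` dependencies are also absent from the local runtime classpath, so a pipeline using them will only run where the cluster actually provides the jar.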
Hi,
Not tested, but here are a few options that might solve your problem:
1. Use read and write replicas of your DB, so that the write
replica gets the inserts one by one, and live with that. Make sure to
deduplicate the data before inserting to avoid potential collisions (this
should n
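The deduplicate-before-insert step above could look like the following sketch in plain Java, assuming your records carry a unique ID (the record type and ID extractor here are hypothetical, not from the original mail):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: deduplicate a batch by a unique ID before inserting,
// keeping the first occurrence of each ID and preserving input order.
public class DedupeBatch {
    public static <T, K> List<T> dedupeById(List<T> records, Function<T, K> idOf) {
        Map<K, T> seen = new LinkedHashMap<>(); // LinkedHashMap preserves insertion order
        for (T r : records) {
            seen.putIfAbsent(idOf.apply(r), r); // first record per ID wins
        }
        return new ArrayList<>(seen.values());
    }
}
```

In a Beam pipeline the equivalent would typically be a `Distinct`/`GroupByKey` on the ID before the write.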
Just curious: where was it documented like this?
I briefly checked it on Maven Central [1] and the provided code snippet for
Gradle uses “implementation” scope.
—
Alexey
[1]
https://search.maven.org/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0/jar
> On 21 Apr 2023, at 01:52, Evan
Thank you for the information.
I'm assuming you had a unique ID in the records, and you observed some IDs
missing in the Beam output compared with Spark, and not just some duplicates
produced by Spark.
If so, I would suggest creating a P1 issue at
https://github.com/apache/beam/issues
Also, did you tr
Hi Jan,
To generalize the per-stage parallelism configuration, we should have a FR
proposing the capability to explicitly set autoscaling (in this case, fixed
size per stage) policy in Beam pipelines.
Per-step or per-stage parallelism, and fusion/optimization, are not part of
the Beam model. They ar
Oops, I was looking at the "bootleg" mvnrepository search engine, which
shows `compileOnly` in the copy-pastable dependency installation
prompts [1]. When I received the "ClassNotFound" error, my thought was that
the dep should be installed in "implementation" mode. When I tried that, I
got other
Hi Nirav,
BQ external tables are read-only, so you won't be able to write this way. I
also don't think reading a standard external table will work since the Read
API and tabledata.list are not supported for external tables [1].
BigLake tables [2], on the other hand, may "just work". I haven't
check
I hope you all are doing well. I am facing an issue with an Apache Beam
pipeline that gets stuck indefinitely when using the Wait.on transform
alongside JdbcIO. Here's a simplified version of my code, focusing on the
relevant parts:
PCollection result = p.
apply("Pubsub",
PubsubIO.readMess
I believe you have to call withResults() on the JdbcIO transform in order
for this to work.
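For reference: `JdbcIO.write()` by itself returns `PDone`, which gives `Wait.on` nothing to wait on; `withResults()` makes the write return a `PCollection<Void>` that can serve as the signal. A rough, untested sketch of the shape this takes (`rows`, `dsConfig`, `downstream`, and the row type are placeholders, not from the original code):

```java
// Untested sketch: rows, dsConfig, downstream, and MyRow are placeholders.
PCollection<Void> writeSignal =
    rows.apply("WriteToDb",
        JdbcIO.<MyRow>write()
            .withDataSourceConfiguration(dsConfig)
            .withStatement("INSERT INTO my_table (id, payload) VALUES (?, ?)")
            .withPreparedStatementSetter((row, stmt) -> {
                stmt.setLong(1, row.getId());
                stmt.setString(2, row.getPayload());
            })
            // Without withResults(), write() returns PDone and Wait.on
            // has no signal collection, so downstream never fires.
            .withResults());

downstream
    .apply("WaitForWrite", Wait.on(writeSignal))
    .apply("Continue", /* next transform */ ...);
```

Also worth noting: with an unbounded source like Pub/Sub, `Wait.on` releases elements per window, so both collections need compatible (non-global, or triggered) windowing, otherwise the pipeline can appear stuck exactly as described.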
On Fri, Apr 21, 2023 at 10:35 PM Juan Cuzmar wrote:
> I hope you all are doing well. I am facing an issue with an Apache Beam
> pipeline that gets stuck indefinitely when using the Wait.on transform
> alo