Re: [java] Trouble with gradle and using ParquetIO

2023-04-21 Thread Wiśniowski Piotr
Hi Evan, Just for completeness: - "provided" should be used when you expect the target cluster or environment to have the package of interest installed, so you do not have to include it in the pipeline jar (this is to have it more lightweight and easier to maintain coherent target jre en

Re: Can I batch data when i use JDBC write operation?

2023-04-21 Thread Wiśniowski Piotr
Hi, Not tested, but here are a few options that might solve your problem: 1. Use separate read and write replicas of your DB, so that the write replica receives inserts one by one, and live with this. Make sure to deduplicate the data before inserting to avoid potential collisions (this should n

Re: [java] Trouble with gradle and using ParquetIO

2023-04-21 Thread Alexey Romanenko
Just curious: where was it documented like this? I briefly checked it on Maven Central [1], and the provided code snippet for Gradle uses the “implementation” scope. -- Alexey [1] https://search.maven.org/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0/jar > On 21 Apr 2023, at 01:52, Evan

Re: How Beam Pipeline Handle late events

2023-04-21 Thread Pavel Solomin
Thank you for the information. I'm assuming you had a unique ID in the records, and you observed some IDs missing in the Beam output compared with Spark, and not just some duplicates produced by Spark. If so, I would suggest creating a P1 issue at https://github.com/apache/beam/issues . Also, did you tr

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-21 Thread Ning Kang via user
Hi Jan, To generalize the per-stage parallelism configuration, we should have an FR proposing the capability to explicitly set an autoscaling policy (in this case, a fixed size per stage) in Beam pipelines. Per-step or per-stage parallelism, or fusion/optimization, is not part of the Beam model. They ar

Re: [java] Trouble with gradle and using ParquetIO

2023-04-21 Thread Evan Galpin
Oops, I was looking at the "bootleg" mvnrepository search engine, which shows `compileOnly` in the copy-pastable dependency installation prompts[1]. When I received the "ClassNotFound" error, my thought was that the dep should be installed in "implementation" mode. When I tried that, I get other

Re: apache beam bigquery IO connector support for bigquery external tables

2023-04-21 Thread Brian Hulette via user
Hi Nirav, BQ external tables are read-only, so you won't be able to write this way. I also don't think reading a standard external table will work, since the Read API and tabledata.list are not supported for external tables [1]. BigLake tables [2], on the other hand, may "just work". I haven't check

Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-21 Thread Juan Cuzmar
I hope you all are doing well. I am facing an issue with an Apache Beam pipeline that gets stuck indefinitely when using the Wait.on transform alongside JdbcIO. Here's a simplified version of my code, focusing on the relevant parts: PCollection result = p.apply("Pubsub", PubsubIO.readMess

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-21 Thread Reuven Lax via user
I believe you have to call withResults() on the JdbcIO transform in order for this to work. On Fri, Apr 21, 2023 at 10:35 PM Juan Cuzmar wrote: > I hope you all are doing well. I am facing an issue with an Apache Beam > pipeline that gets stuck indefinitely when using the Wait.on transform > alo