Re: Specifying dataflow template location with Apache beam Python SDK

2023-12-18 Thread Bruno Volpato via user
Right, there's some misunderstanding here; Bartosz's and XQ's inputs are correct. I just want to add that the template_location parameter is the GCS path where you want to store your template, not the image reference of the base image. The GCR path that you are trying to use is used in the Do
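
Here is a minimal sketch of the distinction, assuming launch arguments in `--flag=value` form. The hypothetical helper `check_template_location` is not part of Beam. It accepts a `gs://` path for `--template_location` and rejects anything else, such as a container image reference. The bucket and template names are placeholders.

```python
from typing import List


def check_template_location(args: List[str]) -> str:
    """Hypothetical pre-launch check: --template_location must be a GCS path."""
    prefix = "--template_location="
    values = [arg[len(prefix):] for arg in args if arg.startswith(prefix)]
    if not values:
        raise ValueError("--template_location is not set")
    location = values[-1]  # like argparse's default "store" action, the last value wins
    if not location.startswith("gs://"):
        raise ValueError(
            f"template_location should be a GCS path (gs://...), got {location!r}"
        )
    return location


launch_args = [
    "--runner=DataflowRunner",
    "--template_location=gs://my-bucket/templates/my-template",  # hypothetical bucket
]
print(check_template_location(launch_args))
```

Passing an image path such as `gcr.io/...` as the template location makes the helper raise `ValueError`.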

Re: Indeterminate behavior while accessing side inputs

2023-10-25 Thread Bruno Volpato via user
Hi Rajath, I think you need to guarantee that the PCollection will have a single element, so you can use .asSingleton(). A possible approach is getting the latest for the window prior to .asSingleton(): PCollectionView> targetCountryByJourneyId = p.apply("Generate a sequence for extracting resou
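
This is a framework-free model of the idea, written in Python rather than the Java used in the thread. `as_singleton` succeeds only when a window holds exactly one value. Reducing each window to its latest element by timestamp guarantees that condition. The window ids, timestamps, and values are made up, and the sketch does not use Beam's own transforms.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (window_id, event_timestamp, value); plain strings stand in for real windows.
Element = Tuple[str, float, str]


def latest_per_window(elements: Iterable[Element]) -> Dict[str, str]:
    """Keep only the value with the greatest timestamp in each window."""
    best: Dict[str, Tuple[float, str]] = {}
    for window, ts, value in elements:
        if window not in best or ts > best[window][0]:
            best[window] = (ts, value)
    return {window: value for window, (_, value) in best.items()}


def as_singleton(values: List[str]) -> str:
    """Model of the singleton contract: exactly one value per window."""
    if len(values) != 1:
        raise ValueError(f"expected exactly one element, got {len(values)}")
    return values[0]


events = [("w1", 10.0, "BR"), ("w1", 12.5, "PT"), ("w2", 20.0, "US")]

# Grouping the raw events leaves two values in window "w1" ...
raw_by_window: Dict[str, List[str]] = defaultdict(list)
for window, _, value in events:
    raw_by_window[window].append(value)

# ... while reducing to the latest value first leaves exactly one.
latest = latest_per_window(events)
print(as_singleton([latest["w1"]]))  # PT
```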

Re: Apache Beam 2.50.0 - org.apache.beam.sdk.options.MemoryMonitorOptions ClassDef Not found

2023-10-14 Thread Bruno Volpato via user
Hi Jaymik, This class (MemoryMonitorOptions) was indeed introduced in Beam 2.50.0. My guess is that somehow dependencies were upgraded only partially (for example, just the Runner module, but not Core). A good way to make sure that all dependencies are upgraded correctly is to use a BOM (Bill of
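
One way to spot a partial upgrade like this is to compare the resolved versions of all `org.apache.beam` artifacts. This sketch assumes you have already turned your build tool's resolved-dependency report into a `{"group:artifact": version}` map; the map below is written by hand. `find_beam_version_skew` is a hypothetical helper. More than one version group means Beam versions are mixed, which is the situation a BOM is meant to prevent.

```python
from collections import defaultdict
from typing import Dict, List


def find_beam_version_skew(deps: Dict[str, str]) -> Dict[str, List[str]]:
    """Group org.apache.beam artifacts by version; >1 group means a partial upgrade."""
    by_version: Dict[str, List[str]] = defaultdict(list)
    for coordinate, version in deps.items():
        group_id, _, artifact_id = coordinate.partition(":")
        if group_id == "org.apache.beam":
            by_version[version].append(artifact_id)
    return dict(by_version)


resolved = {
    "org.apache.beam:beam-sdks-java-core": "2.49.0",
    "org.apache.beam:beam-runners-google-cloud-dataflow-java": "2.50.0",
    "com.google.guava:guava": "32.1.2-jre",  # non-Beam entries are ignored
}
skew = find_beam_version_skew(resolved)
if len(skew) > 1:
    print("Mixed Beam versions:", skew)
```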

Re: [QUESTION] Why no auto labels?

2023-10-02 Thread Bruno Volpato via user
If I understand the question correctly, you don't have to specify those names. As Reuven pointed out, it is probably a good idea so you have a stable / deterministic graph. But in the Python SDK, you can simply use pcollection | map_fn, instead of pcollection | 'Map' >> map_fn. See an example her
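
Below is a toy model, not Beam's implementation, of how `pcollection | 'Map' >> map_fn` can work in Python. `str` does not implement `>>`, so Python falls back to calling the transform's `__rrshift__` with the label. `ToyTransform` and `ToyPipeline` are illustrative names. Deriving a default label from the function name is only similar in spirit to what the Python SDK does. The duplicate-label check mirrors the requirement that labels be unique within a pipeline.

```python
from typing import Callable, List, Optional


class ToyTransform:
    """Toy stand-in for a PTransform; not Beam's implementation."""

    def __init__(self, fn: Callable[[int], int], label: Optional[str] = None):
        self.fn = fn
        # Without an explicit label, derive one from the function's name.
        self.label = label or f"Map({fn.__name__})"

    def __rrshift__(self, label: str) -> "ToyTransform":
        # `"Label" >> transform` becomes transform.__rrshift__("Label")
        # because str does not implement >>.
        return ToyTransform(self.fn, label)


class ToyPipeline:
    """Records applied transforms and rejects duplicate labels."""

    def __init__(self) -> None:
        self.labels: List[str] = []

    def apply(self, values: List[int], transform: ToyTransform) -> List[int]:
        if transform.label in self.labels:
            raise RuntimeError(f"duplicate label {transform.label!r}")
        self.labels.append(transform.label)
        return [transform.fn(v) for v in values]


def double(x: int) -> int:
    return x * 2


p = ToyPipeline()
out1 = p.apply([1, 2, 3], ToyTransform(double))  # auto label "Map(double)"
out2 = p.apply(out1, "Double again" >> ToyTransform(double))  # explicit label
print(p.labels)  # ['Map(double)', 'Double again']
```

Applying the unlabeled `double` transform a second time would collide with `Map(double)`. That collision is one practical reason to name steps explicitly.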

Re: How can we get multiple side inputs from a single pipeline ?

2023-08-28 Thread Bruno Volpato via user
Hi Sachin, Yes, this seems fine to me -- your DoFn can output to specific tags, and you can then use PCollectionTuple.get(tagX), PCollectionTuple.get(tagY) followed by View.asSingleton, View.asList, etc., to create different PCollectionView instances. Just be careful: you might need different triggers
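
Here is a framework-free sketch of the same shape, with lists standing in for PCollections. One step routes each `(tag, value)` record to the output for its tag. Each output then feeds a different kind of view: a single value for one tag and a list for the other. The tag names and data are hypothetical, and triggering is not modeled.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_tag(records: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Toy multi-output step: each (tag, value) lands in the output for its tag."""
    outputs: Dict[str, List[int]] = {}
    for tag, value in records:
        outputs.setdefault(tag, []).append(value)
    return outputs


records = [("threshold", 10), ("id", 1), ("id", 2), ("id", 3)]
outputs = split_by_tag(records)

# Singleton-style view: only valid if exactly one element arrived for the tag.
threshold_values = outputs.get("threshold", [])
if len(threshold_values) != 1:
    raise ValueError(f"expected one threshold, got {len(threshold_values)}")
threshold = threshold_values[0]

# List-style view: any number of elements is fine.
ids = list(outputs.get("id", []))
print(threshold, ids)  # 10 [1, 2, 3]
```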

Re: Fetch Truststore File Inside a Flex Template image for Confluent Kafka

2023-07-21 Thread Bruno Volpato via user
Hi Somnath, The problem here seems to be that you have */tmp/trust.jks* available when creating the pipeline (in the launcher), but it apparently is not available in the worker VMs (SDK container). In Java we've been using JvmInitializers for Templates to copy files from GCS when the worker start
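
The launcher and the workers are different machines, so a file present at launch time has to be fetched again on each worker. This minimal Python sketch makes the copy idempotent and atomic. In the real setup the source would be a GCS object, read through a GCS client or Beam's `FileSystems` API. Here a local file stands in for it, and `ensure_local_file` is a hypothetical helper. In the Python SDK, a natural place to call something like this is a DoFn's `setup()` method.

```python
import os
import shutil
import tempfile


def ensure_local_file(source: str, destination: str) -> str:
    """Copy source to destination if it is not there yet; safe to call repeatedly."""
    if not os.path.exists(destination):
        directory = os.path.dirname(destination) or "."
        os.makedirs(directory, exist_ok=True)
        # Write to a unique temp file, then rename, so concurrent callers
        # never observe a half-written destination.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        shutil.copyfile(source, tmp_path)
        os.replace(tmp_path, destination)
    return destination


# Stand-ins: a local "staged" file plays the role of the GCS object, and a
# fresh temp directory plays the role of the worker's /tmp.
staging_dir = tempfile.mkdtemp()
staged = os.path.join(staging_dir, "trust.jks")
with open(staged, "wb") as f:
    f.write(b"not a real keystore")

worker_copy = os.path.join(tempfile.mkdtemp(), "tmp", "trust.jks")
ensure_local_file(staged, worker_copy)
ensure_local_file(staged, worker_copy)  # second call is a no-op
```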

Re: [Question] Change default file encoding in Dataflow runners

2023-06-17 Thread Bruno Volpato via user
There isn't anything in our code that sets ANSI file encoding. I will check with Google Support. > On Fri, Jun 16, 2023 at 7:27 AM Bruno Volpato via user <user@beam.apache.org> wrote: >> Hi Ramana, >> Curious where you got ANSI_X3.4-1968 f

Re: [Question] Change default file encoding in Dataflow runners

2023-06-15 Thread Bruno Volpato via user
Hi Ramana, Curious where you got ANSI_X3.4-1968 from -- I don't think there's any trace of this encoding anywhere in Dataflow Workers (as far as I am aware, from what I looked at). The default encoding for the JVM is UTF-8, and Dataflow doesn't appear to set it anywhere. I was able to check using: $ docke
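
`ANSI_X3.4-1968` is the formal name of US-ASCII. A process typically reports that name when it runs under the C/POSIX locale. The Python sketch below illustrates the practical points:

* the name is an alias for ASCII;
* ASCII cannot encode characters such as `ã`;
* passing an encoding explicitly sidesteps the platform default.

On the JVM, the analogous habit is passing a charset such as `StandardCharsets.UTF_8` instead of relying on the default. The sample text and file are placeholders.

```python
import codecs
import locale
import os
import tempfile

# ANSI_X3.4-1968 is registered in Python as an alias of the "ascii" codec.
print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii

# The default used by open() depends on the locale, so it can differ
# between a laptop and a container.
print("preferred encoding:", locale.getpreferredencoding(False))

text = "São Paulo"  # contains a non-ASCII character

try:
    text.encode("ANSI_X3.4-1968")
except UnicodeEncodeError as err:
    print("ASCII cannot represent it:", err.reason)

# Passing encoding explicitly removes the dependence on the platform default.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()
```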

Re: [Error] Unable to submit job to Dataflow Runner V2

2023-05-27 Thread Bruno Volpato via user
Hi Mário, The template that you are using as --gcs-location has to be built with Runner v2 enabled. The produced graph is not compatible across Runner v1 and Runner v2. On Sat, May 27, 2023 at 11:28 AM XQ Hu via user wrote: > Can you check whether your code has any options that contain any of >
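
Runner v2 has to be enabled when the template is built, not only when it is launched. This is a minimal sketch of a pre-build check, assuming Runner v2 is switched on through the `use_runner_v2` experiment in `--flag=value` arguments. The helper `experiments_in` is hypothetical. It accepts both repeated flags and comma-separated values, whether or not a given SDK version parses them the same way. The bucket path is a placeholder.

```python
from typing import List, Set


def experiments_in(args: List[str]) -> Set[str]:
    """Collect experiment names from --experiments flags (repeated or comma-separated)."""
    prefix = "--experiments="
    found: Set[str] = set()
    for arg in args:
        if arg.startswith(prefix):
            found.update(e for e in arg[len(prefix):].split(",") if e)
    return found


template_build_args = [
    "--runner=DataflowRunner",
    "--experiments=use_runner_v2",
    "--template_location=gs://my-bucket/templates/my-template",  # hypothetical path
]
if "use_runner_v2" not in experiments_in(template_build_args):
    raise SystemExit("template would not be built for Runner v2")
```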

Re: How to identify what objects in your code have to be serialized

2023-05-08 Thread Bruno Volpato via user
Hi Sachin, Can you post the error that you are getting? It should provide some additional information / path. If you are trying to use DataClient on the pipeline (inside a PTransform, DoFn, etc), you would have to initialize that client inside the DoFn itself (e.g., @Setup
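
Here is a stdlib-only illustration of why initializing the client inside the DoFn helps. `pickle` stands in for the runner's serializer, and a `threading.Lock` stands in for whatever makes the real client non-serializable. `FakeDataClient`, `EagerFn`, and `LazyFn` are hypothetical. `setup()` plays the role of `@Setup` in Java or `DoFn.setup` in the Python SDK, both of which run on the worker after the function has been deserialized.

```python
import pickle
import threading


class FakeDataClient:
    """Hypothetical client holding an unpicklable resource (a lock stands in for a socket)."""

    def __init__(self):
        self._lock = threading.Lock()

    def lookup(self, key: str) -> str:
        with self._lock:
            return key.upper()


class EagerFn:
    """Creates the client at construction time, so it must be serialized with the fn."""

    def __init__(self):
        self.client = FakeDataClient()


class LazyFn:
    """Defers client creation to setup(), which runs after deserialization."""

    def __init__(self):
        self.client = None

    def setup(self):
        self.client = FakeDataClient()

    def process(self, element: str):
        yield self.client.lookup(element)


try:
    pickle.dumps(EagerFn())
except TypeError as err:
    print("cannot serialize:", err)

fn = pickle.loads(pickle.dumps(LazyFn()))  # roughly what shipping a DoFn involves
fn.setup()
print(list(fn.process("abc")))  # ['ABC']
```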

Re: Vulnerabilities in Transitive dependencies

2023-05-01 Thread Bruno Volpato via user
Hi Joshua, It may take a lot of effort and knowledge to review whether CVEs are exploitable or not. I have seen this kind of analysis done in a few cases, such as SnakeYAML recently: https://s.apache.org/beam-and-cve-2022-1471 ( https://github.com/apache/beam/issues/25449) If there is a patch ava

Re: JDBC to BigQuery table create

2023-04-24 Thread Bruno Volpato via user
Thanks for tagging Ahmed! That's correct, that template isn't prepared to create tables. Creating tables is a bit complicated because you need to be able to infer a schema from the data being read, which may not be easy to generalize for all cases. The template accepts an arbitrary SQL query, and
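
This sketch shows why inferring a table schema from arbitrary query results is hard to generalize. `infer_schema` and its Python-to-BigQuery type mapping are hypothetical and are not what the template does. The sketch surfaces two cases a generic template would have to decide on:

* columns that are NULL in every sampled row;
* columns whose values disagree on type.

```python
import datetime
from typing import Any, Dict, List

# Hypothetical Python-type -> BigQuery-type mapping. Order matters:
# bool is a subclass of int, and datetime.datetime is a subclass of datetime.date.
_TYPE_MAP = [
    (bool, "BOOL"),
    (int, "INT64"),
    (float, "FLOAT64"),
    (str, "STRING"),
    (bytes, "BYTES"),
    (datetime.datetime, "TIMESTAMP"),
    (datetime.date, "DATE"),
]


def infer_schema(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Infer a column -> BigQuery type map from sample rows, failing on ambiguity."""
    schema: Dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue  # NULL carries no type information
            inferred = next((bq for py, bq in _TYPE_MAP if isinstance(value, py)), None)
            if inferred is None:
                raise TypeError(f"no mapping for {column!r}: {type(value).__name__}")
            if schema.setdefault(column, inferred) != inferred:
                raise TypeError(f"{column!r} mixes {schema[column]} and {inferred}")
    untyped = {column for row in rows for column in row} - set(schema)
    if untyped:
        raise ValueError(f"only NULLs seen for {sorted(untyped)}; type is ambiguous")
    return schema


print(infer_schema([{"id": 1, "name": "a", "score": 0.5}]))
```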

Re: Can some one please remove me from this mailing list

2023-04-22 Thread Bruno Volpato via user
Hi Unais, If you want to unsubscribe from this mailing list, you need to send a blank email to user-unsubscr...@beam.apache.org. On Sat, Apr 22, 2023 at 12:54 PM Unais T wrote: > Can some one please remove me from this mailing list

Re: [Question] Beam Java Dataflow v1 Runner Oracle JDK

2023-04-17 Thread Bruno Volpato via user
Hello Hardip, If you are using Beam 2.46.0, it should be using OpenJDK already (not Oracle's JRE as before). No need for the sources; you can check the images directly from your terminal if you have Docker installed: $ docker run -it --entrypoint '/bin/bash' gcr.io/cloud-dataflow/v1beta3/beam-

Re: Error "record is out of upper bound"

2023-03-03 Thread Bruno Volpato via user
Found some more info, sorry for the chopping. Are you using *bigqueryio* or *bigquery_tools* somehow? If so, bigquery_tools defines a histogram using 20 buckets of 3 seconds each to export latencies (see https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L

Re: Error "record is out of upper bound"

2023-03-03 Thread Bruno Volpato via user
Hi Nick, This seems to come from utils/histogram.py. Any chance that you are initializing it in a way that defines bounds up to 60,000 but invoking record() with an out-of-bounds value? Best, Bruno On Fr
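
A toy linear-bucket histogram makes the arithmetic in these two messages concrete. 20 buckets of 3 seconds each cover 20 × 3,000 ms = 60,000 ms, so any latency of 60,000 ms or more falls past the last bucket. `ToyLinearHistogram` is not Beam's `utils/histogram.py`. It simply counts such values in `above`, and these snippets don't show how Beam itself handles them.

```python
from typing import List


class ToyLinearHistogram:
    """Toy linear-bucket histogram; not Beam's utils/histogram.py."""

    def __init__(self, start: float, width: float, num_buckets: int):
        self.start = start
        self.width = width
        self.counts: List[int] = [0] * num_buckets
        self.below = 0  # values smaller than start
        self.above = 0  # values at or past the upper bound

    @property
    def upper_bound(self) -> float:
        return self.start + self.width * len(self.counts)

    def record(self, value: float) -> None:
        if value < self.start:
            self.below += 1
        elif value >= self.upper_bound:
            self.above += 1
        else:
            index = int((value - self.start) // self.width)
            # Guard against float rounding pushing the index to len(counts).
            self.counts[min(index, len(self.counts) - 1)] += 1


# 20 buckets of 3,000 ms each -> upper bound of 60,000 ms.
latencies = ToyLinearHistogram(start=0, width=3_000, num_buckets=20)
print(latencies.upper_bound)  # 60000
for ms in [120, 2_999, 3_000, 59_999, 60_000, 75_000]:
    latencies.record(ms)
print(latencies.counts[:2], latencies.counts[-1], latencies.above)  # [2, 1] 1 2
```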

Re: Launch Dataflow Flex Templates from Go

2023-02-15 Thread Bruno Volpato via user
Hello, The closest equivalent to the Python APIs you mentioned is the client library at https://github.com/googleapis/google-api-go-client. There are some docs that include Go examples: - Flex Templates: https://cloud.google.com/dataflow/docs/samples/dataflow-v1beta3-generated-FlexTemplatesService-LaunchFlexTemplate-syn

Re: DataFlow Template error - SDK not reporting number of elements processed

2023-01-10 Thread Bruno Volpato via user
Hi Patrick, I have a few questions that might help troubleshoot this: Did you use the same SDK? Have you updated Beam or any other dependencies? Are there any other error logs (prior to the trace above) that could help understand it? Do you still have the previous template so you can compare the