Re: Loosing records when using BigQuery IO Connector

2023-04-17 Thread XQ Hu via user
Does FILE_LOADS ( https://beam.apache.org/releases/javadoc/2.46.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#FILE_LOADS) work for your case? For STORAGE_WRITE_API, it has been actively improved. If the latest SDK still has this issue, I highly recommend you to create a Google

Re: JDBC to BIgquery table create

2023-04-24 Thread XQ Hu via user
Do you mean creating the BigQuery table when it does not exist? If so, you can check https://beam.apache.org/documentation/io/built-in/google-bigquery/#create-disposition . On Mon, Apr 24, 2023 at 12:47 AM Himanshu Hazari via user < user@beam.apache.org> wrote: > I am new to Apache beam so please

Re: Loosing records when using BigQuery IO Connector

2023-05-03 Thread XQ Hu via user
https://github.com/apache/beam/issues/26515 tracks this issue. The fix was merged. Thanks a lot for reporting this issue, Binh! On Mon, Apr 17, 2023 at 12:58 PM Binh Nguyen Van wrote: > Hi, > > I tested with streaming insert and file load, and they all worked as > expected. But looks like storag

Re: Apache beam

2023-05-06 Thread XQ Hu via user
You could create a batch pipeline that reads GCS and writes to BigQuery. And you can use this template https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-storage-to-bigquery . On Sat, May 6, 2023 at 1:10 AM Utkarsh Parekh wrote: > Hi, > > I'm writing a simple streaming beam a

Re: [Error] Unable to submit job to Dataflow Runner V2

2023-05-27 Thread XQ Hu via user
Can you check whether your code has any options that contain any of [disable_runner_v2, disable_prime_runner_v2, disable_prime_streaming_engine]? On Sat, May 27, 2023 at 5:29 AM Mário Costa via user wrote: > I have a pipeline built using Apache Beam java SDK version 2.46.0 when > submitting a jo

Re: Tour of Beam - an interactive Apache Beam learning guide

2023-06-15 Thread XQ Hu via user
We already have a Beam Overview there. https://beam.apache.org/get-started/tour-of-beam/ contains some good Colab notebooks, which mainly are just for Python. I suggest we link this to https://tour.beam.apache.org/ but move the current content under Python Quickstart. On Thu, Jun 15, 2023 at 10:11

Re: Pandas 2 Timeline Estimate

2023-07-12 Thread XQ Hu via user
https://github.com/apache/beam/issues/27221#issuecomment-1603626880 This tracks the progress. On Wed, Jul 12, 2023 at 7:37 PM Adlae D'Orazio wrote: > Hello, > > I am currently trying to use Interactive Beam to run my pipelines through > a Jupyter notebook, but I > have internal packages dependi

Re: [QUESTION] Why no auto labels?

2023-10-03 Thread XQ Hu via user
Looks like this is the current behaviour. If you have `t = beam.Filter(identity_filter)`, `t.label` is defined as `Filter(identity_filter)`. On Mon, Oct 2, 2023 at 9:25 AM Joey Tran wrote: > You don't have to specify the names if the callable you pass in is > /different/ for two `beam.Map`s, but

Re: [QUESTION] Why no auto labels?

2023-10-03 Thread XQ Hu via user
That suggests the default label is created as that, which indeed causes the duplication error. On Tue, Oct 3, 2023 at 9:15 PM Joey Tran wrote: > Not sure what that suggests > > On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote: > >> Looks like this is the current behavio

Re: Beam 2.52.0 Release

2023-11-18 Thread XQ Hu via user
Thanks a lot! Great job, Team! On Fri, Nov 17, 2023 at 7:21 PM Danny McCormick via user < user@beam.apache.org> wrote: > I am happy to announce that the 2.52.0 release of Beam has been finalized. > This release includes both improvements and new functionality. > > For more information on changes

Re: Specifying dataflow template location with Apache beam Python SDK

2023-12-18 Thread XQ Hu via user
https://github.com/google/dataflow-ml-starter/tree/main?tab=readme-ov-file#run-the-beam-pipeline-with-dataflow-flex-templates has a full example about how to create your own flex template. FYI. On Mon, Dec 18, 2023 at 5:01 AM Bartosz Zabłocki via user < user@beam.apache.org> wrote: > Hi Sumit, >

Re: Environmental variables not accessible in Dataflow pipeline

2023-12-20 Thread XQ Hu via user
Dataflow VMs cannot know your local env variable. I think you should use custom container: https://cloud.google.com/dataflow/docs/guides/using-custom-containers. Here is a sample project: https://github.com/google/dataflow-ml-starter On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World wrote: > Hello S

Re: Environmental variables not accessible in Dataflow pipeline

2023-12-22 Thread XQ Hu via user
You can use the same docker image for both template launcher and Dataflow job. Here is one example: https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile#L60 On Fri, Dec 22, 2023 at 8:04 AM Sumit Desai wrote: > Yes, I will have to try it out. > > Regards > Sumit

Re: [Question] WaitOn for Reading Step

2023-12-22 Thread XQ Hu via user
When I search the Beam code base, there are plenty of places which use Wait.on. You could check these code for some insights. If this doesn't work, it would be better to create a small test case to reproduce the problem and open a Github issue. Sorry, I cannot help too much with this. On Fri, Dec

Re: [Question] S3 Token Expiration during Read Step

2023-12-22 Thread XQ Hu via user
Can you share some code snippets about how to read from S3? Do you use the builtin TextIO? On Fri, Dec 22, 2023 at 11:28 AM Ramya Prasad via user wrote: > Hello, > > I am a developer trying to use Apache Beam, and I have a nuanced problem I > need help with. I have a pipeline which has to read i

Re: Removing old dataflow jobs

2024-01-02 Thread XQ Hu via user
You can archive jobs now: https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline#archive On Tue, Jan 2, 2024 at 8:47 AM Svetak Sundhar via user wrote: > Hello Sumit, > > There is no requirement to delete old jobs, though you can archive > completed jobs via a recently released feature

Re: Beam 2.53.0 Release

2024-01-05 Thread XQ Hu via user
Great! And thank you! On Fri, Jan 5, 2024 at 2:49 PM Jack McCluskey via user wrote: > We are happy to present the new 2.53.0 release of Beam. > This release includes both improvements and new functionality. > For more information on changes in 2.53.0, check out the detailed release > notes (http

Re: Does withkeys transform enforce a reshuffle?

2024-01-19 Thread XQ Hu via user
I do not think it enforces a reshuffle by just checking the doc here: https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html?highlight=withkeys#apache_beam.transforms.util.WithKeys Have you tried to just add ReShuffle after PubsubLiteIO? On Thu, Jan 18, 2024 at 8:54 PM hs

Re: Query about `JdbcIO`

2024-02-24 Thread XQ Hu via user
I did not find BEAM-13846 but this suggests String is never supported: https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcUtilTest.java#L59 However, you could use the code from the test to create yours. On Thu, Feb 22, 2024 at 11:20 AM Vard

Re: Cross Language Runtime error python-Java

2024-02-24 Thread XQ Hu via user
Does your code work without the launcher? Better check this step by step to figure out which part causes this error. On Sat, Feb 24, 2024 at 3:25 AM George Dekermenjian wrote: > I have a python pipeline that uses the bigquery storage write method > (cross language with Java). I’m building launch

Re: Problem in jdbc connector with autoincrement value

2024-02-24 Thread XQ Hu via user
Here is what I did: CREATE TABLE IF NOT EXISTS test2 (id bigint DEFAULT nextval('sap_tm_customer_id_seq'::regclass) NOT NULL, name VARCHAR(10), load_date_time TIMESTAMP) make sure id cannot be NULL (you might not need this). I tried this for my data without using the id field: class ExampleRow(

Re: Cross Language Runtime error python-Java

2024-02-24 Thread XQ Hu via user
t /opt/apache/beam/jars > /opt/apache/beam/jars COPY --from=apache/beam_java11_sdk:latest > /opt/java/openjdk /opt/java/openjdk ENV JAVA_HOME=/opt/java/openjdk ENV > PATH="${JAVA_HOME}/bin:${PATH}" > > > > On Sat, Feb 24, 2024 at 21:55 XQ Hu via user wrote: > >&

Re: [Question] Python Streaming Pipeline Support

2024-03-08 Thread XQ Hu via user
Is this what you are looking for? import random import time import apache_beam as beam from apache_beam.transforms import trigger, window from apache_beam.transforms.periodicsequence import PeriodicImpulse from apache_beam.utils.timestamp import Timestamp with beam.Pipeline() as p: input = (

Re: How to change SQL dialect on beam_sql magic?

2024-03-08 Thread XQ Hu via user
I do not think the dialect argument is exposed here: https://github.com/apache/beam/blob/a391198b5a632238dc4a9298e635bb5eb0f433df/sdks/python/apache_beam/runners/interactive/sql/beam_sql_magics.py#L293 Two options: 1) create a feature request and PR to add that 2) Switch to SqlTransform On Mon, M

Re: Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-13 Thread XQ Hu via user
Can you explain more about " that current sinks for Avro and Parquet with the destination of GCS are not supported"? We do have AvroIO and ParquetIO ( https://beam.apache.org/documentation/io/connectors/) in Python. On Wed, Mar 13, 2024 at 5:35 PM Ondřej Pánek wrote: > Hello Beam team! > > > >

Re: java.lang.ClassCastException: class java.lang.String cannot be cast to class...

2024-03-17 Thread XQ Hu via user
Here is what I did including how I setup the portable runner with Flink 1. Start the local Flink cluster 2. Start the Flink job server and point to that local cluster: docker run --net=host apache/beam_flink1.16_job_server:latest --flink-master=localhost:8081 3. I use these pipeline options in the

Re: how to enable debugging mode for python worker harness

2024-03-17 Thread XQ Hu via user
I cloned your repo on my Linux machine, which is super useful to run. Not sure why you use Beam 2.41 but anyway, I tried this on my Linux machine: python t.py \ --topic test --group test-group --bootstrap-server localhost:9092 \ --job_endpoint localhost:8099 \ --artifact_endpoint localhost:8

Re: how to enable debugging mode for python worker harness

2024-03-18 Thread XQ Hu via user
oblem is related to only the image, thanks. > > The goal for this repo is to complete my previous talk: > https://www.youtube.com/watch?v=XUz90LpGAgc&ab_channel=ApacheBeam > > On Sun, Mar 17, 2024 at 8:07 AM XQ Hu via user > wrote: > >> I cloned your repo on my Linux ma

Re: What's the current status of pattern matching with Beam SQL?

2024-03-23 Thread XQ Hu via user
https://beam.apache.org/documentation/dsls/sql/zetasql/overview/ and https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java for the supported functions. On Sat, Mar 23, 2024 at 5:50 

Re: Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-23 Thread XQ Hu via user
s this error (when deploying > with DataflowRunner). So obviously, there is no global window or default > trigger. That’s, I believe, what’s described in the issue: > https://github.com/apache/beam/issues/25598 > > > > > > *From: *Ondřej Pánek > *Date: *Thursday,

Re: What's the current status of pattern matching with Beam SQL?

2024-03-24 Thread XQ Hu via user
gt; eg) Apache Flink CEP. > > Both the tickets have an associating GitHub issue and no update for more > than 1 year, which means they are not likely to be completed in the near > future? > > Cheers, > Jaehyeon > > > On Sun, 24 Mar 2024 at 12:02, XQ Hu via user

Re: DLQ Implementation

2024-03-27 Thread XQ Hu via user
You can check https://github.com/search?q=repo%3Ajohnjcasey%2Fbeam%20withBadRecordErrorHandler&type=code. The test codes show how to use them. More doc will be added later. On Wed, Mar 27, 2024 at 7:15 PM Ruben Vargas wrote: > Hello all > > Maybe a silly question. Are there any suggestions for

Re: how to enable debugging mode for python worker harness

2024-03-31 Thread XQ Hu via user
python worker harness is > starting to work) > > Wondering if you can share what you've changed (maybe a PR) so that I can > test again on my linux machine. Thanks so much for your help. There's > someone else also pinging me on the same error when testing, and I do wa

Re: Is Pulsar IO Connector Officially Supported?

2024-04-10 Thread XQ Hu via user
I think PulsarIO needs more work to be polished. On Tue, Apr 9, 2024 at 2:48 PM Vince Castello via user wrote: > I see that a Pulsar connector was made available as of BEAM 2.38.0 release > but I don't see Pulsar as an official connector on the page below. Is the > Pulsar IO connector official o

Re: Is Pulsar IO Connector Officially Supported?

2024-04-10 Thread XQ Hu via user
Sounds like a good idea to add a new section. Let me chat with the team about that. Thanks. On Wed, Apr 10, 2024 at 12:09 PM Ahmet Altay wrote: > Pulsar IO did not change much since it was originally added in 2022. You > can find about the gaps in this presentation ( > https://2022.beamsummit.or

Re: Any recomendation for key for GroupIntoBatches

2024-04-14 Thread XQ Hu via user
To enrich your data, have you checked https://cloud.google.com/dataflow/docs/guides/enrichment? This transform is built on top of https://beam.apache.org/documentation/io/built-in/webapis/ On Fri, Apr 12, 2024 at 4:38 PM Ruben Vargas wrote: > On Fri, Apr 12, 2024 at 2:17 PM Jaehyeon Kim wrote:

Re: Any recomendation for key for GroupIntoBatches

2024-04-15 Thread XQ Hu via user
sages > in batches to the external API in order to gain some performance > (don't expect to send 1 http request per message). > > Thank you very much for all your responses! > > > On Sun, Apr 14, 2024 at 8:28 AM XQ Hu via user > wrote: > > > > To enrich your

Re: Any recomendation for key for GroupIntoBatches

2024-04-15 Thread XQ Hu via user
at is not clear to me is what are the advantages of using it? Is > > >> only the error/retry handling? anything in terms of performance? > > >> > > >> My PCollection is unbounded but I was thinking of sending my messages > > >> in batches to the external API

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-04 Thread XQ Hu via user
I played with your example. Indeed, create_tracker in your ProcessFilesFn is never called, which is quite strange. I could not find any example that shows the chained SDFs, which makes me wonder whether the chained SDFs work. @Chamikara Jayalath Any thoughts? On Fri, May 3, 2024 at 2:45 AM Jaehy

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread XQ Hu via user
; Jaehyeon > > On Sun, 5 May 2024 at 09:21, XQ Hu via user wrote: > >> I played with your example. Indeed, create_tracker in your ProcessFilesFn >> is never called, which is quite strange. >> I could not find any example that shows the chained SDFs, which makes me &g

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread XQ Hu via user
orks with the FlinkRunner. Thank you so much! > > Cheers, > Jaehyeon > > [image: image.png] > > On Mon, 6 May 2024 at 02:49, XQ Hu via user wrote: > >> Have you tried to use other runners? I think this might be caused by some >> gaps in Python DirectRunner to

Re: Fails to deploy a python pipeline to a flink cluster

2024-05-11 Thread XQ Hu via user
Do you still have the same issue? I tried to follow your setup.sh to reproduce this but somehow I am stuck at the word_len step. I saw you also tried to use `print(kafka_kv)` to debug it. I am not sure about your current status. On Fri, May 10, 2024 at 9:18 AM Jaehyeon Kim wrote: > Hello, > > I'

Re: Question: Java Apache Beam, mock external Clients initialized in Setup

2024-05-25 Thread XQ Hu via user
I am not sure which part you want to test. If the processData part should be tested, you could refactor the code without use any Beam specific code and test the processing data logic. >From your example, it seems that you are calling some APIs, we recently added a new Web API IO: https://beam.apac

Re: Error handling for GCP Pub/Sub on Dataflow using Python

2024-05-25 Thread XQ Hu via user
I do not suggest you handle this in beam.io.WriteToPubSub. You could change your pipeline to add one transform to check the message size. If it is beyond 10 MB, you could use another sink or process the message to reduce the size. On Fri, May 24, 2024 at 3:46 AM Nimrod Shory wrote: > Hello group

Re: Query about autinference of numPartitions for `JdbcIO#readWithPartitions`

2024-05-31 Thread XQ Hu via user
You should be able to configure the number of partition like this: https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Java/src/main/java/jdbc/ReadPartitionsJdbc.java#L132 The code to auto infer the number of partitions seems to be unreachable (I haven't checked this carefully). M

Re: Question: Pipelines Stuck with Java 21 and BigQuery Storage Write API

2024-06-03 Thread XQ Hu via user
Probably related to the strict encapsulation that is enforced with Java 21. Use `--add-opens=java.base/java.lang=ALL-UNNAMED` as the JVM flag could be a temporary workaround. On Mon, Jun 3, 2024 at 3:04 AM 田中万葉 wrote: > Hi all, > > I encountered an UnsupportedOperationException when using Java 2

Re: Beam + VertexAI

2024-06-09 Thread XQ Hu via user
If you have a Vertex AI model, try https://cloud.google.com/dataflow/docs/notebooks/run_inference_vertex_ai If you want to use the Vertex AI model to do text embedding, try https://cloud.google.com/dataflow/docs/notebooks/vertex_ai_text_embeddings On Sun, Jun 9, 2024 at 4:40 AM Sofia’s World wrot

Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-10 Thread XQ Hu via user
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies is a great example. On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user < user@beam.apache.org> wrote: > In this case the Python version will be defined by the Python v

Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-11 Thread XQ Hu via user
ftester-image": > KeyError: 'setuptools_scm' > Step #0 - "build-shareloader-template": Step #4 - "dftester-image": > running bdist_wheel > > > > > It is somehow getting messed up with a toml ? > > > Could anyone adv

Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-12 Thread XQ Hu via user
emplate": Step #4 - "dftester-image": >>> section = defn.get("tool", {})[tool_name] >>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": >>> ~~~~~~~~~~~~^

Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-16 Thread XQ Hu via user
ker file >>>>>> >>>>>> >>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester >>>>>> >>>>>> I was using a setup.py as well, but then i commented out t

Re: KafkaIO metric publishing

2024-06-19 Thread XQ Hu via user
Is your job a Dataflow Template job? The error is caused by https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/DataflowTemplateJob.java#L55 . So basically DataflowTemplateJob does not support metrics. On Wed, Jun 19,

Re: [Question] Issue with MongoDB Read in Apache Beam - InvalidBSON Error

2024-07-11 Thread XQ Hu via user
You are welcome to create a PR to fix this issue if you need to change the connector source code. On Sun, Jul 7, 2024 at 5:39 AM Marcin Stańczak wrote: > Hello Apache Beam Community, > > I'm Marcin and I am currently working on a project using Apache Beam > 2.57.0. I have encountered an issue wh

Re: Apache beam github repo collaborator

2024-07-13 Thread XQ Hu via user
Welcome to Beam! You can start contributing now. Some useful docs: - https://github.com/apache/beam/blob/master/CONTRIBUTING.md - https://github.com/apache/beam/tree/master/contributor-docs - https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides You can start with some good

Re: async/await logic in a Beam DoFn task

2024-08-25 Thread XQ Hu via user
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Wait.html Is this something you are looking for? On Sun, Aug 25, 2024 at 6:21 AM Sofia’s World wrote: > HI all > i want to write a pipeline where, as part of one of the steps, i will > need to use > an await call

Re: Transform Pattern Question

2024-10-11 Thread XQ Hu via user
This sounds like what CDC (Change Data Capture) typically does, which usually runs as a streaming pipeline. On Fri, Oct 11, 2024 at 3:51 PM Joey Tran wrote: > Another basic pattern question for the user group. > > Say I have a database of records with an ID and some float property. > Another tea

Re: Thanks!

2024-10-05 Thread XQ Hu via user
Thanks for sharing the article! You could put your example under https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples if you want to share this with the community. On Sat, Oct 5, 2024 at 1:46 PM Henry Tremblay wrote: > Thanks for the help on writing to Kafka. Here is my sh

Re: Thanks!

2024-10-07 Thread XQ Hu via user
As long as you have a GitHub account, you could do a Pull Request. :) On Mon, Oct 7, 2024 at 1:42 AM Henry Tremblay wrote: > I believe I have to be a developer? > > On Sat, Oct 5, 2024 at 4:32 PM XQ Hu via user > wrote: > >> Thanks for sharing the article! >> You c

Re: Question about "Guide to common Cloud Dataflow use-case patterns" series

2024-10-16 Thread XQ Hu via user
https://github.com/GoogleCloudPlatform/dataflow-solution-guides has more guides. On Wed, Oct 16, 2024 at 1:53 AM LDesire wrote: > Hello Beam Community, > > I believe there are some people currently working at Google in this group, > and I would like to ask a question. > > There is an article ser

Re: Where to import external dependencies when using Flex Template

2024-10-16 Thread XQ Hu via user
It is fine to put that import inside the process method. I think Dataflow probably complains about this due to your template launcher image that does not install `psycopg2`. On Wed, Oct 16, 2024 at 6:08 PM Henry Tremblay via user < user@beam.apache.org> wrote: > Not exactly Apache Beam, but I not

Re: Bean dataflow job suddenly breaks due to input arguments

2024-10-05 Thread XQ Hu via user
This looks strange. It looks like add_argument is called twice. I am not sure it will work but can you put all your parser to a function like parse_known_args and then call it from run? e.g., https://github.com/apache/beam/blob/d4dd58b2c4c4b5867ace4bdd34e1bcc32de963cc/sdks/python/apache_beam/exam

Re: Solution to import problem

2024-11-04 Thread XQ Hu via user
nd worker because > of the ENTRYPOINT > > On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user > wrote: > >> I think the problem is you do not specify sdk_container_image when >> running your template. >> >> >> https://github.com/GoogleCloudPlatform/python

Re: Solution to import problem

2024-11-03 Thread XQ Hu via user
: > You shouldn’t have to use an sdk_container_image. It is not in the docs, > and I talked to Google, and they said a container image is not needed. > Also, why does the requests library work, and the secret manager does not? > > > > *From:* XQ Hu via user > *Sent:* Sunday,

Re: Solution to import problem

2024-11-03 Thread XQ Hu via user
I think the problem is you do not specify sdk_container_image when running your template. https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template has more details. Basically, you do not need to https://github.com/pau

Re: ConsumerPollingTimeout

2024-09-27 Thread XQ Hu via user
yes, for Python, check consumer_polling_timeout in https://beam.apache.org/releases/pydoc/current/apache_beam.io.kafka.html. On Fri, Sep 27, 2024 at 3:55 PM Utkarsh Parekh wrote: > Hi Team, > > Is consumer polling timeout configurable for KafkaIO.read? > > Utkarsh >

Re: Support for Flink 1.19 or 1.20

2024-09-20 Thread XQ Hu via user
I suggest you open a feature request issue like https://github.com/apache/beam/issues/30789 to track this request. On Thu, Sep 19, 2024 at 1:20 PM Alek Mitrevski wrote: > Is there a roadmap or support planned for Flink 1.19 or 1.20 releases? >

Re: ConsumerPollingTimeout

2024-10-01 Thread XQ Hu via user
Can you share an example to reproduce this issue? On Fri, Sep 27, 2024 at 4:04 PM Utkarsh Parekh wrote: > For the Java SDK, it's not working. > > There is GitHub issue related to it - > https://github.com/apache/beam/pull/30877 > > > On Fri, Sep 27, 2024 at 1:02 

Re: Questions about file_upload method in BigQueryIO

2024-10-02 Thread XQ Hu via user
Have you checked https://cloud.google.com/dataflow/docs/guides/write-to-bigquery? autosharding is generally recommended. If the cost is the concern, have you checked STORAGE_API_AT_LEAST_ONCE? On Wed, Oct 2, 2024 at 2:16 PM hsy...@gmail.com wrote: > We are trying to process over 150TB data(stre

Re: Solution to import problem

2024-11-07 Thread XQ Hu via user
gt;> >> >> ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}" >> >> ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=${BASE}/$SETUP >> >> In my worker image. >> >> >> >> What is not completely clear to me is why you need the setup.py to

Re: Beam 2.61.0 Release

2024-11-25 Thread XQ Hu via user
Great job! Thanks for your work! On Mon, Nov 25, 2024 at 3:45 PM Danny McCormick via user < user@beam.apache.org> wrote: > Hi, I am happy to announce that Beam 2.61.0 has been fully released. For > more information about the release, check out the release notes - > https://github.com/apache/beam/

Re: Problem running pipeline in unit tests

2024-12-03 Thread XQ Hu via user
Can you share the full example to reproduce this? On Mon, Dec 2, 2024 at 4:30 PM Sofia’s World wrote: > HI all >i am starting writing beam test in a new environment and suddenly i am > finding this error > while running a very simple test > > def test_word_count(self): > with TestPipeline()

Re: [Question] Multi-language Pipelines: Python BigTable Enrichment Transform in Java

2024-12-24 Thread XQ Hu via user
I created one example here https://github.com/liferoad/beam-multi_language_inference by following https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/multi_language_inference. This is not the same as what you asked for. But this should work if you package BigTableE

Re: Commit Kafka message after processing

2024-12-20 Thread XQ Hu via user
Assuming you are using Dataflow as your runner, please check https://cloud.google.com/dataflow/docs/guides/read-from-kafka On Fri, Dec 20, 2024 at 12:03 PM Utkarsh Parekh wrote: > Hello everyone, > > For Python SDK users, what configuration do you use in the beam pipeline > to ensure committing

Re: [Question]: Apache Beam Test Pipeline Configuration

2024-11-21 Thread XQ Hu via user
Hard to tell what went wrong. Have you tried our Java starter example ( https://github.com/apache/beam-starter-java)? This should work just fine with TestPipeline. On Thu, Nov 21, 2024 at 5:56 PM Ramya Prasad via user wrote: > Hello, > I am a developer trying to use Apache Beam in Java, and I am

Re: [Bug] Problem with subscription

2025-01-22 Thread XQ Hu via user
cc this to the user list to test my recent subscription with my private gmail. On Wed, Jan 22, 2025 at 3:44 PM Robert Bradshaw via dev wrote: > Welcome to the community, Enrique! > > I have no idea why the subscriptions aren't working, or how to debug > this. Apache infra would probably have peo

Re: Unable to run in python-worker-harness after bumpping from 2.41.0 to 2.60.0

2025-01-27 Thread XQ Hu via user
Hi Lydian, I do not know how you setup your FlinkRunner. Recently, I tested several ways (you can check https://github.com/liferoad/beam-ml-flink) to run a simple Beam ML job on FlinkRunner and they all seem to be working fine. XQ On Mon, Jan 27, 2025 at 12:28 PM Ahmet Altay wrote: > I do not

Beam 2.64.0 Release

2025-04-01 Thread XQ Hu via user
Hi, I am happy to announce that Beam 2.64.0 has been fully released. For more information about the release, check out the release notes - https://github.com/apache/beam/releases/tag/v2.64.0. Thanks, XQ Hu

Re: Beam YAML is great!

2025-04-29 Thread XQ Hu via user
We are glad you like it. For case studies, if you are interested in this, please let me know. Some links: https://beam.apache.org/case-studies/ and https://beam.apache.org/community/case-study/. :) The no-code experience is one of our focuses for Beam 3.0 as well. On Tue, Apr 29, 2025 at 7:39 PM

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-25 Thread XQ Hu via user
Apache Beam provides a built-in mechanism specifically for managing per-key-and-window state that persists across workers and pipeline restarts. Is there anything you can not use https://beam.apache.org/documentation/programming-guide/#state-and-timers? On Fri, Apr 25, 2025 at 8:45 AM Shaochen Bai

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-28 Thread XQ Hu via user
te-persistence-specs> > > <https://stackoverflow.com/questions/69835743/dataflow-state-persistence-specs> > > > On 25 Apr 2025, at 17:12, XQ Hu via user wrote: > > Apache Beam provides a built-in mechanism specifically for managing > per-key-and-window state that

Re: Uploading Beam Inference example

2025-05-03 Thread XQ Hu via user
Feel free to create a PR for your example. Thanks! On Sat, May 3, 2025 at 12:43 PM Sofia’s World wrote: > HI All > i have written a sample pipeline using Apache Beam which uses > RunInference to pass > to an LLM a list of stocks and get it to interpret - based on some > criteria - , which one

Re: Beam YAML Side Inputs?

2025-05-01 Thread XQ Hu via user
We do not have a plan to support this yet. We are trying to package all these into some higher level APIs. For example, YAML does not surface Reshuffle but Create ( https://beam.apache.org/releases/yamldoc/current/#create) has the boolean option to add this after Create. On Thu, May 1, 2025 at 3:5

Re: [python] Beam Education Material for Workshops

2025-03-08 Thread XQ Hu via user
We probably have many online resources that cover these topics but they are scattered. For example, Beam Summit and College talks on Youtube: https://www.youtube.com/@ApacheBeamYT (Beam Summit slides can be found here: https://beamsummit.org/) and https://www.youtube.com/@BeamCollege ( https://beam

Re: IO Connector for AliCloud's MaxCompute Bigdata store

2025-02-24 Thread XQ Hu via user
I am not sure anyone is working on this. This is not on the roadmap, either. On Mon, Feb 24, 2025 at 6:20 AM Rajath BK wrote: > Bumping up for attention... > > - Thanks and Regards > Rajath > > > On Tue, Feb 18, 2025 at 11:10 PM Rajath BK wrote: > >> Hello community folks, >> We have a req

Re: Beam 2.65.0 Release

2025-05-12 Thread XQ Hu via user
Thank you, Yi! Great news! On Mon, May 12, 2025 at 3:34 PM Yi Hu via dev wrote: > Hi, > > I am happy to announce that Beam 2.65.0 has been fully released. For > more information about the release, check out the release notes - > https://github.com/apache/beam/releases/tag/v2.65.0. > > Thanks, >

Re: Splitable DoFn

2025-07-08 Thread XQ Hu via user
How you checked https://beam.apache.org/blog/splittable-do-fn-is-available/? It lists some real world examples. On Tue, Jul 8, 2025 at 6:59 PM Zack Culberson wrote: > Hi all, > > I have been looking for a working example of a splitable DoFn and what is > needed to implement one. I have added som

Re: Beam 2.66.0 Release

2025-07-01 Thread XQ Hu via user
Great job, Vitaly! On Tue, Jul 1, 2025 at 2:12 PM Vitaly Terentyev via dev wrote: > Hi, > > I am happy to announce that Beam 2.66.0 has been fully released. For > more information about the release, check out the release notes - > https://github.com/apache/beam/releases/tag/v2.66.0. > > Thanks,

Re: max timeout for dataflow beam jobs

2025-06-28 Thread XQ Hu via user
It should still work. But it is now accessible with https://cloud.google.com/dataflow/docs/reference/service-options#python. For example, --dataflowServiceOptions=max_workflow_runtime_walltime_seconds=300 On Sat, Jun 28, 2025 at 6:27 AM Marc _ wrote: > hI all > i enquired on this long time ago

Re: Question about Beam SQL and Security

2025-06-27 Thread XQ Hu via user
I think your understanding is correct. https://docs.google.com/document/d/1tJapdA7ZNwkU0NaK7p-em0XnpHqNE1pKIXw9hVJkIUg/edit?tab=t.0#heading=h.83zu2vb65i5v has more details. On Fri, Jun 27, 2025 at 12:11 PM Ronoaldo Pereira wrote: > Hi! I have a question about Apache Beam and SQL... A colleague a