Re: Question: Best Practice for periodic file flushing in streaming DoFn

2025-08-01 Thread Jin An via user
Reuven Lax Sent: Friday, August 1, 2025 9:45 AM To: user@beam.apache.org Cc: Jin An Subject: Re: Question: Best Practice for periodic file flushing in streaming DoFn This is incorrect - FinishBundleContext is only valid inside of finishBundle. You cannot save it beyond the method scope. Have yo

Re: Question: Best Practice for periodic file flushing in streaming DoFn

2025-08-01 Thread Reuven Lax via user
This is incorrect - FinishBundleContext is only valid inside of finishBundle. You cannot save it beyond the method scope. Have you looked at the existing file sinks? Do those not work for your use case? On Fri, Aug 1, 2025 at 9:29 AM Jin An via user wrote: > Hi Beam Community, > > I'm reac

Re: Splitable DoFn

2025-07-18 Thread Israel Herraiz via user
Are you using Python or Java? In Java, there is a good example in chapter 7 of this book (repo with code

Re: EXTERNAL Email: Re: Splitable DoFn

2025-07-08 Thread Jaehyeon Kim
> Thank you, > Zack Culberson > -- > From: XQ Hu via user > Sent: Tuesday, July 8, 2025 7:35 PM > To: user@beam.apache.org > Cc: XQ Hu > Subject: EXTERNAL Email: Re: Splitable DoFn > > Have you checked > https://beam.apache.org/blo

Re: EXTERNAL Email: Re: Splitable DoFn

2025-07-08 Thread Zack Culberson
Sent: Tuesday, July 8, 2025 7:35 PM To: user@beam.apache.org Cc: XQ Hu Subject: EXTERNAL Email: Re: Splitable DoFn Have you checked https://beam.apache.org/blog/splittable-do-fn-is-available/

Re: Splitable DoFn

2025-07-08 Thread XQ Hu via user
Have you checked https://beam.apache.org/blog/splittable-do-fn-is-available/? It lists some real-world examples. On Tue, Jul 8, 2025 at 6:59 PM Zack Culberson wrote: > Hi all, > > I have been looking for a working example of a splittable DoFn and what is > needed to implement one. I have added som

Re: Beam 2.66.0 Release

2025-07-01 Thread XQ Hu via user
Great job, Vitaly! On Tue, Jul 1, 2025 at 2:12 PM Vitaly Terentyev via dev wrote: > Hi, > > I am happy to announce that Beam 2.66.0 has been fully released. For > more information about the release, check out the release notes - > https://github.com/apache/beam/releases/tag/v2.66.0. > > Thanks,

Re: Twister2 Runner

2025-06-30 Thread Kenneth Knowles
+1 good idea. If the upstream project is not updating then most likely this can be archived/removed. We did this for Apex and Gearpump without waiting for a major version bump. Here is that old thread: https://lists.apache.org/thread/p63nbgb6hx0rl8287zjt6q96zwwrtqmr Kenn On Fri, Jun 27, 2025 at

Re: max timeout for dataflow beam jobs

2025-06-28 Thread XQ Hu via user
It should still work. But it is now accessible via https://cloud.google.com/dataflow/docs/reference/service-options#python. For example, --dataflowServiceOptions=max_workflow_runtime_walltime_seconds=300 On Sat, Jun 28, 2025 at 6:27 AM Marc _ wrote: > Hi all > I enquired about this a long time ago

Re: Question about Beam SQL and Security

2025-06-27 Thread dan young
unsubscribe On Fri, Jun 27, 2025 at 10:42 AM XQ Hu via user wrote: > I think your understanding is correct. > https://docs.google.com/document/d/1tJapdA7ZNwkU0NaK7p-em0XnpHqNE1pKIXw9hVJkIUg/edit?tab=t.0#heading=h.83zu2vb65i5v > has more details. > > On Fri, Jun 27, 2025 at 12:11 PM Ronoaldo Pere

Re: Question about Beam SQL and Security

2025-06-27 Thread XQ Hu via user
I think your understanding is correct. https://docs.google.com/document/d/1tJapdA7ZNwkU0NaK7p-em0XnpHqNE1pKIXw9hVJkIUg/edit?tab=t.0#heading=h.83zu2vb65i5v has more details. On Fri, Jun 27, 2025 at 12:11 PM Ronoaldo Pereira wrote: > Hi! I have a question about Apache Beam and SQL... A colleague a

Re: Twister2 Runner

2025-06-27 Thread Yi Hu via user
Hi Joey, Thanks for raising this question. As far as I know there are a few tests exercising on twister2 runner: - https://github.com/apache/beam/actions/workflows/beam_PostCommit_Java_ValidatesRunner_Twister2.yml - https://github.com/apache/beam/actions/workflows/beam_PostRelease_NightlySnapshot

Re: Catching Up with a Streaming Pipeline

2025-05-30 Thread Jonathan Hope
The pipeline sees the message in DoFns before the windowing. However, any messages that were published before the pipeline starts will not end up in windows. As near as I can tell, windows cannot have data older than the processing time at which the pipeline was started. On Wed, May 21, 2025 at 12:52 P

Re: Catching Up with a Streaming Pipeline

2025-05-21 Thread Reuven Lax via user
If you create a Pub/Sub subscription before you start the Beam pipeline, the subscription will capture all of those messages (as long as you start the Beam pipeline within 7 days). You can then start the Beam pipeline against that subscription, which should do what you want. On Fri, May 16, 2025 a

Re: Writing to partitioned BigQuery tables

2025-05-15 Thread Lina Mårtensson via user
Thanks Radek! Very delayed response here... but we've solved the problem now. We finally just went through a somewhat painful migration to a new dataset where the tables were partitioned correctly, flipped the flags on all our jobs at the same time, etc etc. A bit painful but I think less risky th

Re: [Question] How to maintain cross-window state

2025-05-15 Thread Shaochen Bai
Interesting suggestion. Are you proposing that we transform our original windowing strategy into a global window + OrderedListState + timer? Then the state can indeed persist throughout the entire execution. The only problem left, I think, is that the state would be lost whenever we cancel and redeplo

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-14 Thread Kenneth Knowles
I wonder, then, if the phantom type trick might be less "extra burden for type safety" and more unambiguously helpful. It might look something like this (I'm sure someone has written how to do this in Python properly but I'm just winging it) class OutputTag[T]: # a string, basically, and I will

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-14 Thread Jack McCluskey via user
Ah yeah, unfortunately tagged outputs currently inherit the output typing of the parent DoFn. It's a bit of a pain, since a "correct" output type hint becomes the union of all of the possible output types of the DoFn (and becomes really hard to scope back down for DoFns that consume specific tagged

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-14 Thread Kenneth Knowles
Replying to break the silence - in Java the DoFn is done according to the main output type, then phantom types on the output tag are used to make sure non-main outputs are type safe (I wouldn't expect this sort of technique in Python) Anyone who is more expert in Beam Python typing stuff? +Jack Mc

Re: Beam 2.65.0 Release

2025-05-12 Thread XQ Hu via user
Thank you, Yi! Great news! On Mon, May 12, 2025 at 3:34 PM Yi Hu via dev wrote: > Hi, > > I am happy to announce that Beam 2.65.0 has been fully released. For > more information about the release, check out the release notes - > https://github.com/apache/beam/releases/tag/v2.65.0. > > Thanks, >

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-12 Thread Robert Bradshaw
On Fri, May 9, 2025 at 3:14 PM Joey Tran wrote: > Seems like a hard problem. I suppose it could look something like: > ``` > def process(self, x) -> Iterable[str | TaggedOutputs[{"numbers": int]] > ``` > A little ugly... > Yeah, something like this was what I was thinking. Something that both th

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-09 Thread Robert Bradshaw
Unfortunately type hints have not yet been implemented for multiple-output Fns (though I think perhaps Jack was looking into this?) On Fri, May 9, 2025 at 2:40 PM Joey Tran wrote: > Is it to just type it based on the main output? > def process(self, x) -> str: > yield "x" > yield TaggedOu

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-09 Thread Joey Tran
Seems like a hard problem. I suppose it could look something like: ``` def process(self, x) -> Iterable[str | TaggedOutputs[{"numbers": int]] ``` A little ugly... On Fri, May 9, 2025 at 6:00 PM Robert Bradshaw wrote: > Unfortunately type hints have not yet been implemented for > multiple-output

Re: [Question] How to maintain cross-window state

2025-05-06 Thread Kenneth Knowles
It does sound to me like your use case may be a good fit for using DoFn with state. I realize now that RequiresTimeSortedInput may not be supported for the configuration. As a workaround, you can use OrderedListState to buffer elements and then process them in order using a timer to "wake up" your

Re: [Question] How to maintain cross-window state

2025-05-06 Thread Shaochen Bai
Hello, Our objective is to maintain a persistent, time-relevant state per key. What we do now is that we use non-overlapping windows and apply GroupByKey.create() to gather an array of windowed data for each key. We then sort the data by timestamp and iterate through the array to update the asso

Re: [Question] How to maintain cross-window state

2025-05-05 Thread Kenneth Knowles
Hello! This is not possible in a simple way, because of the main fact: windows are processed simultaneously. Many windows may have some state and incoming data at the same time, even if the time ranges of your windows do not overlap. So, sharing state across windows would need concurrency control

Re: Uploading Beam Inference example

2025-05-03 Thread XQ Hu via user
Feel free to create a PR for your example. Thanks! On Sat, May 3, 2025 at 12:43 PM Sofia’s World wrote: > HI All > i have written a sample pipeline using Apache Beam which uses > RunInference to pass > to an LLM a list of stocks and get it to interpret - based on some > criteria - , which one

Re: Beam YAML Side Inputs?

2025-05-01 Thread XQ Hu via user
We do not have a plan to support this yet. We are trying to package all these into some higher level APIs. For example, YAML does not surface Reshuffle but Create ( https://beam.apache.org/releases/yamldoc/current/#create) has the boolean option to add this after Create. On Thu, May 1, 2025 at 3:5

Re: Bizzarre behavour in Apache Beam pipeline after adding a RunInference pipeline / ignoe

2025-05-01 Thread Sofia’s World
Apologies, operator error :( On Thu, May 1, 2025 at 4:33 PM Sofia’s World wrote: > HI all > was wondering if someone can advise here as i am puzzled on what is > happening (and perhaps it's my poor understanding of beam) > > Here's my usecase - very simplified -: > > I have 3 collections whic

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-30 Thread Shaochen Bai
Hello, Thank you all for your responses — I now have a much clearer understanding of how state works in Apache Beam. The data we currently store in Bigtable is critical, and we want to ensure it is never lost. Duplicates are not an issue for us, as we always perform idempotent updates to uniqu

Re: Beam YAML is great!

2025-04-29 Thread Robert Bradshaw
Thanks for the feedback! Glad it's working out so well for you. A case study (when you get to that point) is a great idea. On Tue, Apr 29, 2025 at 7:24 PM Joey Tran wrote: > Sure! I'll work on putting together a case study in a couple months once > one of our products are ready. The Beam YAML ca

Re: Beam YAML is great!

2025-04-29 Thread Joey Tran
Sure! I'll work on putting together a case study in a couple of months once one of our products is ready. The Beam YAML case study probably won't be ready for a while, though, since the systems we're considering building with it are in very early stages. On Tue, Apr 29, 2025, 8:21 PM XQ Hu via us

Re: Beam YAML is great!

2025-04-29 Thread XQ Hu via user
We are glad you like it. For case studies, if you are interested in this, please let me know. Some links: https://beam.apache.org/case-studies/ and https://beam.apache.org/community/case-study/. :) The no-code experience is one of our focuses for Beam 3.0 as well. On Tue, Apr 29, 2025 at 7:39 PM

Re: Beam YAML is great!

2025-04-29 Thread Ahmet Altay via user
Great to hear and thank you for the feedback Joey! Would you be interested in publishing a case study on Beam's website? We will all very much appreciate that :) On Tue, Apr 29, 2025 at 2:41 PM Joey Tran wrote: > We've just upgraded beam to 2.63 and started prototyping and building on > Beam YA

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-29 Thread Reuven Lax via user
Pipeline state persists across pipeline updates - i.e. if you update the job to a new one. If you cancel the job and restart, then you generally lose the state. Writing to an external store such as BigTable from your DoFn can be tricky both from a performance perspective and a correctness perspec

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-28 Thread XQ Hu via user
Please check https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#state-data and https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#C. Check https://beam.apache.org/blog/stateful-processing/, which has more details about how to use stateful DoFns. The state persists. And you can

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-28 Thread Shaochen Bai
Hello, Thank you for your response. I was not aware that state in Apache Beam persists across different jobs — there seem to be very few open resources discussing this. Here is one of the few I found. I do have some concerns regarding state management: 1. Does the state persist if the pipeline

Re: [Question] Best Practices for Managing Persistent State with Bigtable in Streaming Beam Pipelines

2025-04-25 Thread XQ Hu via user
Apache Beam provides a built-in mechanism specifically for managing per-key-and-window state that persists across workers and pipeline restarts. Is there anything you can not use https://beam.apache.org/documentation/programming-guide/#state-and-timers? On Fri, Apr 25, 2025 at 8:45 AM Shaochen Bai

Re: Limiting the Number of Parallel Reads/Writes

2025-04-10 Thread Jonathan Hope
Right so in that case we really have one knob to turn: the number of records (bundle size). We would still want to choose some kind of reasonable upper bound for the number of records being read. In the case where the collection/partition being read from has 2 billion things say we decide to split

Re: Limiting the Number of Parallel Reads/Writes

2025-04-09 Thread Radek Stankiewicz via user
In the context of MongoDB, there are already configuration pieces: https://github.com/apache/beam/blob/7136380c4a79f8dea9b42a42ee7569b665edf431/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/MongoDbIO.java#L230 bucketAuto or numSplits. The exact logic of how it would split is writt

Re: Limiting the Number of Parallel Reads/Writes

2025-04-09 Thread Jonathan Hope
I think I'm still struggling a bit... Let's stick with a bounded example for now. I would be reading from a single mongo cluster/database/collection/partition that has billions of things in it. I read through the mongoio code a bit and it seems to: 1. Get the min ID 2. Get the max ID 3.

Re: Limiting the Number of Parallel Reads/Writes

2025-04-09 Thread Radek Stankiewicz via user
Hey Jonathan, parallelism for reads and writes is directly related to the number of keys you are processing in the current stage. As an example, imagine you have KafkaIO with 1 partition, and after reading from KafkaIO you have a mapping step to a JDBC entity and then you have a step writing to the

Re: [Discussion] Deprecate ZetaSQL

2025-04-08 Thread Yi Hu via user
Thanks for the suggestion. Both documentation and expansion time warnings are added in https://github.com/apache/beam/pull/34563 . Yi On Mon, Mar 31, 2025 at 2:43 PM Kenneth Knowles wrote: > One way that you might be able to reach some users is to issue a warning > in the code for SqlTransform

Re: [Discussion] Deprecate ZetaSQL

2025-04-05 Thread Yi Hu via user
Thanks for the input! From the discussion, we have agreed to move forward. Next steps: - Draft PR to note the deprecation status in documentation, including CHANGES, Javadoc, etc. - The earliest release to stop publishing ZetaSQL artifacts is pushed further, not earlier than 1 full quarter before the nex

Re: Writing to partitioned BigQuery tables

2025-04-05 Thread Radek Stankiewicz via user
Hi Lina, thanks for confirming that you are using write_truncate - that's a bummer as it's not possible to truncate with storage_write_api, it's append only. As it is a batch I imagine your job could have a step where you truncate the table to overcome this limitation (e.g. via DML). Have in mind t

Re: [Question] Redirect Write Failures to Dead Letter Queue

2025-04-03 Thread Jonathan Hope
Of the supported Beam languages the one I know the best is Go so that's what I've been working in. What I ended up doing was forking the databaseio code and changing it so that any rows that had an error on a write are emitted as a pcollection, and the error in question is also emitted as a pcollec

Re: [Question] Redirect Write Failures to Dead Letter Queue

2025-04-03 Thread Radek Stankiewicz via user
Hi Jonathan, is there a specific IO you have in mind? If I think about enforcing key constraints, then I think about SQL databases like pgsql, mysql, alloydb. In Beam those databases are supported in JDBCIO. The problem with JDBCIO is that it doesn't yet have an error handler - https://gi

Re: Langchain/Agents on Beam

2025-04-02 Thread Sofia’s World
Hello Danny, sure. I currently have a few 'standard' dataflows sourcing financial data from the internet and 'doing something about it'. I then got hooked on LangChain, and thought about writing a simple agent to fetch similar information, rather than me fetching the data and doing some analysis t

Re: Langchain/Agents on Beam

2025-04-02 Thread Danny McCormick via user
Yeah, this shouldn't really be particularly different than any other sort of inference. I'm curious about your use case - would you be willing to share more about what you are using this for? For context, I'm interested in seeing if we can add broader agentic support to Beam, though I'm having a ha

Re: calling an LLM from a Beam function

2025-04-02 Thread Sofia’s World
Great, thanks Radek! Works like a charm! Kind regards, Marco On Tue, Apr 1, 2025 at 9:34 AM Sofia’s World wrote: > I see. I should have read the code better. I did not see that RunInference is part > of Beam rather than the specific example. Thanks > > On Tue, 1 Apr 2025, 09:02 Radek Stankiewicz, wrote: >

Re: [Question] Redirect Write Failures to Dead Letter Queue

2025-04-01 Thread Robert Bradshaw via user
On Tue, Apr 1, 2025 at 12:01 PM Jonathan Hope wrote: > > Unfortunately both databases will be online during this so conflicts could > occur in either direction. I had previously dug up an answer around modifying > the JdbcIO here: > https://stackoverflow.com/questions/56398422/exception-handlin

Re: [Question] Redirect Write Failures to Dead Letter Queue

2025-04-01 Thread Robert Bradshaw via user
Good question. I think it depends on who else is modifying the SQL database. In the easy case (e.g. everything you want to write to your SQL database comes from the NoSQL source) you could group (e.g. via a GroupByKey) on your identifier, filter out duplicates with a subsequent DoFn, and then writ

Re: [Question] Redirect Write Failures to Dead Letter Queue

2025-04-01 Thread Jonathan Hope
Unfortunately both databases will be online during this so conflicts could occur in either direction. I had previously dug up an answer around modifying the JdbcIO here: https://stackoverflow.com/questions/56398422/exception-handling-in-apache-beam-pipelines-when-writing-to-database-using-java. But

Re: calling an LLM from a Beam function

2025-04-01 Thread Radek Stankiewicz via user
hey Marco, In your case, as your model inference is remote and has a custom handler, the only difference is that RunInference transform is adding batching before invoking the handler. Today this handler is pretty simple but I would imagine that this RunInference in the future may introduce certain

Re: calling an LLM from a Beam function

2025-04-01 Thread Sofia’s World
I see. I should have read the code better. I did not see that RunInference is part of Beam rather than the specific example. Thanks On Tue, 1 Apr 2025, 09:02 Radek Stankiewicz, wrote: > hey Marco, > In your case, as your model inference is remote and has a custom handler, > the only difference is that Ru

Re: calling an LLM from a Beam function

2025-03-31 Thread Sofia’s World
Hi Radek, uhm, how does that differ from creating something like this? Am I missing something? Kind regards, Marco class LLMProcessor(beam.DoFn): def __init__(self, key): """Initiate the OAI API client.""" self.client = OpenAI( api_key=key ) def pro

Re: [Discussion] Deprecate ZetaSQL

2025-03-31 Thread Kenneth Knowles
One way that you might be able to reach some users is to issue a warning in the code for SqlTransform.expand(), with the deprecation timeline. Maybe this is under "etc" but I wanted to mention it because the other items were all documentation. Kenn On Mon, Mar 31, 2025 at 1:34 PM Yi Hu via dev w

Re: calling an LLM from a Beam function

2025-03-27 Thread Sofia’s World
Thanks Radek, I'll give it a go and report back if I get stuck. Rgds On Thu, Mar 27, 2025 at 8:48 AM Radek Stankiewicz via user < user@beam.apache.org> wrote: > hi Sofia, > > here you have a nice example > https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.

Re: calling an LLM from a Beam function

2025-03-27 Thread Radek Stankiewicz via user
Hi Sofia, here is a nice example: https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb, where CloudVisionModelHandler is custom code that can invoke any client library. You can pass the key as one of the constructor arguments to CloudVisionModelHandler or y

Re: [Discussion] Deprecate ZetaSQL

2025-03-26 Thread Kenneth Knowles
+1 to this deprecation. Thanks for putting together a clear summary. FWIW it also has significantly worse performance than Calcite SQL dialect, since it calls out to a ZetaSQL subprocess for most calculations, and that is less optimized than Beam's Fn API. Kenn On Tue, Mar 25, 2025 at 4:18 PM Ro

Re: Specifying java dependencies for multi-language Beam pipeline (using java transforms from python)

2025-03-26 Thread Robert Bradshaw via user
Creating your own jar with updated dependencies is the right solution, to ensure it gets used you may need to set Python's --beam_services pipeline option, e.g. --beam_services='{"sdks:java:io:expansion-service:shadowJar": "/path/to/your.jar"}' See https://github.com/apache/beam/blob/release-2.4

Re: [Discussion] Deprecate ZetaSQL

2025-03-25 Thread Robert Bradshaw via user
I'm in favor of deprecating this and cleaning it up, but it depends on usage. I suspect it is low (or possibly non-existent, especially as there's little upside to moving away from the default). I cc'd user@ just in case anyone wants to chime in there. This may be a good thing to add to our release

Re: Regarding the GSOC 2025 Project

2025-03-22 Thread SIDDHARTH SALIAN
Hello Sir, Thank you for the email, I shall go through the links and guides. Thanks, Siddharth From: Danny McCormick via user Date: Saturday, 22 March 2025 at 1:35 AM To: user@beam.apache.org Cc: Danny McCormick Subject: Re: Regarding the GSOC 2025 Project Hey Siddharth, I would recommend

Re: Regarding the GSOC 2025 Project

2025-03-21 Thread Danny McCormick via user
Date: Tuesday, 4 March 2025 at 3:00 AM > To: user@beam.apache.org > Cc: Danny McCormick > Subject: Re: Regarding the GSOC 2025 Project > > No, you will not need a strong understanding of deep learning. I think the > main thing you'll need is an understanding of how em

Re: Regarding the GSOC 2025 Project

2025-03-21 Thread SIDDHARTH SALIAN
would provide better clarity sir. Best Regards, Thanking you Siddharth Salian From: Danny McCormick via user Date: Tuesday, 4 March 2025 at 3:00 AM To: user@beam.apache.org Cc: Danny McCormick Subject: Re: Regarding the GSOC 2025 Project No, you will not need a strong understanding of deep

Re: Writing to partitioned BigQuery tables

2025-03-19 Thread Lina Mårtensson via user
Thanks Radek! I didn't realize that writing is done with a copy job - then I understand why we need to configure partitioning as well. And that all makes sense. We haven't tried the storage write API - that wasn't available for Python yet when we started doing this. I will take a look at it and s

Re: Writing to partitioned BigQuery tables

2025-03-19 Thread Radek Stankiewicz via user
Hi Lina, there are multiple reasons why a copy job is used with a temporary table: - you may be using dynamic destinations - you are loading lots of data, probably with truncate. This way we ensure atomicity, as we can trigger a copy from multiple temp tables into one final table. Can you confirm or paste

Re: [python] is merge_accumulators called with stream of accumulators?

2025-03-14 Thread Joey Tran
My intuition says no: all the accumulators will just be grouped together into a single key/accumulators element and that entire element will get loaded into memory, but I'm not sure if there's any kind of special magic to handle this. On Tue, Mar 11, 2025 at 11:16 PM Joey Tran wrote: > Hey all, > >

Re: [python] Beam Education Material for Workshops

2025-03-11 Thread Joey Tran
This is very helpful. Thanks all! On Sat, Mar 8, 2025 at 4:17 PM Ahmet Altay via user wrote: > Adding to XQ's list, there are also some docs discussing the execution > model in addition to the programming model > - https://cloud.google.com/dataflow/docs/pipeline-lifecycle > - https://cloud.googl

Re: [python] Beam Education Material for Workshops

2025-03-08 Thread Ahmet Altay via user
Adding to XQ's list, there are also some docs discussing the execution model in addition to the programming model - https://cloud.google.com/dataflow/docs/pipeline-lifecycle - https://cloud.google.com/dataflow/docs/concepts/beam-programming-model & https://beam.apache.org/documentation/programming-

Re: [python] Beam Education Material for Workshops

2025-03-08 Thread XQ Hu via user
We probably have many online resources that cover these topics but they are scattered. For example, Beam Summit and College talks on Youtube: https://www.youtube.com/@ApacheBeamYT (Beam Summit slides can be found here: https://beamsummit.org/) and https://www.youtube.com/@BeamCollege ( https://beam

RE: Re: [Question] Regarding custom metrics in beam

2025-03-03 Thread 이소망
Hello. Thank you for sharing your issue. Is your pipeline Bounded or Unbounded? Thank you. On 2024/10/22 12:05:07 Kenneth Knowles wrote: > Thanks for reporting this. I know there's problems with the plumbing of > metrics in the FlinkRunner, but I don't know the full extent. I'm actually > starti

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread Danny McCormick via user
> > > From: SIDDHARTH SALIAN > Date: Tuesday, 4 March 2025 at 1:53 AM > To: Danny McCormick , Danny McCormick via > user > Subject: Re: Regarding the GSOC 2025 Project > > Respected Sir, > > Thank you for the email. I have understood. I’ll continue the conver

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread SIDDHARTH SALIAN
to have strong grip over it especially for this project? Thanking you, Siddharth From: SIDDHARTH SALIAN Date: Tuesday, 4 March 2025 at 1:53 AM To: Danny McCormick , Danny McCormick via user Subject: Re: Regarding the GSOC 2025 Project Respected Sir, Thank you for the email. I have understood

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread SIDDHARTH SALIAN
: SIDDHARTH SALIAN Cc: Danny McCormick via user Subject: Re: Regarding the GSOC 2025 Project I'd probably recommend using the dev@ list; both are fine, but dev@ is probably more likely to have more folks with ideas/opinions. Thanks, Danny On Mon, Mar 3, 2025 at 3:17 PM SIDDHARTH S

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread Danny McCormick via user
ood for now sir? > > > > Thanking you, > > Siddharth > > > > From: Danny McCormick > Date: Tuesday, 4 March 2025 at 1:37 AM > To: SIDDHARTH SALIAN > Cc: user@beam.apache.org , damcc...@apache.org < > damcc...@apache.org> > Subject: Re: Rega

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread Danny McCormick via user
> To: SIDDHARTH SALIAN > Cc: Danny McCormick via user > Subject: Re: Regarding the GSOC 2025 Project > > I think that should be plenty for now, thanks! > > > > On Mon, Mar 3, 2025 at 3:11 PM SIDDHARTH SALIAN < > siddharthsalia...@gmail.com> wrote:

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread SIDDHARTH SALIAN
1:43 AM To: SIDDHARTH SALIAN Cc: Danny McCormick via user Subject: Re: Regarding the GSOC 2025 Project I think that should be plenty for now, thanks! On Mon, Mar 3, 2025 at 3:11 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote: Respected Sir, Thank you for the email. I sh

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread SIDDHARTH SALIAN
@beam.apache.org , damcc...@apache.org Subject: Re: Regarding the GSOC 2025 Project > Sir, with reference to the point about python, I meant to ask that sir, like > apart from learning the main coding language of python, anything more > important topic has to be learnt (such as pytho

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread SIDDHARTH SALIAN
Subject: Re: Regarding the GSOC 2025 Project > Sir, apart from strong fundamentals of vector DB’s, python fundamentals, Beam > docs, writing sink, is there anything much important topic to be > covered/learnt other than these as part of project prerequisites? I think those are the main

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread Danny McCormick via user
ards, > > Thanking you, > > Siddharth Salian > > > > From: Danny McCormick via user > Date: Tuesday, 4 March 2025 at 1:25 AM > To: user@beam.apache.org > Cc: Danny McCormick > Subject: Re: Regarding the GSOC 2025 Project > > > Sir, apart

Re: Regarding the GSOC 2025 Project

2025-03-03 Thread Danny McCormick via user
e. Also > continuous mails won’t be appealing. Whatever you agree upon sir, we can > follow it upon sir. > > > > Best Regards, > > Thanking you, > > Siddharth Salian > > > > From: SIDDHARTH SALIAN > Date: Friday, 21 February 2025 at 12:59 AM

Re: Regarding the GSOC 2025 Project

2025-03-02 Thread SIDDHARTH SALIAN
a new concept and environment for me. Also continuous mails won’t be appealing. Whatever you agree upon sir, we can follow it upon sir. Best Regards, Thanking you, Siddharth Salian From: SIDDHARTH SALIAN Date: Friday, 21 February 2025 at 12:59 AM To: user@beam.apache.org Subject: Re: Regarding

Re: Assistance required: Error Translating Pipeline with ReadFromKafka in Apache Beam

2025-02-25 Thread Jan Lukavský
Hi Utharsh, the cause of the error is the use_deprecated_read flag you pass to the expansion service. Dataflow v2 expands KafkaIO using SDF [1] and does not support (as the error says) the deprecated Read transform. Just remove the flag and it should work. Best, Jan [1] https://beam.apach

Re: IO Connector for AliCloud's MaxCompute Bigdata store

2025-02-24 Thread XQ Hu via user
I am not sure anyone is working on this. This is not on the roadmap, either. On Mon, Feb 24, 2025 at 6:20 AM Rajath BK wrote: > Bumping up for attention... > > - Thanks and Regards > Rajath > > > On Tue, Feb 18, 2025 at 11:10 PM Rajath BK wrote: > >> Hello community folks, >> We have a req

Re: IO Connector for AliCloud's MaxCompute Bigdata store

2025-02-24 Thread Rajath BK
Bumping up for attention... - Thanks and Regards Rajath On Tue, Feb 18, 2025 at 11:10 PM Rajath BK wrote: > Hello community folks, > We have a requirement to interact with AliCloud's Maxcompute Bigdata > store. As of today, there seems to be no official I/O connector for > Maxcompute. > I

Re: Regarding the GSOC 2025 Project

2025-02-20 Thread SIDDHARTH SALIAN
Hello Sir, Thank you for the email. I have understood. Thanks, Siddharth Salian From: Danny McCormick via user Date: Thursday, 20 February 2025 at 9:51 PM To: user@beam.apache.org Cc: Danny McCormick Subject: Re: Regarding the GSOC 2025 Project > Sir, as you have mentioned in the mail, Pyt

Re: Regarding the GSOC 2025 Project

2025-02-20 Thread Danny McCormick via user
roject, don’t you think RAG is still >limited to capturing historical data, or it has capability of capturing >latest/modern data’s too? > > > > Best regards, > > Thanking you, > > Siddharth Salian > > > > *From: *Danny McCormick via user > *

Re: Regarding the GSOC 2025 Project

2025-02-18 Thread SIDDHARTH SALIAN
Subject: Re: Regarding the GSOC 2025 Project Hey Siddharth, thanks for reaching out. I'm glad you're interested in the project. In general, I would expect there to be more details about projects once we know which ones have been accepted. > Sir, if you could tell me the pre-requi

Re: Regarding the GSOC 2025 Project

2025-02-18 Thread Danny McCormick via user
Hey Siddharth, thanks for reaching out. I'm glad you're interested in the project. In general, I would expect there to be more details about projects once we know which ones have been accepted. > Sir, if you could tell me the pre-required knowledge (such as major programming languages used, etc.,

Re: Regarding Updates, Slack and Contribution

2025-02-12 Thread SIDDHARTH SALIAN
Thank you so much. Also I shall follow up in the given thread link below. Best Regards, Siddharth From: Danny McCormick via user Date: Thursday, 13 February 2025 at 2:50 AM To: user@beam.apache.org Cc: Danny McCormick Subject: Re: Regarding Updates, Slack and Contribution > I had a doubt,

Re: Regarding Updates, Slack and Contribution

2025-02-12 Thread Danny McCormick via user
via user > *Date: *Wednesday, 12 February 2025 at 3:35 AM > *To: *user@beam.apache.org > *Cc: *Danny McCormick > *Subject: *Re: Regarding Updates, Slack and Contribution > > Hey, welcome to the Beam community! > > > > > Can anyone please tell me how I can join slack

Re: Regarding Updates, Slack and Contribution

2025-02-11 Thread SIDDHARTH SALIAN
, Thanking You Siddharth Salian From: Danny McCormick via user Date: Wednesday, 12 February 2025 at 3:35 AM To: user@beam.apache.org Cc: Danny McCormick Subject: Re: Regarding Updates, Slack and Contribution Hey, welcome to the Beam community! > Can anyone please tell me how I can join sl

Re: Regarding Updates, Slack and Contribution

2025-02-11 Thread SIDDHARTH SALIAN
Subject: Re: Regarding Updates, Slack and Contribution Hey, welcome to the Beam community! > Can anyone please tell me how I can join slack channel of ASF (Apache > Software Foundation) as I don’t have apache.org email > address. Also, it would help me to know the co

Re: Regarding Updates, Slack and Contribution

2025-02-11 Thread Danny McCormick via user
Hey, welcome to the Beam community! > Can anyone please tell me how I can join slack channel of ASF (Apache Software Foundation) as I don’t have apache.org email address. Also, it would help me to know the community as well as know about the current workings on the project. I just sent you an inv

Re: Non-time based windowing

2025-02-07 Thread Joey Tran
>>> >>> >>> On Wed, Feb 5, 2025 at 1:01 PM Robert Bradshaw via user < >>> user@beam.apache.org> wrote: >>> >>>> Interestingly, the very first prototypes of windows were >>>> actually called buckets, and we thought of appl

Re: Non-time based windowing

2025-02-07 Thread Joey Tran
, event time >>> is special in the sense that it's required for aggregation and omnipresent, >>> and so this is what windows are centred on. But nothing prevents one from >>> creating data-dependent windows (that, in batch, may not have anything to >>> do wi

Re: Non-time based windowing

2025-02-07 Thread Robert Bradshaw via user
nts one from >> creating data-dependent windows (that, in batch, may not have anything to >> do with time at all) and using them as secondary keys. >> >> The idea of carrying out-of-band metadata along with elements to >> simplify/re-use transforms is certainly an inter

Re: Number of connections to Kafka

2025-02-07 Thread Ahmet Altay via user
It should help. Adding @Yi Hu @Steven van Rossum @Sam Whittle who would be able to give a more definitive answer. On Fri, Feb 7, 2025 at 9:07 AM Utkarsh Parekh wrote: > Hi Team, > > > > I came across this PR and wanted to check if it addresses the issue of > multiple kafka connections being c

Re: EXTERNAL Email: Re: JMSIO support

2025-02-07 Thread Sam Whittle via user
’t say it was empty but the queue was receiving about a 1/3 >> to ¼ of the message one of the other Queue Managers was getting. But those >> connections were still going up. I am using dataflow runner v2 and 2.61 >> sdk. We saw the connections on MeshIq which we use to monitor the q
