Re: [ANNOUNCE] New committer: John Casey

2022-07-29 Thread Ahmed Abualsaud via dev
Congrats John, what a great addition! On Fri, Jul 29, 2022 at 4:56 PM Kerry Donny-Clark via dev < dev@beam.apache.org> wrote: > John, you have made a huge impact on the many, many users of Kafka and > other IOs. This is great recognition of your commitment to Beam. > Kerry > > On Fri, Jul 29, 202

A lesson about DoFn retries

2022-09-01 Thread Ahmed Abualsaud via dev
Hi all, TLDR: When writing IO connectors, be wary of how bundle retries can affect the workflow. A faulty implementation of a step in BigQuery batch loads was discovered recently. I raised an issue [1] but also wanted to mention it here as a potentially helpful lesson for those developing new/ex
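
To make the retry hazard concrete, here is a minimal sketch in plain Python. It is not the BigQuery code from the issue and uses no Beam APIs; `FakeSink`, `run_bundle_with_retry`, and the `load-<element>` id scheme are hypothetical names made up for illustration. It assumes a runner that re-runs a whole bundle after a failure, so any side effect from the first attempt happens again.

```python
class FakeSink:
    """Hypothetical stand-in for an external system (not a Beam or BigQuery API)."""

    def __init__(self):
        self.jobs = {}
        self._counter = 0

    def start_job_fresh_id(self, payload):
        # Non-idempotent: every call creates a new job, even for the same payload.
        self._counter += 1
        job_id = f"job-{self._counter}"
        self.jobs[job_id] = payload
        return job_id

    def start_job_with_id(self, job_id, payload):
        # Idempotent: a repeated call with the same id does not create a second job.
        self.jobs.setdefault(job_id, payload)
        return job_id


def run_bundle_with_retry(process, elements):
    """Run `process` on every element, simulating one failed attempt.

    The first attempt "fails" after all side effects have already happened,
    so the whole bundle is processed a second time.
    """
    for attempt in (1, 2):
        for element in elements:
            process(element)
        if attempt == 1:
            continue  # simulated failure: the runner retries the bundle
    return attempt


elements = ["a", "b"]

naive = FakeSink()
run_bundle_with_retry(lambda e: naive.start_job_fresh_id(e), elements)

idempotent = FakeSink()
run_bundle_with_retry(
    lambda e: idempotent.start_job_with_id(f"load-{e}", e), elements
)

naive_job_count = len(naive.jobs)            # retried work was duplicated
idempotent_job_count = len(idempotent.jobs)  # retried work was absorbed
```

The design point is only that ids derived deterministically from the input let a retried bundle land on the same external state instead of creating new state.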

Re: A lesson about DoFn retries

2022-09-02 Thread Ahmed Abualsaud via dev
> Brian > > On Thu, Sep 1, 2022 at 1:43 PM Ahmed Abualsaud via dev < > dev@beam.apache.org> wrote: > >> Hi all, >> >> TLDR: When writing IO connectors, be wary of how bundle retries can >> affect the work flow. >> >> A faulty implementation of a

Support existing IOs with Schema Transforms

2022-11-03 Thread Ahmed Abualsaud via dev
Hi all, There has been an effort to add SchemaTransform capabilities to our connectors to facilitate the use of multi-lang pipelines. I've drafted a document below that provides guidelines and examples of how to support IOs with SchemaTransforms. Please take a look and share your thoughts and sugg

Re: [ANNOUNCE] New committer: Ritesh Ghorse

2022-11-04 Thread Ahmed Abualsaud via dev
Congrats Ritesh! On Fri, Nov 4, 2022 at 10:29 AM Andy Ye via dev wrote: > Congrats Ritesh! > > On Fri, Nov 4, 2022 at 9:26 AM Kerry Donny-Clark via dev < > dev@beam.apache.org> wrote: > >> Congratulations Ritesh, I'm happy to see your hard work and community >> spirit recognized! >> >> On Fri, N

Re: [ANNOUNCE] New committer: Yi Hu

2022-11-09 Thread Ahmed Abualsaud via dev
Congrats Yi! On Wed, Nov 9, 2022 at 1:33 PM Sachin Agarwal via dev wrote: > Congratulations Yi! > > On Wed, Nov 9, 2022 at 10:32 AM Kenneth Knowles wrote: > >> Hi all, >> >> Please join me and the rest of the Beam PMC in welcoming a new >> committer: Yi Hu (y...@apache.org) >> >> Yi started con

Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Ahmed Abualsaud via dev
Thank you for the informative email Damon! I am in favor of setting an intuitive naming convention early on to reduce confusion when Schema Transforms become more widespread. I like the proposed name in your email and I think this convention should also apply to the rest of the classes involved her

Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Ahmed Abualsaud via dev
> > Schema-aware transforms are not restricted to I/Os. An arbitrary transform > can be a Schema-Transform. Also, designation Read/Write does not map to an > arbitrary transform. Probably we should try to make this more generic ? > Agreed, I suggest keeping everything on the left side of the name

Re: One Configuration, Many File Write Formats

2022-11-17 Thread Ahmed Abualsaud via dev
Thanks for drafting this Damon! I left some comments on the doc. It's really cool that users can go to one source with a specified file format (json, avro, xml, csv, parquet) and retrieve the relevant file writing PTransform. I also like how the same configuration can be re-used for different file

Re: [Proposal] Change to Default PubsubMessage Coder

2022-12-19 Thread Ahmed Abualsaud via dev
+1 to looking into using RowCoder, this may help avoid creating more specialized coders in the future (which is mentioned as a pain point in the issue you linked [1]). [1] https://github.com/apache/beam/issues/23525#issuecomment-1281294275 On Tue, Dec 20, 2022 at 3:00 AM Andrew Pilloud via dev w

Re: Refactor BigQuery SchemaTransforms naming

2023-03-03 Thread Ahmed Abualsaud via dev
Thank you Damon, I left a few comments. On Fri, Mar 3, 2023 at 11:14 AM Damon Douglas via dev wrote: > Hello Everyone, > > This PR brings BigQuery Schema transforms in line with the others in terms > of naming conventions and use of AutoService. > > https://github.com/apache/beam/pull/25706 > >

Re: [PROPOSAL] Preparing for 2.47.0 Release

2023-03-23 Thread Ahmed Abualsaud via dev
Thanks for stepping up Jack! On Thu, Mar 23, 2023 at 12:43 PM Ahmet Altay via dev wrote: > Thank you Jack! > > On Wed, Mar 22, 2023 at 8:39 AM Jack McCluskey via dev < > dev@beam.apache.org> wrote: > >> Hey all, >> >> The next (2.47.0) release branch cut is scheduled for April 5th, 2023, >> acco

Re: [ANNOUNCE] New committer: Anand Inguva

2023-04-21 Thread Ahmed Abualsaud via dev
Congrats Anand! On Fri, Apr 21, 2023 at 3:18 PM Anand Inguva via dev wrote: > Thanks everyone. Really excited to be a part of Beam Committers. > > On Fri, Apr 21, 2023 at 3:07 PM XQ Hu via dev wrote: > >> Congratulations, Anand!!! >> >> On Fri, Apr 21, 2023 at 2:31 PM Jack McCluskey via dev < >

Re: [ANNOUNCE] New committer: Damon Douglas

2023-04-24 Thread Ahmed Abualsaud via dev
Congrats Damon! On Mon, Apr 24, 2023 at 5:05 PM Kerry Donny-Clark via dev < dev@beam.apache.org> wrote: > Damon, you have done outstanding work to grow and improve Beam and the > Beam community. Well done, well deserved! > > On Mon, Apr 24, 2023 at 4:39 PM XQ Hu via dev wrote: > >> Congrats Damo

Re: [VOTE] Release 2.46.0, release candidate #1

2023-04-28 Thread Ahmed Abualsaud via dev
@Danny McCormick @Reuven Lax sorry it's been a while since you looked into this, but do you remember if the fix in issue #25642 is related to the recent "ALREADY_EXISTS: The offset is within stream, expected offset..." errors? On Fri, Mar 10, 2023 at 7

Proposal to reduce the steps to make a Java transform portable

2023-05-30 Thread Ahmed Abualsaud via dev
Hey everyone, I was looking at how we use SchemaTransforms in our expansion service. From what I see, there may be a redundant step in developing SchemaTransforms. Currently, we have 3 pieces: - SchemaTransformProvider [1] - A configuration object - SchemaTransform [2] The API is generally used l

Re: Proposal to reduce the steps to make a Java transform portable

2023-06-22 Thread Ahmed Abualsaud via dev
DRY. >>>>>> >>>>>> On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev < >>>>>> dev@beam.apache.org> wrote: >>>>>> >>>>>>> +1 to this simplification, it's a historical artifact that provides >

Re: Proposal to reduce the steps to make a Java transform portable

2023-06-29 Thread Ahmed Abualsaud via dev
>>>>> >>>>>>> >>>>>>> +1 but I think the hard part today is to convert existing >>>>>>> PTransforms to be schema-aware transform compatible (for example, change >>>>>>> input/output types and make su

Re: [ANNOUNCE] New committer: Ahmed Abualsaud

2023-08-28 Thread Ahmed Abualsaud via dev
Thanks to the PMC for these responsibilities, and thank you all for guiding me along this journey. I'm looking forward to helping this community however I can :) Best, Ahmed On Sun, Aug 27, 2023 at 8:48 PM Reza Rokni via dev wrote: > Congrats Ahmed! > > On Fri, Aug 25, 2023 at 2:34 PM John Case

Re: [ANNOUNCE] New PMC Member: Alex Van Boxel

2023-10-03 Thread Ahmed Abualsaud via dev
Congratulations! On Tue, Oct 3, 2023 at 3:48 PM Byron Ellis via dev wrote: > Congrats! > > On Tue, Oct 3, 2023 at 12:40 PM Danielle Syse via dev > wrote: > >> Congratulations Alex!! Definitely well deserved! >> >> On Tue, Oct 3, 2023 at 2:57 PM Ahmet Altay via dev >> wrote: >> >>> Congratulati

[Design Doc] Auto-generating SchemaTransform wrappers

2023-12-04 Thread Ahmed Abualsaud via dev
Hey everyone, I've written a design doc for automatically generating SchemaTransform wrappers. The document is focused on generating Python SDK wrappers, but the framework would be generic enough to be easily applicable to other SDKs. I'd like to share this with you all and gather any feedback. I

Re: Bigquery Connector Rate limits

2024-02-22 Thread Ahmed Abualsaud via dev
Hey Taher, Regarding the first question about what API Beam uses, that depends on the BigQuery method you set in the connector's configuration. We have 4 different write methods, and a high-level description of each can be found in the documentation: https://beam.apache.org/releases/javadoc/curren

Re: Proposal to reduce the steps to make a Java transform portable

2024-03-07 Thread Ahmed Abualsaud via dev
both Java and cross-language. >>>>>>>>> >>>>>>>> >>>>>>>> +1 but I think the hard part today is to convert existing >>>>>>>> PTransforms to be schema-aware transform compatible (for example, >>>&

Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Ahmed Abualsaud via dev
Hey all, There have been some conversations lately about how best to enable dynamic destinations in a portable context. Usually, this comes up for cross-language transforms and more recently for Beam YAML. I've started a short doc outlining some routes we could take. The purpose is to establish a

Re: Supporting Dynamic Destinations in a portable context

2024-03-27 Thread Ahmed Abualsaud via dev
the {record=..., dest_info=...} and the elide-fields >> approaches, as the former is nicer when one has a fixed representation for >> the output record (e.g. a proto or avro schema) and the flattened form for >> ease of use in more free-form contexts (e.g. when producing records from >
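
The two shapes named in the snippet above can be sketched roughly as follows. This is an illustration only, not the design from the doc: the function names, the `sales_{country}` template, and the `drop_fields` parameter are all hypothetical, and plain Python dicts stand in for Beam rows.

```python
def route_nested(element, template):
    """Nested form: destination fields travel next to the record, not inside it."""
    destination = template.format(**element["dest_info"])
    return destination, element["record"]


def route_flattened(row, template, drop_fields):
    """Flattened form: destination fields live in the row and are elided before writing."""
    destination = template.format(**row)
    record = {key: value for key, value in row.items() if key not in drop_fields}
    return destination, record


nested_element = {
    "record": {"id": 1, "amount": 9.5},
    "dest_info": {"country": "ca"},
}
flat_row = {"id": 1, "amount": 9.5, "country": "ca"}

nested_result = route_nested(nested_element, "sales_{country}")
flat_result = route_flattened(flat_row, "sales_{country}", drop_fields={"country"})
```

In this sketch both forms resolve to the same destination name and the same written record; they differ only in where the destination fields sit in the input.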

Re: [ANNOUNCE] New Committer: XQ Hu

2024-06-25 Thread Ahmed Abualsaud via dev
Congrats XQ! On Tue, Jun 25, 2024 at 3:17 AM Desire L wrote: > Congratulations XQ!🤓🤓 > > On Jun 25, 2024, at 8:32 AM, XQ Hu via dev wrote: > > Thanks a lot! Happy to keep working with all of you! > > On Mon, Jun 24, 2024 at 6:22 PM Valentyn Tymofieiev via dev < > dev@beam.apache.org> wrote: > >> Cong

How to override sticky dependencies in gradle?

2024-07-29 Thread Ahmed Abualsaud via dev
Hey all, Does anyone have experience with sticky dependencies in gradle? I'm experimenting with the new Managed Iceberg connector and trying to write to GCS with a Hive catalog. I naturally need to pull some hive dependencies, but one particular module is very stubborn with its dependencies: *org

Re: How to override sticky dependencies in gradle?

2024-08-01 Thread Ahmed Abualsaud via dev
.com/GradleUp/shadow/issues/384 Shoutout to @Yi Hu for the help! On Mon, Jul 29, 2024 at 4:48 PM Tomo Suzuki wrote: > "resolutionStrategy.force" should work. Use "./gradlew dependencies" to > debug the dependencies. > https://docs.gradle.org/current/userguide/view

Updating the Python Multi-language Quickstart

2024-08-26 Thread Ahmed Abualsaud via dev
Hey all, I'm looking to update our multi-language quickstart [1] to use the updated SchemaTransform framework. I'm aiming to make it a generic step-by-step walkthrough so people can follow along, but will also submit some examples to the repo for reference. I've written up the content in a rough

Re: Updating the Python Multi-language Quickstart

2024-08-26 Thread Ahmed Abualsaud via dev
Updated link with access: https://docs.google.com/document/d/1_embA3pGwoYG7sbHaYzAkg3hNxjTughhFCY8ThcoK_Q/edit?usp=sharing On Mon, Aug 26, 2024 at 7:53 PM Ahmed Abualsaud wrote: > Hey all, > > I'm looking to update our multi-language quickstart [1] to use the updated > SchemaTransform framework.

Re: Supporting Dynamic Destinations in a portable context

2024-08-29 Thread Ahmed Abualsaud via dev
actually >>>>>>> a pretty convenient and natural way for the user to name their >>>>>>> destination >>>>>>> (for the common usecase, even easier than providing a lambda), and has >>>>>>> the >>>

Re: [YAML] Reprocessing failed records

2024-10-19 Thread Ahmed Abualsaud via dev
Another option is to add a second DLQ that outputs just the original rows, i.e. the user has the option to fetch failed rows with or without metadata. It would take some work on our side to add this second DLQ to existing transforms, but that seems pretty straightforward. On Sat, Oct 19, 2024 at 1
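
A rough sketch of the two-output idea, in plain Python rather than Beam YAML or any Beam API. `process_with_dlqs`, `parse_amount`, and the `failed_row` / `error_message` field names are hypothetical and chosen only for illustration.

```python
def process_with_dlqs(rows, transform):
    """Apply `transform` to each row and collect failures in two forms.

    Returns (good, errors_with_metadata, failed_rows): the second output wraps
    each failed row with error details, the third holds only the original rows.
    """
    good, errors_with_metadata, failed_rows = [], [], []
    for row in rows:
        try:
            good.append(transform(row))
        except Exception as exc:
            errors_with_metadata.append(
                {"failed_row": row, "error_message": str(exc)}
            )
            failed_rows.append(row)
    return good, errors_with_metadata, failed_rows


def parse_amount(row):
    # Fails with ValueError when `amount` is not numeric.
    return {**row, "amount": float(row["amount"])}


rows = [{"id": 1, "amount": "2.5"}, {"id": 2, "amount": "oops"}]
good, errors, failed = process_with_dlqs(rows, parse_amount)

# Because `failed` keeps the input shape, the rows can be repaired and fed
# straight back into the same transform without unwrapping metadata.
repaired = [{**row, "amount": "0"} for row in failed]
reprocessed, _, _ = process_with_dlqs(repaired, parse_amount)
```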

Re: Supporting Dynamic Destinations in a portable context

2024-09-19 Thread Ahmed Abualsaud via dev
>>>>>> have >>>>>>>> to be part of the record, and can be the output of an arbitrary map if >>>>>>>> need >>>>>>>> be, makes this restriction not so bad.) >>>>>>>> >>>>>>>>

Re: [ANNOUNCE] New PMC Member: Danny McCormick

2024-12-27 Thread Ahmed Abualsaud via dev
Well deserved! Thanks for all your hard work Danny On Fri, Dec 20, 2024 at 7:58 PM LDesire wrote: > Congratulations Danny! 😀

Re: Streaming Iceberg Source

2025-03-18 Thread Ahmed Abualsaud via dev
n Fri, Mar 7, 2025 at 11:05 AM Jean-Baptiste Onofré wrote: > Hi Ahmed, > > I added some comments in the design doc. > > Regards > JB > > On Mon, Mar 3, 2025 at 8:52 PM Ahmed Abualsaud via dev > wrote: > > > > Hey everyone, > > > > I've put togeth

Streaming Iceberg Source

2025-03-03 Thread Ahmed Abualsaud via dev
Hey everyone, I've put together a design doc for a new streaming source for Apache Iceberg. The initial implementation focuses on supporting append-only snapshots, but it’s designed to accommodate future CDC support. The PR tracking this work is here: #33504

Re: Best way to normalize TFRecordIO

2025-02-24 Thread Ahmed Abualsaud via dev
+1 to option (2) On Mon, Feb 24, 2025 at 1:32 PM Danny McCormick via dev wrote: > Thanks for looking into this! I think I like option (2) for the base > transform since it allows us to normalize across languages and get this > added with the lowest amount of effort, plus it doesn't stop us from

Re: Introducing Catalogs to Beam SQL

2025-06-13 Thread Ahmed Abualsaud via dev
ite-up. This is a pretty good start! I left some > comments, but if we have time pressure, I think we can release "something", > but clearly mark it as experimental (or better unstable), so that users > know what is the current state. > > WDYT? > > Jan > On 6/12/25

Re: [ANNOUNCE] New Committer: Shunping Huang

2025-06-10 Thread Ahmed Abualsaud via dev
Congrats Shunping!! Very well deserved, thank you for all the contributions! On Sun, Jun 8, 2025 at 4:57 PM Rakesh Kumar wrote: > Congratulations Shunping 🎉👏!!! > > On Sat, Jun 7, 2025, 9:47 AM Danny McCormick via dev > wrote: > >> Congratulations Shunping! This is well deserved! >> >> On Sat,

Introducing Catalogs to Beam SQL

2025-06-10 Thread Ahmed Abualsaud via dev
Hey all, I was integrating our Java IcebergIO with Beam SQL (PR #34799) and got blocked on the fact that Beam SQL currently lacks a "Catalog" concept. This is fundamental to modern data architectures like Iceberg, where they are used to manage table meta

Re: Introducing Catalogs to Beam SQL

2025-06-11 Thread Ahmed Abualsaud via dev
ere we can have a > > discussion about the goals, alternative solutions, already tried ways, > > etc? This would be really cool! > > > > Best, > > > > Jan > > > > On 6/10/25 16:12, Ahmed Abualsaud via dev wrote: > > > Hey all, > > >

Re: [RESULT] [VOTE] Release 2.66.0, release candidate #2

2025-07-01 Thread Ahmed Abualsaud via dev
Thanks Vitaly! On Tue, Jul 1, 2025 at 12:10 PM XQ Hu via dev wrote: > Congratulations! Thanks a lot for all your hard work!!! > > On Tue, Jul 1, 2025 at 11:29 AM Vitaly Terentyev via dev < > dev@beam.apache.org> wrote: > >> I'm happy to announce that we have unanimously approved this release. >>

Re: [Announce] Java11+ is now required to build Beam repo. User pipeline not affected.

2025-07-16 Thread Ahmed Abualsaud via dev
Amazing, thank you Yi! On Wed, Jul 16, 2025 at 4:59 PM Yi Hu via dev wrote: > Hi Beam Developers, > > As part of Java8 deprecation [1], Beam repo now requires Java11+ to build > jars from Beam repository sources. > > * What's affected: > > Commit [2] upgraded certain Gradle plugins that dropped

[Beam SQL] Addressing gaps in metadata management to improve usability

2025-07-22 Thread Ahmed Abualsaud via dev
Hey everyone, Building on the previous thread regarding Catalogs in Beam, @Talat Uyarer and I noticed several areas where Beam SQL's usability could be significantly improved, particularly concerning its interaction with existing