Re: [VOTE][vendor-calcite] Vendored Dependencies Release

2025-07-25 Thread Valentyn Tymofieiev via dev
+1 On Fri, Jul 25, 2025 at 3:56 AM Chamikara Jayalath via dev < dev@beam.apache.org> wrote: > +1 > > Thanks, > Cham > > On Wed, Jul 23, 2025 at 4:05 PM Yi Hu via dev wrote: > >> Hi everyone, >> >> Please review and vote on the release candidate #1 for >> beam-vendor-calcite-1_40_0, version 0.1,

Re: [ANNOUNCE] New Committer: Shunping Huang

2025-06-10 Thread Valentyn Tymofieiev via dev
Well deserved! Congratulations, Shunping! On Tue, Jun 10, 2025 at 6:12 AM Shunping Huang wrote: > Thank you all! I am honored to be a committer! > > On Tue, Jun 10, 2025 at 8:55 AM Jan Lukavský wrote: > >> Congrats Shunping! >> On 6/10/25 13:49, Ahmed Abualsaud via dev wrote: >> >> Congrats Shu

Re: [Discuss] Breaking change to disable argument abbreviation in Beam Python

2025-05-14 Thread Valentyn Tymofieiev via dev
+1. I never knew about this feature until I noticed it was causing an inconvenience for Colab/interactive runner users and required workarounds in Beam. On Wed, May 14, 2025 at 10:59 AM Rakesh Kumar wrote: > +1, I have used the abbreviations only for the local test. It is good that > all my pro
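For readers who, like the poster, have not come across argument abbreviation: the sketch below assumes the behavior under discussion is the prefix matching that Python's standard `argparse` module performs by default. The `--job_name` option and the `build_parser` helper are illustrative names only and are not taken from the thread.

```python
import argparse


def build_parser(allow_abbrev: bool) -> argparse.ArgumentParser:
    """Build a parser with one long option; names here are illustrative only."""
    parser = argparse.ArgumentParser(allow_abbrev=allow_abbrev)
    parser.add_argument("--job_name", default=None)
    return parser


# With abbreviation enabled (argparse's default), the unambiguous prefix
# "--job" is silently accepted as "--job_name".
known, unknown = build_parser(allow_abbrev=True).parse_known_args(["--job", "demo"])
abbrev_result = (known.job_name, unknown)

# With abbreviation disabled, "--job" is not recognized and is passed through
# untouched, so another consumer of the argument list can still see it.
known, unknown = build_parser(allow_abbrev=False).parse_known_args(["--job", "demo"])
strict_result = (known.job_name, unknown)
```

Silent prefix matching can be surprising in interactive environments, where flags intended for another tool may be captured by a parser that merely shares a prefix with them.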

Re: 2.65.0 Branch Cut

2025-04-30 Thread Valentyn Tymofieiev via dev
Thanks! Good luck with the release. On Wed, Apr 30, 2025 at 2:20 PM Yi Hu via dev wrote: > Hey everyone, > > The Beam 2.65.0 release branch has been cut! There are currently 0 open > issues on the milestone . > > I will now start working on stabilizin

Re: Make cloudpickle the default library in Beam 2.65.0

2025-04-30 Thread Valentyn Tymofieiev via dev
ckling errors or why their >> transform is throwing Name errors has been a very painful experience, >> especially since it usually happens with users first experiences >> > >> > On Tue, Apr 29, 2025, 6:14 PM Valentyn Tymofieiev via dev < >> dev@beam.apach

Re: Make cloudpickle the default library in Beam 2.65.0

2025-04-29 Thread Valentyn Tymofieiev via dev
There are several reasons: - wide adoption in data processing community, see initial discussion: [1] - expectations on cloudpickle having a larger number of maintainers and contributors. - new releases of dill had breaking changes[2], which made adoption of a new version challenging. - cloudpi

Re: Make cloudpickle the default library in Beam 2.65.0

2025-04-28 Thread Valentyn Tymofieiev via dev
Thanks Claude! Great to see a lot of progress on this effort. The dependency on an old version of dill has been a persistent pain point for many users. Please call out this change in the release notes, so that customers can provide feedback and find instructions on how to unblock themselves. It c
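As background on why a pickling library matters here at all, the following minimal sketch uses only the standard library. It shows that `pickle` serializes functions by reference, so lambdas and locally defined functions cannot be serialized without a library such as dill or cloudpickle. The helper names `try_pickle` and `make_scaler` are made up for this illustration and do not come from Beam.

```python
import pickle


def try_pickle(obj) -> bool:
    """Return True if the standard-library pickler can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_scaler(factor):
    # A closure: the kind of locally defined function users often pass to transforms.
    def scale(x):
        return x * factor
    return scale


can_pickle_builtin = try_pickle(len)             # importable by name
can_pickle_lambda = try_pickle(lambda x: x + 1)  # no importable name
can_pickle_closure = try_pickle(make_scaler(3))  # defined inside a function
```

Only the first call succeeds. The other two fail because the standard pickler has no importable name to record for them.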

Re: [VOTE] Release 2.64.0 release candidate #2

2025-03-28 Thread Valentyn Tymofieiev via dev
+1 (binding) Inspected Dataflow containers, ran a few Python pipelines and checked the release notes. On Fri, Mar 28, 2025 at 7:28 AM Yi Hu via dev wrote: > +1 (non-binding) > > Validated with GCP-IO load tests ( > https://github.com/apache/beam/tree/master/it/google-cloud-platform) on > Datafl

Re: [ANNOUNCE] New Committer: Vitaly Terentev

2025-03-24 Thread Valentyn Tymofieiev via dev
Congratulations and thanks, Vitaly! On Mon, Mar 24, 2025 at 9:57 AM Danny McCormick via dev wrote: > Congratulations Vitaly! Thanks for all the work you've done on Beam > infrastructure in particular! > > On Mon, Mar 24, 2025 at 12:10 PM Ahmet Altay via dev > wrote: > >> Congratulations Vitaly!

Re: 2.64.0 Branch Cut

2025-03-20 Thread Valentyn Tymofieiev via dev
Thanks for the update, XQ. Good luck on the release. On Thu, Mar 20, 2025 at 4:50 PM XQ Hu via dev wrote: > Hey everyone, > > The Beam 2.64.0 release branch has been cut! There are currently 0 open > > issues on the milestone . > > I will now start wo

Re: Confluence access

2025-02-06 Thread Valentyn Tymofieiev via dev
Done, thanks for making the improvements! On Tue, Feb 4, 2025 at 8:22 PM Derrick Williams via dev wrote: > > Hi folks, > > Can someone grant me access to the Confluence wiki so that I can make > improvements? > My username is derrickaw. > > Thanks > Derrick

Re: [VOTE] Release 2.62.0, release candidate #1

2025-01-21 Thread Valentyn Tymofieiev via dev
2.61.0 and our options are: (1) >>> leave it in 2.62.0 or (2) upgrade enough dependencies that it might be high >>> risk. >>> >>> If that is correct and there is no more information that comes up, I am >>> going to choose (1). >>> >>>

Re: [VOTE] Release 2.62.0, release candidate #1

2025-01-17 Thread Valentyn Tymofieiev via dev
e > issue existed before the current release. > > On Fri, Jan 17, 2025 at 6:50 AM XQ Hu via dev wrote: > >> I tested this with my MacBook and with a clean venv and pip install >> "apache-beam[gcp]==2.61.0". numpy==2.1.3 is installed. >> >> On F

Re: [VOTE] Release 2.62.0, release candidate #1

2025-01-17 Thread Valentyn Tymofieiev via dev
@Chamikara Jayalath Does this bug also happen in 2.61.0 ? I am seeing that the numpy 2.x upgrade was first introduced earlier in 2.61.0: https://github.com/apache/beam/commit/6129c9a56d52ebb060417cb397e0764cdd8791bc In this case the regression would be a preexisting known issue and according to

Re: [VOTE] Release 2.62.0, release candidate #1

2025-01-16 Thread Valentyn Tymofieiev via dev
Thanks Cham for flagging this. It sounds like an inconvenience at minimum, I vote -1 and suggest to rollback the numpy upgrade on the release branch. Alternative: we upgrade numpy in the SDK containers but that leaves us with releasing a configuration that had less time to be tested. On Thu, Jan 1

Re: [PROPOSAL] Implement Beam SDK harness initialization capability for Python

2024-12-23 Thread Valentyn Tymofieiev via dev
ed rather than just a module.) > >> > >> On Fri, Dec 13, 2024 at 12:52 PM Danny McCormick via dev > >> wrote: > >> > > >> > Thanks - I actually was thinking about this today and was annoyed > that we don't have this ability. I'm +1 t

Re: [ANNOUNCE] New PMC Member: Danny McCormick

2024-12-20 Thread Valentyn Tymofieiev via dev
So well deserved!! Congratulations, Danny! On Fri, Dec 20, 2024, 19:56 Robert Bradshaw via dev wrote: > Hi all, > > Please join me and the rest of the Beam PMC in welcoming Danny > McCormick as the newest member of the PMC. > > Danny has been contributing to Beam for several years now, most

Next steps in moving Beam to cloudpickle

2024-12-18 Thread Valentyn Tymofieiev via dev
Recently there has been an increase in requests from Beam users highlighting the inconvenience of Beam's dependency on an old version of dill. Some time ago we started a project to support cloudpickle as an alternative pickler, aiming to switch to cloudpickle completely. I put together a tentativ

[PROPOSAL] Implement Beam SDK harness initialization capability for Python

2024-12-13 Thread Valentyn Tymofieiev via dev
Hi everyone, Currently we don't have a straightforward and documented way to do simple initialization steps on every Beam Python SDK worker before data processing starts. It is a rough edge that I've encountered on several occasions myself and in conversations with Beam users I put together some

Re: [VOTE] Release 2.60.0, release candidate #2

2024-10-15 Thread Valentyn Tymofieiev via dev
+1 (binding), checked dataflow containers and ran a few Python pipelines. On Tue, Oct 15, 2024 at 4:22 PM Ahmet Altay via dev wrote: > +1 (binding) > > On Tue, Oct 15, 2024 at 4:13 PM Danny McCormick via dev < > dev@beam.apache.org> wrote: > >> +1 (non-binding). Ran some ML examples against the

Re: Building dev docker image

2024-09-19 Thread Valentyn Tymofieiev via dev
On Thu, Sep 19, 2024 at 4:06 PM Robert Bradshaw via dev wrote: > > > On Thu, Sep 19, 2024 at 3:43 PM Joey Tran > wrote: > >> Ah okay then. I commented out the goavro line and the image does finish >> building. It seems that the python still has be built which takes a bit >> (using ` pip install

Re: Building dev docker image

2024-09-19 Thread Valentyn Tymofieiev via dev
or alternatively, commenting-out the cython dep in pyproject.toml might remove cythonization too. On Thu, Sep 19, 2024 at 4:16 PM Joey Tran wrote: > cython isn't installed in the virtualenv. Running `pip uninstall cython` > resulted in: > `WARNING: Skipping cython as it is not installed.` > > On

Re: [VOTE] Release 2.59.0, release candidate #1

2024-09-10 Thread Valentyn Tymofieiev via dev
Burke wrote: >> > Point taken. Added a comment. Largely these are long flaky or perma red >> suites. >> > >> > https://github.com/apache/beam/pull/32284#issuecomment-2319127125 >> > >> > On 2024/08/29 21:16:12 Valentyn Tymofieiev via dev wrote: >

Re: [VOTE] Release 2.59.0, release candidate #1

2024-08-29 Thread Valentyn Tymofieiev via dev
Thanks! Already been using this release for some ongoing experimentation, no issues observed. Passed feedback offline on Dataflow container release, but that is not release-blocking and can be taken care of in parallel. > * PR to run tests against release branch [12]. Could you please comment on

Re: Sunsetting Beam Python 3.8 Support

2024-08-26 Thread Valentyn Tymofieiev via dev
On Mon, Aug 26, 2024 at 11:57 AM Robert Bradshaw wrote: > On Mon, Aug 26, 2024 at 11:22 AM Valentyn Tymofieiev via dev > wrote: > > > > Interesting findings. When researching Dataflow Python usage with > internal telemetry, I see that Python 3.11 has slightly more usage th

Re: Sunsetting Beam Python 3.8 Support

2024-08-26 Thread Valentyn Tymofieiev via dev
Interesting findings. When researching Dataflow Python usage with internal telemetry, I see that Python 3.11 has slightly more usage than Python 3.8. When I exclude Dev SDKs (this might also exclude some Google-internal users who use bleeding-edge SDKs), Python 3.8 reaches to the top. If I exclude

Re: [DISCUSS] Beam 3.0: Paving the Path to the Next Generation Data Processing Framework

2024-08-22 Thread Valentyn Tymofieiev via dev
> Key to this will be a push to producing/consuming structured data (as has been mentioned) and also well-structured, language-agnostic configuration. > Unstructured data (aka "everything is bytes with coders") is overrated and should be an exception not the default. Structured data everywhere, w

Re: [VOTE] Release 2.58.1, release candidate #1

2024-08-16 Thread Valentyn Tymofieiev via dev
+1. Verified the diff content between two RCs. On Fri, Aug 16, 2024 at 8:42 AM Robert Burke wrote: > +1 (binding) > > Validated the linux-amd64 prism binary with a few pipelines. > > On 2024/08/16 00:25:58 Danny McCormick via dev wrote: > > Hi everyone, > > Please review and vote on the patch r

Re: [VOTE] Release 2.58.0, release candidate #2

2024-08-05 Thread Valentyn Tymofieiev via dev
+1. Verified that a cherry-pick I made actually made the difference in the new release. On Mon, Aug 5, 2024 at 2:56 PM Robert Burke wrote: > +1 (Binding) > > Once again validated the linux-amd prism binary against the java and > python validates runner tests. > > Aside: it is nice to see that the

Re: [VOTE] Release 2.58.0, release candidate #1

2024-07-23 Thread Valentyn Tymofieiev via dev
+1 (binding), checked that Dataflow containers have been released, checked release notes, spot-checked some Python test suites and ran a pipeline on Python 3.12. On Tue, Jul 23, 2024 at 6:20 AM Jack McCluskey via dev wrote: > Hey everyone, > > Validation has had positive results so far; however,

Re: Beam Example Bugs

2024-07-03 Thread Valentyn Tymofieiev via dev
Thanks for flagging this Joey, I reopened https://github.com/apache/beam/issues/31624. We can certainly validate any aspect of Beam during the release. I think it should be possible to detect issues like this in a website/playground test suite, and include that suite in the list of postcommit suit

Re: [VOTE] Release 2.57.0, release candidate #1

2024-06-25 Thread Valentyn Tymofieiev via dev
+1. On Tue, Jun 25, 2024 at 12:18 PM Kenneth Knowles wrote: > +1 (binding) > > I will continue to wait until 3 work days to conclude the vote, for plenty > of validation time. > > On Mon, Jun 24, 2024 at 8:38 PM Yi Hu via dev wrote: > >> +1 (non-binding) >> >> Validated DataflowTemplates integr

Re: [ANNOUNCE] New Committer: XQ Hu

2024-06-24 Thread Valentyn Tymofieiev via dev
Congratulations and thank you for all your contributions to Beam! On Mon, Jun 24, 2024 at 1:49 PM Robert Burke wrote: > Congratulations XQ! > > On Mon, Jun 24, 2024, 1:43 PM Svetak Sundhar via dev > wrote: > >> Congrats XQ! >> >> >> Svetak Sundhar >> >> Data Engineer >> s vetaksund...@google.

Re: [VOTE] Release 2.57.0, release candidate #1

2024-06-24 Thread Valentyn Tymofieiev via dev
Ran a Python 3.12 pipeline on Dataflow without issues, noted a suboptimal dependency resolution: https://github.com/apache/beam/issues/31676, verified that this is not a regression in 2.57.0, will follow up separately. https://github.com/apache/beam/pull/31513 has several failures, can you please

Re: [VOTE] Release 2.56.0, release candidate #2

2024-05-01 Thread Valentyn Tymofieiev via dev
't >> helpful so I'll try to capture the actual job which failed. >> >> Thanks, >> Danny >> >> On Wed, May 1, 2024 at 1:05 PM Valentyn Tymofieiev via dev < >> dev@beam.apache.org> wrote: >> >>> What is the high-level summary of

Re: [VOTE] Release 2.56.0, release candidate #2

2024-05-01 Thread Valentyn Tymofieiev via dev
What is the high-level summary of test failures on https://github.com/apache/beam/pull/31038 - are all issues preexisting/infra-related/already tracked? In particular, I noticed two failures in the Java JPMS test suite, which I hadn't come across before. On Wed, May 1, 2024 at 8:16 AM Ritesh Ghor

Re: Structured Logging in python

2024-04-15 Thread Valentyn Tymofieiev via dev
Thanks! I opened https://github.com/apache/beam/issues/30978 . Feel free to self-assign when you have time to work on it. On Mon, Apr 15, 2024 at 1:43 PM Geddy Schellevis wrote: > The “custom_data” field didn’t work. > I am Happy with helping with the implementation of this. > > Op ma 15 apr 20

Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Valentyn Tymofieiev via dev
't verified that. > > Best, > Joey > > On Thu, Apr 11, 2024 at 3:52 PM Valentyn Tymofieiev via dev < > dev@beam.apache.org> wrote: > >> I took a look and mentioned the PR to a few folks. Couple of thoughts: >> - We should avoid Beam adding a high-level functio

Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Valentyn Tymofieiev via dev
I took a look and mentioned the PR to a few folks. Couple of thoughts: - We should avoid Beam adding a high-level functionality only for Batch. - Supporting Windowing/Triggers would likely be non-trivial and worth considering early in the design. - If you'd like to continue working on this, I would

Re: tox issues in dev container

2024-04-05 Thread Valentyn Tymofieiev via dev
Could you please provide more info about how you create your environment? Also what OS do you use? On Fri, Apr 5, 2024 at 2:08 PM Joey Tran wrote: > Yeah that was the tox command I was running > > On Fri, Apr 5, 2024, 4:37 PM XQ Hu via dev wrote: > >> >> https://cwiki.apache.org/confluence/disp

Re: [VOTE] Patch Release 2.55.1, release candidate #2

2024-04-03 Thread Valentyn Tymofieiev via dev
Hi Danny, Thanks for volunteering to do this patch release. For review convenience, this is the diff: - Diff of release branches: https://github.com/apache/beam/compare/release-2.55.0...release-2.55.1 - The diff of tags v2.55.0-RC3 and v2.55.1-RC2: https://github.com/apache/beam/compare/v2.55

Re: Patch release proposal

2024-03-28 Thread Valentyn Tymofieiev via dev
If we do a patch release for Python SDK, let's also patch another known issue for which fix is available: https://github.com/apache/beam/blob/master/CHANGES.md#known-issues-1 On Thu, Mar 28, 2024 at 8:01 AM Yi Hu via dev wrote: > 2.55.0 release manager here > > The patch itself [1] is trivial, h

Re: [VOTE] Release 2.55.0, release candidate #3

2024-03-22 Thread Valentyn Tymofieiev via dev
+1 (binding). Checked some of the released artifacts, release blog, and ran a couple Python pipelines on Dataflow. > * GitHub Release notes [1] Is the link correct? It points to the milestone. On Fri, Mar 22, 2024 at 1:10 PM Yi Hu via dev wrote: > +1 (non-binding) > > 1. Checked published Jav

Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
'd be quite surprising if beam.Flatten would become >>>>> equivalent to FlatMap if passed only a single pcollection. One use case >>>>> that would be broken from that is cases where someone might be flattening >>>>> a >>>>> variable num

Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
On Thu, Mar 21, 2024 at 4:36 PM Valentyn Tymofieiev via dev < > dev@beam.apache.org> wrote: > >> One possible alternative is to define beam.Flatten for a single >> collection to be functionally equivalent to beam.FlatMap(lambda x: x), but >> that would be a larger chang

Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
Thu, Mar 21, 2024 at 12:02 PM Joey Tran > wrote: > >> That's not really the same thing, is it? `beam.Flatten` combines two or >> more pcollections into a single pcollection while beam.FlatMap unpacks >> iterables of elements (i.e. PCollection<Iterable<T>> -> PCollection<T>) >>
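The distinction drawn in this message can be sketched with plain Python lists standing in for PCollections. This is an analogy only, not Beam code: the `flat_map` and `flatten` helpers are hypothetical, and real PCollections are unordered and distributed.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def flat_map(fn: Callable[[T], Iterable[U]], elements: Iterable[T]) -> List[U]:
    """List-based stand-in for FlatMap: apply fn to each element, then unpack."""
    return list(chain.from_iterable(fn(e) for e in elements))


def flatten(*collections: Iterable[T]) -> List[T]:
    """List-based stand-in for Flatten: merge several collections into one."""
    return list(chain(*collections))


nested = [[1, 2], [3], []]
unpacked = flat_map(lambda x: x, nested)  # identity function unpacks each inner list
merged = flatten([1, 2], [3, 4])          # elements are kept as they are
single = flatten(nested)                  # one input: inner lists are NOT unpacked
```

In this analogy, flattening a single collection returns it unchanged, whereas a flat map with the identity function removes one level of nesting, so the two operations are not interchangeable.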

Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
Actually, disregard that, Flatten is used in a different context to flatten multiple collections. On Thu, Mar 21, 2024 at 11:55 AM Valentyn Tymofieiev wrote: > Hi, you can use beam.Flatten() instead. > > On Thu, Mar 21, 2024 at 10:55 AM Joey Tran > wrote: > >> Hey all, >> >> Using an identity f

Re: Python API: FlatMap default -> lambda x:x?

2024-03-21 Thread Valentyn Tymofieiev via dev
Hi, you can use beam.Flatten() instead. On Thu, Mar 21, 2024 at 10:55 AM Joey Tran wrote: > Hey all, > > Using an identity function for FlatMap comes up more often than using > FlatMap without an identity function. Would it make sense to use the > identity function as a default? > > > >

Re: Update confluent dependencies version in kafka io

2024-03-11 Thread Valentyn Tymofieiev via dev
Welcome to dev@ Maciej. I think as long as an upgrade doesn't cause breaking changes for the users, there shouldn't be any concerns. Having a dependency on a 5 yr old library on the other hand is a concern. For Python SDK, we try to upgrade to new major versions within a year after they are relea

Re: Issue building python SDK with M2 Mac

2024-03-08 Thread Valentyn Tymofieiev via dev
It sounds like the error might be happening while building the Python wheels; it seems that the `-arch ` parameter is not being correctly evaluated for your platform and is omitted. I am not sure what is causing this. I am also not sure what dependency generates that command line (distutils/setupto

Re: [VOTE] Vendored Dependencies Release

2024-02-14 Thread Valentyn Tymofieiev via dev
+1 (binding) On Wed, Feb 14, 2024 at 7:52 AM Kenneth Knowles wrote: > +1 (binding) > > On Wed, Feb 14, 2024 at 10:48 AM Robert Burke wrote: > >> +1 (binding) >> >> On Wed, Feb 14, 2024, 7:35 AM Yi Hu via dev wrote: >> >>> +1 (non-binding) >>> >>> checked artifact packages not leaking namespace

Re: [ANNOUNCE] New Committer: Svetak Sundhar

2024-02-12 Thread Valentyn Tymofieiev via dev
Congrats, Svetak! On Mon, Feb 12, 2024 at 11:20 AM Kenneth Knowles wrote: > Hi all, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Svetak Sundhar (sve...@apache.org). > > Svetak has been with Beam since 2021. Svetak has contributed code to many > areas of Beam, i

Re: [VOTE] Release 2.54.0, release candidate #2

2024-02-09 Thread Valentyn Tymofieiev via dev
+1. Checked postcommit test results for Python SDK, and exercised a couple of Dataflow scenarios. On Thu, Feb 8, 2024, 14:07 Svetak Sundhar via dev wrote: > +1 (Non-Binding) > > Tested with Python SDK on DirectRunner and Dataflow Runner > > > Svetak Sundhar > > Data Engineer > s vetaksund...@g

Re: [PROPOSAL] Re-release vendor grpc

2024-02-07 Thread Valentyn Tymofieiev via dev
On Wed, Feb 7, 2024 at 1:34 AM Sam Whittle via dev wrote: > Related to this, could a PMC member add my key to > https://dist.apache.org/repos/dist/release/beam/KEYS? > Done, thanks. > I've appended it to https://dist.apache.org/repos/dist/dev/beam/KEYS > Thanks! > Sam > > On Wed, Feb 7, 2024 at

Fwd: Community over Code EU 2024 Travel Assistance Applications now open!

2024-01-26 Thread Valentyn Tymofieiev via dev
FYI. -- Forwarded message - The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code EU 2024 are now open! TAC will be supporting Community over Code EU, Bratislava, Slovakia, June 3rd - 5th, 2024. TAC exists to he

Re: Google Artifact Registry detects critical vuln CVE-2023-45853 in beam dataflow

2024-01-24 Thread Valentyn Tymofieiev via dev
> Does the beam project generally attempt to address as many of these vulnerabilities? Beam does not retroactively patch released container images, but we use the latest available docker base images during each Beam release. Many vulnerabilities concern software packages preinstalled in the Docker

Re: Hiding logging for beam playground examples

2023-11-15 Thread Valentyn Tymofieiev via dev
t 0x7ff2664c9870> for environment > ref_Environment_default_environment_2 (beam:env:embedded_python:v1, b'') > ``` > > The example code itself doesn't set the log level in some playground code. > Does anyone have a pointer to where? I'm not familiar > > On Wed, Nov 15, 2023 at

Re: Hiding logging for beam playground examples

2023-11-15 Thread Valentyn Tymofieiev via dev
Are the examples using LogElements? https://github.com/apache/beam/blob/2012107a0fa2bb3fedf1b5aedcb49445534b2dad/sdks/python/apache_beam/transforms/util.py#L1271 Note that LogElements by default prints to stdout, but can be configured to use a different logger. We could also change the default. O

Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-14 Thread Valentyn Tymofieiev via dev
+1 (binding). Tested Python SDK on a batch and a streaming pipeline. Verified that the memory leak[1] is no longer happening and pyarrow hotfix is applied. Sent an update to CHANGES.MD to call out both. Thanks for doing the release and patience with all the RCs. [1] https://github.com/apache/bea

Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-10 Thread Valentyn Tymofieiev via dev
As mentioned in another thread [1], there is a recently detected vulnerability in pyarrow [2]. It appears to be a concern for Beam users that we can mitigate in the upcoming release. We can reassess early next week in case there is a revised assessment for severity for this vulnerability. In the

Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Valentyn Tymofieiev via dev
From https://pypi.org/project/pyarrow-hotfix/ : pyarrow_hotfix must be imported in your application or library code for it to take effect. Just installing the package is not sufficient: For Beam users, that means that the pipeline code running on the workers would need to import this module on
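A minimal sketch of the "importing is what activates it" point, using only the standard library so that it also runs where the package is absent. The helper name `apply_pyarrow_hotfix` is hypothetical, and where exactly to call it in a pipeline is an assumption rather than something stated in the thread.

```python
import importlib
import importlib.util


def apply_pyarrow_hotfix() -> bool:
    """Import pyarrow_hotfix if it is installed; importing is what activates it.

    Returns True when the module was imported, False when it is not installed.
    """
    if importlib.util.find_spec("pyarrow_hotfix") is None:
        return False
    importlib.import_module("pyarrow_hotfix")
    return True


# Assumption: this is called from code that actually executes on the workers
# (for example, the module that defines the pipeline's functions), not only
# from the script that launches the pipeline.
hotfix_active = apply_pyarrow_hotfix()
```

Checking the return value lets a pipeline log or fail loudly when the hotfix is missing from the worker environment, instead of assuming that installation alone was enough.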

Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Valentyn Tymofieiev via dev
Hi Piotr, thanks for bringing this to the list. There is a FR to support pyarrow https://github.com/apache/beam/issues/28410 . I looked into it briefly in https://github.com/apache/beam/pull/28437 but saw some test failures and it has been on back burner. Given the news about vulnerability it woul

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-06 Thread Valentyn Tymofieiev via dev
> PR to run tests against release branch [12]. https://github.com/apache/beam/pull/28663 is closed and test signal is no longer available. did all the tests pass? On Fri, Oct 6, 2023 at 5:32 AM Alexey Romanenko wrote: > +1 (binding) > > — > Alexey > > > On 5 Oct 2023, at 18:38, Jean-Baptiste O

Re: [LAZY CONSENSUS] Create separate repository for Swift SDK

2023-09-25 Thread Valentyn Tymofieiev via dev
On Mon, Sep 25, 2023 at 9:03 AM Kenneth Knowles wrote: > Hi all, > > I propose to unblock Byron's work by creating a new repository for the > Beam Swift SDK. This will be the first of its kind, and break from > tradition of having Beam be kind of a mini-mono-repo. > > Discussion of the Swift SDK

Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-30 Thread Valentyn Tymofieiev via dev
We have identified the leak. https://github.com/apache/beam/issues/28246 has the details and workarounds. On Mon, Aug 28, 2023 at 9:57 AM Valentyn Tymofieiev wrote: > This appears to be a recent issue reported also by others (e.g. > https://github.com/apache/beam/issues/28142), it's being active

Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-28 Thread Valentyn Tymofieiev via dev
This appears to be a recent issue reported also by others (e.g. https://github.com/apache/beam/issues/28142), it's being actively investigated. Therefore, it is unlikely that memory fragmentation is an issue. On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev wrote: > Hi, thanks for reaching ou

Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-25 Thread Valentyn Tymofieiev via dev
+1 Verified that the issue detected in RC0 has been resolved. Successfully ran a Python pipeline on ARM Dataflow workers. Noted that Dataflow runner logs became less verbose as the result of https://github.com/apache/beam/pull/27788. One line that I often pay attention to no longer appears at the

Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-22 Thread Valentyn Tymofieiev via dev
Hi, thanks for reaching out. I'd be curious to see whether the memory consumption patterns you observe change if you switch the memory allocator library. For example, you could try to use a custom container, install jemalloc and enable it. See: https://beam.apache.org/documentation/runtime/enviro

Re: [VOTE] Release 2.50.0, release candidate #1

2023-08-21 Thread Valentyn Tymofieiev via dev
I tried running a Dataflow Python pipeline on RC1 and got an error: Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versi

Re: [RFC] Bootloader Buffered Logging

2023-08-16 Thread Valentyn Tymofieiev via dev
Thanks, Jack! left some comments, looking forward to this work! On Wed, Aug 16, 2023 at 10:31 AM Robert Burke wrote: > I've added some comments but generally +1 on this. > > A later change might be able to build from this to ensure the various > STDErr and STDOut logs from the SDK harness execut

Re: [RFC] Model Per Key RunInference

2023-07-27 Thread Valentyn Tymofieiev via dev
Thanks Danny! The narrative is well structured and easy to follow. I encourage more folks to take a look. I left a couple of comments, mostly about plans for memory management. On Thu, Jul 20, 2023 at 7:47 AM Danny McCormick via dev wrote: > Hey everyone! Today, many users have pipelines that ch

Re: [Feature Proposal] Add ARM Support to Beam SDK Container Images

2023-07-18 Thread Valentyn Tymofieiev via dev
Hi Celeste, Thanks for the proposal and researching the options. Using multi-arch images seems like a good way to reduce the complexity associated with correctly selecting the architecture on the runner. It sounds like there may be implications for release process, which future release managers m

Re: [VOTE] Release 2.49.0, release candidate #2

2023-07-14 Thread Valentyn Tymofieiev via dev
+1. Tested a few python pipelines on Dataflow Runner V1 and Runner V2. On Thu, Jul 13, 2023 at 12:54 PM Svetak Sundhar via dev wrote: > +1 (Non-Binding) > > Python quickstart Dataflow runner. > > > Svetak Sundhar > > Data Engineer > s vetaksund...@google.com > > > > On Thu, Jul 13, 2023 at 5

Re: Best patterns for a polling transform

2023-06-22 Thread Valentyn Tymofieiev via dev
> The below code runs fine with a single worker but with multiple workers there are duplicate values. > I’m using TimeDomain.WATERMARK here due to it simply not working when using REAL_TIME. The docs seem to suggest REAL_TIME would be the way to do this, however there seems to be no guarantee that

Re: [VOTE] Release 2.47.0, release candidate #3

2023-05-10 Thread Valentyn Tymofieiev via dev
+1. Checked Python streaming wordcount, Dataflow containers and some test results running on RC that I care about. On Wed, May 10, 2023 at 3:22 PM Ritesh Ghorse via dev wrote: > +1 (non-binding) > > Validated Go SDK Quickstart on Direct and Dataflow runner > > On Wed, May 10, 2023 at 4:23 AM J

Re: [PROPOSAL] Preparing for 2.48.0 Release

2023-05-09 Thread Valentyn Tymofieiev via dev
> Absent a compelling reason otherwise, my view would be to just stick with the statement of dropping it as soon as it goes out of support This is the process we agreed upon last time we discussed the version support policy on dev@. On Fri, May 5, 2023 at 6:18 PM Robert Bradshaw via dev wrote:

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2023-05-02 Thread Valentyn Tymofieiev via dev
Hi All, just wanted to give a quick update on the effort discussed here: The action items from the retrospective are tracked in https://github.com/apache/beam/issues/25652. Many outdated dependencies were updated in https://github.com/apache/beam/pull/24599 by +Anand Inguva and remaining older

Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-26 Thread Valentyn Tymofieiev via dev
Thanks, Jack! re [12]: I am seeing some test errors - have they been investigated? Also, did all test suites run? I think I am not seeing output of some of the suites, like Run Python Dataflow V2 ValidatesRunner On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev wrote: > Hi everyone, >

Re: [ANNOUNCE] New committer: Anand Inguva

2023-04-21 Thread Valentyn Tymofieiev via dev
Congratulations! On Fri, Apr 21, 2023 at 8:19 PM Jan Lukavský wrote: > Congrats Anand! > On 4/21/23 20:05, Robert Burke wrote: > > Congratulations Anand! > > On Fri, Apr 21, 2023, 10:55 AM Danny McCormick via dev < > dev@beam.apache.org> wrote: > >> Woohoo, congrats Anand! This is very well dese

Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Valentyn Tymofieiev via dev
I think a case-in-point dependency that would benefit from this testing is grpcio, which publishes pre-releases, has broken us before, and has had several of its released versions yanked. https://pypi.org/project/grpcio/#history . We can look at how grpcio affected Beam previously. Couple of issues: - https:/

Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Valentyn Tymofieiev via dev
2. Make use of the current PreCommit and PostCommit test suite and modify it so that it installs pre-released dependencies. > Leads to noisy test signals if the pre-release candidate is unstable. I am in favor of option 2 since it's a simple solution that is easy to implement and try out. The disadv

Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-07 Thread Valentyn Tymofieiev via dev
>>>> Seems like this was a revert of a previous commit that was also not >>> included in the 2.46.0 release branch ( >>> https://github.com/apache/beam/pull/25627) ? >>> >>> If so we might not need a new RC but good to confirm. >>> >>> Tha

Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-03 Thread Valentyn Tymofieiev via dev
I have encountered a failure in a Python pipeline running with Runner v1: RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow Python worker version 2.45.0. Please check Dataflow worker startup logs and make sure that correct version of Beam SDK is installed. We should understand wh

Dependabot questions

2023-02-27 Thread Valentyn Tymofieiev via dev
I noticed that human-readable dependency reports are not being generated. Can this functionality be replaced with Dependabot? Does Dependabot provide a view of what is currently outdated from its standpoint? Also, I noticed that some dependencies are outdated, yet not updated by Dependabot. Possi

Re: Beam Release 2.46

2023-02-23 Thread Valentyn Tymofieiev via dev
Thanks for the update! I'd like to suggest that we include in the release voting email template a link to a PR that runs all tests against the release branch. I think we used to include it, but I haven't seen it in recent voting threads. Thanks, Valentyn On Thu, Feb 23, 2023 at 9:28 AM Danny McC

Re: Python 3.11 support in Apache Beam

2023-02-21 Thread Valentyn Tymofieiev via dev
Thanks a lot Anand. I'll take a look at the PRs. On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva wrote: > I was able to spin up a PR: https://github.com/apache/beam/pull/24599 > that updates the build dependencies of Apache Beam. > > Several GCP dependencies needed to be updated as well. I covered

Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Valentyn Tymofieiev via dev
On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva wrote: > Yes, it is related to protobuf only. But I think the update of these > dependencies is required for Python 3.11 since the newer versions have > support for Python 3.11 wheels. > Assuming you refer to protobuf. Yes, there are no wheels for 3.10

Re: Python 3.11 support in Apache Beam

2023-02-07 Thread Valentyn Tymofieiev via dev
Hi Anand, On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev wrote: > Hi all, > > We are planning to work on adding support for Python 3.11[1] to Apache > Beam Python SDK. > > As part of this effort, we are going to update the python build > dependencies defined at [2]. > > Right now, there is

Re: Subscribe

2023-01-24 Thread Valentyn Tymofieiev via dev
Hello Alan, To subscribe to the list, you should send an email to dev-subscr...@beam.apache.org instead. Best, Valentyn On Tue, Jan 24, 2023 at 5:19 PM Alan Zhang via dev wrote: >

Re: [VOTE] Release 2.44.0, release candidate #1

2023-01-11 Thread Valentyn Tymofieiev via dev
+1. I validated that Dataflow and Beam Python containers include necessary dependencies of Apache Beam and did additional validation (see inline). On Wed, Jan 11, 2023 at 12:48 AM Ahmet Altay wrote: > I validated python quick starts (direct, dataflow) X (batch, streaming). I > ran into an issue

Re: [VOTE] Release 2.43.0, release candidate #1

2022-11-10 Thread Valentyn Tymofieiev via dev
-1. It looks like the format of Python wheels has changed. We should update the stager code and python container entrypoint code, otherwise we will have a 2 min pipeline start time regression on some runners. Opened https://github.com/apache/beam/issues/24110 On Thu, Nov 10, 2022 at 11:10 AM Chami

Re: [ANNOUNCE] New committer: Yi Hu

2022-11-09 Thread Valentyn Tymofieiev via dev
I am with the Beam PMC on this, congratulations and very well deserved, Yi! On Wed, Nov 9, 2022 at 11:08 AM Byron Ellis via dev wrote: > Congratulations! > > On Wed, Nov 9, 2022 at 11:00 AM Pablo Estrada via dev > wrote: > >> +1 thanks Yi : D >> >> On Wed, Nov 9, 2022 at 10:47 AM Danny McCormic

Re: github reviewer help / tips

2022-11-08 Thread Valentyn Tymofieiev via dev
I use the Notifier for GitHub Chrome extension. On Tue, Nov 8, 2022 at 10:29 AM Sachin Agarwal via dev wrote: > Hey folks, > > I've found myself repeatedly being very untimely in providing reviews

Re: Pipleline portable proto visualizaiton

2022-11-07 Thread Valentyn Tymofieiev via dev
s a stand alone tool, for someone motivated > enough. > > > > On Mon, Nov 7, 2022, 9:19 AM Valentyn Tymofieiev via dev < > dev@beam.apache.org> wrote: > >> > >> I'd like to visualize a DAG for a Beam portable pipeline, from a .pb > file or a textprot

Pipleline portable proto visualizaiton

2022-11-07 Thread Valentyn Tymofieiev via dev
I'd like to visualize a DAG for a Beam portable pipeline, from a .pb file or a textproto representation. Is some runner's UI readily available to make it possible (without executing the job)? I was thinking perhaps Apache Hop integration (if we have one) might be able to do that. If not, it shoul
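One way to approach this without a runner UI is to emit Graphviz DOT text from the pipeline's transforms. The sketch below is a minimal illustration that works on a simplified, hypothetical in-memory structure (transform id mapped to its input and output PCollection ids) rather than on the real portable pipeline proto; loading and flattening the .pb or textproto file is left out, and the `to_dot` name is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for a pipeline's transforms:
# transform id -> (input PCollection ids, output PCollection ids).
Transforms = Dict[str, Tuple[List[str], List[str]]]


def to_dot(transforms: Transforms) -> str:
    """Render the transform graph as Graphviz DOT text."""
    # Map each PCollection id to the transform that produces it.
    producers: Dict[str, str] = {}
    for transform_id, (_, outputs) in transforms.items():
        for pcollection_id in outputs:
            producers[pcollection_id] = transform_id

    lines = ["digraph pipeline {"]
    # One node per transform.
    for transform_id in transforms:
        lines.append(f'  "{transform_id}";')
    # One edge per consumed PCollection, from its producer to its consumer.
    for transform_id, (inputs, _) in transforms.items():
        for pcollection_id in inputs:
            if pcollection_id in producers:
                lines.append(
                    f'  "{producers[pcollection_id]}" -> "{transform_id}" '
                    f'[label="{pcollection_id}"];'
                )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "Read": ([], ["pc1"]),
        "Map": (["pc1"], ["pc2"]),
        "Write": (["pc2"], []),
    }
    print(to_dot(example))
```

The resulting text can be passed to any DOT renderer; no job is executed.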

Re: [VOTE] Release 2.42.0, release candidate #2

2022-10-14 Thread Valentyn Tymofieiev via dev
+1 based on prior validation I did and the RC1-RC2 delta. On Fri, Oct 14, 2022 at 10:22 AM Chamikara Jayalath via dev < dev@beam.apache.org> wrote: > +1 (binding) > > Thanks, > Cham > > On Fri, Oct 14, 2022 at 5:43 AM Alexey Romanenko > wrote: > >> +1 (binding) >> >> Tested with https://github

Re: [VOTE] Release 2.42.0, release candidate #1

2022-10-03 Thread Valentyn Tymofieiev via dev
I validated that Dataflow and Beam Python containers have dependencies that match Beam requirements. I came across https://github.com/apache/beam/pull/23200 - there are failed tests and I don't see test results for Python PostCommit suites. Do you know what the status of both is? Minor nits: missi

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-25 Thread Valentyn Tymofieiev via dev
nts. > > I will watch it here and be happy to spend quite some time on helping to > hash it out. > > BTW. You can also watch my talk I gave last year at PyWaw about "Managing > Python dependencies at Scale" > https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s w

[DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-23 Thread Valentyn Tymofieiev via dev
Hi everyone, Recently, several issues [1-3] have highlighted outage risks and developer inconveniences due to dependency management practices in Beam Python. With dependabot and other tooling that we have integrated with Beam, one of the missing pieces seems to be having a clear guideline of h