Re: [ANNOUNCE] New PMC Member: Danny McCormick
Woohoo! Congrats :-) On Fri, Jan 10, 2025 at 6:35 PM Kenneth Knowles wrote: > Congrats! > > On Thu, Jan 9, 2025 at 10:15 AM Yi Hu via dev wrote: > >> Congrats, Danny! >> >> On Wed, Jan 8, 2025 at 8:40 PM Austin Bennett < >> whatwouldausti...@gmail.com> wrote: >> >>> Congrats and Thanks, Danny! >>> >>> On Fri, Dec 27, 2024 at 5:51 AM Ahmed Abualsaud via dev < >>> dev@beam.apache.org> wrote: >>> Well deserved! Thanks for all your hard work Danny On Fri, Dec 20, 2024 at 7:58 PM LDesire wrote: > Congratulations Danny! 😀
Beam High Priority Issue Report (35)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/33698 The finalize_release job is flaky
https://github.com/apache/beam/issues/33569 [Task]: Remove Google Analytics from Beam Website
https://github.com/apache/beam/issues/33425 [Bug]: beam_Publish_Beam_SDK_Snapshots and beam_PostCommit_Python_Arm are extremely flaky due to failing to build wheels
https://github.com/apache/beam/issues/33407 [Bug]: tfrecordio does not work with snappy >= 0.7
https://github.com/apache/beam/issues/33220 The PreCommit Flink Container job is flaky
https://github.com/apache/beam/issues/33064 The PostCommit Python ValidatesContainer Dataflow job is flaky
https://github.com/apache/beam/issues/32997 [Bug]: Non Retained Messages missing after MqttIO.Read checkpoint restore
https://github.com/apache/beam/issues/32949 The PostCommit Java ValidatesRunner Flink Java8 job is flaky
https://github.com/apache/beam/issues/32509 [Bug]: Unable to Restart Google Spanner Change Streams Consumer due to tableExists(table_name) bug
https://github.com/apache/beam/issues/32161 The Publish Beam SDK Snapshots job is flaky
https://github.com/apache/beam/issues/32144 The PerformanceTests WordCountIT PythonVersions job is flaky
https://github.com/apache/beam/issues/31846 The Clean Up GCP Resources job is flaky
https://github.com/apache/beam/issues/31254 [Failing Test]: Onnx inference unit tests are failing.
https://github.com/apache/beam/issues/30799 The PostCommit Python Dependency job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava Dataflow job is flaky
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for large Kafka topic
https://github.com/apache/beam/issues/29515 [Bug]: WriteToFiles in python leave few records in temp directory when writing to large number (100+) of files
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25946 [Task]: Support more Beam portable schema types as Python types
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should

P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/31931 The IcebergIO Integration Tests job is flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark Dataflow job is flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance Tests job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch job is flaky
https://github.com/apache/beam/issues/25975 [Bug]: KinesisIO processing-time watermarking can cause data loss
[Bug] Problem with subscription
Hi!
I am trying to subscribe to both the user and developer mailing lists. I have already sent several emails to both dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org, but I never receive anything confirming the subscription, nor any of the new emails on the mailing lists.
My email is this, ksobrena...@ks32.dev.
Greetings
- Enrique (ksobrenat32)
Re: [Bug] Problem with subscription
Hello team!
I have been working with Enrique and tried to help him figure out whether something was wrong with his subscribe requests, to no avail. Does anybody know how to escalate or debug where his subscription request is failing or getting stuck?
Best
-P.

On 2025/01/22 19:37:45 Enrique Calderon wrote:
> Hi!
> I am trying to subscribe to both the mailing lists of users and developers. I have already sent several emails to both dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I am never receiving anything that notifies me about the subscription neither the new emails on the mailing list.
> My email is this, ksobrena...@ks32.dev.
> Greetings
> - Enrique (ksobrenat32)
Re: [Bug] Problem with subscription
I have checked everything and even tried again but with no success. I can also confirm it works because I am able to receive the confirmation email on another email account but not on this one. On Wednesday, January 22nd, 2025 at 3:32 PM, XQ Hu wrote: > Can you check your spam folder? I just tested it with my personal gmail and I > got the confirmation email and just replied to it. > > On Wed, Jan 22, 2025 at 4:31 PM Enrique Calderon wrote: > >> No, I haven't received any emails at all, including any subscription emails. >> Re-sending this because I did not reply to all >> On Wednesday, January 22nd, 2025 at 3:13 PM, XQ Hu wrote: >> >>> Did you get the email like confirm subscribe to u...@beam.apache.org to >>> confirm your subscription? >>> >>> On Wed, Jan 22, 2025 at 3:44 PM Robert Bradshaw via dev >>> wrote: >>> Welcome to the community, Enrique! I have no idea why the subscriptions aren't working, or how to debug this. Apache infra would probably have people who would be better at looking into this, as they run the mailing lists. On Wed, Jan 22, 2025 at 11:45 AM Pablo Estrada wrote: > > Hello team! > I have been working with Enrique and I tried to help him figure out if > there was something wrong with his subscribe requests to no avail. > Anybody knows how to escalate/debug where his subscription request is > failing/getting stuck? > Best > -P. > > On 2025/01/22 19:37:45 Enrique Calderon wrote: > > Hi! > > I am trying to subscribe to both the mailing lists of users and > > developers. I have already sent several emails to both > > dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I > > am never receiving anything that notifies me about the subscription > > neither the new emails on the mailing list. > > My email is this, ksobrena...@ks32.dev. > > Greetings > > - Enrique (ksobrenat32)
Re: [Bug] Problem with subscription
Welcome to the community, Enrique!

I have no idea why the subscriptions aren't working, or how to debug this. Apache infra would probably have people who would be better at looking into this, as they run the mailing lists.

On Wed, Jan 22, 2025 at 11:45 AM Pablo Estrada wrote:
>
> Hello team!
> I have been working with Enrique and I tried to help him figure out if there was something wrong with his subscribe requests to no avail. Anybody knows how to escalate/debug where his subscription request is failing/getting stuck?
> Best
> -P.
>
> On 2025/01/22 19:37:45 Enrique Calderon wrote:
> > Hi!
> > I am trying to subscribe to both the mailing lists of users and developers. I have already sent several emails to both dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I am never receiving anything that notifies me about the subscription neither the new emails on the mailing list.
> > My email is this, ksobrena...@ks32.dev.
> > Greetings
> > - Enrique (ksobrenat32)
Re: [Bug] Problem with subscription
cc this to the user list to test my recent subscription with my private gmail. On Wed, Jan 22, 2025 at 3:44 PM Robert Bradshaw via dev wrote: > Welcome to the community, Enrique! > > I have no idea why the subscriptions aren't working, or how to debug > this. Apache infra would probably have people who would be better at > looking into this, as they run the mailing lists. > > On Wed, Jan 22, 2025 at 11:45 AM Pablo Estrada wrote: > > > > Hello team! > > I have been working with Enrique and I tried to help him figure out if > there was something wrong with his subscribe requests to no avail. Anybody > knows how to escalate/debug where his subscription request is > failing/getting stuck? > > Best > > -P. > > > > On 2025/01/22 19:37:45 Enrique Calderon wrote: > > > Hi! > > > I am trying to subscribe to both the mailing lists of users and > developers. I have already sent several emails to both > dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I am > never receiving anything that notifies me about the subscription neither > the new emails on the mailing list. > > > My email is this, ksobrena...@ks32.dev. > > > Greetings > > > - Enrique (ksobrenat32) >
Re: [Bug] Problem with subscription
Did you get the email like confirm subscribe to u...@beam.apache.org to confirm your subscription? On Wed, Jan 22, 2025 at 3:44 PM Robert Bradshaw via dev wrote: > Welcome to the community, Enrique! > > I have no idea why the subscriptions aren't working, or how to debug > this. Apache infra would probably have people who would be better at > looking into this, as they run the mailing lists. > > On Wed, Jan 22, 2025 at 11:45 AM Pablo Estrada wrote: > > > > Hello team! > > I have been working with Enrique and I tried to help him figure out if > there was something wrong with his subscribe requests to no avail. Anybody > knows how to escalate/debug where his subscription request is > failing/getting stuck? > > Best > > -P. > > > > On 2025/01/22 19:37:45 Enrique Calderon wrote: > > > Hi! > > > I am trying to subscribe to both the mailing lists of users and > developers. I have already sent several emails to both > dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I am > never receiving anything that notifies me about the subscription neither > the new emails on the mailing list. > > > My email is this, ksobrena...@ks32.dev. > > > Greetings > > > - Enrique (ksobrenat32) >
Re: [Bug] Problem with subscription
Can you check your spam folder? I just tested it with my personal gmail and I got the confirmation email and just replied to it. On Wed, Jan 22, 2025 at 4:31 PM Enrique Calderon wrote: > No, I haven't received any emails at all, including any subscription > emails. > Re-sending this because I did not reply to all > On Wednesday, January 22nd, 2025 at 3:13 PM, XQ Hu > wrote: > > Did you get the email like confirm subscribe to u...@beam.apache.org to > confirm your subscription? > > On Wed, Jan 22, 2025 at 3:44 PM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > >> Welcome to the community, Enrique! >> >> I have no idea why the subscriptions aren't working, or how to debug >> this. Apache infra would probably have people who would be better at >> looking into this, as they run the mailing lists. >> >> On Wed, Jan 22, 2025 at 11:45 AM Pablo Estrada >> wrote: >> > >> > Hello team! >> > I have been working with Enrique and I tried to help him figure out if >> there was something wrong with his subscribe requests to no avail. Anybody >> knows how to escalate/debug where his subscription request is >> failing/getting stuck? >> > Best >> > -P. >> > >> > On 2025/01/22 19:37:45 Enrique Calderon wrote: >> > > Hi! >> > > I am trying to subscribe to both the mailing lists of users and >> developers. I have already sent several emails to both >> dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I >> am never receiving anything that notifies me about the subscription neither >> the new emails on the mailing list. >> > > My email is this, ksobrena...@ks32.dev. >> > > Greetings >> > > - Enrique (ksobrenat32) >> > >
Re: Using resource hints or annotations for transform expansion
On Tue, Jan 21, 2025 at 4:51 PM Robert Bradshaw via dev wrote:

> On Tue, Jan 21, 2025 at 7:26 AM Kenneth Knowles wrote:
> >
> > On Tue, Jan 21, 2025 at 2:35 AM Jan Lukavský wrote:
> >>
> >> On 1/20/25 18:18, Kenneth Knowles wrote:
> >>
> >> This all sounds good. I will add my standard comment that this hint is a property of the data, not the pipeline logic. So it is a different type of hint than key invariance and fanout ratio).
> >>
> >> This is not a problem for the proposed approach, in my opinion. Obviously, almost always there will be some pipeline code that is written specifically for the data in mind.
> >>
> >> A couple other examples of "hints" that you can keep in mind are Combine.withHotkeyFanout, Redistribute (both variants), and GroupByKey.fewKeys. These were chosen to be expressed as transforms, even though they are more like hints. I bring up these examples to say that we don't have to be too pedantic here, because it is already too late :-)
> >>
> >> And anyhow a runner is always allowed to implement any piece of a pipeline with anything that has the same "behavior", whether or not it is expressed as a hint or some other way (that's the whole point of Beam, and how we have fusion, combiner lifting, flatten unzipping, multiple runners, etc).
> >>
> >> It is probably less appropriate for reusable transforms that are expected to be used in more than one context. That can be up to transform/pipeline authors.
> >>
> >> And to bring it back around and connect to the above: having an API like GroupByKey.biggerThanMemory() as an API choice is just as fine with me as GroupByKey.fewKeys() and it can just be a composite that adds the hint/annotation to the primitive node. No need to combine API design with model design / no need to force users to express things in terms of the lowest level parts of the model.
> >>
> >> I was thinking about that as well. But there is a problem. The GBK is often part of some other transform (e.g. FileIO, but can be any other). We need a way to (optionally) change the behavior of a transform that is part of some outer composite. Therefore this should work for
> >>
> >> FileIO.write(...).addAnnotation(GroupByKey.HUGE)
> >
> > It is a very good point that library transforms should probably not be annotated but they do need to be adjusted when executed. FWIW this is also why windowing strategy is on PCollection and automatically propagated.
> >
> > But also another good example: FileIO has GBKs that are small even if the data incoming is huge. In the analogy with windowing strategy, the library transform has to own the re-windowing / re-sizing.
> >
> > So maybe PCollection.addAnnotation(SizeEstimate.HUGE) could make more sense.
>
> This doesn't solve the problem, as the operation you're trying to modify may be entirely internal to the composite. (Unless we have annotations that get attached to inputs and "follow" through like windowing, with operations that can add/remove/modify these annotations.)
>
> Being able to annotate a composite and having it apply (per runner semantics) to all subtransforms doesn't seem too bad. If you really need to have part of the transform be executed one way, and part another, that feels like you need to break apart (re-implement) the transform itself.

Or a transform (GBK in this case) should be able to perform a call to get the aggregated set of annotations of composites that surround it. That way you can just add the annotation to the outermost transform (FileIO in this case) and the GBK should be able to change the behavior based on that.

- Cham

> > But then it starts to look like a lot of manual propagation of annotations (if we don't make it default) or a lot of manually undoing annotations (if we do make it default).
>
> And that too.
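
To make the "aggregated set of annotations" idea from this thread concrete, here is a minimal sketch in plain Java. It does not use any Beam classes: TransformNode, aggregatedAnnotations, chooseGroupByKeyStrategy, and the "HUGE" / "spill-to-disk" / "in-memory" strings are all hypothetical names invented for illustration. It assumes a transform can reach its enclosing composite through a parent link, which is an assumption of the sketch and not something the thread confirms about the model.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of a transform hierarchy. These are NOT Beam classes. */
public class EnclosingAnnotationsSketch {

  /** A node in the transform tree: a composite (e.g. FileIO) or a primitive (e.g. GBK). */
  static final class TransformNode {
    final String name;
    final TransformNode parent; // enclosing composite, or null for the outermost transform
    final Set<String> annotations = new LinkedHashSet<>();

    TransformNode(String name, TransformNode parent) {
      this.name = name;
      this.parent = parent;
    }

    /** Attaches an annotation to this node only; returns this for chaining. */
    TransformNode addAnnotation(String annotation) {
      annotations.add(annotation);
      return this;
    }

    /** Union of this node's annotations and those of every enclosing composite. */
    Set<String> aggregatedAnnotations() {
      Set<String> result = new LinkedHashSet<>();
      for (TransformNode node = this; node != null; node = node.parent) {
        result.addAll(node.annotations);
      }
      return result;
    }
  }

  /** Hypothetical decision an inner GBK could make from the aggregated set. */
  static String chooseGroupByKeyStrategy(TransformNode gbk) {
    return gbk.aggregatedAnnotations().contains("HUGE") ? "spill-to-disk" : "in-memory";
  }

  public static void main(String[] args) {
    // The user annotates only the outermost composite.
    TransformNode fileIo = new TransformNode("FileIO.write", null).addAnnotation("HUGE");
    // The GBK is nested inside the composite and carries no annotation of its own.
    TransformNode gbk = new TransformNode("GroupByKey", fileIo);
    System.out.println(chooseGroupByKeyStrategy(gbk)); // prints "spill-to-disk"
  }
}
```

The sketch also shows the drawback raised earlier in the thread: every GBK nested in the annotated composite sees "HUGE", including ones whose input is small, so a real design would still need some way to undo or override an inherited annotation.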
Re: [Bug] Problem with subscription
Probably your email provider filters it? move dev to Bcc. On Wed, Jan 22, 2025 at 4:43 PM Enrique Calderon wrote: > I have checked everything and even tried again but with no success. I can > also confirm it works because I am able to receive the confirmation email > on another email account but not on this one. > On Wednesday, January 22nd, 2025 at 3:32 PM, XQ Hu > wrote: > > Can you check your spam folder? I just tested it with my personal gmail > and I got the confirmation email and just replied to it. > > On Wed, Jan 22, 2025 at 4:31 PM Enrique Calderon > wrote: > >> No, I haven't received any emails at all, including any subscription >> emails. >> Re-sending this because I did not reply to all >> On Wednesday, January 22nd, 2025 at 3:13 PM, XQ Hu >> wrote: >> >> Did you get the email like confirm subscribe to u...@beam.apache.org to >> confirm your subscription? >> >> On Wed, Jan 22, 2025 at 3:44 PM Robert Bradshaw via dev < >> dev@beam.apache.org> wrote: >> >>> Welcome to the community, Enrique! >>> >>> I have no idea why the subscriptions aren't working, or how to debug >>> this. Apache infra would probably have people who would be better at >>> looking into this, as they run the mailing lists. >>> >>> On Wed, Jan 22, 2025 at 11:45 AM Pablo Estrada >>> wrote: >>> > >>> > Hello team! >>> > I have been working with Enrique and I tried to help him figure out if >>> there was something wrong with his subscribe requests to no avail. Anybody >>> knows how to escalate/debug where his subscription request is >>> failing/getting stuck? >>> > Best >>> > -P. >>> > >>> > On 2025/01/22 19:37:45 Enrique Calderon wrote: >>> > > Hi! >>> > > I am trying to subscribe to both the mailing lists of users and >>> developers. I have already sent several emails to both >>> dev-subscr...@beam.apache.org and user-subscr...@beam.apache.org but I >>> am never receiving anything that notifies me about the subscription neither >>> the new emails on the mailing list. 
>>> > > My email is this, ksobrena...@ks32.dev. >>> > > Greetings >>> > > - Enrique (ksobrenat32) >>> >> >> >
Re: [ANNOUNCE] New PMC Member: Danny McCormick
Congratulations Danny! On Wed, Jan 22, 2025 at 2:39 AM Reza Rokni via dev wrote: > Woohoo! Congrats :-) > > On Fri, Jan 10, 2025 at 6:35 PM Kenneth Knowles wrote: > >> Congrats! >> >> On Thu, Jan 9, 2025 at 10:15 AM Yi Hu via dev >> wrote: >> >>> Congrats, Danny! >>> >>> On Wed, Jan 8, 2025 at 8:40 PM Austin Bennett < >>> whatwouldausti...@gmail.com> wrote: >>> Congrats and Thanks, Danny! On Fri, Dec 27, 2024 at 5:51 AM Ahmed Abualsaud via dev < dev@beam.apache.org> wrote: > Well deserved! Thanks for all your hard work Danny > > On Fri, Dec 20, 2024 at 7:58 PM LDesire wrote: > >> Congratulations Danny! 😀 > >
Urgent: Action Required for Iceberg Production
Hi team,

Happy New Year! I have a question regarding "org.apache.beam:beam-sdks-java-io-iceberg:2.61.0". Does the method org.apache.beam.sdk.schemas.Schema getDataSchema(String destination) of class DynamicDestinations already exist?

Our team tried to use DynamicDestinations to decide at runtime which Iceberg table to write to, but failed. Attached is my class DynamicIcebergDestinations.

Our problem is that we don't have a universal schema matching all events that we could return from getDataSchema(). We need a parameter to select the correct schema for each event, so we want to use getDataSchema(String destination). But it seems getDataSchema(String destination) is not implemented for org.apache.beam.sdk.io.iceberg.DynamicDestinations. Since getDataSchema() is called automatically and cannot return a universal schema, we have no way to use IcebergIO.writeRows(icebergCatalogConfig).to(dynamicDestinations).

I also tried the managed I/O connector, but I don't think Managed.write supports DynamicDestinations.

Could you give some urgent guidance? Thank you so much.

Sincerely,
Luyao

DynamicIcebergDestinations.java
Description: Binary data
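
The per-destination lookup described in this message can be sketched in plain Java as follows. Nothing here is Beam or Iceberg API: SchemaLookup, Event, groupByDestination, and the table and column names are hypothetical, and a schema is reduced to a list of column names. The sketch only illustrates the shape of the request: once events are grouped by destination, every group shares a single schema, so a no-argument schema getter would be enough within each group. Whether that maps onto IcebergIO in 2.61.0 is an assumption that would need checking against the actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; none of these types are Beam or Iceberg classes. */
public class PerDestinationSchemaSketch {

  /** Hypothetical: the per-destination schema lookup the message asks for. */
  interface SchemaLookup {
    List<String> getDataSchema(String destination);
  }

  /** Hypothetical event: a destination table name plus an opaque payload. */
  static final class Event {
    final String destination;
    final String payload;

    Event(String destination, String payload) {
      this.destination = destination;
      this.payload = payload;
    }
  }

  /** Groups events by destination so that every group shares one schema. */
  static Map<String, List<Event>> groupByDestination(List<Event> events) {
    Map<String, List<Event>> groups = new LinkedHashMap<>();
    for (Event event : events) {
      groups.computeIfAbsent(event.destination, key -> new ArrayList<>()).add(event);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Made-up tables and columns, standing in for real Iceberg schemas.
    Map<String, List<String>> schemas = new LinkedHashMap<>();
    schemas.put("db.orders", List.of("order_id", "amount"));
    schemas.put("db.clicks", List.of("user_id", "url"));
    SchemaLookup lookup = schemas::get;

    List<Event> events = List.of(
        new Event("db.orders", "o-1"),
        new Event("db.clicks", "c-1"),
        new Event("db.orders", "o-2"));

    // Each group can now be handled with exactly one schema.
    groupByDestination(events).forEach((destination, group) ->
        System.out.println(destination + " " + lookup.getDataSchema(destination)
            + " events=" + group.size()));
  }
}
```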