Re: [External Sender] Re: [Question] AWS Credentials Serializability

2023-09-27 Thread Moritz Mack
of these schemes. > > Thanks! > Ramya > > On Tue, Sep 26, 2023 at 1:40 AM Moritz Mack wrote: > >> Actually, I doubt a provider chain can solve your problem. It will >> always return the credentials of the first provider in the chain that can >> provide some rega

Re: [External Sender] Re: [Question] AWS Credentials Serializability

2023-09-25 Thread Moritz Mack
ws ProviderChain (which > is *not* supported by the Javadocs), is there any other solution that > currently exists within beam that I can leverage? Or is the only option for > me to contribute to the open source and write my own custom implementation? > > Thanks, > Ramya > &

Re: [Question] AWS Credentials Serializability

2023-09-25 Thread Moritz Mack
Hi Ramya, unfortunately only a subset of AWS credential providers is supported. Additionally, programmatic configuration is discouraged as there are options that are hard to support and there's no decent way to validate. Please have a look at the Javadocs to see what is supported: https://beam.ap
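As context for this answer: the supported providers are configured declaratively through pipeline options rather than programmatically. A minimal sketch of what such an option could look like (the key values are placeholders, and the exact set of supported `@type` names should be checked against the linked Javadocs):

```
--awsCredentialsProvider='{"@type": "StaticCredentialsProvider", "accessKeyId": "<key>", "secretAccessKey": "<secret>"}'
```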

Re: Re:

2023-07-20 Thread Moritz Mack
Hi Jon, I just want to check in here briefly, are you still looking for support on this? Sadly yes, this totally lacks documentation and isn’t straightforward to set up. /Moritz On 21.06.23, 23:47, "Jon Molle via user" wrote: Hi Pavel, Thanks for your response! I took a look at running Beam

Re: Passing "conf" arguments using a portable runner in Java (spark job runner)

2023-07-20 Thread Moritz Mack
Hi Jon, sorry for the late reply. A while ago I was struggling with this as well. Unfortunately, there’s no direct way to do this per pipeline. However, you can set default arguments by passing them to the job service container using the environment variable _JAVA_OPTIONS. I hope this still helps!
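A hedged sketch of what that could look like when starting the Spark job server container (the image tag, the system properties, and the master URL are illustrative placeholders, not taken from the thread):

```shell
docker run -p 8099:8099 \
  -e "_JAVA_OPTIONS=-Dspark.executor.memory=4g -Dspark.default.parallelism=8" \
  apache/beam_spark3_job_server:latest \
  --spark-master-url=spark://localhost:7077
```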

Re: Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

2023-05-22 Thread Moritz Mack
ache/beam/releases/tag/v2.47.0 Any knowledge in this would be appreciated. Thanks Sachin On Mon, Sep 5, 2022 at 12:

Re: [java] Trouble with gradle and using ParquetIO

2023-04-20 Thread Moritz Mack
Hi Evan, Not sure why maven suggests using “compileOnly”. That’s certainly wrong; make sure to use “implementation” in your case. Cheers, Moritz On 21.04.23, 01:52, "Evan Galpin" wrote: Hi all, I'm trying to make use of ParquetIO. Based on what's documented in maven central, I'm including t
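In Gradle terms, the difference looks roughly like this (the version number is a placeholder; use the one matching your Beam version):

```groovy
dependencies {
    // "implementation" puts ParquetIO on both the compile and runtime classpath
    implementation "org.apache.beam:beam-sdks-java-io-parquet:2.46.0"
    // "compileOnly" would drop it from the runtime classpath and fail at run time
}
```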

[Announcement] Planned removal of Spark 2 runner support in 2.46.0

2023-01-23 Thread Moritz Mack
Dear All, The runner for Spark 2 was deprecated quite a while back in August 2022 with the release of Beam 2.41.0 [1]. We’re planning to move ahead with this and finally remove support for Spark 2 (beam-runners-spark) to only maintain support for Spark 3 (beam-runners-spark-3) going forward. N

Re: Reading from AWS Kinesis Stream Cross account

2022-11-15 Thread Moritz Mack
Hi Sankar, First, as Alexey pointed out, please try and migrate to the Beam AWS SDK v2 as soon as possible. The SDK v1 (including the Kinesis module) has been long deprecated and will be removed some time soon. The AWS API doesn’t support cross-account access for Kinesis using an ARN. This is
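The usual workaround for this limitation is to read with credentials assumed from a role in the stream-owning account. With the SDK v2 connectors this can be expressed via pipeline options, roughly like the following sketch (the role ARN and session name are placeholders):

```
--awsCredentialsProvider='{"@type": "StsAssumeRoleCredentialsProvider", "roleArn": "arn:aws:iam::<account>:role/<role>", "roleSessionName": "beam"}'
```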

Re: Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

2022-09-05 Thread Moritz Mack
Hi Sachin, I’d recommend migrating to the new AWS 2 IOs in beam-sdks-java-io-amazon-web-services2 (using Amazon’s Java SDK v2) rather soon. The previous ones (beam-sdks-java-io-amazon-web-services and beam-sdks-java-io-kinesis) are both deprecated and not actively maintained anymore. Please ha
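A minimal sketch of a read using the v2 connector, assuming beam-sdks-java-io-amazon-web-services2 is on the classpath (the stream name is a placeholder):

```java
import org.apache.beam.sdk.io.aws2.kinesis.KinesisIO;
import software.amazon.kinesis.common.InitialPositionInStream;

// reads KinesisRecord elements, starting from the latest position in the stream
pipeline.apply(KinesisIO.read()
    .withStreamName("my-stream")
    .withInitialPositionInStream(InitialPositionInStream.LATEST));
```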

Re: read messages from kafka: 2 different message types in kafka topic

2022-08-09 Thread Moritz Mack
Hi Sigalit, Could you explain a bit more in detail what you mean by 2 different types of messages? Do they share the same schema, e.g. using a union / one of type? Or are you in fact talking about different messages with separate schemas (e.g. discriminated using a message header)? The recomme

Re: Oracle Database Connection Pool creation from Beam

2022-07-29 Thread Moritz Mack
Could you share some more details on what you’ve tried so far? I suppose you are using the JdbcIO, right? Have you looked at JdbcIO.PoolableDataSourceProvider? / Moritz On 28.07.22, 17:35, "Koka, Deepthi via dev" wrote: Hi Team, We have an issue with the Oracle connections being used up and we ha
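A hedged sketch of how that could be wired up (the driver class, JDBC URL, credentials, and query are placeholders):

```java
import org.apache.beam.sdk.io.jdbc.JdbcIO;

// connection settings for the Oracle database
JdbcIO.DataSourceConfiguration config = JdbcIO.DataSourceConfiguration
    .create("oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@//db-host:1521/service")
    .withUsername("user")
    .withPassword("secret");

// PoolableDataSourceProvider wraps the configuration in a pooled,
// serializable DataSource shared per worker instead of per bundle
pipeline.apply(JdbcIO.<String>read()
    .withDataSourceProviderFn(JdbcIO.PoolableDataSourceProvider.of(config))
    .withQuery("SELECT name FROM users")
    .withRowMapper(rs -> rs.getString(1)));
```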

Re: GroupIntoBatches not working on Flink?

2022-07-27 Thread Moritz Mack
I noticed there’s also a similar bug open for the Spark runner https://github.com/apache/beam/issues/21378 Problem seems to be in SimpleDoFnRunner.TimerInternalsTimer#clear(), which doesn’t work with InMemoryTimerInternals (anymore). https://github.com/apache/beam/blob/master/runners/core-java/sr

Re: Metrics in Beam+Spark

2022-07-18 Thread Moritz Mack
Yes, that’s exactly what I was referring to. A - hopefully - easy way to avoid this problem might be to change the Spark configuration to use the following: --conf "spark.metrics.conf.driver.sink.jmx.class"="com.salesforce.einstein.data.platform.connectors.JmxSink" --conf "spark.metrics.conf.ex

Re: Metrics in Beam+Spark

2022-07-15 Thread Moritz Mack
uire us to make another sink? -Yushu On Thu, Jul 14, 2022 at 1:05 PM Moritz Mack wrote: Thanks Mo

Re: Metrics in Beam+Spark

2022-07-14 Thread Moritz Mack
Hi Yushu, Wondering, how did you configure your Spark metrics sink? And what version of Spark are you using? Key is to configure Spark to use one of the sinks provided by Beam, e.g.: "spark.metrics.conf.*.sink.csv.class"="org.apache.beam.runners.spark.metrics.sink.CsvSink" Currently there’s sup
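For reference, the equivalent entries in Spark’s metrics.properties file (the output directory is a placeholder; the `spark.metrics.conf.` prefix is dropped inside the file):

```
# route metrics from all instances to Beam's Spark CsvSink
*.sink.csv.class=org.apache.beam.runners.spark.metrics.sink.CsvSink
*.sink.csv.directory=/tmp/beam-metrics
```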

Re: RDD (Spark dataframe) into a PCollection?

2022-05-24 Thread Moritz Mack
Hi Yushu, Have a look at org.apache.beam.runners.spark.translation.EvaluationContext in the Spark runner. It maintains that mapping between PCollections and RDDs (wrapped in the BoundedDataset helper). As Reuven just pointed out, values are timestamped (and windowed) in Beam, therefore BoundedD

[SURVEY] Deprecation of Beam AWS IOs v1 (Java)

2022-04-26 Thread Moritz Mack
Hi Beam AWS user, as you might know, there are currently two different versions of AWS IO connectors in Beam for the Java SDK: * amazon-web-services [1] and kinesis [2] for the AWS Java SDK v1 * amazon-web-services2 (including kinesis) [3] for the AWS Java SDK v2 With the recent release

Re: Write S3 File with CannedACL

2022-03-14 Thread Moritz Mack
forward. Regards, Moritz From: Alexey Romanenko Date: Thursday, 10. March 2022 at 11:09 To: user@beam.apache.org Cc: Moritz Mack Subject: Re: Write S3 File with CannedACL The contributions are very welcome! So, if you decide to go forward with this, please, take a look on these guides [1][2

Re: How to use avro serializer in Kafka write?

2021-12-20 Thread Moritz Mack
.withValueSerializer(TestObjectSerializer.class) //cannot directly use KafkaAvroSerializer .withTopic(options.getOutputTopic()) // just need serializer for value .values()); } Regards, Anjana From: Moritz Mack<mailto:mm...@talend.

Re: How to use avro serializer in Kafka write?

2021-12-17 Thread Moritz Mack
can provide the necessary configuration (schema.registry.url, …) But I’m not sure I fully understand your issue. Could you share some code snippets? Regards, Moritz From: Moritz Mack Reply to: "user@beam.apache.org" Date: Friday, 17. December 2021 at 12:57 To: "user@beam.apache.org
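Putting the pieces from this thread together, a write could look roughly like this sketch (broker, topic, and registry URL are placeholders; the raw cast is needed because KafkaAvroSerializer is not generic in the value type, and the registry URL is passed through the producer config):

```java
import java.util.Map;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.LongSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.beam.sdk.io.kafka.KafkaIO;

pipeline.apply(KafkaIO.<Long, GenericRecord>write()
    .withBootstrapServers("broker_1:9092")
    .withTopic("my_topic")
    .withKeySerializer(LongSerializer.class)
    .withValueSerializer((Class) KafkaAvroSerializer.class) // configured via producer config below
    .withProducerConfigUpdates(Map.<String, Object>of(
        "schema.registry.url", "http://registry:8081")));
```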

Re: How to use avro serializer in Kafka write?

2021-12-17 Thread Moritz Mack
Hi Anjana, Have you checked the Javadocs of KafkaIO? It is pretty straightforward: PCollection<KafkaRecord<Long, GenericRecord>> input = pipeline .apply(KafkaIO.<Long, GenericRecord>read() .withBootstrapServers("broker_1:9092,broker_2:9092") .withTopic("my_topic") .withKeyDeserializer(LongDeserializer.class) // Use Conflu