Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-15 Thread Ahmet Altay
Hi Teodor, Thank you for working on this. If I remember correctly, there were some opportunities to improve on the previous paper (e.g. not focusing on deprecated runners, long-running benchmarks, varying data sizes). And I am excited that you are keeping the community as part of your research proces

Re: Doubts on Looping inside a beam transform. Processing sequentially using Apache Beam

2020-12-15 Thread Vincent Marquez
Hi Feba, I can't say for sure *where* your pipeline is running out of memory, but I'm going to guess that it's because CassandraIO currently can only read an entire table or run a single attached query. So if you are calling CassandraIO.read() that grabs all the "

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-15 Thread Teodor Spæren
Hey! Yeah, that paper was what prompted my master's thesis! I will definitely post here once I get more data :) Teodor On Mon, Dec 14, 2020 at 06:56:30AM -0600, Rion Williams wrote: Hi Teodor, Although I’m sure you’ve come across it, this might have some valuable resources or methodologies

Re: SqsIO exception when moving to AWS2 SDK

2020-12-15 Thread Alexey Romanenko
Too fast “Send” button click =) You can find snapshot artifacts here: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-amazon-web-services2/2.27.0-SNAPSHOT/ > On 15 Dec 2020, at 13:14, Alexey Romanenko wrote: > > Quick update on this. > > We have f

Re: SqsIO exception when moving to AWS2 SDK

2020-12-15 Thread Alexey Romanenko
Quick update on this. We have fixed an issue with AwsCredentialsProvider serialisation [1] for AWS v2 IOs (why it’s not serialisable by default is a different question) in 2.27.0. Since it’s not yet released, feel free to test it with snapshot artifacts. [1] https://issues.apache.org/j

Doubts on Looping inside a beam transform. Processing sequentially using Apache Beam

2020-12-15 Thread Feba Fathima
Hi, We are creating a Beam pipeline to do batch processing of data bundles. The pipeline reads records using CassandraIO. We want to process the data in 30-minute batches, then group/stitch the 30-minute data and write it to another table. I have 300 bundles for each employee and we need to process at