Hi Teodor,
Thank you for working on this. If I remember correctly, there were some
opportunities for improvement in the previous paper (e.g. not focusing on
deprecated runners, long-running benchmarks, varying data sizes). And I am
excited that you are keeping the community involved in your research process
Hi Feba, I can't say for sure *where* your pipeline is running out of
memory, but my guess is that it's because CassandraIO
currently can only read an entire table or run a single attached
query. So if you are calling CassandraIO.read(), which grabs all
the "
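One common way to avoid pulling a whole table through a single query is to split the partitioner's token ring into slices and read each slice separately. The following is a minimal sketch. It assumes the default Murmur3Partitioner, where tokens are signed 64-bit values, and uses a hypothetical table `ks.events` keyed by `employee_id`. It only builds the CQL strings and is not CassandraIO's own API.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CassandraIO): splits the Murmur3 token ring
// into `splits` contiguous ranges and builds one CQL query per range.
public class TokenRangeQueries {
    // Murmur3Partitioner tokens are signed 64-bit values.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    static List<String> splitQueries(String table, String partitionKey, int splits) {
        if (splits < 1) {
            throw new IllegalArgumentException("splits must be >= 1");
        }
        BigInteger span = MAX.subtract(MIN); // width of the whole ring
        List<String> queries = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 1; i <= splits; i++) {
            // Upper bound of range i; for i == splits this is exactly MAX.
            BigInteger end = MIN.add(span.multiply(BigInteger.valueOf(i))
                                         .divide(BigInteger.valueOf(splits)));
            // Only the first range is closed at the bottom, so no token is read twice.
            String lowerOp = (i == 1) ? ">=" : ">";
            queries.add(String.format(
                "SELECT * FROM %s WHERE token(%s) %s %s AND token(%s) <= %s",
                table, partitionKey, lowerOp, start, partitionKey, end));
            start = end;
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical keyspace/table and partition key, for illustration only.
        for (String q : splitQueries("ks.events", "employee_id", 4)) {
            System.out.println(q);
        }
    }
}
```

Each range is half-open on the bottom and closed on the top, so the slices cover the ring with no gaps or overlaps. Each slice can then be read as a smaller, independent query instead of fetching the whole table in one go.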
Hey!
Yeah, that paper was what prompted my master's thesis! I will definitely
post here once I get more data :)
Teodor
On Mon, Dec 14, 2020 at 06:56:30AM -0600, Rion Williams wrote:
Hi Teodor,
Although I’m sure you’ve come across it, this might have some valuable
resources or methodologies
Too fast “Send” button click =)
You can find snapshot artifacts here:
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-amazon-web-services2/2.27.0-SNAPSHOT/
> On 15 Dec 2020, at 13:14, Alexey Romanenko wrote:
>
> Quick update on this.
>
> We have f
Quick update on this.
We have fixed an issue with AwsCredentialsProvider serialisation [1] for the AWS v2
IOs in 2.27.0 (why it isn’t serialisable by default is a different question).
Since it’s not yet released, feel free to test it with snapshot artifacts.
[1] https://issues.apache.org/j
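The general problem is that objects shipped to workers must be serialisable, and SDK credential providers often are not. A common workaround is sketched below. It is a generic pattern and not the change made for [1]. The idea is to serialise only plain configuration and rebuild the provider lazily on the worker through a `transient` field. `SerializableCredentials` and `NonSerializableProvider` are hypothetical stand-ins, not AWS SDK or Beam classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical pattern: ship only serialisable config, rebuild the client lazily.
public class SerializableCredentials implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for a non-serialisable SDK object such as a credentials provider.
    static final class NonSerializableProvider {
        final String accessKeyId;
        NonSerializableProvider(String accessKeyId) { this.accessKeyId = accessKeyId; }
    }

    private final String accessKeyId;                    // serialised with the object
    private transient NonSerializableProvider provider;  // skipped; recreated on demand

    public SerializableCredentials(String accessKeyId) { this.accessKeyId = accessKeyId; }

    public synchronized NonSerializableProvider get() {
        if (provider == null) {
            provider = new NonSerializableProvider(accessKeyId);
        }
        return provider;
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableCredentials creds = new SerializableCredentials("AKIA-EXAMPLE");
        creds.get(); // provider created on the "driver" side
        SerializableCredentials copy =
            (SerializableCredentials) deserialize(serialize(creds));
        System.out.println(copy.get().accessKeyId); // provider rebuilt after the round trip
    }
}
```

Without `transient`, `writeObject` would throw `NotSerializableException` as soon as the provider had been created. In real code you would also avoid putting raw secrets into serialised state.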
Hi,
We are creating a Beam pipeline to do batch processing of data bundles.
The pipeline reads records using CassandraIO. We want to process the data
in 30-minute batches, then group/stitch the 30-minute data and write it to another
table. I have 300 bundles for each employee and we need to process at
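The 30-minute batching amounts to assigning each record to an epoch-aligned fixed window and grouping by (employee, window). The following plain-Java sketch shows that bucketing logic outside Beam. The `Bundle` type and its fields are hypothetical. In a real Beam pipeline, the same grouping would typically be expressed with `Window.into(FixedWindows.of(...))` followed by a `GroupByKey`, once each record carries an event timestamp.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ThirtyMinuteBatches {
    // Hypothetical record: one data bundle for an employee at a point in time.
    static final class Bundle {
        final String employeeId;
        final Instant timestamp;
        final String payload;
        Bundle(String employeeId, Instant timestamp, String payload) {
            this.employeeId = employeeId;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    static final Duration WINDOW = Duration.ofMinutes(30);

    // Start of the fixed 30-minute window containing ts, aligned to the epoch.
    static Instant windowStart(Instant ts) {
        long size = WINDOW.toMillis();
        return Instant.ofEpochMilli(Math.floorDiv(ts.toEpochMilli(), size) * size);
    }

    // Groups bundles per employee, then per 30-minute window in time order.
    static Map<String, SortedMap<Instant, List<Bundle>>> batch(List<Bundle> bundles) {
        Map<String, SortedMap<Instant, List<Bundle>>> out = new TreeMap<>();
        for (Bundle b : bundles) {
            out.computeIfAbsent(b.employeeId, k -> new TreeMap<>())
               .computeIfAbsent(windowStart(b.timestamp), k -> new ArrayList<>())
               .add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Bundle> bundles = List.of(
            new Bundle("emp-1", Instant.parse("2020-12-15T10:05:00Z"), "a"),
            new Bundle("emp-1", Instant.parse("2020-12-15T10:29:59Z"), "b"),
            new Bundle("emp-1", Instant.parse("2020-12-15T10:30:00Z"), "c"),
            new Bundle("emp-2", Instant.parse("2020-12-15T10:10:00Z"), "d"));
        batch(bundles).forEach((emp, windows) ->
            windows.forEach((start, items) ->
                System.out.println(emp + " " + start + " -> " + items.size())));
    }
}
```

Keeping each employee's windows in a `TreeMap` means the stitching step can walk the 30-minute batches in time order.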