On Tue, Jul 26, 2022 at 11:24 AM Elizaveta Lomteva <elizaveta.lomt...@akvelon.com> wrote:

> Hi, community!
>
> Our team has prepared the SparkReceiverIO Read via SDF PR [1]. We have
> started working on integration tests for the SparkReceiverIO connector,
> which will allow reading data from custom Spark Receivers in an Apache
> Beam pipeline.
>
> A general Apache Beam recommendation is to implement “write then read”
> style integration tests. But in our case, only the Read interface was
> implemented, because Spark Receivers can't be used for writing.
>
> Since SparkReceiverIO is an abstract IO working with Spark Receivers,
> there is no exact implementation for a particular source. Therefore, we
> are thinking of choosing RabbitMQ as a test source for the following
> reasons:
>
>    - It’s possible to implement a custom Spark Receiver on RabbitMQ as a
>    test streaming receiver
>    - RabbitMQ is lightweight and easy to deploy
>    - There is a test container for RabbitMQ
>    - It’s possible to generate as much test input to RabbitMQ as we need
>    - Apache Beam has a RabbitMQ IO [2] that could hypothetically be used
>    in the “write” step of the test
>
> Cons of this choice are:
>
>    - We would need a RabbitMQ test container and additional Kubernetes
>    configuration in ./test-infra
>    - RabbitMQ's peak throughput is lower than Kafka's, for example [3]
>
> Based on this, two questions arise:
>
>    1. Are there any restrictions when choosing a test source? Can we use
>    RabbitMQ in our case?
>

I think the main requirement is that we want to test SparkReceiverIO in a
way that is similar to the way it would be used by actual end-users. So if
a RabbitMQ-based receiver is a good representative of a typical Spark
Receiver, this should be fine.

>    2. If RabbitMQ is suitable for our purposes, can we use the RabbitMQ IO
>    to write data in the integration test “write” step, or should we use
>    the RabbitMQ API directly without adding a dependency on Apache Beam
>    RabbitMQ IO?
>
I would use RabbitMQIO and implement a write-then-read type test, assuming
we can develop a non-flaky test that uses both connectors. If you run into
flakes, I think just developing a test for the source is fine.

Thanks,
Cham

>
> Any ideas or comments would be greatly appreciated!
>
> Thank you in advance,
>
> Elizaveta
>
> [1] [BEAM-14378] [CdapIO] SparkReceiverIO Read via SDF #17828 –
> https://github.com/apache/beam/pull/17828
>
> [2] Apache Beam RabbitMQ IO –
> https://github.com/apache/beam/tree/master/sdks/java/io/rabbitmq
>
> [3] Benchmarking Apache Kafka, RabbitMQ article (2020) –
> https://www.confluent.io/blog/kafka-fastest-messaging-system/