On Tue, Jul 26, 2022 at 11:24 AM Elizaveta Lomteva <
elizaveta.lomt...@akvelon.com> wrote:

> Hi, community!
> Our team has prepared the SparkReceiverIO Read via SDF PR [1]. We have
> started working on integration tests for the SparkReceiverIO connector,
> which will allow reading data from custom Spark Receivers in an Apache Beam
> pipeline.
>
> A general Apache Beam recommendation is to implement “write then read”
> style integration tests. But in our case, only the Read interface was
> implemented, because Spark Receivers can't be used for writing.
>
> Since SparkReceiverIO is an abstract IO working with Spark Receivers,
> there is no exact implementation for a particular source. Therefore, we
> are thinking of choosing RabbitMQ as a test source for the following
> reasons:
>
>    - It’s possible to implement a Custom Spark Receiver on RabbitMQ as a
>    test streaming receiver
>    - RabbitMQ is lightweight and easy to deploy
>    - There is a test container for RabbitMQ
>    - It’s possible to generate as much test input for RabbitMQ as we
>    need
>    - Apache Beam has a RabbitMQ IO [2] that could hypothetically be used
>    in the “write” step of the test
>
> Cons of this choice are:
>
>    - We would need a RabbitMQ test container and additional Kubernetes
>    configuration in ./test-infra
>    - RabbitMQ's peak throughput is lower than Kafka's, for
>    example [3]
>
>
> Based on this, two questions arise:
>
>    1. Are there any restrictions when choosing a test source? Can we use
>       RabbitMQ in our case?
>
>
I think the main requirement is that we want to test SparkReceiverIO in a
way that is similar to the way it would be used by actual end-users. So if
a RabbitMQ-based receiver is a good representative of a typical Spark
Receiver, this should be fine.



>
>    2. If RabbitMQ is suitable for our purposes, can we use the RabbitMQ IO
>       to write data in the integration test “write” step, or should we use
>       the RabbitMQ API directly without adding a dependency on the Apache
>       Beam RabbitMQ IO?
>
>

I would use RabbitMQIO and implement a write-then-read test, assuming we
can develop a non-flaky test that uses both connectors. If you run into
flakes, I think just developing a test for the source is fine.
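
To illustrate the shape of such a test, here is a minimal sketch. It is
not the actual integration test: an in-memory BlockingQueue stands in for
RabbitMQ, and the class and method names (WriteThenReadSketch, write, read)
are hypothetical. In a real test the write step would go through the
RabbitMQ IO and the read step through SparkReceiverIO. The bounded wait in
the read step is one way to keep the test from hanging when records are
missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WriteThenReadSketch {

  // Hypothetical "write" step: publishes numRecords generated records
  // to the queue and returns them so the test can compare later.
  static List<String> write(BlockingQueue<String> queue, int numRecords) {
    List<String> written = new ArrayList<>();
    for (int i = 0; i < numRecords; i++) {
      String record = "record-" + i;
      queue.add(record);
      written.add(record);
    }
    return written;
  }

  // Hypothetical "read" step: consumes until the expected number of
  // records has arrived or the overall timeout has elapsed.
  static List<String> read(
      BlockingQueue<String> queue, int expected, long timeoutMillis) {
    List<String> received = new ArrayList<>();
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    try {
      while (received.size() < expected) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          break; // timed out; the caller's assertion will report the gap
        }
        String record = queue.poll(remaining, TimeUnit.NANOSECONDS);
        if (record == null) {
          break; // nothing arrived before the deadline
        }
        received.add(record);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
    }
    return received;
  }

  public static void main(String[] args) {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    List<String> written = write(queue, 100);
    List<String> received = read(queue, written.size(), 5_000);
    // The in-memory queue is FIFO, so an ordered comparison works here.
    // A test against a real broker may need an order-insensitive check.
    if (!received.equals(written)) {
      throw new AssertionError(
          "expected " + written.size() + " records, got " + received.size());
    }
    System.out.println("write-then-read check passed");
  }
}
```

If the combined test turns out to be flaky, the same read step can be kept
and the write step replaced with direct publishing to the broker.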

Thanks,
Cham


>
> Any ideas or comments would be greatly appreciated!
>
> Thank you in advance,
>
> Elizaveta
>
> [1] [BEAM-14378] [CdapIO] SparkReceiverIO Read via SDF #17828 –
> https://github.com/apache/beam/pull/17828
>
> [2] Apache Beam RabbitMQ IO –
> https://github.com/apache/beam/tree/master/sdks/java/io/rabbitmq
> [3] Benchmarking Apache Kafka, RabbitMQ article (2020) –
> https://www.confluent.io/blog/kafka-fastest-messaging-system/
>