Hi Faye,
Flink does not officially provide testing tools at the moment. However,
you can use internal Flink tools if they solve your problem.
The `flink-end-to-end-tests` module [1] shows some examples of how we
test Flink together with other systems. Many tests still use plain Bash
scripts (in the `test-scripts` folder). The newer generation uses a test
base such as StreamingKafkaITCase [2] or SQLClientKafkaITCase [3].
Alternatively, you can also replace the connectors with testing
connectors and just run integration tests for your pipeline, as we do
in StreamTableEnvironmentITCase [4].
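For illustration, here is a minimal sketch of that approach, following
the pattern from the testing section of the Flink documentation: the job
runs on a MiniClusterWithClientResource, the real source is replaced by
env.fromElements(), and a collecting sink captures the output. The class
names, the increment function, and the test data below are illustrative
stand-ins for your own pipeline:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;
import org.junit.Test;

import static org.junit.Assert.assertTrue;

public class PipelineITCase {

    // One small Flink cluster shared by all tests in this class
    @ClassRule
    public static final MiniClusterWithClientResource FLINK =
            new MiniClusterWithClientResource(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberSlotsPerTaskManager(2)
                            .setNumberTaskManagers(1)
                            .build());

    @Test
    public void testPipelineProducesExpectedOutput() throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);

        CollectSink.VALUES.clear();

        // Testing source instead of the real connector; put your own
        // transformations between source and sink here.
        env.fromElements(1L, 21L, 22L)
                .map(new IncrementMapFunction())
                .addSink(new CollectSink());

        env.execute();

        assertTrue(CollectSink.VALUES.containsAll(Arrays.asList(2L, 22L, 23L)));
    }

    // Stand-in for the pipeline logic under test
    private static class IncrementMapFunction implements MapFunction<Long, Long> {
        @Override
        public Long map(Long value) {
            return value + 1;
        }
    }

    // Testing sink that collects results into a static list; the list is
    // static because sink instances are serialized and distributed
    private static class CollectSink implements SinkFunction<Long> {
        static final List<Long> VALUES =
                Collections.synchronizedList(new ArrayList<>());

        @Override
        public void invoke(Long value, Context context) {
            VALUES.add(value);
        }
    }
}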
[1] https://github.com/apache/flink/tree/master/flink-end-to-end-tests
[2] https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/java/org/apache/flink/tests/util/kafka/StreamingKafkaITCase.java
[3] https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/java/org/apache/flink/tests/util/kafka/SQLClientKafkaITCase.java
[4] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql/StreamTableEnvironmentITCase.scala
Regards,
Timo
On 10.08.20 21:46, Faye Pressly wrote:
Hello!
I have built a Flink pipeline (a rough skeleton follows the list) that
involves:
1) Reading events from a Kinesis stream
2) Creating two DataStreams (using .filter), with events of type 'A'
going into stream1 and events of type 'B' going into stream2
3) Transforming stream1 into a Table and using the Table API to do a
simple tumble window and group-by to count the events
4) Interval-joining stream1 with stream2 in order to filter out some
events in stream1 that are not in stream2
5) Transforming the result of the interval join into a Table and using
the Table API to do a simple tumble window group-by to count the events
6) Joining the results of 3) and 5) and transforming back to a stream
that sinks to an output Kinesis stream
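Roughly, in code, the skeleton looks like this (Event, EventSchema, the
key and timestamp fields, and the window sizes are placeholders, not my
exact code):

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.util.Collector;

import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.lit;

// body of the job's main():
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// 1) Kinesis source (Event and EventSchema are placeholders); timestamps
//    and watermarks are assigned so 'ts' can be used as rowtime later
Properties consumerConfig = new Properties();
DataStream<Event> events = env.addSource(
        new FlinkKinesisConsumer<>("input-stream", new EventSchema(), consumerConfig));

// 2) Split by event type
DataStream<Event> stream1 = events.filter(e -> "A".equals(e.type));
DataStream<Event> stream2 = events.filter(e -> "B".equals(e.type));

// 3) Tumble window + group-by count on stream1 via the Table API
Table counts1 = tEnv.fromDataStream(stream1, $("key"), $("ts").rowtime())
        .window(Tumble.over(lit(5).minutes()).on($("ts")).as("w"))
        .groupBy($("key"), $("w"))
        .select($("key"), $("key").count().as("cnt"));

// 4) Interval join: keep only stream1 events with a match in stream2
DataStream<Event> joined = stream1
        .keyBy(e -> e.key)
        .intervalJoin(stream2.keyBy(e -> e.key))
        .between(Time.minutes(-5), Time.minutes(5))
        .process(new ProcessJoinFunction<Event, Event, Event>() {
            @Override
            public void processElement(Event left, Event right,
                    Context ctx, Collector<Event> out) {
                out.collect(left);
            }
        });

// 5) + 6) window/count 'joined' like in 3), join with counts1, and
//    convert back to a DataStream that sinks to the output Kinesis stream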
I have read the documentation that shows some examples of unit testing,
but I'm scratching my head over how I'm going to be able to IT-test my
pipeline to make sure all the computations are correct given an exact
input dataset.
Is there a proper way of writing ITs to test my pipeline?
Or will I have to bring up a Flink cluster (with Docker, for example),
fire events with a Python script, and then check the results by reading
the output stream?
Thank you!