Hi,

I am writing my master's thesis on orchestrating scalable tasks and
services with Docker. The idea is to swap a task's environment very
quickly, and I plan to use Flink as the environment. To do that, I package
Apache Flink into a container and start one master and X worker nodes, plus
one task on them, like a short-term Flink PaaS.

I want to measure the performance of a Flink cluster running directly on
the hosts and compare it to a Flink cluster running inside containers.
That benchmark will be part of a series of benchmarks in which I test disk,
network, memory and CPU, so the Flink benchmark can serve as an applied use
case that mixes all of these demands.
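
For illustration, the comparison between the two setups could be summarized
as the relative overhead of the median job runtime. This is only a minimal
sketch: the function name `relative_overhead` and the runtime values are
hypothetical placeholders, not measurements, and the median is just one
possible choice of summary statistic.

```python
from statistics import median


def relative_overhead(bare_metal_runtimes, container_runtimes):
    """Return how much slower the containerized median runtime is,
    as a fraction of the bare-metal median (0.05 means 5% slower)."""
    base = median(bare_metal_runtimes)
    contained = median(container_runtimes)
    if base <= 0:
        raise ValueError("baseline runtimes must be positive")
    return contained / base - 1.0


# Hypothetical placeholder runtimes in seconds, NOT measured results.
bare_metal = [100.0, 102.0, 98.0]
containers = [105.0, 104.0, 107.0]

print(f"container overhead: {relative_overhead(bare_metal, containers):.1%}")
```

Repeating each job several times and comparing medians rather than single
runs makes the result less sensitive to one-off outliers, which matters on
a small cluster.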

I saw the TPC-H data generator example (
https://github.com/rmetzger/scratch/blob/distributed-tpch-generator/src/main/java/flink/generators/programs/TPCHGeneratorExample.java)
and I read about the sorting benchmarks at http://sortbenchmark.org/.
TPC-H: I am not sure how meaningful it is to join those two tables, or
whether it is commonly used in publications. I thought the MinuteSort might
be interesting because it sounds small.

*Which benchmarks would you recommend?* The benchmarks should be easy to
implement.

I don't have any infrastructure, so I want to rent bare metal servers at
SoftLayer (
https://www.softlayer.com/Store/orderHourlyBareMetalInstance/37276/64),
maybe 5 servers (4 cores, 8 GB RAM). *Does that make sense? Will I even see
differences in such a "small" cluster?*

Any help or pointers to papers that benchmark data processing platforms
would be very welcome. Thank you!

Best regards,
Tobias
