Yes, of course. I already feel a bit less intelligent for having asked the question ;-)

The status now is that I managed to puzzle it all together. Copying the files from S3 to an ephemeral volume takes all of 2 seconds, so it's really not an issue. The cluster starts, and our fat jar and the Apache Hop MainBeam class are found and started.

The only thing that remains is figuring out how to configure the Flink cluster itself. I have a couple of m5.large EC2 instances in a node group on EKS and I set taskmanager.numberOfTaskSlots to "4". However, the tasks in the pipeline can't seem to find resources to start:

    Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout

Parallelism was set to 1 for the runner and there are only 2 tasks in my first Beam pipeline, so it should be simple enough, but it just times out.

The next step for me is to document the result, which will end up on hop.apache.org. I'll probably also want to demo this in Austin at the upcoming Beam Summit.

Thanks a lot for your time and help so far!

Cheers,
Matt
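
P.S. For anyone following along, here is the slot arithmetic behind "it should be simple enough", as a rough sketch only. It assumes Flink's default slot sharing, where a job needs as many slots as its highest operator parallelism. The function names are made up for illustration, and the number of registered TaskManagers is hypothetical since I have not verified it on the cluster.

```python
# Back-of-the-envelope slot check. Assumes Flink's default slot sharing:
# a job needs as many slots as its highest operator parallelism.
# All function names here are illustrative, not part of any Flink API.

def slots_needed(operator_parallelisms):
    """Slots required by a job when all operators share one slot sharing group."""
    return max(operator_parallelisms)


def slots_offered(registered_task_managers, slots_per_task_manager):
    """Total slots offered by the TaskManagers that actually registered."""
    return registered_task_managers * slots_per_task_manager


def can_schedule(operator_parallelisms, registered_task_managers, slots_per_task_manager):
    """True if the offered slots cover what the job needs."""
    needed = slots_needed(operator_parallelisms)
    offered = slots_offered(registered_task_managers, slots_per_task_manager)
    return needed <= offered


if __name__ == "__main__":
    pipeline = [1, 1]  # two tasks, parallelism 1 each (from the mail above)
    slots_per_tm = 4   # taskmanager.numberOfTaskSlots
    # The TaskManager counts below are hypothetical values to compare.
    for task_managers in (0, 1, 2):
        print(task_managers, can_schedule(pipeline, task_managers, slots_per_tm))
```

By this arithmetic a single registered TaskManager would already be enough, so a slot request timeout suggests that no free slot was on offer at all, for example because no TaskManager registered with the JobManager. That is a guess at this point, not a diagnosis.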