Hi,

Lately I was debugging some weird test failures on Travis, and I needed to look into metrics like:
- User, System, IOWait and IRQ CPU usage (based on CPU ticks since the previous check)
- System-wide memory consumption (including making sure that swap was disabled)
- Network usage
- etc.
I had to do this without access to the machines themselves. For this purpose I implemented a periodic logger running in a daemon thread. Its output looked like this:

https://gist.github.com/pnowojski/8b863abb0fb08ac75b62627feadbd2f7

I think it would be nice to add this feature to Flink itself by extending the existing MemoryLogger. The same lack of information that I had on Travis could easily happen in production environments. The problem is that there is no easy way to obtain this kind of information without using an external library (think about cross-platform support). I used this one:

https://github.com/oshi/oshi

It has only a few additional dependencies. One thing worth noting is JNA, whose JAR weighs ~1MB.

We have two options for adding this feature:

1. Include the oshi dependency in flink-runtime.
2. Wrap oshi into a flink-contrib/flink-resource-logger module and make this new module an optional, dynamically loaded dependency of flink-runtime (used only if the user manually copies flink-resource-logger.jar onto the class path).

I would lean toward 1., since oshi is a powerful tool and its dependencies are pretty minimal (apart from the size of the JNA JAR).

What do you think?

Piotrek
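To make "CPU usage based on ticks since the previous check" concrete, here is a minimal sketch of the arithmetic. It assumes cumulative per-category tick counters in the order of oshi's `CentralProcessor.TickType` (in oshi the snapshots would presumably come from `CentralProcessor.getSystemCpuLoadTicks()`), but the code itself depends only on the JDK. `CpuTickDelta` and its `Tick` enum are hypothetical names for illustration.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: turns two snapshots of cumulative CPU tick counters into usage percentages. */
final class CpuTickDelta {

    /** Tick categories; this order is an assumption modelled on oshi's CentralProcessor.TickType. */
    enum Tick { USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ, STEAL }

    private CpuTickDelta() {}

    /** Share (0..100) of CPU time spent in each category between two snapshots indexed by Tick.ordinal(). */
    static Map<Tick, Double> usageBetween(long[] previous, long[] current) {
        int n = Tick.values().length;
        if (previous.length < n || current.length < n) {
            throw new IllegalArgumentException("tick arrays must have one entry per Tick");
        }
        long[] delta = new long[n];
        long total = 0;
        for (Tick t : Tick.values()) {
            // Counters only grow; clamp at 0 in case a counter was reset between checks.
            delta[t.ordinal()] = Math.max(0, current[t.ordinal()] - previous[t.ordinal()]);
            total += delta[t.ordinal()];
        }
        Map<Tick, Double> usage = new EnumMap<>(Tick.class);
        for (Tick t : Tick.values()) {
            usage.put(t, total == 0 ? 0.0 : 100.0 * delta[t.ordinal()] / total);
        }
        return usage;
    }

    public static void main(String[] args) {
        long[] before = {1000, 0, 500, 8000, 100, 10, 5, 0};
        long[] after  = {1300, 0, 600, 8550, 150, 10, 5, 0};
        Map<Tick, Double> usage = usageBetween(before, after);
        // prints "user=30.0% system=10.0% iowait=5.0% irq=0.0%"
        System.out.printf(Locale.ROOT, "user=%.1f%% system=%.1f%% iowait=%.1f%% irq=%.1f%%%n",
            usage.get(Tick.USER), usage.get(Tick.SYSTEM), usage.get(Tick.IOWAIT), usage.get(Tick.IRQ));
    }
}
```

Working on deltas rather than raw counters is what makes the numbers mean "since the previous check": the counters are cumulative since boot, so only their differences say anything about the last interval.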
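For option 2, flink-runtime would need to detect at runtime whether flink-resource-logger.jar is present. A minimal sketch using plain reflection follows. The entry-point class name and the assumption that it implements `Runnable` with a public no-arg constructor are hypothetical, chosen only for illustration.

```java
import java.util.Optional;

/** Hypothetical sketch of option 2: use the resource logger only if its class is on the class path. */
final class OptionalResourceLogger {

    // Hypothetical entry point shipped in flink-resource-logger.jar; the name is illustrative only.
    static final String LOGGER_CLASS = "org.apache.flink.contrib.resourcelogger.ResourceLogger";

    private OptionalResourceLogger() {}

    /** Returns a new instance of className if it can be loaded, is a Runnable and can be instantiated. */
    static Optional<Runnable> tryLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: do not run static initializers just to probe for the class.
            Class<?> clazz = Class.forName(className, false, loader);
            if (!Runnable.class.isAssignableFrom(clazz)) {
                return Optional.empty();
            }
            return Optional.of((Runnable) clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            // Jar missing, or one of its dependencies (e.g. JNA) missing or broken: run without it.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<Runnable> logger = tryLoad(LOGGER_CLASS, OptionalResourceLogger.class.getClassLoader());
        if (logger.isPresent()) {
            Thread t = new Thread(logger.get(), "resource-logger");
            t.setDaemon(true);
            t.start();
        } else {
            System.out.println("flink-resource-logger not on class path; resource logging disabled");
        }
    }
}
```

Catching `LinkageError` as well matters here: if the module jar is present but JNA is not, the failure typically surfaces as a `NoClassDefFoundError` rather than a checked exception.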
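The periodic daemon-thread logger itself can be sketched with a single-threaded `ScheduledExecutorService`. This is an assumption about one reasonable shape, not the implementation behind the gist and not Flink's `MemoryLogger` API; `PeriodicResourceLogger` and `heapSample` are hypothetical, and the JDK-only heap sampler merely stands in for an oshi-based one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical periodic logger on a daemon thread (not Flink's MemoryLogger API). */
final class PeriodicResourceLogger implements AutoCloseable {

    private final ScheduledExecutorService executor;

    /** sampler produces one line per tick (e.g. from oshi); sink receives it (e.g. a logger's info method). */
    PeriodicResourceLogger(Supplier<String> sampler, Consumer<String> sink, long intervalMillis) {
        executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "resource-logger");
            // Daemon thread: never keeps the JVM alive on its own.
            t.setDaemon(true);
            return t;
        });
        executor.scheduleAtFixedRate(() -> {
            try {
                sink.accept(sampler.get());
            } catch (RuntimeException e) {
                // An uncaught exception would silently cancel all future runs, so report it instead.
                sink.accept("resource sampling failed: " + e);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }

    /** JDK-only example sampler: heap usage as seen by this JVM. */
    static String heapSample() {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) >> 20;
        return "heap used=" + usedMb + "MB max=" + (rt.maxMemory() >> 20) + "MB";
    }

    public static void main(String[] args) throws InterruptedException {
        try (PeriodicResourceLogger logger =
                 new PeriodicResourceLogger(PeriodicResourceLogger::heapSample, System.out::println, 500)) {
            Thread.sleep(1200); // roughly three samples are printed
        }
    }
}
```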