[ https://issues.apache.org/jira/browse/FLINK-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006726#comment-17006726 ]
Victor Wong commented on FLINK-15448: ------------------------------------- [~zhuzh], with a long-running job it's actually quite not convenient to find the host information because the log file can be very large, and if you configured log file rotation, the host information of a container might be missing. In some circumstances, as mentioned by Xintong, "when you have lots of TM failures" it would be more annoying. About the redundancy concern, since the host information is mainly logged in case of exception, I don't think it's a problem:) > Log host informations for TaskManager failures. > ----------------------------------------------- > > Key: FLINK-15448 > URL: https://issues.apache.org/jira/browse/FLINK-15448 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination > Affects Versions: 1.9.1 > Reporter: Victor Wong > Priority: Minor > > With Flink on Yarn, sometimes we ran into an exception like this: > {code:java} > java.util.concurrent.TimeoutException: The heartbeat of TaskManager with id > container_xxxx timed out. > {code} > We'd like to find out the host of the lost TaskManager to log into it for > more details, we have to check the previous logs for the host information, > which is a little time-consuming. > Maybe we can add more descriptive information to ResourceID of Yarn > containers, e.g. "container_xxx@host_name:port_number". > Here's the demo: > {code:java} > class ResourceID { > final String resourceId; > final String details; > public ResourceID(String resourceId) { > this.resourceId = resourceId; > this.details = resourceId; > } > public ResourceID(String resourceId, String details) { > this.resourceId = resourceId; > this.details = details; > } > public String toString() { > return details; > } > } > // in flink-yarn > private void startTaskExecutorInContainer(Container container) { > final String containerIdStr = container.getId().toString(); > final String containerDetail = container.getId() + "@" + > container.getNodeId(); > final ResourceID resourceId = new ResourceID(containerIdStr, > containerDetail); > ... > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)