Shanthoosh,

Thank you for suggesting and submitting this SEP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957309

A couple of things I would like to point out so far:

1. Kudos on cleaning up the interface and introducing new ones (LocalityInfo and LocalityManager). I think we also need a MetadataStorage interface (the details can be worked out later) to hide the locality storage implementation details.

2. Instead of using the physical hostname, we should stick to the LocationId, since some VMs may be running multiple processors on a single physical host.

3. Thank you for adding the diagrams. I think we can improve them a little bit.
   - First diagram: it describes how local storage works. Please label it as such.
   - Second diagram: it describes the flow of JobModel generation. I am not sure an actual picture helps here. Consider writing it as a list.
   - Third diagram: the host affinity implementation flow. This is very helpful. I think, though, that function names alone don't make it clear enough what is going on. Maybe we should add more explanation. For example:
       group(InputSSP) -> generate the list of SSPs from the list of input streams/partitions.
       readTaskLocalityInfo() -> read the locality mapping from the MetadataStorage.
     Also, we should add another step there: each processor will update its locality information based on its mapping in the current JobModel.

4. Sometimes a perfect mapping to the same Locality is not possible (especially when a task dies and is distributed between other tasks). What should we do in this case?

Thanks again. I will keep reading the document.
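
P.S. To make the question in point 4 more concrete, here is a rough sketch of one possible fallback. It is not taken from the SEP. The function name assign_tasks, its parameters, and the "least-loaded location" rule are all hypothetical, and locations are plain LocationId strings. The idea is to keep a task on its previous location when that location is still alive, and otherwise place it on whichever live location currently holds the fewest tasks.

```python
def assign_tasks(tasks, previous_locality, live_locations):
    """Hypothetical sketch: map each task to a location id.

    tasks:             list of task names
    previous_locality: dict of task name -> location id from the last run
    live_locations:    list of location ids that are currently available
    """
    if not live_locations:
        raise ValueError("no live locations to assign tasks to")

    load = {loc: 0 for loc in live_locations}
    assignment = {}
    orphans = []

    # First pass: keep every task whose previous location is still alive.
    for task in tasks:
        loc = previous_locality.get(task)
        if loc in load:
            assignment[task] = loc
            load[loc] += 1
        else:
            orphans.append(task)

    # Second pass: tasks without a usable previous location go to the
    # least-loaded live location (ties go to the earliest in the list).
    for task in orphans:
        loc = min(live_locations, key=lambda l: load[l])
        assignment[task] = loc
        load[loc] += 1

    return assignment
```

With this rule, tasks that lose their location are spread evenly over the survivors, at the cost of restoring their local state from scratch. Whether that trade-off is acceptable is exactly what I would like the SEP to spell out.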