Shanthoosh,
Thank you for suggesting and submitting this SEP:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957309

A couple of things I would like to point out so far:

   1. Kudos on cleaning up the interface and introducing new ones
   (LocalityInfo and LocalityManager). I think we also need a MetadataStorage
   interface (details may be worked out later) to hide the locality storage
   implementation details.
   2. Instead of using the physical hostname, we should stick to the
   LocationId, since some VMs may be running multiple processors on a single
   physical host.
   3. Thank you for adding the diagrams. I think we can improve them a little
   bit.
      - First diagram describes how local storage works. Please label it as
      such.
      - Second diagram describes the flow of JobModel generation. I am not
      sure if actual pictures help here. Consider writing it as a list.
      - Third diagram: host affinity implementation flow. This is very
      helpful. I think, though, that using function names alone doesn't give
      enough clarity on what is going on. Maybe we should add more
      explanation. For example:
          group(InputSSP) -> generate the list of SSPs from the list of input
          streams/partitions.
          readTaskLocalityInfo() -> read the locality mapping from the
          MetadataStorage.
      Also, we should add another step there: each processor will update
      the locality information based on its mapping in the current JobModel.
   4. Sometimes a perfect mapping to the same Locality is not possible
   (especially when a task dies and is distributed between other tasks). What
   should we do in this case?
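
To make points 3 and 4 concrete, here is a rough sketch of the flow as I understand it, written in Python only for brevity. Everything in it is an assumption for discussion and not taken from the SEP: the InMemoryMetadataStorage class, the assign_tasks and update_locality helpers, and the fallback rule (keep a task on a processor in its previous LocationId when there is room, otherwise give it to the least-loaded processor) are all hypothetical.

```python
from typing import Dict, List


class InMemoryMetadataStorage:
    """Hypothetical stand-in for the proposed MetadataStorage interface."""

    def __init__(self) -> None:
        self._task_locality: Dict[str, str] = {}

    def read_task_locality(self) -> Dict[str, str]:
        # Task name -> LocationId recorded for the previous JobModel.
        return dict(self._task_locality)

    def write_task_locality(self, task: str, location_id: str) -> None:
        self._task_locality[task] = location_id


def assign_tasks(tasks: List[str],
                 processors: Dict[str, str],
                 previous_locality: Dict[str, str]) -> Dict[str, str]:
    """Map each task to a processor id.

    processors maps processor id -> LocationId (not a physical hostname).
    """
    if not processors:
        raise ValueError("at least one processor is required")

    capacity = -(-len(tasks) // len(processors))  # ceiling division
    load = {p: 0 for p in processors}
    assignment: Dict[str, str] = {}
    unplaced: List[str] = []

    # Pass 1: prefer a processor in the task's previous location, if it has room.
    for task in sorted(tasks):
        preferred = previous_locality.get(task)
        candidates = [p for p in sorted(processors)
                      if processors[p] == preferred and load[p] < capacity]
        if candidates:
            chosen = min(candidates, key=lambda p: load[p])
            assignment[task] = chosen
            load[chosen] += 1
        else:
            unplaced.append(task)

    # Pass 2 (assumed fallback): no perfect match, so use the least-loaded processor.
    for task in unplaced:
        chosen = min(sorted(processors), key=lambda p: load[p])
        assignment[task] = chosen
        load[chosen] += 1

    return assignment


def update_locality(storage: InMemoryMetadataStorage,
                    assignment: Dict[str, str],
                    processors: Dict[str, str]) -> None:
    # The extra step suggested above: record the new task -> LocationId mapping.
    for task, processor in assignment.items():
        storage.write_task_locality(task, processors[processor])
```

With this rule a task whose previous location is gone simply loses its local state and moves to wherever there is capacity. Whether that is acceptable, or whether we would rather tolerate an unbalanced assignment to preserve locality, is the question I would like the SEP to answer.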


Thanks again. I will keep reading the document.
