[ https://issues.apache.org/jira/browse/KAFKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204291#comment-14204291 ]

Gwen Shapira commented on KAFKA-1754:
-------------------------------------

[~nehanarkhede] I agree that these are two serious concerns and something we'll have to give more thought to.

1. Resource isolation: [~acmurthy] generously offered to improve YARN's IO isolation, so Kafka's IO can be protected. Page cache is a much more challenging issue. I think (but am not sure) that SparkStreaming does not rely heavily on the page cache, so it may lend itself more readily to co-location. In any case, it remains to be seen whether YARN can help in that regard.

2. Can data locality be achieved with Kafka? If we can't get any benefits out of co-location, there's no point in trying to resolve the challenges :)
Intuitively, I'd think that if we use partition keys, we can know what data is stored in each partition and can make use of that knowledge to optimize Stream processing. Perhaps skip some of the shuffle steps, and do more processing in the Kafka receiver. However, I don't have a specific design in mind for that at the moment. Need to think about it a bit more.

> KOYA - Kafka on YARN
> --------------------
>
> Key: KAFKA-1754
> URL: https://issues.apache.org/jira/browse/KAFKA-1754
> Project: Kafka
> Issue Type: New Feature
> Reporter: Thomas Weise
> Attachments: DT-KOYA-Proposal- JIRA.pdf
>
>
> YARN (Hadoop 2.x) has enabled clusters to be used for a variety of workloads,
> emerging as distributed operating system for big data applications.
> Initiatives are on the way to bring long running services under the YARN
> umbrella, leveraging it for centralized resource management and operations
> ([YARN-896] and examples such as HBase, Accumulo or Memcached through
> Slider). This JIRA is to propose KOYA (Kafka On Yarn), a YARN application
> master to launch and manage Kafka clusters running on YARN. Brokers will use
> resources allocated through YARN with support for recovery, monitoring etc.
> Please see attached for more details.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)