Actually, we need a broker, but a more stateful one. Hence the decision to use a TTL on HBase.
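A minimal sketch of what that HBase design could look like with the 0.94-era client API that was current at the time; the table name, column family, TTL, and version count are illustrative assumptions, not details from the thread:

    // Sketch only: create a table whose column family expires cells via TTL
    // and keeps multiple versions per cell. Names and numbers are made up.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateUserEventsTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor table = new HTableDescriptor("user_events"); // hypothetical name
            HColumnDescriptor family = new HColumnDescriptor("e");
            family.setTimeToLive(7 * 24 * 60 * 60); // expire cells after 7 days
            family.setMaxVersions(10);              // keep up to 10 versions per cell
            table.addFamily(family);

            admin.createTable(table);
            admin.close();
        }
    }

The TTL gives the retention-style expiry the thread was asking Kafka for, while versioning keeps a bounded per-cell history.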
On 7 Oct 2013 08:38, "Benjamin Black" <b...@b3k.us> wrote:

> What you are discovering is that Kafka is a message broker, not a database.
>
> On Sun, Oct 6, 2013 at 5:34 PM, Ravindranath Akila
> <ravindranathak...@gmail.com> wrote:
>
> > Thanks a lot Neha!
> >
> > Actually, using keyed messages (with Simple Consumers) was the approach
> > we took. But it seems we can't map each user to a new partition due to
> > ZooKeeper limitations. Rather, we will have to map a "group" of users
> > onto one partition. Then we can't fetch the messages for only one user.
> >
> > It seems our data is best put in HBase with a TTL and versioning.
> >
> > Thanks!
> >
> > R. A.
> >
> > On 6 Oct 2013 16:00, "Neha Narkhede" <neha.narkh...@gmail.com> wrote:
> >
> > > Kafka is designed to have on the order of a few thousand partitions,
> > > roughly fewer than 10,000, and the main bottleneck is ZooKeeper. A
> > > better way to design such a system is to have fewer partitions and
> > > use keyed messages to distribute the data over a fixed set of
> > > partitions.
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Oct 5, 2013 8:19 PM, "Ravindranath Akila"
> > > <ravindranathak...@gmail.com> wrote:
> > >
> > > > Initially, I thought dynamic topic creation could be used to
> > > > maintain per-user data on Kafka. Then I read that partitions can
> > > > and should be used for this instead.
> > > >
> > > > If a partition is to be used to map a user, can there be a million,
> > > > or even a billion, partitions in a cluster? How does one go about
> > > > designing such a model?
> > > >
> > > > Can the replication tool be used to assign, say, partitions
> > > > 1-10,000 to replica 1, and 10,001-20,000 to replica 2?
> > > >
> > > > If not, since there is a ulimit on the file system, should one
> > > > model it on a replica/topic/partition approach? Say users 1-10,000
> > > > go on topic 10k-1, which has 10,000 partitions, and users
> > > > 10,001-20,000 go on topic 10k-2, which has 10,000 partitions.
> > > >
> > > > Simply put, how can a million stateful data points be handled? (I
> > > > deduced that a userid-to-partition mapping can be done via a
> > > > partitioner, but unless a server can be configured to handle only a
> > > > given set of partitions, with a range-based notation, it is almost
> > > > impossible to handle a large dataset. Is it that Kafka can only
> > > > handle a limited set of stateful data right now?)
> > > >
> > > > http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> > > >
> > > > Btw, why does Kafka have to keep every partition open? Can't a
> > > > partition be opened for read/write only when needed?
> > > >
> > > > Thanks in advance!
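A minimal sketch of the fixed-partitions-plus-keyed-messages design Neha describes above, against the 0.8-era Java producer API that was current when this thread was written (the Partitioner signature varied slightly across 0.8.x point releases); the topic name, broker address, and hash scheme are illustrative assumptions:

    // Sketch only: a fixed partition count with keyed messages, so each
    // user ID always lands on the same partition.
    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.Partitioner;
    import kafka.producer.ProducerConfig;
    import kafka.utils.VerifiableProperties;

    public class UserKeyedProducer {

        // Maps a user ID onto one of the topic's fixed set of partitions.
        public static class UserPartitioner implements Partitioner {
            public UserPartitioner(VerifiableProperties props) {} // constructor Kafka expects

            @Override
            public int partition(Object userId, int numPartitions) {
                // Mask keeps the hash non-negative; many users share a
                // partition, but one user never spans two.
                return (userId.hashCode() & 0x7fffffff) % numPartitions;
            }
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092"); // assumed broker address
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("partitioner.class", UserPartitioner.class.getName());

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            // "user-events" is a hypothetical topic with a fixed partition count.
            producer.send(new KeyedMessage<String, String>(
                    "user-events", "user-42", "some event payload"));
            producer.close();
        }
    }

Hashing many users onto one partition preserves per-user ordering, but a consumer of that partition still sees every co-located user. That is exactly the "can't fetch the messages for only one user" limitation raised above, and it is what pushed this design toward HBase.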