Dear Christian, you seem to have spent considerable time thinking about this. Thank you very much; I will study it and take ideas from it.
If you didn't, then wow, congratulations, you have a fast CPU there :)

Aris

On Sep 5, 2014 7:25 PM, "Christian Csar" <cac...@gmail.com> wrote:

> The thought experiment I did ended up having a set of front-end servers
> corresponding to a given chunk of the user id space, each of which was a
> separate subscriber to the same set of partitions. Then you have one or
> more partitions corresponding to that same chunk of users. You want the
> chunk/set of partitions to be of a size where each of those front-end
> servers can process all the messages in it and send out the chats,
> notifications, status-change notifications perhaps, and read receipts
> to those users who happen to be connected to that particular front-end node.
>
> You would need to handle some deduplication on the consumers/FE servers
> and would need to decide where to produce. Producing from every front-end
> server to potentially every broker could be expensive in terms of
> connections, and you might want to first relay the messages to the
> corresponding front-end cluster, but since we don't use large numbers of
> producers it's hard for me to say.
>
> For persistence and offline delivery you can probably accept a delay in
> user receipt, so you can use another set of consumers that persist the
> messages to a longer-latency datastore on the backend and then fetch the
> last 50 or so messages with a bit of lag when the user first looks at
> history (see HipChat and Hangouts lag).
>
> This gives you a smaller number of partitions and avoids the issue of
> having to keep too much history on the Kafka brokers. There are
> obviously a significant number of complexities to deal with. For example,
> if you are using the default consumer code that commits offsets into
> ZooKeeper, it may be inadvisable at scales that you probably don't need
> to worry about reaching. And remember, I did this only as a thought
> experiment, not a proper technical evaluation. I expect Kafka, used
> correctly, can make aspects of building such a chat system much, much
> easier (you can avoid writing your own message replication system), but
> it is definitely not plug-and-play using topics for users.
>
> Christian
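For concreteness, here is a minimal, untested sketch (not from the thread) of the consumer side of the thought experiment above: every front-end server responsible for a user-id chunk is manually assigned the same fixed set of partitions and filters down to the users connected to it. It assumes the newer Java client (org.apache.kafka:kafka-clients), which postdates the 0.8.1.1 release discussed here; the topic name "chat-messages", the chunk sizing, and the helper methods are illustrative placeholders.

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ChunkFrontEndConsumer {

    public static void main(String[] args) {
        int chunkId = 7;             // which slice of the user-id space this server handles (illustrative)
        int partitionsPerChunk = 4;  // illustrative sizing

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        // No group.id and no offset commits: every front-end server of the chunk
        // must see every message, so there is nothing to balance across them.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually assign the chunk's partitions instead of subscribing,
            // so several front-end servers can all read the same partitions.
            List<TopicPartition> chunkPartitions = new ArrayList<>();
            for (int p = chunkId * partitionsPerChunk; p < (chunkId + 1) * partitionsPerChunk; p++) {
                chunkPartitions.add(new TopicPartition("chat-messages", p));
            }
            consumer.assign(chunkPartitions);

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Assumed layout: key = recipient user id, value = message body.
                    String recipient = record.key();
                    if (isConnectedHere(recipient)) {
                        pushToClient(recipient, record.value());
                    }
                    // Otherwise drop it here; another server in the chunk, or the
                    // offline-delivery consumers, will handle that user.
                }
            }
        }
    }

    // Placeholders for the front-end server's connection registry and push path.
    private static boolean isConnectedHere(String userId) { return false; }
    private static void pushToClient(String userId, String messageBody) { }
}

Because the servers use assign() rather than a consumer group, adding a front-end server to a chunk does not trigger a rebalance, which is what lets several servers read the same partitions independently.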
> On 09/05/2014 09:46 AM, Jonathan Weeks wrote:
> > +1
> >
> > Topic deletion with 0.8.1.1 is extremely problematic, and coupled with
> > the fact that rebalance/broker membership changes pay a cost per partition
> > today (whereby excessive partitions extend downtime in the case of a
> > failure), this means fewer topics (e.g. hundreds or thousands) is the best
> > practice in the published version of Kafka.
> >
> > There are also secondary impacts of topic count: useful operational
> > tools such as http://quantifind.com/KafkaOffsetMonitor/ start to become
> > problematic in terms of UX with a massive number of topics.
> >
> > Once topic deletion is a supported feature, the use case outlined might
> > be more tenable.
> >
> > Best Regards,
> >
> > -Jonathan
> >
> > On Sep 5, 2014, at 4:20 AM, Sharninder <sharnin...@gmail.com> wrote:
> >
> >> I'm not really sure about your exact use case, but I don't think having a
> >> topic per user is very efficient. Deleting topics in Kafka, at the moment,
> >> isn't really straightforward. You should rethink your data pipeline a bit.
> >>
> >> Also, just because Kafka has the ability to store messages for a certain
> >> time, don't think of it as a data store. Kafka is a streaming system;
> >> think of it as a fast queue that gives you the ability to move your
> >> pointer back.
> >>
> >> --
> >> Sharninder
> >>
> >> On Fri, Sep 5, 2014 at 4:27 PM, Aris Alexis <aris.alexis....@gmail.com> wrote:
> >>
> >>> Thanks for the reply. If I use it only for activity streams like Twitter:
> >>> I would want a topic for each #tag, a topic for each user, and maybe one
> >>> for each city. Would that be too many topics, or does it not matter,
> >>> since most of them will be deleted after a specified interval?
> >>>
> >>> Best Regards,
> >>> Aris Giachnis
> >>>
> >>> On Fri, Sep 5, 2014 at 6:57 AM, Sharninder <sharnin...@gmail.com> wrote:
> >>>
> >>>> Since you want all chat and mail history persisted all the time, I
> >>>> personally wouldn't recommend Kafka for your requirement. Kafka is more
> >>>> suitable as a streaming system where events expire after a certain time.
> >>>> Look at something more general-purpose, like HBase, for persisting data
> >>>> indefinitely.
> >>>>
> >>>> So, for example, all activity streams can go into Kafka, from where
> >>>> consumers will pick up messages to parse and put into HBase or other
> >>>> stores.
> >>>>
> >>>> --
> >>>> Sharninder
> >>>>
> >>>> On Fri, Sep 5, 2014 at 12:05 AM, Aris Alexis <snowboard...@gmail.com> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I am building a big web application that I want to be massively
> >>>>> scalable (I am using Cassandra and Titan as a general DB).
> >>>>>
> >>>>> I want to implement the following:
> >>>>>
> >>>>> - real-time web chat that is persisted so that user A can later recall
> >>>>>   his chats with users B, C, and D, much like Facebook
> >>>>> - mail-like messages in the web application (not sure about this, as it
> >>>>>   is somewhat covered by the first one)
> >>>>> - user activity streams
> >>>>> - users subscribing to topics, for example florida/musicevents
> >>>>>
> >>>>> Could I use Kafka for this? Can you recommend another technology, maybe?
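And a similarly hedged sketch of the other half described earlier in the thread, assuming the same client library and illustrative names: the separate set of consumers that persist messages to a longer-latency store (HBase, Cassandra, or similar) for history and offline delivery. This one is an ordinary consumer group, so partitions are balanced across persister instances, and offsets are committed only after a batch has been written; persistChatMessage() is a hypothetical stand-in for the datastore write.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class HistoryPersister {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        // A real consumer group: partitions are divided across persister
        // instances, unlike the front-end consumers, which each read everything.
        props.put("group.id", "chat-history-persister");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("chat-messages"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Assumed layout: key = recipient user id, value = message body.
                    persistChatMessage(record.key(), record.value());
                }
                // Commit only after the batch is safely written, so a crash
                // re-delivers the batch rather than losing it.
                consumer.commitSync();
            }
        }
    }

    // Placeholder for the write into the longer-latency datastore.
    private static void persistChatMessage(String userId, String messageBody) { }
}

Since a crash re-delivers the uncommitted batch, the datastore write should be idempotent (for example, keyed by a message id), which also covers the deduplication point raised above.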