Dear Christian,

You seem to have spent considerable time thinking about this.
Thank you very much; I will study it and take ideas from it.

If you didn't, then wow, congrats, you have a fast CPU there :)

Aris
On Sep 5, 2014 7:25 PM, "Christian Csar" <cac...@gmail.com> wrote:

> The thought experiment I did ended up having a set of front end servers
> corresponding to a given chunk of the user id space, each of which was a
> separate subscriber to the same set of partitions. Then you have one or
> more partitions corresponding to that same chunk of users. You want the
> chunk/set of partitions to be of a size where each of those front end
> servers can process all the messages in it and send out the chats,
> notifications (status change notifications perhaps), and read receipts
> to those users who happen to be connected to the particular front end node.
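
As an illustration of the keying scheme described above, a minimal sketch with the Java producer client (a newer client than the 0.8.1 one discussed in this thread; the broker address, topic name, user ids, and message format are all placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ChatMessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String recipientId = "user-42";  // hypothetical recipient user id
            String payload = "{\"from\":\"user-7\",\"text\":\"hello\"}";
            // Keying by user id makes the default partitioner hash the key, so every
            // message for this user lands in the same partition; the front end servers
            // subscribed to that partition set therefore see all of this user's traffic.
            producer.send(new ProducerRecord<>("chat-messages", recipientId, payload));
        }
    }
}
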
>
> You would need to handle some deduplication on the consumers/FE servers
> and would need to decide where to produce. Producing from every front
> end server to potentially every broker could be expensive in terms of
> connections, and you might want to first relay the messages to the
> corresponding front end cluster, but since we don't use large numbers of
> producers it's hard for me to say.
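
A rough sketch of what that deduplication could look like on the FE servers, assuming the application stamps each message with its own unique id (Kafka itself, especially the 0.8-era broker discussed here, will not deduplicate redelivered messages for you):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// A bounded "recently seen ids" window: the first time an id shows up it is
// accepted, and duplicates seen while it is still in the window are dropped.
public class RecentIdDeduplicator {
    private final Set<String> seen;

    public RecentIdDeduplicator(final int capacity) {
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;  // evict the oldest id once the window is full
            }
        });
    }

    /** Returns true the first time an id is seen, false for duplicates. */
    public synchronized boolean firstTime(String messageId) {
        return seen.add(messageId);
    }
}
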
>
> For persistence and offline delivery you can probably accept a delay in
> user receipt, so you can use another set of consumers that persist the
> messages to a longer-latency datastore on the backend, and then fetch the
> last 50 or so messages, with a bit of lag, when the user first looks at
> history (see the lag in HipChat and Hangouts).
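
That second set of consumers could be little more than a loop like the sketch below (again written against the newer Java consumer API rather than the 0.8 client; ChatStore is a made-up stand-in for the backend datastore):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class HistoryPersister {
    /** Hypothetical interface to the long-term store (HBase, Cassandra, ...). */
    interface ChatStore {
        void append(String userId, String message);
    }

    public static void run(ChatStore store) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("group.id", "chat-history-persister");   // separate consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("chat-messages"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The key is the recipient's user id (see the producer sketch above);
                    // history reads then come from the store, slightly behind real time.
                    store.append(record.key(), record.value());
                }
            }
        }
    }
}
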
>
> This gives you a smaller number of partitions and avoids the issue of
> having to keep too much history on the Kafka brokers. There are
> obviously a significant number of complexities to deal with. For example,
> the default consumer code commits offsets into ZooKeeper, which may be
> inadvisable at large scales that you probably don't need to worry about
> reaching. And remember, I did this only as a thought experiment, not a
> proper technical evaluation. I expect Kafka, used correctly, can make
> aspects of building such a chat system much, much easier (you can avoid
> writing your own message replication system), but it is definitely not
> plug and play if you use topics per user.
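
On the offset point, one mitigation with the 0.8-era high-level consumer is to disable auto-commit and commit in batches, trading ZooKeeper write load for more redelivery after a crash; a rough sketch, with placeholder addresses and names:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class BatchedOffsetCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // placeholder ZooKeeper address
        props.put("group.id", "chat-frontend");            // hypothetical group name
        props.put("auto.commit.enable", "false");          // commit ourselves, in batches

        ConsumerConnector connector =
                kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("chat-messages", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("chat-messages").get(0).iterator();

        int processed = 0;
        while (it.hasNext()) {
            byte[] message = it.next().message();
            // ... hand the message to the delivery path ...
            if (++processed % 1000 == 0) {
                connector.commitOffsets();  // one ZooKeeper write per 1000 messages
            }
        }
    }
}
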
>
> Christian
>
>
> On 09/05/2014 09:46 AM, Jonathan Weeks wrote:
> > +1
> >
> > Topic deletion with 0.8.1.1 is extremely problematic. Coupled with
> the fact that rebalance/broker membership changes pay a cost per partition
> today, so that excessive partitions extend downtime in the case of a
> failure, this means fewer topics (e.g. hundreds or thousands) are a best
> practice in the published version of Kafka.
> >
> > There are also secondary impacts of a high topic count; e.g. useful
> operational tools such as http://quantifind.com/KafkaOffsetMonitor/
> start to become problematic in terms of UX with a massive number of topics.
> >
> > Once topic deletion is a supported feature, the use-case outlined might
> be more tenable.
> >
> > Best Regards,
> >
> > -Jonathan
> >
> > On Sep 5, 2014, at 4:20 AM, Sharninder <sharnin...@gmail.com> wrote:
> >
> >> I'm not really sure about your exact use-case but I don't think having a
> >> topic per user is very efficient. Deleting topics in Kafka, at the
> moment,
> >> isn't really straightforward. You should rethink your data pipeline a
> bit.
> >>
> >> Also, just because Kafka has the ability to store messages for a certain
> >> time, don't think of it as a data store. Kafka is a streaming system;
> think
> >> of it as a fast queue that gives you the ability to move your pointer
> back.
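
"Moving the pointer back" maps directly onto consumer code; with the newer Java consumer (later than the 0.8 client this thread is about) it looks roughly like the sketch below, with an illustrative topic, partition, and offset:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("activity-stream", 0);  // illustrative
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 0L);  // move the "pointer" back to an earlier offset
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
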
> >>
> >> --
> >> Sharninder
> >>
> >>
> >>
> >> On Fri, Sep 5, 2014 at 4:27 PM, Aris Alexis <aris.alexis....@gmail.com>
> >> wrote:
> >>
> >>> Thanks for the reply. If I use it only for activity streams like
> Twitter:
> >>>
> >>> I would want a topic for each #tag, a topic for each user, and maybe one
> >>> for each city. Would that be too many topics, or does it not matter,
> >>> since most of them will be deleted after a specified interval?
> >>>
> >>>
> >>>
> >>> Best Regards,
> >>> Aris Giachnis
> >>>
> >>>
> >>> On Fri, Sep 5, 2014 at 6:57 AM, Sharninder <sharnin...@gmail.com>
> wrote:
> >>>
> >>>> Since you want all chats and mail history persisted all the time, I
> >>>> personally wouldn't recommend Kafka for your requirement. Kafka is
> more
> >>>> suitable as a streaming system where events expire after a certain
> time.
> >>>> Look at something more general purpose, like HBase, for persisting data
> >>>> indefinitely.
> >>>>
> >>>> So, for example, all activity streams can go into Kafka, from where
> >>> consumers
> >>>> will pick up messages to parse and put them into HBase or other downstream systems.
> >>>>
> >>>> --
> >>>> Sharninder
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Sep 5, 2014 at 12:05 AM, Aris Alexis <snowboard...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I am building a big web application that I want to be massively
> >>>>> scalable (I am using Cassandra and Titan as my general databases).
> >>>>>
> >>>>> I want to implement the following:
> >>>>>
> >>>>> real-time web chat that is persisted so that user A can in the future
> >>>>> recall his chats with users B, C, and D, much like Facebook.
> >>>>> mail-like messages in the web application (not sure about this, as it
> >>>>> is somewhat covered by the first one)
> >>>>> user activity streams
> >>>>> users subscribing to topics, for example florida/musicevents
> >>>>>
> >>>>> Could I use Kafka for this? Can you recommend another technology,
> maybe?
> >>>>>
> >>>>
> >>>
> >
> >
>
>
>
