Hi Roman,

Kafka's current messaging guarantee is at-least-once; we are working on transactional messaging features to make it exactly-once. After that, we expect Kafka to be usable as a synchronization/replication layer for storage systems, which matches your use case.
As for your design: since you will probably have a lot of users and each user's data is small, you would end up with many small files on the Kafka brokers. If all you need is order preservation per user, you can probably just use keyed messages with the user id as the key; all messages with the same key will end up in the same partition and hence be consumed by the same consumer client. With that you only need a fixed, small number of partitions. A minimal producer sketch follows below the quoted message.

Guozhang

On Fri, Aug 8, 2014 at 12:35 PM, Roman Iakovlev <roman.iakov...@live.com> wrote:
> Dear all,
>
> I'm new to Kafka, and I'm considering using it for a maybe not very usual
> purpose. I want it to be a backend for data synchronization between a
> multitude of devices which are not always online (mobile and embedded
> devices). All the synchronized information belongs to some user and can be
> identified by the user id. There are several data types, and a user can
> have many entries of each data type coming from many different devices.
>
> This solution has to scale up to hundreds of thousands of users, and, as
> far as I understand, Kafka stores every partition in a single file. I've
> been thinking about creating a topic for every data type and a separate
> partition for every user. The amount of data stored by each user is no
> more than several megabytes over the whole lifetime, because the data
> stored would be keyed messages, and I'm expecting them to be compacted.
>
> So what I'm wondering is, would Kafka be the right approach for such a
> task, and if yes, would this architecture (one topic per data type and one
> partition per user) scale to the specified extent?
>
> Thanks,
>
> Roman.

--
-- Guozhang
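P.S. A minimal sketch of the keyed-messages approach described above, using the Java producer client. The topic name "user-data", the broker address, and the string-serialized payload are illustrative assumptions, not part of your setup:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserKeyedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // illustrative broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // The user id is the message key: the default partitioner hashes the key,
                // so every record for this user lands in the same partition, which
                // preserves per-user ordering without one partition per user.
                String userId = "user-42";
                String payload = "{\"dataType\":\"contacts\",\"entry\":\"...\"}";
                producer.send(new ProducerRecord<>("user-data", userId, payload));
            }
        }
    }

Because partition assignment is derived from the key, adding more users does not require adding more partitions; each consumer in a group reads a subset of the partitions and therefore sees all messages for the users hashed to those partitions, in order.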