Hi, Nick,

Let me try to answer in between the lines:

On Thu, Mar 31, 2016 at 12:49 AM, nick xander <nickxander...@gmail.com> wrote:

> * Do you guys experience issue with Kafka when it is used with log
> compaction for Samza's state full management?

The critical log-compaction issue in Kafka that we care about is the case where message compression and log compaction are *both* used on the same topic. Currently, we forcibly turn off compression for changelog topics, so it is not a problem for Samza's KV-stores. It is still a problem for checkpoint topics if the Kafka producer is configured to use message compression.

> * What is the avg number of keys per partition that you have observed in
> Kafka's log compacted topic for state full management, total number of
> partition, replication factor and number of Kafka brokers?

This number varies *a lot*, depending on how big your KV-store is. For example, we have seen around 5-10GB of RocksDB KV-stores being stored in the changelog at LinkedIn. A store of that size causes a long bootstrap time when the container is restarted on a different host, because the whole store has to be restored from the changelog. Hence, we included the host-affinity feature in Samza 0.10, which cut down the bootstrap time for that particular job by 20x.

> * Will Kafka 0.9 upgrade will be included as part of Samza 0.10.1 as it
> seems critical if Samza is used for stateful management? And what is the
> timeline for Samza 0.10.1 that you are expecting?

We are planning to release Samza 0.10.1 very soon and are working on pending code reviews and validations now. Depending on the test/validation cycles, we hope to have a Samza 0.10.1 release candidate ready in a month or so. The Kafka 0.9 upgrade will likely not be in Samza 0.10.1, due to the tight release timeline this time.

> * What is recommendation between the usage of Samza vs Kafka connect?
> Should we use Samza for state full management and Kafka connect for other
> stateless streaming solution?

KafkaConnect is mainly an ingest/output connector to/from Kafka and does not do much stateful processing. Samza actually does both ingest/output and stateful processing. If there are input data sources that Samza does not have a SystemConsumer implementation for yet, you can definitely use KafkaConnect for ingestion and Samza for stateful processing.

Hope the above answered your questions.

Thanks!

-Yi

> Thanks,
> Nick
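
P.S. In case it helps to picture why bootstrap time grows with the size of the KV-store, here is a rough Python sketch of what log compaction and changelog restore do conceptually. This is only an illustration, not Samza or Kafka code: the names `compact` and `restore` and the in-memory list standing in for a changelog partition are made up for the example, and real Kafka keeps delete markers (tombstones) around for a while before dropping them.

```python
from typing import Dict, List, Optional, Tuple

# One changelog record: (key, value). A value of None is a delete marker (tombstone).
Record = Tuple[str, Optional[bytes]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, in original order.

    Simplification: tombstones are dropped immediately, whereas Kafka
    retains them for a configurable period before removing them.
    """
    last_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if last_offset[key] == offset and value is not None
    ]


def restore(log: List[Record]) -> Dict[str, bytes]:
    """Rebuild a KV-store by replaying every record in the changelog."""
    store: Dict[str, bytes] = {}
    for key, value in log:
        if value is None:
            store.pop(key, None)  # tombstone: remove the key if present
        else:
            store[key] = value
    return store
```

Even after compaction, a restore still has to read one record per live key, so the work scales with the size of the store. That is the cost a container pays when it comes up on a host that has no local copy of the store.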