Hi Yi,

        Thanks for the support; it's great to have an active, supportive
community. It makes sense not to upgrade Samza to the new Kafka 0.9
client (which no longer relies on ZooKeeper), since that could break
clients running against Kafka 0.8.2 brokers. But as you said, we might be
able to use a 0.9 broker with Samza 0.10 (can you please confirm this? I
went through various documentation and it seems possible, but the line
"This means that upgraded brokers and clients may not be compatible with
older versions" in the Kafka 0.9 documentation worries me). It would be
great if you could do some sanity testing of Samza 0.10 against a Kafka
0.9 broker and see if there are any issues, since you are the experts here
and we will not be able to identify all the ways Samza uses Kafka. For
example: tools packaged under org.apache.kafka.clients.tools.* have moved
to org.apache.kafka.tools.* (I suppose this only affects scripts), a number
of Kafka configurations have been deprecated that might affect Samza, and
there are other potential breaking changes
<http://kafka.apache.org/documentation.html#upgrade_9_breaking>. It would
be really helpful for the community if the Samza team could confirm that a
Kafka 0.9 broker can safely be used with the Samza 0.10 or 0.10.1 version.



Thanks,

Nick
------------------------------


Hi, Nick,



Thanks for digging out the details from the Kafka JIRAs! I appreciate it!



As for upgrading to Kafka 0.9 to fix those critical issues, I am totally
with you. The discussion on whether Samza 0.10.1 should include the Kafka
0.9 fixes has just started (with your thread :)), so we are happy to
accommodate the request if the community needs it.



As for LinkedIn's deployment, we have actually already deployed an internal
version of Kafka that has most of the 0.9 fixes for log compaction with
compressed messages. I will need to check with our Kafka team to see
whether the bugs you mentioned are also included. There is some concern
about pushing the Kafka 0.9 client libraries into Samza 0.10.1, because
some community members are still running Kafka 0.8.2 brokers in production
and this change might incur migration cost. Besides, Kafka 0.9 also
introduces new client library changes that require code changes in Samza's
KafkaSystemConsumer/KafkaSystemProducer. Hence, our original thought was to
keep Samza 0.10.1 a lightweight release and incorporate Kafka 0.9 in the
next major release.
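To make the client-library point concrete, here is a minimal sketch of the configuration shift behind that migration. These are illustrative Python dicts, not Samza code; the keys are real Kafka settings, the host names are placeholders:

```python
# Kafka 0.8.2 high-level consumer: discovers brokers via ZooKeeper.
old_consumer_config = {
    "zookeeper.connect": "zk-host:2181",  # broker discovery through ZooKeeper
    "group.id": "samza-job",
}

# Kafka 0.9 new consumer: talks to brokers directly, no ZooKeeper dependency.
new_consumer_config = {
    "bootstrap.servers": "broker-host:9092",  # direct broker bootstrap
    "group.id": "samza-job",
    "key.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
    "value.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
}

# The old ZooKeeper setting has no counterpart in the new client, which is
# part of why KafkaSystemConsumer/KafkaSystemProducer would need code changes.
assert "zookeeper.connect" not in new_consumer_config
```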



However, if Kafka 0.9 brokers support Kafka 0.8.2 clients, I don't think
anything should block you from using a Kafka 0.9 broker and Samza 0.10
together to fix the server-side issues you mentioned. If any client-side
change in Samza is needed, we are happy to help, and if necessary we can
also expand the scope of Samza 0.10.1 to include the Kafka 0.9 client
libraries.



Please let me know if the above works for you. If not, let me know the
specific issues that require the Kafka 0.9 client and we can find a
solution together.



Thanks a lot!



-Yi



On Fri, Apr 1, 2016 at 3:34 PM, nick xander <nickxander...@gmail.com> wrote:



> Hi Yi,

>

>         Thanks for the clarification, it was helpful.

>

>

> I would also like to know your views on the issues below, and whether you
> have employed anything to overcome them.

>

> LogCompaction Issues:

>

> https://issues.apache.org/jira/browse/KAFKA-2163 - Offsets manager cache
> should prevent stale-offset-cleanup while an offset load is in progress;
> otherwise we can lose consumer offsets – *Might be an issue, as it will
> result in no offset being read, thereby failing the bootstrap of the local
> key-value store*

>

> https://issues.apache.org/jira/browse/KAFKA-2118 - Cleaner cannot clean
> after shutdown during replaceSegments –
> *Will prevent reading the log-compacted topic, causing failure of the
> local key-value store bootstrap*

>

> https://issues.apache.org/jira/browse/KAFKA-2235 - LogCleaner offset map
> overflow –
> *Will probably be an issue for clients that have a small message size and
> a large number of keys; they need a lot of fine-tuning to make sure this
> doesn't happen.*
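To put rough numbers on the KAFKA-2235 tuning concern, here is a back-of-envelope sketch. It assumes the cleaner's default 128MB dedupe buffer (log.cleaner.dedupe.buffer.size), the default 0.9 load factor (log.cleaner.io.buffer.load.factor), and 24 bytes per unique key (a 16-byte MD5 of the key plus an 8-byte offset, as in Kafka's SkimpyOffsetMap):

```python
# Each offset-map entry is fixed-size: 16-byte key hash + 8-byte offset.
BYTES_PER_ENTRY = 24

def max_unique_keys(dedupe_buffer_bytes, load_factor=0.9):
    """Roughly how many unique keys one cleaner pass can deduplicate."""
    return int(dedupe_buffer_bytes * load_factor) // BYTES_PER_ENTRY

# With the default 128MB buffer, a single cleaner pass can handle on the
# order of five million unique keys; many small messages with distinct keys
# can exceed this, which is the overflow scenario in KAFKA-2235.
print(max_unique_keys(128 * 1024 * 1024))
```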

>

>

>

> Replication Issues:

>

> https://issues.apache.org/jira/browse/KAFKA-2477 - Replicas spuriously
> deleting all segments in partition –
> *Will cause the data in the changelog topic to be lost, resulting in
> failure of the local key-value store bootstrap.*

>

>

> Though Samza can be plugged into different messaging systems, Kafka is the
> major system supported today for stateful processing. If that's the case,
> the following bugs will potentially make Samza itself misbehave (e.g., if
> the replication issue called out above happens on a log-compacted topic,
> Samza might not be able to restore its local key-value store). Since you
> are running Samza with stateful processing, the above issues might leave
> your Samza job's key-value store in an inconsistent state. Are you using
> Samza with stateful processing for critical applications that cannot
> tolerate data loss or inconsistencies? (With the above bugs, you might not
> be able to run a job for a critical application, as it might fail when hit
> by those issues.) I believe that upgrading to Kafka 0.9 is critical to
> ensure that Samza works properly. I understand this is not an issue in
> Samza itself, but one of the primary reasons customers/devs choose Samza
> is its ability to do stateful processing, and if that does not work or
> will fail due to the dependency on Kafka, it becomes necessary to upgrade
> Kafka ASAP. Please correct me if I am wrong here.

>

>

> Thanks,

>

> Nick

>

>

>

>

> ------------------------------

>

>

>

> Hi, Nick,

>

>

>

> Let me try to answer in-between the lines:

>

>

>

> On Thu, Mar 31, 2016 at 12:49 AM, nick xander <nickxander...@gmail.com>

>

> wrote:

>

>

>

> >

>

> > * Do you guys experience issues with Kafka when it is used with log
> > compaction for Samza's stateful management?

>

> >

>

>

>

> The critical issue with log compaction in Kafka that we care about is the
> case where message compression and log compaction are *both* used on the
> same topic. Currently, for changelog topics, we forcefully turn off
> compression, so it is not a problem for Samza's KV-stores. It is still a
> problem for checkpoint topics if the Kafka producer is configured to use
> message compression.
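The problematic combination Yi describes can be shown as plain config maps. This is a hedged illustration, not Samza code; the keys are real Kafka settings, the values are examples:

```python
# Topic-level: a changelog/checkpoint topic that is log-compacted.
topic_config = {"cleanup.policy": "compact"}

# Producer-level: compression is what triggers the pre-0.9 compaction bugs
# on such a topic, which is why Samza forces it off for changelog topics.
safe_producer_config = {"compression.type": "none"}
risky_producer_config = {"compression.type": "gzip"}

def combination_at_risk(topic_cfg, producer_cfg):
    """True when a compacted topic is written with compressed messages."""
    return (topic_cfg.get("cleanup.policy") == "compact"
            and producer_cfg.get("compression.type", "none") != "none")

assert not combination_at_risk(topic_config, safe_producer_config)
assert combination_at_risk(topic_config, risky_producer_config)
```

A checkpoint topic hits the risky case whenever the job's Kafka producer is configured with any non-"none" compression type.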

>

>

>

>

>

> > * What is the average number of keys per partition that you have
> > observed in Kafka's log-compacted topics for stateful management, and
> > the total number of partitions, replication factor, and number of Kafka
> > brokers?

>

> >

>

>

>

> This number varies *a lot*, depending on how big your KV-store is. For
> example, we have seen around 5-10GB RocksDB KV-stores being stored in
> changelogs at LinkedIn. That causes a long bootstrap time when the
> container is restarted on a different host. Hence, we included the
> host-affinity feature in Samza 0.10, which cut down the bootstrap time for
> that particular job by 20x.
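A rough illustration of why host affinity matters at that store size. The restore throughput below is an assumed number for the sketch, not a LinkedIn figure, and the config key in the comment is my reading of the Samza 0.10 setting:

```python
store_size_gb = 10        # e.g. the 5-10GB RocksDB stores mentioned above
restore_mb_per_sec = 20   # assumed changelog-consume + RocksDB write rate

# Without host affinity, a container landing on a new host replays the
# whole changelog to rebuild the store from scratch.
full_bootstrap_sec = store_size_gb * 1024 / restore_mb_per_sec
print(f"full changelog restore: ~{full_bootstrap_sec / 60:.0f} min")

# With host affinity (job.host-affinity.enabled=true in Samza 0.10, if the
# container is placed back on its previous host), the job reuses the local
# RocksDB files and only catches up the tail of the changelog.
```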

>

>

>

>

>

> > * Will the Kafka 0.9 upgrade be included as part of Samza 0.10.1, as it
> > seems critical if Samza is used for stateful management? And what
> > timeline are you expecting for Samza 0.10.1?

>

> >

>

>

>

> We are planning to release Samza 0.10.1 very soon and are working on
> pending code reviews and validations now. Depending on the test/validation
> cycles, we hope to have a Samza 0.10.1 release candidate ready in a month
> or so. The Kafka 0.9 upgrade will likely not be in Samza 0.10.1, due to
> the tight release timeline this time.

>

>

>

>

>

> > * What is the recommendation between using Samza vs. Kafka Connect?
> > Should we use Samza for stateful management and Kafka Connect for other
> > stateless streaming solutions?

>

> >

>

> >

>

> KafkaConnect is mainly an ingest/output connector to/from Kafka, without
> much stateful processing. Samza actually does both ingest/output and
> stateful processing. If there are input data sources for which Samza does
> not yet have a SystemConsumer implementation, you can definitely use
> KafkaConnect for ingestion and Samza for stateful processing.

>

>

>

> Hope the above answered your questions.

>

>

>

> Thanks!

>

>

>

> -Yi

>

>

>

> On Thu, Mar 31, 2016 at 9:49 AM, nick xander <nickxander...@gmail.com>

> wrote:

>

> > Hi All,
> >     As per this article:
> > http://www.confluent.io/blog/290-reasons-to-upgrade-to-apache-kafka-0.9.0.0
> > there are some well-known bugs and feature improvements around log
> > compaction (stateful management in Samza) and replication. I also saw
> > the Samza issue about this upgrade:
> > https://issues.apache.org/jira/browse/SAMZA-855. My questions here:
> >
> > * Do you guys experience issues with Kafka when it is used with log
> > compaction for Samza's stateful management?
> > * What is the average number of keys per partition that you have
> > observed in Kafka's log-compacted topics for stateful management, and
> > the total number of partitions, replication factor, and number of Kafka
> > brokers?
> > * Will the Kafka 0.9 upgrade be included as part of Samza 0.10.1, as it
> > seems critical if Samza is used for stateful management? And what
> > timeline are you expecting for Samza 0.10.1?
> > * What is the recommendation between using Samza vs. Kafka Connect?
> > Should we use Samza for stateful management and Kafka Connect for other
> > stateless streaming solutions?
> >
> > Thanks,
> > Nick

> >

>
