Hi all,

I've been helping debug an issue filed against kafka-python related to compatibility with the Hortonworks (HDP) 2.3.0.0 kafka release. As I understand it, HDP is currently based on snapshots of apache/kafka trunk, merged with some custom patches from HDP itself.
In this case, HDP's 2.3.0.0 kafka release missed a compatibility patch that I believe is critical for third-party library support. Unfortunately the patch -- KAFKA-1841 -- was initially only applied to the 0.8.2 branch (it was merged to trunk several months later in KAFKA-2068). Because it wasn't on trunk, it didn't get included in the HDP kafka releases.

If you recall, KAFKA-1841 was needed to maintain backwards and forwards compatibility wrt the change from zookeeper to kafka-backed offset storage. Not having this patch is fine if you only ever use the clients / libraries distributed in that release -- and I imagine that is probably most folks that are using it. But if you remember the thread on this issue back in the 0.8.2-beta review, the API incompatibility made third-party clients hard to develop and maintain if the goal is to support multiple broker versions w/ the same client code [this is the goal of kafka-python]. I'm really glad that the fix made it into the apache release, but sad that it didn't make it into HDP's release.

Anyways, I think there are a couple of takeaways here:

(1) I'd recommend that anyone using HDP who intends to use third-party kafka consumers upgrade to 2.3.4.0 or later. That version appears to include the compatibility patch (KAFKA-2068). Of course if anyone from HDP is on the list, they may be able to provide better help on this.

(2) I think more care should probably be taken to help vendors or anyone tracking changes on trunk wrt released versions. Is there a list of all KAFKA-XXXX patches that are released but not merged into trunk? KAFKA-1841 is obviously near and dear to my heart, but I wonder if there are other patches like it?

Happy holidays to all, and may the force be with you

-Dana