I am looking into Kafka Connect and Confluent's HDFS Sink Connector. The goal is to save data from various topics to HDFS. We have at least two different data formats in Kafka: raw data (JSON), which we want to save as SequenceFiles, and normalized data (Protobuf), which we want to save as Parquet.
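For context, the two sink configurations I have in mind look roughly like this (a sketch only: the com.example.* format classes stand in for my own custom Format/RecordWriter implementations and are not part of the Confluent distribution):

```
# Connector 1: raw JSON topics -> SequenceFile on HDFS
name=hdfs-sink-raw-json
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=raw.events
hdfs.url=hdfs://namenode:8020
flush.size=10000
format.class=com.example.connect.hdfs.SequenceFileFormat

# Connector 2: normalized Protobuf topics -> Parquet on HDFS
name=hdfs-sink-normalized
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=normalized.events
hdfs.url=hdfs://namenode:8020
flush.size=10000
format.class=com.example.connect.hdfs.ProtobufParquetFormat
```

As far as I can tell, format.class is configured per connector, so the HDFS output side is not the problem; the converters are.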
(I understand that Confluent expects Avro to be used, but I have succeeded in writing custom converters and RecordWriters that work fine without Avro and the Schema Registry.)

Question: Is there a specific reason that key.converter and value.converter are defined per Kafka Connect cluster and not per specific connector? As it stands, all the data in Kafka (in all the topics) has to be stored in the same format, or I will need two different Connect clusters: one with value.converter = MyCustomJsonConverter and another with MyCustomProtobufConverter. It gets even worse with Protobuf: every topic has a different Protobuf schema and therefore needs a different converter, and running a dozen Kafka Connect clusters sounds like a very bad option.

Wouldn't it make more sense to have key.converter and value.converter defined at the level of a specific connector, along the lines of the sketch below? Any other suggestions?
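To make the question concrete: today the converters live in the worker properties, so they apply to every connector that the cluster runs (again a sketch; the custom converter class name is hypothetical):

```
# connect-worker.properties (applies to the whole Connect cluster)
bootstrap.servers=kafka:9092
group.id=connect-cluster-json
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.example.connect.MyCustomJsonConverter
```

What I am after is being able to push those same two properties down into each connector's own configuration instead, along these lines (I do not know whether Connect would honour them there, which is essentially my question):

```
name=hdfs-sink-normalized
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=normalized.events
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.example.connect.MyCustomProtobufConverter
```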