On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe <cmcc...@apache.org> wrote:
> We used URIs as file paths in Hadoop. I think it was a mistake, for a
> few different reasons.
>
> URIs are actually very complex. You probably know about scheme, host,
> and port, but did you know about authority, user-info, query, fragment,
> scheme-specific-part? Do you know what they do in Hadoop? The mapping
> isn't obvious (and it wouldn't be obvious in Kafka either).

URIs are just a hash map of key=string, just like Properties. The Consumer
and Producer already take such a map, and these values are easy to
translate to boolean, integer, etc. We would just need to add that mapping
as part of this task. I don't see anything difficult there.

>
> When you flip back and forth between URIs and strings (and you
> inevitably will do this, when serializing or sending things over the
> wire), you run into tons of really hard problems. Should you preserve
> the "fragment" (the thing after the hash mark) for your URI, or not? It
> may not do anything now, but maybe it will do something later. URIs
> also have complex string escaping rules. Parsing URIs is very messy,
> especially when you start talking about non-Java programming languages.

Why flip back and forth? URIs would generate the same HashMap that's
being generated today. I don't see any mess here. Besides, this would be
an addition, not a replacement, and I'm talking only about the Java API
for now.

Again, all the properties on ProducerConfig and ConsumerConfig seem easy
to map to primitive types (strings, numbers, booleans). Serialization
shouldn't be a problem there: it would generate the same properties that
are generated now.

>
> URIs are designed for a world where you talk to a single host over a
> single port. That isn't the world distributed systems live in. You
> don't want your clients to fail to bootstrap because the single server
> you specified is having a bad day, even when the other 8 servers are up.

I have seen a few projects using this style of URI, and I would do the
same here.

If you have multiple hosts:

    KafkaConsumer consumer = new KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");

If you have a single host:

    KafkaConsumer consumer = new KafkaConsumer("kafka://host2:port?property1=value&property2=value2");

One example of an Apache project using a similar approach is qpid-jms:
http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options

> The bottom line is that URIs are the wrong abstraction for the job.
> They just don't express what we really want, and they introduce a lot of
> complexity and ambiguity.

I have seen the opposite, to be honest. This has been simpler for me and
for users I know than using a HashMap. Users in my experience tend to
write this faster. They can certainly put up with the HashMap, but this
is easier to remember. I'm just proposing what I think is a simpler API.

Perhaps we should move this into the KIP discussion itself. I first
intended to start this thread to see if it would make sense or not, but I
don't have authorization to create the KIP page. So again, based on the
contributing page, can someone grant me access to the wiki space?
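
To make the proposed mapping concrete, here is a rough sketch of how the two URI forms above could be translated into the same Properties the clients take today. Everything in it is an assumption for illustration: the class and method names (KafkaUriSketch, toProperties) are hypothetical and nothing like them exists in the Kafka client, the host list is assumed to map to the existing bootstrap.servers setting, and query parameters are assumed to map one-to-one onto config keys. It parses the string by hand instead of using java.net.URI, and it ignores paths, fragments and user-info. It needs Java 10 or later for the URLDecoder overload used.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the Kafka client API.
final class KafkaUriSketch {

    private static final String SCHEME = "kafka:";

    /** Translates a kafka: URI string into client Properties. */
    static Properties toProperties(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Expected a kafka: URI, got " + uri);
        }
        String rest = uri.substring(SCHEME.length());

        // Separate the host section from the optional "?key=value&..." query.
        String hostSection;
        int queryStart;
        if (rest.startsWith("(")) {
            // Multi-host form: kafka:(kafka://h1:p,kafka://h2:p)?k=v
            int close = rest.indexOf(')');
            if (close < 0) {
                throw new IllegalArgumentException("Missing ')' in " + uri);
            }
            hostSection = rest.substring(1, close);
            queryStart = rest.indexOf('?', close);
        } else {
            // Single-host form: kafka://h:p?k=v
            queryStart = rest.indexOf('?');
            hostSection = queryStart < 0 ? rest : rest.substring(0, queryStart);
        }

        // Reduce each entry to plain host:port.
        List<String> hosts = new ArrayList<>();
        for (String entry : hostSection.split(",")) {
            String host = entry.trim();
            if (host.startsWith(SCHEME)) {
                host = host.substring(SCHEME.length());
            }
            if (host.startsWith("//")) {
                host = host.substring(2);
            }
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("No hosts in " + uri);
        }

        Properties props = new Properties();
        // Assumption: the host list becomes the existing bootstrap.servers setting.
        props.setProperty("bootstrap.servers", String.join(",", hosts));

        // Assumption: each query parameter is a config key with a string value.
        if (queryStart >= 0) {
            for (String pair : rest.substring(queryStart + 1).split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                String[] kv = pair.split("=", 2);
                String key = URLDecoder.decode(kv[0], StandardCharsets.UTF_8);
                String value = kv.length > 1
                        ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8)
                        : "";
                props.setProperty(key, value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = toProperties(
                "kafka:(kafka://host1:9092,kafka://host2:9092)?enable.auto.commit=false");
        System.out.println(p.getProperty("bootstrap.servers"));
        System.out.println(p.getProperty("enable.auto.commit"));
    }
}
```

The result is an ordinary Properties object, so it could be handed to the existing constructors unchanged. Values stay strings here, and converting them to booleans or integers would be left to the existing config parsing.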