Re: [DISCUSS] URIs on Producer and Consumer

Clebert Suconic Thu, 05 Oct 2017 12:10:36 -0700

On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe <[email protected]> wrote:
> We used URIs as file paths in Hadoop.  I think it was a mistake, for a
> few different reasons.
>
> URIs are actually very complex.  You probably know about scheme, host,
> and port, but did you know about authority, user-info, query, fragment,
> scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
> isn't obvious (and it wouldn't be obvious in Kafka either).


A URI's query string is just a map of key=string pairs, just like Properties.

The Consumer and Producer already take such a HashMap, and its values
are easy to translate to booleans, integers, etc. We would just need to
add that mapping as part of this task. I don't see anything difficult
there.
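To make the "easy to translate" point concrete, here is a minimal sketch
that assumes the single-host kafka://host:port?key=value&... form
proposed in this thread. The class name UriConfig is hypothetical, and
mapping host:port onto bootstrap.servers is my assumption about how it
would be wired. The output is a plain java.util.Properties with string
values, which has the same shape you would pass to the existing
KafkaConsumer(Properties) constructor today.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: turns kafka://host:port?k1=v1&k2=v2 into Properties.
final class UriConfig {
    private UriConfig() {}

    static Properties toProperties(String uriString) {
        URI uri = URI.create(uriString);
        // Assumption: host and port are mandatory and become bootstrap.servers.
        if (uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("expected kafka://host:port, got " + uriString);
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", uri.getHost() + ":" + uri.getPort());

        // Use the raw (still-escaped) query so '&' and '=' inside values stay intact.
        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                props.setProperty(URLDecoder.decode(key, StandardCharsets.UTF_8),
                                  URLDecoder.decode(value, StandardCharsets.UTF_8));
            }
        }
        return props;
    }
}
```

The values stay strings here on purpose. As far as I know, the client
config layer already parses strings like "false" or "500" into the
declared types, so no extra typing step should be needed on the URI side.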


>
> When you flip back and forth between URIs and strings (and you
> inevitably will do this, when serializing or sending things over the
> wire), you run into tons of really hard problems.  Should you preserve
> the "fragment" (the thing after the hash mark) for your URI, or not?  It
> may not do anything now, but maybe it will do something later.  URIs
> also have complex string escaping rules.  Parsing URIs is very messy,
> especially when you start talking about non-Java programming languages.


Why flip back and forth? A URI would generate the same HashMap that's
being generated today, so I don't see any mess here. Besides, this
would be an addition, not a replacement.

And I'm talking only about the Java API for now.

Again, all the properties on ProducerConfig and ConsumerConfig seem
easy to map to primitive types (Strings, numbers, booleans).

Serialization shouldn't be a problem there: it would generate the same
properties that are generated now.
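On the escaping concern, here is a minimal sketch of the round trip
between a flat config map and a query string. It uses only
java.net.URLEncoder/URLDecoder, and the class name QueryRoundTrip is
hypothetical. These classes implement application/x-www-form-urlencoded
rules (a space becomes '+'), which differ slightly from strict RFC 3986
percent-encoding. What matters is that when both directions use the same
codec, values containing '&', '=', '+' or spaces come back unchanged.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: flat config map <-> query string, form-encoded.
final class QueryRoundTrip {
    private QueryRoundTrip() {}

    // Encodes entries in sorted key order so the output is deterministic.
    static String encode(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(config).entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes a query string produced by encode() back into a map.
    static Map<String, String> decode(String query) {
        Map<String, String> config = new TreeMap<>();
        if (query.isEmpty()) return config;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            config.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return config;
    }
}
```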

>
> URIs are designed for a world where you talk to a single host over a
> single port.  That isn't the world distributed systems live in.  You
> don't want your clients to fail to bootstrap because the single server
> you specified is having a bad day, even when the other 8 servers are up.

I have seen a few projects use this style of URI, and I would do the
same here:

If you have multiple hosts:

KafkaConsumer consumer = new
KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");

If you have a single host:
KafkaConsumer consumer = new
KafkaConsumer("kafka://host2:port?property1=value&property2=value2");


One example of an Apache project using a similar approach is qpid-jms:
http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options
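The multi-host form could be handled by a small pre-parsing step. Here
is a minimal sketch that assumes exactly the syntax shown above: a
kafka:(...) wrapper around comma-separated nested URIs, followed by an
optional query. FailoverUri and its methods are hypothetical. The sketch
turns the host list into the comma-separated string that the existing
bootstrap.servers setting takes, so the client still gets every server
for bootstrapping and not just one host. It does not handle commas
inside nested query strings.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for kafka:(kafka://h1:p1,kafka://h2:p2)?query
// and for the single-host form kafka://host:port?query.
final class FailoverUri {
    private static final String PREFIX = "kafka:(";

    private FailoverUri() {}

    // Returns "h1:p1,h2:p2", the format bootstrap.servers expects.
    static String bootstrapServers(String uri) {
        if (!uri.startsWith(PREFIX)) {
            return hostPort(URI.create(uri)); // single-host form
        }
        int close = uri.indexOf(')', PREFIX.length());
        if (close < 0) throw new IllegalArgumentException("missing ')' in " + uri);
        List<String> servers = new ArrayList<>();
        for (String nested : uri.substring(PREFIX.length(), close).split(",")) {
            servers.add(hostPort(URI.create(nested.trim())));
        }
        return String.join(",", servers);
    }

    // Returns the raw query that follows the host list, or "" if absent.
    static String rawQuery(String uri) {
        int start = uri.startsWith(PREFIX) ? uri.indexOf(')') + 1 : 0;
        int q = uri.indexOf('?', start);
        return q < 0 ? "" : uri.substring(q + 1);
    }

    private static String hostPort(URI u) {
        if (u.getHost() == null || u.getPort() < 0) {
            throw new IllegalArgumentException("expected scheme://host:port, got " + u);
        }
        return u.getHost() + ":" + u.getPort();
    }
}
```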


> The bottom line is that URIs are the wrong abstraction for the job.
> They just don't express what we really want, and they introduce a lot of
> complexity and ambiguity.

To be honest, I have seen the opposite. This has been simpler for me
and for the users I know than using a HashMap; in my experience, users
write it faster.

Users can certainly put up with the HashMap, but this is easier to
remember. I'm just proposing what I think is a simpler API.




Perhaps we should move this into the KIP discussion itself. I started
this thread to see whether the idea made sense at all, but I don't have
permission to create the KIP page. So again, per the contributing page,
can someone grant me access to the wiki space?
