First, let me apologize for not realizing/noticing this until today. One
reason I left my last company was that I was not being paid to work on
Kafka and could not afford any time to work on it for a while. Now, in my
new gig (just wrapped up my first week, woo hoo), while I am still not
"paid to work on Kafka," I can afford more time for it, and maybe in 6
months I will be able to hire folks to work on Kafka (with more and more
time for me to work on it too) while we also work on client projects
(especially Kafka-based ones).

So, I understand the changes that were made to fix the open file handle
problem and make the random pinning time-based (with a very large default
interval). Got all that.
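If I am reading the change right, the pinning window is tied to the
producer's topic metadata refresh, so (and this mapping is my assumption,
not something I verified in the patch) the knob that matters would be
something like:

    # 0.8 producer config -- assuming this interval is what drives the
    # pinning window; the default works out to ten minutes
    topic.metadata.refresh.interval.ms=600000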

But doesn't this completely negate what has been communicated to the
community for a very long time, and the expectations they have built on
it? I think it does.

The expected functionality for random partitioning is that "This can be
done in a round-robin fashion simply to balance load" and that the
"producer" does it for you.

Isn't a primary use case for partitions to parallelize consumers? If so,
the expectation would be that all consumers get the data produced for the
topic in parallel and in equal shares, "in a round-robin fashion"...
simply to balance load... with the producer handling it and the client
application not having to do anything. Randomness that only re-rolls
every 10 minutes can't balance load.
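Just to make concrete what I mean, here is an illustrative sketch (mine,
not the actual producer code) of a partitioner that would actually
balance null-keyed sends round-robin:

    import java.util.concurrent.atomic.AtomicInteger

    // Sketch only: cycle through partitions on null keys so parallel
    // consumers each get an equal share of the produced data.
    class RoundRobinSketch(numPartitions: Int) {
      private val counter = new AtomicInteger(0)
      def partition(key: AnyRef): Int =
        if (key == null)
          (counter.getAndIncrement() & Int.MaxValue) % numPartitions
        else
          (key.hashCode & Int.MaxValue) % numPartitions // keyed sends hash as usual
    }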

If users are going to work around this anyway (as I honestly would too)
by generating a pseudo-random key, essentially forcing real randomness
just to balance load across consumers running in parallel, would we still
end up hitting the KAFKA-1017 problem? If not, then why can't we just
give users the functionality back and restore the three lines of code:
1) if(key == null) 2) random.nextInt(numPartitions) 3) else ... (sketched
below)? And if we would bump into KAFKA-1017 by working around it, then
we have not really solved the root-cause problem, and we have removed
expected functionality for a corner case that might have other
workarounds and/or code changes to solve it another way. Or am I still
not getting something?
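For reference, the removed behavior boils down to something like this (a
sketch from memory, exact names approximate, not the literal old
DefaultPartitioner source):

    import java.util.Random

    // Sketch of the removed behavior: a fresh random partition on
    // every null-keyed send, which is exactly the load balancing
    // users were told to expect.
    class OldStyleRandomSketch {
      private val random = new Random
      def partition(key: AnyRef, numPartitions: Int): Int =
        if (key == null)
          random.nextInt(numPartitions)
        else
          (key.hashCode & Int.MaxValue) % numPartitions
    }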

Also, I was looking at testRandomPartitioner in AsyncProducerTest and I
don't see how it could ever fail: the assertion is always that
partitionId == 0, when it should be checking that data is going to
different partitions for the topic, right?
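Something along these lines is the kind of assertion I would expect
(sketch only, not the actual test code):

    import java.util.Random

    // Sketch: assert on the distribution, not partitionId == 0 every time.
    val random = new Random
    val numPartitions = 4
    val hits = (1 to 1000).map(_ => random.nextInt(numPartitions)).toSet
    assert(hits.size > 1, "null-keyed sends should reach multiple partitions")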

Let me know. I think this is an important discussion. If it ends up as
telling the community that one partition is all you need and partitions
become our super columns (Apache Cassandra joke, it's funny), then we
manage and support that and that is just how it is. But if partitions are
a good thing, and having multiple consumers scale in parallel on a single
topic is also good, then we have to manage and support that.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
