[ https://issues.apache.org/jira/browse/KAFKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220604#comment-15220604 ]
Jiangjie Qin edited comment on KAFKA-3334 at 3/31/16 8:21 PM:
--------------------------------------------------------------

[~singhashish] I think we are on the same page: we want users to have a clear idea of where to look if something goes wrong. In terms of documentation, it is probably extremely difficult to document every possible scenario a user might see, because we have so many different configuration combinations and each combination might result in different behavior. Scenario-based documentation might never be enough :)

I was thinking about the following:
1. The default configurations should just work out of the box if the user does not change any of them.
2. For each configuration, we need to document clearly what it is for and what its possible impact is.
3. For each exception thrown to the user, the error message itself should contain a clear yet brief description of what went wrong. In the documentation of the exception, we can list the possible places where it is thrown (this should match the info in the error message), the possible causes, and suggested solutions.

Taking this particular case as an example, in the TimeoutException thrown from producer.send() the user will see
{noformat}
"The producer failed to fetch the metadata for the topic XXX after XXX ms. Please see the exception documentation for possible cause."
{noformat}

And the documentation of TimeoutException should have something like
{noformat}
"This exception can be thrown in the following cases:
1. The producer cannot fetch the metadata of a topic. This only happens when the producer is sending message to the topic for the first time. It is more likely to happen if the topic did not exist on the brokers. The new topic creation on the broker might take some time. User can retry send the message in this case.
2. 
blah blah blah"
{noformat}

I feel this is more intuitive for users trying to work out what went wrong because, at the end of the day, the first thing a user sees is the exception. If the exception itself does not provide a clear pointer, users do not know where to start. For example, if a user sees a TimeoutException, what are they supposed to search for or read? So my point is that we should make the exception itself crystal clear, through both its error message and its documentation.

I agree that it might also be useful to document in detail how KafkaProducer sends a message. But for users who really care about the internal details, reading the code is probably the best way.

> First message on new topic not actually being sent, no exception thrown
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3334
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3334
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>         Environment: Linux, Java
>            Reporter: Aleksandar Stojadinovic
>            Assignee: Ashish K Singh
>             Fix For: 0.10.1.0
>
>
> Although I've seen this issue pop around the internet in a few forms, I'm not
> sure it is yet properly fixed.
> When publishing to a new topic, with auto create-enabled, the java client
> (0.9.0) shows this WARN message in the log, and the message is not sent
> obviously:
> org.apache.kafka.clients.NetworkClient - Error while fetching metadata with
> correlation id 0 : {file.topic=LEADER_NOT_AVAILABLE}
> In the meantime I see in the console the message that a log for partition is
> created. The next messages are patched through normally, but the first one is
> never sent. No exception is ever thrown, either by calling get on the future,
> or with the async usage, like everything is perfect.
> I notice when I leave my application blocked on the get call, in the
> debugger, then the message may be processed, but with significant delay. This
> is consistent with another issue I found for the python client. Also, if I
> call partitionsFor previously, the topic is created and the message is sent.
> But it seems silly to call it every time, just to mitigate this issue.
> {code}
> Future<RecordMetadata> recordMetadataFuture = producer.send(new ProducerRecord<>(topic, key, file));
> RecordMetadata recordMetadata = recordMetadataFuture.get(30, TimeUnit.SECONDS);
> {code}
> I hope I'm clear enough.
> Related similar (but not same) issues:
> https://issues.apache.org/jira/browse/KAFKA-1124
> https://github.com/dpkp/kafka-python/issues/150
> http://stackoverflow.com/questions/35187933/how-to-resolve-leader-not-available-kafka-error-when-trying-to-consume



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)