[ 
https://issues.apache.org/jira/browse/KAFKA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1771:
-----------------------------------------
    Attachment: kafka-1771.wip.patch

[~becket_qin], I'm attaching the WIP patch I put together while investigating 
the problem. It changes start_simple_consumer to iterate over partitions and 
replicas, which is what I think was originally intended.
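
To make the intended iteration concrete, here is a minimal sketch. It is not
the attached patch: plan_simple_consumers and its plain-list arguments are
hypothetical stand-ins for the test framework's environment objects. It simply
enumerates one consumer per replica per partition:

```python
from itertools import product


def plan_simple_consumers(replica_ids, num_partitions):
    """Return one (replica_id, partition) pair per consumer to launch.

    Hypothetical helper for illustration only; the real
    start_simple_consumer works on test environment objects.
    """
    # Cartesian product: every replica is paired with every partition.
    return list(product(replica_ids, range(num_partitions)))


# With 3 replicas and 5 partitions (num_partitions > replica_factor),
# the plan still covers every pair exactly once.
plan = plan_simple_consumers([1, 2, 3], 5)
```

Keying the collected message ids by these (replica, partition) pairs, rather
than by position in a list sized for one dimension, would avoid depending on
how the two counts compare.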

This gets rid of the uncaught exception (at least for testcase_0131), but the 
test still isn't passing:

{quote}
Validate for data matched on topic [test_1] across replicas  :  FAILED
{quote}

I haven't had time to look into this any further than that.

> replicate_testsuite data verification broken if num_partitions > 
> replica_factor
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1771
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1771
>             Project: Kafka
>          Issue Type: Bug
>          Components: system tests
>    Affects Versions: 0.8.1.1
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>         Attachments: kafka-1771.wip.patch
>
>
> As discussed in KAFKA-1763, testcase_0131, testcase_0132, and 
> testcase_0133 currently fail with an exception:
> {quote}
> Traceback (most recent call last):
>   File "/mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py", line 434, in runTest
>     kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv)
>   File "/mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py", line 2223, in validate_simple_consumer_data_matched_across_replicas
>     replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList
> IndexError: list index out of range
> {quote}
> The root cause seems to be kafka_system_test_utils.start_simple_consumer, 
> whose current logic looks incorrect. It should start one consumer per 
> partition per replica so it can verify the data from all sources, but it 
> currently loops over the list of brokers without ever using the loop 
> variable.
> A probably bigger issue is that it starts multiple processes in the 
> background but records their pids to the single well-known entity pid path, 
> which means only the last pid is saved and we could easily leave zombie 
> processes if one of them hangs for some reason.
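
To illustrate the pid issue described above, here is a minimal sketch,
assuming a hypothetical scheme with one pid file per (replica, partition)
consumer. The file naming and the helper names are made up for illustration
and are not part of the test framework. Because each consumer writes to its
own path, no pid overwrites another:

```python
import os
import tempfile


def pid_path(pid_dir, replica_id, partition):
    # Hypothetical naming scheme: one file per consumer instead of a
    # single shared entity pid path.
    name = "simple_consumer_r%d_p%d.pid" % (replica_id, partition)
    return os.path.join(pid_dir, name)


def record_pid(pid_dir, replica_id, partition, pid):
    # Write this consumer's pid to its own file.
    with open(pid_path(pid_dir, replica_id, partition), "w") as f:
        f.write(str(pid))


def recorded_pids(pid_dir):
    # Collect every recorded pid so cleanup can stop all consumers.
    pids = {}
    for name in sorted(os.listdir(pid_dir)):
        if name.endswith(".pid"):
            with open(os.path.join(pid_dir, name)) as f:
                pids[name] = int(f.read().strip())
    return pids


# Two consumers on the same partition, different replicas; the pid
# values here are placeholders, not real processes.
demo_dir = tempfile.mkdtemp()
record_pid(demo_dir, 1, 0, 4001)
record_pid(demo_dir, 2, 0, 4002)
pids = recorded_pids(demo_dir)
```

A cleanup step could then walk recorded_pids and stop each process, including
any that hang.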



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
