Ewen Cheslack-Postava created KAFKA-1771:
--------------------------------------------
Summary: replicate_testsuite data verification broken if num_partitions > replica_factor
Key: KAFKA-1771
URL: https://issues.apache.org/jira/browse/KAFKA-1771
Project: Kafka
Issue Type: Bug
Components: system tests
Affects Versions: 0.8.1.1
Reporter: Ewen Cheslack-Postava
As discussed in KAFKA-1763, testcase_0131, testcase_0132, and testcase_0133
currently fail with an exception:
{quote}
Traceback (most recent call last):
  File "/mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py", line 434, in runTest
    kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv)
  File "/mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py", line 2223, in validate_simple_consumer_data_matched_across_replicas
    replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList
IndexError: list index out of range
{quote}
The root cause appears to be in kafka_system_test_utils.start_simple_consumer, whose
logic looks incorrect. It should start one consumer per partition per replica so
that it can verify the data from every source, but it currently loops over the
list of brokers without ever using the loop variable.
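To illustrate the intended iteration (one consumer per partition per replica), here is a minimal sketch, not the test suite's actual code. It assumes a hypothetical partition assignment map from partition id to the ordered list of broker ids holding its replicas; the function names `plan_simple_consumers` and `empty_results` are made up for illustration. The point is that the per-replica result list is sized by the replication factor, and the loop runs over partitions and then over each partition's replicas, so a 1-based replica index never exceeds that list's length regardless of the partition count.

```python
from typing import Dict, List, Tuple

# Hypothetical input: partition id -> ordered list of broker ids that hold a
# replica of that partition.
Assignment = Dict[int, List[int]]


def plan_simple_consumers(topic: str, assignment: Assignment) -> List[Tuple[str, int, int, int]]:
    """Return one (topic, partition, replica_idx, broker_id) task per partition
    per replica. replica_idx is 1-based, mirroring the `replicaIdx - 1`
    indexing seen in the traceback."""
    tasks = []
    for partition in sorted(assignment):
        for replica_idx, broker_id in enumerate(assignment[partition], start=1):
            tasks.append((topic, partition, replica_idx, broker_id))
    return tasks


def empty_results(replication_factor: int) -> List[Dict[str, list]]:
    """One dict per replica index, each mapping "topic-partition" to the
    message ids read by that replica's consumer."""
    return [{} for _ in range(replication_factor)]
```

With three partitions and a replication factor of 2, this plans six consumers, and every task's `replica_idx - 1` is either 0 or 1, so indexing into a two-element result list stays in range.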
A probably bigger issue is that it spawns multiple background processes but
records their pids to the single well-known entity pid path, so only the last
pid is saved. If any of the earlier processes hangs for some reason, nothing
tracks it, and we could easily leave zombie processes behind.
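One way to avoid losing pids is to track every background process individually. The following is a minimal sketch under the assumption that the consumers are launched as local child processes via `subprocess.Popen` (the real test suite may launch them differently, e.g. remotely); `start_background`, `stop_all`, and the `simple_consumer_<n>.pid` file naming are hypothetical. Each process gets its own pid file instead of overwriting a shared one, and shutdown escalates from terminate to kill if a process hangs, then waits on it so it is reaped.

```python
import os
import subprocess
from typing import List


def start_background(cmds: List[List[str]], pid_dir: str) -> List[subprocess.Popen]:
    """Start each command in the background and write its pid to its own
    file, rather than overwriting a single shared pid file."""
    procs = []
    for i, cmd in enumerate(cmds):
        proc = subprocess.Popen(cmd)
        # Hypothetical naming scheme: one pid file per consumer.
        with open(os.path.join(pid_dir, f"simple_consumer_{i}.pid"), "w") as f:
            f.write(str(proc.pid))
        procs.append(proc)
    return procs


def stop_all(procs: List[subprocess.Popen], timeout: float = 5.0) -> None:
    """Terminate every tracked process, kill any that ignore the request
    within `timeout` seconds, and wait on each one so it is reaped."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

Keeping the `Popen` handles (and not only the pid files) lets the launcher both signal and reap each child, which is what prevents stragglers from accumulating.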