Scott Hendricks created KAFKA-13004:
---------------------------------------

             Summary: Trogdor performance decreases sharply with large amounts of tasks.
                 Key: KAFKA-13004
                 URL: https://issues.apache.org/jira/browse/KAFKA-13004
             Project: Kafka
          Issue Type: Bug
          Components: tools
         Environment: We run our Trogdor clusters within Kubernetes.
            Reporter: Scott Hendricks
            Assignee: Scott Hendricks


As part of my performance tests, I am running 3000 workloads within Trogdor. The clients seem to be able to handle this fine, but when I go to reset and run the same test again, Trogdor seems sluggish.

Here are the steps to reproduce this:
# Run 3000 workloads in Trogdor, a combination of Produce/Consume workloads.
# Wait for the workloads to complete.
# Run the DELETE API calls to destroy all 3000 workloads to reset for the next run.
# Confirm via the API that there are no workloads defined in the system.
# Run an additional 3000 workloads in Trogdor similar to step 1.

The Coordinator takes a long time to start the second batch of 3000.

There seems to be some performance issue in the framework that will take a while to debug. At this point I don't know if it only affects the Coordinator, or if the Agents are affected as well. I do not currently have the time to look into this, so I am creating this issue to track it.

The workaround I am employing is destroying and recreating the Trogdor cluster in between test runs.
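
For anyone who wants to time the two batches, the reproduction steps above could be scripted roughly as follows. This is only a sketch under assumptions, not the script used for this report: the Coordinator address and port, the task-id naming scheme, and every helper name are hypothetical. The REST paths (POST /coordinator/task/create, GET /coordinator/tasks, DELETE /coordinator/tasks?taskId=...) and the response layout (a "tasks" map whose entries carry a "state" field) are my understanding of the Coordinator API and should be checked against the Trogdor version in use. The workload spec is deliberately left to the caller.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: address of the Trogdor Coordinator REST endpoint.
COORDINATOR = "http://localhost:8889"


def task_ids(prefix, count):
    """Hypothetical naming scheme: <prefix>-0000, <prefix>-0001, ..."""
    return [f"{prefix}-{i:04d}" for i in range(count)]


def create_request(base, task_id, spec):
    """Build (but do not send) a task-creation request."""
    body = json.dumps({"id": task_id, "spec": spec}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/coordinator/task/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_request(base, task_id):
    """Build (but do not send) a task-deletion request."""
    query = urllib.parse.urlencode({"taskId": task_id})
    return urllib.request.Request(
        f"{base}/coordinator/tasks?{query}", method="DELETE"
    )


def list_tasks(base):
    """Return the task map reported by the Coordinator (assumed layout)."""
    with urllib.request.urlopen(f"{base}/coordinator/tasks") as resp:
        return json.load(resp).get("tasks", {})


def timed_batch(requests):
    """Send requests one by one and return the elapsed seconds."""
    start = time.monotonic()
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            resp.read()
    return time.monotonic() - start


def reproduce(base, spec, count=3000, poll_seconds=5):
    """Steps 1-5 from the report; returns (first, second) submit times."""
    first = task_ids("run1", count)
    t1 = timed_batch(create_request(base, t, spec) for t in first)   # step 1
    # Step 2: wait until every task reports DONE (assumed state name).
    while any(t.get("state") != "DONE" for t in list_tasks(base).values()):
        time.sleep(poll_seconds)
    timed_batch(delete_request(base, t) for t in first)              # step 3
    if list_tasks(base):                                             # step 4
        raise RuntimeError("tasks still defined after DELETE calls")
    second = task_ids("run2", count)
    t2 = timed_batch(create_request(base, t, spec) for t in second)  # step 5
    return t1, t2
```

Calling reproduce(COORDINATOR, spec) with a real Produce or Consume spec returns the submission time for each batch. Note that it measures only how long the create calls take, which may not be the same thing as how long the Coordinator takes to actually start the tasks.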