GitHub user apurvam opened a pull request:

    https://github.com/apache/kafka/pull/1903

    KAFKA-4213: First set of system tests for replication throttling 

    Added the first set of system tests for replication quotas. These tests 
validate throttling behavior during partition reassignment.
    
    Along with this patch are fixes to the test framework which include:
    
    1. KafkaService.verify_replica_reassignment: this method was a no-op and 
would always return success, as explained in KAFKA-4204. This patch works 
around the problems described there by grepping correctly for the success, 
failure, and 'in progress' states of partition reassignment.
    2. ProduceConsumeValidateTest.annotate_missing_messages: this method called 
missing.pop() to enumerate the first 20 missing messages, which removed them 
from the collection. All later counts of what is actually missing were 
therefore off by 20, leading to the impression of data loss.
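
A minimal sketch of both fixes, not the patch itself. The status phrases in the regular expressions and the helper names (reassignment_complete, annotate_missing, annotate_missing_buggy) are assumptions made for illustration; the real tool output and the actual KafkaService and ProduceConsumeValidateTest code may differ.

```python
import re

# Assumed status phrases; the reassignment tool's real wording may differ by version.
SUCCESS = re.compile(r"Reassignment of partition \S+ completed successfully")
FAILED = re.compile(r"Reassignment of partition \S+ failed")
IN_PROGRESS = re.compile(r"Reassignment of partition \S+ is still in progress")


def reassignment_complete(output):
    """Return True when done, False while in progress; raise on failure."""
    if FAILED.search(output):
        raise RuntimeError("partition reassignment failed")
    if IN_PROGRESS.search(output):
        return False
    # Only report success when a success line is actually present.
    return SUCCESS.search(output) is not None


def annotate_missing_buggy(missing, limit=20):
    """Reproduces the bug: pop() removes the sampled items from `missing`."""
    sample = [missing.pop() for _ in range(min(limit, len(missing)))]
    return "%d missing, e.g. %s" % (len(missing), sample)


def annotate_missing(missing, limit=20):
    """Describe up to `limit` missing messages without mutating `missing`."""
    sample = sorted(missing)[:limit]
    return "%d missing, first %d: %s" % (len(missing), len(sample), sample)
```

Slicing a sorted copy leaves the original collection intact, so later counts still see every missing message.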
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apurvam/kafka throttling-tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1903
    
----
commit 372db72c2da55dd2aba70019b258429855832804
Author: Apurva Mehta <apurva.1...@gmail.com>
Date:   2016-09-09T18:53:42Z

    Merge remote-tracking branch 'apache/trunk' into trunk

commit ae912d444d3fb63c2e5487f88949408e0b1207e9
Author: Jason Gustafson <ja...@confluent.io>
Date:   2016-09-09T20:44:55Z

    KAFKA-3807; Fix transient test failure caused by race on future completion
    
    Author: Jason Gustafson <ja...@confluent.io>
    
    Reviewers: Dan Norwood <norw...@confluent.io>, Ismael Juma 
<ism...@juma.me.uk>
    
    Closes #1821 from hachikuji/KAFKA-3807

commit d0a86ffdec330f6e7213a370287a2d81bb93e2bc
Author: Vahid Hashemian <vahidhashem...@us.ibm.com>
Date:   2016-09-10T07:16:23Z

    KAFKA-4145; Avoid redundant integration testing in ProducerSendTests
    
    Author: Vahid Hashemian <vahidhashem...@us.ibm.com>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1842 from vahidhashemian/KAFKA-4145

commit 42b5583561895e308063ed9e2186d83c83ca35d8
Author: Jason Gustafson <ja...@confluent.io>
Date:   2016-09-11T07:46:20Z

    KAFKA-4147; Fix transient failure in 
ConsumerCoordinatorTest.testAutoCommitDynamicAssignment
    
    Author: Jason Gustafson <ja...@confluent.io>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1841 from hachikuji/KAFKA-4147

commit e7697ad0ab0f292ad1e29d9a159d113574bfcf67
Author: Eric Wasserman <eric.wasser...@gmail.com>
Date:   2016-09-12T01:45:05Z

    KAFKA-1981; Make log compaction point configurable
    
    Now uses LogSegment.largestTimestamp to determine age of segment's messages.
    
    Author: Eric Wasserman <eric.wasser...@gmail.com>
    
    Reviewers: Jun Rao <jun...@gmail.com>
    
    Closes #1794 from ewasserman/feat-1981

commit b36034eaa4eb284fafddb1a7507a2cf187993e62
Author: Damian Guy <damian....@gmail.com>
Date:   2016-09-12T04:00:32Z

    MINOR: catch InvalidStateStoreException in QueryableStateIntegrationTest
    
    A couple of the tests may transiently fail in QueryableStateIntegrationTest 
as they are not catching InvalidStateStoreException. This exception is expected 
during rebalance.
    
    Author: Damian Guy <damian....@gmail.com>
    
    Reviewers: Eno Thereska, Guozhang Wang
    
    Closes #1840 from dguy/minor-fix

commit 642b709f919a02379f9d0c9313586b02d179ca78
Author: Tim Brooks <t...@uncontended.net>
Date:   2016-09-13T03:28:01Z

    KAFKA-2311; Make KafkaConsumer's ensureNotClosed method thread-safe
    
    Here is the patch on github ijuma.
    
    Acquiring the consumer lock (the single thread access controls) requires 
that the consumer be open. I changed the closed variable to be volatile so that 
another thread's writes will be visible to the reading thread.
    
    Additionally, there was a further check of whether the consumer was closed 
after the lock was acquired. This check is no longer necessary.
    
    This is my original work and I license it to the project under the 
project's open source license.
    
    Author: Tim Brooks <t...@uncontended.net>
    
    Reviewers: Jason Gustafson <ja...@confluent.io>
    
    Closes #1637 from tbrooks8/KAFKA-2311

commit ca539df5887bdfdbe86ba45f5514ed54b3b648d4
Author: Dong Lin <lindon...@gmail.com>
Date:   2016-09-14T00:33:54Z

    KAFKA-4158; Reset quota to default value if quota override is deleted
    
    Author: Dong Lin <lindon...@gmail.com>
    
    Reviewers: Joel Koshy <jjkosh...@gmail.com>, Jiangjie Qin 
<becket....@gmail.com>
    
    Closes #1851 from lindong28/KAFKA-4158

commit ba712d29eb2880fbf1709b5d0921028735a09f68
Author: Ismael Juma <ism...@juma.me.uk>
Date:   2016-09-14T16:16:29Z

    MINOR: Give a name to the coordinator heartbeat thread
    
    Followed the same naming pattern as the producer sender thread.
    
    Author: Ismael Juma <ism...@juma.me.uk>
    
    Reviewers: Jason Gustafson
    
    Closes #1854 from ijuma/heartbeat-thread-name

commit 7c0f9b70e4e1ff643d953af50ca70e2d448ef431
Author: David Chen <mvj...@gmail.com>
Date:   2016-09-14T17:38:40Z

    KAFKA-4162: Fixed typo "rebalance"
    
    Author: David Chen <mvj...@gmail.com>
    
    Reviewers: Ewen Cheslack-Postava <e...@confluent.io>
    
    Closes #1853 from mvj3/KAFKA-4162

commit 5ec0ffb32ef556ff22b24ad239f9aa546ac9783a
Author: Jason Gustafson <ja...@confluent.io>
Date:   2016-09-15T01:04:58Z

    KAFKA-4172; Ensure fetch responses contain the requested partitions
    
    Author: Jason Gustafson <ja...@confluent.io>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1857 from hachikuji/KAFKA-4172

commit 8792ef05dc6253dfb1b673832cc0030e8bd3f075
Author: Jason Gustafson <ja...@confluent.io>
Date:   2016-09-15T05:31:52Z

    KAFKA-4160: Ensure rebalance listener not called with coordinator lock
    
    Author: Jason Gustafson <ja...@confluent.io>
    
    Reviewers: Guozhang Wang <wangg...@gmail.com>
    
    Closes #1855 from hachikuji/KAFKA-4160

commit 57a69d9cab6884e0ed67f2a91320fc918034bfd2
Author: Damian Guy <damian....@gmail.com>
Date:   2016-09-15T15:57:48Z

    HOTFIX: fix KafkaStreams SmokeTest
    
    Set the NUM_STREAM_THREADS_CONFIG = 1 in SmokeTestClient as we get locking 
issues when we have NUM_STREAM_THREADS_CONFIG > 1 and we have Standby Tasks, 
i.e., replicas. This is because the Standby Tasks can be assigned to the same 
KafkaStreams instance as the active task, hence the directory is locked
    
    Author: Damian Guy <damian....@gmail.com>
    
    Reviewers: Eno Thereska, Guozhang Wang
    
    Closes #1861 from dguy/fix-smoketest

commit 5f555091bdd04cb49acf3da40ca1de84c731f6cb
Author: Bill Bejeck <bbej...@gmail.com>
Date:   2016-09-16T00:08:00Z

    KAFKA-4131; Multiple Regex KStream-Consumers cause Null pointer exception
    
    Fix for bug outlined in KAFKA-4131
    
    Author: bbejeck <bbej...@gmail.com>
    
    Reviewers: Damian Guy, Guozhang Wang
    
    Closes #1843 from bbejeck/KAFKA-4131_mulitple_regex_consumers_cause_npe

commit bed93e182a52909a108849a3960123b592937653
Author: Ben Stopford <benstopf...@gmail.com>
Date:   2016-09-16T05:25:56Z

    KAFKA-1464; Add a throttling option to the Kafka replication
    
    This applies to Replication Quotas
    based on KIP-73 
[(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas)
 originally motivated by KAFKA-1464.
    
    System Tests Run: 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/544/
    
    **This first PR demonstrates the approach**.
    
    **_Overview of Change_**
    The guts of this change are relatively small. Throttling occurs on both 
leader and follower sides. A single class tracks the throttled throughput in 
and out of each broker (**_ReplicationQuotaManager_**).
    
    On the follower side, the Follower Throttled Rate is calculated as fetch 
responses arrive. Then, before the next fetch request is sent, we check to see 
if the quota is violated, removing throttled partitions from the request if it 
is. This is all encapsulated in a few lines of code in the 
**_ReplicaFetcherThread_**. There is existing code to handle temporal back off, 
if the request ends up being empty.
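
The follower-side check described above can be sketched roughly as follows. This is a simplified illustration, not the ReplicaFetcherThread code: SimpleQuota, its one-second fixed window, and build_fetch_request are hypothetical names and assumptions.

```python
import time


class SimpleQuota:
    """Hypothetical byte-rate quota over a fixed one-second window."""

    def __init__(self, bytes_per_sec, clock=time.monotonic):
        self.bytes_per_sec = bytes_per_sec
        self.clock = clock
        self.window_start = clock()
        self.bytes_in_window = 0

    def _roll(self):
        # Start a fresh window once the current one has elapsed.
        now = self.clock()
        if now - self.window_start >= 1.0:
            self.window_start = now
            self.bytes_in_window = 0

    def record(self, num_bytes):
        """Called as fetch responses arrive."""
        self._roll()
        self.bytes_in_window += num_bytes

    def is_exceeded(self):
        self._roll()
        return self.bytes_in_window > self.bytes_per_sec


def build_fetch_request(partitions, throttled, quota):
    """Drop throttled partitions from the next fetch while over quota."""
    if quota.is_exceeded():
        return [p for p in partitions if p not in throttled]
    return list(partitions)
```

If every partition is throttled, the resulting request is empty, which is the case the existing back-off code is said to handle.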
    
    On the leader side it's a little more complex. When a fetch request arrives 
in the leader, the response is built, partition by partition, in 
**_ReplicaManager.readFromLocalLog_**. As we put each partition into the fetch 
response, we check if the total size fits in the current quota. If the quota is 
exceeded, the partition will not be added to the fetch response. Importantly, 
we don't increase the quota at this point; we just check to see if the bytes 
will fit.
    
    Now, if there aren't enough bytes to send the response immediately, which 
is common if we're catching up and throttled, then the request will be put in 
purgatory. I've added some simple code to **_DelayedFetch_** to handle 
throttled partitions (throttled partitions are checked against the quota, 
rather than the messages available in the log).
    
    When the delayed fetch completes, and exits purgatory, 
_**ReplicaManager.readFromLocalLog**_ will be called again. This is why 
_**ReplicaManager.readFromLocalLog**_ does not actually increase the quota; it 
just checks whether enough bytes are available for a partition.
    
    Finally, when there are enough bytes to be sent, or the delayed fetch times 
out, the response will be sent. Before it is sent the throttled-outbound-rate 
is increased, based on the size of throttled partitions being sent. This is at 
the end of _**KafkaApis.handleFetchRequest**_, exactly where client quotas are 
recorded.
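
A rough sketch of the leader-side split between checking and recording. OutboundQuota, read_from_local_log, and send_response are hypothetical Python stand-ins for the behaviour attributed above to ReplicaManager.readFromLocalLog and KafkaApis.handleFetchRequest; the quota's time window is omitted for brevity.

```python
class OutboundQuota:
    """Hypothetical quota: a byte budget with separate check and record steps."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.recorded_bytes = 0

    def fits(self, num_bytes):
        # Pure check: does not change any state.
        return self.recorded_bytes + num_bytes <= self.limit_bytes

    def record(self, num_bytes):
        self.recorded_bytes += num_bytes


def read_from_local_log(available, throttled, quota):
    """Build a response partition by partition; check the quota, never record."""
    response = {}
    throttled_total = 0
    for partition, size in available.items():
        if partition in throttled:
            if not quota.fits(throttled_total + size):
                continue  # leave this throttled partition out of the response
            throttled_total += size
        response[partition] = size
    return response


def send_response(response, throttled, quota):
    """Record throttled bytes only when the response is actually sent."""
    sent = sum(size for p, size in response.items() if p in throttled)
    quota.record(sent)
    return sent
```

Because building the response has no side effects on the quota, it can safely run twice: once when the request arrives and again when a delayed fetch leaves purgatory.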
    
    There is an acceptance test which asserts the whole throttling process 
stabilises on the desired value. This covers a number of use cases including 
many-to-many replication. See **_ReplicationQuotaTest_**.
    
    Note:
    It should be noted that this protocol can over-request. The request is 
built, based on the quota at time t1 (_ReplicaManager.readFromLocalLog_). The 
bytes in the response are recorded at time t2 (end of 
_KafkaApis.handleFetchRequest_), where t2 > t1. For this reason I originally 
included an OverRequestedRate as a JMX metric, but testing has not 
revealed any obvious issue. Over-requesting is quickly compensated by 
subsequent requests, stabilising close to the quota value.
    
    _**Main stuff left to do:**_
    - The fetch size is currently unbounded. This will be addressed in KIP-74, 
but we need to make sure that change keeps requests within the throttle 
window.
    - There are two failures showing up in the system tests on this branch:  
StreamsSmokeTest.test_streams (which looks like it fails regularly) and 
OffsetValidationTest.test_broker_rolling_bounce (which I need to look into)
    
    _**Stuff left to do that could be deferred:**_
    - Add the extra metrics specified in the KIP.
    - There are no system tests.
    - There is no validation for the cluster size / throttle combination that 
could lead to ISR dropouts
    
    Author: Ben Stopford <benstopf...@gmail.com>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>, Apurva Mehta 
<apu...@confluent.io>, Jun Rao <jun...@gmail.com>
    
    Closes #1776 from benstopford/rep-quotas-v2

commit 7f3f0b1e511cc2579e8c99596b206e51feeb078a
Author: Damian Guy <damian....@gmail.com>
Date:   2016-09-16T16:58:36Z

    KAFKA-3776: Unify store and downstream caching in streams
    
    This is joint work between dguy and enothereska. The work implements 
KIP-63. Overview of main changes:
    
    - New byte-based cache that acts as a buffer for any persistent store and 
for forwarding changes downstream.
    - Forwarding record path changes: previously a record in a task completed 
end-to-end. Now it may be buffered in a processor node while other records 
complete in the task.
    - Cleanup of state stores and decoupling of cache from state store and 
forwarding.
    - More than 80 new unit and integration tests.
    
    Author: Damian Guy <damian....@gmail.com>
    Author: Eno Thereska <eno.there...@gmail.com>
    
    Reviewers: Matthias J. Sax, Guozhang Wang
    
    Closes #1752 from enothereska/KAFKA-3776-poc

commit 5ea969a0d67c7df8344df819076baf6f54570cdb
Author: Randall Hauch <rha...@gmail.com>
Date:   2016-09-16T21:55:46Z

    KAFKA-4183; Corrected Kafka Connect's JSON Converter to properly convert 
from null to logical values
    
    The `JsonConverter` class has `LogicalTypeConverter` implementations for 
Date, Time, Timestamp, and Decimal, but these implementations fail when the 
input literal value (deserialized from the message) is null.
    
    Test cases were added to check for these cases, and these failed before the 
`LogicalTypeConverter` implementations were fixed to consider whether the 
schema has a default value or is optional, similarly to how the 
`JsonToConnectTypeConverter` implementations do this. Once the fixes were made, 
the new tests pass.
    
    Author: Randall Hauch <rha...@gmail.com>
    
    Reviewers: Shikhar Bhushan <shik...@confluent.io>, Jason Gustafson 
<ja...@confluent.io>
    
    Closes #1867 from rhauch/kafka-4183

commit 8b549e8f6485256cc586ba01ac5178871af21a65
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-09-16T22:54:33Z

    KAFKA-4173; SchemaProjector should successfully project missing Struct 
field when target field is optional
    
    Author: Shikhar Bhushan <shik...@confluent.io>
    
    Reviewers: Konstantine Karantasis <konstant...@confluent.io>, Jason 
Gustafson <ja...@confluent.io>
    
    Closes #1865 from shikhar/kafka-4173

commit c9cec1bdc6abf0058319fae0a944790f3ad4fc7d
Author: Sumit Arrawatia <sumit.arrawa...@gmail.com>
Date:   2016-09-17T03:10:13Z

    KAFKA-4093; Cluster Id (KIP-78)
    
    This PR implements KIP-78: Cluster Identifiers 
[(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id#KIP-78:ClusterId-Overview)
 and includes the following changes:
    
    1. Changes to broker code
        - generate cluster id and store it in Zookeeper
        - update protocol to add cluster id to metadata request and response
        - add ClusterResourceListener interface, ClusterResource class and 
ClusterMetadataListeners utility class
        - send ClusterResource events to the metric reporters
    2. Changes to client code
        - update Cluster and Metadata code to support cluster id
        - update clients for sending ClusterResource events to interceptors, 
(de)serializers and metric reporters
    3. Integration tests for interceptors, (de)serializers and metric reporters 
for clients and for protocol changes and metric reporters for broker.
    4. System tests for upgrading from previous versions.
    
    Author: Sumit Arrawatia <sumit.arrawa...@gmail.com>
    Author: Ismael Juma <ism...@juma.me.uk>
    
    Reviewers: Jun Rao <jun...@gmail.com>, Ismael Juma <ism...@juma.me.uk>
    
    Closes #1830 from arrawatia/kip-78

commit 378e30c677c4f9d734ecd6858a925ff309c2ceb1
Author: Rajini Sivaram <rajinisiva...@googlemail.com>
Date:   2016-09-17T17:06:05Z

    KAFKA-3492; Secure quotas for authenticated users
    
    Implementation and tests for secure quotas at <user> and <user, client-id> 
levels as described in KIP-55. Also adds dynamic default quotas for 
<client-id>, <user> and <user-client-id>. For each client connection, the most 
specific quota matching the connection is used, with user quota taking 
precedence over client-id quota.
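
One way to read "most specific quota wins, with user before client-id" is as an ordered lookup. The sketch below is an illustration under that assumption, not the broker's implementation: the DEFAULT sentinel, the tuple keys, and resolve_quota are hypothetical, and the exact ordering should be checked against KIP-55.

```python
DEFAULT = "<default>"  # hypothetical marker for a dynamic default entry


def resolve_quota(quotas, user, client_id):
    """Return the most specific configured quota, or None if nothing matches.

    `quotas` maps (user, client_id) tuples to byte rates; None in a slot
    means that level is not part of the key.
    """
    candidates = [
        (user, client_id),     # <user, client-id>
        (user, DEFAULT),       # <user, default client-id>
        (user, None),          # <user>
        (DEFAULT, client_id),  # <default user, client-id>
        (DEFAULT, DEFAULT),    # <default user, default client-id>
        (DEFAULT, None),       # <default user>
        (None, client_id),     # <client-id>
        (None, DEFAULT),       # <default client-id>
    ]
    for key in candidates:
        if key in quotas:
            return quotas[key]
    return None
```

For example, with quotas for ("alice", "app"), ("alice", None) and (None, "app"), a connection from alice using another client id falls back to alice's user-level quota rather than any client-id quota.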
    
    Author: Rajini Sivaram <rajinisiva...@googlemail.com>
    
    Reviewers: Jun Rao <jun...@gmail.com>
    
    Closes #1753 from rajinisivaram/KAFKA-3492

commit a78dabbc5a70e0be829d0143258c0c719ebfda74
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2016-09-17T21:43:43Z

    HOTFIX: Increase timeout for bounce test
    
    Author: Eno Thereska <eno.there...@gmail.com>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1874 from enothereska/hotfix-bounce-test

commit ce6f8a6ef73a54989e95d6ad6e4022e192392a8c
Author: Matthias J. Sax <matth...@confluent.io>
Date:   2016-09-17T21:45:29Z

    HOTFIX: changed quickstart download from 0.10.0.0 to 0.10.0.1
    
    Author: Matthias J. Sax <matth...@confluent.io>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1869 from mjsax/hotfix-doc

commit 751622eceaf4eed5532b1e44f1738cbd81ae359d
Author: Grant Henke <granthe...@gmail.com>
Date:   2016-09-17T21:47:56Z

    KAFKA-4157; Transient system test failure in 
replica_verification_test.test_replica_lags
    
    Author: Grant Henke <granthe...@gmail.com>
    
    Reviewers: Ashish Singh <asi...@cloudera.com>, Ismael Juma 
<ism...@juma.me.uk>
    
    Closes #1849 from granthenke/replica-verification-fix

commit 639b7cd133d62275f6ada3344579ca7f755ef2fb
Author: Jaikiran Pai <jaikiran....@gmail.com>
Date:   2016-09-17T22:01:32Z

    MINOR: Update the README.md to include a note about GRADLE_USER_HOME
    
    Trying to build the source and publish it to internal Maven repo, I ran 
into an issue that I explain in the mailing list discussion here 
https://www.mail-archive.com/devkafka.apache.org/msg56359.html.
    
    The commit here updates the README.md to make a note that the 
GRADLE_USER_HOME environment variable plays a role in deciding which file to 
add the maven configs to.
    
    Author: Jaikiran Pai <jaikiran....@gmail.com>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1837 from jaikiran/readme-update-grade-user-home

commit 6965270a84eee075adf466bd949ae7f1ff41e579
Author: Andrey Neporada <nepor...@gmail.com>
Date:   2016-09-18T16:12:53Z

    KAFKA-2063; Bound fetch response size (KIP-74)
    
    This PR is an implementation of 
[KIP-74](https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes)
 which is originally motivated by 
[KAFKA-2063](https://issues.apache.org/jira/browse/KAFKA-2063).
    
    Author: Andrey Neporada <nepor...@gmail.com>
    Author: Ismael Juma <ism...@juma.me.uk>
    
    Reviewers: Jun Rao <jun...@gmail.com>, Jiangjie Qin <becket....@gmail.com>, 
Jason Gustafson <ja...@confluent.io>
    
    Closes #1812 from nepal/kip-74

commit bdc95ea18c01374d26f77eaa157c58f8f30e6edf
Author: Luke Zaparaniuk <luke.zaparan...@gmail.com>
Date:   2016-09-18T21:59:00Z

    MINOR: Fix reference to argument in `LogSegment.translateOffset`
    
    Changed the lowerBound argument reference in the summary comment of the 
translateOffset method to match the actual argument name: startingFilePosition.
    
    Author: Luke Zaparaniuk <luke.zaparan...@gmail.com>
    
    Reviewers: Ismael Juma <ism...@juma.me.uk>
    
    Closes #1876 from lukezaparaniuk/patch-1

commit 2a23e81f0d14d55684f6df4d3a717affcb40ab52
Author: Damian Guy <damian....@gmail.com>
Date:   2016-09-19T17:28:58Z

    HOTFIX: logic in QueryableStateIntegrationTest.shouldBeAbleToQueryState 
incorrect
    
    The logic in `verifyCanGetByKey` was incorrect. It was
    ```
    windowState.size() < keys.length &&
    countState.size() < keys.length &&
    System.currentTimeMillis() < timeout
    ```
    but should be:
    ```
    (windowState.size() < keys.length || countState.size() < keys.length) && 
System.currentTimeMillis() < timeout
    ```
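
The difference between the two conditions shows up when one store has caught up and the other has not. A small Python sketch of the same boolean logic; the function and parameter names are hypothetical and only mirror the Java expressions above.

```python
def keep_waiting_buggy(window_size, count_size, expected, now, deadline):
    # Stops as soon as EITHER store has caught up, which is too early.
    return window_size < expected and count_size < expected and now < deadline


def keep_waiting_fixed(window_size, count_size, expected, now, deadline):
    # Keeps waiting while ANY store is still behind, until the deadline.
    return (window_size < expected or count_size < expected) and now < deadline


# Window store is complete (3 of 3) but the count store is not (1 of 3).
print(keep_waiting_buggy(3, 1, 3, now=0, deadline=10))  # False
print(keep_waiting_fixed(3, 1, 3, now=0, deadline=10))  # True
```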
    
    Author: Damian Guy <damian....@gmail.com>
    
    Reviewers: Guozhang Wang <wangg...@gmail.com>
    
    Closes #1879 from dguy/minor-fix-test

commit 21ab564f4dd8d3d7a443d74f444cee3b6eecc664
Author: Damian Guy <damian....@gmail.com>
Date:   2016-09-19T17:30:58Z

    KAFKA-4175: Can't have StandbyTasks in KafkaStreams where 
NUM_STREAM_THREADS_CONFIG > 1
    
    standby tasks should be assigned per consumer not per process
    
    Author: Damian Guy <damian....@gmail.com>
    
    Reviewers: Eno Thereska, Guozhang Wang
    
    Closes #1862 from dguy/kafka-4175

commit 184c0a6c04784491055033c747c13633df4ba1c0
Author: Damian Guy <damian....@gmail.com>
Date:   2016-09-19T18:00:53Z

    KAFKA-4163: NPE in StreamsMetadataState during re-balance operations
    
    During rebalance operations the Cluster object gets set to Cluster.empty(). 
This can result in NPEs when doing certain operations on StreamsMetadataState. 
This should throw a StreamsException if the Cluster is empty, as it is not yet 
(re-)initialized.
    
    Author: Damian Guy <damian....@gmail.com>
    
    Reviewers: Eno Thereska, Guozhang Wang
    
    Closes #1845 from dguy/streams-meta-hotfix

commit 3e983af14c138c555a0443c94c3d05631073c2fd
Author: Rajini Sivaram <rajinisiva...@googlemail.com>
Date:   2016-09-19T19:16:45Z

    KAFKA-4079; Documentation for secure quotas
    
    Details in KIP-55.
    
    Author: Rajini Sivaram <rajinisiva...@googlemail.com>
    
    Reviewers: Jun Rao <jun...@gmail.com>
    
    Closes #1847 from rajinisivaram/KAFKA-4079

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
