The problem turns out to be logging in
kafka.security.auth.SimpleAclAuthorizer. We had logging on because we need
to log denied authorization attempts, but all logging in that class is at
DEBUG level with no way to log only denials, so the volume is huge. With that
logging turned on, especially on clusters to which MirrorMaker is producing,
cluster performance collapses. We're developing a workaround, but an option
to log denials at WARN level and approvals at DEBUG would be quite helpful.
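The requested split can be illustrated generically. Below is a minimal Python sketch, assuming a plain decision function that stands in for the authorizer. The names `make_logging_authorizer`, `acl_lookup`, `ACLS` and the logger name `authorizer.sketch` are hypothetical and are not Kafka's API. Denials are logged at WARNING and approvals at DEBUG, so a logger running at WARNING keeps the audit trail of denials without paying for a log line on every allowed request.

```python
import logging

# Hypothetical logger name; NOT the logger Kafka's SimpleAclAuthorizer uses.
logger = logging.getLogger("authorizer.sketch")


def make_logging_authorizer(decide):
    """Wrap a decision function (principal, operation, resource) -> bool so that
    denials are logged at WARNING and approvals only at DEBUG.

    `decide` is a hypothetical stand-in for a real authorizer's ACL check."""
    def authorize(principal, operation, resource):
        allowed = decide(principal, operation, resource)
        if allowed:
            # High-volume path: only emitted when the logger is set to DEBUG.
            logger.debug("Allowed %s %s on %s", principal, operation, resource)
        else:
            # Low-volume path we actually want to audit.
            logger.warning("Denied %s %s on %s", principal, operation, resource)
        return allowed
    return authorize


# Hypothetical ACL table used only for this illustration.
ACLS = {("User:mirrormaker", "Write", "Topic:events")}


def acl_lookup(principal, operation, resource):
    """Allow only exact (principal, operation, resource) matches in ACLS."""
    return (principal, operation, resource) in ACLS


authorize = make_logging_authorizer(acl_lookup)

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    authorize("User:mirrormaker", "Write", "Topic:events")  # allowed: not logged at WARNING
    authorize("User:guest", "Read", "Topic:events")         # denied: logged as a warning
```

Run as a script, only the denial reaches stderr, as `WARNING:authorizer.sketch:Denied User:guest Read on Topic:events`. The design choice is simply to pick the log level from the outcome, so the operator's chosen logger level decides whether approvals are recorded at all.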

Phillip Walker  •  Manager, Software Development, Network Engineering
900 Main Campus Drive, Suite 500, Raleigh, NC 27606
m: 919-802-5847  o: 919-238-1452
e: pwal...@bandwidth.com  •  linkedin <https://www.linkedin.com/in/phillipwalker/>  •  twitter <https://twitter.com/bandwidth>

On Mon, Jul 31, 2017 at 12:16 PM, Meghana Narasimhan <mnarasim...@bandwidth.com> wrote:

> Hi,
> We recently enabled timestamp and security features in our production
> clusters. We have 5 smaller clusters and 2 larger aggregation clusters
> that mirror data from the 5 smaller clusters.
>
> The version of Kafka is 0.10.1.1.
>
> For security, we enabled both PLAINTEXT and SASL_PLAINTEXT listeners on
> the brokers and also enabled inter-broker security and authorization.
>
> Enabling the above features did not have any impact on the smaller clusters,
> but we saw a dramatic decrease in throughput and packets in on each of the
> broker servers of the aggregation clusters.
> MirrorMaker was keeping up with the lag from the smaller clusters, but some
> of the consumer clients which were consuming from aggregation clusters
> could not keep up with the load anymore.
>
> We also saw a lot of ISR shrinks and expands. Increasing
> num.replica.fetchers and replica.lag.time.max.ms seemed to fix the ISR
> issue, but we continued to see the throughput and packet issue. We then
> disabled just inter-broker security, but that did not make a difference
> either. We finally rolled back all the security-related changes (no
> authentication or authorization on the aggregation clusters), and that
> seemed to fix the throughput and packet issue. Both metrics look normal
> again.
>
> Any ideas or thoughts on what could have gone wrong, or is this the expected
> behavior?
>
> Thanks,
> Meghana
>
