Mathias,

Thanks for the info. I took a quick look. The biggest difference I saw is
the org.xerial.snappy.SnappyNative.rawCompress() call. In 0.8.1.1, it uses
about 0.05% of the CPU. In 0.8.2.0, it uses about 0.10% of the CPU. We did
upgrade snappy from 1.0.5 in 0.8.1.1 to 1.1.1.6 in 0.8.2.0. Could you try
using the same snappy version in both 0.8.1.1 and 0.8.2.0 and rerun your
tests?
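
If it helps, here is a rough way to compare the two snappy-java versions in
isolation. This is just a sketch against snappy-java's public API (the class
name is arbitrary, and getNativeLibraryVersion() is there, if I recall the API
correctly, to confirm which jar actually got loaded); Snappy.compress() bottoms
out in the same SnappyNative.rawCompress native call that shows up in the
profile. Run it once with 1.0.5 on the classpath and once with 1.1.1.6:

import org.xerial.snappy.Snappy;

public class SnappyCompressCheck {
    public static void main(String[] args) throws Exception {
        // Report which snappy native library actually got loaded.
        System.out.println("snappy native version: " + Snappy.getNativeLibraryVersion());

        // A compressible, roughly message-sized payload.
        byte[] payload = new byte[64 * 1024];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) ('a' + (i % 16));
        }

        // Warm up, then time a batch of compress calls.
        for (int i = 0; i < 1000; i++) {
            Snappy.compress(payload);
        }
        long start = System.nanoTime();
        for (int i = 0; i < 10000; i++) {
            Snappy.compress(payload);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("10000 compress calls: " + elapsedMs + " ms");
    }
}

If the 1.1.1.6 numbers are noticeably worse there, that would point at the
library itself rather than anything in the broker code.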

Jun

On Mon, Feb 2, 2015 at 1:15 PM, Mathias Söderberg <
mathias.soederb...@gmail.com> wrote:

> Hi all,
>
> I ran the same hprof test on 0.8.1.1, and also did a re-run on
> 0.8.2.0-rc2, attached logs from both runs. Both runs lasted for 30-40
> minutes. The configurations used can be found over here:
> https://gist.github.com/mthssdrbrg/5fcb9fbdb851d8cc66a2. The
> configuration used for the first run (on 0.8.2-beta) can be found here:
> https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
>
> The cluster is running on EC2, currently on 4 m1.xlarge, with a ZooKeeper
> cluster consisting of 3 nodes. There are 1248 topics and a total of 4143
> partitions, using replication factor 3.
>
> Let me know if there's any other information I can provide.
>
> Best regards,
> Mathias
>
> On Mon Feb 02 2015 at 6:50:54 PM Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Looking at that profile:
>>
>> Misc. blocking socket activity, not actual CPU work:
>>    1  5.24%  5.24%   67781 300447 java.net.PlainSocketImpl.socketAccept
>>    2  5.24% 10.49%   67775 300515 java.net.PlainSocketImpl.socketAccept
>>    3  5.24% 15.73%   67773 300567 java.net.PlainSocketImpl.socketAccept
>>    4  5.24% 20.97%   67683 301396 sun.nio.ch.ServerSocketChannelImpl.accept0
>>    5  5.23% 26.20%   67582 301395 sun.nio.ch.EPollArrayWrapper.epollWait
>>    6  5.23% 31.42%   67529 301519 sun.nio.ch.EPollArrayWrapper.epollWait
>>    7  5.13% 36.55%   66297 302447 sun.nio.ch.EPollArrayWrapper.epollWait
>>    8  3.95% 40.51%   51071 302446 sun.nio.ch.EPollArrayWrapper.epollWait
>>    9  3.65% 44.16%   47234 303479 sun.nio.ch.EPollArrayWrapper.epollWait
>>   10  3.64% 47.80%   47019 302444 sun.nio.ch.EPollArrayWrapper.epollWait
>>   11  3.62% 51.42%   46768 302445 sun.nio.ch.EPollArrayWrapper.epollWait
>>   12  3.27% 54.69%   42237 303475 sun.nio.ch.EPollArrayWrapper.epollWait
>>   13  3.16% 57.85%   40892 303476 sun.nio.ch.EPollArrayWrapper.epollWait
>>   14  3.14% 60.99%   40556 303478 sun.nio.ch.EPollArrayWrapper.epollWait
>>   15  3.05% 64.04%   39428 303480 sun.nio.ch.EPollArrayWrapper.epollWait
>>   16  2.68% 66.72%   34673 303477 sun.nio.ch.EPollArrayWrapper.epollWait
>>   17  1.23% 67.95%   15867 303520 sun.nio.ch.EPollArrayWrapper.epollWait
>>   18  0.98% 68.93%   12663 303541 sun.nio.ch.EPollArrayWrapper.epollWait
>>   19  0.92% 69.85%   11920 303536 sun.nio.ch.EPollArrayWrapper.epollWait
>>   20  0.85% 70.70%   11015 303546 sun.nio.ch.EPollArrayWrapper.epollWait
>>   21  0.82% 71.53%   10625 303534 sun.nio.ch.EPollArrayWrapper.epollWait
>>   22  0.69% 72.21%    8866 303512 sun.nio.ch.EPollArrayWrapper.epollWait
>>
>>   TransferTo to write data to socket, not actual CPU work (mostly)
>>   23  0.36% 72.57%    4592 304991 sun.nio.ch.FileDispatcherImpl.write0
>>   24  0.36% 72.92%    4589 305021 sun.nio.ch.FileDispatcherImpl.write0
>>   25  0.35% 73.27%    4503 304992 sun.nio.ch.FileDispatcherImpl.write0
>>   26  0.32% 73.60%    4198 305022 sun.nio.ch.FileDispatcherImpl.write0
>>   27  0.25% 73.85%    3250 305246 sun.nio.ch.FileDispatcherImpl.write0
>>   28  0.25% 74.10%    3249 305497 sun.nio.ch.FileDispatcherImpl.write0
>>
>>   Request channel - Actual CPU
>>   29  0.22% 74.32%    2862 305000 sun.misc.Unsafe.unpark
>>   30  0.17% 74.49%    2163 304838 sun.misc.Unsafe.unpark
>>   31  0.14% 74.63%    1795 305240 sun.misc.Unsafe.unpark
>>
>>   Purgatory - Actual CPU
>>   32  0.12% 74.75%    1553 305137 scala.collection.immutable.HashMap.$plus
>>   33  0.12% 74.87%    1546 305100 java.util.concurrent.ConcurrentHashMap.get
>>   34  0.12% 74.98%    1531 305181 java.util.concurrent.ConcurrentHashMap.get
>>   35  0.12% 75.10%    1526 305234 scala.collection.immutable.HashMap.$plus
>>   36  0.12% 75.22%    1521 305401 scala.collection.immutable.HashMap.$plus
>>   37  0.12% 75.34%    1519 305186 java.util.concurrent.ConcurrentHashMap.get
>>   38  0.12% 75.46%    1517 305264 java.util.concurrent.ConcurrentHashMap.get
>>   39  0.12% 75.57%    1514 305271 java.util.concurrent.ConcurrentHashMap.get
>>   40  0.12% 75.69%    1511 305250 scala.collection.immutable.HashMap.$plus
>>   41  0.12% 75.81%    1499 305155 java.util.concurrent.ConcurrentHashMap.get
>>   42  0.12% 75.92%    1496 305113 scala.collection.immutable.HashMap.$plus
>>   43  0.12% 76.04%    1496 305263 scala.collection.immutable.HashMap.$plus
>>   44  0.11% 76.15%    1480 305235 scala.collection.immutable.HashMap.$plus
>>   45  0.11% 76.26%    1444 305185 scala.collection.immutable.HashMap.$plus
>>   46  0.11% 76.37%    1428 305102 java.util.concurrent.ConcurrentHashMap.get
>>   47  0.11% 76.48%    1418 305320 java.util.concurrent.ConcurrentHashMap.get
>>
>>   Compression - Actual CPU
>>   48  0.09% 76.58%    1187 306546 org.xerial.snappy.SnappyNative.rawCompress
>>   49  0.09% 76.67%    1174 305584 org.xerial.snappy.SnappyNative.rawCompress
>>   50  0.09% 76.76%    1173 305545 org.xerial.snappy.SnappyNative.rawCompress
>>   51  0.09% 76.85%    1165 305661 org.xerial.snappy.SnappyNative.rawCompress
>>   52  0.09% 76.94%    1152 307069 org.xerial.snappy.SnappyNative.rawCompress
>>   53  0.09% 77.02%    1121 305997 org.xerial.snappy.SnappyNative.rawCompress
>>   54  0.09% 77.11%    1117 307035 org.xerial.snappy.SnappyNative.rawCompress
>>   55  0.09% 77.19%    1106 306190 org.xerial.snappy.SnappyNative.rawCompress
>>
>> On Mon, Feb 2, 2015 at 9:39 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>
>> > Ah, yeah, you're right. That is just wait time, not CPU time. We should
>> > check that profile; it must be something else on the list.
>> >
>> > -Jay
>> >
>> > On Mon, Feb 2, 2015 at 9:33 AM, Jun Rao <j...@confluent.io> wrote:
>> >
>> >> Hi, Mathias,
>> >>
>> >> From the hprof output, it seems that the top CPU consumers are
>> >> socketAccept() and epollWait(). As far as I am aware, there haven't
>> >> been any significant changes in the socket server code between 0.8.1
>> >> and 0.8.2. Could you run the same hprof test on 0.8.1 so that we can
>> >> see the difference?
>> >>
>> >> Jaikiran,
>> >>
>> >> The fix you provided is probably unnecessary. The channel that we use
>> >> in SimpleConsumer (BlockingChannel) is configured to be blocking. So
>> >> even though the read from the socket is in a loop, each read blocks if
>> >> there are no bytes received from the broker. So that shouldn't cause
>> >> extra CPU consumption.
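>> >>
>> >> To illustrate (a paraphrased sketch, not the actual BlockingChannel
>> >> code): with a java.nio SocketChannel left in its default blocking mode,
>> >> read() parks the calling thread until bytes arrive, so a receive loop
>> >> like this sits idle instead of spinning:
>> >>
>> >> import java.io.EOFException;
>> >> import java.io.IOException;
>> >> import java.nio.ByteBuffer;
>> >> import java.nio.channels.SocketChannel;
>> >>
>> >> final class BlockingReadSketch {
>> >>     // Each read() blocks while no bytes are available, so the loop
>> >>     // does not consume CPU while waiting on the broker.
>> >>     static void readCompletely(SocketChannel channel, ByteBuffer buffer) throws IOException {
>> >>         while (buffer.hasRemaining()) {
>> >>             if (channel.read(buffer) == -1) {
>> >>                 throw new EOFException("connection closed by peer");
>> >>             }
>> >>         }
>> >>     }
>> >> }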
>> >>
>> >> Thanks,
>> >>
>> >> Jun
>> >>
>> >> On Mon, Jan 26, 2015 at 10:05 AM, Mathias Söderberg <
>> >> mathias.soederb...@gmail.com> wrote:
>> >>
>> >> > Hi Neha,
>> >> >
>> >> > I sent an e-mail earlier today, but noticed now that it didn't
>> >> > actually go through.
>> >> >
>> >> > Anyhow, I've attached two files, one with output from a 10 minute run
>> >> > and one with output from a 30 minute run. Realized that maybe I
>> >> > should've done one or two runs with 0.8.1.1 as well, but nevertheless.
>> >> >
>> >> > I upgraded our staging cluster to 0.8.2.0-rc2, and I'm seeing the same
>> >> > CPU usage as with the beta version (basically pegging all cores). If I
>> >> > manage to find the time I'll do another run with hprof on the rc2
>> >> > version later today.
>> >> >
>> >> > Best regards,
>> >> > Mathias
>> >> >
>> >> > On Tue Dec 09 2014 at 10:08:21 PM Neha Narkhede <n...@confluent.io>
>> >> > wrote:
>> >> >
>> >> >> The following should be sufficient
>> >> >>
>> >> >> java -agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof <classname>
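>> >> >>
>> >> >> (For the broker itself, I believe the same agent flag can be passed
>> >> >> through the JVM options that the start scripts pick up, e.g. something
>> >> >> along the lines of
>> >> >>
>> >> >> export KAFKA_OPTS="-agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof"
>> >> >> bin/kafka-server-start.sh config/server.properties
>> >> >>
>> >> >> assuming kafka-run-class.sh appends KAFKA_OPTS to the java command
>> >> >> line.)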
>> >> >>
>> >> >> You would need to start the Kafka server with the settings above for
>> >> >> some time until you observe the problem.
>> >> >>
>> >> >> On Tue, Dec 9, 2014 at 3:47 AM, Mathias Söderberg <
>> >> >> mathias.soederb...@gmail.com> wrote:
>> >> >>
>> >> >> > Hi Neha,
>> >> >> >
>> >> >> > Yeah sure. I'm not familiar with hprof, so any particular options
>> >> >> > I should include or just run with defaults?
>> >> >> >
>> >> >> > Best regards,
>> >> >> > Mathias
>> >> >> >
>> >> >> > On Mon Dec 08 2014 at 7:41:32 PM Neha Narkhede <n...@confluent.io>
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Thanks for reporting the issue. Would you mind running hprof and
>> >> >> > > sending the output?
>> >> >> > >
>> >> >> > > On Mon, Dec 8, 2014 at 1:25 AM, Mathias Söderberg <
>> >> >> > > mathias.soederb...@gmail.com> wrote:
>> >> >> > >
>> >> >> > > > Good day,
>> >> >> > > >
>> >> >> > > > I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and
>> >> >> > > > noticed that the CPU usage on the broker machines went up by
>> >> >> > > > roughly 40%, from ~60% to ~100%, and am wondering if anyone else
>> >> >> > > > has experienced something similar? The load average also went up
>> >> >> > > > by 2x-3x.
>> >> >> > > >
>> >> >> > > > We're running on EC2 and the cluster currently consists of four
>> >> >> > > > m1.xlarge, with roughly 1100 topics / 4000 partitions. Using
>> >> >> > > > Java 7 (1.7.0_65 to be exact) and Scala 2.9.2. Configurations
>> >> >> > > > can be found over here:
>> >> >> > > > https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
>> >> >> > > >
>> >> >> > > > I'm assuming that this is not expected behaviour for
>> >> >> > > > 0.8.2-beta?
>> >> >> > > >
>> >> >> > > > Best regards,
>> >> >> > > > Mathias
>> >> >> > > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > --
>> >> >> > > Thanks,
>> >> >> > > Neha
>> >> >> > >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Thanks,
>> >> >> Neha
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
