Jun,

Yeah, sure, I'll take it for a spin tomorrow.

On Mon Feb 02 2015 at 11:08:42 PM Jun Rao <j...@confluent.io> wrote:

> Mathias,
>
> Thanks for the info. I took a quick look. The biggest difference I saw is
> the org.xerial.snappy.SnappyNative.rawCompress() call. In 0.8.1.1, it uses
> about 0.05% of the CPU. In 0.8.2.0, it uses about 0.10% of the CPU. We did
> upgrade snappy from 1.0.5 in 0.8.1.1 to 1.1.1.6 in 0.8.2.0. Could you try
> to use the same version of snappy in 0.8.1.1 and 0.8.2.0 and rerun your
> tests?
>
> Jun
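Before rerunning that comparison, it helps to confirm which snappy-java build each broker JVM is actually loading. A minimal standalone check along these lines would do it (a sketch only: it assumes snappy-java is on the broker's classpath and uses its public Snappy API; the class name is made up and it is not part of Kafka):

```java
import org.xerial.snappy.Snappy;

// Sketch (assumes snappy-java 1.x on the classpath): prints which native
// snappy build the JVM loaded and round-trips a small payload so the native
// library is forced to initialize. Not part of Kafka itself.
public class SnappyVersionCheck {
    public static void main(String[] args) throws Exception {
        byte[] input = "kafka snappy version check".getBytes("UTF-8");
        byte[] compressed = Snappy.compress(input);
        byte[] restored = Snappy.uncompress(compressed);

        System.out.println("snappy-java native version: " + Snappy.getNativeLibraryVersion());
        System.out.println("round trip ok: " + new String(restored, "UTF-8").equals("kafka snappy version check"));
    }
}
```

Run against the 0.8.1.1 and 0.8.2.0 classpaths, it should report 1.0.5 and 1.1.1.6 respectively if the bundled jars are the ones being picked up.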
> On Mon, Feb 2, 2015 at 1:15 PM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:
>
> > Hi all,
> >
> > I ran the same hprof test on 0.8.1.1, and also did a re-run on
> > 0.8.2.0-rc2; logs from both runs are attached. Both runs lasted for
> > 30-40 minutes. The configurations used can be found over here:
> > https://gist.github.com/mthssdrbrg/5fcb9fbdb851d8cc66a2. The
> > configuration used for the first run (on 0.8.2-beta) can be found here:
> > https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
> >
> > The cluster is running on EC2, currently on 4 m1.xlarge instances, with
> > a ZooKeeper cluster consisting of 3 nodes. There are 1248 topics and a
> > total of 4143 partitions, using a replication factor of 3.
> >
> > Let me know if there's any other information I can provide.
> >
> > Best regards,
> > Mathias
> >
> > On Mon Feb 02 2015 at 6:50:54 PM Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> >> Looking at that profile:
> >>
> >> Misc. blocking socket activity, not actual CPU work:
> >>  1  5.24%  5.24%  67781  300447  java.net.PlainSocketImpl.socketAccept
> >>  2  5.24% 10.49%  67775  300515  java.net.PlainSocketImpl.socketAccept
> >>  3  5.24% 15.73%  67773  300567  java.net.PlainSocketImpl.socketAccept
> >>  4  5.24% 20.97%  67683  301396  sun.nio.ch.ServerSocketChannelImpl.accept0
> >>  5  5.23% 26.20%  67582  301395  sun.nio.ch.EPollArrayWrapper.epollWait
> >>  6  5.23% 31.42%  67529  301519  sun.nio.ch.EPollArrayWrapper.epollWait
> >>  7  5.13% 36.55%  66297  302447  sun.nio.ch.EPollArrayWrapper.epollWait
> >>  8  3.95% 40.51%  51071  302446  sun.nio.ch.EPollArrayWrapper.epollWait
> >>  9  3.65% 44.16%  47234  303479  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 10  3.64% 47.80%  47019  302444  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 11  3.62% 51.42%  46768  302445  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 12  3.27% 54.69%  42237  303475  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 13  3.16% 57.85%  40892  303476  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 14  3.14% 60.99%  40556  303478  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 15  3.05% 64.04%  39428  303480  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 16  2.68% 66.72%  34673  303477  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 17  1.23% 67.95%  15867  303520  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 18  0.98% 68.93%  12663  303541  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 19  0.92% 69.85%  11920  303536  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 20  0.85% 70.70%  11015  303546  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 21  0.82% 71.53%  10625  303534  sun.nio.ch.EPollArrayWrapper.epollWait
> >> 22  0.69% 72.21%   8866  303512  sun.nio.ch.EPollArrayWrapper.epollWait
> >>
> >> TransferTo to write data to the socket, not actual CPU work (mostly):
> >> 23  0.36% 72.57%   4592  304991  sun.nio.ch.FileDispatcherImpl.write0
> >> 24  0.36% 72.92%   4589  305021  sun.nio.ch.FileDispatcherImpl.write0
> >> 25  0.35% 73.27%   4503  304992  sun.nio.ch.FileDispatcherImpl.write0
> >> 26  0.32% 73.60%   4198  305022  sun.nio.ch.FileDispatcherImpl.write0
> >> 27  0.25% 73.85%   3250  305246  sun.nio.ch.FileDispatcherImpl.write0
> >> 28  0.25% 74.10%   3249  305497  sun.nio.ch.FileDispatcherImpl.write0
> >>
> >> Request channel - Actual CPU:
> >> 29  0.22% 74.32%   2862  305000  sun.misc.Unsafe.unpark
> >> 30  0.17% 74.49%   2163  304838  sun.misc.Unsafe.unpark
> >> 31  0.14% 74.63%   1795  305240  sun.misc.Unsafe.unpark
> >>
> >> Purgatory - Actual CPU:
> >> 32  0.12% 74.75%   1553  305137  scala.collection.immutable.HashMap.$plus
> >> 33  0.12% 74.87%   1546  305100  java.util.concurrent.ConcurrentHashMap.get
> >> 34  0.12% 74.98%   1531  305181  java.util.concurrent.ConcurrentHashMap.get
> >> 35  0.12% 75.10%   1526  305234  scala.collection.immutable.HashMap.$plus
> >> 36  0.12% 75.22%   1521  305401  scala.collection.immutable.HashMap.$plus
> >> 37  0.12% 75.34%   1519  305186  java.util.concurrent.ConcurrentHashMap.get
> >> 38  0.12% 75.46%   1517  305264  java.util.concurrent.ConcurrentHashMap.get
> >> 39  0.12% 75.57%   1514  305271  java.util.concurrent.ConcurrentHashMap.get
> >> 40  0.12% 75.69%   1511  305250  scala.collection.immutable.HashMap.$plus
> >> 41  0.12% 75.81%   1499  305155  java.util.concurrent.ConcurrentHashMap.get
> >> 42  0.12% 75.92%   1496  305113  scala.collection.immutable.HashMap.$plus
> >> 43  0.12% 76.04%   1496  305263  scala.collection.immutable.HashMap.$plus
> >> 44  0.11% 76.15%   1480  305235  scala.collection.immutable.HashMap.$plus
> >> 45  0.11% 76.26%   1444  305185  scala.collection.immutable.HashMap.$plus
> >> 46  0.11% 76.37%   1428  305102  java.util.concurrent.ConcurrentHashMap.get
> >> 47  0.11% 76.48%   1418  305320  java.util.concurrent.ConcurrentHashMap.get
> >>
> >> Compression - Actual CPU:
> >> 48  0.09% 76.58%   1187  306546  org.xerial.snappy.SnappyNative.rawCompress
> >> 49  0.09% 76.67%   1174  305584  org.xerial.snappy.SnappyNative.rawCompress
> >> 50  0.09% 76.76%   1173  305545  org.xerial.snappy.SnappyNative.rawCompress
> >> 51  0.09% 76.85%   1165  305661  org.xerial.snappy.SnappyNative.rawCompress
> >> 52  0.09% 76.94%   1152  307069  org.xerial.snappy.SnappyNative.rawCompress
> >> 53  0.09% 77.02%   1121  305997  org.xerial.snappy.SnappyNative.rawCompress
> >> 54  0.09% 77.11%   1117  307035  org.xerial.snappy.SnappyNative.rawCompress
> >> 55  0.09% 77.19%   1106  306190  org.xerial.snappy.SnappyNative.rawCompress
> >>
> >> On Mon, Feb 2, 2015 at 9:39 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >>
> >> > Ah, yeah, you're right. That is just wait time, not CPU time. We
> >> > should check that profile; it must be something else on the list.
> >> >
> >> > -Jay
> >> >
> >> > On Mon, Feb 2, 2015 at 9:33 AM, Jun Rao <j...@confluent.io> wrote:
> >> >
> >> >> Hi, Mathias,
> >> >>
> >> >> From the hprof output, it seems that the top CPU consumers are
> >> >> socketAccept() and epollWait(). As far as I am aware, there haven't
> >> >> been any significant changes in the socket server code between 0.8.1
> >> >> and 0.8.2. Could you run the same hprof test on 0.8.1 so that we can
> >> >> see the difference?
> >> >>
> >> >> Jaikiran,
> >> >>
> >> >> The fix you provided is probably unnecessary. The channel that we use
> >> >> in SimpleConsumer (BlockingChannel) is configured to be blocking. So
> >> >> even though the read from the socket is in a loop, each read blocks
> >> >> if there are no bytes received from the broker. So, that shouldn't
> >> >> cause extra CPU consumption.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Jun
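Jun's point about BlockingChannel is that a read loop over a socket in blocking mode parks the thread instead of spinning. A minimal standalone sketch of that pattern (plain blocking socket, a hypothetical host name, and a Kafka-style 4-byte size prefix; this is not the actual SimpleConsumer code) looks like this:

```java
import java.io.InputStream;
import java.net.Socket;

// Simplified sketch of reading a length-prefixed response off a *blocking*
// socket in a loop. Each read() call parks the thread until bytes arrive or
// the connection closes, so the loop itself does not busy-wait and burn CPU.
public class BlockingReadSketch {

    // Read exactly `length` bytes, looping because a single read may return fewer.
    static byte[] readFully(InputStream in, int length) throws Exception {
        byte[] buffer = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(buffer, offset, length - offset); // blocks, does not spin
            if (n < 0) {
                throw new java.io.EOFException("connection closed after " + offset + " bytes");
            }
            offset += n;
        }
        return buffer;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical broker endpoint, purely for illustration.
        try (Socket socket = new Socket("broker.example.com", 9092)) {
            socket.setSoTimeout(30000); // fail instead of waiting forever
            InputStream in = socket.getInputStream();
            byte[] sizeField = readFully(in, 4); // 4-byte size prefix, then the payload
            int size = java.nio.ByteBuffer.wrap(sizeField).getInt();
            byte[] payload = readFully(in, size);
            System.out.println("received " + payload.length + " bytes");
        }
    }
}
```

If the same loop ran over a channel in non-blocking mode, a read could return zero bytes immediately and the loop would spin; with a blocking channel each call simply parks, which is why the extra CPU is unlikely to be coming from there.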
> >> >>
> >> >> On Mon, Jan 26, 2015 at 10:05 AM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:
> >> >>
> >> >> > Hi Neha,
> >> >> >
> >> >> > I sent an e-mail earlier today, but noticed now that it didn't
> >> >> > actually go through.
> >> >> >
> >> >> > Anyhow, I've attached two files, one with output from a 10 minute
> >> >> > run and one with output from a 30 minute run. I realize that maybe
> >> >> > I should've done one or two runs with 0.8.1.1 as well, but
> >> >> > nevertheless.
> >> >> >
> >> >> > I upgraded our staging cluster to 0.8.2.0-rc2, and I'm seeing the
> >> >> > same CPU usage as with the beta version (basically pegging all
> >> >> > cores). If I manage to find the time I'll do another run with hprof
> >> >> > on the rc2 version later today.
> >> >> >
> >> >> > Best regards,
> >> >> > Mathias
> >> >> >
> >> >> > On Tue Dec 09 2014 at 10:08:21 PM Neha Narkhede <n...@confluent.io> wrote:
> >> >> >
> >> >> >> The following should be sufficient:
> >> >> >>
> >> >> >> java -agentlib:hprof=cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=kafka.hprof <classname>
> >> >> >>
> >> >> >> You would need to run the Kafka server with the settings above for
> >> >> >> some time, until you observe the problem.
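For anyone repeating this, the ranked listing Jay quotes earlier in the thread sits between the CPU SAMPLES BEGIN and CPU SAMPLES END markers of the file that command writes. A throwaway helper along these lines (a sketch, assuming the default hprof=cpu=samples text output and the kafka.hprof file name from the command above) prints the top entries:

```java
import java.io.BufferedReader;
import java.io.FileReader;

// Throwaway helper (not an official tool): prints the first N rows of the
// "CPU SAMPLES" section that hprof=cpu=samples writes, i.e. the same
// rank / self% / accum% / count / trace / method listing quoted in this thread.
public class TopHprofSamples {
    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "kafka.hprof"; // default from the command above
        int limit = args.length > 1 ? Integer.parseInt(args[1]) : 55;

        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            boolean inSamples = false;
            int printed = 0;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("CPU SAMPLES BEGIN")) {
                    inSamples = true; // this header line also carries the total sample count
                    System.out.println(line);
                    continue;
                }
                if (inSamples) {
                    if (line.startsWith("CPU SAMPLES END") || printed >= limit) {
                        break;
                    }
                    System.out.println(line);
                    printed++;
                }
            }
        }
    }
}
```

The trace IDs in that listing can then be matched against the TRACE sections earlier in the same file to see the full stacks, which is what separates code genuinely burning CPU from threads parked in epollWait.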
> >> >> >>
> >> >> >> On Tue, Dec 9, 2014 at 3:47 AM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:
> >> >> >>
> >> >> >> > Hi Neha,
> >> >> >> >
> >> >> >> > Yeah, sure. I'm not familiar with hprof, so are there any
> >> >> >> > particular options I should include, or should I just run with
> >> >> >> > the defaults?
> >> >> >> >
> >> >> >> > Best regards,
> >> >> >> > Mathias
> >> >> >> >
> >> >> >> > On Mon Dec 08 2014 at 7:41:32 PM Neha Narkhede <n...@confluent.io> wrote:
> >> >> >> >
> >> >> >> > > Thanks for reporting the issue. Would you mind running hprof
> >> >> >> > > and sending the output?
> >> >> >> > >
> >> >> >> > > On Mon, Dec 8, 2014 at 1:25 AM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:
> >> >> >> > >
> >> >> >> > > > Good day,
> >> >> >> > > >
> >> >> >> > > > I upgraded a Kafka cluster from v0.8.1.1 to v0.8.2-beta and
> >> >> >> > > > noticed that the CPU usage on the broker machines went up by
> >> >> >> > > > roughly 40%, from ~60% to ~100%, and am wondering if anyone
> >> >> >> > > > else has experienced something similar? The load average
> >> >> >> > > > also went up by 2x-3x.
> >> >> >> > > >
> >> >> >> > > > We're running on EC2, and the cluster currently consists of
> >> >> >> > > > four m1.xlarge instances, with roughly 1100 topics / 4000
> >> >> >> > > > partitions. We're using Java 7 (1.7.0_65 to be exact) and
> >> >> >> > > > Scala 2.9.2. Configurations can be found over here:
> >> >> >> > > > https://gist.github.com/mthssdrbrg/7df34a795e07eef10262.
> >> >> >> > > >
> >> >> >> > > > I'm assuming that this is not expected behaviour for
> >> >> >> > > > 0.8.2-beta?
> >> >> >> > > >
> >> >> >> > > > Best regards,
> >> >> >> > > > Mathias
> >> >> >> > > >
> >> >> >> > >
> >> >> >> > > --
> >> >> >> > > Thanks,
> >> >> >> > > Neha
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >> --
> >> >> >> Thanks,
> >> >> >> Neha