-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review84278
-----------------------------------------------------------


Thanks for the updated patch. This looks good. I ended up rebasing while you 
were working on this :) I have a few additional edits, noted below, which I 
will upload shortly.


core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment135488>

    Minor improvement: we can avoid an extra copy by filtering the iterator 
    above and then materializing once.
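
    For illustration, a minimal sketch of the idea (retainedMessages and 
    shouldRetain are hypothetical stand-ins, not the actual LogCleaner names):
    
        import kafka.message.MessageAndOffset
        
        // Sketch only: filter the iterator lazily and materialize once,
        // rather than copying into an intermediate collection first.
        def retainedMessages(messages: Iterable[MessageAndOffset],
                             shouldRetain: MessageAndOffset => Boolean): Seq[MessageAndOffset] =
          messages.iterator.filter(shouldRetain).toSeq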



core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment135497>

    I'm wondering if it would be helpful to split the stats into compressed 
    vs. non-compressed.
    
    E.g., x bytes read (from y compressed bytes); n messages read (from m 
    compressed messages), and so on.
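
    As a hypothetical sketch of what the split counters and the summary line 
    could look like (class and field names are illustrative):
    
        // Sketch only: track logical (decompressed) and raw (compressed)
        // counts side by side so the summary can report both.
        class CleanerStatsSketch {
          var bytesRead = 0L               // decompressed bytes read
          var compressedBytesRead = 0L     // raw compressed bytes off disk
          var messagesRead = 0L            // logical messages read
          var compressedMessagesRead = 0L  // compressed wrapper messages
        
          override def toString =
            s"$bytesRead bytes read (from $compressedBytesRead compressed bytes); " +
            s"$messagesRead messages read (from $compressedMessagesRead compressed messages)"
        }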



core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment135489>

    The last statement can be simplified to !redundant && !obsoleteDelete.
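
    As a sketch (the flag names follow the statement above: redundant means 
    superseded by a newer message with the same key, and obsoleteDelete means 
    a tombstone past the delete retention point):
    
        // Sketch only: retain a message iff it is neither redundant nor an
        // obsolete delete.
        def shouldRetain(redundant: Boolean, obsoleteDelete: Boolean): Boolean =
          !redundant && !obsoleteDelete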



core/src/main/scala/kafka/message/ByteBufferMessageSet.scala
<https://reviews.apache.org/r/24214/#comment135490>

    I actually had a different thought - i.e., to avoid duplicating the 
    compression code in BBMS. Then I ran into the issue that you probably saw: 
    the BBMS create method isn't very amenable to refactoring with pre-assigned 
    offsets. So I think what you originally had was actually better.
    
    Ideally we should have a compress(raw bytes) method and just use that in 
    both places. In fact, we can consider using the Compressor from the clients 
    module - which would have the added benefit of identical compression on 
    both the broker and the clients. E.g., right now it is possible to be under 
    the message size limit on the client and still exceed it on the broker.
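
    A minimal sketch of what a shared compress(raw bytes) entry point could 
    look like, with GZIP standing in for the codec (a real version would 
    dispatch on the configured CompressionCodec):
    
        import java.io.ByteArrayOutputStream
        import java.util.zip.GZIPOutputStream
        
        // Sketch only: a single compression entry point that the producer
        // path and the cleaner's re-compression path could share.
        def compress(raw: Array[Byte]): Array[Byte] = {
          val out = new ByteArrayOutputStream(64 max raw.length / 2)
          val gzip = new GZIPOutputStream(out)
          try gzip.write(raw) finally gzip.close()
          out.toByteArray
        }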



core/src/main/scala/kafka/message/MessageSet.scala
<https://reviews.apache.org/r/24214/#comment135491>

    Can do without this addition.



core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala
<https://reviews.apache.org/r/24214/#comment135494>

    Minor improvement here to avoid the extra hashmap.
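
    For example (a sketch, assuming the test reduces the produced (key, value) 
    pairs to the latest value per key):
    
        // Sketch only: Seq.toMap keeps the last value seen for each key, so
        // the pairs can be deduplicated directly, with no mutable HashMap.
        def latestPerKey(records: Seq[(Int, Int)]): Map[Int, Int] =
          records.toMap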



core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala
<https://reviews.apache.org/r/24214/#comment135495>

    Can use Stream.cons for convenience.
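
    For example (a sketch; the counter helper is hypothetical):
    
        // Sketch only: Stream.cons builds a lazy, unbounded sequence, so the
        // test can describe all the (key, value) pairs it may need and take
        // only as many as it uses.
        def counter(from: Int): Stream[Int] = Stream.cons(from, counter(from + 1))
        
        val keys = counter(0).take(100).map(_ % 10)  // 100 keys cycling 0..9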



core/src/test/scala/unit/kafka/log/LogTest.scala
<https://reviews.apache.org/r/24214/#comment135496>

    A few more minor edits - to test appending keyed compressed messages.
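
    For reference, an illustrative sketch of such a test addition (log is a 
    stand-in for the Log instance the test already sets up):
    
        import kafka.message.{ByteBufferMessageSet, DefaultCompressionCodec, Message}
        
        // Sketch only: build a compressed set of keyed messages and let the
        // log assign offsets on append.
        val keyed = (0 until 10).map(i =>
          new Message(("v" + i).getBytes, ("k" + i).getBytes))
        val compressedSet = new ByteBufferMessageSet(DefaultCompressionCodec, keyed: _*)
        log.append(compressedSet, assignOffsets = true)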


- Joel Koshy


On May 18, 2015, 5:29 p.m., Manikumar Reddy O wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24214/
> -----------------------------------------------------------
> 
> (Updated May 18, 2015, 5:29 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1374
>     https://issues.apache.org/jira/browse/KAFKA-1374
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Addressing Joel's comments
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/LogCleaner.scala 
> abea8b251895a5cc0788c6e25b112a2935a3f631 
>   core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 
> 9dfe914991aaf82162e5e300c587c794555d5fd0 
>   core/src/main/scala/kafka/message/MessageSet.scala 
> 28b56e68cfdbbf107dd7cbd248ffa8fa6bbcd13f 
>   core/src/test/scala/kafka/tools/TestLogCleaning.scala 
> 844589427cb9337acd89a5239a98b811ee58118e 
>   core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 
> 3b5aa9dc3b7ac5893c1d281ae1326be0e9ed8aad 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 
> 76d3bfd378f32fd2b216b3ebdec86e2070491924 
> 
> Diff: https://reviews.apache.org/r/24214/diff/
> 
> 
> Testing
> -------
> 
> /*TestLogCleaning stress test output for compressed messages*/
> 
> Producing 100000 messages...
> Logging produce requests to 
> /tmp/kafka-log-cleaner-produced-6014466306002699464.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to 
> /tmp/kafka-log-cleaner-consumed-177538909590644701.txt
> 100000 rows of data produced, 13165 rows of data consumed (86.8% reduction).
> De-duplicating and validating output files...
> Validated 9005 values, 0 mismatches.
> 
> Producing 1000000 messages...
> Logging produce requests to 
> /tmp/kafka-log-cleaner-produced-3298578695475992991.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to 
> /tmp/kafka-log-cleaner-consumed-7192293977610206930.txt
> 1000000 rows of data produced, 119926 rows of data consumed (88.0% reduction).
> De-duplicating and validating output files...
> Validated 89947 values, 0 mismatches.
> 
> Producing 10000000 messages...
> Logging produce requests to 
> /tmp/kafka-log-cleaner-produced-3336255463347572934.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to 
> /tmp/kafka-log-cleaner-consumed-9149188270705707725.txt
> 10000000 rows of data produced, 1645281 rows of data consumed (83.5% 
> reduction).
> De-duplicating and validating output files...
> Validated 899853 values, 0 mismatches.
> 
> 
> /*TestLogCleaning stress test output for non-compressed messages*/
> 
> Producing 100000 messages...
> Logging produce requests to 
> /tmp/kafka-log-cleaner-produced-5174543709786189363.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to 
> /tmp/kafka-log-cleaner-consumed-5143455017777144701.txt
> 100000 rows of data produced, 22775 rows of data consumed (77.2% reduction).
> De-duplicating and validating output files...
> Validated 17874 values, 0 mismatches.
> 
> Producing 1000000 messages...
> Logging produce requests to 
> /tmp/kafka-log-cleaner-produced-7814446915546169271.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to 
> /tmp/kafka-log-cleaner-consumed-5172557663160447626.txt
> 1000000 rows of data produced, 129230 rows of data consumed (87.1% reduction).
> De-duplicating and validating output files...
> Validated 89947 values, 0 mismatches.
> 
> Producing 10000000 messages...
> Logging produce requests to 
> /tmp/kafka-log-cleaner-produced-6092986571905399164.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to 
> /tmp/kafka-log-cleaner-consumed-63626021421841220.txt
> 10000000 rows of data produced, 1136608 rows of data consumed (88.6% 
> reduction).
> De-duplicating and validating output files...
> Validated 899853 values, 0 mismatches.
> 
> 
> Thanks,
> 
> Manikumar Reddy O
> 
>
