[ https://issues.apache.org/jira/browse/KAFKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923060#comment-13923060 ]
Joe Stein edited comment on KAFKA-1275 at 3/6/14 9:05 PM:
----------------------------------------------------------

Maybe add an example after the paragraph "Log compaction is a mechanism to give finer-grained per-record retention, rather than the coarser-grained time-based retention. The idea is to selectively remove records where we have a more recent update with the same primary key. This way the log is guaranteed to have at least the last state for each key." Something like:

Let's say you have a topic where a message is sent every time the inventory information for a product changes:

Message 1 with primary key PRODUCTID99876 is produced.
The inventory for PRODUCTID99876 changes, so Message 2 with primary key PRODUCTID99876 is produced.
The inventory changes again for PRODUCTID99876, so Message 3 with primary key PRODUCTID99876 is produced.

Since this data stream only carries the inventory information as a snapshot, the older messages are likely no longer relevant (for this example we are only interested in the latest value). Compaction would get rid of messages 1 and 2, leaving only message 3, as it is the current state (a producer sketch of this follows below)... or pick something from the use cases you mentioned above in the docs (my example is not very realistic, I am just trying to express the thought).

I think it looks good (but now that I understand it I may no longer be very objective).

Another option (instead of the example, or in addition to it, not sure which) is, after the log compaction diagram (the before and after compaction), to explain that offsets 0 and 2 are not copied because the latest state for primary key K1 is at offset 3 (so that one is kept), offsets 1 and 5 are not copied because the latest state for primary key K2 is at offset 9, etc.
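To make the inventory example concrete, the docs could even pair it with a tiny producer snippet showing keyed messages. A minimal sketch against the 0.8 Java producer API; the topic name "inventory", the broker address, and the quantity payloads are made-up illustration values:

{code:java}
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class InventoryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumes a local broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // Three updates for the same primary key. After compaction, only the
        // last value ("quantity=7") is guaranteed to remain in the log.
        producer.send(new KeyedMessage<String, String>("inventory", "PRODUCTID99876", "quantity=12"));
        producer.send(new KeyedMessage<String, String>("inventory", "PRODUCTID99876", "quantity=9"));
        producer.send(new KeyedMessage<String, String>("inventory", "PRODUCTID99876", "quantity=7"));

        producer.close();
    }
}
{code}

For compaction to actually kick in, the "inventory" topic would also need the compaction cleanup policy set on it (if I remember the topic config right, cleanup.policy=compact); otherwise the default time-based retention applies.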
> fixes for quickstart documentation
> ----------------------------------
>
>                 Key: KAFKA-1275
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1275
>             Project: Kafka
>          Issue Type: Bug
>          Components: website
>    Affects Versions: 0.8.1
>            Reporter: Evan Zacks
>            Assignee: Jay Kreps
>            Priority: Minor
>              Labels: documentation
>             Fix For: 0.8.1
>
>         Attachments: KAFKA-1275-quickstart-doc.patch
>
>
> The quickstart guide refers to commands that no longer exist in the master
> git branch per changes in KAFKA-554.
>
> If changes for the documentation to match 0.8.1 are already in development
> elsewhere, please feel free to discard this issue.