[ https://issues.apache.org/jira/browse/KAFKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923060#comment-13923060 ]
Joe Stein edited comment on KAFKA-1275 at 3/6/14 9:05 PM:
----------------------------------------------------------

Maybe add an example after the paragraph "Log compaction is a mechanism to give finer-grained per-record retention, rather than the coarser-grained time-based retention. The idea is to selectively remove records where we have a more recent update with the same primary key. This way the log is guaranteed to have at least the last state for each key." Something like:

Let's say you have a topic where a message is sent every time the inventory information for a product changes:

Message 1 with primary key PRODUCTID99876 is produced.
The inventory for PRODUCTID99876 changes, so Message 2 with primary key PRODUCTID99876 is produced.
The inventory changes again for PRODUCTID99876, so Message 3 with primary key PRODUCTID99876 is produced.

Since this data stream only carries the inventory information as a snapshot, the older messages are likely no longer relevant (for this example we are only interested in the latest value). Compaction would get rid of messages 1 and 2, leaving only message 3, as it is the current state (a producer sketch of this follows below)... or pick something from the use cases you mentioned above in the docs (my example is not very realistic, I am just trying to express the thought).

I think it looks good (but now that I understand it I may no longer be very objective).

Another option (instead of the example, or in addition to it, not sure which) is, after the log compaction diagram (the before and after compaction), to explain that offsets 0 and 2 are not copied because the latest state for primary key K1 is at offset 3 (so that one is kept), offsets 1 and 5 are not copied because the latest state for primary key K2 is at offset 9, etc.
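To make the inventory example concrete, the docs could even pair it with a tiny producer snippet showing keyed messages. A minimal sketch against the 0.8 Java producer API; the topic name "inventory", the broker address, and the quantity payloads are made-up illustration values:

{code:java}
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class InventoryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumes a local broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // Three updates for the same primary key. After compaction, only the
        // last value ("quantity=7") is guaranteed to remain in the log.
        producer.send(new KeyedMessage<String, String>("inventory", "PRODUCTID99876", "quantity=12"));
        producer.send(new KeyedMessage<String, String>("inventory", "PRODUCTID99876", "quantity=9"));
        producer.send(new KeyedMessage<String, String>("inventory", "PRODUCTID99876", "quantity=7"));

        producer.close();
    }
}
{code}

For compaction to actually kick in, the "inventory" topic would also need the compaction cleanup policy set on it (if I remember the topic config right, cleanup.policy=compact); otherwise the default time-based retention applies.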
> fixes for quickstart documentation
> ----------------------------------
>
>                 Key: KAFKA-1275
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1275
>             Project: Kafka
>          Issue Type: Bug
>          Components: website
>    Affects Versions: 0.8.1
>            Reporter: Evan Zacks
>            Assignee: Jay Kreps
>            Priority: Minor
>              Labels: documentation
>             Fix For: 0.8.1
>
>         Attachments: KAFKA-1275-quickstart-doc.patch
>
>
> The quickstart guide refers to commands that no longer exist in the master
> git branch per changes in KAFKA-554.
>
> If changes for the documentation to match 0.8.1 are already in development
> elsewhere, please feel free to discard this issue.