It's not necessarily the wrong tool, since deduplication is a standard scenario;
I just want to set expectations. If you have enough memory, it might make sense
to do it all in memory with an in-memory store. It depends on whether disk or
memory space is at a premium.
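
To make the memory trade-off concrete, here is a minimal sketch of in-memory
deduplication in plain Java. It is not the Kafka Streams API: the class
InMemoryDeduplicator, the method firstSeen, and the maxEntries cap are all
hypothetical names chosen for illustration. The assumption is that forgetting
the oldest keys once the cap is reached is acceptable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a size-bounded "seen keys" set held entirely in memory.
// All names here are hypothetical and not part of Kafka Streams.
public class InMemoryDeduplicator {
    private final Map<String, Boolean> seen;

    public InMemoryDeduplicator(final int maxEntries) {
        // accessOrder = false keeps insertion order, so the oldest key is evicted first.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Called after each insertion; returning true drops the eldest entry.
                return size() > maxEntries;
            }
        };
    }

    /** Returns true the first time a key is seen, false while it is still remembered. */
    public boolean firstSeen(String key) {
        return seen.putIfAbsent(key, Boolean.TRUE) == null;
    }

    /** Number of keys currently remembered; never exceeds maxEntries. */
    public int rememberedKeys() {
        return seen.size();
    }

    public static void main(String[] args) {
        InMemoryDeduplicator dedup = new InMemoryDeduplicator(2);
        System.out.println(dedup.firstSeen("a")); // first occurrence
        System.out.println(dedup.firstSeen("a")); // duplicate
    }
}
```

With mostly unique keys, every record adds an entry, so memory use grows with
the number of distinct keys until the cap is hit. A key evicted by the cap is
treated as new if it shows up again, which is the price of bounding the state.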

Thanks
Eno

> On Mar 10, 2017, at 11:05 AM, Ian Duffy <i...@ianduffy.ie> wrote:
> 
> Hi Eno,
> 
> Thanks for the fast response.
> 
> We are doing a deduplication process here, so yes, you are correct: the keys
> are normally unique. Sounds like a wrong-tool-for-the-job issue on my end.
> 
> Thanks for your input here.
> 
> 
> 
> On 10 March 2017 at 10:59, Eno Thereska <eno.there...@gmail.com> wrote:
> 
>> Hi Ian,
>> 
>> Sounds like you have a total topic size of ~20 GB (96 partitions x 200 MB).
>> If most keys are unique, then groupBy and reduce might not shrink the data
>> much, because each unique key keeps its own entry in the state store. Can
>> you comment on the key distribution? Are most keys unique, or do you expect
>> lots of repeated keys in the topic?
>> 
>> Thanks
>> Eno
>> 
>> 
>>> On Mar 10, 2017, at 9:05 AM, Ian Duffy <i...@ianduffy.ie> wrote:
>>> 
>>> Hi All,
>>> 
>>> I'm doing a groupBy and reduce on a KStream, which results in a state
>>> store being created.
>>> 
>>> This state store is growing to be massive; it has filled up a 20 GB drive.
>>> This feels very unexpected. Is there some cleanup or flushing process for
>>> the state stores that I'm missing, or is such a large size expected?
>>> 
>>> The topic in question has 96 partitions, and the state averages about
>>> 200 MB per partition:
>>> 
>>> 175M 1_0
>>> 266M 1_1
>>> 164M 1_10
>>> 177M 1_11
>>> 142M 1_12
>>> 271M 1_13
>>> 158M 1_14
>>> 280M 1_15
>>> 286M 1_16
>>> 181M 1_17
>>> 185M 1_18
>>> 187M 1_19
>>> 281M 1_2
>>> 278M 1_20
>>> 188M 1_21
>>> 262M 1_22
>>> 166M 1_23
>>> 177M 1_24
>>> 268M 1_25
>>> 264M 1_26
>>> 147M 1_27
>>> 179M 1_28
>>> 276M 1_29
>>> 177M 1_3
>>> 157M 1_30
>>> 137M 1_31
>>> 247M 1_32
>>> 275M 1_33
>>> 169M 1_34
>>> 267M 1_35
>>> 283M 1_36
>>> 171M 1_37
>>> 166M 1_38
>>> 277M 1_39
>>> 160M 1_4
>>> 273M 1_40
>>> 278M 1_41
>>> 279M 1_42
>>> 170M 1_43
>>> 139M 1_44
>>> 272M 1_45
>>> 179M 1_46
>>> 283M 1_47
>>> 263M 1_48
>>> 267M 1_49
>>> 181M 1_5
>>> 282M 1_50
>>> 166M 1_51
>>> 161M 1_52
>>> 176M 1_53
>>> 152M 1_54
>>> 172M 1_55
>>> 148M 1_56
>>> 268M 1_57
>>> 144M 1_58
>>> 177M 1_59
>>> 271M 1_6
>>> 279M 1_60
>>> 266M 1_61
>>> 194M 1_62
>>> 177M 1_63
>>> 267M 1_64
>>> 177M 1_65
>>> 271M 1_66
>>> 175M 1_67
>>> 168M 1_68
>>> 140M 1_69
>>> 175M 1_7
>>> 173M 1_70
>>> 179M 1_71
>>> 178M 1_72
>>> 166M 1_73
>>> 180M 1_74
>>> 177M 1_75
>>> 276M 1_76
>>> 177M 1_77
>>> 162M 1_78
>>> 266M 1_79
>>> 194M 1_8
>>> 158M 1_80
>>> 187M 1_81
>>> 162M 1_82
>>> 163M 1_83
>>> 177M 1_84
>>> 286M 1_85
>>> 165M 1_86
>>> 171M 1_87
>>> 162M 1_88
>>> 179M 1_89
>>> 145M 1_9
>>> 166M 1_90
>>> 190M 1_91
>>> 159M 1_92
>>> 284M 1_93
>>> 172M 1_94
>>> 149M 1_95
>> 
>> 
