It’s not necessarily the wrong tool, since deduplication is a standard scenario; I’m just setting expectations. If you have enough memory, it might make sense to do it all in memory with an in-memory state store instead of the default on-disk one. It depends on whether disk or memory space is at a premium.
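To make the memory trade-off concrete, here is a rough sketch of bounded in-memory deduplication in plain Java. It is not the Kafka Streams API: the class name `BoundedDeduplicator`, the method `firstOccurrence`, and the `maxKeys` cap are hypothetical names chosen for illustration. It assumes it is acceptable to forget the oldest keys once the cap is reached, which means a duplicate arriving after its key has been evicted will be treated as new.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Kafka Streams. */
final class BoundedDeduplicator<K> {
    private final Map<K, Boolean> seen;

    BoundedDeduplicator(final int maxKeys) {
        // Insertion-ordered map that drops its oldest key once maxKeys is exceeded,
        // so memory use stays bounded even when almost every key is unique.
        this.seen = new LinkedHashMap<K, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                return size() > maxKeys;
            }
        };
    }

    /** Returns true if the key is not currently remembered, and remembers it. */
    boolean firstOccurrence(K key) {
        return seen.put(key, Boolean.TRUE) == null;
    }

    /** Number of keys currently remembered (never more than maxKeys). */
    int size() {
        return seen.size();
    }
}
```

A record would be forwarded downstream only when `firstOccurrence(key)` returns true. The cap is what keeps the footprint predictable: with mostly unique keys, an unbounded store grows with the topic, whichever medium it lives on.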
Thanks
Eno

> On Mar 10, 2017, at 11:05 AM, Ian Duffy <i...@ianduffy.ie> wrote:
>
> Hi Eno,
>
> Thanks for the fast response.
>
> We are doing a deduplication process here, so yes you are correct the keys
> are normally unique. Sounds like a wrong tool for the job issue on my end.
>
> Thanks for your input here.
>
>
>
> On 10 March 2017 at 10:59, Eno Thereska <eno.there...@gmail.com> wrote:
>
>> Hi Ian,
>>
>> Sounds like you have a total topic size of ~20GB (96 partitions x 200mb).
>> If most keys are unique then group and reduce might not be as effective in
>> grouping/reducing. Can you comment on the key distribution? Are most keys
>> unique? Or do you expect lots of keys to be the same in the topic?
>>
>> Thanks
>> Eno
>>
>>
>>> On Mar 10, 2017, at 9:05 AM, Ian Duffy <i...@ianduffy.ie> wrote:
>>>
>>> Hi All,
>>>
>>> I'm doing a groupBy and reduce on a kstream which results in a state store
>>> being created.
>>>
>>> This state store is growing to be massive, its filled up a 20gb drive. This
>>> feels very unexpected. Is there some cleanup or flushing process for the
>>> state stores that I'm missing or is such a large size expected?
>>>
>>> The topic in question has 96 partitions and the state is about ~200mb
>>> average for each one.
>>>
>>> 175M 1_0
>>> 266M 1_1
>>> 164M 1_10
>>> 177M 1_11
>>> 142M 1_12
>>> 271M 1_13
>>> 158M 1_14
>>> 280M 1_15
>>> 286M 1_16
>>> 181M 1_17
>>> 185M 1_18
>>> 187M 1_19
>>> 281M 1_2
>>> 278M 1_20
>>> 188M 1_21
>>> 262M 1_22
>>> 166M 1_23
>>> 177M 1_24
>>> 268M 1_25
>>> 264M 1_26
>>> 147M 1_27
>>> 179M 1_28
>>> 276M 1_29
>>> 177M 1_3
>>> 157M 1_30
>>> 137M 1_31
>>> 247M 1_32
>>> 275M 1_33
>>> 169M 1_34
>>> 267M 1_35
>>> 283M 1_36
>>> 171M 1_37
>>> 166M 1_38
>>> 277M 1_39
>>> 160M 1_4
>>> 273M 1_40
>>> 278M 1_41
>>> 279M 1_42
>>> 170M 1_43
>>> 139M 1_44
>>> 272M 1_45
>>> 179M 1_46
>>> 283M 1_47
>>> 263M 1_48
>>> 267M 1_49
>>> 181M 1_5
>>> 282M 1_50
>>> 166M 1_51
>>> 161M 1_52
>>> 176M 1_53
>>> 152M 1_54
>>> 172M 1_55
>>> 148M 1_56
>>> 268M 1_57
>>> 144M 1_58
>>> 177M 1_59
>>> 271M 1_6
>>> 279M 1_60
>>> 266M 1_61
>>> 194M 1_62
>>> 177M 1_63
>>> 267M 1_64
>>> 177M 1_65
>>> 271M 1_66
>>> 175M 1_67
>>> 168M 1_68
>>> 140M 1_69
>>> 175M 1_7
>>> 173M 1_70
>>> 179M 1_71
>>> 178M 1_72
>>> 166M 1_73
>>> 180M 1_74
>>> 177M 1_75
>>> 276M 1_76
>>> 177M 1_77
>>> 162M 1_78
>>> 266M 1_79
>>> 194M 1_8
>>> 158M 1_80
>>> 187M 1_81
>>> 162M 1_82
>>> 163M 1_83
>>> 177M 1_84
>>> 286M 1_85
>>> 165M 1_86
>>> 171M 1_87
>>> 162M 1_88
>>> 179M 1_89
>>> 145M 1_9
>>> 166M 1_90
>>> 190M 1_91
>>> 159M 1_92
>>> 284M 1_93
>>> 172M 1_94
>>> 149M 1_95
>>
>>