Hi Ian,

Sounds like you have a total topic size of ~20 GB (96 partitions x ~200 MB each). If most keys are unique, the groupBy/reduce won't compact much: the state store keeps the latest reduced value for every distinct key, so with mostly-unique keys it grows to roughly the size of the topic itself. Can you comment on the key distribution? Are most keys unique, or do you expect many records in the topic to share the same key?
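For reference, here is a minimal sketch of the pattern in question, written against the 0.10.x Streams API; the topic name, store name, serdes, and reducer are placeholders, not taken from your app:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class ReduceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "reduce-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        // groupByKey().reduce(...) materializes a RocksDB state store holding
        // one entry per distinct key. If most keys are unique, the store ends
        // up roughly as large as the input topic itself.
        input.groupByKey()
             .reduce((oldValue, newValue) -> newValue, "my-store");

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
    }
}

The key point is that the store's size is bounded by the number of distinct keys, not by the record count, so knowing the key distribution tells us whether this growth is expected.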
Thanks
Eno

> On Mar 10, 2017, at 9:05 AM, Ian Duffy <i...@ianduffy.ie> wrote:
>
> Hi All,
>
> I'm doing a groupBy and reduce on a KStream, which results in a state
> store being created.
>
> This state store is growing to be massive; it's filled up a 20 GB drive.
> This feels very unexpected. Is there some cleanup or flushing process for
> the state stores that I'm missing, or is such a large size expected?
>
> The topic in question has 96 partitions, and the state is about ~200 MB
> on average for each one:
>
> 175M 1_0
> 266M 1_1
> 164M 1_10
> 177M 1_11
> 142M 1_12
> 271M 1_13
> 158M 1_14
> 280M 1_15
> 286M 1_16
> 181M 1_17
> 185M 1_18
> 187M 1_19
> 281M 1_2
> 278M 1_20
> 188M 1_21
> 262M 1_22
> 166M 1_23
> 177M 1_24
> 268M 1_25
> 264M 1_26
> 147M 1_27
> 179M 1_28
> 276M 1_29
> 177M 1_3
> 157M 1_30
> 137M 1_31
> 247M 1_32
> 275M 1_33
> 169M 1_34
> 267M 1_35
> 283M 1_36
> 171M 1_37
> 166M 1_38
> 277M 1_39
> 160M 1_4
> 273M 1_40
> 278M 1_41
> 279M 1_42
> 170M 1_43
> 139M 1_44
> 272M 1_45
> 179M 1_46
> 283M 1_47
> 263M 1_48
> 267M 1_49
> 181M 1_5
> 282M 1_50
> 166M 1_51
> 161M 1_52
> 176M 1_53
> 152M 1_54
> 172M 1_55
> 148M 1_56
> 268M 1_57
> 144M 1_58
> 177M 1_59
> 271M 1_6
> 279M 1_60
> 266M 1_61
> 194M 1_62
> 177M 1_63
> 267M 1_64
> 177M 1_65
> 271M 1_66
> 175M 1_67
> 168M 1_68
> 140M 1_69
> 175M 1_7
> 173M 1_70
> 179M 1_71
> 178M 1_72
> 166M 1_73
> 180M 1_74
> 177M 1_75
> 276M 1_76
> 177M 1_77
> 162M 1_78
> 266M 1_79
> 194M 1_8
> 158M 1_80
> 187M 1_81
> 162M 1_82
> 163M 1_83
> 177M 1_84
> 286M 1_85
> 165M 1_86
> 171M 1_87
> 162M 1_88
> 179M 1_89
> 145M 1_9
> 166M 1_90
> 190M 1_91
> 159M 1_92
> 284M 1_93
> 172M 1_94
> 149M 1_95