Hi Jing,

While I do agree that NDV is a little confusing at first sight, it is quite concise once you get the meaning. So personally I am OK with keeping it as is, but proper documentation would be helpful. If we really want to replace it with a more professional name, *cardinality* might be a good alternative.
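To make the meaning concrete: NDV is simply the cardinality of a column, i.e. how many distinct values it holds. Below is a minimal sketch that computes it exactly over an in-memory list. The class `NdvExample` and the method `countDistinct` are hypothetical and are not part of Flink's `ColumnStats` API. Skipping nulls is also an assumption made for this illustration. The gender and month examples mirror the ones in the original proposal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NdvExample {

    /**
     * Hypothetical helper: exact number of distinct values (NDV), i.e. the
     * cardinality of a column. Nulls are skipped here by assumption.
     */
    static long countDistinct(List<?> column) {
        return column.stream()
                .filter(Objects::nonNull) // ignore nulls (assumption)
                .distinct()               // keep one copy of each value
                .count();                 // NDV
    }

    public static void main(String[] args) {
        // A gender column has two distinct values however many rows it has.
        List<String> gender = Arrays.asList("F", "M", "F", null, "M");

        // Three years of monthly rows: 1..12 repeated three times.
        List<Integer> month = IntStream.rangeClosed(1, 36)
                .map(i -> (i - 1) % 12 + 1)
                .boxed()
                .collect(Collectors.toList());

        System.out.println(countDistinct(gender)); // 2
        System.out.println(countDistinct(month));  // 12
    }
}
```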
Thanks,

Jiangjie (Becket) Qin

On Thu, Jun 2, 2022 at 12:51 AM Jing Ge <j...@ververica.com> wrote:

> Hi Dev,
>
> I am not really sure if it is feasible to start this discussion. According
> to the contribution guidelines, dev ml is the right place to reach
> consensus.
>
> In ColumnStats, ndv, which stands for "number of distinct values", is
> currently used. First of all, it is difficult to understand the meaning
> from the abbreviation. Second, it might be good to use a professional
> name instead.
>
> Suggestion:
>
> replace ndv with granularityNumber:
>
> The good news, afaik, is that the method getNdv() hasn't been used within
> Flink, which means the renaming will have very limited impact.
>
> public class ColumnStats {
>
>     /** number of distinct values. */
>     @Deprecated
>     private final Long ndv;
>
>     /**
>      * Granularity refers to the level of detail used to sort and separate
>      * data at column level. Highly granular data is categorized or separated
>      * very precisely. For example, the granularity number of a gender column
>      * should normally be 2. The granularity number of the month column will
>      * be 12. In the SQL world, it means the number of distinct values.
>      */
>     private final Long granularityNumber;
>
>     // ... other fields and the constructor omitted
>
>     @Deprecated
>     public Long getNdv() {
>         return ndv;
>     }
>
>     public Long getGranularityNumber() {
>         return granularityNumber;
>     }
> }
>
> Best regards,
> --
>
> Jing