Hi Jing,

While I agree that NDV is a little confusing at first sight, it seems
quite concise once you know what it means. So personally I am OK with
keeping it as is, though proper documentation would be helpful. If we really
want to replace it with a more professional name, *cardinality* might be a
good alternative.
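
To illustrate the *cardinality* option, here is a minimal sketch. It is not
the actual Flink class: ColumnStatsSketch, its constructor, and
getCardinality() are hypothetical names used only for illustration. It
assumes we would keep getNdv() as a deprecated alias so that any external
callers keep working.

```java
/**
 * Hypothetical sketch only; this is NOT Flink's ColumnStats.
 * The class name, the constructor, and getCardinality() are assumptions.
 */
final class ColumnStatsSketch {

    /** Number of distinct values (NDV) in the column; null if unknown. */
    private final Long cardinality;

    ColumnStatsSketch(Long cardinality) {
        this.cardinality = cardinality;
    }

    /** Returns the number of distinct values, or null if unknown. */
    public Long getCardinality() {
        return cardinality;
    }

    /**
     * Old accessor kept for compatibility; it reads the same single field.
     *
     * @deprecated use {@link #getCardinality()} instead.
     */
    @Deprecated
    public Long getNdv() {
        return getCardinality();
    }
}
```

Backing both getters with one field means the old and the new accessor can
never disagree, which two separate fields would not guarantee.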

Thanks,

Jiangjie (Becket) Qin

On Thu, Jun 2, 2022 at 12:51 AM Jing Ge <j...@ververica.com> wrote:

> Hi Dev,
>
> I am not really sure whether this is the right place to start this
> discussion, but according to the contribution guidelines, the dev ML is
> where consensus should be reached.
>
> In ColumnStats, ndv, which stands for "number of distinct values", is
> currently used. First, the abbreviation makes the meaning difficult to
> understand. Second, it might be good to use a more professional name
> instead.
>
> Suggestion: replace ndv with granularityNumber.
>
> The good news, AFAIK, is that the method getNdv() isn't used anywhere
> within Flink, which means the renaming will have very limited impact.
>
> class ColumnStats {
>
>     /** Number of distinct values. */
>     @Deprecated
>     private final Long ndv;
>
>     /**
>      * Granularity refers to the level of detail used to sort and separate
>      * data at the column level. Highly granular data is categorized or
>      * separated very precisely. For example, the granularity number of a
>      * gender column should normally be 2, and the granularity number of a
>      * month column will be 12. In the SQL world, it means the number of
>      * distinct values.
>      */
>     private final Long granularityNumber;
>
>     @Deprecated
>     public Long getNdv() {
>         return ndv;
>     }
>
>     public Long getGranularityNumber() {
>         return granularityNumber;
>     }
> }
>
> Best regards,
> --
>
> Jing
>
