SQL's NULL obviously dominates thinking on this topic, but I think R is a
better example of how NULL handling can work fairly well.  NULL is a key
concept and very helpful in a number of settings.  Given R's fairly simple
functional nature, it is easy to filter data, and most functions have
options to adjust their treatment of NULL.  R also has the value NA, which
is intended to represent missing values, and it supports NaN (not-a-number)
for the results of undefined numerical operations.
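As a rough analogy in Python (an assumption for illustration, since Python
has no direct equivalent of R's NA: here None stands in for NA and
float('nan') for NaN, and the mean_skip_missing helper plays the role of
R's na.rm=TRUE option):

```python
import math

# Rough Python analogy to R's NA/NaN distinction (assumption: None
# stands in for R's NA "missing value", float('nan') for R's NaN
# "undefined numeric result").
values = [1.0, None, 3.0, float('nan')]

# Like R's mean(x, na.rm=TRUE): drop missing and undefined values
# before aggregating, instead of letting them poison the result.
def mean_skip_missing(xs):
    kept = [x for x in xs if x is not None and not math.isnan(x)]
    return sum(kept) / len(kept) if kept else float('nan')

mean_skip_missing(values)  # 2.0: only 1.0 and 3.0 are kept
```

The point is that the caller chooses, per call, whether missing values are
skipped, which is much less surprising than SQL's implicit three-valued
logic.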

My own view comes down to the fact that I am rarely surprised by NULL, NA,
or NaN in R, and often surprised by NULL in SQL.  I don't draw strong
conclusions from that, other than to think that R did something right and
SQL did something wrong (from my point of view).

None of this matters if compatibility with SQL is the primary requirement.
In that case, I say just do it.


On Mon, Jun 15, 2015 at 8:45 AM, Maximilian Michels <m...@apache.org> wrote:

> Hi everyone,
>
> I'm seeing a lot of null value related pull requests nowadays, like these:
>
> https://github.com/apache/flink/pull/780
> https://github.com/apache/flink/pull/831
> https://github.com/apache/flink/pull/834
>
> It used to be the case that null values were simply not supported by Flink.
> Recently, Flink supports null values for some components. Now I'm wondering
> what the current state of null values in Flink is. While ignoring null
> values might be good for not crashing your programs, null values are
> generally a bad way of signaling empty values for which better strategies
> are available. My intuition would be that it is a bit evil to support them
> in DataSets.
>
> Just to give an idea of what null values could cause in Flink:
> DataSet.count() returns the number of all elements in a DataSet (null or
> not), while #834 would ignore null values and aggregate the DataSet
> without them.
>
> Best,
> Max
>
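The discrepancy Max describes can be sketched roughly as follows
(illustration only, not the actual Flink API; plain Python lists stand in
for a DataSet):

```python
# Sketch of the count-vs-aggregate discrepancy (illustration only, not
# the Flink API): count() sees every element, null or not, while an
# aggregation that silently skips nulls operates on fewer elements.
data = [3, None, 5, None, 7]

count_all = len(data)                         # 5: nulls are elements too
non_null = [x for x in data if x is not None]
aggregated = sum(non_null)                    # 15, computed over only 3 elements
```

If count() reports 5 elements but an aggregate was computed over 3, the two
results are inconsistent with each other, which is exactly the kind of
surprise worth avoiding.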
