SQL obviously dominates discussions of NULL, but I think R is a better example of how NULL handling can work fairly well. NULL is a key concept and very helpful in a number of settings. Because of R's fairly simple functional nature, it is easy to filter data, and most functions have options to adjust their treatment of nulls. R also has the value NA, which is intended to represent missing values, and it supports NaN (not-a-number) for the results of undefined numerical operations.
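For what it's worth, the distinction R draws can be sketched in Python terms (a rough analogy of mine, not R itself; the variable names are illustrative). NaN propagates through arithmetic as an undefined numeric result, while missing values have to be dropped explicitly, much like na.rm=TRUE does for R functions:

```python
import math

# None plays the role of a missing value (R's NA / SQL's NULL);
# float("nan") is an undefined numeric result (NaN, as in R).
values = [1.0, 2.0, None, float("nan")]

# Summing naively won't work: None raises a TypeError in arithmetic,
# and NaN silently poisons the result. Dropping both explicitly is
# the moral equivalent of sum(x, na.rm = TRUE) in R.
cleaned = [v for v in values if v is not None and not math.isnan(v)]

print(sum(cleaned))        # 3.0 -- missing values skipped on request
print(1.0 + float("nan"))  # nan -- undefined operations propagate
```

The point is that the two failure modes stay distinct and visible, which is why they rarely surprise anyone.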
My own view is based on the fact that I am rarely surprised by NULL, NA, and NaN in R, and often surprised by NULL in SQL. I don't draw strong conclusions from that other than to think that R did something right and SQL did something wrong (from my point of view). None of this matters if compatibility with SQL is the primary requirement; in that case, I say just do it.

On Mon, Jun 15, 2015 at 8:45 AM, Maximilian Michels <m...@apache.org> wrote:
> Hi everyone,
>
> I'm seeing a lot of null value related pull requests nowadays, like these:
>
> https://github.com/apache/flink/pull/780
> https://github.com/apache/flink/pull/831
> https://github.com/apache/flink/pull/834
>
> It used to be the case that null values were simply not supported by Flink.
> Recently, Flink supports null values for some components. Now I'm wondering
> what the current state of null values in Flink is. While ignoring null
> values might be good for not crashing your programs, null values are
> generally a bad way of signaling empty values, for which better strategies
> are available. My intuition would be that it is a bit evil to support them
> in DataSets.
>
> Just to give an idea what null values could cause in Flink: DataSet.count()
> returns the number of elements of all values in a DataSet (null or not),
> while #834 would ignore null values and aggregate the DataSet without them.
>
> Best,
> Max