Github user Shiti commented on the pull request:

    https://github.com/apache/flink/pull/867#issuecomment-115936916
  
    @StephanEwen, apologies, I didn't notice the earlier message in JIRA. 
Something was wrong with my Gmail settings, and most of the messages from JIRA 
and the mailing list went into spam.
    
    Kindly excuse my limited understanding of this framework and the 
intention/drivers behind the decisions made. 
    
    Going through the mailing list and the ticket, I realized that although 
there may be some valid cases of missing data types, it is not desirable to 
change the `TupleTypeInfo` and the whole Tuple/Case Class Serialization 
code-base to support null, and we should identify an alternative approach to 
handle this.
    
    From my limited understanding, the recommended way of working with missing 
values is to use `(Option[Int], Option[Int])` instead of `(Int, Int)` when we 
know there can be missing values in the data. Is that correct?
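
    For illustration, a minimal sketch of what application code looks like 
with the `(Option[Int], Option[Int])` shape. It uses plain Scala collections 
rather than a Flink `DataSet`, and the names `OptionTupleSketch`, `sumOrZero` 
and `complete` are hypothetical, chosen only for this example:

```scala
// Plain-Scala sketch (no Flink dependency); all names here are hypothetical.
object OptionTupleSketch {
  // Each field may be missing, so it is wrapped in Option.
  type Row = (Option[Int], Option[Int])

  // Sum the two fields, treating a missing value as 0.
  // Defaulting to 0 is an arbitrary policy picked for illustration.
  def sumOrZero(row: Row): Int =
    row._1.getOrElse(0) + row._2.getOrElse(0)

  // Keep only the rows in which both fields are present.
  def complete(rows: Seq[Row]): Seq[(Int, Int)] =
    rows.collect { case (Some(a), Some(b)) => (a, b) }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq((Some(1), Some(2)), (Some(3), None), (None, None))
    println(rows.map(sumOrZero)) // List(3, 3, 0)
    println(complete(rows))      // List((1,2))
  }
}
```

    Every access to a field has to state what happens when the value is 
absent, which is the extra verbosity in question.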
    
    If that is correct, I have a few questions:
    
    1. Doesn't this push the handling of missing data to the application code 
(which may be good or bad) and make the application code more verbose?
    2. Wouldn't the size of `Option[Int]` in memory (and also in 
serialization) be more than just `Int`?
    3. If Flink does not support null values except in the Table API, 
won’t there be an inconsistency when users try to convert a `Table` to a 
`DataSet[Tuple]`? 
    
    One alternative approach I can think of is introducing another `TypeInfo` 
which supports null values (say `TupleTypeInfoWithNull`) so users can choose 
to use that when they know/think that the data may contain null.
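
    To make the size question concrete, here is a rough sketch of how a 
null-aware encoding of a pair of `Int`s could work, with one presence byte 
written before each field. This is not Flink code and does not use Flink's 
serializer interfaces; `NullableTupleSketch`, `serialize` and `deserialize` 
are hypothetical names, and only `java.io` streams are used:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical sketch, not Flink code: a presence flag precedes each field.
object NullableTupleSketch {
  type Row = (Option[Int], Option[Int])

  // Write one flag byte, followed by the 4-byte value only if it is present.
  private def writeField(out: DataOutputStream, v: Option[Int]): Unit = v match {
    case Some(x) =>
      out.writeBoolean(true)
      out.writeInt(x)
    case None =>
      out.writeBoolean(false)
  }

  // Read the flag byte, then the value only if the flag says it is present.
  private def readField(in: DataInputStream): Option[Int] =
    if (in.readBoolean()) Some(in.readInt()) else None

  def serialize(row: Row): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    writeField(out, row._1)
    writeField(out, row._2)
    out.flush()
    buf.toByteArray
  }

  def deserialize(bytes: Array[Byte]): Row = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val first = readField(in)
    val second = readField(in)
    (first, second)
  }
}
```

    Under this particular encoding, a fully populated pair takes 10 bytes 
instead of the 8 bytes of two raw `Int`s, while a pair with both values 
missing takes 2 bytes.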


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
