[ https://issues.apache.org/jira/browse/ARROW-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192711#comment-15192711 ]
Wes McKinney commented on ARROW-62:
-----------------------------------

Since we already have production code (i.e. Drill) using 0 as null, and it's consistent with Postgres, I'm inclined to stick with that.

I expect that the null bitmap will also be used in practice in conjunction with evaluated predicates, so in aggregations you will include values that are included and not null. If nulls are 1, then you need to use {{included[i] & ~nulls[i]}} versus {{included[i] & valid[i]}}.

> Format: Are the nulls bits 0 or 1 for null values?
> --------------------------------------------------
>
>                 Key: ARROW-62
>                 URL: https://issues.apache.org/jira/browse/ARROW-62
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Format
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> As brought up by Dan Robinson on the mailing list (thank you for catching
> this!), there is an inconsistency in the format documents in the
> representation of nulls with the ValueVectors code import -- since I drafted
> these format documents initially I'll take the blame for the inconsistency,
> but:
> * Drill / ValueVectors uses the value 0 for null data, and 1 for non-null data
> * The format document currently states the opposite (values are null if the
> bit is set)
> I can see arguments both ways, but one argument for the ValueVectors style is
> that values must be explicitly set to be non-null, versus uninitialized
> values being accidentally interpreted as being non-null. When initializing a
> bitmap, one can {{memset}} the bits to 0, then set them to 1 when non-null
> values are appended during construction.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
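
For readers following the thread, the "0 means null" convention can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: `ValidityBitmapBuilder`, `masked_sum`, the fixed capacity, and the least-significant-bit-first ordering within each byte are assumptions, not Arrow or Drill APIs. The zero-filled `bytearray` plays the role of the `memset` to 0, and the aggregation uses the `included[i] & valid[i]` form from the comment.

```python
# Hypothetical sketch; none of these names are Arrow or Drill APIs.

class ValidityBitmapBuilder:
    """Builds a validity bitmap where bit 1 = non-null and bit 0 = null."""

    def __init__(self, capacity):
        # bytearray(n) is zero-filled, like memset(bits, 0, n):
        # every slot starts out as null.
        self.bits = bytearray((capacity + 7) // 8)
        self.values = [0] * capacity
        self.length = 0

    def append(self, value):
        i = self.length
        if value is not None:
            self.values[i] = value
            # Explicitly mark the slot as non-null.
            # Assumption: bit i lives at byte i // 8, position i % 8 (LSB first).
            self.bits[i // 8] |= 1 << (i % 8)
        self.length += 1

    def is_valid(self, i):
        return (self.bits[i // 8] >> (i % 8)) & 1


def masked_sum(values, valid, included):
    """Sum the values that pass the predicate and are non-null."""
    return sum(v for v, ok, inc in zip(values, valid, included) if inc & ok)


builder = ValidityBitmapBuilder(4)
for v in [10, None, 30, 40]:
    builder.append(v)

valid = [builder.is_valid(i) for i in range(builder.length)]
included = [1, 1, 0, 1]  # result of some evaluated predicate
total = masked_sum(builder.values, valid, included)
print(valid, total)
```

With nulls stored as 1 instead, the condition in `masked_sum` would need an extra negation of the bitmap bit, which is the asymmetry the comment points out.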