[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429218#comment-15429218 ]
Julien Le Dem edited comment on ARROW-81 at 8/20/16 4:53 AM:
-------------------------------------------------------------

[~wesmckinn] and [~emkornfi...@gmail.com]: the proposal of adding a categorical_type field to Field sounds good to me.
When it is set, the field must be dictionary encoded.

was (Author: julienledem):
[~wesmckinn] and [~emkornfi...@gmail.com]: the proposal of adding categorical_type fiel to Field sounds good to me.
when it is set then it is required that the field is dictionary encoded.

> [Format] Add a Category logical type (distinct from dictionary-encoding)
> ------------------------------------------------------------------------
>
>                 Key: ARROW-81
>                 URL: https://issues.apache.org/jira/browse/ARROW-81
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> A Category (or "factor") is a dictionary-encoded array whose dictionary has
> semantic meaning. The data consists of
> - An array of integer "codes"
> - A child array of some other type, known as the "categories" or "levels" of
>   the array.
> Typically there is an "ordered" boolean flag indicating whether
> the order of the categories is meaningful.
> Category/factor types are used in a number of common statistical analyses.
> See, for example,
> http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It is a
> basic requirement for Python and R, at least, as Arrow C++ consumers, to have
> this type. Separately, we should consider what is necessary to be able to
> transmit category data in IPCs, possibly via an expansion of the Arrow format.
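
To make the layout in the issue description concrete, here is a minimal sketch in plain Python of a category value as integer codes, a categories (levels) array, and an ordered flag. The `Category` class and the `encode` helper are hypothetical names chosen for illustration; they are not part of any Arrow API, and the eventual format representation may well differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Category:
    """Hypothetical container for illustration only, not an Arrow type."""
    codes: List[int]        # one integer per value, indexing into `categories`
    categories: List[str]   # the dictionary ("levels"), which carries meaning
    ordered: bool = False   # True if the order of `categories` is meaningful

    def decode(self) -> List[str]:
        # Map each code back to the category it refers to.
        return [self.categories[c] for c in self.codes]


def encode(values: Sequence[str], categories: Sequence[str],
           ordered: bool = False) -> Category:
    # Build a lookup from category value to its position in the dictionary.
    index = {cat: i for i, cat in enumerate(categories)}
    return Category([index[v] for v in values], list(categories), ordered)


ratings = encode(["low", "high", "medium", "low"],
                 ["low", "medium", "high"], ordered=True)
print(ratings.codes)     # [0, 2, 1, 0]
print(ratings.decode())  # ['low', 'high', 'medium', 'low']
```

The difference from plain dictionary encoding is that the categories list and the ordered flag are part of the type's meaning here, rather than an interchangeable compression detail.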