Hi Pino, could you reply on JIRA so that the discussion stays
in one place?

From what I know about sector categorization, I think this is a
slightly separate question -- here we are only concerned with the
metadata and memory representation of data with a fixed number of
categories (where the categories have some semantic meaning in the
analysis, e.g. ordering).
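
As a rough illustration of that representation (integer codes, a
separate array of categories, and an "ordered" flag, as in the issue
description quoted below), here is a minimal sketch in plain Python.
The names `Category` and `encode` are hypothetical and made up for
this example; they are not an Arrow API, and the sketch ignores nulls
and the actual memory layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    """Hypothetical container, for illustration only (not an Arrow type)."""
    codes: List[int]        # one integer per value, indexing into `categories`
    categories: List[str]   # the fixed set of categories ("levels")
    ordered: bool = False   # True if the order of `categories` is meaningful

    def decode(self) -> List[str]:
        # Map each code back to the category it refers to.
        return [self.categories[code] for code in self.codes]


def encode(values: List[str], categories: List[str], ordered: bool = False) -> Category:
    # Build a lookup from category to its position, then replace
    # every value with the position of its category.
    index = {category: i for i, category in enumerate(categories)}
    return Category([index[v] for v in values], list(categories), ordered)


ratings = encode(
    ["low", "high", "medium", "low"],
    categories=["low", "medium", "high"],
    ordered=True,
)
print(ratings.codes)     # integer codes
print(ratings.decode())  # original values recovered from codes + categories
```

The set of categories here is fixed for the whole array, which is why
categories that change over time look like a separate question.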

On Fri, Aug 19, 2016 at 8:18 AM, pino patera <pino.pat...@gmail.com> wrote:
> For the Financial World, category time series are very important (i.e.
> industry/sector categories are different over time). What would this
> structure look like in this scenario?
>
> On Fri, Aug 19, 2016 at 5:12 PM Jacques Nadeau (JIRA) <j...@apache.org>
> wrote:
>
>>
>>     [
>> https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428316#comment-15428316
>> ]
>>
>> Jacques Nadeau commented on ARROW-81:
>> -------------------------------------
>>
>> Can you guys provide two small example datasets in JSON format here?
>>
>> > C++: Add a Category nested type
>> > -------------------------------
>> >
>> >                 Key: ARROW-81
>> >                 URL: https://issues.apache.org/jira/browse/ARROW-81
>> >             Project: Apache Arrow
>> >          Issue Type: New Feature
>> >          Components: C++
>> >            Reporter: Wes McKinney
>> >            Assignee: Wes McKinney
>> >
>> > A Category (or "factor") is a dictionary-encoded array whose dictionary
>> has semantic meaning. The data consists of
>> > - An array of integer "codes"
>> > - A child array of some other type, known as the "categories" or
>> "levels" of the array. Typically there is an "ordered" boolean flag
>> indicating whether the order of the categories is meaningful.
>> > Category/factor types are used in a number of common statistical
>> analyses. See, for example,
>> http://www.voteview.com/R_Ordered_Logistic_or_Probit_Regression.htm. It
>> is a basic requirement for Python and R, at least, as Arrow C++ consumers,
>> to have this type. Separately, we should consider what is necessary to be
>> able to transmit category data in IPCs -- possibly an expansion of the
>> Arrow format.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>
