[ 
https://issues.apache.org/jira/browse/IGNITE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov reassigned IGNITE-1549:
---------------------------------------

    Assignee: Vladimir Ozerov

> Optimize portable object fields write in non-raw mode.
> ------------------------------------------------------
>
>                 Key: IGNITE-1549
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1549
>             Project: Ignite
>          Issue Type: Task
>          Components: general
>    Affects Versions: 1.1.4
>            Reporter: Vladimir Ozerov
>            Assignee: Vladimir Ozerov
>            Priority: Blocker
>             Fix For: ignite-1.5
>
>
> Currently we write user fields as follows:
> 0 .. 3 - field ID;
> 4 - field type;
> 5 .. 8 - field len;
> 9 .. - the field itself.
> It can be optimized as follows:
> 1) Field len usually can be inferred from type. E.g., for int it is 4.
> 2) Frequently used constants can be written as separate types. E.g. INT - 
> normal int, INT_0 - zero, etc.
> 3) Last, but not least, values should be encoded using the "variable bytes" 
> (and possibly ZigZag) algorithm. This should save about 2 bytes per int or 
> long on average (I assume here that longs are usually bigger than 4 bytes, 
> e.g. timestamps).
> *New types will be introduced:*
> 1) Booleans: BOOL_FALSE, BOOL_TRUE;
> 2) Bytes: BYTE_C0 => zero, BYTE_C1 => 1, BYTE_C1N => -1;
> 3) Shorts, chars: SHORT_C0, SHORT_C1, SHORT_C1N;
> 4) Ints: INT_C0, INT_C1, INT_C1N, INT_1 - int which fits into 1 byte, INT_1N 
> - same for negative value, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N.
> 5) Longs: same as ints, but have only 2, 4, 6 and 8 byte count discriminators 
> to avoid excessive calculations.
> It means that instead of the previous 6 integer types, we will have 2 + 3 + 
> 3 + 3 + 11 + 11 = 33 types.
> To avoid excessive switches or (even worse) array/map lookups to understand 
> what the type is, we can divide the whole type space (256 values) into two 
> parts: optimized and non-optimized. Types in the optimized space will have 
> the MSB set to 1, and the ~30 optimized types mentioned above (or some of 
> them) will be located there.
> For floats and doubles we simply infer length. 
> For primitive arrays we do not write field length and then array length, but 
> only array length.
> *Expected compaction*:
> bool: 10 -> 5 bytes (50%);
> byte: 10 -> 5-6 bytes (45%);
> short, char: 11 -> 5-7 bytes, 7 on average (35%);
> int: 13 -> 5-9 bytes, 7 on average (45%);
> long: 17 -> 5-13 bytes, 11 on average (35%);
> float: 13 -> 9 bytes (30%);
> double: 17 -> 13 bytes (25%);
> *Expected CPU overhead on writes:*
> Bool, float, double: -
> Byte, short, char: zero check, sign check;
> Int, long: two (shift + OR)s to determine the byte count; if small - "zero" 
> and "one" checks, if big - sign check.
> *Expected CPU overhead on reads:*
> One additional branch between optimized and non-optimized spaces.
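For reference, item 3 of the description ("variable bytes" plus ZigZag) could look roughly like the sketch below. This is only an illustration of the general technique, not the encoding the ticket will necessarily use; the class and method names (VarIntSketch, zigZagEncode, writeVarInt, ...) are hypothetical and do not exist in Ignite.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: ZigZag + variable-byte encoding of a 32-bit int.
class VarIntSketch {
    /** ZigZag maps small-magnitude signed values to small unsigned ones: 0 -> 0, -1 -> 1, 1 -> 2. */
    static int zigZagEncode(int v) {
        return (v << 1) ^ (v >> 31);
    }

    /** Inverse of zigZagEncode. */
    static int zigZagDecode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Writes 7 bits per byte, low bits first; a set high bit means "more bytes follow". */
    static byte[] writeVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);

        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7; // unsigned shift, so the loop ends after at most 5 bytes
        }

        out.write(v);

        return out.toByteArray();
    }

    /** Reads a value produced by writeVarInt from the start of the buffer. */
    static int readVarInt(byte[] buf) {
        int res = 0;
        int shift = 0;

        for (byte b : buf) {
            res |= (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                break;

            shift += 7;
        }

        return res;
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 1, 300, -300, Integer.MAX_VALUE, Integer.MIN_VALUE};

        for (int v : samples) {
            byte[] enc = writeVarInt(zigZagEncode(v));
            int dec = zigZagDecode(readVarInt(enc));

            System.out.println(v + " -> " + enc.length + " byte(s), round trip ok: " + (dec == v));
        }
    }
}
```

With this scheme values close to zero (positive or negative) take one byte, while values near the int limits take five, which is why it only pays off when small values dominate.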

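The int type selection and the MSB-based split of the type space could be sketched as follows. The numeric type codes are made up for illustration (the description does not assign any), the byte count is found with plain shifts and comparisons rather than the "shift + OR" trick mentioned above, and storing negatives as their bitwise complement is my assumption, chosen so that Integer.MIN_VALUE does not overflow.

```java
// Hypothetical sketch: choosing an optimized type code for an int field.
class IntTypeSketch {
    // Assumed codes; all have the MSB (0x80) set, i.e. they live in the "optimized" half.
    static final int INT_C0 = 0x80, INT_C1 = 0x81, INT_C1N = 0x82;
    static final int INT_1 = 0x83, INT_1N = 0x84;
    static final int INT_2 = 0x85, INT_2N = 0x86;
    static final int INT_3 = 0x87, INT_3N = 0x88;
    static final int INT_4 = 0x89, INT_4N = 0x8A;

    /** The single extra branch on reads: is this type in the optimized space? */
    static boolean isOptimized(int typeCode) {
        return (typeCode & 0x80) != 0;
    }

    /** Number of bytes needed to hold a non-negative magnitude. */
    static int byteCount(int magnitude) {
        if ((magnitude >>> 8) == 0)
            return 1;

        if ((magnitude >>> 16) == 0)
            return 2;

        if ((magnitude >>> 24) == 0)
            return 3;

        return 4;
    }

    /** Picks a type code: constants first, then byte count plus sign. */
    static int pickIntType(int v) {
        if (v == 0)
            return INT_C0;

        if (v == 1)
            return INT_C1;

        if (v == -1)
            return INT_C1N;

        boolean neg = v < 0;

        // Assumption: negatives are stored as ~v, which is always non-negative.
        int magnitude = neg ? ~v : v;

        // Codes are laid out in pairs (positive, negative) per byte count.
        return INT_1 + 2 * (byteCount(magnitude) - 1) + (neg ? 1 : 0);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 100, -100, 70000, Integer.MIN_VALUE};

        for (int v : samples)
            System.out.println(v + " -> type 0x" + Integer.toHexString(pickIntType(v)));
    }
}
```

A reader would first test isOptimized(type) and only then dispatch within the smaller optimized range, which matches the "one additional branch" estimate for reads.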

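The "Expected compaction" figures follow directly from the two layouts: today a field costs 4 (ID) + 1 (type) + 4 (len) + payload bytes, while the proposal drops the length and may shrink the payload. A small sketch of that arithmetic, with hypothetical names:

```java
// Hypothetical sketch: per-field size under the current and the proposed layout.
class FieldSizeSketch {
    static final int ID_BYTES = 4;
    static final int TYPE_BYTES = 1;
    static final int LEN_BYTES = 4;

    /** Current layout: ID + type + explicit length + payload. */
    static int currentSize(int payloadBytes) {
        return ID_BYTES + TYPE_BYTES + LEN_BYTES + payloadBytes;
    }

    /** Proposed layout: ID + type + whatever payload bytes are actually written (possibly 0). */
    static int proposedSize(int encodedPayloadBytes) {
        return ID_BYTES + TYPE_BYTES + encodedPayloadBytes;
    }

    public static void main(String[] args) {
        System.out.println("bool:   " + currentSize(1) + " -> " + proposedSize(0));
        System.out.println("int:    " + currentSize(4) + " -> " + proposedSize(0) + ".." + proposedSize(4));
        System.out.println("long:   " + currentSize(8) + " -> " + proposedSize(0) + ".." + proposedSize(8));
        System.out.println("float:  " + currentSize(4) + " -> " + proposedSize(4));
        System.out.println("double: " + currentSize(8) + " -> " + proposedSize(8));
    }
}
```

The lower bounds correspond to constants such as INT_C0 that carry no payload at all; the upper bounds are full-width values that only save the 4-byte length.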

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
