[ https://issues.apache.org/jira/browse/IGNITE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Ozerov reassigned IGNITE-1549:
---------------------------------------

    Assignee: Vladimir Ozerov

> Optimize portable object fields write in non-raw mode.
> ------------------------------------------------------
>
>                 Key: IGNITE-1549
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1549
>             Project: Ignite
>          Issue Type: Task
>          Components: general
>    Affects Versions: 1.1.4
>            Reporter: Vladimir Ozerov
>            Assignee: Vladimir Ozerov
>            Priority: Blocker
>             Fix For: ignite-1.5
>
>
> Currently we write user fields as follows:
> 0 .. 3 - field ID;
> 4 - field type;
> 5 .. 8 - field len;
> 9 .. - the field itself.
> It can be optimized as follows:
> 1) Field len can usually be inferred from the type. E.g., for int it is 4.
> 2) Frequently used constants can be written as separate types. E.g. INT -
> normal int, INT_0 - zero, etc.
> 3) Last, but not least, values should be encoded using the "variable bytes"
> (and possibly ZigZag) algorithm. This will save about 2 bytes for ints and
> longs on average (I assume here that longs are usually bigger than 4 bytes,
> e.g. timestamps).
> *New types will be introduced:*
> 1) Booleans: BOOL_FALSE, BOOL_TRUE;
> 2) Bytes: BYTE_C0 => zero, BYTE_C1 => 1, BYTE_C1N => -1;
> 3) Shorts, chars: SHORT_C0, SHORT_C1, SHORT_C1N;
> 4) Ints: INT_C0, INT_C1, INT_C1N, INT_1 - int which fits into 1 byte, INT_1N
> - same for negative value, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N.
> 5) Longs: same as ints, but with only 2, 4, 6 and 8 byte count discriminators
> to avoid excessive calculations.
> It means that instead of the previous 6 integer types, we will have 2 + 3 +
> 3 + 3 + 11 + 11 = 33 types.
> To avoid excessive switches or (even worse) array/map lookups to understand
> what the type is, we can divide the whole type space (256) into two parts:
> optimized and non-optimized. The optimized space will have the MSB set to 1,
> and the mentioned ~30 optimized types (or some of them) are located there.
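The int case above can be illustrated with a minimal Java sketch. It is not the Ignite implementation: the class name, the numeric type codes, the little-endian byte order, and the choice to store a negative value as its bitwise complement are all assumptions made for illustration. The issue only names the types and states that optimized types have the MSB set.

```java
final class CompactIntSketch {
    // Hypothetical type codes: the issue names these types but assigns no values.
    // All of them have the MSB set, i.e. they live in the "optimized" half.
    static final int INT_C0 = 0x80;   // constant 0
    static final int INT_C1 = 0x81;   // constant 1
    static final int INT_C1N = 0x82;  // constant -1
    static final int INT_1 = 0x83;    // then INT_1N, INT_2, INT_2N, ..., INT_4N = 0x8A

    /** Read-side branch: one mask test separates optimized from non-optimized types. */
    static boolean isOptimized(int type) {
        return (type & 0x80) != 0;
    }

    /** Encodes a field as: 4-byte field ID, 1-byte type, 0..4 value bytes (no length). */
    static byte[] writeIntField(int fieldId, int val) {
        int type;
        int magnitude = 0;
        int cnt = 0;

        if (val == 0)
            type = INT_C0;
        else if (val == 1)
            type = INT_C1;
        else if (val == -1)
            type = INT_C1N;
        else {
            boolean neg = val < 0;
            // Assumption: a negative value is stored as its bitwise complement (always >= 0).
            magnitude = neg ? ~val : val;
            // Number of bytes needed to hold the magnitude (1..4).
            cnt = (32 - Integer.numberOfLeadingZeros(magnitude) + 7) / 8;
            type = INT_1 + 2 * (cnt - 1) + (neg ? 1 : 0);
        }

        byte[] out = new byte[5 + cnt];
        for (int i = 0; i < 4; i++)
            out[i] = (byte)(fieldId >>> (8 * i));        // bytes 0..3: field ID (little-endian, assumed)
        out[4] = (byte)type;                             // byte 4: type, which also implies the length
        for (int i = 0; i < cnt; i++)
            out[5 + i] = (byte)(magnitude >>> (8 * i));  // bytes 5..: value bytes
        return out;
    }

    /** Decodes the value of a field produced by writeIntField. */
    static int readIntValue(byte[] field) {
        int type = field[4] & 0xFF;
        if (!isOptimized(type))
            throw new IllegalArgumentException("Not an optimized type: " + type);
        if (type == INT_C0) return 0;
        if (type == INT_C1) return 1;
        if (type == INT_C1N) return -1;

        int idx = type - INT_1;
        int cnt = idx / 2 + 1;
        boolean neg = (idx & 1) != 0;
        int magnitude = 0;
        for (int i = 0; i < cnt; i++)
            magnitude |= (field[5 + i] & 0xFF) << (8 * i);
        return neg ? ~magnitude : magnitude;
    }
}
```

Under these assumptions an int field occupies 5 bytes for the constants 0, 1 and -1, and up to 9 bytes for a value that needs all four bytes, instead of the fixed 13 bytes of the current layout.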
> For floats and doubles we simply infer length.
> For primitive arrays we do not write field length and then array length, but
> only array length.
> *Expected compaction*:
> bool: 10 -> 5 bytes (50%);
> byte: 10 -> 5-6 bytes (45%);
> short, char: 11 -> 5-7 bytes, 7 on average (35%);
> int: 13 -> 5-9 bytes, 7 on average (45%);
> long: 17 -> 5-13 bytes, 11 on average (35%);
> float: 13 -> 9 bytes (30%);
> double: 17 -> 13 bytes (25%);
> *Expected CPU overhead on writes:*
> Bool, float, double: none;
> Byte, short, char: zero check, sign check;
> Int, long: two (shift + OR)s to understand bytes count, if small - "zero" and
> "one" checks, if big - sign check.
> *Expected CPU overhead on reads:*
> One additional branch between optimized and non-optimized spaces.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)