> String encoding is a concept similar to "collation" in RDBMS. You can
> define it either globally, or on per-table basis.

Or on a per-column (per-field) basis. Although Oracle does not have per-column charsets, some other databases provide this option.

MySQL:
- https://dev.mysql.com/doc/refman/5.7/en/create-table.html

  | CHAR[(length)] [BINARY] [CHARACTER SET charset_name] [COLLATE collation_name]
  | VARCHAR(length) [BINARY] [CHARACTER SET charset_name] [COLLATE collation_name]
  | TEXT [BINARY] [CHARACTER SET charset_name] [COLLATE collation_name]

SQL Server:
- https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql

  <column_definition> ::= column_name <data_type> [ FILESTREAM ] [ COLLATE collation_name ]

Postgres:
- https://www.postgresql.org/docs/9.6/static/sql-createtable.html

  CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
    { column_name data_type [ COLLATE collation ]

> 1) I have a class Person with field "name". I have two caches/tables - one
> for US persons, where name is in Latin, another for RU persons with
> Cyrillic names. How can I achieve optimal encoding formats for both tables?

You would need two classes in this case, perhaps with a common parent. Otherwise you have to pick a common denominator and settle on one encoding for both of them, as Java did with UTF-16 for java.lang.String.

--
Artem Schitow
artem.schi...@gmail.com

> On 28 Jul 2017, at 14:45, Vladimir Ozerov <voze...@gridgain.com> wrote:
>
> String encoding is a concept similar to "collation" in RDBMS. You can
> define it either globally, or on per-table basis. The same should be done
> for Ignite. We do not define behavior of a type. We define behavior of a
> *storage*.
>
> Two cases when proposed approach with per-type and per-type-field approach
> doesn't work:
> 1) I have a class Person with field "name". I have two caches/tables - one
> for US persons, where name is in Latin, another for RU persons with
> Cyrillic names. How can I achieve optimal encoding formats for both tables?
> 2) I have an empty grid.
> Now I want to create a cache/table with custom
> encoding. How can I do that without cluster restart? Nohow, because
> BinaryTypeConfiguration is configured statically, while caches/tables can be
> created in runtime.
>
> On Fri, Jul 28, 2017 at 2:38 PM, Pavel Tupitsyn <ptupit...@apache.org>
> wrote:
>
>>> As Pavel mentioned, Marshaller should not be tied to cache
>>> should be added to per-cache level
>> Not sure if I follow.
>> Marshalling and caching are two separate mechanisms.
>> Defining binary format in CacheConfiguration violates separation of
>> concerns.
>>
>>> Encoding *must not* be added to per-class or per-field level, this is wrong
>> What is wrong with this? BinaryTypeConfiguration looks the right place for
>> such a setting.
>> Are we talking from SQL standpoint here, so you want this to be defined
>> somehow via DDL in future?
>>
>> On Fri, Jul 28, 2017 at 2:30 PM, Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>
>>> Encoding *must not* be added to per-class or per-field level, this is
>>> wrong.
>>>
>>> It should be added to per-cache level, and to per-cache-column level in
>>> future.
>>>
>>> On Fri, 28 Jul 2017 at 14:27, Andrey Kuznetsov <stku...@gmail.com> wrote:
>>>
>>>> We discussed this with Pavel and Anton just a moment ago. Summary follows.
>>>>
>>>> - New byte "flag" is to be added (ENCODED_STRING)
>>>> - 'Encoding' property is to be added at
>>>> -- global level (BinaryConfiguration)
>>>> -- per-class level (BinaryTypeConfiguration)
>>>> -- per-field level (BinaryTypeConfiguration)
>>>>
>>>> 2017-07-28 14:15 GMT+03:00 Vladimir Ozerov [via Apache Ignite Developers] <
>>>> ml+s2346864n20159...@n4.nabble.com>:
>>>>
>>>>> As Pavel mentioned, Marshaller should not be tied to cache, BinaryObject
>>>>> should be self-explanatory, i.e. containing all information necessary for
>>>>> unmarshalling. This is an absolute requirement.
>>>>>
>>>>> We will have one extra byte in serialized form, meaning that advantage
>>>>> of custom encoding will become evident for all strings with length >= 1,
>>>>> which is perfectly fine. I do not quite understand what are we arguing
>>>>> about.
>>>>>
>>>>> As far as configuration, we can do it as follows:
>>>>>
>>>>> 1) Add global encoding, UTF8 by default.
>>>>> 2) Add per-cache encoding.
>>>>> 3) Add encoding to JDBC and ODBC driver properties.
>>>>>
>>>>> This should be enough.
>>>>>
>>>>>
>>>> --
>>>> Best regards,
>>>> Andrey Kuznetsov.
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-ignite-developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-support-in-BinaryMarshaller-IGNITE-5655-tp20024p20161.html
>>>> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
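
To put rough numbers on the Latin vs. Cyrillic case from question 1), here is a minimal sketch in Python (not Ignite code). The sample names and the choice of cp1251 as the Cyrillic single-byte encoding are my assumptions for illustration only, and the counts exclude any length prefix or encoding flag byte that a serialized form would add.

```python
# Illustration only: compares raw encoded sizes of two hypothetical names.
# SAMPLES and ENCODINGS are assumptions, not anything defined by Ignite.
SAMPLES = {"US": "John Smith", "RU": "Иван Петров"}
ENCODINGS = ("utf-8", "utf-16-le", "cp1251")


def encoded_size(text, encoding):
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


def size_table(samples=SAMPLES, encodings=ENCODINGS):
    """Map each sample key to a dict of {encoding: byte count}."""
    return {
        key: {enc: encoded_size(name, enc) for enc in encodings}
        for key, name in samples.items()
    }


if __name__ == "__main__":
    for key, sizes in size_table().items():
        print(key, sizes)
```

With these samples the Latin name is no larger in UTF-8 than in cp1251, while the Cyrillic name is about half the size in cp1251 compared with UTF-8 or UTF-16, which is why one encoding per class cannot be optimal for both tables.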