Val, of course other options should be available, such as BinaryTypeConfiguration, and maybe field-level and class-level annotations.
On Thu, Jul 27, 2017 at 9:07 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Pavel,
>
> This forces user to implement Binarylizable for whole type in case they
> want to change encoding for one-two fields, right? I really don't like it,
> why not add default encoding to BinaryTypeConfiguration?
>
> -Val
>
> On Thu, Jul 27, 2017 at 7:54 AM, Pavel Tupitsyn <ptupit...@apache.org> wrote:
>
> > > 1 byte for every field just for this
> > GridBinaryMarshaller.STRING data type remains untouched.
> > We add GridBinaryMarshaller.STRING_ENCODED, which has additional byte
> > for encoding type.
> >
> > This means no overhead for existing code.
> > I think the most common use case is English, which uses 1 byte per char
> > in UTF-8.
> > This is already as fast and compact as possible, and we don't want to
> > introduce any lookup overhead here.
> >
> > And when user knows that their data will be more compact in some
> > specific encoding, they use some BinaryWriter.writeString overload,
> > which writes a different type code.
> >
> > Yes, it also writes an extra byte, but you save a byte per char of the
> > actual string (for example, when using Windows-1251 for Russian text),
> > so this does not matter.
> >
> > On Thu, Jul 27, 2017 at 5:35 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >
> > > Pavel, what would be the size overhead? Are we adding 1 byte for every
> > > field just for this? If you would like to have this info in the binary
> > > object directly, can we in this case have some bitmap of
> > > field-to-encoding?
> > >
> > > D.
> > >
> > > On Thu, Jul 27, 2017 at 9:22 AM, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > >
> > > > I'm not sure I understand how this "per field" configuration is
> > > > supposed to be implemented.
> > > > * Marshaller is not tied to a cache. It serializes all kinds of
> > > >   things, like compute job parameters and results.
> > > > * Raw mode does not involve field names.
> > > >
> > > > Also it seems like a complicated and expensive solution - looking up
> > > > string format somewhere in the metadata will be slow.
> > > >
> > > > "encoded string" data type suggestion from Vladimir looks better to
> > > > me from performance and implementation standpoint.
> > > >
> > > > Thanks,
> > > > Pavel
> > > >
> > > > On Thu, Jul 27, 2017 at 5:10 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> > > >
> > > > > On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego <isap...@apache.org> wrote:
> > > > >
> > > > > > Just a note from the platforms guy:
> > > > > >
> > > > > > Solution with table-level configuration is going to be
> > > > > > significantly harder to implement for platforms and ODBC than
> > > > > > field-level one.
> > > > >
> > > > > Igor, it seems like you are advocating the per-cell configuration,
> > > > > not per-field one. The per-field configuration can be defined at
> > > > > the table/cache level.
> > > > >
> > > > > I see your point about C++ and .NET integrations however. Can't we
> > > > > provide this info at node-join time or table-creation time? This
> > > > > way all nodes will receive it and you will be able to grab it on
> > > > > different platforms.
> > > > >
> > > > > > Also, what about binary objects, which are not stored in cache,
> > > > > > but being marshalled?
> > > > >
> > > > > I think the default system encoding should be used here. If we
> > > > > don't have configuration for default encoding, we should add it.
> > > > >
> > > > > > Best Regards,
> > > > > > Igor
> > > > > >
> > > > > > On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> > > > > >
> > > > > > > On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur <daradu...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > Encoding must be set on per field basis. This will give us
> > > > > > > > > the most flexible solution at the cost of 1-byte overhead.
> > > > > > > >
> > > > > > > > > Vova, I agree that the encoding should be set on per-field
> > > > > > > > > basis, but at the table level, not at a cell level.
> > > > > > > >
> > > > > > > > Dmitriy, Vladimir,
> > > > > > > > Let's use both approaches :-)
> > > > > > > > We can add parameter to CacheConfiguration.
> > > > > > > > If parameter specifies to use cache level encoding then
> > > > > > > > marshaller will use encoding in a cache,
> > > > > > > > otherwise marshaller will use per-field encoding.
> > > > > > > > Of course only if it doesn't complicate the solution.
> > > > > > >
> > > > > > > I think that it will complicate the solution and will
> > > > > > > complicate the marshalling protocol. The advantage of
> > > > > > > specifying the encoding at table/cache level is that we don't
> > > > > > > need to add extra encoding bytes to the marshalling protocol.
> > > > > > >
> > > > > > > I think Vova was suggesting encoding at the cell level, not at
> > > > > > > the field level, which seems to be redundant to me.
> > > > > > >
> > > > > > > Vova, do you agree?
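
To make the size argument around STRING_ENCODED concrete, here is a minimal sketch in plain Java. It is not Ignite code: the type code values, the one-byte encoding id, the 4-byte length prefix and the class and method names are all assumptions chosen for illustration. It only shows that a separate "encoded string" type code costs one extra byte per value, while existing UTF-8 strings are written exactly as before.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedStringSketch {
    // Hypothetical type codes, for illustration only (not Ignite's actual values).
    static final byte STRING = 1;
    static final byte STRING_ENCODED = 2;

    // Hypothetical one-byte identifier for the encoding.
    static final byte ENC_WINDOWS_1251 = 1;
    static final Charset WINDOWS_1251 = Charset.forName("windows-1251");

    /** Existing path (assumed layout): type code, 4-byte length, UTF-8 payload. */
    public static byte[] writeString(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 4 + payload.length)
            .order(ByteOrder.LITTLE_ENDIAN)
            .put(STRING)
            .putInt(payload.length)
            .put(payload)
            .array();
    }

    /** Proposed path (assumed layout): type code, encoding id, 4-byte length, payload. */
    public static byte[] writeString(String s, byte encodingId, Charset charset) {
        byte[] payload = s.getBytes(charset);
        return ByteBuffer.allocate(1 + 1 + 4 + payload.length)
            .order(ByteOrder.LITTLE_ENDIAN)
            .put(STRING_ENCODED)
            .put(encodingId)
            .putInt(payload.length)
            .put(payload)
            .array();
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written with escapes so the source stays ASCII.
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String english = "Hello";

        System.out.println("ru, UTF-8:        " + writeString(russian).length);
        System.out.println("ru, Windows-1251: "
            + writeString(russian, ENC_WINDOWS_1251, WINDOWS_1251).length);
        System.out.println("en, UTF-8:        " + writeString(english).length);
        System.out.println("en, Windows-1251: "
            + writeString(english, ENC_WINDOWS_1251, WINDOWS_1251).length);
    }
}
```

Under these assumptions the saving grows with the string length (one byte per Cyrillic character), while the cost of the encoded form stays at a fixed single byte, which only shows up as a net loss for text that is already 1 byte per char in UTF-8.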