Hi Tim!

You're probably referring to Kresten's talk "Riak on Drugs", where he mentions schema evolution using read repair. Since I'm on the same project, allow me to elaborate.

To my knowledge, neither JSON nor Protocol Buffers defines a schema evolution strategy for distributed data, i.e. how to declare schema versions and upgrade data in distributed data stores. An XML Schema can specify a version attribute, or put different versions of a schema into different namespaces that include dates or version numbers. But you probably don't want to use XML Schema, for a number of reasons.

Protocol Buffers (PB) provides a schema (a .proto file), which can be compiled into serialization code for a plethora of languages, including Java, which we use in the health care system that we build for the Danish government. PB tries to be pragmatic about schema changes, which is nice, especially compared to the dreadful XML Schemas and WSDL. You can add a new optional field to your .proto file and still deserialize old data into the new schema. Fields not known in your .proto are skipped rather than rejected, but PB will barf if "required" fields are missing (we make all PB fields "optional" to avoid this).

You might as well use JSON or some other schema-defined serialization format. PB is very space efficient and fast to (de)serialize. On the other hand, JSON is easier to work with when it comes to Riak MapReduce jobs. Using the schema evolution process below, you can even start out using JSON and later upgrade to, for instance, PB.

As Kresten mentions, we want to upgrade data to new schema versions using read repair. Our idea is to store a schema id, either in a header in the binary data or in a metadata field on the Riak object, that specifies how the binary data was serialized. The client reads this schema id, determines whether a newer schema version exists, and if so converts the data to the new version and stores it back.
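
To make the idea concrete, here is a minimal sketch of the "header in the binary data" variant. It is not our production code: the one-byte header, the two text formats, the class name and the in-memory map standing in for a Riak bucket are all assumptions made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Sketch: the first byte holds the schema id, the rest is the serialized payload. */
final class SchemaReadRepair {
    static final byte V1 = 1;       // hypothetical old schema: "firstName lastName"
    static final byte V2 = 2;       // hypothetical new schema: "lastName;firstName"
    static final byte CURRENT = V2;

    // In-memory stand-in for a Riak bucket (assumption): key -> stored bytes.
    static final Map<String, byte[]> store = new HashMap<>();

    /** Prefixes the serialized payload with its schema id. */
    static byte[] wrap(byte schemaId, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 1];
        out[0] = schemaId;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    /** Converts a V1 payload into the V2 format. */
    static String upgradeV1ToV2(String v1) {
        int space = v1.indexOf(' ');
        if (space < 0) {
            return v1 + ";";        // no first name recorded
        }
        return v1.substring(space + 1) + ";" + v1.substring(0, space);
    }

    /** Reads a value; data found in the old schema is upgraded and stored back. */
    static String read(String key) {
        byte[] stored = store.get(key);
        if (stored == null) {
            return null;
        }
        byte schemaId = stored[0];
        String value = new String(stored, 1, stored.length - 1, StandardCharsets.UTF_8);
        if (schemaId == V1) {
            value = upgradeV1ToV2(value);
            store.put(key, wrap(CURRENT, value)); // the "read repair" write
        } else if (schemaId != CURRENT) {
            throw new IllegalStateException("Unknown schema id " + schemaId);
        }
        return value;
    }

    public static void main(String[] args) {
        store.put("p1", wrap(V1, "Tim Pedersen"));
        System.out.println(read("p1"));          // prints "Pedersen;Tim"
        System.out.println(store.get("p1")[0]);  // prints 2
    }
}
```

With a real Riak object, the store-back would of course go through the client with the fetched vclock; the sketch only shows the version check and conversion.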

Since we upgrade clients one at a time with no downtime, the entire change procedure would be something like this:

1) Make new schema version.
2) Upgrade all clients - one at a time - so they can read the new schema version, but still save data in the old version.
3) Enable clients to save data in the new schema version.
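
Steps 2 and 3 could be sketched roughly like this: a codec that always reads both versions, while the version it writes is a configuration value that is flipped in step 3. The class name, the pipe-separated text formats and the "email" field are hypothetical, and the sketch assumes the values never contain a '|'.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of rollout steps 2 and 3: read both versions, write the configured one. */
final class RollingUpgradeCodec {
    private final int writeVersion; // 1 during step 2, switched to 2 in step 3

    RollingUpgradeCodec(int writeVersion) {
        if (writeVersion != 1 && writeVersion != 2) {
            throw new IllegalArgumentException("Unsupported write version " + writeVersion);
        }
        this.writeVersion = writeVersion;
    }

    /** Hypothetical formats: v1 = "1|name", v2 = "2|name|email". */
    byte[] serialize(String name, String email) {
        String text = (writeVersion == 1)
                ? "1|" + name
                : "2|" + name + "|" + email;
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns {name, email}; email is the empty string for v1 data. */
    String[] deserialize(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\\|", -1);
        switch (parts[0]) {
            case "1":
                return new String[] {parts[1], ""};
            case "2":
                return new String[] {parts[1], parts[2]};
            default:
                throw new IllegalArgumentException("Unknown schema version " + parts[0]);
        }
    }
}
```

Note that in this sketch a client still writing version 1 drops the new field, so the switch in step 3 should only wait until every client can read version 2.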

Data formatted in deprecated schemas can be seen as a sort of entropy: something you want to remove to achieve consistency and to make your clients simpler. Before we remove client support for the old schema version, we need to make sure that all data has been upgraded, so we need to rewrite the objects that were not read repaired by normal daily operations. We would probably do this using a Java batch job that is fed all the keys.

4) Repair remaining keys.
5) Remove client support for old version schema.
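
The batch job for step 4 might look something like the sketch below. We have not written it yet, so treat everything here as an assumption: the "v1:"/"v2:" prefixes stand in for the schema id, a plain Map stands in for Riak, and upgrade() is a placeholder for the real conversion.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of step 4: rewrite every object that normal reads did not repair. */
final class SchemaRepairJob {
    static final String OLD_PREFIX = "v1:"; // hypothetical marker for the old schema
    static final String NEW_PREFIX = "v2:"; // hypothetical marker for the new schema

    /** Placeholder conversion (assumption): here it only trims whitespace. */
    static String upgrade(String oldPayload) {
        return oldPayload.trim();
    }

    /** Rewrites all values still in the old schema; returns how many were rewritten. */
    static int repair(Map<String, String> store, Iterable<String> keys) {
        int repaired = 0;
        for (String key : keys) {
            String value = store.get(key);
            if (value == null || !value.startsWith(OLD_PREFIX)) {
                continue; // deleted, or already upgraded by read repair
            }
            store.put(key, NEW_PREFIX + upgrade(value.substring(OLD_PREFIX.length())));
            repaired++;
        }
        return repaired;
    }

    /** Check before step 5: true if no value in the old schema remains. */
    static boolean safeToDropOldSchema(Map<String, String> store) {
        for (String value : store.values()) {
            if (value.startsWith(OLD_PREFIX)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("a", "v1: Tim ");
        store.put("b", "v2:Rune");
        System.out.println(repair(store, Arrays.asList("a", "b", "gone"))); // prints 1
        System.out.println(safeToDropOldSchema(store));                     // prints true
    }
}
```

Running the job a second time rewrites nothing, so it is safe to repeat it until the check passes.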



If you like DomainBucket, which is part of the Riak Java client, you can still use it and have your Java model in PB or whatever. Put your serialization code into a Converter like this:

Converter<PersonAggregate> personSerializer = new Converter<PersonAggregate>() {

    @Override
    public IRiakObject fromDomain(PersonAggregate personAggregate, VClock vClock) throws ConversionException {
        String bucket = "person";
        String key = personAggregate.getKey();
        // Serialize to bytes using PB or whatever
        byte[] bytes = personAggregate.getBytes();
        // Pass the vclock along so the store is not treated as a brand new write
        return RiakObjectBuilder.newBuilder(bucket, key).withVClock(vClock).withValue(bytes).build();
    }

    @Override
    public PersonAggregate toDomain(IRiakObject riakObject) throws ConversionException {
        byte[] bytes = riakObject.getValue();
        // Deserialize to Java
        return PersonAggregate.deserialize(bytes);
    }
};

// "bucket" is assumed to be a Bucket you have already fetched from your Riak client
DomainBucket<PersonAggregate> personBucket = new DomainBucketBuilder<PersonAggregate>(bucket, PersonAggregate.class).withConverter(personSerializer).build();



- Rune

PS: I like the flexibility of your aggregate-root Java model, and believe it will serialize nicely and efficiently into a Riak object.

--

Best regards / Venlig hilsen

*Rune Skou Larsen*
Trifork Public A/S / Team Riak
Margrethepladsen 4, 8000 Århus C, Denmark
Phone: +45 3160 2497    Skype: runeskoularsen   twitter: @RuneSkouLarsen



On 24-07-2012 08:49, Tim Pedersen wrote:
Hi All,

I'm working on a Riak-based Java project that involves creating a central repository of data (in Riak) aggregated from a number of other databases (SQL). Right now the approach I'm taking is to build a set of 'aggregate-root' style Java classes (e.g. a Person) that has lots of aggregate/list classes hanging off it (e.g. KnownAddresses, Offences, Charges, etc).

I'm using the Java client and am currently simply relying on the out-of-the-box DomainBuckets and POJO/Jackson/JSON serialisation when storing and retrieving these objects in Riak.

This app is likely to undergo (re)development over the next few years - with plenty of changes to the Java class 'schemas', and thus a need to deal with different versions of java classes over time.

From what I can see the standard DomainBucket/JSON deserialisation relies on the target class not changing (is this right?)

I watched a video with Kresten Krab Thorup talking about the work with the Danish Health dept; in his talk he mentioned they used Protocol Buffers with a line about this enabling them to deal with versions?? I'm a n00b wrt Protocol Buffers - is this perhaps the way we should be handling serialisation and versioning - instead of DomainBuckets??

Has anyone else tackled the problem of versioning of schemas/classes/POJOs? What approaches have you taken (and did it work)?

Cheers,

Tim


---------------------------------------------------------------------------------
Tim Pedersen

Senior Developer
Information Technology Services
Department of Police and Emergency Management

47 Liverpool St, Hobart, Tasmania, 7000

Phone 03 6230 2465
Mobile 0428 336 670
Email tim.peder...@police.tas.gov.au
----------------------------------------------------------------------------------





