Hi Tim!
You're probably referring to Kresten's talk "Riak on Drugs", where he
mentions schema evolution using read repair. Since I'm on the same project,
allow me to elaborate.
To my knowledge, neither JSON nor Protocol Buffers defines a schema
evolution strategy for distributed data - that is, how to declare schema
versions and how to upgrade data already sitting in a distributed data store.
An XML Schema can specify a version attribute, or you can put different
versions of a schema into different namespaces that include dates or
version numbers. But you probably don't want to use XML Schema, for a
number of reasons.
Protocol Buffers provides a schema (a .proto file), which can be compiled
into serialization code for a plethora of languages, including Java,
which we use in the health care system we are building for the Danish
government. PB tries to be pragmatic about schema changes, which is nice
- especially compared to the dreadful XML Schema and WSDL. You can add
a new optional field to your .proto file and still deserialize old data
into the new schema. PB will barf if you deserialize data containing
fields not known to your .proto, or if "required" fields are missing
(we make all PB fields "optional" to avoid this).
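To illustrate the point about optional fields, here is a hedged sketch of such a backward-compatible change in a .proto file (the message and field names are hypothetical, not from our actual system; the two messages below are the "before" and "after" versions of the same schema):

```proto
// Version 1 of the schema.
message PersonAggregate {
  optional string key  = 1;
  optional string name = 2;
}

// Version 2: a new optional field is added under a fresh tag number.
// Data serialized with version 1 still deserializes cleanly into
// version 2; the new field is simply unset.
message PersonAggregate {
  optional string key     = 1;
  optional string name    = 2;
  optional string address = 3;  // new in v2
}
```

The important rules are: never reuse or change an existing tag number, and keep new fields optional so old data remains readable.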
You might as well use JSON or some other serialization format. PB is
very space-efficient and fast to (de)serialize; on the other hand, JSON
is easier to work with in Riak MapReduce jobs. Using the schema
evolution process below, you can even start out with JSON and later
upgrade to, for instance, PB.
As Kresten mentions, we want to upgrade data to new schema versions
using read repair. Our plan is to store either a header in the binary
data or a metadata field on the Riak object, holding an id for the
schema the binary data was serialized with. The client reads this schema
id, determines whether a newer schema version exists, and if so converts
the data to the new version and stores it back.
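The read-repair idea can be sketched as follows. This is a minimal, self-contained illustration, not our actual code: the Riak store is mocked as a Map, and the schema id is kept alongside the bytes (in Riak it would live in a metadata field or a header inside the binary value). All class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ReadRepairSketch {

    static final int CURRENT_SCHEMA = 2;

    // A stored value: the schema id it was serialized with, plus the bytes.
    static class StoredValue {
        final int schemaId;
        final byte[] bytes;
        StoredValue(int schemaId, byte[] bytes) {
            this.schemaId = schemaId;
            this.bytes = bytes;
        }
    }

    // Stand-in for the Riak cluster.
    static final Map<String, StoredValue> store = new HashMap<>();

    // Convert v1 bytes to v2 bytes; here just a stand-in transformation.
    static byte[] upgradeV1toV2(byte[] v1) {
        String s = new String(v1, StandardCharsets.UTF_8);
        return (s + "|address=").getBytes(StandardCharsets.UTF_8);
    }

    // On every read: if the stored schema id is older than the one the
    // client knows, convert the value and write the upgraded version back.
    static StoredValue readWithRepair(String key) {
        StoredValue v = store.get(key);
        if (v != null && v.schemaId < CURRENT_SCHEMA) {
            v = new StoredValue(CURRENT_SCHEMA, upgradeV1toV2(v.bytes));
            store.put(key, v); // read repair: persist the upgraded value
        }
        return v;
    }

    public static void main(String[] args) {
        store.put("person/1",
                new StoredValue(1, "name=Tim".getBytes(StandardCharsets.UTF_8)));
        StoredValue repaired = readWithRepair("person/1");
        System.out.println(repaired.schemaId);              // upgraded copy
        System.out.println(store.get("person/1").schemaId); // store was repaired too
    }
}
```

The point is that the upgrade happens lazily, as a side effect of normal reads, so frequently accessed data converges to the new schema on its own.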
Since we upgrade clients one at a time with no downtime, the entire
change procedure would be something like this:
1) Make new schema version.
2) Upgrade all clients - one at a time - so they can read the new schema
version, but still save data in the old version.
3) Enable clients to save data in the new schema version.
Data formatted in deprecated schemas can be seen as a sort of entropy -
something you want to remove to achieve consistency and to keep your
clients simple. Before we remove client support for the old schema
version, we need to make sure that all data has been upgraded, so we
rewrite the objects that were not read-repaired by normal daily
operations. We would probably do this with a Java batch job fed all the
keys.
4) Repair remaining keys.
5) Remove client support for old version schema.
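Step 4 above can be sketched like this. Again a hedged, self-contained illustration rather than real code: the store and the converter are mocked, and in practice the keys would come from Riak (e.g. a key listing) with each rewrite being a fetch-convert-store against the cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchRepairSketch {
    public static void main(String[] args) {
        // key -> schema id the object is currently stored in (mocked).
        Map<String, Integer> schemaIds = new LinkedHashMap<>();
        schemaIds.put("person/1", 1); // missed by normal read repair
        schemaIds.put("person/2", 2); // already on the current schema

        final int currentSchema = 2;
        int repaired = 0;
        for (Map.Entry<String, Integer> e : schemaIds.entrySet()) {
            if (e.getValue() < currentSchema) {
                // In the real job: fetch the object, convert the bytes
                // to the new schema, and store it back.
                e.setValue(currentSchema);
                repaired++;
            }
        }
        System.out.println(repaired); // objects rewritten by the batch job
    }
}
```

Only after such a sweep reports zero remaining old-schema objects is it safe to do step 5 and drop the old deserialization code from the clients.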
If you like DomainBucket, which is part of the Riak Java client, you
can still use it and keep your Java model in PB or whatever. Put your
serialization code into a Converter like this:
    Converter<PersonAggregate> personSerializer = new Converter<PersonAggregate>() {
        @Override
        public IRiakObject fromDomain(PersonAggregate personAggregate, VClock vClock)
                throws ConversionException {
            String bucket = "person";
            String key = personAggregate.getKey();
            byte[] bytes = personAggregate.getBytes(); // Serialize to bytes using PB or whatever
            return RiakObjectBuilder.newBuilder(bucket, key).withValue(bytes).build();
        }

        @Override
        public PersonAggregate toDomain(IRiakObject riakObject) throws ConversionException {
            byte[] bytes = riakObject.getValue();
            return PersonAggregate.deserialize(bytes); // Deserialize to Java
        }
    };

    // "bucket" here is the com.basho.riak.client.bucket.Bucket you fetched earlier.
    DomainBucket<PersonAggregate> personBucket =
        new DomainBucketBuilder<PersonAggregate>(bucket, PersonAggregate.class)
            .withConverter(personSerializer).build();
- Rune
PS: I like the flexibility of your aggregate-root Java model, and
believe it will serialize nicely and efficiently into a Riak object.
--
Best regards / Venlig hilsen
*Rune Skou Larsen*
Trifork Public A/S / Team Riak
Margrethepladsen 4, 8000 Århus C, Denmark
Phone: +45 3160 2497 Skype: runeskoularsen twitter: @RuneSkouLarsen
On 24-07-2012 08:49, Tim Pedersen wrote:
Hi All,
I'm working on a Riak-based Java project that involves creating a
central repository of data (in Riak) aggregated from a number of other
databases (SQL). Right now the approach I'm taking is to build a set
of 'aggregate-root' style Java classes (e.g. a Person) that has lots
of aggregate/list classes hanging off it (e.g. KnownAddresses,
Offences, Charges, etc).
I'm using the Java client and am currently simply relying on the
out-of-the-box DomainBuckets and POJO/Jackson/JSON serialisation when
storing and retrieving these objects in Riak.
This app is likely to undergo (re)development over the next few years
- with plenty of changes to the Java class 'schemas', and thus a need
to deal with different versions of java classes over time.
From what I can see the standard DomainBucket/JSON deserialisation
relies on the target class not changing (is this right?)
I watched a video with Kresten Krab Thorup talking about the work with
the Danish Health dept; in his talk he mentioned they used Protocol
Buffers with a line about this enabling them to deal with versions??
I'm a n00b wrt Protocol Buffers - is this perhaps the way we should be
handling serialisation and versioning - instead of DomainBuckets??
Has anyone else tackled the problem of versioning of
schemas/classes/POJOs? What approaches have you taken (and did it work)?
Cheers,
Tim
---------------------------------------------------------------------------------
Tim Pedersen
Senior Developer
Information Technology Services
Department of Police and Emergency Management
47 Liverpool St, Hobart, Tasmania, 7000
Phone 03 6230 2465
Mobile 0428 336 670
Email tim.peder...@police.tas.gov.au
<mailto:tim.peder...@police.tas.gov.au>
----------------------------------------------------------------------------------
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com