Hi Tim!
You're probably referring to Kresten's talk "Riak on Drugs", where he
mentions schema evolution using read repair. Since I'm on the same project,
allow me to elaborate.
To my knowledge, neither JSON nor Protocol Buffers defines a schema
evolution strategy for distributed data - that is, how to declare schema
versions and how to upgrade data already sitting in a distributed data store.
An XML Schema can specify a version attribute, or you can put different
versions of a schema into different namespaces that include dates or
version numbers. But you probably don't want to use XML Schema, for a
number of reasons.
Protocol Buffers provides a schema (a .proto file), which can be compiled
into serialization code for a plethora of languages, including Java,
which we use in the health care system we are building for the Danish
government. PB tries to be pragmatic about schema changes, which is nice
- especially compared to the dreadful XML Schema and WSDL. You can add
a new optional field to your .proto file and still deserialize old data
into the new schema. PB will barf if you deserialize data containing
fields not known to your .proto, or if "required" fields are missing
(we make all PB fields "optional" to avoid this).
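To illustrate the point about optional fields, here is a hedged sketch of such a backward-compatible change in a .proto file (the message and field names are hypothetical, not from our actual system; the two messages below are the "before" and "after" versions of the same schema):

```proto
// Version 1 of the schema.
message PersonAggregate {
  optional string key  = 1;
  optional string name = 2;
}

// Version 2: a new optional field is added under a fresh tag number.
// Data serialized with version 1 still deserializes cleanly into
// version 2; the new field is simply unset.
message PersonAggregate {
  optional string key     = 1;
  optional string name    = 2;
  optional string address = 3;  // new in v2
}
```

The important rules are: never reuse or change an existing tag number, and keep new fields optional so old data remains readable.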
You might as well use JSON or some other serialization format. PB is
very space-efficient and fast to (de)serialize; on the other hand, JSON
is easier to work with in Riak MapReduce jobs. Using the schema
evolution process below, you can even start out with JSON and later
upgrade to, for instance, PB.
As Kresten mentions, we want to upgrade data to new schema versions
using read repair. Our plan is to store either a header in the binary
data or a metadata field on the Riak object, holding an id for the
schema the binary data was serialized with. The client reads this schema
id, determines whether a newer schema version exists, and if so converts
the data to the new version and stores it back.
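The read-repair idea can be sketched as follows. This is a minimal, self-contained illustration, not our actual code: the Riak store is mocked as a Map, and the schema id is kept alongside the bytes (in Riak it would live in a metadata field or a header inside the binary value). All class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ReadRepairSketch {

    static final int CURRENT_SCHEMA = 2;

    // A stored value: the schema id it was serialized with, plus the bytes.
    static class StoredValue {
        final int schemaId;
        final byte[] bytes;
        StoredValue(int schemaId, byte[] bytes) {
            this.schemaId = schemaId;
            this.bytes = bytes;
        }
    }

    // Stand-in for the Riak cluster.
    static final Map<String, StoredValue> store = new HashMap<>();

    // Convert v1 bytes to v2 bytes; here just a stand-in transformation.
    static byte[] upgradeV1toV2(byte[] v1) {
        String s = new String(v1, StandardCharsets.UTF_8);
        return (s + "|address=").getBytes(StandardCharsets.UTF_8);
    }

    // On every read: if the stored schema id is older than the one the
    // client knows, convert the value and write the upgraded version back.
    static StoredValue readWithRepair(String key) {
        StoredValue v = store.get(key);
        if (v != null && v.schemaId < CURRENT_SCHEMA) {
            v = new StoredValue(CURRENT_SCHEMA, upgradeV1toV2(v.bytes));
            store.put(key, v); // read repair: persist the upgraded value
        }
        return v;
    }

    public static void main(String[] args) {
        store.put("person/1",
                new StoredValue(1, "name=Tim".getBytes(StandardCharsets.UTF_8)));
        StoredValue repaired = readWithRepair("person/1");
        System.out.println(repaired.schemaId);              // upgraded copy
        System.out.println(store.get("person/1").schemaId); // store was repaired too
    }
}
```

The point is that the upgrade happens lazily, as a side effect of normal reads, so frequently accessed data converges to the new schema on its own.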
Since we upgrade clients one at a time with no downtime, the entire
change procedure would be something like this:
1) Make new schema version.
2) Upgrade all clients - one at a time - so they can read the new schema
version, but still save data in the old version.
3) Enable clients to save data in the new schema version.
Data formatted in deprecated schemas can be seen as a sort of entropy -
something you want to remove to achieve consistency and to keep your
clients simple. Before we remove client support for the old schema
version, we need to make sure that all data has been upgraded, so we
rewrite the objects that were not read-repaired by normal daily
operations. We would probably do this with a Java batch job fed all the
keys.
4) Repair remaining keys.
5) Remove client support for old version schema.
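Step 4 above can be sketched like this. Again a hedged, self-contained illustration rather than real code: the store and the converter are mocked, and in practice the keys would come from Riak (e.g. a key listing) with each rewrite being a fetch-convert-store against the cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchRepairSketch {
    public static void main(String[] args) {
        // key -> schema id the object is currently stored in (mocked).
        Map<String, Integer> schemaIds = new LinkedHashMap<>();
        schemaIds.put("person/1", 1); // missed by normal read repair
        schemaIds.put("person/2", 2); // already on the current schema

        final int currentSchema = 2;
        int repaired = 0;
        for (Map.Entry<String, Integer> e : schemaIds.entrySet()) {
            if (e.getValue() < currentSchema) {
                // In the real job: fetch the object, convert the bytes
                // to the new schema, and store it back.
                e.setValue(currentSchema);
                repaired++;
            }
        }
        System.out.println(repaired); // objects rewritten by the batch job
    }
}
```

Only after such a sweep reports zero remaining old-schema objects is it safe to do step 5 and drop the old deserialization code from the clients.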
If you like DomainBucket, which is part of the Riak Java client, you
can still use it and keep your Java model in PB or whatever. Put your
serialization code into a Converter like this:
    Converter<PersonAggregate> personSerializer = new Converter<PersonAggregate>() {
        @Override
        public IRiakObject fromDomain(PersonAggregate personAggregate, VClock vClock)
                throws ConversionException {
            String bucket = "person";
            String key = personAggregate.getKey();
            byte[] bytes = personAggregate.getBytes(); // Serialize to bytes using PB or whatever
            return RiakObjectBuilder.newBuilder(bucket, key).withValue(bytes).build();
        }

        @Override
        public PersonAggregate toDomain(IRiakObject riakObject) throws ConversionException {
            byte[] bytes = riakObject.getValue();
            return PersonAggregate.deserialize(bytes); // Deserialize to Java
        }
    };

    // "bucket" here is the com.basho.riak.client.bucket.Bucket you fetched earlier.
    DomainBucket<PersonAggregate> personBucket =
        new DomainBucketBuilder<PersonAggregate>(bucket, PersonAggregate.class)
            .withConverter(personSerializer).build();
- Rune
PS: I like the flexibility of your aggregate-root Java model, and
believe it will serialize nicely and efficiently into a Riak object.
--
Best regards / Venlig hilsen
*Rune Skou Larsen*
Trifork Public A/S / Team Riak
Margrethepladsen 4, 8000 Århus C, Denmark
Phone: +45 3160 2497 Skype: runeskoularsen twitter: @RuneSkouLarsen
On 24-07-2012 08:49, Tim Pedersen wrote:
Hi All,
I'm working on a Riak-based Java project that involves creating a
central repository of data (in Riak) aggregated from a number of other
databases (SQL). Right now the approach I'm taking is to build a set
of 'aggregate-root' style Java classes (e.g. a Person) that has lots
of aggregate/list classes hanging off it (e.g. KnownAddresses,
Offences, Charges, etc).
I'm using the Java client and am currently simply relying on the
out-of-the-box DomainBuckets and POJO/Jackson/JSON serialisation when
storing and retrieving these objects in Riak.
This app is likely to undergo (re)development over the next few years
- with plenty of changes to the Java class 'schemas', and thus a need
to deal with different versions of java classes over time.
From what I can see the standard DomainBucket/JSON deserialisation
relies on the target class not changing (is this right?)
I watched a video with Kresten Krab Thorup talking about the work with
the Danish Health dept; in his talk he mentioned they used Protocol
Buffers with a line about this enabling them to deal with versions??
I'm a n00b wrt Protocol Buffers - is this perhaps the way we should be
handling serialisation and versioning - instead of DomainBuckets??
Has anyone else tackled the problem of versioning of
schemas/classes/POJOs? What approaches have you taken (and did it work)?
Cheers,
Tim
---------------------------------------------------------------------------------
Tim Pedersen
Senior Developer
Information Technology Services
Department of Police and Emergency Management
47 Liverpool St, Hobart, Tasmania, 7000
Phone 03 6230 2465
Mobile 0428 336 670
Email tim.peder...@police.tas.gov.au
<mailto:tim.peder...@police.tas.gov.au>
----------------------------------------------------------------------------------
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com