Re: [appengine-java] Objectify - Twig - approaches to persistence

Scott Hernandez Fri, 12 Mar 2010 01:36:26 -0800

I hope the general public is enjoying this discussion. There *are*
lots of useful points in this thread, really :) It isn't just about
Twig and Objectify; it is clearly coming down to "philosophical
differences".

On Fri, Mar 12, 2010 at 6:52 AM, John Patterson <[email protected]> wrote:
>
> On 12 Mar 2010, at 13:01, Scott Hernandez wrote:
>>
>> We have a different idea about live systems and managing
>> upgrades/deployments. To answer the question below, you can always
>> upgrade the data in place because you will always need a way to load
>> that data into the current object representation. It just may be that
>> you need to take the system offline to migrate the data.
>
> In my app taking the system off-line while re-processing the data is not an
> option.  The actual reprocessing can take days.
>
> I see your point though.  My data is mainly static so I don't have the
> issues you describe of keeping the new version in sync with the live
> version.  However, I can think of several solutions to this while still
> having the safety advantage of independent "tables" in the datastore.
>
> The simple way would be to, during the migration period, create both a
> Person and a PersonNew whenever a user makes an edit.
>
> Another way would be to do this transparently by creating a simple "forked
> translator" that stored the Person as two Entities instead of one.  I won't
> go into details but it is certainly not a major extension.
>
> I would probably go for the first method if it was just for my app - but
> this is the kind of feature that could be pushed into the framework to make
> it easier for others to benefit too.

Right, that brings up another issue: code maintenance. You are now
changing the types of your POJOs (two versions are represented by two
different Java classes in your project at the same time). If you pass
those objects through many layers of code (Entities -> DAO ->
Service-Framework -> GWT/JS) you might need to either abstract the
implementation or change the objects across all the layers of your
application. This may be a good practice in large, versioned systems,
but in small apps, doubling your entity classes for every schema
change, and possibly changing your interfaces to use new (temporary)
classes, will probably only make the code more complicated, and
possibly in the wrong places.

I would assume you do something like this when versioning an object:
1) duplicate the class code, 2) rename the copy to something temporary
(with the old version number), and 3) increment the version on the
original class. Now what? It gets complicated. You now have two
versions of an object, where only the old one has data until you do a
mass migration. If you do just-in-time migration you have all kinds of
issues and you are back to a solution like the one we have come up
with in Objectify. If you wait and do a full data migration then your
data is offline until that migration is done.
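
To make the just-in-time option concrete, here is a minimal sketch in
plain Java. It does not use Objectify or Twig. A stored entity is
modelled as a Map of properties, and the "version" property, the
name -> fullName rename and the LazyMigrator class are all hypothetical
names made up for illustration. The idea is only that every load passes
through an upgrade step, so old and new records can coexist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: entities are plain property maps, not real datastore Entities.
class LazyMigrator {
    static final int CURRENT_VERSION = 2;

    // Upgrade a stored record to the current shape; returns true if it changed.
    static boolean upgrade(Map<String, Object> props) {
        int version = (Integer) props.getOrDefault("version", 1);
        if (version >= CURRENT_VERSION) {
            return false;
        }
        // v1 -> v2: the (assumed) property "name" was renamed to "fullName".
        if (props.containsKey("name")) {
            props.put("fullName", props.remove("name"));
        }
        props.put("version", CURRENT_VERSION);
        return true;
    }

    // Load by id, migrating on the way out and writing the new shape back.
    static Map<String, Object> load(Map<String, Map<String, Object>> store, String id) {
        Map<String, Object> props = store.get(id);
        if (props != null && upgrade(props)) {
            store.put(id, props); // with a real datastore this would be a put()
        }
        return props;
    }
}
```

Every read path has to go through load() for this to hold, which is
part of why it gets complicated once queries are involved.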

It seems like the better solution if you are concerned about
data-loss, and creating backups, would be having a goal of in-place
migration with a backup option. The framework could create a temporary
(backup) copy of the old state (call it <kind>-migration-x, or
something else unlikely to be used). Then you can clean-up your backup
data when things go well, or you can reprocess the backup data if
there are problems (the first n times).
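
Roughly what I have in mind, as a sketch only: neither framework does
this today as far as this thread goes, and the datastore here is just
nested in-memory maps. The BackupMigration class, its method names and
the attempt number used for "x" are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: "kinds" are in-memory maps keyed by entity id.
class BackupMigration {
    // kind name -> (entity id -> properties)
    final Map<String, Map<String, Map<String, Object>>> kinds = new HashMap<>();

    static String backupKind(String kind, int attempt) {
        return kind + "-migration-" + attempt;
    }

    // Copy every entity to the backup kind, then rewrite it in place.
    void migrate(String kind, int attempt, UnaryOperator<Map<String, Object>> step) {
        Map<String, Map<String, Object>> live =
                kinds.computeIfAbsent(kind, k -> new HashMap<>());
        Map<String, Map<String, Object>> backup =
                kinds.computeIfAbsent(backupKind(kind, attempt), k -> new HashMap<>());
        for (Map.Entry<String, Map<String, Object>> e : live.entrySet()) {
            // putIfAbsent: a rerun with the same attempt keeps the first saved state
            backup.putIfAbsent(e.getKey(), new HashMap<>(e.getValue()));
            e.setValue(step.apply(new HashMap<>(e.getValue())));
        }
    }

    // Something went wrong: overwrite live data with the saved originals.
    void restore(String kind, int attempt) {
        Map<String, Map<String, Object>> backup = kinds.get(backupKind(kind, attempt));
        if (backup == null) {
            return;
        }
        Map<String, Map<String, Object>> live =
                kinds.computeIfAbsent(kind, k -> new HashMap<>());
        for (Map.Entry<String, Map<String, Object>> e : backup.entrySet()) {
            live.put(e.getKey(), new HashMap<>(e.getValue()));
        }
    }

    // Everything went well: drop the backup kind.
    void cleanup(String kind, int attempt) {
        kinds.remove(backupKind(kind, attempt));
    }
}
```

The storage cost of the copy only lasts until cleanup() runs, and the
live kind never changes its name, so the rest of the code is untouched.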

Also, it isn't that there is no testing for me when using Objectify.
You can easily write unit tests that populate the local datastore with
the old version of your data and then run tests for the migration
(in-place, or otherwise). Either way you need to be careful when
migrating data from one schema to another. It is the same in SQL, or
any database. It would be much more helpful if App Engine had some
sort of backup, or snapshot, for us to leverage.

>>> The versions remain completely separate.  Modifying live data in place
>>> gives
>>> me heart pains.
>>
>> Duplicating data (during upgrades) is unacceptable, for my app. It may
>> be safer to leave the old data, but it is not always possible.
>
> There is no maximum stored data on App Engine.  If you delete the data after
> one day is that still a problem?
>

There may be no limit, but there is a cost. The more data you have to
move, the slower it will be. I don't disagree that full data migration
is probably best. If you can take the system down for a short enough
period, why not reprocess and migrate all the data?

>> I have
>> a *lot* of data, that was costly to generate (both in terms of network
>> and cpu) and to store. Jeff, and others on the Objectify list, have
>> spent a lot of time working on a solution with the goal of keeping the
>> app up during an upgrade (in simple upgrades), without needing to
>> migrate all the existing data first, which would require downtime.
>
> Ok I understand.  But that is a big trade-off - saving a day or two worth of
> storage costs in exchange for testability, ability-to-revert and the peace
> of mind that a coding error might corrupt your live data.

Unfortunately when the user comes to my site I need that data
available (and migrated) immediately. If I don't have the data then my
app is broken. In fact, when I migrate data I sometimes have to do
multiple queries if I've moved indexed data (renaming a property, or
if I've moved it to a new/another entity), until my full data
migration is done. These are the costs of keeping the system working,
and the data live, throughout schema changes.
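
A sketch of what "multiple queries" means for a renamed indexed
property. This is plain Java with a linear scan standing in for an
indexed query. The property names "email" (old) and "emailAddress"
(new), the "id" property and the RenameAwareQuery class are made up
for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a "query" is a linear scan over in-memory property maps.
class RenameAwareQuery {
    // Stand-in for an indexed equality query on one property.
    static List<Map<String, Object>> query(
            List<Map<String, Object>> store, String property, Object value) {
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> entity : store) {
            if (value.equals(entity.get(property))) {
                hits.add(entity);
            }
        }
        return hits;
    }

    // Until the migration finishes, matches may be indexed under either name.
    static List<Map<String, Object>> findByEmail(
            List<Map<String, Object>> store, String email) {
        Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
        for (Map<String, Object> e : query(store, "emailAddress", email)) { // new name
            byId.put(e.get("id"), e);
        }
        for (Map<String, Object> e : query(store, "email", email)) { // old name
            byId.putIfAbsent(e.get("id"), e);
        }
        return new ArrayList<>(byId.values());
    }
}
```

Once every entity has been rewritten, the second query can be deleted.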

But you have those issues either way (in-place migration, taking the
system off-line to migrate data, or keeping a backup and starting from
scratch). A coding error can corrupt your data, period.

I choose to test locally (in development mode), and then I usually
limit my live (production) testing to a few accounts on a different
deployed version, if I can, at first. Some external services depend on
a specific domain name and I can't run that against a different
deployed version because App Engine doesn't support mapping custom
domain names to anything but the default version. (Please star this if
you want that to change,
http://code.google.com/p/googleappengine/issues/detail?id=2878). There
doesn't seem to be any simple and clean answer to data migration, ever
:)

>> If the documentation was *very* clear about this maybe, but as Jeff
>> said, Hibernates biggest complaint was from new users who got LazyInit
>> exceptions. This is the same thing you have created, without the
>> benefit of having an exception. You may just get passed an instance of
>> an uninitialized object if you aren't careful.
>
> Seems we have both claimed that in Objectify and Twig a careless user could
> delete their own data.  At the end of the day, you can never guard against
> that completely in either framework.

What was it with Objectify again? (I seem to have lost that part of
the thread... ya'know, in the past 40 messages ;)

Was it if you have two different instances of the same object - pulled
from the datastore at different times? - that have both been edited
and the last one saved wins?

> That still does not mean that object activation is not an elegant solution
> to the potential problem of loading too much data.  But keep in mind that
> this is not on by default - by default all data will be activated.

That is good.

I, unlike Jeff, see possibilities for some of Twig's features (like
activation, id/key-less entities, or skipping the key references
altogether and returning the real object), although these features
take you into dirty detection and much more complicated rules and
states, which is yet another concern.

But ... we walk a fine line in offering tools to shoot yourself in
the foot. Objectify requires you to try harder before letting you
(accidentally) shoot your own foot; at least we try not to hide
where you are aiming your gun.

> I expect that most apps will not even use this feature but it is there as an
> optimisation if needed.

Yep, we feel the same way about using Keys. ;)

With activation I think people will be using it. When you have a
system where you have to get the object graph (by default), well, you
will have plenty of cases where you just want one or two levels deep
(activated). That is the choice we made in Objectify. Everything is
very clear. Keys are references; if you want the referenced object,
you have to get it. It is a difference in philosophy, and predicted
use case. In Objectify you always get whole objects and you choose
when you get them.
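
To illustrate the "keys are references" idea without tying it to
either library, here is a small plain-Java sketch. None of these
classes are Objectify's or Twig's real types: Key, Person, Company,
the static store and the fetches counter are all hypothetical. The
point is only that loading a Person costs one fetch, and the employer
costs a second fetch that you can see in the code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these are not Objectify's or Twig's real classes.
class KeyReferences {
    // A key is only an address: kind plus id, no data.
    static final class Key {
        final String kind;
        final long id;
        Key(String kind, long id) { this.kind = kind; this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).kind.equals(kind) && ((Key) o).id == id;
        }
        @Override public int hashCode() { return kind.hashCode() * 31 + Long.hashCode(id); }
    }

    static final class Company {
        final String name;
        Company(String name) { this.name = name; }
    }

    static final class Person {
        final String name;
        final Key employer; // a reference, not a loaded Company
        Person(String name, Key employer) { this.name = name; this.employer = employer; }
    }

    static final Map<Key, Object> store = new HashMap<>();
    static int fetches = 0; // counts round trips to the (fake) datastore

    static Object get(Key key) {
        fetches++;
        return store.get(key);
    }
}
```

Usage would look like: get(personKey) to load the Person, then
get(person.employer) only if and when the Company is actually needed.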
