On Fri, Mar 12, 2010 at 8:35 PM, John Patterson <[email protected]> wrote:
>
> On 12 Mar 2010, at 16:28, Jeff Schnitzer wrote:
>
> Look at these graphs:
>
> http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-get-latency
> http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-query-latency
>
> Notice that a get()'s average latency is 50ms and a query()'s average
> latency is 500ms.  Last week the typical query was averaging
> 800-1000ms with frequent spikes into 1200ms or so.
>
> You are increasing my suspicion that you have never worked with an
> application that queries large amounts of data.  If your queries are taking
> anywhere near 1000 ms then you must be doing something seriously wrong.
> Query times for one of my apps are generally in the 200 ms range over 2 million
> records.  A keys-only query can return in 50 ms.

Are you debating the validity of Google's statistics?  Or the loud
complaints posted to this mailing list last week?

Some queries will certainly return faster than others, and from what
I've read/watched, keys-only queries should have performance profiles
roughly similar to simple gets.  But there can be no doubt that real
queries are quite slow compared to simple gets.

But you're arguing with a straw man here.  I've never suggested that
queries are not useful.

However, you *have* suggested that batch gets aren't important.
"Batch gets are really only useful in apps that need to take a load of
ids from an external source and do something with them."  That's
absolute rubbish.  A very large (and growing) number of applications
are being built on NoSQL databases that are effectively key-value
stores.  Cassandra, Tokyo Cabinet, HBase, Voldemort, and *dozens* of
other tools are being developed because they can do something that
relational systems can't:  get() and put() vast quantities of data
quickly.

There are a growing number of applications (largely defined by
staggeringly large user bases) in which the cost of maintaining
traditional indexes is not practical.  You aren't going to implement
Twitter or Facebook with a bunch of App Engine queries!  But apparently
Cassandra works great.

> This is the time required to execute 9 parallel queries on geospatial data
> and OR merge them together.  Keep in mind that with Twig I could execute 90
> parallel queries and expect the time to be about the same.

You have the luxury of relatively static data, which colors your view
of the world.  I work with data that has a high churn rate, which
colors mine.

I have to ask you something though - would you need to do 9 parallel
queries if you were working with a datastore that has proper spatial
indexes?  Not that doing parallel queries isn't cool, but is it
actually necessary for your app?
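
For reference, the fan-out-and-merge pattern under discussion looks roughly
like the sketch below.  It uses plain java.util.concurrent, and the class and
method names are made up for illustration; this is not Twig's API, and the
thread pool is only a stand-in for whatever asynchronous mechanism a
datastore library actually uses.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run several queries concurrently and OR-merge the keys.
class ParallelOrMergeSketch {

    // Each Callable stands in for one query that returns matching entity keys.
    static Set<String> orMerge(List<Callable<List<String>>> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queries.size()));
        try {
            // invokeAll blocks until every query has finished.
            List<Future<List<String>>> futures = pool.invokeAll(queries);
            // A set drops keys matched by more than one query (the OR merge);
            // LinkedHashSet keeps first-seen order.
            Set<String> merged = new LinkedHashSet<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a query failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Total wall-clock time is roughly that of the slowest query rather than the
sum, which is why 9 and 90 queries can take about the same time as long as
the backend keeps up.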

I'm not doing spatial queries right now, but it's on the horizon.
I've done the research.  For my application, it's much easier and more
efficient to push my spatial queries off to a cluster of PostGIS
instances running elsewhere in the cloud.  It's also much, much
cheaper.

> * Fire off a batch job at your leisure to finish it off.
>
> This "partial update" approach only works in cases where you are not adding
> a field that you will query on.  That needs to be an all-or-nothing batch
> job.

Nonsense, this is totally dependent on the specific logic of your application.

Simple example:  You're adding a loginCount to your User entity, and
you want to add a query that selects out users that have logged in
more than N times.  No reason you can't start running those queries
right away.
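
A minimal sketch of that idea, using an in-memory map as a stand-in for the
datastore.  The class, method, and property handling here are hypothetical,
not Objectify or the low-level API; the assumption is only that entities
lacking the new property do not match a filter on it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a schemaless datastore.
class LazyUpgradeSketch {

    // Each "entity" is a property map keyed by user id.
    static final Map<String, Map<String, Object>> users = new HashMap<>();

    // Record a login. Entities written before loginCount existed are
    // upgraded the first time they are touched.
    static void recordLogin(String userId) {
        Map<String, Object> user = users.computeIfAbsent(userId, id -> new HashMap<>());
        long current = (Long) user.getOrDefault("loginCount", 0L);
        user.put("loginCount", current + 1);
    }

    // Mimics a query on loginCount: entities without the property
    // simply do not match, so the query is usable before any batch job runs.
    static List<String> usersWithMoreLoginsThan(long n) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : users.entrySet()) {
            Object count = entry.getValue().get("loginCount");
            if (count instanceof Long && (Long) count > n) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result); // deterministic order for the example
        return result;
    }
}
```

Un-upgraded users count as never having logged in since the feature shipped,
which is exactly the right answer for this particular query.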

You're trying to dismiss the utility of upgrading the dataset in-place
by saying that *some* application features require the dataset to be
completely transitioned before being enabled.  OK, some do, some don't.
Your claim is still absurd.

> It probably explains why you don't think that OR queries are so important.

The reason OR queries aren't high on our priority list is because
nobody has been asking for them.  There doesn't even seem to be an
issue for it in GAE's issue tracker - or if there is, it's *pages*
down the list of priorities.

> They were one of the first things I tried on App Engine and one of the
> reasons Twig was written.  I would bet that most developers could not
> imagine working with an RDBMS that did not support OR and AND queries (on
> more than one property).  Twig's support for these saves time and reduces the
> complexity of the developer's app.  With Objectify they are left on their own
> to re-invent the wheel every time.

Our conceptual model of the datastore is not an RDBMS.  It's a
key-value store that also allows limited queryability.  If you really
want an RDBMS, I'm sure the Cloud2db guys will be happy to chime in
again.

Jeff
