Re: Call for masters thesis ideas (possibly related to Clojure)

Niels Mayer Fri, 18 Dec 2009 13:18:40 -0800

(0)  Create a toolkit to run multiple parallel, tightly communicating
Clojure apps on Google App Engine, simulating a single, long-running,
multithreaded JVM instance that does not appear, to the user, to be limited
by the constraints of GAE's Java implementation (e.g. threads are available
despite GAE's single threading, and shared refs work despite being
distributed). At the same time, the toolkit would minimize the resources
consumed in GAE by persisting threads/continuations that are waiting for
data, and by heuristically determining the lowest-cost trade-off between
long-term storage, memory, and runtime consumption.


Note: http://elhumidor.blogspot.com/2009/04/clojure-on-google-appengine.html

THE BIG CAVEAT


> Two unusual aspects of the Google AppEngine environment create pretty major
> constraints on your ability to write idiomatic Clojure.

> First, an AppEngine application runs in a security context that doesn't
> permit spawning threads, so you won't be able to use Agents, the
> clojure.parallel library, or Futures.

> Second, one of the most exciting features of AppEngine is that your
> application will be deployed on Google's huge infrastructure, dynamically
> changing its footprint depending on demand. That means you'll potentially be
> running on many JVMs at once. Unfortunately this is a strange fit for
> Clojure's concurrency features, which are most useful when you have precise
> control over what lives on what JVM (and simplest when everything runs on
> one JVM). Since shared references (Vars, Refs, and Atoms) are shared only
> within a single JVM, they are not suitable for many of their typical uses
> when running on AppEngine. You should still use Clojure's atomic references
> (and their associated means of modification) for any state that it makes
> sense to keep global per-JVM, since there may be multiple threads serving
> requests in one JVM. But remember JVMs will come and go during the lifetime
> of your application, so anything truly global should go in the Datastore or
> Memcache.
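
To make the second caveat concrete, here is a rough sketch (in Python rather than Clojure, purely as a simulation) of why per-JVM state diverges across instances while a shared store stays global. The names `SharedStore` and `Instance` are made up for illustration; `SharedStore` is only a stand-in for something like the Datastore or Memcache and does not use any real GAE API.

```python
class SharedStore:
    """Hypothetical stand-in for a global store (e.g. Datastore or Memcache)."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        # Increment a counter that every instance can see.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def get(self, key):
        return self._data.get(key, 0)


class Instance:
    """One simulated JVM; `local_hits` plays the role of a per-JVM atom."""

    def __init__(self, store):
        self.local_hits = 0   # visible only inside this instance
        self.store = store    # shared by all instances

    def handle_request(self):
        self.local_hits += 1      # per-JVM state
        self.store.incr("hits")   # truly global state


store = SharedStore()
jvm_a, jvm_b = Instance(store), Instance(store)

# The platform routes requests to whichever instance it likes.
for _ in range(3):
    jvm_a.handle_request()
for _ in range(2):
    jvm_b.handle_request()

print(jvm_a.local_hits, jvm_b.local_hits, store.get("hits"))
```

Each instance sees only its own share of the traffic in `local_hits`, while the shared counter reflects all five requests. That gap is what the toolkit in (0) would have to hide.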


(1)  A Clojure implementation of Yahoo's PNUTS, using STM and all the cool
facilities Clojure provides: http://research.yahoo.com/files/pnuts.pdf
(it would be interesting to have a writeup of a real-world implementation
alongside comparisons to Google Bigtable and Amazon Dynamo)

> We describe PNUTS, a massively parallel and geographically distributed
> database system for Yahoo!'s web applications.


> The foremost requirements of web applications are scalability, consistently
> good response time for geographically dispersed users, and high
> availability. At the same time, web applications can frequently tolerate
> relaxed consistency guarantees.


> For example, if a user changes an avatar ... little harm is done if the new
> avatar is not initially visible to one friend .... It is often acceptable to
> read (slightly) stale data, but occasionally stronger guarantees are
> required by applications.


> PNUTS provides a consistency model that is between the two extremes of
> general serializability and eventual consistency ... We provide per-record
> timeline consistency: all replicas of a given record apply all updates to
> the record in the same order .... The application [can] indicate cases where
> it can do with some relaxed consistency for higher performance .... [such as
> reading] a possibly stale version of the record.
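
As a rough illustration of the per-record timeline consistency described above, here is a minimal Python simulation. It is a sketch under assumptions, not the PNUTS design: `RecordMaster`, `Replica`, `deliver`, and `read_any` are hypothetical names, and I assume a single master per record that numbers updates so that replicas can apply them in the same order even when they arrive out of order.

```python
class RecordMaster:
    """Hypothetical per-record master: assigns each update a sequence number."""

    def __init__(self):
        self.version = 0

    def write(self, value):
        self.version += 1
        return self.version, value


class Replica:
    """Applies updates strictly in version order, buffering early arrivals."""

    def __init__(self):
        self.version = 0
        self.value = None
        self._pending = {}  # version -> value, for updates that arrived early

    def deliver(self, version, value):
        self._pending[version] = value
        # Apply every update that is next in the record's timeline.
        while self.version + 1 in self._pending:
            self.version += 1
            self.value = self._pending.pop(self.version)

    def read_any(self):
        # Relaxed read: returns whatever this replica has applied so far,
        # which may be stale relative to the master.
        return self.version, self.value


master = RecordMaster()
near, far = Replica(), Replica()

u1 = master.write("avatar-1")
u2 = master.write("avatar-2")

near.deliver(*u1)
near.deliver(*u2)

far.deliver(*u2)        # arrives out of order, so it is buffered
stale = far.read_any()  # the far replica has not applied anything yet
far.deliver(*u1)        # now versions 1 and 2 are applied in order

print(stale, near.read_any(), far.read_any())
```

The far replica briefly serves a stale read, but it never applies update 2 before update 1, so both replicas end up on the same timeline for the record.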


Some interesting commentary from
http://glinden.blogspot.com/2009/02/details-on-yahoos-distributed-database.html

> When reading the paper, a couple things about PNUTS struck me as
> surprising:

> First, the system is layered on top of the guarantees of a reliable pub-sub
> message broker which acts "both as our replacement for a redo log and our
> replication mechanism." I have to wonder if the choice to not build these
> pieces of the database themselves could lead to missed opportunities for
> improving performance and efficiency.

> Second, as figures 3 and 4 show, the average latency of requests to their
> database seems quite high, roughly 100 ms. This is high enough that web
> applications probably would incur too much total latency if they made a few
> requests serially (e.g. ask for some data, then, depending on what the data
> looks like, ask for some other data). That seems like a problem.

> Please see also my August 2006 post, "Google Bigtable paper"
> <http://glinden.blogspot.com/2006/08/google-bigtable-paper.html>,
> which discusses the distributed database behind many products at Google.

> Please see also my earlier post, "Highly available distributed hash store at
> Amazon"
> <http://glinden.blogspot.com/2007/10/highly-available-distributed-hash.html>,
> on the distributed database behind some features at Amazon.com.

> Please see also my earlier posts, "Cassandra data store at Facebook"
> <http://glinden.blogspot.com/2008/08/cassandra-data-store-at-facebook.html>
> and "HBase: A Google Bigtable clone"
> <http://glinden.blogspot.com/2007/07/hbase-google-bigtable-clone.html>.

> Update: One of the developers of PNUTS commented
> <http://glinden.blogspot.com/2009/02/details-on-yahoos-distributed-database.html?showComment=1233884340000#c1254841206330803677>
> on this post, pointing out that PNUTS performance is much better in practice
> (1-10ms/request) when caching layers are in place and making a few
> comparisons to Bigtable.

-- Niels
http://nielsmayer.com

