On Sep 13, 2011, at 7:20 AM, Steve Loughran wrote:

> I missed a talk at the local university by a Platform sales rep last month,
> though I did get to offend one of the authors on the Condor team instead [1],
> by pointing out that all grid schedulers contain a major assumption: that
> storage access times are constant across your cluster. It is if you can pay
> for something like GPFS, but you don't get 50TB of GPFS storage for $2500,
> which is what adding 25*2TB SATA drives would cost if you stuck them on your
> compute nodes; $7500 for a fully replicated 50TB. That's why I'm not a fan of
> grid systems: the cost of storage and networking isn't taken into account.
> Then there are the availability issues with the larger filesystems, which are
> a topic for another day.
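To spell out the arithmetic behind those figures (a rough sketch; the ~$100-per-2TB-drive price and the 3x replication factor are my assumptions, inferred from the totals in the quoted mail rather than stated there):

    # Rough sketch of the storage-cost arithmetic quoted above.
    # Assumed: ~$100 per 2 TB SATA drive (2011 prices) and HDFS-style
    # 3x replication for the "fully replicated" case.
    drive_tb = 2
    drive_cost_usd = 100
    usable_tb = 50
    replication_factor = 3

    drives_unreplicated = usable_tb // drive_tb                    # 25 drives
    drives_replicated = drives_unreplicated * replication_factor   # 75 drives

    print(drives_unreplicated * drive_cost_usd)   # 2500 -> the "$2500" figure
    print(drives_replicated * drive_cost_usd)     # 7500 -> the "$7500" figure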
For what it's worth - I do know folks who have done (or are doing) data locality with Condor. Condor is wonderfully flexible; easily flexible enough to shoot yourself in the foot. There was also a grad student who did work on allowing Condor to fire up Hadoop datanodes and job trackers directly.

For the most part you are right, though - all these systems have long treated nodes as individual, independent units (either because the systems were job-oriented, not data-oriented, or because they ran at supercomputing centers where money was no concern). This is starting to change, but change is always frustratingly slow.

On the upside, we now have single Condor pools that span 80 sites around the globe, and it is easy to have two Condor pools interoperate and exchange jobs. So, each system has its own strengths and weaknesses.

Brian