On Sep 13, 2011, at 7:20 AM, Steve Loughran wrote:

> 
> I missed a talk at the local university by a Platform sales rep last month,
> though I did get to offend one of the authors on the Condor team instead [1],
> by pointing out that all grid schedulers contain a major assumption: that
> storage access times are constant across your cluster. It is if you can pay
> for something like GPFS, but you don't get 50TB of GPFS storage for $2500,
> which is what adding 25*2TB SATA drives would cost if you stuck them on your
> compute nodes, or $7500 for a fully replicated 50TB. That's why I'm not a fan
> of grid systems: the costs of storage and networking aren't taken into
> account. Then there are the availability issues with the larger filesystems,
> but those are a topic for another day.
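
(A quick aside on those numbers, since the replication factor is left
implicit: the $2500 and $7500 figures work out if you assume roughly $100 per
2TB SATA drive and Hadoop-style 3x replication. A rough sanity check in
Python:)

    # Back-of-the-envelope only; the drive price and replication factor are
    # inferred from the figures quoted above, not from anyone's price list.
    drive_tb    = 2       # TB per SATA drive
    drive_cost  = 100     # USD per drive, implied by $2500 / 25 drives
    usable_tb   = 50
    replication = 3       # implied by $7500 / $2500

    drives = usable_tb // drive_tb                     # 25 drives
    print("unreplicated: $%d" % (drives * drive_cost))                    # $2500
    print("replicated:   $%d" % (drives * replication * drive_cost))      # $7500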

For what it's worth - I do know folks who have done (and are still doing) data
locality with Condor.  Condor is wonderfully flexible, easily flexible enough
to shoot yourself in the foot.  There was also a grad student who did work on
letting Condor fire up Hadoop datanodes and job trackers directly.
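
To make the data-locality idea concrete for anyone who hasn't run into it: the
trick is simply to rank candidate execute nodes by whether the job's input is
already sitting on their local disks, rather than treating every idle node as
equivalent. A toy sketch (plain Python, nothing Condor-specific, all names
made up):

    # Toy illustration of locality-aware matchmaking; not Condor internals.
    def pick_node(idle_nodes, blocks_on_node, input_block):
        # Prefer nodes that already hold a local copy of the input block;
        # fall back to any idle node if none does.
        local = [n for n in idle_nodes
                 if input_block in blocks_on_node.get(n, set())]
        candidates = local or idle_nodes
        return candidates[0] if candidates else None

    nodes  = ["node1", "node2", "node3"]
    blocks = {"node2": {"b17", "b42"}, "node3": {"b9"}}
    print(pick_node(nodes, blocks, "b17"))   # -> node2, which holds b17 locally

In real Condor you would express that kind of preference through the machine
ClassAds and the job's Rank expression; the sketch is only meant to show the
shape of the idea.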

For the most part you are right, though - all these systems have long treated
nodes as individual, independent units (either because the systems were
job-oriented rather than data-oriented, or because they ran at supercomputing
centers where money was no concern).

This is starting to change, but change is always frustratingly slow.  On the 
upside, we now have single Condor pools that span 80 sites around the globe and 
it is easy to have two Condor pools interoperate and exchange jobs.  So, each 
system has its own strengths and weaknesses.

Brian
