Alec Ten Harmsel <alec <at> alectenharmsel.com> writes:

> As far as HDFS goes, I would only set that up if you will use it for
> Hadoop or related tools. It's highly specific, and the performance is
> not good unless you're doing a massively parallel read (what it was
> designed for). I can elaborate why if anyone is actually interested.

Actually, given my research and my goal (one really big scientific
simulation running constantly), many folks are recommending that I skip
Hadoop/HDFS altogether and go straight to Mesos/Spark. RDD (in-memory)
cluster calculations are at the heart of my needs. As for the opposite
end of the spectrum, loads of small files and small apps, I dunno, but
I'm all ears. In the end, my 3-node scientific cluster will morph and
support the typical myriad of networked applications, but I can take a
few years to figure that out, or just copy what smart guys like you and
Joost do...

> We use Lustre for our high performance general storage. I don't have any
> numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
> sounds familiar, but don't quote me on that).

At UMich, you guys should test the FhGFS/btrfs combo. The folks at UCI
swear by it, although they are only publishing a wee bit (you know,
water cooler gossip)... Surely the Wolverines do not want those
Californians getting one up on them? Are you guys planning a
Mesos/Spark test?

> > Personally, I would read up on these and see how they work. Then,
> > based on that, decide if they are likely to assist in the specific
> > situation you are interested in.

It's a ton of reading, and not apples-to-apple_cider type of reading.
My head hurts... I'm leaning toward two DFS/LFS options: Lustre/btrfs
and FhGFS/btrfs. Thoughts/comments?

James