Cloud adds another dimension: in the cloud, compute and storage are decoupled (S3 with EMR, or Blob storage with HDInsight), so Hadoop ends up being more of a compute engine. Many of its governance and security features become irrelevant or less important, because the data at rest lives outside Hadoop. Currently the biggest reason to run Spark on Hadoop in the cloud is YARN, but if you decide to use Mesos or Standalone mode, then again you may not need Hadoop. Databricks adds yet another dimension in the cloud, which I won't comment on.
But on-premises, I think you can argue that HDFS is here to stay in many forms, e.g. Isilon, object stores, and other storage types, not just local disk. The HDFS API actually works over Azure Data Lake Store, completely independent of Hadoop!

On Thu, Apr 14, 2016 at 1:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
> I've been using Spark for years and have (thankfully) been able to
> avoid needing HDFS, aside from one contract where it was already in
> use.
>
> At this point, many of the people I know would consider Kafka to be
> more important than HDFS.
>
> On Thu, Apr 14, 2016 at 3:11 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> > I do not think so. Hadoop provides an ecosystem in which you can deploy
> > different engines, such as MR, HBase, Tez, Spark, Flink, TitanDB, Hive,
> > Solr... I also observe that commercial analytical tools use one or more of
> > these engines to execute their code in a distributed fashion. You need this
> > flexibility to have an ecosystem suitable for your needs, especially in the
> > area of security. HDFS is one key element for storage and locality.
> > Spark itself cannot provide such a complete ecosystem but is part of
> > ecosystems.
> >
> > On 14 Apr 2016, at 21:13, Ashok Kumar <ashok34...@yahoo.com.INVALID> wrote:
> >
> > Hi,
> >
> > I hear some people saying that Hadoop is getting old and out of date and
> > will be replaced by Spark!
> >
> > Does this make sense, and if so, how accurate is it?
> >
> > Best