Spark is a parallel computing framework. There are many ways to give it data to chomp down on. If you don't know why you would need HDFS, then you don't need it. The same goes for ZooKeeper: Spark works fine without either.
Much of what we read online comes from people with specialized problems and requirements (such as maintaining a highly available service, or accessing an existing HDFS). It can be extremely confusing to the person who just needs to do some parallel computing.

Pete

On Wed, Aug 24, 2016 at 3:54 PM, kant kodali <kanth...@gmail.com> wrote:
> What do I lose if I run Spark without using HDFS or ZooKeeper? Which of
> them is almost a must in practice?