Hi Ben,
You can replace HDFS with a number of storage systems, since Spark is
compatible with other storage backends such as S3. That would let you scale
your compute nodes solely to add compute power rather than disk space. You can
deploy Alluxio on your compute nodes to offset the performance penalty of
reading from remote storage.
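For concreteness, a minimal sketch of what that looks like from the application
side, assuming Spark 2.x with the hadoop-aws and Alluxio client jars on the
classpath (the bucket name, Alluxio master host, and paths below are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3-instead-of-hdfs")
  .getOrCreate()

// The DataFrame API is identical regardless of the backing filesystem;
// only the URI scheme in the path changes.
val fromS3      = spark.read.parquet("s3a://my-bucket/events/")
val fromAlluxio = spark.read.parquet("alluxio://alluxio-master:19998/events/")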
IIUC, Spark doesn't bind strongly to HDFS; it uses Hadoop's common FileSystem
layer, which supports different FS implementations, and HDFS is just one
option. You could also use S3 as the backend FS; from Spark's point of view
the choice of filesystem is transparent.
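To illustrate, here is a rough sketch using the Hadoop FileSystem API directly;
the implementation is picked from the URI scheme, so hdfs://, s3a:// and
file:// all go through the same interface (the bucket name and credential env
vars below are placeholders):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// S3A credentials; the fs.s3a.* keys come from the hadoop-aws module.
val conf = new Configuration()
conf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
conf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// The scheme in the URI selects the FileSystem implementation.
val fs = FileSystem.get(new URI("s3a://my-bucket/"), conf)
fs.listStatus(new Path("s3a://my-bucket/data/")).foreach(s => println(s.getPath))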
On Sun, Feb 12, 2017 at 5:32 PM, ayan guha wrote:
How about adding more NFS storage?
On Sun, 12 Feb 2017 at 8:14 pm, Sean Owen wrote:
> Data has to live somewhere -- how do you not add storage but store more
> data? Alluxio is not persistent storage, and S3 isn't on your premises.
>
> On Sun, Feb 12, 2017 at 4:29 AM Benjamin Kim wrote:
>
> Has anyone got some advice on how to remove the reliance on HDFS for storing
> persistent data?
Data has to live somewhere -- how do you not add storage but store more
data? Alluxio is not persistent storage, and S3 isn't on your premises.
On Sun, Feb 12, 2017 at 4:29 AM Benjamin Kim wrote:
> Has anyone got some advice on how to remove the reliance on HDFS for
> storing persistent data? We have an on-premise Spark cluster.
You have to carefully consider whether your strategy makes sense given your
users' workloads; as it stands, I am not sure your reasoning holds.
However, you can, for example, install OpenStack Swift as an object store and
use it as your storage layer. HDFS in this case can be used as a temporary
store and/or cache.
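As a rough sketch of that Swift route (assuming the hadoop-openstack connector
is on the classpath; the provider name "keystone", container "archive",
endpoint, and paths below are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("swift-backend").getOrCreate()
val hconf = spark.sparkContext.hadoopConfiguration

// Auth settings for the swift:// connector (fs.swift.service.<provider>.*).
hconf.set("fs.swift.service.keystone.auth.url", "https://keystone.example.com:5000/v2.0/tokens")
hconf.set("fs.swift.service.keystone.username", sys.env("OS_USERNAME"))
hconf.set("fs.swift.service.keystone.password", sys.env("OS_PASSWORD"))
hconf.set("fs.swift.service.keystone.tenant", sys.env("OS_TENANT_NAME"))

// Cold data lives in Swift; HDFS keeps only hot or intermediate data.
val archived = spark.read.parquet("swift://archive.keystone/events/2016/")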
Has anyone got some advice on how to remove the reliance on HDFS for storing
persistent data? We have an on-premise Spark cluster. It seems like a waste of
resources to keep adding nodes only because of a lack of storage space. I would
rather add more powerful nodes to address the lack of processing power.