Hello,

I am Seung-ho and I work as a Data Engineer in Korea. I need some advice.

My company is considering replacing its RDBMS-based system with Cassandra and 
Hadoop.
The purpose of the new system is to analyze Cassandra and HDFS data with Spark.

It seems many use cases put emphasis on data locality; for instance, Cassandra 
and the Spark executor should both be on the same node.

The thing is, my company's data analyst team wants to analyze heterogeneous 
data sources, Cassandra and HDFS, using Spark.
So I wonder what the best practice would be for using Cassandra and Hadoop in 
such a case.

Plan A: HDFS and Cassandra together with a NodeManager (Spark executor) on the 
same node

Plan B: Cassandra + NodeManager on some nodes and HDFS + NodeManager on other 
nodes, separately but in the same cluster


Which plan would be better or more correct, or is there a better way?

Thank you in advance for your advice :)

Best Regards,
Seung-Ho Han

