Hello, I am Seung-ho and I work as a Data Engineer in Korea. I need some advice.
My company is considering replacing an RDBMS-based system with Cassandra and Hadoop. The purpose of the new system is to analyze Cassandra and HDFS data with Spark.

Many use cases seem to emphasize data locality; for instance, Cassandra and the Spark executors should run on the same node. The thing is, my company's data analyst team wants to analyze heterogeneous data sources, Cassandra and HDFS, using Spark. So I wonder what the best practice is for using Cassandra and Hadoop together in such a case.

Plan A: HDFS and Cassandra together with a NodeManager (Spark executor) on every node
Plan B: Cassandra + NodeManager on some nodes and HDFS + NodeManager on other nodes, all in the same cluster

Which would be better or more correct, or is there a better way?

I appreciate your advice in advance :)

Best Regards,
Seung-Ho Han