If you use containers like Docker, Plan A can work, provided you do the
resource and capacity planning. I tend to think that Plan B is more
standard and easier, although you may want to wait to hear from others for
a second opinion.
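
To make the trade-off concrete, here is a minimal sketch (assuming the
DataStax spark-cassandra-connector is on the classpath; the keyspace,
table, hosts, HDFS path, and join columns are hypothetical) of a Spark job
in Scala that joins Cassandra and HDFS data. It runs the same way under
either plan; the plans only differ in where the executors sit relative to
the data:

import org.apache.spark.sql.SparkSession

// Contact points for Cassandra; under Plan A these are the same hosts as
// the NodeManagers, under Plan B they are a separate set of nodes.
val spark = SparkSession.builder()
  .appName("cassandra-hdfs-join")
  .config("spark.cassandra.connection.host", "cassandra-node1,cassandra-node2") // hypothetical hosts
  .getOrCreate()

// Cassandra side: keyspace "ks" and table "events" are placeholders.
val cassandraDf = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "events"))
  .load()

// HDFS side: the Parquet path is a placeholder.
val hdfsDf = spark.read.parquet("hdfs:///data/reference/users")

// Join on a shared key and run the analysis with Spark SQL.
val joined = cassandraDf.join(hdfsDf, Seq("user_id"))
joined.groupBy("country").count().show()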

Caution: data locality only pays off if disk throughput is significantly
higher than network throughput (not every deployment has that scenario).
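
As a rough illustration of that point: Spark's spark.locality.wait setting
(default 3s) controls how long the scheduler holds a task waiting for a
data-local slot before falling back to any node. If your network is roughly
as fast as local disk, it is reasonable to shorten or disable that wait
rather than chase locality. A minimal sketch, reusing the hypothetical
hosts from above:

import org.apache.spark.sql.SparkSession

// When network throughput is comparable to disk throughput, waiting for
// node-local executors buys little; "0s" lets the scheduler place tasks
// anywhere immediately. Raise the value if locality clearly pays off.
val spark = SparkSession.builder()
  .appName("locality-tuning-sketch")
  .config("spark.locality.wait", "0s")
  .config("spark.cassandra.connection.host", "cassandra-node1") // hypothetical host
  .getOrCreate()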


On Thu, Jun 8, 2017 at 1:25 AM, 한 승호 <shha...@outlook.com> wrote:

> Hello,
>
>
>
> I am Seung-ho and I work as a Data Engineer in Korea. I need some advice.
>
>
>
> My company is recently considering replacing an RDBMS-based system with
> Cassandra and Hadoop.
>
> The purpose of this system is to analyze Cassandra and HDFS data with
> Spark.
>
>
>
> It seems many use cases put emphasis on data locality; for instance, both
> Cassandra and the Spark executor should be on the same node.
>
>
>
> The thing is, my company's data analyst team wants to analyze
> heterogeneous data sources, Cassandra and HDFS, using Spark.
>
> So I wonder what the best practices would be for using Cassandra and
> Hadoop in such a case.
>
>
>
> Plan A: Both HDFS and Cassandra with NodeManager (Spark executor) on the
> same node
>
>
>
> Plan B: Cassandra + NodeManager and HDFS + NodeManager on separate nodes,
> but in the same cluster
>
>
>
>
>
> Which would be better or more correct, or is there a better way?
>
>
>
> I appreciate your advice in advance :)
>
>
>
> Best Regards,
>
> Seung-Ho Han
>
>
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
>
>
