Hello all, I'm a newbie to Spark and Cassandra. I'm trying to run the Spark PortfolioDemo that ships with DSE Cassandra in a cluster environment, but I cannot get it to work.
This issue may not actually be coming from Spark, but I'm not sure how to investigate it further. Please help me.

There are 5 CentOS servers in my cluster (DSE installed on all of them via yum):

- server82, server80, and server106 act as Cassandra nodes.
- server134 and server136 act as Analytics nodes with Spark enabled.

Here is the status reported by nodetool:

    Datacenter: Cassandra
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load       Tokens  Owns   Host ID                               Rack
    UN  xxx.xxx.xxx.82   148.85 KB  256     27.9%  ee04410c-eea1-4016-a6d3-7b65dd599689  rack1
    UN  xxx.xxx.xxx.80   109.33 KB  256     36.5%  06ee3d5c-2e85-4231-89e2-3789f37bfce5  rack1
    UN  xxx.xxx.xxx.106  132.78 KB  256     35.3%  33c7c212-c528-4afe-abe1-197aa86dfc01  rack1

    Datacenter: Analytics
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load       Tokens  Owns   Host ID                               Rack
    UN  xxx.xxx.xxx.136  150.22 KB  1       0.0%   d33a8c3a-1d15-4d8e-8d2b-b0324c4aafe5  rack1
    UN  xxx.xxx.xxx.134  160.02 KB  1       0.3%   89c53d92-f54f-4bea-aa1e-e93777983b4d  rack1

I executed the data-simulation step by running the demo's pricer program on one of my Cassandra nodes (server80). Then I logged in to server136 and ran the Spark program "10-day-loss-java.sh" on that Analytics node, but the following error appeared:

    14/12/25 16:18:51 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 2, 135.252.169.134): java.io.IOException: Exception during execution of SELECT "key", "column1", "value" FROM "PortfolioDemo"."Portfolios" WHERE token("key") > ? ALLOW FILTERING: Not enough replica available for query at consistency LOCAL_ONE (1 required but only 0 alive)

I suspected that the replication strategy of the keyspace "PortfolioDemo" might be incorrect, so I changed it in cqlsh:

    ALTER KEYSPACE "PortfolioDemo" WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };

Then I ran nodetool repair on all the nodes, although nodetool told me there was nothing to repair for keyspace "PortfolioDemo":

    $ nodetool -h xxx.xxx.xxx.80 repair "PortfolioDemo"
    [2014-12-26 09:35:59,995] Nothing to repair for keyspace 'PortfolioDemo'
    $ nodetool -h xxx.xxx.xxx.82 repair "PortfolioDemo"
    [2014-12-26 09:36:10,983] Nothing to repair for keyspace 'PortfolioDemo'
    $ nodetool -h xxx.xxx.xxx.106 repair "PortfolioDemo"
    [2014-12-26 09:36:18,917] Nothing to repair for keyspace 'PortfolioDemo'
    $ nodetool -h xxx.xxx.xxx.136 repair "PortfolioDemo"
    [2014-12-26 09:36:26,155] Nothing to repair for keyspace 'PortfolioDemo'
    $ nodetool -h xxx.xxx.xxx.134 repair "PortfolioDemo"
    [2014-12-26 09:36:32,519] Nothing to repair for keyspace 'PortfolioDemo'

In any case, I can now read the records in "PortfolioDemo"."Portfolios" with a SELECT statement on server136.
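For what it's worth, this is the kind of query I can now run successfully in cqlsh on server136 (same table and columns as the failing Spark query; the LIMIT is just mine, to spot-check a few rows):

    SELECT "key", "column1", "value" FROM "PortfolioDemo"."Portfolios" LIMIT 10;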
I then ran "10-day-loss-java.sh" again on server136, and this time the following error appeared instead:

    Exception in thread "main" scala.collection.parallel.CompositeThrowable: Multiple exceptions thrown during a parallel computation: java.io.IOException: Failed to fetch splits of TokenRange(9190631453255400980,9149489230329032117,Set(),None) because there are no replicas for the keyspace in the current datacenter.
        com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$split$2.apply(ServerSideTokenRangeSplitter.scala:53)
        com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter$$anonfun$split$2.apply(ServerSideTokenRangeSplitter.scala:49)
        scala.Option.getOrElse(Option.scala:120)
        com.datastax.spark.connector.rdd.partitioner.ServerSideTokenRangeSplitter.split(ServerSideTokenRangeSplitter.scala:49)

I am not sure how to investigate this any further. Would you please help me with it?

Merry Christmas to everyone.

Best regards,
Zhang JiaQiang
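P.S. Re-reading the second error ("no replicas for the keyspace in the current datacenter"), I wonder whether the keyspace also needs replicas in the Analytics datacenter, since that is where the Spark workers run. I have not tried this yet, and the replication factors below are only my guess:

    ALTER KEYSPACE "PortfolioDemo" WITH REPLICATION =
      { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 3, 'Analytics' : 2 };

followed by another round of nodetool repair on each node. Does that sound right, or am I on the wrong track?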