On 10 Jul 2015, at 23:10, algermissen1971 <algermissen1...@icloud.com> wrote:
> Hi,
>
> Initially today, when moving my streaming application to the cluster for the
> first time, I ran into the newbie error of using a local file system for
> checkpointing, and the RDD partition count difference (see the exception
> below).
>
> Having neither HDFS nor S3 (and the Cassandra Connector not yet supporting
> checkpointing [1]), I turned to Swift (which is already available in our
> architecture).
>
> I mounted Swift using cloudfuse [2] and I see the checkpoint files on all
> three cluster nodes - but the job still fails with the mentioned exception.
>
> I experimented with the cloudfuse caching settings, but that does not *seem*
> to help.
>
> Can anyone shed some light on this issue and provide a hint about what I
> might be doing wrong here?

In case this helps somebody else, here is what made it work for me after
playing with all the options:

cloudfuse -o username=xxxx,api_key=xxxx,direct_io,hard_remove,entry_timeout=1,big_writes,cache_timeout=1,use_snet=True /swift

It seems I also had to remove the ~/.cloudfuse files I had created previously
for the options to take effect.

Jan

> Jan
>
> [1] https://datastax-oss.atlassian.net/browse/SPARKC-13
> [2] https://github.com/redbo/cloudfuse
>
> Exception:
>
> org.apache.spark.SparkException: Checkpoint RDD CheckpointRDD[72] at print at
> App.scala:47(0) has different number of partitions than original RDD
> MapPartitionsRDD[70] at updateStateByKey at App.scala:47(2)
>     at org.apache.spark.rdd.RDDCheckpointData.doCheckpoint(RDDCheckpointData.scala:103)
>     at org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply$mcV$sp(RDD.scala:1538)
>     at org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1535)
>     at org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1535)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>     at org.apache.spark.rdd.RDD.doCheckpoint(RDD.scala:1534)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1735)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1750)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1765)
>     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1272)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>     at org.apac....
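For anyone wiring this up themselves, the Spark side looks roughly like the
sketch below. This is a minimal example, not my actual App.scala: the object
name, the socket source, the 10-second batch interval, and the "checkpoints"
subdirectory under the /swift mount are all placeholders. The two points that
matter are that the checkpoint directory must be reachable under the same path
on every node (which is why the cloudfuse mount has to exist on all three),
and that updateStateByKey is what forces checkpointing in the first place - it
is also the operation named in the exception above.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SwiftCheckpointSketch {
  // Assumption: /swift is the cloudfuse mount point, present at the same
  // path on every cluster node; "checkpoints" is an illustrative subdir.
  val checkpointDir = "/swift/checkpoints"

  def createContext(): StreamingContext = {
    // Master/deploy settings come from spark-submit, not from the code.
    val conf = new SparkConf().setAppName("SwiftCheckpointSketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Placeholder source; the real app would read from Kafka, sockets, etc.
    val words = ssc.socketTextStream("localhost", 9999)

    // Stateful operation: updateStateByKey makes checkpointing mandatory.
    val counts = words.map(w => (w, 1)).updateStateByKey[Int] {
      (newValues: Seq[Int], state: Option[Int]) =>
        Some(newValues.sum + state.getOrElse(0))
    }
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint on restart, or build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}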