Efficient for loops in Spark

2016-05-12 Thread flyinggip
Hi there, I'd like to write an iterative computation, i.e., a computation that can be done via a for loop. I understand that in Spark foreach is a better choice. However, foreach and foreachPartition seem to be for self-contained computation that involves only the corresponding Row or Partition,

Possible bug involving Vectors with a single element

2016-05-24 Thread flyinggip
Hi there, I notice that there might be a bug in pyspark.mllib.linalg.Vectors when dealing with a vector with a single element. Firstly, the 'dense' method says it can also take a numpy.array. However, the code uses 'if len(elements) == 1' and when a numpy.array has only one element its length is u
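
For readers unfamiliar with the pattern behind a check such as 'if len(elements) == 1', here is a minimal sketch in plain Python. The function dense_sketch is a hypothetical stand-in written for illustration, not the pyspark implementation, and the scalar test inside it is an assumption. It only shows that with a *elements signature, len(elements) counts the positional arguments passed in, not the entries of a container given as one argument.

```python
def dense_sketch(*elements):
    """Hypothetical stand-in for a variadic vector constructor (not the pyspark source)."""
    # *elements collects the positional arguments into a tuple, so
    # len(elements) counts arguments, not the entries of a container.
    if len(elements) == 1 and not isinstance(elements[0], (int, float)):
        # Assumption: a single non-scalar argument is a container, so unpack it.
        elements = elements[0]
    return [float(x) for x in elements]


print(dense_sketch(1.0, 2.0, 3.0))    # three scalar arguments
print(dense_sketch([1.0, 2.0, 3.0]))  # one list argument, unpacked
print(dense_sketch([7.0]))            # one list holding a single entry
print(dense_sketch(7.0))              # one scalar argument, kept as a 1-tuple
```

In this sketch the single-entry list and the single scalar both end up as a one-element result, but they reach it through different branches, which is why a single-element input is the case worth testing first.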