Equally split an RDD partition into two partitions on the same node

2017-01-14 Thread Fei Hu
Dear all, I want to equally divide an RDD partition into two partitions. That is, the first half of the elements in the partition will form one new partition, and the second half will form another. But the two new partitions are required to be at the sa
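The bookkeeping behind such a split can be sketched in plain Scala, without Spark. This is only a model under stated assumptions: `Part`, `SplitPartitions`, `splitInTwo`, and `splitAll` are hypothetical names, not Spark classes, and the choice to give the first half the extra element when the count is odd is an assumption. In Spark itself this would likely mean a custom RDD that overrides `getPartitions` and `getPreferredLocations`, and preferred locations are scheduling hints rather than guarantees.

```scala
// Hypothetical model of a partition; this is NOT a Spark class.
final case class Part[A](index: Int, node: String, elems: Vector[A])

object SplitPartitions {
  // Split one partition into two halves that both keep the parent's node.
  // Assumption: with an odd element count, the first half gets the extra element.
  def splitInTwo[A](p: Part[A]): (Part[A], Part[A]) = {
    val mid = (p.elems.length + 1) / 2
    val (first, second) = p.elems.splitAt(mid)
    // Child indices 2i and 2i+1 keep the children adjacent and in order.
    (Part(2 * p.index, p.node, first), Part(2 * p.index + 1, p.node, second))
  }

  // Apply the split to every partition, preserving element order.
  def splitAll[A](parts: Seq[Part[A]]): Seq[Part[A]] =
    parts.flatMap { p =>
      val (a, b) = splitInTwo(p)
      Seq(a, b)
    }
}
```

Each parent partition `i` maps to children `2i` and `2i + 1`, so the global element order is unchanged and both children carry the parent's node name.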

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
Hi Sean, Can you elaborate on "it's actually used by Spark"? Where exactly? I'd like to be corrected. What about the scaladoc? Since the method is part of the public API, I think it should be fixed, shouldn't it? Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2

Re: [PYSPARK] Python tests organization

2017-01-14 Thread Saikat Kanjilal
https://issues.apache.org/jira/browse/SPARK-19224 Maciej/Holden, If it's OK, I can come up with a proposal for the reorganization and add it to the JIRA as a next step before we break up the work. Thanks From: Maciej Szymkiewicz Sent: Thursday, Janua

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-14 Thread Marcelo Vanzin
scala> org.apache.hadoop.fs.FileSystem.getLocal(sc.hadoopConfiguration)
res0: org.apache.hadoop.fs.LocalFileSystem = org.apache.hadoop.fs.LocalFileSystem@3f84970b

scala> res0.delete(new org.apache.hadoop.fs.Path("/tmp/does-not-exist"), true)
res3: Boolean = false

Does that explain your confusion?

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-14 Thread Marcelo Vanzin
Are you actually seeing a problem or just questioning the code? I have never seen a situation where there's a failure because of that part of the current code.

On Fri, Jan 13, 2017 at 3:24 AM, Rostyslav Sotnychenko wrote:
> Hi all!
>
> I am a bit confused why Spark AM and Client are both trying

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Sean Owen
It doesn't strike me as something that's problematic to use. There are a thousand things in the API that, maybe in hindsight, could have been done differently, but unless something is bad practice or superseded by another superior mechanism, it's probably not worth the bother for maintainers or use

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
Hi, Yes, correct. I was too forceful in discouraging people from using it. I think @deprecated would be the right direction. What should be the next step? I think I should file a JIRA so it's in the release notes. Correct? I was very surprised to have noticed its resurrection in the very latest module o

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Mridul Muralidharan
Since TaskContext.getPartitionId is part of the public API, it can't be removed, as user code may depend on it (unless we go through a deprecation process for it).

Regards, Mridul

On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski wrote:
> Hi,
>
> Just noticed that TaskContext#getPartitionId

What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
Hi, Just noticed that TaskContext#getPartitionId [1] is not used and moreover the scaladoc is incorrect: "It will return 0 if there is no active TaskContext for cases like local execution." since there is no local execution. (I've seen the comment in the code before but can't find it now). The