Re: JIRA content request

2014-07-29 Thread Henry Saputra
Yup, I am seeing this in some other Apache projects as well, and usually, when asked to add more information, most reporters gladly comply. We have to diligently nudge some JIRA filers at the beginning, but usually people see that more description is better and the habit gets picked up by n

Re: JIRA content request

2014-07-29 Thread Matei Zaharia
I agree as well. FWIW sometimes I've seen this happen due to language barriers, i.e. contributors whose primary language is not English, but we need more motivation for each change. On July 29, 2014 at 5:12:01 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: +1 on using JIRA workflows

Re: JIRA content request

2014-07-29 Thread Nicholas Chammas
+1 on using JIRA workflows to manage the backlog, and +9000 on having decent descriptions for all JIRA issues. On Tue, Jul 29, 2014 at 7:48 PM, Sean Owen wrote: > How about using a JIRA status like "Documentation Required" to mean > "burden's on you to elaborate with a motivation and/or PR". Th

Re: RFC: Supporting the Scala drop Method for Spark RDDs

2014-07-29 Thread Erik Erlandson
- Original Message - > Sure, drop() would be useful, but breaking the "transformations are lazy; > only actions launch jobs" model is abhorrent -- which is not to say that we > haven't already broken that model for useful operations (cf. > RangePartitioner, which is used for sorted RDDs),
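The conflict in this thread can be illustrated with a plain-Python model. It uses no Spark APIs: partitions are modeled as lists, and `drop_partitioned` is a hypothetical name. To drop the first n elements of an ordered, partitioned dataset, each partition must know how many elements the partitions before it hold. Computing those sizes is an eager pass over the data, which is the step a lazy transformation cannot perform.

```python
from typing import List, TypeVar

T = TypeVar("T")


def drop_partitioned(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Drop the first n elements of a dataset split into ordered partitions.

    Plain-Python model only (hypothetical helper, not a Spark API).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Step 1 (eager): count every partition. In Spark terms this is the
    # job that breaks the "transformations are lazy" model.
    counts = [len(p) for p in partitions]
    # Step 2: turn the counts into a per-partition number of elements to skip.
    skips = []
    remaining = n
    for count in counts:
        skip = min(remaining, count)
        skips.append(skip)
        remaining -= skip
    # Step 3: each partition now needs only its own index and skip value,
    # so this part could run as an ordinary per-partition transformation.
    return [part[skip:] for part, skip in zip(partitions, skips)]
```

For example, `drop_partitioned([[1, 2, 3], [4, 5], [6]], 4)` skips all of the first partition and one element of the second.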

Re: JIRA content request

2014-07-29 Thread Sean Owen
How about using a JIRA status like "Documentation Required" to mean "burden's on you to elaborate with a motivation and/or PR". This could both prompt people to do so, and also let one see when a JIRA has been waiting on the reporter for months, rather than simply never been looked at, and should t

Re: JIRA content request

2014-07-29 Thread Reynold Xin
+1 on this. On Tue, Jul 29, 2014 at 4:34 PM, Mark Hamstra wrote: > Of late, I've been coming across quite a few pull requests and associated > JIRA issues that contain nothing indicating their purpose beyond a pretty > minimal description of what the pull request does. On the pull request > it

JIRA content request

2014-07-29 Thread Mark Hamstra
Of late, I've been coming across quite a few pull requests and associated JIRA issues that contain nothing indicating their purpose beyond a pretty minimal description of what the pull request does. On the pull request itself, a reference to the corresponding JIRA in the title combined with a desc

RE: pre-filtered hadoop RDD use case

2014-07-29 Thread Yan Zhou.sc
Hi Reynold, I agree that we should not rush to modify or enhance APIs right now and can be satisfied with extending existing ones as much as possible. On the other hand, more intelligent data stores like HBase or Cassandra do support complex pushdowns, often more complex than their MR interface

Re: pre-filtered hadoop RDD use case

2014-07-29 Thread Reynold Xin
I am not sure if I agree that it lacks the mechanism to do pushdowns. Hadoop InputFormat itself provides some basic mechanism to push down predicates already. The HBase InputFormat already implements it. In Spark, you can also run arbitrary user code, and you can decide what to do. You can also ju
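Here is a minimal plain-Python sketch of the difference between pushing a predicate down to the data source and filtering after the read. It does not model the Hadoop InputFormat or HBase APIs. `scan_source`, `run_query`, and the `records_shipped` counter are hypothetical names, and the counter stands in for data moved from the store into the engine. Both strategies return the same records, but with pushdown the source ships only the matching ones.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple

Record = Tuple[str, int]


@dataclass
class ScanStats:
    records_shipped: int = 0  # records handed from the "source" to the engine


def scan_source(
    rows: List[Record],
    predicate: Optional[Callable[[Record], bool]],
    stats: ScanStats,
) -> Iterator[Record]:
    """Hypothetical source reader: applies the predicate before shipping rows."""
    for row in rows:
        if predicate is None or predicate(row):
            stats.records_shipped += 1
            yield row


def run_query(
    rows: List[Record], predicate: Callable[[Record], bool], pushdown: bool
) -> Tuple[List[Record], int]:
    """Return (matching records, number of records shipped by the source)."""
    stats = ScanStats()
    if pushdown:
        # Predicate evaluated at the source; non-matching rows never leave it.
        results = list(scan_source(rows, predicate, stats))
    else:
        # Source ships everything; the engine filters afterwards.
        results = [r for r in scan_source(rows, None, stats) if predicate(r)]
    return results, stats.records_shipped
```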

RE: pre-filtered hadoop RDD use case

2014-07-29 Thread Yan Zhou.sc
PartitionPruningRDD.scala still handles only the partition portion of the issue, as noted. On the "record pruning" portion, although cheap fixes may be available for the issue as reported, I believe the fundamental problem is the lack of a mechanism for processing merging/pushdown. Given the pop

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-29 Thread Nicholas Chammas
- spun up an EC2 cluster successfully using spark-ec2 - tested S3 file access from that cluster successfully +1 On Tue, Jul 29, 2014 at 1:46 AM, Henry Saputra wrote: > NOTICE and LICENSE files look good > Hashes and sigs look good > No executable in the source distribution > Compile so

Re: pre-filtered hadoop RDD use case

2014-07-29 Thread Reynold Xin
Would something like this help? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PartitionPruningRDD.scala On Thu, Jul 24, 2014 at 8:40 AM, Eugene Cheipesh wrote: > Hello, > > I have an interesting use case for a pre-filtered RDD. I have two solutions > th
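The idea behind the PartitionPruningRDD linked here can be sketched in plain Python as deciding, per partition index, whether a partition is read at all. This is a model, not Spark's API. `prune_partitions` and `range_query` are hypothetical names, and the sketch assumes each partition's key range is known in advance (for example, from range partitioning). Partition pruning alone still leaves a record-level filter inside the surviving partitions.

```python
from typing import Callable, List, Tuple


def prune_partitions(
    partitions: List[List[int]], keep: Callable[[int], bool]
) -> List[Tuple[int, List[int]]]:
    """Keep only partitions whose index passes `keep`; the rest are never read."""
    return [(i, part) for i, part in enumerate(partitions) if keep(i)]


def range_query(
    partitions: List[List[int]], bounds: List[Tuple[int, int]], lo: int, hi: int
) -> Tuple[List[int], List[int]]:
    """Return (indices of partitions read, keys in [lo, hi]).

    Assumes bounds[i] = (min_key, max_key) for partition i.
    """
    # Partition-level pruning: keep partitions whose key range overlaps [lo, hi].
    kept = prune_partitions(
        partitions, lambda i: bounds[i][0] <= hi and bounds[i][1] >= lo
    )
    # Record-level filtering is still needed inside the kept partitions.
    matches = [k for _, part in kept for k in part if lo <= k <= hi]
    return [i for i, _ in kept], matches
```

With partitions `[[1, 5, 9], [10, 12, 18], [21, 25, 29]]` covering key ranges `(0, 9)`, `(10, 19)`, and `(20, 29)`, a query for keys 12 through 25 reads only partitions 1 and 2.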