Re: JIRA: Wrong dates from imported JIRAs

2015-12-11 Thread Reynold Xin
Thanks for looking at this. Is it worth fixing? Is there a risk (although small) that the re-import would break other things? Most of those are done and I don't know how often people search JIRAs by date across projects. On Fri, Dec 11, 2015 at 3:40 PM, Lars Francke wrote: > Hi, > > I've been d

A very Minor typo in the Spark paper

2015-12-11 Thread Fengdong Yu
Hi, I found a very minor typo in: http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf Page 4: We complement the data mining example in Section 2.2.1 with two iterative applications: logistic regression and PageRank. I read back to Section 2.2.1, but these two examples are not there. actua

Re: coalesce at DataFrame missing argument for shuffle.

2015-12-11 Thread Reynold Xin
I am not sure if we need it. The RDD API has way too many methods and parameters. As you said, it is simply "repartition". On Fri, Dec 11, 2015 at 2:56 PM, Hyukjin Kwon wrote: > Hi all, > > I accidentally met coalesce() function and found this taking arguments > different for RDD and DataFrame.

Re: JIRA: Wrong dates from imported JIRAs

2015-12-11 Thread Lars Francke
That's a good point. I assume there's always a small risk but it's at least the documented way from Atlassian to change the creation date so I'd hope it should be okay. I'd build the minimal CSV file. I agree that probably not a lot of people are going to search across projects but on the other ha

Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Nick Pentreath
cc'ing dev list. OK, it looks like when the KCL version was updated in https://github.com/apache/spark/pull/8957, the AWS SDK version was not, probably leading to a dependency conflict, though as Burak mentions it's hard to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally and on my

Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Nick Pentreath
Is that PR against the master branch? S3 reads come from Hadoop / jets3t afaik. On Fri, Dec 11, 2015 at 5:38 PM, Brian London wrote: > That's good news I've got a PR in to up the SDK version to 1.10.40 and the > KCL to 1.6.1 which I'm running tests on locally now. > Is the

Maven build against Hadoop 2.4 times out

2015-12-11 Thread Ted Yu
Hi, You may have noticed that the Maven build against Hadoop 2.4 times out on Jenkins. The last module is spark-hive-thriftserver. This seems to have started with build #4440. FYI

Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
Hi Folks, Is it possible to assign multiple cores per task, and how? Suppose we have a scenario in which some tasks do really heavy processing on each record and require multi-threading, and we want to avoid similar tasks being assigned to the same executors/hosts. If it is not supported, does it m

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-11 Thread Michael Armbrust
Trying again now that eec36607 is merged. On Thu, Dec 10, 2015 at 6:44 PM, Michael Armbrust wrote: > Cutting RC2 now. > > On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust > wrote: > >> We are getting close to me

Re: Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
I noticed that it is configurable at the job level via spark.task.cpus. Is there any way to support it at the task level? Thanks. Zhan Zhang On Dec 11, 2015, at 10:46 AM, Zhan Zhang wrote: > Hi Folks, > > Is it possible to assign multiple cores per task and how? Suppose we have some > scenario, in which some tasks
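
As a rough illustration of the job-level setting mentioned in this thread: spark.task.cpus tells the scheduler how many cores to reserve for each task, so the number of tasks an executor can run at once is its core count divided by that value, rounded down. The sketch below only works through that arithmetic. It is not Spark code, and the helper name and the example numbers are hypothetical.

```python
def concurrent_tasks_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Hypothetical helper: task slots on one executor.

    executor_cores stands in for spark.executor.cores and task_cpus for
    spark.task.cpus; both values here are assumptions for illustration.
    """
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("executor_cores and task_cpus must be positive")
    # Each task reserves task_cpus cores, so slots are the integer quotient.
    return executor_cores // task_cpus


if __name__ == "__main__":
    # With 8 cores per executor, reserving 4 cores per task leaves 2 slots.
    for cpus in (1, 2, 4):
        print(cpus, concurrent_tasks_per_executor(8, cpus))
```

Because the setting applies to the whole job, raising it for the sake of a few heavy tasks also reduces the concurrency of every light task, which is the limitation the question about task-level support is getting at.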