Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Gil Vernik
Thanks a lot for the info. Does this explain why 2 temp files are generated per task (one temp file that is renamed to another)? I understand why there is one temp file per task, but I'm still not sure why there were 2 per task. Thanks, Gil. From: Imran Rashid To: Gil Vernik/Haifa/

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-15 Thread Sean McNamara
Ran tests on OS X +1 Sean > On Apr 14, 2015, at 10:59 PM, Patrick Wendell wrote: > > I'd like to close this vote to coincide with the 1.3.1 release, > however, it would be great to have more people test this release > first. I'll leave it open for a bit longer and see if others can give > a +

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-15 Thread Joseph Bradley
+1 On Wed, Apr 15, 2015 at 5:40 PM, Tom Graves wrote: > +1 tested on spark on yarn on hadoop 2.6 cluster with security. > Tom > > > On Sunday, April 5, 2015 6:25 PM, Patrick Wendell > wrote: > > > Please vote on releasing the following candidate as Apache Spark version > 1.2.2! > > The ta

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-15 Thread Tom Graves
+1 tested on spark on yarn on hadoop 2.6 cluster with security. Tom On Sunday, April 5, 2015 6:25 PM, Patrick Wendell wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.2! The tag to be voted on is v1.2.2-rc1 (commit 7531b50): https://git-wip-us.apa

Re: Query regarding infering data types in pyspark

2015-04-15 Thread Davies Liu
It does not work now, could you file a jira for it? On Wed, Apr 15, 2015 at 9:29 AM, Suraj Shetiya wrote: > Thank you :) > > That worked. I had another query regarding date being used as filter. > > With the new df which has the column cast as date I am unable to apply a > filter that compares th

Re: Query regarding infering data types in pyspark

2015-04-15 Thread Suraj Shetiya
Thank you :) That worked. I have another question about using a date as a filter. With the new df, which has the column cast as date, I am unable to apply a filter that compares the dates. The query I am using is: df.filter(df.Datecol > datetime.date(2015,1,1)).show() I do not want to use date a
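Davies' reply (listed above) says the `datetime.date` comparison does not work yet. One workaround that may apply, offered here as an untested assumption rather than something confirmed in this thread, is to compare against the ISO-8601 string form of the cutoff date. That only makes sense if string order and date order agree, which this plain-Python sketch checks; the sample dates are made up and no Spark API is used.

```python
import datetime

# ISO-8601 "YYYY-MM-DD" strings are zero-padded and ordered from the most to the
# least significant field, so plain string comparison agrees with date comparison.
cutoff = datetime.date(2015, 1, 1)
cutoff_str = cutoff.isoformat()  # '2015-01-01'

# Hypothetical sample values standing in for the Datecol column.
rows = [
    datetime.date(2014, 12, 31),
    datetime.date(2015, 1, 1),
    datetime.date(2015, 1, 2),
    datetime.date(2015, 10, 5),
]

by_date = [d for d in rows if d > cutoff]                      # compare as dates
by_string = [d for d in rows if d.isoformat() > cutoff_str]    # compare as strings

assert by_date == by_string
print([d.isoformat() for d in by_date])  # ['2015-01-02', '2015-10-05']
```

In PySpark the corresponding call would be something like `df.filter(df.Datecol > datetime.date(2015, 1, 1).isoformat())`; whether Spark 1.3 coerces a string literal correctly when comparing it with a `DateType` column is not established here and should be tested.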

Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Imran Rashid
The temp file creation is controlled by a Hadoop OutputCommitter, which is normally FileOutputCommitter by default. It's used in SparkHadoopWriter (which in turn is used by PairRDDFunctions.saveAsHadoopDataset). You could change the output committer to not use tmp files (e.g. use this from Aaron Da
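Imran's answer points at Hadoop's FileOutputCommitter. As far as we understand Hadoop 2's implementation (the layout later referred to as algorithm version 1), that committer moves each task's output twice, which would account for the two temp locations per task that Gil asks about: the task writes under `_temporary/<appAttempt>/_temporary/<taskAttempt>/`, task commit renames that directory to `_temporary/<appAttempt>/<task>/`, and job commit moves the files into the output directory. The sketch below models that flow on the local filesystem in plain Python. It is not Hadoop code; every function name, ID format, and path in it is hypothetical.

```python
import os
import shutil
import tempfile

# Hypothetical, simplified model of a two-stage file commit, loosely following the
# directory layout of Hadoop 2's FileOutputCommitter. NOT Hadoop code: all function
# names, IDs, and paths here are illustrative.


def attempt_dir(output, app_attempt, attempt_id):
    # First temp location: private work dir for one task attempt.
    return os.path.join(output, "_temporary", str(app_attempt), "_temporary", attempt_id)


def task_dir(output, app_attempt, task_id):
    # Second temp location: where a committed task's output waits for job commit.
    return os.path.join(output, "_temporary", str(app_attempt), task_id)


def write_task(output, app_attempt, attempt_id, part_name, lines):
    # The task writes its part file inside its own attempt dir.
    work = attempt_dir(output, app_attempt, attempt_id)
    os.makedirs(work, exist_ok=True)
    with open(os.path.join(work, part_name), "w") as f:
        f.write("\n".join(lines))


def commit_task(output, app_attempt, attempt_id, task_id):
    # Task commit: rename the attempt dir to the committed-task dir (first move).
    os.rename(attempt_dir(output, app_attempt, attempt_id),
              task_dir(output, app_attempt, task_id))


def commit_job(output, app_attempt):
    # Job commit: move each committed file into the final output dir (second move),
    # then delete _temporary and write a _SUCCESS marker.
    job_dir = os.path.join(output, "_temporary", str(app_attempt))
    for entry in os.listdir(job_dir):
        if entry == "_temporary":  # now-empty dir that held the attempt dirs
            continue
        committed = os.path.join(job_dir, entry)
        for name in os.listdir(committed):
            os.rename(os.path.join(committed, name), os.path.join(output, name))
    shutil.rmtree(os.path.join(output, "_temporary"))
    open(os.path.join(output, "_SUCCESS"), "w").close()


# Two hypothetical tasks writing one part file each.
output = tempfile.mkdtemp()
for i, data in enumerate([["a", "b"], ["c"]]):
    write_task(output, 0, f"attempt_m_{i:06d}_0", f"part-{i:05d}", data)
    commit_task(output, 0, f"attempt_m_{i:06d}_0", f"task_m_{i:06d}")
commit_job(output, 0)
print(sorted(os.listdir(output)))  # ['_SUCCESS', 'part-00000', 'part-00001']
```

A committer that writes straight to the output directory (the kind of change Imran suggests) would skip both moves, presumably at the cost of no longer isolating output from failed or speculative task attempts.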