Re: year out of range

2016-09-09 Thread Daniel Lopes
Thanks Ayan! *Daniel Lopes*

Re: year out of range

2016-09-08 Thread ayan guha
Another way of debugging would be to write another UDF that returns a string. In that function, put something useful in the catch block, so you can filter those records from the df.
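A rough sketch of this suggestion, not code from the thread: the function name `parse_date_debug`, the `OK:`/`ERROR:` prefixes, the default format string, and the column name `dt` are all assumptions made for illustration. The function is plain Python so it can be checked offline; one possible way to wrap it as a string-returning UDF is shown only in comments.

```python
from datetime import datetime


def parse_date_debug(value, format_date="%Y-%m-%d %H:%M:%S"):
    """Return a string describing the parse result instead of a datetime.

    Hypothetical helper: rows that fail to parse come back prefixed with
    "ERROR:" so they can be filtered out of the DataFrame and inspected.
    """
    try:
        parsed = datetime.strptime(value, format_date)
        return "OK:" + parsed.isoformat()
    except (ValueError, TypeError) as exc:
        # Keep the exception type and the offending value for later filtering.
        return "ERROR:{}:{!r}".format(type(exc).__name__, value)


# Possible use in PySpark (assumes a DataFrame `df` with a string column "dt"):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   debug_udf = udf(parse_date_debug, StringType())
#   checked = df.withColumn("dt_debug", debug_udf(df["dt"]))
#   checked.filter(checked["dt_debug"].startswith("ERROR:")).show()
```

Returning a string rather than a timestamp means a bad value cannot make the UDF itself fail, so the offending rows stay visible in the result.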

Re: year out of range

2016-09-08 Thread Daniel Lopes
Thanks Mike, a good way to debug! I was doing that already! Best, *Daniel Lopes*

Re: year out of range

2016-09-08 Thread Mike Metzger
My guess is there's some row that does not match up with the expected data. While slower, I've found RDDs to be easier to troubleshoot this kind of thing until you sort out exactly what's happening. Something like:

raw_data = sc.textFile("")
rowcounts = raw_data.map(lambda x: (len(x.split(",")),
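The snippet above is cut off in the archive, so the following is not a reconstruction of it. It is a small local-Python sketch of the same general idea (count the number of comma-separated fields per row and look for rows that differ), using hypothetical helper names and made-up sample lines. Note that a naive `split(",")` does not handle quoted fields that contain commas.

```python
from collections import Counter


def field_count_histogram(lines, sep=","):
    """Count how many lines have each number of separated fields."""
    return Counter(len(line.split(sep)) for line in lines)


def lines_with_unexpected_fields(lines, expected, sep=","):
    """Return (line_number, line) pairs whose field count differs from `expected`."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if len(line.split(sep)) != expected
    ]


# Made-up sample data, not taken from the thread.
sample = [
    "1,2016-09-08 10:00:00,foo",
    "2,2016-09-08 11:00:00,bar",
    "3,2016-09-08 12:00:00",  # one field short
]

print(field_count_histogram(sample))
print(lines_with_unexpected_fields(sample, expected=3))
```

If the histogram shows more than one field count, the rows in the minority group are the first ones worth inspecting.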

Re: year out of range

2016-09-08 Thread Daniel Lopes
Thanks, I *tested* the function offline and it works. I also tested with select * from after converting the data and the new data looks good, *but* if I *register it as a temp table* to *join another table*, it still shows *the same error*. ValueError: year out of range Best, *Daniel Lopes*

Re: year out of range

2016-09-08 Thread Marco Mistroni
Daniel, test the parse date function offline to make sure it returns what you expect. If it does, create a df with only one row in the Spark shell and run your UDF; you should be able to see the issue. If not, send me a reduced CSV file at my email and I'll give it a try this evening. Hopefully someone else will be able to assist

Re: year out of range

2016-09-08 Thread Daniel Lopes
Thanks Marco for your response. The field comes from SQL Server, encoded in the pt_BR locale. The code I am using to format it is:

def parse_date(argument, format_date='%Y-%m%d %H:%M:%S'):
    try:
        locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
        return datetime.strp
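Since the posted function is truncated, here is a separate minimal sketch for checking a format string offline. It makes several assumptions: `try_parse` is a hypothetical helper, the sample value is invented (the thread does not show the real data), and the `locale.setlocale` call is left out so the example does not depend on which locales are installed. It compares the default format string as it appears in the post with an assumed variant that has a hyphen before `%d`.

```python
from datetime import datetime


def try_parse(value, format_date):
    """Return the parsed datetime, or None if `value` does not match the format."""
    try:
        return datetime.strptime(value, format_date)
    except ValueError:
        return None


posted_format = "%Y-%m%d %H:%M:%S"   # default format as it appears in the post
hyphen_format = "%Y-%m-%d %H:%M:%S"  # assumed variant with a hyphen before %d

sample = "2016-09-08 12:00:00"       # hypothetical value, not from the thread

print(try_parse(sample, posted_format))  # this sample does not match the posted format
print(try_parse(sample, hyphen_format))
```

Whether this matters depends on what the column actually contains; running a handful of real values through a helper like this is one way to find out before involving Spark.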

Re: year out of range

2016-09-08 Thread Marco Mistroni
Please paste your code and a sample CSV. I'm guessing it has to do with formatting time? Kr On 8 Sep 2016 12:38 am, "Daniel Lopes" wrote: > Hi, > > I'm *importing a few CSVs* with the spark-csv package. > Whenever I run a select on each one, it looks OK, > but when I join them with sqlContext.sql it gives me this e