Daniel, test parse_date offline to make sure it returns what you expect. If it does, create a DataFrame with one row only in the spark shell and run your UDF on it; you should be able to see the issue. If not, send a reduced CSV file to my email and I'll give it a try this evening; hopefully someone else will be able to assist in the meantime. You don't need to run a full Spark app to debug this: your problem is either in parse_date or in what gets passed to the UDF. Hope this helps.
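Marco's "test it offline first" suggestion can be sketched without any Spark at all. This minimal illustration (using the '%b %d %Y %H:%M' format from the thread) shows how strptime behaves when the pt_BR locale is *not* active: `%b` then only accepts the C-locale English abbreviations, so month names that happen to overlap with English parse while Portuguese-only ones fail — which would explain why only some rows convert.

```python
from datetime import datetime

def try_parse(s, fmt='%b %d %Y %H:%M'):
    """Offline check of the parsing logic, no Spark required."""
    try:
        return datetime.strptime(s, fmt)
    except ValueError as e:
        print('failed: %r (%s)' % (s, e))
        return None

# Under the default C locale, %b only knows English abbreviations,
# so Portuguese-only months fail while overlapping ones succeed:
print(try_parse('Jul 20 2015 12:00'))  # parses ('Jul' matches English)
print(try_parse('Abr 20 2015 12:00'))  # returns None ('Abr' is pt-only)
```

If the offline test succeeds but the single-row DataFrame test fails, the problem is on the executor side (for example, pt_BR.utf8 not being installed on the workers, which the bare `except:` in the UDF would silently swallow).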
On 8 Sep 2016 1:31 pm, "Daniel Lopes" <dan...@onematch.com.br> wrote:
> Thanks Marco for your response.
>
> The field came encoded by SQL Server in locale pt_BR.
>
> The code I am using for the conversion is:
>
> --------------------------
> def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
>     try:
>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>         return datetime.strptime(argument, format_date)
>     except:
>         return None
>
> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>                            TimestampType())
>
> transacoes = transacoes.withColumn('tr_Vencimento',
>                                    convert_date(transacoes.tr_Vencimento))
>
> --------------------------
>
> The sample is (only the relevant columns shown; the remaining columns are all empty or null):
>
> -------------------------
> +-----------------+-----------------+--------+--------------+----------------------------+--------------------+
> |tr_NumeroContrato|    tr_Vencimento|tr_Valor|tr_ComGarantia|tr_ValorCorrigidoContratante|  tr_DataNotificacao|
> +-----------------+-----------------+--------+--------------+----------------------------+--------------------+
> | 0000992600153001|Jul 20 2015 12:00|  254.35|             0|                      254.35|2015-07-20 12:00:...|
> | 0000992600153001|Abr 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Nov 20 2015 12:00|  254.35|             0|                      254.35|2015-11-20 12:00:...|
> | 0000992600153001|Dez 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Fev 20 2016 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Fev 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Jun 20 2015 12:00|  254.35|             0|                      254.35|2015-06-20 12:00:...|
> | 0000992600153001|Ago 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Jan 20 2016 12:00|  254.35|             0|                      254.35|2016-01-20 12:00:...|
> | 0000992600153001|Jan 20 2015 12:00|  254.35|             0|                      254.35|2015-01-20 12:00:...|
> | 0000992600153001|Set 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Mai 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Out 20 2015 12:00|  254.35|             0|                      254.35|                null|
> | 0000992600153001|Mar 20 2015 12:00|  254.35|             0|                      254.35|2015-03-20 12:00:...|
> +-----------------+-----------------+--------+--------------+----------------------------+--------------------+
> -------------------------
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>
> www.onematch.com.br
>
> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
>
>> Pls paste code and sample CSV
>> I'm guessing it has to do with formatting time?
>> Kr
>>
>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>>
>>> Hi,
>>>
>>> I'm importing a few CSVs with the spark-csv package.
>>> Whenever I run a select on each one individually, everything looks OK,
>>> but when I join them with sqlContext.sql I get the error below.
>>>
>>> All the tables have timestamp fields,
>>> but the joins are not on those date columns.
>>>
>>> *Py4JJavaError: An error occurred while calling o643.showString.*
>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 54.9
>>> in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>>> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>>     process()
>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>>     vs = list(itertools.islice(iterator, batch))
>>>   File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py", line 1563, in <lambda>
>>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>>     else time.mktime(dt.timetuple()))
>>> *ValueError: year out of range*
>>>
>>> Does anyone know this problem?
>>>
>>> Best,
>>>
>>> *Daniel Lopes*
>>> Chief Data and Analytics Officer | OneMatch
>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>
>>> www.onematch.com.br
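One way to sidestep both the locale dependency and the silent bare `except:` in the UDF above is a locale-free parser that maps the Portuguese month abbreviations explicitly. This is a sketch only — the `PT_MONTHS` table and `parse_date_pt` name are illustrative, not from the thread — but the approach works even when pt_BR.utf8 is not installed on the executors:

```python
from datetime import datetime

# Hypothetical locale-free alternative to the setlocale-based UDF:
# map Portuguese month abbreviations explicitly, so every executor
# parses 'Abr 20 2015 12:00' identically, installed locales or not.
PT_MONTHS = {'Jan': 1, 'Fev': 2, 'Mar': 3, 'Abr': 4, 'Mai': 5, 'Jun': 6,
             'Jul': 7, 'Ago': 8, 'Set': 9, 'Out': 10, 'Nov': 11, 'Dez': 12}

def parse_date_pt(value):
    """Parse 'Mon DD YYYY HH:MM' with Portuguese month abbreviations."""
    if value is None:
        return None
    try:
        mon, day, year, hhmm = value.split()
        hour, minute = hhmm.split(':')
        return datetime(int(year), PT_MONTHS[mon], int(day),
                        int(hour), int(minute))
    except (ValueError, KeyError):
        # Catch only the expected parse failures; a bare except would
        # also hide real problems (e.g. locale.Error on a worker).
        return None

print(parse_date_pt('Abr 20 2015 12:00'))  # 2015-04-20 12:00:00
print(parse_date_pt('not a date'))         # None
```

It can then be wrapped the same way as in the thread, e.g. `funcspk.udf(parse_date_pt, TimestampType())`, with no `locale.setlocale` call on the workers.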