Thanks, Marco, for your response.
The field was exported by SQL Server with dates formatted in the pt_BR locale.
The code I am using to format it is:
--------------------------
import locale
from datetime import datetime

from pyspark.sql import functions as funcspk
from pyspark.sql.types import TimestampType

def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
    try:
        # the dates use pt-BR month abbreviations (e.g. 'Abr', 'Set')
        locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
        return datetime.strptime(argument, format_date)
    except (locale.Error, TypeError, ValueError):
        # a bare except here would also hide a failing setlocale
        return None

convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
                           TimestampType())

transacoes = transacoes.withColumn('tr_Vencimento',
                                   convert_date(transacoes.tr_Vencimento))
--------------------------
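As an aside, locale.setlocale has to succeed on every executor (pt_BR.utf8 must be installed on each worker node), and it mutates process-wide state. A minimal locale-independent sketch of the same parse, assuming the month abbreviations are the pt-BR ones visible in the sample (PT_BR_MONTHS and parse_date_pt_br are illustrative names, not from my actual code):
--------------------------
from datetime import datetime

from pyspark.sql import functions as funcspk
from pyspark.sql.types import TimestampType

# pt-BR month abbreviations as they appear in the data
PT_BR_MONTHS = {'Jan': 1, 'Fev': 2, 'Mar': 3, 'Abr': 4, 'Mai': 5, 'Jun': 6,
                'Jul': 7, 'Ago': 8, 'Set': 9, 'Out': 10, 'Nov': 11, 'Dez': 12}

def parse_date_pt_br(argument):
    """Parse strings like 'Abr 20 2015 12:00' without touching the locale."""
    if argument is None:
        return None
    try:
        month, day, year, hhmm = argument.split()
        hour, minute = hhmm.split(':')
        return datetime(int(year), PT_BR_MONTHS[month], int(day),
                        int(hour), int(minute))
    except (ValueError, KeyError):
        return None

convert_date = funcspk.udf(parse_date_pt_br, TimestampType())
--------------------------
convert_date can then be applied with withColumn exactly as above.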
The sample is:
-------------------------
(columns trimmed for readability; tr_TipoDocumento is blank, and tr_DataRecebimento, tr_TaxaMora, tr_DescontoMaximo, tr_DescontoMaximoCorr, tr_ValorAtualizado, tr_ValorDesconto, tr_ValorJuros, tr_ValorMulta, tr_DataDevolucaoCheque, tr_Banco, tr_Praca, tr_DescricaoAlinea, tr_Enquadramento, tr_Linha, tr_Arquivo, tr_DataImportacao and tr_Agencia are null in every row)

+-----------------+-----------------+--------+--------------+----------------------------+--------------------+
|tr_NumeroContrato|    tr_Vencimento|tr_Valor|tr_ComGarantia|tr_ValorCorrigidoContratante|  tr_DataNotificacao|
+-----------------+-----------------+--------+--------------+----------------------------+--------------------+
| 0000992600153001|Jul 20 2015 12:00|  254.35|             0|                      254.35|2015-07-20 12:00:...|
| 0000992600153001|Abr 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Nov 20 2015 12:00|  254.35|             0|                      254.35|2015-11-20 12:00:...|
| 0000992600153001|Dez 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Fev 20 2016 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Fev 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Jun 20 2015 12:00|  254.35|             0|                      254.35|2015-06-20 12:00:...|
| 0000992600153001|Ago 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Jan 20 2016 12:00|  254.35|             0|                      254.35|2016-01-20 12:00:...|
| 0000992600153001|Jan 20 2015 12:00|  254.35|             0|                      254.35|2015-01-20 12:00:...|
| 0000992600153001|Set 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Mai 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Out 20 2015 12:00|  254.35|             0|                      254.35|                null|
| 0000992600153001|Mar 20 2015 12:00|  254.35|             0|                      254.35|2015-03-20 12:00:...|
+-----------------+-----------------+--------+--------------+----------------------------+--------------------+
-------------------------
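Note the pattern in the sample: only the rows whose tr_Vencimento month abbreviation happens to match the English one (Jan, Mar, Jun, Jul, Nov) show a converted timestamp; Abr, Mai, Ago, Set, Out, Dez and Fev all came back null. That suggests the setlocale call is failing silently inside the except on the executors. A quick sketch to check whether pt_BR.utf8 is actually available on the workers (assuming sc is the active SparkContext):
--------------------------
import locale

def check_locale(_):
    # surface the locale error instead of swallowing it
    try:
        locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
        return ['ok']
    except locale.Error as e:
        return ['failed: %s' % e]

# run on several partitions so the check executes on actual worker nodes
print(sc.parallelize(range(4), 4).mapPartitions(check_locale).collect())
--------------------------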
*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br
On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <[email protected]> wrote:
> Please paste the code and a sample CSV.
> I'm guessing it has to do with formatting time?
> Kr
>
> On 8 Sep 2016 12:38 am, "Daniel Lopes" <[email protected]> wrote:
>
>> Hi,
>>
>> I'm importing a few CSVs with the spark-csv package.
>> Whenever I run a select on each one it looks OK,
>> but when I join them with sqlContext.sql I get the error below.
>>
>> All the tables have timestamp fields;
>> the joins are not on these dates.
>>
>>
>> *Py4JJavaError: An error occurred while calling o643.showString.*
>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 54.9
>> in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>     process()
>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>     vs = list(itertools.islice(iterator, batch))
>>   File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py", line 1563, in <lambda>
>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>     else time.mktime(dt.timetuple()))
>> *ValueError: year out of range*
>>
>> Does anyone know about this problem?
>>
>> Best,
>>
>> *Daniel Lopes*
>> Chief Data and Analytics Officer | OneMatch
>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>
>> www.onematch.com.br
>>
>