Thanks Mike,

That's a good way to debug! That was it!

Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes

www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

On Thu, Sep 8, 2016 at 2:26 PM, Mike Metzger <m...@flexiblecreations.com>
wrote:

> My guess is there's some row that does not match up with the expected
> data. While slower, I've found RDDs easier for troubleshooting this kind of
> thing until you sort out exactly what's happening.
>
> Something like:
>
> # count how many rows have each distinct number of comma-separated columns
> raw_data = sc.textFile("<path to text file(s)>")
> rowcounts = raw_data.map(lambda x: (len(x.split(",")), 1)).reduceByKey(lambda x, y: x + y)
> rowcounts.take(5)
>
> # keep and dump any rows whose column count differs from what you expect
> badrows = raw_data.filter(lambda x: len(x.split(",")) != <expected number of columns>)
> if badrows.count() > 0:
>     badrows.saveAsTextFile("<path to malformed.csv>")
>
>
> You should be able to tell if there are any rows with column counts that
> don't match up (the thing that usually bites me with CSV conversions).
> Assuming those all match what you expect, I'd try mapping the unparsed
> date column out into separate fields and checking whether a year value
> isn't matching the expected values.
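>
> Roughly something like this (just a sketch; the date column index here is a
> guess, so adjust DATE_COL to wherever tr_Vencimento sits in your file):
>
> DATE_COL = 2  # hypothetical position of the raw date column
>
> def year_of(line):
>     fields = line.split(",")
>     if len(fields) <= DATE_COL:
>         return "SHORT_ROW"
>     # dates in your sample look like "Jul 20 2015 12:00", so the year is the third token
>     parts = fields[DATE_COL].split(" ")
>     return parts[2] if len(parts) >= 3 else "BAD_DATE"
>
> # count rows per year value pulled straight from the raw text (reuses raw_data above)
> raw_data.map(lambda x: (year_of(x), 1)).reduceByKey(lambda a, b: a + b).collect()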
>
> Thanks
>
> Mike
>
>
> On Thu, Sep 8, 2016 at 8:15 AM, Daniel Lopes <dan...@onematch.com.br>
> wrote:
>
>> Thanks,
>>
>> I *tested* the function offline and it works.
>> I also tested with a select * after converting the data and the new data
>> looks good,
>> *but* if I *register it as a temp table* to *join another table* it still shows *the
>> same error*:
>>
>> ValueError: year out of range
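>>
>> Roughly what I am running (a simplified sketch; the second table and the
>> join key here are just placeholders):
>>
>> transacoes = transacoes.withColumn('tr_Vencimento', convert_date(transacoes.tr_Vencimento))
>> transacoes.select('*').show()   # looks good, but show() only materializes the first rows
>>
>> transacoes.registerTempTable('transacoes')
>> outra_tabela.registerTempTable('outra_tabela')   # placeholder for the other table
>> sqlContext.sql('SELECT * FROM transacoes t JOIN outra_tabela o ON t.tr_NumeroContrato = o.chave').show()
>> # -> ValueError: year out of range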
>>
>> Best,
>>
>> *Daniel Lopes*
>> Chief Data and Analytics Officer | OneMatch
>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>
>> www.onematch.com.br
>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>
>> On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> Daniel,
>>> Test the parse_date offline to make sure it returns what you expect.
>>> If it does, then in the spark shell create a df with one row only and run your
>>> UDF. You should be able to see the issue.
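>>>
>>> For example, something like this (just a sketch, reusing your parse_date and
>>> convert_date from the code below):
>>>
>>> from pyspark.sql import Row
>>> one_row = sqlContext.createDataFrame([Row(tr_Vencimento='Abr 20 2015 12:00')])
>>> one_row.select(convert_date(one_row.tr_Vencimento).alias('parsed')).show()
>>>
>>> # and offline, outside Spark, feed the same string to the plain function:
>>> print(parse_date('Abr 20 2015 12:00', '%b %d %Y %H:%M'))
>>>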
>>> If not, send me a reduced CSV file at my email and I'll give it a try this
>>> evening. Hopefully someone else will be able to assist in the meantime.
>>> You don't need to run a full Spark app to debug the issue.
>>> Your problem is either in the parse_date or in what gets passed to the UDF.
>>> HTH
>>>
>>> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>>>
>>>> Thanks Marco for your response.
>>>>
>>>> The field comes from SQL Server encoded in the pt_BR locale.
>>>>
>>>> The code I am using to format it is:
>>>>
>>>> --------------------------
>>>> import locale
>>>> from datetime import datetime
>>>>
>>>> from pyspark.sql import functions as funcspk
>>>> from pyspark.sql.types import TimestampType
>>>>
>>>> def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
>>>>     try:
>>>>         # month abbreviations come in Portuguese (Jan, Fev, Mar, Abr, ...),
>>>>         # so switch LC_TIME to pt_BR before parsing
>>>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>>         return datetime.strptime(argument, format_date)
>>>>     except (TypeError, ValueError, locale.Error):
>>>>         # unparseable or missing values become null
>>>>         return None
>>>>
>>>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>>>>                            TimestampType())
>>>>
>>>> transacoes = transacoes.withColumn('tr_Vencimento',
>>>>                                    convert_date(transacoes.tr_Vencimento))
>>>>
>>>> --------------------------
>>>>
>>>> The sample is:
>>>>
>>>> -------------------------
>>>> (Only the populated columns are shown below; tr_TipoDocumento is empty and
>>>> the remaining columns, tr_DataRecebimento, tr_TaxaMora, tr_DescontoMaximo,
>>>> tr_DescontoMaximoCorr, tr_ValorAtualizado, tr_ValorDesconto, tr_ValorJuros,
>>>> tr_ValorMulta, tr_DataDevolucaoCheque, tr_Banco, tr_Praca,
>>>> tr_DescricaoAlinea, tr_Enquadramento, tr_Linha, tr_Arquivo,
>>>> tr_DataImportacao and tr_Agencia, are null in every row. *tr_Vencimento* is
>>>> the column being converted.)
>>>>
>>>> +-----------------+-----------------+--------+--------------+----------------------------+--------------------+
>>>> |tr_NumeroContrato|    tr_Vencimento|tr_Valor|tr_ComGarantia|tr_ValorCorrigidoContratante|  tr_DataNotificacao|
>>>> +-----------------+-----------------+--------+--------------+----------------------------+--------------------+
>>>> | 0000992600153001|Jul 20 2015 12:00|  254.35|             0|                      254.35|2015-07-20 12:00:...|
>>>> | 0000992600153001|Abr 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Nov 20 2015 12:00|  254.35|             0|                      254.35|2015-11-20 12:00:...|
>>>> | 0000992600153001|Dez 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Fev 20 2016 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Fev 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Jun 20 2015 12:00|  254.35|             0|                      254.35|2015-06-20 12:00:...|
>>>> | 0000992600153001|Ago 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Jan 20 2016 12:00|  254.35|             0|                      254.35|2016-01-20 12:00:...|
>>>> | 0000992600153001|Jan 20 2015 12:00|  254.35|             0|                      254.35|2015-01-20 12:00:...|
>>>> | 0000992600153001|Set 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Mai 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Out 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|Mar 20 2015 12:00|  254.35|             0|                      254.35|2015-03-20 12:00:...|
>>>> +-----------------+-----------------+--------+--------------+----------------------------+--------------------+
>>>>
>>>> -------------------------
>>>>
>>>> *Daniel Lopes*
>>>> Chief Data and Analytics Officer | OneMatch
>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>>
>>>> www.onematch.com.br
>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>>
>>>> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mmistr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Please paste your code and a sample CSV.
>>>>> I'm guessing it has to do with time formatting?
>>>>> Kind regards
>>>>>
>>>>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm *importing a few CSVs* with the spark-csv package.
>>>>>> Whenever I run a select on each one it looks ok,
>>>>>> but when I join them with sqlContext.sql I get the error below.
>>>>>>
>>>>>> All tables have timestamp fields,
>>>>>>
>>>>>> but the joins are not on these date columns.
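>>>>>>
>>>>>> The loading is roughly like this (a sketch; paths and options simplified):
>>>>>>
>>>>>> transacoes = sqlContext.read.format('com.databricks.spark.csv') \
>>>>>>     .options(header='true', inferSchema='true') \
>>>>>>     .load('<path to csv>')
>>>>>> transacoes.select('*').show()                  # looks ok on its own
>>>>>> transacoes.registerTempTable('transacoes')
>>>>>> # joining the registered tables with sqlContext.sql is what raises the error below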
>>>>>>
>>>>>>
>>>>>> *Py4JJavaError: An error occurred while calling o643.showString.*
>>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 
>>>>>> 54.9
>>>>>> in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>>>>> call last):
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>>>>>     process()
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>>>>>     vs = list(itertools.islice(iterator, batch))
>>>>>>   File 
>>>>>> "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py",
>>>>>> line 1563, in <lambda>
>>>>>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)),
>>>>>> it)
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>>>>>     else time.mktime(dt.timetuple()))
>>>>>> *ValueError: year out of range  *
>>>>>>
>>>>>> Has anyone seen this problem?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> *Daniel Lopes*
>>>>>> Chief Data and Analytics Officer | OneMatch
>>>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>>>>
>>>>>> www.onematch.com.br
>>>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>>>>
>>>>>
>>>>
>>
>
