Thanks Mike, that's a good way to debug! That was it!
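For the year check on tr_Vencimento, your suggestion boils down to something
like this rough, untested sketch (it assumes the raw strings look like
"Abr 20 2015 12:00", as in the sample further down the thread, and that it
runs before the convert_date step):

# Count the distinct year tokens in the raw (unparsed) tr_Vencimento strings,
# so a stray or misplaced year stands out immediately.
year_counts = (transacoes.select('tr_Vencimento').rdd
               .map(lambda row: row[0].split(' ')[2]
                    if row[0] and len(row[0].split(' ')) >= 3 else row[0])
               .countByValue())
print(year_counts)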
Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

On Thu, Sep 8, 2016 at 2:26 PM, Mike Metzger <m...@flexiblecreations.com> wrote:

> My guess is there's some row that does not match up with the expected
> data. While slower, I've found RDDs easier for troubleshooting this kind
> of thing until you sort out exactly what's happening.
>
> Something like:
>
> raw_data = sc.textFile("<path to text file(s)>")
> rowcounts = raw_data.map(lambda x: (len(x.split(",")), 1)).reduceByKey(lambda x, y: x + y)
> rowcounts.take(5)
>
> badrows = raw_data.filter(lambda x: len(x.split(",")) != <expected number of columns>)
> if badrows.count() > 0:
>     badrows.saveAsTextFile("<path to malformed.csv>")
>
> You should be able to tell whether there are any rows with column counts
> that don't match up (the thing that usually bites me with CSV conversions).
> Assuming these all match what you want, I'd map the unparsed date column
> out into separate fields and check whether a year field doesn't match the
> expected values.
>
> Thanks
>
> Mike
>
> On Thu, Sep 8, 2016 at 8:15 AM, Daniel Lopes <dan...@onematch.com.br> wrote:
>
>> Thanks,
>>
>> I *tested* the function offline and it works.
>> I also tested with a select * from it after converting the data, and the
>> new data looks good, *but* if I *register it as a temp table* to *join
>> another table* it still shows *the same error*:
>>
>> ValueError: year out of range
>>
>> Best,
>>
>> *Daniel Lopes*
>> Chief Data and Analytics Officer | OneMatch
>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>> www.onematch.com.br
>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>
>> On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>> Daniel,
>>> Test the parse date offline to make sure it returns what you expect.
>>> If it does, create a df with 1 row only in the spark shell and run your
>>> UDF on it; you should be able to see the issue.
>>> If not, send me a reduced CSV file at my email and I'll give it a try
>>> this evening ... hopefully someone else will be able to assist in the
>>> meantime.
>>> You don't need to run a full Spark app to debug the issue.
>>> Your problem is either in the parse date or in what gets passed to the UDF.
>>> Hth
>>>
>>> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>>>
>>>> Thanks Marco for your response.
>>>>
>>>> The field came encoded by SQL Server in locale pt_BR.
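>>>>
>>>> A quick way to check that locale-aware parsing outside Spark (it assumes
>>>> the pt_BR.utf8 locale is installed on the machine, which is what the UDF
>>>> below expects as well) is:
>>>>
>>>> import locale
>>>> from datetime import datetime
>>>>
>>>> # pt_BR month abbreviations such as 'Abr', 'Fev' or 'Set' only parse under this locale
>>>> locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>> print(datetime.strptime('Abr 20 2015 12:00', '%b %d %Y %H:%M'))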
>>>>
>>>> The code I am formatting the dates with is:
>>>>
>>>> --------------------------
>>>> def parse_date(argument, format_date='%Y-%m%d %H:%M:%S'):
>>>>     try:
>>>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>>         return datetime.strptime(argument, format_date)
>>>>     except:
>>>>         return None
>>>>
>>>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>>>>                            TimestampType())
>>>>
>>>> transacoes = transacoes.withColumn('tr_Vencimento',
>>>>                                    convert_date(transacoes.tr_Vencimento))
>>>> --------------------------
>>>>
>>>> The sample is below (trimmed to the columns of interest; the remaining
>>>> columns -- tr_DataRecebimento, tr_TaxaMora, tr_DescontoMaximo,
>>>> tr_DescontoMaximoCorr, tr_ValorAtualizado, tr_ValorDesconto,
>>>> tr_ValorJuros, tr_ValorMulta, tr_DataDevolucaoCheque, tr_Banco,
>>>> tr_Praca, tr_DescricaoAlinea, tr_Enquadramento, tr_Linha, tr_Arquivo,
>>>> tr_DataImportacao, tr_Agencia -- are all null in this sample):
>>>>
>>>> -------------------------
>>>> +-----------------+----------------+-----------------+--------+--------------+----------------------------+--------------------+
>>>> |tr_NumeroContrato|tr_TipoDocumento|    tr_Vencimento|tr_Valor|tr_ComGarantia|tr_ValorCorrigidoContratante|  tr_DataNotificacao|
>>>> +-----------------+----------------+-----------------+--------+--------------+----------------------------+--------------------+
>>>> | 0000992600153001|                |Jul 20 2015 12:00|  254.35|             0|                      254.35|2015-07-20 12:00:...|
>>>> | 0000992600153001|                |Abr 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Nov 20 2015 12:00|  254.35|             0|                      254.35|2015-11-20 12:00:...|
>>>> | 0000992600153001|                |Dez 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Fev 20 2016 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Fev 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Jun 20 2015 12:00|  254.35|             0|                      254.35|2015-06-20 12:00:...|
>>>> | 0000992600153001|                |Ago 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Jan 20 2016 12:00|  254.35|             0|                      254.35|2016-01-20 12:00:...|
>>>> | 0000992600153001|                |Jan 20 2015 12:00|  254.35|             0|                      254.35|2015-01-20 12:00:...|
>>>> | 0000992600153001|                |Set 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Mai 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Out 20 2015 12:00|  254.35|             0|                      254.35|                null|
>>>> | 0000992600153001|                |Mar 20 2015 12:00|  254.35|             0|                      254.35|2015-03-20 12:00:...|
>>>> +-----------------+----------------+-----------------+--------+--------------+----------------------------+--------------------+
>>>> -------------------------
>>>>
>>>> *Daniel Lopes*
>>>> Chief Data and Analytics Officer | OneMatch
>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>> www.onematch.com.br
>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>>
>>>> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>
>>>>> Please paste the code and a sample CSV.
>>>>> I'm guessing it has to do with formatting time?
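>>>>>
>>>>> If it is, a quick way to see it would be to put one raw date value into
>>>>> a one-row DataFrame and run your conversion UDF on just that row. A
>>>>> rough sketch -- the UDF name and the sample value here are only
>>>>> placeholders until you paste your actual code:
>>>>>
>>>>> test_df = sqlContext.createDataFrame([('Abr 20 2015 12:00',)], ['raw_date'])
>>>>> test_df.select(convert_date(test_df.raw_date)).show()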
>>>>> Kr
>>>>>
>>>>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm *importing a few CSVs* with the spark-csv package.
>>>>>> Whenever I run a select on each one it looks ok,
>>>>>> but when I join them with sqlContext.sql it gives me this error.
>>>>>>
>>>>>> All tables have timestamp fields.
>>>>>>
>>>>>> The joins are not on these date columns.
>>>>>>
>>>>>> *Py4JJavaError: An error occurred while calling o643.showString.*
>>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 54.9
>>>>>> in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>>>>>     process()
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>>>>>     vs = list(itertools.islice(iterator, batch))
>>>>>>   File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py", line 1563, in <lambda>
>>>>>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>>>>>     else time.mktime(dt.timetuple()))
>>>>>> *ValueError: year out of range*
>>>>>>
>>>>>> Does anyone know this problem?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> *Daniel Lopes*
>>>>>> Chief Data and Analytics Officer | OneMatch
>>>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>>>> www.onematch.com.br
>>>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
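>>>>>>
>>>>>> PS: reading the traceback, the ValueError comes from
>>>>>> time.mktime(dt.timetuple()) in pyspark/sql/types.py, so the UDF is
>>>>>> presumably handing back a datetime whose year mktime cannot represent.
>>>>>> A tiny illustration outside Spark -- the year 215 is only an example of
>>>>>> what a mismatched strptime format could produce, and the exact message
>>>>>> is what Python 2.7 gives:
>>>>>>
>>>>>> import time
>>>>>> from datetime import datetime
>>>>>>
>>>>>> dt = datetime(215, 7, 20, 12, 0)   # bogus year, e.g. from a bad parse
>>>>>> time.mktime(dt.timetuple())        # ValueError: year out of range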