Hi Team,
I am getting the error below when reading a column whose value is a JSON
string.
json_schema_ctx_rdd = record_df.rdd.map(lambda row: row.contexts_parsed)
spark.read.option("mode", "PERMISSIVE") \
    .option("inferSchema", "true") \
    .option("inferTimestamp", "false") \
    .json(json_schema_ctx_rdd)
The contexts_parsed JSON string contains dynamic columns, so I am not sure
which timestamp column is bad. How can I identify the bad record and resolve
this issue?
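Since the columns are dynamic, one way to locate the offending record before Spark converts anything is to scan the raw contexts_parsed strings for timestamp-looking values whose year falls outside Python's supported range. A rough stdlib-only sketch, assuming the timestamps appear as ISO-8601-style strings; the helper name find_bad_timestamps is my own:

```python
import json
from datetime import MINYEAR, MAXYEAR

def find_bad_timestamps(json_str):
    """Walk one contexts_parsed JSON document and report string values
    that look like dates/timestamps but have a year outside the range
    Python's datetime can represent (MINYEAR..MAXYEAR)."""
    bad = []

    def walk(obj, path):
        if isinstance(obj, dict):
            for k, v in obj.items():
                walk(v, f"{path}.{k}")
        elif isinstance(obj, list):
            for i, v in enumerate(obj):
                walk(v, f"{path}[{i}]")
        elif isinstance(obj, str):
            # Parse the leading year digits ("-1976-03-04..." or "2024-...").
            try:
                year = int(obj[:5]) if obj[:1] == "-" else int(obj[:4])
            except ValueError:
                return
            # Require a date-like separator so plain numbers are skipped.
            if "-" in obj[1:] and not (MINYEAR <= year <= MAXYEAR):
                bad.append((path, obj))

    walk(json.loads(json_str), "$")
    return bad
```

Something like record_df.select("contexts_parsed").rdd.flatMap(lambda r: find_bad_timestamps(r[0])).take(10) should then surface the first offending field paths and values, which tells you which dynamic column carries the bad timestamp.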
File "/usr/lib/spark/python/pyspark/worker.py", line 686, in main
    process()
File "/usr/lib/spark/python/pyspark/worker.py", line 678, in process
    serializer.dump_stream(out_iter, outfile)
File "/usr/lib/spark/python/pyspark/serializers.py", line 145, in dump_stream
    for obj in iterator:
File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 288, in func
    for x in iterator:
File "/usr/lib/spark/python/pyspark/serializers.py", line 151, in load_stream
    yield self._read_with_length(stream)
File "/usr/lib/spark/python/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
File "/usr/lib/spark/python/pyspark/serializers.py", line 452, in loads
    return pickle.loads(obj, encoding=encoding)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1729, in <lambda>
    return lambda *a: dataType.fromInternal(a)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 823, in fromInternal
    for f, v, c in zip(self.fields, obj, self._needConversion)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 823, in <listcomp>
    for f, v, c in zip(self.fields, obj, self._needConversion)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 594, in fromInternal
    return self.dataType.fromInternal(obj)
File "/usr/lib/spark/python/pyspark/sql/types.py", line 223, in fromInternal
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
ValueError: year -1976 is out of range
Appreciate any guidance.
Cheers!
Manoj.