bogao007 commented on code in PR #50349:
URL: https://github.com/apache/spark/pull/50349#discussion_r2011036886


##########
python/pyspark/sql/pandas/types.py:
##########
@@ -1424,6 +1424,12 @@ def _to_numpy_type(type: DataType) -> Optional["np.dtype"]:
         return np.dtype("float32")
     elif type == DoubleType():
         return np.dtype("float64")
+    elif type == TimestampType():

Review Comment:
   @HyukjinKwon It seems [spark_type_to_pandas_dtype](https://github.com/apache/spark/blob/b2290444e9c1430c18efb5c8de1dce264034dd4d/python/pyspark/pandas/typedef/typehints.py#L296-L297) uses `datetime64[ns]` instead of `datetime64[us]`, so reusing it would still raise the same error: Spark only supports microsecond precision when [converting from Arrow](https://github.com/apache/spark/blob/b2290444e9c1430c18efb5c8de1dce264034dd4d/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala#L90). We also have [_to_corrected_pandas_type](https://github.com/apache/spark/blob/b2290444e9c1430c18efb5c8de1dce264034dd4d/python/pyspark/sql/pandas/types.py#L748-L777) in the same file that could be reused, but it uses nanosecond precision as well and would fail in this case. Any suggestions on how to reuse it while also fixing the issue?
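   
   If it helps the discussion, here is a rough sketch of the direction I was imagining: have the timestamp branch return `datetime64[us]` so the dtype matches the microsecond truncation Spark applies on the Arrow path. The standalone helper name `_timestamp_numpy_type` and the `TimestampNTZType` handling below are illustrative assumptions on my part, not what this PR implements:
   
   ```python
   import numpy as np

   from pyspark.sql.types import DataType, TimestampNTZType, TimestampType


   # Hypothetical helper (not in the PR): sketch of mapping Spark timestamp
   # types to a microsecond-precision NumPy dtype. Returning datetime64[us]
   # sidesteps the mismatch that datetime64[ns] (what
   # spark_type_to_pandas_dtype returns) runs into, since Spark truncates
   # Arrow timestamps to microseconds.
   def _timestamp_numpy_type(spark_type: DataType) -> "np.dtype":
       if spark_type == TimestampType() or spark_type == TimestampNTZType():
           return np.dtype("datetime64[us]")
       raise TypeError(f"Unsupported type: {spark_type}")
   ```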
########## python/pyspark/sql/pandas/types.py: ########## @@ -1424,6 +1424,12 @@ def _to_numpy_type(type: DataType) -> Optional["np.dtype"]: return np.dtype("float32") elif type == DoubleType(): return np.dtype("float64") + elif type == TimestampType(): Review Comment: @HyukjinKwon It seems [spark_type_to_pandas_dtype](https://github.com/apache/spark/blob/b2290444e9c1430c18efb5c8de1dce264034dd4d/python/pyspark/pandas/typedef/typehints.py#L296-L297) uses `datetime64[ns]` instead of `datetime64[us]`. This would still return the same error since Spark only supports microsecond when [converting from Arrow](https://github.com/apache/spark/blob/b2290444e9c1430c18efb5c8de1dce264034dd4d/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala#L90). We actually have [_to_corrected_pandas_type](https://github.com/apache/spark/blob/b2290444e9c1430c18efb5c8de1dce264034dd4d/python/pyspark/sql/pandas/types.py#L748-L777) in the same file to reuse, but it also uses nanosecond and would fail in this case. Any suggestions on reusing this but also fixing the issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org