[ 
https://issues.apache.org/jira/browse/SPARK-57808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57808:
-----------------------------
    Shepherd: Max Gekk

> Document the microsecond-only limitation of typed encoders and UDFs for 
> nanosecond timestamps
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57808
>                 URL: https://issues.apache.org/jira/browse/SPARK-57808
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
> precision).
> h2. Decision
> Document the limitation only; do not implement encoder/UDF changes (per 
> review).
> h2. Problem
> Typed Dataset encoders bind {{java.time}} classes to microsecond-precision 
> SQL types: {{ScalaReflection}} / {{JavaTypeInference}} and 
> {{Encoders.INSTANT}} / {{Encoders.LOCALDATETIME}} map {{Instant}} / 
> {{LocalDateTime}} to {{TimestampType}} / {{TimestampNTZType}} 
> (ScalaReflection.scala ~L331-332, JavaTypeInference.scala ~L99-101). As a 
> result:
> * {{ds.as[CaseClassWithInstant]}} and schema-less 
> {{createDataFrame(Seq(instant))}} do not preserve nanoseconds;
> * typed Scala/Java UDFs ({{udf((i: Instant) => ...)}}) coerce nanosecond 
> inputs to microsecond precision at the UDF boundary;
> * Kryo encoders bypass the SQL type system entirely.
> Nanosecond precision is available only via the schema-driven {{Row}} / 
> {{Encoders.row(schema)}} path (SPARK-57033).
> h2. Goal
> Document these as known limitations and describe the schema-driven 
> workaround. No encoder/UDF code changes.
> h2. Scope
> Docs only (nanosecond-timestamp type reference / known limitations), 
> cross-linked with the migration guide and the SparkR non-support note 
> (SPARK-57807).
> h2. Acceptance criteria
> * Docs state the typed-encoder / {{ds.as[T]}} / typed-UDF microsecond-only 
> behavior and how to retain nanoseconds via schema-driven APIs.
> h2. Testing
> Docs only.
> h2. Dependencies
> None (documentation only). Cross-links SPARK-57807 (SparkR non-support) and 
> the nanosecond-timestamp docs/migration sub-task.
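
As a rough illustration of what "do not preserve nanoseconds" means in the Problem section, the JDK-only Java sketch below round-trips an Instant through a count of microseconds since the epoch, which is how TimestampType stores its values. The class and helper names are hypothetical, and this is not Spark's own conversion code; the sketch assumes that sub-microsecond digits are dropped rather than rounded.

```java
import java.time.Instant;

public class MicrosRoundTrip {
    // Hypothetical helper: microseconds since the epoch for an Instant.
    // getNano() is always in [0, 999_999_999], so integer division floors
    // correctly even for instants before 1970.
    public static long toEpochMicros(Instant i) {
        long wholeSecondsAsMicros = Math.multiplyExact(i.getEpochSecond(), 1_000_000L);
        return Math.addExact(wholeSecondsAsMicros, i.getNano() / 1_000L);
    }

    // Hypothetical helper: rebuild an Instant from epoch microseconds.
    public static Instant fromEpochMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long microsOfSecond = Math.floorMod(micros, 1_000_000L);
        return Instant.ofEpochSecond(seconds, microsOfSecond * 1_000L);
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2024-01-01T00:00:00.123456789Z");
        Instant roundTripped = fromEpochMicros(toEpochMicros(original));

        System.out.println(original);      // 2024-01-01T00:00:00.123456789Z
        System.out.println(roundTripped);  // 2024-01-01T00:00:00.123456Z
        // Nanoseconds that did not survive the microsecond representation.
        System.out.println(original.getNano() - roundTripped.getNano()); // 789
    }
}
```

With the sample value, the last three fractional digits (789 ns) are gone after the round trip. Any path that binds the value to a microsecond type would lose the same digits under the assumption above.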



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
