Caizhi Weng created FLINK-26277:
-----------------------------------

             Summary: Java docs & implementation of TimestampColumnReader are contradicting
                 Key: FLINK-26277
                 URL: https://issues.apache.org/jira/browse/FLINK-26277
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.15.0
            Reporter: Caizhi Weng


(Not sure whether this should be classified as a bug, but I don't see a more
suitable issue type.)

The Java docs of {{TimestampColumnReader}} state that
{code:java}
/**
 * Timestamp {@link ColumnReader}. We only support INT96 bytes now, julianDay(4) + nanosOfDay(8).
 * See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
 * TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the deprecated ConvertedType.
 */
{code}

However, the implementation reads as follows:
{code:java}
ByteBuffer buffer = readDataBuffer(12);
column.setTimestamp(
        rowId + i,
        int96ToTimestamp(utcTimestamp, buffer.getLong(), buffer.getInt()));
{code}

This implementation contradicts the Java docs because it reads the 8-byte
{{nanosOfDay}} first and the 4-byte {{julianDay}} second, the reverse of the
documented {{julianDay(4) + nanosOfDay(8)}}.
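
To make the layout concrete, here is a minimal sketch of decoding the 12 bytes with absolute offsets. It assumes a little-endian buffer and 2440588 as the Julian day number of the Unix epoch. The class and method names are hypothetical, and this is not Flink's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Layout {
    // Julian day number of 1970-01-01 (assumed epoch offset).
    static final long JULIAN_EPOCH_OFFSET_DAYS = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    /** Builds a 12-byte value: nanosOfDay (8 bytes) first, julianDay (4 bytes) second. */
    public static ByteBuffer encode(long nanosOfDay, int julianDay) {
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, nanosOfDay);
        buf.putInt(8, julianDay);
        return buf;
    }

    /** Decodes with absolute offsets so the byte layout is visible in the code. */
    public static long toEpochMillis(ByteBuffer buf) {
        long nanosOfDay = buf.getLong(0); // bytes 0..7
        int julianDay = buf.getInt(8);    // bytes 8..11
        return (julianDay - JULIAN_EPOCH_OFFSET_DAYS) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    public static void main(String[] args) {
        // One day plus 1.5 seconds after the epoch.
        ByteBuffer buf = encode(1_500_000_000L, 2_440_589);
        System.out.println(toEpochMillis(buf));
    }
}
```

Reading by absolute offset means the code no longer depends on the buffer position, so the field order can be checked against the Java docs at a glance.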

This implementation is also confusing because it relies on the evaluation order
of the argument list. Although the
[Java Language Specification|https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.7.4]
specifies that argument lists are evaluated from left to right, this does not
hold in other languages (for example, C++ leaves the order unspecified and may
evaluate the arguments in any order).
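
One possible way to remove the reliance on argument evaluation order is to read each field into a named local variable first. The sketch below compares the two styles. {{describe}} is a hypothetical stand-in for {{int96ToTimestamp}}, and the little-endian sample buffer is an assumption made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ExplicitReadOrder {
    /** Hypothetical stand-in for the real conversion; it only reports what it received. */
    public static String describe(long nanosOfDay, int julianDay) {
        return "nanosOfDay=" + nanosOfDay + ", julianDay=" + julianDay;
    }

    /** Style from the issue: correct in Java, but only because arguments are evaluated left to right. */
    public static String readInline(ByteBuffer buf) {
        return describe(buf.getLong(), buf.getInt());
    }

    /** Same reads, with the order stated by the statements rather than the argument list. */
    public static String readExplicit(ByteBuffer buf) {
        long nanosOfDay = buf.getLong(); // first 8 bytes
        int julianDay = buf.getInt();    // next 4 bytes
        return describe(nanosOfDay, julianDay);
    }

    /** Builds a 12-byte sample: nanosOfDay = 42, julianDay = 2440588. */
    public static ByteBuffer sample() {
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(42L).putInt(2_440_588);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    public static void main(String[] args) {
        System.out.println(readInline(sample()));
        System.out.println(readExplicit(sample()));
    }
}
```

Both methods return the same result in Java. The explicit version simply keeps that true for readers who are used to languages with unspecified argument evaluation order.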



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
