walterddr edited a comment on issue #8045:
URL: https://github.com/apache/pinot/issues/8045#issuecomment-1022709088


   Context
   ===
   
   For more context, say a user wants to configure this during ingestion:
   ```
       "dateTimeFieldSpecs": [{
         "name": "Date",
         "dataType": "STRING",
         "format" : "1:SECONDS:SIMPLE_DATE_FORMAT:MM/dd/yyyy HH:mm:ss a",
         "granularity": "1:HOURS"
       }]
   ```
   
   They have to set `"dataType"` to `STRING` because they want the result of 
   ```
   Select Date From myTable 
   ```
   to be a string that conforms to the specified SimpleDateFormat (SDF), for example because the string is fed directly into some downstream program.
   
   Challenge
   ===
   However:
   1. In a SQL database, declaring a column as STRING means `>=` and `<=` must be supported on the raw data format.
   2. In Pinot, we can't support this SDF as the time column format because its lexical order and its time order are not consistent (e.g. `02/01/2021` comes after `01/29/2022` in string ordering but before it in timestamp ordering). If we use this field as the time field for partitioning the real-time and offline tables, we will get wrong results because the underlying ordering is STRING-based.
   3. One can also set `dataType` to `TIMESTAMP` and convert it to a String in the query, but the result then has to be in the ISO SQL standard yyyy-mm-ddTHH:MM:SS format, which might not be what the user wants.
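
The ordering mismatch in point 2 can be reproduced with a minimal sketch. It uses Python's `strptime` rather than Java's `SimpleDateFormat`, and the pattern `%m/%d/%Y` is an assumed stand-in for the date part of the SDF pattern (`MM/dd/yyyy`). The two values are the ones from the example above.

```python
from datetime import datetime

# Assumption: this strptime pattern stands in for the SDF pattern "MM/dd/yyyy".
FMT = "%m/%d/%Y"

values = ["02/01/2021", "01/29/2022"]

# STRING ordering compares characters left to right, so the month decides first.
lexical = sorted(values)

# Timestamp ordering parses each value and compares the resulting instants.
chronological = sorted(values, key=lambda s: datetime.strptime(s, FMT))

print("string order:   ", lexical)
print("timestamp order:", chronological)
```

The two sorts disagree, so any range filter or time boundary computed on the raw strings would select the wrong rows.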
   
   
   Problem Statement
   ===
   We want to create some kind of ingestion-configurable data type (let's name it `DateTime`) that (1) returns a String that conforms to the SDF configured at ingestion, and (2) is ordered by epoch value.
   
   So that
   ```
   SELECT myDateTimeType FROM myTable ORDER BY myDateTimeType
   ```
   returns 
   ```
   02/01/2021 00:00:00
   01/01/2022 00:00:00
   ```
   
   Proposal
   ===
   We can store the actual data as either STRING or LONG, but:
   1. If we store it in raw string format and force it to order by the converted epoch value, we have to convert it on every comparison, which is very costly.
   2. If we store it as LONG, which is natively sorted by epoch, and only do the conversion at query time, we need to store the original SDF configured by the user during ingestion somewhere, so we need a way to make it known to Pinot at query time.
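
As a rough illustration of option 2, the sketch below parses each value once at ingestion, sorts on the stored epoch, and renders back to the configured format only when producing results. Every name here (`COLUMN_FORMAT`, `ingest`, `render`) is hypothetical and not part of Pinot; the `strptime` pattern is an assumed stand-in for the SDF pattern `MM/dd/yyyy HH:mm:ss`, and UTC is assumed for simplicity.

```python
from datetime import datetime, timezone

# Hypothetical names throughout; this is not Pinot's API.
# Assumption: stand-in for the SDF pattern "MM/dd/yyyy HH:mm:ss", kept as column metadata.
COLUMN_FORMAT = "%m/%d/%Y %H:%M:%S"


def ingest(raw: str, fmt: str = COLUMN_FORMAT) -> int:
    """Parse the raw string once at ingestion and return epoch millis (UTC assumed)."""
    dt = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def render(epoch_millis: int, fmt: str = COLUMN_FORMAT) -> str:
    """Convert a stored epoch value back to the configured format at query time."""
    dt = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    return dt.strftime(fmt)


# Ingestion: store LONG values, which sort natively in epoch order.
stored = [ingest(s) for s in ["01/01/2022 00:00:00", "02/01/2021 00:00:00"]]

# Query: ORDER BY works on the stored longs; formatting happens only on output.
stored.sort()
result = [render(v) for v in stored]
print(result)
```

Comparisons never touch the string form, which avoids the per-comparison conversion cost of option 1. The open question from the proposal remains: the format has to live somewhere the query path can read it.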


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
