[ https://issues.apache.org/jira/browse/HIVE-28632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Araika Singh updated HIVE-28632: -------------------------------- Description: *Boolean Handling:* Prior to HIVE-21240, the JSON deserializer converted boolean strings without considering case sensitivity, resulting in {{false}} for any non-standard strings. Now, this behavior is case-sensitive, resulting in {{false}} for all non-conforming strings. To follow SQL standards, any string other than "false" (case-insensitive) should be treated as {{{}true{}}}. There are also inconsistencies between outputs when using an INSERT statement (which follows SQL standards) and loading a JSON value directly. These inconsistencies need to be fixed to ensure uniform output. *Binary Handling:* Previously, Hive parsed JSON nodes as raw strings, allowing users to directly load JSON text to achieve desired results. Post HIVE-21240, Hive now mandates base64-encoded JSON texts, throwing errors for any other format. It's important to note that INSERT statements work seamlessly as the values are serialized to base64 and then deserialized. Hive should be capable of detecting if a JSON node is base64-encoded or not; if not, it should default to deserializing as a raw string. +To enhance user experience and maintain SQL standard compliance, we need to address the following:+ # Ensure case-sensitive handling of boolean strings in JSON deserialization and resolve inconsistencies between INSERT statements and direct JSON loading. # Implement automatic detection and appropriate deserialization of JSON nodes, distinguishing between base64-encoded and raw strings. was: *Boolean Handling:* Prior to HIVE-21240, the JSON deserializer implicitly converted boolean strings without considering case sensitivity, resulting in {{false}} for any non-standard strings. With the recent update, the behavior has been modified to be case-sensitive, resulting in {{false}} for all non-conforming strings. Furthermore, to align with SQL standards, any string other than "false" (case-insensitive) should be treated as {{{}true{}}}. Additionally, discrepancies were observed between outputs when using an INSERT(follows SQL standard) statement versus loading a JSON value directly. To adhere to SQL standards, these inconsistencies need to be resolved to ensure uniform output. *Binary Handling:* Previously, Hive parsed JSON nodes as raw strings, allowing users to directly load JSON text to achieve desired results. Post HIVE-21240, Hive now mandates base64-encoded JSON texts, throwing errors for any other format. It's important to note that INSERT statements work seamlessly as the values are serialized to base64 and then deserialized. Hive should be capable of detecting if a JSON node is base64-encoded or not; if not, it should default to deserializing as a raw string. +To enhance user experience and maintain SQL standard compliance, we need to address the following:+ # Ensure case-sensitive handling of boolean strings in JSON deserialization and resolve inconsistencies between INSERT statements and direct JSON loading. # Implement automatic detection and appropriate deserialization of JSON nodes, distinguishing between base64-encoded and raw strings. > Fix issues in JSON SerDe implementations related to Boolean, Binary data types > ------------------------------------------------------------------------------ > > Key: HIVE-28632 > URL: https://issues.apache.org/jira/browse/HIVE-28632 > Project: Hive > Issue Type: Bug > Security Level: Public(Viewable by anyone) > Reporter: Araika Singh > Assignee: Araika Singh > Priority: Major > Labels: pull-request-available > > *Boolean Handling:* Prior to HIVE-21240, the JSON deserializer converted > boolean strings without considering case sensitivity, resulting in {{false}} > for any non-standard strings. Now, this behavior is case-sensitive, resulting > in {{false}} for all non-conforming strings. To follow SQL standards, any > string other than "false" (case-insensitive) should be treated as > {{{}true{}}}. There are also inconsistencies between outputs when using an > INSERT statement (which follows SQL standards) and loading a JSON value > directly. These inconsistencies need to be fixed to ensure uniform output. > *Binary Handling:* Previously, Hive parsed JSON nodes as raw strings, > allowing users to directly load JSON text to achieve desired results. Post > HIVE-21240, Hive now mandates base64-encoded JSON texts, throwing errors for > any other format. It's important to note that INSERT statements work > seamlessly as the values are serialized to base64 and then deserialized. Hive > should be capable of detecting if a JSON node is base64-encoded or not; if > not, it should default to deserializing as a raw string. > +To enhance user experience and maintain SQL standard compliance, we need to > address the following:+ > # Ensure case-sensitive handling of boolean strings in JSON deserialization > and resolve inconsistencies between INSERT statements and direct JSON loading. > # Implement automatic detection and appropriate deserialization of JSON > nodes, distinguishing between base64-encoded and raw strings. -- This message was sent by Atlassian Jira (v8.20.10#820010)