[ https://issues.apache.org/jira/browse/FLINK-32564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-32564:
-----------------------------------
    Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue is assigned but has not received an update in 30 days, so it has been labeled "stale-assigned". If you are still working on the issue, please remove the label and add a comment updating the community on your progress. If this issue is waiting on feedback, please consider this a reminder to the committer/reviewer. Flink is a very active project, so we appreciate your patience. If you are no longer working on the issue, please unassign yourself so someone else may work on it.

> Support cast from BYTES to NUMBER
> ---------------------------------
>
>                 Key: FLINK-32564
>                 URL: https://issues.apache.org/jira/browse/FLINK-32564
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Hanyu Zheng
>            Assignee: Hanyu Zheng
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>
> We are dealing with a task that requires casting from the BYTES type to BIGINT. Specifically, we have the string '00T1p'. Our approach is to convert this string to BYTES and then cast the result to BIGINT with the following SQL query:
> {code:java}
> SELECT CAST(CAST('00T1p' AS BYTES) AS BIGINT);{code}
> However, executing this query fails, likely due to an error in the conversion between BYTES and BIGINT. We aim to identify and fix this issue so the query runs correctly. The tasks involved are:
> # Investigate and identify the specific reason the conversion from BYTES to BIGINT fails.
> # Design and implement a solution so the query works correctly.
> # Test this solution across all required scenarios to guarantee its functionality.
>
> See also:
> 1. 
PostgreSQL: PostgreSQL supports casting from its BYTES type (BYTEA) to NUMBER types (INTEGER, BIGINT, DECIMAL, etc.) using CAST or the type-conversion operator (::). URL: [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS]
> 2. MySQL: MySQL supports casting from its BYTES types (BLOB, BINARY) to NUMBER types (INTEGER, BIGINT, DECIMAL, etc.) using the CAST or CONVERT functions. URL: [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html]
> 3. Microsoft SQL Server: SQL Server supports casting from its BYTES types (VARBINARY, IMAGE) to NUMBER types (INT, BIGINT, NUMERIC, etc.) using the CAST or CONVERT functions. URL: [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql]
> 4. Oracle Database: Oracle supports casting from its RAW type (equivalent to BYTES) to NUMBER types (NUMBER, INTEGER, FLOAT, etc.) using the TO_NUMBER function. URL: [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html]
> 5. Apache Spark: Spark DataFrames support casting binary types (BinaryType, ByteType) to numeric types (IntegerType, LongType, DecimalType, etc.) using the {{cast}} function. URL: [https://spark.apache.org/docs/latest/api/sql/#cast]
>
> A byte-order (endianness) problem may also arise (little- vs. big-endian). Examples of how other systems handle it:
> 1. Apache Hadoop: Hadoop, being an open-source framework, has to deal with byte-order issues across different platforms and architectures. The Hadoop File System (HDFS) uses "sequence files," which include metadata describing the byte order of the data. This metadata ensures that data is read and written correctly regardless of the platform's endianness.
> 2. 
Apache Avro: Avro is a data serialization system used by big data frameworks such as Hadoop and Apache Kafka. Avro uses a compact binary encoding format that includes a marker for the byte order, which allows Avro to handle endianness issues seamlessly when data is exchanged between systems with different byte orders.
> 3. Apache Parquet: Parquet is a columnar storage format used in big data processing frameworks such as Apache Spark. Parquet encodes numeric values in little-endian format, the most common format on modern systems; when reading or writing Parquet data, processing engines typically handle any necessary byte-order conversions transparently.
> 4. Apache Spark: Spark is a popular big data processing engine for distributed systems. It relies on the underlying data formats it reads (e.g., Avro, Parquet, ORC) to manage byte order; because these formats are designed to handle byte order correctly, Spark can process data correctly on different platforms.
> 5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by Google Cloud. When dealing with binary data and endianness, BigQuery relies on the data's encoding format; for example, data loaded in Avro or Parquet formats already carries byte-order information, allowing BigQuery to handle it correctly across platforms.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
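One plausible semantics for the cast requested in the issue is to read the byte array as a big-endian two's-complement integer, which is what java.math.BigInteger does with a byte[]. The sketch below (class and method names are hypothetical, and this is an assumed semantics, not Flink's confirmed implementation) shows what CAST('00T1p' AS BYTES) cast to BIGINT would then produce:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class BytesToBigintSketch {

    // Interpret 1 to 8 bytes as a big-endian two's-complement BIGINT.
    // Assumed semantics for CAST(BYTES AS BIGINT); more than 8 bytes
    // cannot fit in a 64-bit BIGINT, so reject them.
    static long bytesToLong(byte[] bytes) {
        if (bytes.length == 0 || bytes.length > 8) {
            throw new IllegalArgumentException(
                    "BIGINT requires 1 to 8 bytes, got " + bytes.length);
        }
        return new BigInteger(bytes).longValue();
    }

    public static void main(String[] args) {
        // '00T1p' encodes to the bytes 0x30 0x30 0x54 0x31 0x70 in UTF-8.
        byte[] bytes = "00T1p".getBytes(StandardCharsets.UTF_8);
        // Read big-endian, this is the integer 0x3030543170.
        System.out.println(bytesToLong(bytes));
    }
}
```

Under this interpretation the result is fully determined by the byte order chosen, which is exactly why the endianness question in the issue matters for the design.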
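The byte-order concern discussed above is easy to demonstrate: the same four bytes decode to different integers depending on whether they are read big- or little-endian. A small illustration using java.nio.ByteBuffer (the class name is hypothetical; this is a demonstration of the general issue, not Flink code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessSketch {

    // Decode a 4-byte array as an int under the given byte order.
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x00, 0x00, 0x2A};
        // Big-endian: most significant byte first -> 0x0000002A = 42.
        System.out.println(readInt(bytes, ByteOrder.BIG_ENDIAN));
        // Little-endian: least significant byte first -> 0x2A000000 = 704643072.
        System.out.println(readInt(bytes, ByteOrder.LITTLE_ENDIAN));
    }
}
```

Any BYTES-to-NUMBER cast therefore has to document which order it assumes; network byte order (big-endian) is the conventional choice for portable formats.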