Thanks Yisha for bringing up this discussion. Schema inference is a very interesting and useful feature, especially for formats with well-defined schemas such as Protobuf/Parquet. I'm looking forward to the FLIP.
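For the .proto case you mention below, a purely hypothetical Flink DDL could look like the sketch here, borrowing Snowflake's USING TEMPLATE shape only for illustration (none of this syntax exists in Flink today; INFER_SCHEMA, LOCATION, and FORMAT are placeholder names the FLIP would need to pin down):

    -- Hypothetical syntax: derive the physical columns from a Protobuf
    -- descriptor file instead of listing thousands of fields by hand;
    -- connector options stay in the usual WITH clause.
    CREATE TABLE user_events
      USING TEMPLATE (
        INFER_SCHEMA(
          LOCATION => 'file:///schemas/user_events.proto',
          FORMAT => 'protobuf'
        )
      )
      WITH (
        'connector' = 'kafka',
        'topic' = 'user_events',
        'value.format' = 'protobuf'
      );

Whatever shape the final syntax takes, the key point is that the column list is derived from the external schema file at DDL time rather than maintained by hand.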
Yisha Zhou <zhouyi...@bytedance.com.invalid> wrote on Mon, Jan 15, 2024 at 16:29:
>
> Hi dev,
>
> Currently, we create tables by listing all physical columns, or by using
> the LIKE syntax to reuse a table schema from a catalog. However, in our
> company there are many cases where the messages in external systems have
> very complex schemas; in the worst case, some Protobuf data contains
> thousands of fields.
>
> In these cases, listing the fields in the DDL is very hard work, and
> creating and updating such a complex schema in a catalog is also costly.
> Therefore, I'd like to introduce the ability to detect a table schema from
> external files in DDL.
>
> A good precedent from Snowflake [1] works like below:
>
> CREATE TABLE mytable
>   USING TEMPLATE (
>     SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
>       FROM TABLE(
>         INFER_SCHEMA(
>           LOCATION=>'@mystage/json/',
>           FILE_FORMAT=>'my_json_format'
>         )
>       ));
>
> INFER_SCHEMA is a table function that 'automatically detects the file
> metadata schema in a set of staged data files that contain semi-structured
> data and retrieves the column definitions.' The files can be in Parquet,
> Avro, ORC, JSON, or CSV.
>
> We don't need to follow this syntax, but the functionality is exactly what
> I want. In addition, the file can be more than just a semi-structured data
> file; it can be a metadata file, for example a .proto file or a .thrift
> file.
>
> Since this will be a big feature, it deserves a FLIP to describe it in
> detail. I'm looking forward to your feedback and suggestions before I
> start to work on it.
>
> Best,
> Yisha
>
> [1] https://docs.snowflake.com/en/sql-reference/functions/infer_schema

--
Best,
Benchao Li