Hi Dan

I think Flink SQL should be able to meet your needs.

You can write a Flink JAR program that accepts the directories, schemas,
mappings, and sink tables as parameters and generates the DDL and DML from
them.

Assuming you have two directories:

directory1 -> f1, f2, f3, f4 -> iceberg1
directory2 -> f1, f2, f3 -> iceberg2

DDL can be generated roughly as follows.

CREATE TABLE s3_table1 (
  f1 VARCHAR,
  f2 VARCHAR,
  f3 VARCHAR,
  f4 VARCHAR
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://dir1',
  'format' = 'parquet',
  .....
);

CREATE TABLE s3_table2 (
  f1 VARCHAR,
  f2 VARCHAR,
  f3 VARCHAR
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://dir2',
  'format' = 'parquet',
  .......
);
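
As a rough sketch of how the JAR program could build such a DDL string from
its parameters: the class and method names below (DdlSketch,
createSourceTable) are hypothetical, the code is plain Java string building
with no Flink dependency, and it assumes the source is read through the
filesystem connector with the 'connector', 'path' and 'format' options only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: turns a table name, a column-name -> type map and an
// S3 path into a CREATE TABLE statement for a Parquet source directory.
class DdlSketch {

    static String createSourceTable(String table, Map<String, String> columns, String path) {
        // One "  name TYPE" line per column, in the map's iteration order.
        String cols = columns.entrySet().stream()
                .map(e -> "  " + e.getKey() + " " + e.getValue())
                .collect(Collectors.joining(",\n"));
        // Assumption: only these three connector options are needed.
        return "CREATE TABLE " + table + " (\n" + cols + "\n) WITH (\n"
                + "  'connector' = 'filesystem',\n"
                + "  'path' = '" + path + "',\n"
                + "  'format' = 'parquet'\n"
                + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in declaration order.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("f1", "VARCHAR");
        schema.put("f2", "VARCHAR");
        schema.put("f3", "VARCHAR");
        schema.put("f4", "VARCHAR");
        System.out.println(createSourceTable("s3_table1", schema, "s3://dir1"));
    }
}
```

In a real job the returned string would be passed to
TableEnvironment.executeSql, one statement per call. The sketch does not
escape or validate identifiers, so the inputs are assumed to be trusted.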


Then select the fields you need according to your mapping and generate the
DML.

INSERT INTO iceberg_catalog.iceberg_database.tb1 SELECT
f1, f2, f3 FROM s3_table1;

INSERT INTO iceberg_catalog.iceberg_database.tb2 SELECT
f1, f2 FROM s3_table2;
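
The DML can be generated from a mapping in much the same way. This is only a
sketch under assumptions: DmlSketch and insertInto are hypothetical names,
the mapping is modelled as an ordered map from source field to sink column,
and the sink column "name" in the example is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds an INSERT INTO ... SELECT statement from a
// source-field -> sink-column mapping.
class DmlSketch {

    static String insertInto(String sinkTable, String sourceTable, Map<String, String> mapping) {
        // Emit "field" when the names match, "field AS column" otherwise.
        String projection = mapping.entrySet().stream()
                .map(e -> e.getKey().equals(e.getValue())
                        ? e.getKey()
                        : e.getKey() + " AS " + e.getValue())
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + sinkTable + " SELECT " + projection + " FROM " + sourceTable;
    }

    public static void main(String[] args) {
        // The map order must follow the sink table's column order.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("f1", "f1");
        mapping.put("f2", "name"); // "name" is an illustrative sink column
        System.out.println(insertInto(
                "iceberg_catalog.iceberg_database.tb1", "s3_table1", mapping));
    }
}
```

As far as I know, INSERT INTO ... SELECT matches columns by position rather
than by alias, so the aliases are mainly for readability and the order of the
mapping is what matters.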


Of course, this is only my understanding of your requirements; I don't know
whether it fits your scenario.


Best regards,
Feng


On Fri, Nov 24, 2023 at 3:02 AM Oxlade, Dan <dan.oxl...@troweprice.com>
wrote:

> Thanks Feng,
>
> I think my challenge (and why I expected I’d need to use Java) is that
> there will be parquet files with different schemas landing in the s3 bucket
> - so I don’t want to hard-code the schema in a sql table definition.
>
> I’m not sure if this is even possible? Maybe I would have to write a job
> that accepts the schema, directory and iceberg target table as params and
> start instances of the job through the job api.
>
> Unless reading the parquet to a temporary table doesn’t need the schema
> definition? I couldn't really work things out from the links.
>
> Dan
> ------------------------------
> *From:* Feng Jin <jinfeng1...@gmail.com>
> *Sent:* Thursday, November 23, 2023 6:49:11 PM
> *To:* Oxlade, Dan <dan.oxl...@troweprice.com>
> *Cc:* user@flink.apache.org <user@flink.apache.org>
> *Subject:* [EXTERNAL] Re: flink s3[parquet] -> s3[iceberg]
>
> Hi Oxlade
>
> I think using Flink SQL can conveniently fulfill your requirements.
>
> For S3 Parquet files, you can create a temporary table using a filesystem
> connector[1] .
> For Iceberg tables, FlinkSQL can easily integrate with the Iceberg
> catalog[2].
>
> Therefore, you can use Flink SQL to export S3 files to Iceberg.
>
> If you only need field mapping or transformation, I believe using Flink
> SQL + UDF (User-Defined Functions) would be sufficient to meet your needs.
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#directory-watching
> [2]
> https://iceberg.apache.org/docs/latest/flink-connector/#table-managed-in-hadoop-catalog
>
>
> Best,
> Feng
>
>
> On Thu, Nov 23, 2023 at 11:23 PM Oxlade, Dan <dan.oxl...@troweprice.com>
> wrote:
>
> Hi all,
>
>
>
> I’m attempting to create a POC in flink to create a pipeline to stream
> parquet to a data warehouse in iceberg format.
>
>
>
> Ideally – I’d like to watch a directory in s3 (minio locally) and stream
> those to iceberg, doing the appropriate schema mapping/translation.
>
>
>
> I guess first: does this sound like a crazy idea?
>
> Assuming not, is anyone able to share examples that might get me going?
> I’ve found lots of iceberg and flink sql examples but I think I’ll need
> something in java to do the schema mapping. Also some examples reading
> parquet from s3 seem a little hard to come by.
>
>
>
> I’m aware I’ll need a catalog, I can use nessie for the prototype. I’m
> also trying to use minio to get this all working locally but this might
> just be adding complexity at the moment.
>
>
>
> TIA
>
> Dan
>
> T. Rowe Price International Ltd (registered number 3957748) is registered
> in England and Wales with its registered office at Warwick Court, 5
> Paternoster Square, London EC4M 7DX. T. Rowe Price International Ltd is
> authorised and regulated by the Financial Conduct Authority. The company
> has a branch in Dubai International Financial Centre (regulated by the DFSA
> as a Representative Office).
>
> T. Rowe Price (including T. Rowe Price International Ltd and its
> affiliates) and its associates do not provide legal or tax advice. Any
> tax-related discussion contained in this e-mail, including any attachments,
> is not intended or written to be used, and cannot be used, for the purpose
> of (i) avoiding any tax penalties or (ii) promoting, marketing, or
> recommending to any other party any transaction or matter addressed herein.
> Please consult your independent legal counsel and/or professional tax
> advisor regarding any legal or tax issues raised in this e-mail.
>
> The contents of this e-mail and any attachments are intended solely for
> the use of the named addressee(s) and may contain confidential and/or
> privileged information. Any unauthorized use, copying, disclosure, or
> distribution of the contents of this e-mail is strictly prohibited by the
> sender and may be unlawful. If you are not the intended recipient, please
> notify the sender immediately and delete this e-mail.
>
