GitHub user ZTE-EBASE added a comment to the discussion: Extend the gpfdist
tool to support SFTP/HDFS protocols for high-performance multi-source data
ingestion
The goal is to minimize changes to the database kernel by reusing the existing
gpfdist protocol. An sftp:// or hdfs:// protocol marker is embedded in the
gpfdist URL, and gpfdist uses that marker to call the corresponding functions
for reading the data. This also addresses the case where the data files are
not on the same machine as gpfdist.
For example:
CREATE EXTERNAL TABLE ext1 (d varchar(20))
LOCATION ('gpfdist://ip:port/<sftp://sftp-user:passwd@sftp-hostip:sftp-port/file.csv>')
FORMAT 'csv' (DELIMITER '|');

CREATE EXTERNAL TABLE ext2 (d varchar(20))
LOCATION ('gpfdist://ip:port/<hdfs://namenode:port/file-path.parquet>')
FORMAT 'csv' (DELIMITER '|');
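To make the marker-based dispatch concrete, here is a minimal Python sketch (not gpfdist's actual C code) of how a server could split such a LOCATION into the inner protocol marker and route it to a reader. It assumes the inner source URL is wrapped in <...> directly after the gpfdist host:port, as in the examples, and that credentials contain no unescaped '/', '@' or '>'. The names parse_location, open_source, read_sftp, read_hdfs and read_local are hypothetical, and the readers are stubs that only describe what they would open.

```python
from urllib.parse import urlsplit

# Hypothetical reader stubs standing in for the SFTP/HDFS reading
# functions the proposal would add; they only describe their target.
def read_sftp(url):
    return f"sftp reader for {url.hostname}:{url.port}{url.path}"

def read_hdfs(url):
    return f"hdfs reader for {url.hostname}:{url.port}{url.path}"

def read_local(path):
    return f"local reader for {path}"

# Protocol marker -> reader function (assumed dispatch table).
READERS = {"sftp": read_sftp, "hdfs": read_hdfs}

def parse_location(location):
    """Split a gpfdist LOCATION into (scheme, target).

    Assumes the inner source URL is wrapped in <...> right after
    the gpfdist host:port; otherwise the path is a local file.
    """
    outer = urlsplit(location)
    if outer.scheme != "gpfdist":
        raise ValueError("not a gpfdist URL")
    path = outer.path.lstrip("/")
    if path.startswith("<") and path.endswith(">"):
        inner = urlsplit(path[1:-1])  # e.g. sftp://user:pw@host:22/file.csv
        return inner.scheme, inner
    return "file", path  # plain gpfdist path served from local disk

def open_source(location):
    """Route a LOCATION to the reader matching its protocol marker."""
    scheme, target = parse_location(location)
    if scheme == "file":
        return read_local(target)
    reader = READERS.get(scheme)
    if reader is None:
        raise ValueError(f"unsupported protocol marker: {scheme}")
    return reader(target)
```

Keeping the marker inside the gpfdist path means the external-table syntax itself stays unchanged; only the code that serves the request has to recognize the extra scheme, which matches the stated aim of minimizing kernel changes.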
GitHub link:
https://github.com/apache/cloudberry/discussions/1205#discussioncomment-13636999