2416210017 commented on PR #4947:
URL: https://github.com/apache/seatunnel/pull/4947#issuecomment-1706296031
Although the implementation supports string columns as partition keys, the
design is not very reasonable. It applies the MD5 hash function to each value
in the table_name column, takes the hash modulo 10, and then takes the
absolute value. Each partition then selects only the rows whose result equals
its own index (1 for partition 1, 2 for partition 2, and so on).
For example, with the number of partitions set to 10, the SQL actually
executed against the business database is:
partition 1:
SELECT * FROM (
select * from metastore_bdc.collect_dct_table_info
) tt where ABS(MD5(table_name) % 10) = 1;
partition 2:
SELECT * FROM (
select * from metastore_bdc.collect_dct_table_info
) tt where ABS(MD5(table_name) % 10) = 2;
...

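As shown in the figure, every one of these queries scans the entire table in
the business database and cannot use an index on table_name, because the
filter is computed from a hash of the column rather than from the column
itself. Splitting the read into partitions therefore brings no performance
improvement.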
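To make the point concrete, here is a minimal sketch that compares the query plan of a hash-modulo filter with that of a plain range filter. It uses an in-memory SQLite database purely as a stand-in for the business database; the `md5_mod` function, the `payload` column, the index name, and the sample rows are all assumptions made for illustration, and a real MySQL or other engine will report its plan differently.

```python
import hashlib
import sqlite3


def md5_mod(value, buckets):
    """Stand-in for ABS(MD5(col) % n): hash the string, reduce it to a bucket."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


conn = sqlite3.connect(":memory:")
# SQLite has no built-in MD5, so register the hypothetical helper above.
conn.create_function("md5_mod", 2, md5_mod)

conn.execute(
    "CREATE TABLE collect_dct_table_info ("
    "id INTEGER PRIMARY KEY, table_name TEXT, payload TEXT)"
)
conn.execute(
    "CREATE INDEX idx_table_name ON collect_dct_table_info (table_name)"
)
conn.executemany(
    "INSERT INTO collect_dct_table_info (table_name, payload) VALUES (?, ?)",
    [(f"table_{i:04d}", "x") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Hash-based partition filter: the column is wrapped in a function call.
hash_plan = plan(
    "SELECT * FROM collect_dct_table_info WHERE md5_mod(table_name, 10) = 1"
)

# Range-based partition filter: the column is compared directly.
range_plan = plan(
    "SELECT * FROM collect_dct_table_info "
    "WHERE table_name >= ? AND table_name < ?",
    ("table_0100", "table_0200"),
)
```

In this sketch the hash filter should be reported as a table scan, while the range filter should be reported as an index search on `idx_table_name`. The exact wording of the plan depends on the SQLite version.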
Suggestion: follow Sqoop's approach to splitting on string columns, which
converts the characters' Unicode code points into numbers so that the key
range can be divided into intervals.
Reference link: https://blog.csdn.net/fyhailin/article/details/79069475
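A rough sketch of that idea follows. It is loosely modelled on how Sqoop splits text columns, not a copy of Sqoop's code or of anything in SeaTunnel: all function names, the 8-character limit, and the base of 65536 are assumptions chosen for illustration. The sketch also assumes the database orders strings by code point (a binary collation) and that the keys contain only characters in the Basic Multilingual Plane.

```python
from fractions import Fraction

BASE = 65536    # assumption: one "digit" per character, BMP code points only
MAX_CHARS = 8   # assumption: only the first few characters are digitized


def string_to_number(s):
    """Read the string as a base-65536 fraction in the interval [0, 1)."""
    result = Fraction(0)
    for i, ch in enumerate(s[:MAX_CHARS]):
        result += Fraction(ord(ch), BASE ** (i + 1))
    return result


def number_to_string(x):
    """Inverse of string_to_number, truncated to MAX_CHARS characters."""
    chars = []
    for _ in range(MAX_CHARS):
        x *= BASE
        digit = int(x)          # integer part is the next code point
        chars.append(chr(digit))
        x -= digit
        if x == 0:
            break
    return "".join(chars)


def split_points(min_val, max_val, num_splits):
    """Return num_splits + 1 boundary strings from min_val to max_val."""
    # Strip the common prefix so the digitized part carries more resolution.
    prefix_len = 0
    for a, b in zip(min_val, max_val):
        if a != b:
            break
        prefix_len += 1
    prefix = min_val[:prefix_len]

    lo = string_to_number(min_val[prefix_len:])
    hi = string_to_number(max_val[prefix_len:])
    step = (hi - lo) / num_splits

    points = [min_val]
    for i in range(1, num_splits):
        points.append(prefix + number_to_string(lo + step * i))
    points.append(max_val)
    return points


def range_predicates(column, points):
    """Build one parameterized range filter per partition."""
    predicates = []
    last = len(points) - 2
    for i in range(len(points) - 1):
        upper_op = "<=" if i == last else "<"   # last range includes the max
        sql = f"{column} >= ? AND {column} {upper_op} ?"
        predicates.append((sql, (points[i], points[i + 1])))
    return predicates
```

Here `split_points` would be fed the result of a `SELECT MIN(table_name), MAX(table_name)` query, and each predicate compares the column directly, so an index on the column can be used. The intermediate boundary strings may contain unprintable characters, and ranges computed this way can be uneven when the keys are not uniformly distributed.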