mihailom-db commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2242283304
Hi @panbingkun, thanks for taking the initiative to push this work forward. The design of the table was discussed previously, and the structure that was agreed upon differs slightly from what is in this PR. Here are the columns we need to include:

- COLLATION_CATALOG (important for UDF collations; for now it should be SYSTEM)
- COLLATION_SCHEMA (important for UDF collations; for now it should be BUILTIN)
- COLLATION_NAME (full name, with all identifiers, just like you did) ✅
- LANGUAGE (name of the language that corresponds to the locale of the given collation; null if there is no backing language, e.g. for the UTF8_* family of collations)
- COUNTRY (name of the country that corresponds to the locale of the given collation; null if there is no backing country, e.g. for the UTF8_* family of collations)
- ACCENT_SENSITIVITY (ACCENT_SENSITIVE/ACCENT_INSENSITIVE)
- CASE_SENSITIVITY (CASE_SENSITIVE/CASE_INSENSITIVE)
- PAD_ATTRIBUTE (determines whether leading or trailing spaces are significant in string comparisons; currently always NO_PAD)
- ICU_VERSION (null if not an ICU collation) (✅, partially done, just switch the UTF8_* family to null)

All fields should be of string type, and only LANGUAGE, COUNTRY and ICU_VERSION should be nullable.

Please let me know if you have any additional questions and we can work through this PR together.
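To make the requested layout easier to check against, here is a minimal sketch of the column list in plain Python. It is not Spark code and not the PR's implementation: the names `COLLATION_COLUMNS`, `ALLOWED_VALUES`, `validate_row` and the sample `UTF8_BINARY` row are hypothetical and exist only for illustration. The sketch encodes just the rules stated in the comment: every column is a string, only LANGUAGE, COUNTRY and ICU_VERSION may be null, and the sensitivity and pad columns take the listed values.

```python
# Illustrative only: plain Python, not Spark's API.
# (column name, nullable) pairs in the order given in the comment.
COLLATION_COLUMNS = [
    ("COLLATION_CATALOG", False),
    ("COLLATION_SCHEMA", False),
    ("COLLATION_NAME", False),
    ("LANGUAGE", True),
    ("COUNTRY", True),
    ("ACCENT_SENSITIVITY", False),
    ("CASE_SENSITIVITY", False),
    ("PAD_ATTRIBUTE", False),
    ("ICU_VERSION", True),
]

# Values the comment lists for the enumerated columns.
ALLOWED_VALUES = {
    "ACCENT_SENSITIVITY": {"ACCENT_SENSITIVE", "ACCENT_INSENSITIVE"},
    "CASE_SENSITIVITY": {"CASE_SENSITIVE", "CASE_INSENSITIVE"},
    "PAD_ATTRIBUTE": {"NO_PAD"},
}


def validate_row(row):
    """Return a list of problems; an empty list means the row fits the layout."""
    expected = {name for name, _ in COLLATION_COLUMNS}
    if set(row) != expected:
        return ["columns differ from the expected layout"]
    problems = []
    for name, nullable in COLLATION_COLUMNS:
        value = row[name]
        if value is None:
            if not nullable:
                problems.append(f"{name} must not be null")
        elif not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            problems.append(f"{name} has unexpected value {value!r}")
    return problems


# Hypothetical sample row for a UTF8_* collation: no backing language or
# country and no ICU version, so those three columns are null.
UTF8_BINARY_ROW = {
    "COLLATION_CATALOG": "SYSTEM",
    "COLLATION_SCHEMA": "BUILTIN",
    "COLLATION_NAME": "UTF8_BINARY",
    "LANGUAGE": None,
    "COUNTRY": None,
    "ACCENT_SENSITIVITY": "ACCENT_SENSITIVE",
    "CASE_SENSITIVITY": "CASE_SENSITIVE",
    "PAD_ATTRIBUTE": "NO_PAD",
    "ICU_VERSION": None,
}

print(validate_row(UTF8_BINARY_ROW))
```

A check like this could back a unit test over the rows the table returns, so that a non-null ICU_VERSION on a UTF8_* collation or a null COLLATION_CATALOG is caught early.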