mihailom-db commented on PR #47364:
URL: https://github.com/apache/spark/pull/47364#issuecomment-2242283304

   Hi @panbingkun, thanks for taking the initiative to push this work forward. 
The design of the table was discussed previously, and the structure that was 
agreed upon differs slightly from the current one.
   Here are the columns we need to include:
   
   - COLLATION_CATALOG (important for udf collations, for now it should be 
SYSTEM)
   - COLLATION_SCHEMA (important for udf collations, for now it should be 
BUILTIN)
   - COLLATION_NAME (Full name, with all identifiers, just like you did) ✅ 
   - LANGUAGE (Name of the language that corresponds to the locale of the given 
collation. Null if there is no backing language, e.g. for the UTF8_* family of 
collations)
   - COUNTRY (Name of the country that corresponds to the locale of the given 
collation. Null if there is no backing country, e.g. for the UTF8_* family of 
collations)
   - ACCENT_SENSITIVITY (ACCENT_SENSITIVE/ACCENT_INSENSITIVE)
   - CASE_SENSITIVITY (CASE_SENSITIVE/CASE_INSENSITIVE)
   - PAD_ATTRIBUTE (Determines whether leading or trailing spaces are 
significant in string comparisons. Currently always NO_PAD)
   - ICU_VERSION (Null if not an ICU collation) (✅, partially done, just switch 
the UTF8_* family to null)
   
   All fields should be of string type, and only LANGUAGE, COUNTRY and 
ICU_VERSION should be nullable.
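
To make the column list and nullability rule concrete, here is a minimal sketch in plain Python. It is not Spark code: `Column`, `COLLATION_COLUMNS`, `validate_row` and the example row are hypothetical names and illustrative values, used only to restate the layout described above.

```python
from typing import Dict, List, NamedTuple, Optional


class Column(NamedTuple):
    """One column of the proposed collations table (hypothetical helper type)."""
    name: str
    nullable: bool


# Columns as listed in this comment. Every column is a string;
# only LANGUAGE, COUNTRY and ICU_VERSION may be null.
COLLATION_COLUMNS: List[Column] = [
    Column("COLLATION_CATALOG", False),
    Column("COLLATION_SCHEMA", False),
    Column("COLLATION_NAME", False),
    Column("LANGUAGE", True),
    Column("COUNTRY", True),
    Column("ACCENT_SENSITIVITY", False),
    Column("CASE_SENSITIVITY", False),
    Column("PAD_ATTRIBUTE", False),
    Column("ICU_VERSION", True),
]


def validate_row(row: Dict[str, Optional[str]]) -> List[str]:
    """Return the problems found in a row; an empty list means the row fits."""
    problems: List[str] = []
    for col in COLLATION_COLUMNS:
        if col.name not in row:
            problems.append(f"missing column {col.name}")
            continue
        value = row[col.name]
        if value is None:
            if not col.nullable:
                problems.append(f"{col.name} must not be null")
        elif not isinstance(value, str):
            problems.append(f"{col.name} must be a string")
    known = {col.name for col in COLLATION_COLUMNS}
    for name in sorted(set(row) - known):
        problems.append(f"unexpected column {name}")
    return problems


# Illustrative row for a UTF8_* collation: no backing locale and no ICU
# version, so the three nullable columns are None. Values are assumptions.
EXAMPLE_UTF8_ROW: Dict[str, Optional[str]] = {
    "COLLATION_CATALOG": "SYSTEM",
    "COLLATION_SCHEMA": "BUILTIN",
    "COLLATION_NAME": "UTF8_BINARY",
    "LANGUAGE": None,
    "COUNTRY": None,
    "ACCENT_SENSITIVITY": "ACCENT_SENSITIVE",
    "CASE_SENSITIVITY": "CASE_SENSITIVE",
    "PAD_ATTRIBUTE": "NO_PAD",
    "ICU_VERSION": None,
}
```

Calling `validate_row(EXAMPLE_UTF8_ROW)` returns an empty list, while a row whose `PAD_ATTRIBUTE` is `None` is reported as a problem, which mirrors the rule that only the three locale and version columns are nullable.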
   
   Please let me know if you have any additional questions and we can work 
through this PR together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

