rahil-c commented on code in PR #13650:
URL: https://github.com/apache/hudi/pull/13650#discussion_r2287199630
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkBaseIndexSupport.scala:
##########
@@ -173,18 +188,14 @@ abstract class SparkBaseIndexSupport(spark: SparkSession,
var recordKeyQueries: List[Expression] = List.empty
var compositeRecordKeys: List[String] = List.empty
val recordKeyOpt = getRecordKeyConfig
-
val isComplexRecordKey = {
+ val keyGeneratorClassName = metaClient.getTableConfig.getKeyGeneratorClassName
val fieldCount = recordKeyOpt.map(recordKeys => recordKeys.length).getOrElse(0)
- val encodeFieldNameConfig = metaClient.getTableConfig.getProps.getProperty(
- org.apache.hudi.config.HoodieWriteConfig.COMPLEX_KEYGEN_ENCODE_SINGLE_RECORD_KEY_FIELD_NAME.key(),
- org.apache.hudi.config.HoodieWriteConfig.COMPLEX_KEYGEN_ENCODE_SINGLE_RECORD_KEY_FIELD_NAME.defaultValue().toString
- ).toBoolean
-
+ val isUsingComplexKeyGen = isComplexKeyGenerator(keyGeneratorClassName)
// Consider as complex if:
// 1. Multiple fields (> 1), OR
- // 2. Single field with complex keygen encoding enabled
- (fieldCount > 1) || (fieldCount == 1 && encodeFieldNameConfig)
+ // 2. Using complex key generator with single field
+ fieldCount > 1 || (isUsingComplexKeyGen && fieldCount == 1)
Review Comment:
> in which case the fieldCount cound < 1
Are you asking about the case where fieldCount > 1? If so, fieldCount is
derived from `val recordKeyOpt = getRecordKeyConfig`, which reads the
`RECORDKEY_FIELDS` config. This config lists the fields (one or many) that are
concatenated together to form the record key.
```
public static final ConfigProperty<String> RECORDKEY_FIELDS = ConfigProperty
    .key("hoodie.table.recordkey.fields")
    .noDefaultValue()
    .withDocumentation("Columns used to uniquely identify the table. Concatenated values of these fields are used as "
        + " the record key component of HoodieKey.");
```
Our docs describe the complex key generator as follows:
https://hudi.apache.org/docs/key_generation/#complex
```
Both record key and partition paths comprise one or more than one field by
name(combination of multiple fields). Fields are expected to be comma separated
in the config value. For example "Hoodie.datasource.write.recordkey.field" :
"col1,col4"
```
If your question is whether fieldCount can be < 1, I do not think that is a
valid case that will come up, as the user always has to provide a record key
field.
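To make the cases concrete, here is a minimal Java sketch of the check in the
diff. `RecordKeySketch`, `parseRecordKeyFields`, and the boolean flag standing
in for the result of `isComplexKeyGenerator(...)` are hypothetical names used
only for illustration, not Hudi APIs; only the predicate itself mirrors the new
line in the diff.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class RecordKeySketch {

  // Hypothetical helper: splits a comma-separated record key config value
  // (e.g. "col1,col4") into its individual field names.
  public static List<String> parseRecordKeyFields(String configValue) {
    if (configValue == null || configValue.trim().isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.stream(configValue.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  // Same shape as the predicate in the diff:
  // fieldCount > 1 || (isUsingComplexKeyGen && fieldCount == 1)
  public static boolean isComplexRecordKey(int fieldCount, boolean isUsingComplexKeyGen) {
    return fieldCount > 1 || (isUsingComplexKeyGen && fieldCount == 1);
  }

  public static void main(String[] args) {
    int two = parseRecordKeyFields("col1,col4").size();
    int one = parseRecordKeyFields("col1").size();
    int zero = parseRecordKeyFields("").size();

    System.out.println(isComplexRecordKey(two, false));  // true: multiple fields
    System.out.println(isComplexRecordKey(one, true));   // true: single field, complex keygen
    System.out.println(isComplexRecordKey(one, false));  // false: single field, other keygen
    System.out.println(isComplexRecordKey(zero, true));  // false: no fields (not expected in practice)
  }
}
```

Even if fieldCount were somehow 0, the predicate evaluates to false, so the
check would not treat the key as complex in that case.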
> we should also follow the config encodeFieldNameConfig
Not too sure I understand what you mean by this.