yihua commented on code in PR #18204:
URL: https://github.com/apache/hudi/pull/18204#discussion_r3035034031
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HudiHiveSyncJob.java:
##########
Review Comment:
**Line 110:** 🤖 **Bug (still open from previous review):** The short alias
`"bff"` is missing its leading dash. JCommander requires parameter names to
start with `-` or `--`. As written, `-bff` or `--bff` on the command line won't
be recognized, and bare `bff` is not a valid prefix.
```suggestion
@Parameter(names = {"--base-file-format", "-bff"}, description = "Base
file format of the dataset")
```
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HudiHiveSyncJob.java:
##########
Review Comment:
**Line 82:** 🤖 **Bug (still open from previous review):** The Spark master
is hardcoded to `"local[2]"`. When this job is submitted via `spark-submit` to
a YARN/K8s/standalone cluster, `UtilHelpers.buildSparkContext` will override
the cluster-provided master with `local[2]`, running everything on the driver
node only.
Other Hudi utilities (e.g., `HoodieCompactor`, `HoodieCleaner`) pass
`"local[*]"` as a fallback and respect `spark.master` from the environment.
This should follow the same pattern, or better yet, not set a master at all and
let `spark-submit` control it.
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HudiHiveSyncJob.java:
##########
Review Comment:
**Line 82:** 🤖 The `JavaSparkContext` created in `main()` is never stopped.
If `run()` throws an exception, the SparkContext leaks. Wrap in try-finally:
```java
public static void main(String[] args) throws IOException {
final Config cfg = new Config();
new JCommander(cfg, null, args);
LOG.info("Cfg received: {}", cfg);
JavaSparkContext jsc = UtilHelpers.buildSparkContext("HudiHiveSyncJob",
"local[*]", true);
try {
new HudiHiveSyncJob(jsc, cfg).run();
} finally {
jsc.stop();
}
}
```
(The test correctly handles this with try/finally — `main()` should do the
same.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]