Copilot commented on code in PR #2843:
URL: https://github.com/apache/tika/pull/2843#discussion_r3312314661


##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -604,27 +614,27 @@ public void process(String arg) throws Exception {
         System.out.println(localConfig.getConfig().toString());
     }*/
 
-    private void convertConfigXmlToJson(String paths) throws Exception {
-        String[] parts = paths.split(",");
-        if (parts.length != 2) {
-            System.err.println("Error: --convert-config-xml-to-json requires 
input and output paths separated by comma");
-            System.err.println("Usage: 
--convert-config-xml-to-json=<input.xml>,<output.json>");
+    private void convertConfigXmlToJson(String inputPath) throws Exception {
+        if (inputPath == null || inputPath.trim().isEmpty()) {
+            System.err.println("Error: --convert-config-xml-to-json requires 
an input XML path");
+            System.err.println("Usage: 
--convert-config-xml-to-json=<input.xml> > <output.json>");
             return;
         }
 
-        Path xmlPath = Paths.get(parts[0].trim());
-        Path jsonPath = Paths.get(parts[1].trim());
+        Path xmlPath = Paths.get(inputPath.trim());
 
         if (!Files.exists(xmlPath)) {
             System.err.println("Error: Input XML file not found: " + xmlPath);
             return;
         }

Review Comment:
   convertConfigXmlToJson() only checks `Files.exists(xmlPath)`. If the path 
exists but is a directory (or otherwise not a regular readable file), 
`Files.newInputStream(xmlPath)` will throw and you'll get a stack trace rather 
than the friendly CLI error. Consider checking `Files.isRegularFile(xmlPath)` 
(and possibly `Files.isReadable`) and emitting a clear error message before 
opening the stream.
   



##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -771,16 +781,18 @@ private void usage() {
         out.println();
         out.println("    -g  or --gui           Start the Apache Tika GUI");
         out.println();
-        out.println("    --config=<tika-config.xml>");
-        out.println("        TikaConfig file. Must be specified before -g, -s, 
-f or the dump-x-config !");
+        out.println("    --config=<tika-config.json>");
+        out.println("        TikaConfig file (JSON as of Tika 4.x). Must be 
specified before -g, -s or -f !");

Review Comment:
   The usage text says the config must be specified before `-s`, but 
`-s/--server` is explicitly unsupported in this CLI (process() throws 
IllegalArgumentException for it). Consider removing `-s` from this message to 
avoid confusing users, and keep the wording aligned with actual supported 
options.
   



##########
tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java:
##########
@@ -760,6 +760,25 @@ public void testListParserDetailApt() throws Exception {
         
assertTrue(content.contains("application/vnd.oasis.opendocument.text-web"));
     }
 
+    /**
+     * Tests --convert-config-xml-to-json with no separate config file.
+     * Regression test for TIKA-4734: the flag used to be misrouted to async
+     * mode (the input arg ended in ".json"), failing with a 
TikaConfigException
+     * unless a --config was also passed. It must now run standalone and write
+     * the converted JSON to stdout.
+     */
+    @Test
+    public void testConvertConfigXmlToJson() throws Exception {
+        String xmlPath = 
Paths.get(getClass().getResource("/xml-configs/tika-config-simple.xml").toURI()).toString();
+        String content = getParamOutContent("--convert-config-xml-to-json=" + 
xmlPath);
+
+        // stdout should contain the converted JSON (and only the JSON)
+        assertTrue(content.contains("\"parsers\""), "Expected JSON parsers 
section, got: " + content);
+        assertTrue(content.contains("pdf-parser"), "Expected pdf-parser in 
output, got: " + content);
+        assertTrue(content.contains("\"sortByPosition\" : true"), "Expected 
converted param, got: " + content);
+        assertTrue(content.trim().startsWith("{"), "Output should be pure 
JSON, got: " + content);

Review Comment:
   This test asserts specific pretty-printed JSON substrings (including spacing 
like `"sortByPosition" : true`), which is brittle if the JSON formatter 
changes. To make the regression test more stable, consider parsing `content` as 
JSON and asserting on the resulting structure/values (e.g., that the parsers 
array includes `pdf-parser` with `sortByPosition=true`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to