Rui Li created FLINK-15533:
------------------------------

             Summary: Writing DataStream as text file fails due to output path already exists
                 Key: FLINK-15533
                 URL: https://issues.apache.org/jira/browse/FLINK-15533
             Project: Flink
          Issue Type: Bug
            Reporter: Rui Li

The following program reproduces the issue.
{code}
// Load the cluster configuration and target the remote executor.
Configuration configuration = GlobalConfiguration.loadConfiguration();
configuration.set(DeploymentOptions.TARGET, RemoteExecutor.NAME);
StreamExecutionEnvironment streamEnv = new StreamExecutionEnvironment(configuration);

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
TableConfig tableConfig = new TableConfig();
StreamTableEnvironment tableEnv = StreamTableEnvironmentImpl.create(streamEnv, settings, tableConfig);

// Register a CSV file with two INT columns as the table source "src".
tableEnv
    .connect(new FileSystem().path("file:///path/to/csv"))
    .withSchema(new Schema().field("f1", DataTypes.INT()).field("f2", DataTypes.INT()))
    .withFormat(new OldCsv().fieldDelimiter(","))
    .registerTableSource("src");

// Convert the table to an append stream and write it to HDFS as text.
DataStream<Row> dataStream = tableEnv.toAppendStream(tableEnv.from("src"), Row.class);
dataStream.writeAsText("hdfs://localhost:8020/tmp/output");
streamEnv.execute();
{code}
The job fails with the following error, even though the output path does not exist before job submission:
{noformat}
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): /tmp/output already exists as a directory
{noformat}
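
To double-check the precondition stated above (that the output path is absent before submission, and a directory afterwards), the path state can be inspected before and after the job runs. The following is only a rough sketch under assumptions: it uses java.nio against a local path as a stand-in for HDFS, and the class name OutputPathCheck and the method describe are hypothetical helpers, not part of Flink or Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a path is missing, a file, or a directory.
// Uses the local filesystem as a stand-in; an HDFS path would need the Hadoop client instead.
public class OutputPathCheck {

    public static String describe(Path path) {
        if (Files.isDirectory(path)) {
            return "directory";
        }
        if (Files.exists(path)) {
            return "file";
        }
        return "missing";
    }

    public static void main(String[] args) {
        // The default path mirrors the one in the report; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "/tmp/output");
        System.out.println(path + ": " + describe(path));
    }
}
```

Running this once before submitting the job and once after the failure would show whether the path changes from "missing" to "directory", which is the state the error message reports.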