[ https://issues.apache.org/jira/browse/FLINK-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953637#comment-16953637 ]
John Lonergan edited comment on FLINK-14170 at 10/17/19 11:24 AM:
------------------------------------------------------------------

Hi

I disagree with your approach Kostas, as it impacts time to market and adds a huge effort to fixing this critical bug.

The existing impl is an attempt at "fail early" and is a *nice to have feature*; however, this implementation needlessly disables the product on 2.6 and is therefore a *major bug* for us users.

====
Can we split the discussion into
1 remove the bug (ie the block)
2 other nice to have improvements
====

Re 1 - When removing the bug, the requestor suggested an optional addition, ie including a simple NotImplementedException in the Flink code - this seems like a reasonable *but optional* compromise to improve the quality of the error message for any unfortunates who go via a code path that attempts to use truncate() on 2.6. That approach is a practical solution that satisfies both the need for correctness and the need for helpfulness without completely blocking the use of this product for a large group of potential users, particularly the many, many slower-moving enterprises out here in the wild.

Let's not add additional barriers in the way of fixing the primary issue.

Re 2 - Your points ...

"we should fail at build time" - how is this possible - we don't specifically target hadoop 2.6 or other versions?

"pre-flight time" - again how is this possible - I've looked at this and it's pretty hard to see how that would work - I can't see a straightforward way (suggest you make a proposal on how this would work)

"same strategy" - not needed to fix the bug - and this is a separate problem that needs a separate ticket

Re "time bomb waiting to explode" - hardly a reasonable description of the issue - it's not like the first time I would run this code is in production? I'd discover the issue within an hour or so of writing my prototype or implementation - not a big deal IMHO.
And not a big deal at all if the helpful error message that the original question suggests was included in the solution.

*Again can I stress that we separate the critical bug (this 2.6 check) from other nice to haves*


was (Author: johnlon):
Hi

I disagree with that approach, as it significantly impacts time to market and the effort of fixing the bug.

The existing impl is an attempt at "fail early" and is a *nice to have feature*; however, this implementation needlessly disables the product on 2.6 and is therefore a *major bug* for us users.

====
Can we split the discussion into
1 remove the bug (ie the block)
2 other nice to have improvements
====

Re 1 - When removing the bug, the requestor suggested an optional addition, ie including a simple NotImplementedException in the Flink code - this seems like a reasonable *but optional* compromise to improve the quality of the error message for any unfortunates who go via a code path that attempts to use truncate() on 2.6. That approach is a practical solution that satisfies both the need for correctness and the need for helpfulness without completely blocking the use of this product for a large group of potential users, particularly the many, many slower-moving enterprises out here in the wild.

Let's not add additional barriers in the way of fixing the primary issue.

Re 2 - Your points ...

"we should fail at build time" - how is this possible - we don't specifically target hadoop 2.6 or other versions?

"pre-flight time" - again how is this possible - I've looked at this and it's pretty hard to see how that would work - I can't see a straightforward way (suggest you make a proposal on how this would work)

"same strategy" - not needed to fix the bug - and this is a separate problem that needs a separate ticket

Re "time bomb waiting to explode" - hardly a reasonable description of the issue - it's not like the first time I would run this code is in production? I'd discover the issue within an hour or so of writing my prototype or implementation - not a big deal IMHO.

And not a big deal at all if the helpful error message that the original question suggests was included in the solution.

*Again can I stress that we separate the critical bug (this 2.6 check) from other nice to haves*

> Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder
> -------------------------------------------------------------
>
>                 Key: FLINK-14170
>                 URL: https://issues.apache.org/jira/browse/FLINK-14170
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.8.0, 1.8.1, 1.8.2, 1.9.0
>            Reporter: Bhagavan
>            Priority: Major
>
> Currently, StreamingFileSink is supported only with Hadoop >= 2.7,
> irrespective of the Row/bulk format builder. This restriction exists because
> truncate is not supported in Hadoop < 2.7.
> However, BulkFormatBuilder does not use the truncate method to restore the file,
> so restricting StreamingFileSink.BulkFormatBuilder to Hadoop >= 2.7 is not
> necessary.
> The requested improvement is to remove the precondition on
> HadoopRecoverableWriter and allow BulkFormatBuilder (Parquet) to be used with
> Hadoop 2.6 (most enterprises are still on CDH 5.x).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)