[jira] [Comment Edited] (FLINK-14170) Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder

John Lonergan (Jira) Thu, 17 Oct 2019 04:25:25 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953637#comment-16953637
 ]


John Lonergan edited comment on FLINK-14170 at 10/17/19 11:24 AM:
------------------------------------------------------------------

Hi, I disagree with your approach, Kostas, as it impacts time to market and adds a 
huge effort to fixing this critical bug.

The existing impl is an attempt at "fail early", which is a *nice to have 
feature*. 
However, this implementation needlessly disables the product on 2.6 and is 
therefore a *major bug* for us users.

====

Can we split the discussion into: 
1. remove the bug (i.e. the block)
2. other nice-to-have improvements

====

Re 1 - 

When removing the bug, the requestor suggested an optional addition, i.e. 
including a simple NotImplementedException in the Flink code. This seems like 
a reasonable *but optional* compromise to improve the quality of the error 
message for any unfortunates who go via a code path that attempts to use 
truncate() on 2.6. That approach is a practical solution that satisfies both 
the need for correctness and helpfulness without completely blocking the use of 
this product for a large group of potential users, particularly the many 
slower-moving enterprises out here in the wild.
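
To make the compromise concrete, here is a minimal sketch of the "fail only 
when truncate() is actually needed" idea. It is not Flink or Hadoop code: the 
class and method names (LazyTruncateCheck, Writer, recoverWithoutTruncate, 
recoverWithTruncate) are hypothetical stand-ins, the version is passed in as 
two plain integers, and the JDK's UnsupportedOperationException is used in 
place of the NotImplementedException mentioned above.

```java
// Hypothetical sketch only: none of these names are Flink or Hadoop APIs.
public class LazyTruncateCheck {

    /** True if the given Hadoop version is 2.7 or later, i.e. has truncate(). */
    public static boolean supportsTruncate(int major, int minor) {
        return major > 2 || (major == 2 && minor >= 7);
    }

    /** Stand-in for a recoverable writer bound to one Hadoop version. */
    public static final class Writer {
        private final int major;
        private final int minor;

        // No version precondition here: construction succeeds on any version.
        public Writer(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** A recovery path that never truncates, so it is usable on 2.6. */
        public String recoverWithoutTruncate() {
            return "recovered";
        }

        /** A recovery path that needs truncate(): fail here, with a clear message. */
        public String recoverWithTruncate() {
            if (!supportsTruncate(major, minor)) {
                throw new UnsupportedOperationException(
                        "truncate() requires Hadoop 2.7 or later, found "
                                + major + "." + minor);
            }
            return "truncated";
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer(2, 6);
        System.out.println(writer.recoverWithoutTruncate());
        try {
            writer.recoverWithTruncate();
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a 2.6 user who never touches the truncating path is not 
blocked, and anyone who does reach it gets a clear error message instead.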

Let's not add additional barriers in the way of fixing the primary issue.

Re 2 - Your points ... 

"we should fail at build time" - how is this possible? We don't specifically 
target Hadoop 2.6 or any other version.
"pre-flight time" - again, how is this possible? I've looked at this and it's 
pretty hard to see how that would work. I can't see a straightforward way 
(suggest you make a proposal on how this would work).
"same strategy" - not needed to fix the bug; this is a separate problem that 
needs a separate ticket.


Re "time bomb waiting to explode" - hardly a reasonable description of the 
issue. It's not as if the first time I would run this code is in production. 
I'd discover the issue within an hour or so of writing my prototype or 
implementation - not a big deal IMHO. And not a big deal at all if the helpful 
error message that the original request suggests were included in the solution.

*Again, can I stress that we should separate the critical bug (this 2.6 check) 
from the other nice to haves*




> Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder
> -------------------------------------------------------------
>
>                 Key: FLINK-14170
>                 URL: https://issues.apache.org/jira/browse/FLINK-14170
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.8.0, 1.8.1, 1.8.2, 1.9.0
>            Reporter: Bhagavan
>            Priority: Major
>
> Currently, StreamingFileSink is supported only with Hadoop >= 2.7, 
> irrespective of the Row/Bulk format builder. This restriction exists because 
> truncate is not supported in Hadoop < 2.7.
> However, BulkFormatBuilder does not use the truncate method to restore the 
> file, so restricting StreamingFileSink.BulkFormatBuilder to 
> Hadoop >= 2.7 is not necessary.
> The requested improvement is to remove the precondition on 
> HadoopRecoverableWriter and allow BulkFormatBuilder (Parquet) to be used with 
> Hadoop 2.6 (most enterprises are still on CDH 5.x).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
