The throughput on this thread is too high.

Nicholas/Suresh: do either of you disagree with the general approach
in HDFS-347? Holding an inevitable branch open causes a lot of pain
(Suresh, you endured much of that personally with security, which had
a lot of follow-on work). While it makes sense to block HDFS-347 from
2.x before all concerns are addressed, are most objections so
fundamental that the merge to trunk should be prevented (except for
HDFS-2246)? Is there related development that will be impacted? Colin
won't disappear after this is committed, so if the answers to these
questions are "no", then let's move forward.

Todd: Yes, the idea that the project only allows optimizations that
work on all platforms is idiotic, which is why nobody has suggested
it. Given that HDFS-347 is a strictly better approach, once committed,
there will be ample motivation to add support for other OSes and
remove HDFS-2246 entirely. Nobody is confused about this. There's
ample precedent for retaining obscure, clumsy features as a temporary
stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks,
configurable combiner semantics). What's the virtue of insisting on
removing this? Unless there was a lot of follow-on work, HDFS-2246
doesn't look like a lot of code... -C

On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <sur...@hortonworks.com> 
> wrote:
>>> Given that this is an optimization, and we have a ton of optimizations
>>> which don't yet run on Windows, I don't think that should be
>>> considered. Additionally, the Windows support has not yet been merged,
>>> nor is it in any release, so this isn't a regression.
>>>
>>
>> This is a critical functionality for HBase peformance and an optimization
>> we consider
>> very important to have.
>
> Too bad it doesn't work in any of the normal installations... none of
> the packages for Hadoop would allow it to work, given that the data
> directories will be owned by HDFS and not world readable, and
> tasks/HBase would run as an "hbase" user, which wouldn't have direct
> access to the block files.
>
>>>
>>> I would be happy to review an addition to the HDFS-347 branch which
>>> addresses this issue. But I don't think we should be maintaining two
>>> codepaths for the sake of an optimization on a platform which is not
>>> yet officially supported on trunk, especially when the old code path
>>> is _insecure_ and thus unusable in most environments.
>>
>>
>> I have to disagree. No where in the jira or the design it is explicitly
>> stated that
>> the old short circuit functionality is being removed. My assumption has been
>> that it will not be removed.
>
> I've tried this avenue in the past on other insecurities which were
> fixed. Sorry if you were depending on insecure behavior. The project
> should move on and not have 3+ ways of implementing the same thing.
>
>> As regards "officially supported", we have been doing
>> windows development for
>> more than a year. In fact branch-1-win is being used by a lot users. Given
>> merging it to branch-1 requires first making it available in trunk, we have
>> been doing
>> a lot of work in branch-trunk-win. It is almost ready to be merged as well.
>>
>> I am -1 on removing existing short circuit until an alternative short
>> circuit similar
>> to HDFS-347 on all the platforms.
>
> Great -- are you committed to building this equivalent feature for
> Windows, then? On what timeline? From my viewpoint, Windows isn't a
> supported platform *right now*, so vetoing based on it seems
> meritless.
>
> BTW, the posix_fadvise based readahead is an important optimization
> for many workloads, too, but from what I can tell looking at the
> Windows branch, it doesn't support it. There are other places in the
> Windows branch where performance is going to be worse - eg disabling
> the pipelined crc32c implementation will be a 2-3x hit on that code
> path.
>
> No one has voted to merge Windows support, and if merging Windows
> support means that, from now on, every _optimization_ must work on
> Windows, I don't think I will be able to vote +1. The vast majority of
> the community is _not_ running Windows, and I don't want to block
> progress on the small number of developers who know how to program
> against that platform.
>
> If that's the axe hanging over our head with the Windows branch, then
> I'm all for saying "good, keep it on a branch and don't merge it to
> trunk".
>
> I was hoping we could all work together a bit better here...
> contentious merge votes like this just cause cases where different
> distros diverge by merging different branches way ahead of the
> upstream (eg yours with windows support, ours with 347, etc)
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera

On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <sur...@hortonworks.com> 
> wrote:
>>> Given that this is an optimization, and we have a ton of optimizations
>>> which don't yet run on Windows, I don't think that should be
>>> considered. Additionally, the Windows support has not yet been merged,
>>> nor is it in any release, so this isn't a regression.
>>>
>>
>> This is a critical functionality for HBase peformance and an optimization
>> we consider
>> very important to have.
>
> Too bad it doesn't work in any of the normal installations... none of
> the packages for Hadoop would allow it to work, given that the data
> directories will be owned by HDFS and not world readable, and
> tasks/HBase would run as an "hbase" user, which wouldn't have direct
> access to the block files.
>
>>>
>>> I would be happy to review an addition to the HDFS-347 branch which
>>> addresses this issue. But I don't think we should be maintaining two
>>> codepaths for the sake of an optimization on a platform which is not
>>> yet officially supported on trunk, especially when the old code path
>>> is _insecure_ and thus unusable in most environments.
>>
>>
>> I have to disagree. No where in the jira or the design it is explicitly
>> stated that
>> the old short circuit functionality is being removed. My assumption has been
>> that it will not be removed.
>
> I've tried this avenue in the past on other insecurities which were
> fixed. Sorry if you were depending on insecure behavior. The project
> should move on and not have 3+ ways of implementing the same thing.
>
>> As regards "officially supported", we have been doing
>> windows development for
>> more than a year. In fact branch-1-win is being used by a lot users. Given
>> merging it to branch-1 requires first making it available in trunk, we have
>> been doing
>> a lot of work in branch-trunk-win. It is almost ready to be merged as well.
>>
>> I am -1 on removing existing short circuit until an alternative short
>> circuit similar
>> to HDFS-347 on all the platforms.
>
> Great -- are you committed to building this equivalent feature for
> Windows, then? On what timeline? From my viewpoint, Windows isn't a
> supported platform *right now*, so vetoing based on it seems
> meritless.
>
> BTW, the posix_fadvise based readahead is an important optimization
> for many workloads, too, but from what I can tell looking at the
> Windows branch, it doesn't support it. There are other places in the
> Windows branch where performance is going to be worse - eg disabling
> the pipelined crc32c implementation will be a 2-3x hit on that code
> path.
>
> No one has voted to merge Windows support, and if merging Windows
> support means that, from now on, every _optimization_ must work on
> Windows, I don't think I will be able to vote +1. The vast majority of
> the community is _not_ running Windows, and I don't want to block
> progress on the small number of developers who know how to program
> against that platform.
>
> If that's the axe hanging over our head with the Windows branch, then
> I'm all for saying "good, keep it on a branch and don't merge it to
> trunk".
>
> I was hoping we could all work together a bit better here...
> contentious merge votes like this just cause cases where different
> distros diverge by merging different branches way ahead of the
> upstream (eg yours with windows support, ours with 347, etc)
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera

Reply via email to