I filed HDFS-6200 to demonstrate the feasibility of the approach.
~Haohui
On Fri, Apr 4, 2014 at 11:46 AM, Haohui Mai wrote:
I agree with Nicholas, Steve and Alejandro that it might require some
nontrivial effort to achieve the goal. Here is my high-level plan:
1. Create a new hdfs-client package, and gradually move classes from hdfs
to hdfs-client. Fortunately IDEs like Eclipse and IntelliJ can do most of
the heavy lifting.
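As a very rough sketch (module name, parent version and the dependency list
below are only placeholders to show the shape, not a proposal), the new
module's POM could start out as small as:

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <parent>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-project</artifactId>
        <version>3.0.0-SNAPSHOT</version> <!-- placeholder -->
      </parent>
      <artifactId>hadoop-hdfs-client</artifactId>
      <description>Client-side classes of HDFS</description>
      <dependencies>
        <!-- only what the client code paths actually need -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
        </dependency>
        <dependency>
          <groupId>com.google.protobuf</groupId>
          <artifactId>protobuf-java</artifactId>
        </dependency>
      </dependencies>
    </project>

The point is that the server-side dependencies stay out of this POM by
construction, instead of being filtered out after the fact.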
Tuning the POM only mitigates the problem. The problem of one HDFS jar is
that you can't rule out all the unnecessary dependencies. For example,
NamenodeWebHdfsMethods depends on jersey-server and servlet. The Apache
Falcon project has clients for HDFS, Hive, Pig, and Oozie, so it ends up
pulling in these server-side dependencies.
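To make it concrete, a client-only consumer today has to do something like
this in its own POM (the version and the exclusion list are just examples,
not exhaustive):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.4.0</version>
      <exclusions>
        <!-- server-side pieces a pure client never calls -->
        <exclusion>
          <groupId>com.sun.jersey</groupId>
          <artifactId>jersey-server</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>servlet-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

And even with the exclusions, the server classes themselves are still in the
jar and on the classpath.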
Haohui's suggestion of an hdfs-client JAR with client dependencies only
would be, IMO, the 'correct' way of doing things; we should have hdfs-server
and hdfs-client JARs.
Doing this in practice is not trivial as classes are not properly
segregated. So, Steven's suggestion of an hdfs-client seems
To follow up with an example, here's a JIRA on updating dependencies and
tuning the POMs:
https://issues.apache.org/jira/browse/HADOOP-9991
and here's a JIRA on dropping ZK from the hadoop-client POM:
https://issues.apache.org/jira/browse/HADOOP-9905
And there's an mr-client POM where we've been slowly cutting back the
dependencies.
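The change itself is typically only a few lines in the hadoop-client POM,
something like the following (coordinates from memory, the actual patches on
the JIRAs are authoritative):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <exclusions>
        <!-- client-side code doesn't talk to ZooKeeper directly -->
        <exclusion>
          <groupId>org.apache.zookeeper</groupId>
          <artifactId>zookeeper</artifactId>
        </exclusion>
      </exclusions>
    </dependency>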
It's not an issue with the hdfs/hadoop JARs themselves, but the POMs - and
the same problem exists with the hadoop core JAR - too much stuff you don't
need client side.
We can address this without changing the packaging into an hdfs-client.jar
(and so complicating everything related to the HDFS code).
All we need to do is tune the POMs.
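For example, if we mark the server-only third-party dependencies as optional
in the hadoop-hdfs POM, they stop propagating to downstream builds (a
sketch, not an actual patch):

    <!-- in hadoop-hdfs/pom.xml -->
    <dependency>
      <groupId>com.sun.jersey</groupId>
      <artifactId>jersey-server</artifactId>
      <optional>true</optional>
    </dependency>

Optional dependencies are not pulled in transitively, so client-side users
stop seeing them without us having to split the jar.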
On 3 April 2014 00:02, Haohui Mai wrote:
The rpc and the web client can stay in one jar for the first cut. Indeed it
might introduce some extra dependencies, but the downstream projects always
have the option to implement the webhdfs protocol themselves if they really
need to avoid the dependency.
Hadoop common is a bigger problem.
It is a very good idea although it might not be easy to do. One aspect to
consider: do we need separate jars for the rpc client and the web client?
Now, suppose we could successfully separate the HDFS Client jar(s) from
HDFS. However, HDFS Client uses Common as a library. We have to separate
Common as well.