What are you trying to accomplish here? You’d really need more nodes in 
your HDFS cluster to test anything realistic. I am not sure what happens when 
the cluster is really just one datanode; I am not sure the replication code 
within HDFS handles that case.
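
If you do want to keep experimenting with a single datanode for now, one thing 
worth double-checking (just a guess on my part) is that the replication factor 
is set to 1 on each cluster and on the Flume client, so the namenode never 
tries to place a second replica. Something like this in hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

You can confirm what a node thinks the effective value is with:

hdfs getconf -confKey dfs.replication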


Thanks,
Hari

On Wed, Oct 1, 2014 at 6:04 AM, Ed Judge <ejud...@gmail.com> wrote:

> Looks like they are up.  I see the following on one of the nodes, but both 
> look generally the same (each reports 1 live datanode).
> [hadoop@localhost bin]$ hdfs dfsadmin -report
> 14/10/01 12:51:56 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Configured Capacity: 40797364224 (38.00 GB)
> Present Capacity: 37030862848 (34.49 GB)
> DFS Remaining: 37030830080 (34.49 GB)
> DFS Used: 32768 (32 KB)
> DFS Used%: 0.00%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
> Live datanodes:
> Name: 127.0.0.1:50010 (localhost)
> Hostname: localhost
> Decommission Status : Normal
> Configured Capacity: 40797364224 (38.00 GB)
> DFS Used: 32768 (32 KB)
> Non DFS Used: 3766501376 (3.51 GB)
> DFS Remaining: 37030830080 (34.49 GB)
> DFS Used%: 0.00%
> DFS Remaining%: 90.77%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Last contact: Wed Oct 01 12:51:57 UTC 2014
> I don’t know how to demonstrate that they are accessible except by telnetting 
> into each of them.  Right now that test shows that both nodes accept 
> connections on port 50010.
> Is there some other test I can perform?
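> One thing I could try (not sure whether this is the right approach) is to 
> write a small file into the remote HDFS directly from the Flume node with the 
> hdfs CLI, bypassing Flume entirely, for example:
> echo hello > /tmp/hello.txt
> hdfs dfs -put /tmp/hello.txt hdfs://10.0.0.16:9000/tmp/
> hdfs dfs -ls hdfs://10.0.0.16:9000/tmp/
> If that fails in the same way, it would at least rule Flume out.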
> Thanks,
> -Ed
> On Oct 1, 2014, at 12:31 AM, Hari Shreedharan <hshreedha...@cloudera.com> 
> wrote:
>> Looks like one datanode is inaccessible or down, so the HDFS client has 
>> blacklisted it and the writes fail when blocks are allocated to it.
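>> 
>> One quick check (assuming the datanodes use the default data-transfer port) 
>> is whether the datanode itself, and not just the namenode, is reachable from 
>> the machine running Flume, since the client writes blocks directly to the 
>> datanodes, e.g.:
>> 
>> telnet <datanode-ip> 50010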
>> 
>> Thanks,
>> Hari
>> 
>> 
>> On Tue, Sep 30, 2014 at 7:33 PM, Ed Judge <ejud...@gmail.com> wrote:
>> 
>> I’ve pulled over all of the Hadoop jar files for my flume instance to use.  
>> I am seeing some slightly different errors now.  Basically I have 2 
>> identically configured hadoop instances on the same subnet.  Running flume 
>> on those same instances and pointing flume at the local hadoop/hdfs instance 
>> works fine and the files get written.  However, when I point it to the 
>> adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and 
>> the files never get written.  Here is my HDFS sink configuration on 
>> 10.0.0.14:
>> 
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
>> a1.sinks.k1.hdfs.filePrefix = twitter
>> a1.sinks.k1.hdfs.fileSuffix = .ds
>> a1.sinks.k1.hdfs.rollInterval = 0
>> a1.sinks.k1.hdfs.rollSize = 10
>> a1.sinks.k1.hdfs.rollCount = 0
>> a1.sinks.k1.hdfs.fileType = DataStream
>> #a1.sinks.k1.serializer = TEXT
>> a1.sinks.k1.channel = c1
>> 
>> Any idea why this is not working?
>> 
>> Thanks.
>> 
>> 01 Oct 2014 01:59:45,098 INFO  
>> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
>> (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = 
>> TEXT, UseRawLocalFileSystem = false
>> 01 Oct 2014 01:59:45,385 INFO  
>> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
>> (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating 
>> hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
>> 01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] 
>> (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 
>> 100 docs
>> 01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] 
>> (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 
>> 200 docs
>> 01 Oct 2014 01:59:49,379 INFO  [Thread-7] 
>> (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)
>>   - Exception in createBlockOutputStream
>> java.io.EOFException: Premature EOF: no length prefix available
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,390 INFO  [Thread-7] 
>> (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)
>>   - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
>> 01 Oct 2014 01:59:49,398 INFO  [Thread-7] 
>> (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)
>>   - Excluding datanode 127.0.0.1:50010
>> 01 Oct 2014 01:59:49,431 WARN  [Thread-7] 
>> (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - 
>> DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
>> /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes 
>> instead of minReplication (=1).  There are 1 datanode(s) running and 1 
>> node(s) are excluded in this operation.
>>      at 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>      at 
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>      at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> 
>>      at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>>      at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>>      at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>      at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>      at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>      at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>      at java.lang.reflect.Method.invoke(Method.java:606)
>>      at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>>      at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>>      at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] 
>> (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while 
>> syncing
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
>> /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes 
>> instead of minReplication (=1).  There are 1 datanode(s) running and 1 
>> node(s) are excluded in this operation.
>>      at 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>      at 
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>      at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> 
>>      at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>>      at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>>      at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>      at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>      at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>      at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>      at java.lang.reflect.Method.invoke(Method.java:606)
>>      at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>>      at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>>      at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,439 WARN  
>> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
>> /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes 
>> instead of minReplication (=1).  There are 1 datanode(s) running and 1 
>> node(s) are excluded in this operation.
>> 
>> On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hshreedha...@cloudera.com> 
>> wrote:
>> 
>>> You'd need to add the jars that Hadoop itself depends on. Flume pulls them in 
>>> if Hadoop is installed on that machine; otherwise you'd need to download and 
>>> install them manually. If you are using Hadoop 2.x, install the RPM 
>>> provided by Bigtop.
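>>> 
>>> If Hadoop ends up installed on the Flume box anyway, one option (just a 
>>> sketch; paths depend on your install) is to put the whole Hadoop classpath 
>>> on Flume's classpath through conf/flume-env.sh instead of copying jars one 
>>> by one:
>>> 
>>> # in $FLUME_HOME/conf/flume-env.sh (FLUME_HOME is wherever you unpacked Flume)
>>> export FLUME_CLASSPATH="$(hadoop classpath)"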
>>> 
>>> On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ejud...@gmail.com> wrote:
>>> I added commons-configuration and there is now another missing dependency.  
>>> What do you mean by “all of Hadoop’s dependencies”?
>>> 
>>> 
>>> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hshreedha...@cloudera.com> 
>>> wrote:
>>> 
>>>> You actually need to add all of Hadoop’s dependencies to the Flume classpath. 
>>>> Looks like Apache Commons Configuration is missing from the classpath.
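>>>> 
>>>> For example (assuming a stock Hadoop 2.x layout, with HADOOP_HOME and 
>>>> FLUME_HOME pointing at your installs), that jar normally sits under the 
>>>> Hadoop common lib directory and can be copied into Flume's lib directory:
>>>> 
>>>> cp $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-*.jar $FLUME_HOME/lib/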
>>>> 
>>>> Thanks,
>>>> Hari
>>>> 
>>>> 
>>>> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ejud...@gmail.com> wrote:
>>>> 
>>>> Thank you.  I am using Hadoop 2.5, which I think uses 
>>>> protobuf-java-2.5.0.jar.
>>>> 
>>>> I am getting the following error even after adding those 2 jar files to my 
>>>> flume-ng classpath:
>>>> 
>>>> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] 
>>>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)
>>>>   - Configuration provider starting
>>>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)
>>>>   - Reloading configuration file:./src.conf
>>>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)
>>>>   - Added sinks: k1 Agent: a1
>>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration 
>>>> property ignored: i# = Describe the sink
>>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>>>>   - Processing:k1
>>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - 
>>>> Post-validation flume configuration contains configuration for agents: [a1]
>>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - 
>>>> Creating channels
>>>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating 
>>>> instance of channel c1 type memory
>>>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - 
>>>> Created channel c1
>>>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating 
>>>> instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer 
>>>> Key:        'tobhMtidckJoe1tByXDmI4pW3'
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer 
>>>> Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access 
>>>> Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>>>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access 
>>>> Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>>>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] 
>>>> (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance 
>>>> of sink: k1, type: hdfs
>>>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] 
>>>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)
>>>>   - Failed to start agent because dependencies were not found in 
>>>> classpath. Error follows.
>>>> java.lang.NoClassDefFoundError: 
>>>> org/apache/commons/configuration/Configuration
>>>>    at 
>>>> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>>>>    at 
>>>> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>>>>    at 
>>>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>>>>    at 
>>>> org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>>>>    at 
>>>> org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>>>>    at 
>>>> org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>>>>    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>>>>    at 
>>>> org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>>>>    at 
>>>> org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>>>>    at 
>>>> org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>>>>    at 
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>>    at 
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>    at 
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>    at 
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>    at 
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>    at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.lang.ClassNotFoundException: 
>>>> org.apache.commons.configuration.Configuration
>>>>    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>    ... 17 more
>>>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] 
>>>> (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping 
>>>> lifecycle supervisor 10
>>>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] 
>>>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83) 
>>>>  - Configuration provider stopping
>>>> [vagrant@localhost 6]$ 
>>>> 
>>>> Is there another jar file I need?
>>>> 
>>>> Thanks.
>>>> 
>>>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <shengyi....@gmail.com> wrote:
>>>> 
>>>>> You need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your 
>>>>> flume-ng classpath, and the version of those Hadoop jars must match your 
>>>>> Hadoop system.
>>>>>  
>>>>> If you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" 
>>>>> (by default, flume-1.5.0 ships "protobuf-java-2.5.0.jar"; the jar file is 
>>>>> under the flume lib directory), because the protobuf interface of hdfs-2.0 
>>>>> is compiled with protobuf-2.4, and with protobuf-2.5 the flume-ng agent 
>>>>> will fail to start.
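>>>>>  
>>>>> For example (a rough sketch; adjust the paths to your install), you could 
>>>>> swap the jar in the flume lib directory and restart the agent:
>>>>>  
>>>>> cd $FLUME_HOME/lib
>>>>> mv protobuf-java-2.5.0.jar protobuf-java-2.5.0.jar.bak
>>>>> cp /path/to/protobuf-java-2.4.1.jar .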
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> 2014-09-30
>>>>> shengyi.pan
>>>>> From: Ed Judge <ejud...@gmail.com>
>>>>> Sent: 2014-09-29 22:38
>>>>> Subject: HDFS sink to a remote HDFS node
>>>>> To: "user@flume.apache.org" <user@flume.apache.org>
>>>>> Cc:
>>>>>  
>>>>> I am trying to run the flume-ng agent on one node with an HDFS sink 
>>>>> pointing to an HDFS filesystem on another node.
>>>>> Is this possible?  What packages/jar files are needed on the flume agent 
>>>>> node for this to work?  A secondary goal is to install only what is needed 
>>>>> on the flume-ng node.
>>>>> 
>>>>> # Describe the sink
>>>>> a1.sinks.k1.type = hdfs
>>>>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Ed
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
