Hi,

I'm investigating whether an Embedded Agent with a file-based channel is 
suitable for writing to a second agent, also with a file-based channel, which 
ultimately writes to an HDFS sequence file. I've done some early testing in a 
local environment (a VM acting as a small Hadoop setup with a Flume agent 
running on it) and found that the file-based channel is very slow compared to 
the memory channel. We currently write around 150,000 - 200,000 messages/sec 
(each message ranging from a few hundred bytes up to 6KB) by writing directly 
to a Sequence File using Hadoop's FileSystem API. However, I've read that the 
best we could hope for on a single channel is around 8,000 events/sec, with 
each event being around 2KB. I believe this was achieved by putting the 
checkpoint file on one disk and the data directories on other disks. Is this 
the best performance we can get on a single machine with a file-based channel 
in an Embedded Agent?
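
For reference, here is a minimal sketch of the kind of Embedded Agent 
configuration in question, with the checkpoint directory on one disk and the 
data directories on others. The paths, hostname, port, and the choice of 
load_balance as the sink processor are placeholders I have assumed for 
illustration, not our real values. The commented-out lines at the end show how 
the map would be handed to the agent and assume the flume-ng-embedded-agent 
jar is on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FileChannelConfigSketch {

    // Builds the property map for an Embedded Agent that uses a file channel.
    // All paths, the hostname, and the port are hypothetical placeholders.
    static Map<String, String> buildConfig() {
        Map<String, String> props = new LinkedHashMap<>();

        // File channel: checkpoint on one disk, data directories on two others.
        props.put("channel.type", "file");
        props.put("channel.checkpointDir", "/disk1/flume/checkpoint");
        props.put("channel.dataDirs", "/disk2/flume/data,/disk3/flume/data");

        // The Embedded Agent forwards events to the downstream agent over Avro.
        props.put("sinks", "sink1");
        props.put("sink1.type", "avro");
        props.put("sink1.hostname", "collector.example.com"); // assumed host
        props.put("sink1.port", "4141");                      // assumed port
        props.put("processor.type", "load_balance");          // assumed choice

        return props;
    }

    public static void main(String[] args) {
        // Print the configuration so it can be checked by eye.
        buildConfig().forEach((k, v) -> System.out.println(k + " = " + v));

        // With Flume on the classpath, the map would be used roughly like this:
        // EmbeddedAgent agent = new EmbeddedAgent("embedded");
        // agent.configure(buildConfig());
        // agent.start();
    }
}
```

Listing more than one entry in channel.dataDirs, each on a separate physical 
disk, is the layout I understand the 8,000 events/sec figure to be based on.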

Thanks and kind regards,

Adam