Hi,

I'm currently investigating whether an Embedded Agent with a file-based channel is suitable for our use case. It would write to another agent, also with a file-based channel, and ultimately into an HDFS sequence file.

I've done some early testing in a local environment (a VM acting as a small Hadoop setup, with a Flume agent running on it) and found that the file-based channel is very slow compared to the memory channel.

We're currently writing around 150,000 - 200,000 messages/sec (each message ranging from a few hundred bytes up to 6KB), which we achieve by writing directly to a Sequence File using Hadoop's File System API. However, I've read that the best we could hope for on a single channel is around 8,000 events/sec, with each event being around 2KB. I believe this was achieved by having the checkpoint file on one disk and using other disks for the data directories.

Is this the best performance we can get on a single machine with a file-based channel in an Embedded Agent?
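To put rough numbers on the gap, here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above; the function names are illustrative, and it assumes (probably optimistically) that throughput would scale linearly if the load were spread across several file channels. It also compares by event count only, ignoring the difference in message sizes.

```python
# Figures taken from the message above.
FILE_CHANNEL_EVENTS_PER_SEC = 8_000      # quoted best case for a single file channel
FILE_CHANNEL_EVENT_BYTES = 2 * 1024      # ~2KB per event in that quoted figure
CURRENT_MSGS_PER_SEC = (150_000, 200_000)  # current direct-to-SequenceFile rate


def channel_bytes_per_sec(events_per_sec, event_bytes):
    """Byte throughput implied by an event rate and an average event size."""
    return events_per_sec * event_bytes


def channels_needed(target_events_per_sec, per_channel_events_per_sec):
    """Channels required to reach the target rate, ASSUMING linear scaling."""
    # Integer ceiling division, avoids floating-point rounding.
    return -(-target_events_per_sec // per_channel_events_per_sec)


if __name__ == "__main__":
    rate = channel_bytes_per_sec(FILE_CHANNEL_EVENTS_PER_SEC, FILE_CHANNEL_EVENT_BYTES)
    print(f"single file channel: {rate / (1024 * 1024):.1f} MiB/s")
    for target in CURRENT_MSGS_PER_SEC:
        n = channels_needed(target, FILE_CHANNEL_EVENTS_PER_SEC)
        print(f"{target} msgs/sec would need about {n} channels at that rate")
```

By this estimate the quoted single-channel figure is roughly 19 to 25 times lower than our current message rate, which is why I'm asking whether it really is the ceiling.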
Thanks and kind regards,
Adam