Thanks Gwen. Regarding your comment in #3 ("if one collector is down,
the client can connect to another"): how does that relate to the
two-tier architecture? And what do "client" and "collector" refer to
in this case?

regards,
Lin

On Sun, Mar 8, 2015 at 10:42 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> There are several benefits to the two-tier architecture:
>
> 1. Limit the number of processes writing to HDFS. As you correctly
> mentioned, there are some limitations there.
> 2. Enable us to create larger files faster. (We want to switch files
> on HDFS fast to allow querying new data faster, but we also don't
> want a gazillion small files.)
> 3. A two-tier architecture can support high availability and load
> balancing - if one collector is down, the client can connect to
> another.
>
> Gwen
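(For concreteness, a minimal sketch of what the failover in point 3
could look like on the client tier, assuming Avro RPC between the two
tiers. All agent names, hostnames, ports, and paths below are
illustrative, not taken from the thread.)

# Client-tier agent: reads local logs and forwards them to one of two
# collector-tier agents over Avro RPC.
client1.sources = src1
client1.channels = ch1
client1.sinks = sink1 sink2
client1.sinkgroups = g1

client1.channels.ch1.type = file

client1.sources.src1.type = spooldir
client1.sources.src1.spoolDir = /var/log/myapp/spool
client1.sources.src1.channels = ch1

# One Avro sink per collector host.
client1.sinks.sink1.type = avro
client1.sinks.sink1.hostname = collector1.example.com
client1.sinks.sink1.port = 4141
client1.sinks.sink1.channel = ch1

client1.sinks.sink2.type = avro
client1.sinks.sink2.hostname = collector2.example.com
client1.sinks.sink2.port = 4141
client1.sinks.sink2.channel = ch1

# Failover sink processor: events go to the higher-priority sink; if
# that collector is down, the other one takes over. Setting
# processor.type = load_balance would spread load across both instead.
client1.sinkgroups.g1.sinks = sink1 sink2
client1.sinkgroups.g1.processor.type = failover
client1.sinkgroups.g1.processor.priority.sink1 = 10
client1.sinkgroups.g1.processor.priority.sink2 = 5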
> On Sun, Mar 8, 2015 at 10:30 PM, Lin Ma <lin...@gmail.com> wrote:
> > Thanks Gwen,
> >
> > Is the purpose of the two-tier Flume architecture to reduce the
> > number of processes writing to HDFS? I remember that if too many
> > processes write to HDFS, the NameNode will have issues.
> >
> > regards,
> > Lin
> >
> > On Sun, Mar 8, 2015 at 8:26 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >>
> >> As stated in the docs, you'll need to have the timestamp in the
> >> event header for HDFS to automatically place the events in the
> >> correct directory. This can be done using the timestamp
> >> interceptor.
> >>
> >> You can see an example here:
> >>
> >> https://github.com/hadooparchitecturebook/hadoop-arch-book/tree/master/ch09-clickstream/Flume
> >>
> >> This example uses a 2-tier architecture (i.e. one Flume agent
> >> collecting logs from the web servers and the other writing to
> >> HDFS). However, you can see how in client.conf the
> >> spooling-directory source is configured with the timestamp
> >> interceptor, and how in collector.conf the HDFS sink has a
> >> parameterized target directory with the timestamp in it.
> >>
> >> Gwen
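(The pattern Gwen points to boils down to two pieces of configuration,
sketched below. Agent names and paths are illustrative; the property
names come from the Flume user guide.)

# First tier ("client.conf"): stamp each event with the current time
# as it is read, via the timestamp interceptor.
client1.sources.src1.type = spooldir
client1.sources.src1.spoolDir = /var/log/myapp/spool
client1.sources.src1.channels = ch1
client1.sources.src1.interceptors = ts
client1.sources.src1.interceptors.ts.type = timestamp

# Second tier ("collector.conf"): the HDFS sink expands the
# %Y/%m/%d/%H escape sequences from each event's "timestamp" header,
# creating new date/hour directories as time moves on -- nothing needs
# to be pre-created.
collector1.sinks.hdfs1.type = hdfs
collector1.sinks.hdfs1.channel = ch1
collector1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d/%H
collector1.sinks.hdfs1.hdfs.fileType = DataStream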
> >>
> >> On Sun, Mar 8, 2015 at 7:56 PM, Lin Ma <lin...@gmail.com> wrote:
> >> > Thanks Ashish,
> >> >
> >> > One further question on the HDFS sink. If I configure the
> >> > destination directory on HDFS with a Year/Month/Day/Hour
> >> > pattern, will Flume automatically put each event it receives
> >> > into the related directory and create new directories as time
> >> > passes? Or do I have to set some key/value headers on the event
> >> > for the HDFS sink to recognize the event time and put it into
> >> > the appropriate time-based folder?
> >> >
> >> > regards,
> >> > Lin
> >> >
> >> > On Sun, Mar 8, 2015 at 6:32 PM, Ashish <paliwalash...@gmail.com> wrote:
> >> >>
> >> >> Your understanding is correct :)
> >> >>
> >> >> On Mon, Mar 9, 2015 at 6:54 AM, Lin Ma <lin...@gmail.com> wrote:
> >> >> > Thanks Ashish,
> >> >> >
> >> >> > I followed your guidance and found the instructions below,
> >> >> > about which I have further questions to confirm with you. It
> >> >> > seems we need to close the files and never touch them again
> >> >> > for Flume to process them correctly. So is it good practice
> >> >> > to (1) let the application write log files the existing way,
> >> >> > e.g. in an hourly or 5-minute rotation pattern, and (2) close
> >> >> > and move the files to another directory that serves as the
> >> >> > input for the Flume agent's spooling directory source?
> >> >> >
> >> >> > "This source will watch the specified directory for new
> >> >> > files, and will parse events out of new files as they
> >> >> > appear."
> >> >> >
> >> >> > "If a file is written to after being placed into the spooling
> >> >> > directory, Flume will print an error to its log file and stop
> >> >> > processing.
> >> >> > If a file name is reused at a later time, Flume will print an
> >> >> > error to its log file and stop processing."
> >> >> >
> >> >> > regards,
> >> >> > Lin
> >> >> >
> >> >> > On Sun, Mar 8, 2015 at 12:23 AM, Ashish <paliwalash...@gmail.com> wrote:
> >> >> >>
> >> >> >> Please look at the following:
> >> >> >> Spooling Directory Source
> >> >> >> (http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source)
> >> >> >> and
> >> >> >> HDFS Sink (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink)
> >> >> >>
> >> >> >> The Spooling Directory Source needs immutable files, meaning
> >> >> >> files should not be written to once they are being consumed.
> >> >> >> In short, your application cannot write to the file being
> >> >> >> read by Flume.
> >> >> >>
> >> >> >> The log format is not an issue, as long as you don't want it
> >> >> >> to be interpreted by Flume components. Since it's a log, I'm
> >> >> >> assuming a single log entry per line, with a line separator
> >> >> >> at the end of each line.
> >> >> >>
> >> >> >> You can also look at the Exec source
> >> >> >> (http://flume.apache.org/FlumeUserGuide.html#exec-source)
> >> >> >> for tailing a file being written by the application. The
> >> >> >> documentation covers the details on all the links.
> >> >> >>
> >> >> >> HTH !
> >> >> >>
> >> >> >> On Sun, Mar 8, 2015 at 12:32 PM, Lin Ma <lin...@gmail.com> wrote:
> >> >> >> > Hi Flume masters,
> >> >> >> >
> >> >> >> > I want to install Flume on a box, consume a local log file
> >> >> >> > as the source, and send to a remote HDFS sink. The log
> >> >> >> > format is private, plain text (not Avro or JSON).
> >> >> >> >
> >> >> >> > I am reading the Flume guide and its many advanced source
> >> >> >> > configurations. Are there any reference samples for a
> >> >> >> > plain local log file source? Also, I'm not sure whether
> >> >> >> > Flume can consume the local file while the application is
> >> >> >> > still writing to it. Thanks.
> >> >> >> >
> >> >> >> > regards,
> >> >> >> > Lin
> >> >> >>
> >> >> >> --
> >> >> >> thanks
> >> >> >> ashish
> >> >> >>
> >> >> >> Blog: http://www.ashishpaliwal.com/blog
> >> >> >> My Photo Galleries: http://www.pbase.com/ashishpaliwal
> >> >>
> >> >> --
> >> >> thanks
> >> >> ashish
> >> >>
> >> >> Blog: http://www.ashishpaliwal.com/blog
> >> >> My Photo Galleries: http://www.pbase.com/ashishpaliwal
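(To make Ashish's advice concrete: a minimal sketch of a spooling
directory source, with the exec-source alternative shown as comments.
Agent names and paths are illustrative.)

# Spooling directory source: the application writes and rotates logs
# elsewhere; only *closed* files are moved into the spool directory,
# since files there must never be modified or have their names reused.
agent1.sources = spool1
agent1.channels = ch1
agent1.channels.ch1.type = file
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/log/myapp/spool
agent1.sources.spool1.channels = ch1
# Fully ingested files are renamed with this suffix (the default).
agent1.sources.spool1.fileSuffix = .COMPLETED

# Alternative: tail the live log file with an exec source. Simpler to
# set up, but the user guide warns it offers no delivery guarantees
# if the agent or the tail process dies.
# agent1.sources.tail1.type = exec
# agent1.sources.tail1.command = tail -F /var/log/myapp/app.log
# agent1.sources.tail1.channels = ch1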