The issue seems to be that it is already the next day when the event
arrives at the agent. You can either move the timestamp interceptor to the
first Flume agent in the pipeline, which narrows the time window in which
this can occur, or insert a timestamp header when you create the event (the
heade
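For illustration, a timestamp interceptor on the first agent's source could be configured roughly as follows. This is a minimal sketch, not the poster's actual setup: the agent name `a1`, source name `r1`, and interceptor name `ts` are hypothetical placeholders.

```properties
# Hypothetical names: agent a1, source r1, interceptor ts.
a1.sources.r1.interceptors = ts
# Stamp each event with a "timestamp" header (milliseconds since epoch)
# at the point where it enters this agent.
a1.sources.r1.interceptors.ts.type = timestamp
# Keep a timestamp header that the event creator already set,
# instead of overwriting it with the agent's arrival time.
a1.sources.r1.interceptors.ts.preserveExisting = true
```

With `preserveExisting = true`, an event that already carries a timestamp header from the application keeps it, so the HDFS sink buckets by creation time rather than by arrival time.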
Hi Hari,
Below is the config for one of our source-channel-sink combos. In the
Hadoop/Spark world, how do you then handle the events that arrive late to
the bucket? That is, events for July 15 UTC end up in the July 16 bucket.
The ugly way I have been handling this to date is that for any query for
Can you send your config? There are a couple of params that allow the files
to be rolled faster: idleTimeout and rollInterval. I am assuming you are
using rollInterval already. idleTimeout will close a file when it is not
written to for the configured time. That might help with the rolling.
Rememb
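For reference, both parameters are set on the HDFS sink. The fragment below is only a sketch under assumptions: the agent name `a1`, sink name `k1`, the path, and the numeric values are all illustrative and not taken from this thread.

```properties
# Hypothetical names: agent a1, sink k1. Values are illustrative only.
a1.sinks.k1.type = hdfs
# Date escapes are resolved from the event's timestamp header.
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
# Roll the current file after this many seconds (0 disables time-based rolling).
a1.sinks.k1.hdfs.rollInterval = 3600
# Close a file that has received no writes for this many seconds (0 disables).
a1.sinks.k1.hdfs.idleTimeout = 60
```

A short idleTimeout closes the previous bucket's file soon after writes move on to the next bucket, rather than waiting for the full rollInterval to elapse.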
We are an ad tech company that buys and sells digital media. To date, we
have been using Apache Flume 1.4.x to ingest all of our bid request,
response, impression and attribution data.
The logs currently 'roll' hourly for each data type, meaning that at some
point during each hour (if Flume is be