Re: Flume log rolling when you need to do rollups for multiple time zones

2014-07-29 Thread Hari Shreedharan
The issue seems to be that it is already the next day when the event arrives at the agent. You can either move the timestamp interceptor to the first Flume agent in the pipeline, which reduces the time window in which this can occur, or insert a timestamp header when you create the event (the heade
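
To make the midnight-boundary effect concrete, here is a minimal sketch in plain Python (it does not use the Flume API). It assumes the event carries a "timestamp" header holding milliseconds since the epoch, which is the header the timestamp interceptor sets; the helper names bucket_for and make_event and the sample times are hypothetical.

```python
from datetime import datetime, timezone


def bucket_for(epoch_millis):
    """Return the UTC day bucket (YYYY-MM-DD) for a millisecond epoch timestamp."""
    moment = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def make_event(body, created_millis):
    """Build a hypothetical event, stamping the timestamp header at creation time."""
    return {"headers": {"timestamp": str(created_millis)}, "body": body}


# An event created one second before midnight UTC on 2014-07-15 ...
created = int(datetime(2014, 7, 15, 23, 59, 59, tzinfo=timezone.utc).timestamp() * 1000)
# ... that reaches the agent two seconds later, already on 2014-07-16.
arrived = created + 2000

event = make_event(b"bid-request", created)

# Bucketing by the header set at creation keeps the event in its own day.
by_creation = bucket_for(int(event["headers"]["timestamp"]))
# Bucketing by arrival time at the agent pushes it into the next day.
by_arrival = bucket_for(arrived)
```

The closer to the point of creation the timestamp is taken, the smaller the window in which the two results can differ.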

Re: Flume log rolling when you need to do rollups for multiple time zones

2014-07-29 Thread Gary Malouf
Hi Hari, below is the config for one of our source-channel-sink combos. In the Hadoop/Spark world, how do you then handle the events that arrive late to the bucket? That is, events for July 15 UTC end up in the July 16 bucket. The ugly way I have been handling this to date is that for any query for
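
One possible way to cope with late events at query time, sketched here as an assumption and not as the approach described in the message above: scan the target day's bucket plus the following bucket, then filter on each event's own timestamp. The bucket layout (a dict of day name to event list), the "ts" field, and the one-day lateness bound are all hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def buckets_to_scan(day, lateness_days=1):
    """Buckets that may hold events for `day`: the day itself plus later buckets."""
    return [(day + timedelta(days=i)).isoformat() for i in range(lateness_days + 1)]


def events_for_day(buckets, day):
    """Keep only events whose own timestamp (epoch millis) falls on `day` in UTC."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    lo, hi = int(start.timestamp() * 1000), int(end.timestamp() * 1000)
    selected = []
    for name in buckets_to_scan(day):
        for ev in buckets.get(name, []):
            if lo <= ev["ts"] < hi:
                selected.append(ev)
    return selected


def millis(*args):
    """Epoch milliseconds for a UTC calendar time (helper for the sample data)."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)


# Sample data: event "b" belongs to July 15 but landed in the July 16 bucket.
sample = {
    "2014-07-15": [{"id": "a", "ts": millis(2014, 7, 15, 12, 0, 0)}],
    "2014-07-16": [
        {"id": "b", "ts": millis(2014, 7, 15, 23, 59, 59)},
        {"id": "c", "ts": millis(2014, 7, 16, 0, 0, 5)},
    ],
}
july_15_ids = [ev["id"] for ev in events_for_day(sample, date(2014, 7, 15))]
```

The cost of this design is reading one extra bucket per query; the lateness bound has to be chosen from how late events are actually observed to arrive.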

Re: Flume log rolling when you need to do rollups for multiple time zones

2014-07-29 Thread Hari Shreedharan
Can you send your config? There are a couple of params that allow the files to be rolled faster - idleTimeout and rollInterval. I am assuming you are using rollInterval already. idleTimeout will close a file when it is not written to for the configured time. That might help with the rolling. Rememb
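
The difference between the two parameters can be illustrated with a toy model. This is a sketch of the semantics as described above, not Flume's implementation: the class RollingFile and its fields are hypothetical, times are plain seconds, and a value of 0 is assumed to disable the corresponding check.

```python
class RollingFile:
    """Toy model of a sink file that closes on a roll interval or on idleness."""

    def __init__(self, opened_at, roll_interval, idle_timeout):
        self.opened_at = opened_at
        self.last_write = opened_at
        self.roll_interval = roll_interval  # seconds since open; 0 disables
        self.idle_timeout = idle_timeout    # seconds since last write; 0 disables

    def write(self, now):
        """Record that an event was written at time `now`."""
        self.last_write = now

    def should_close(self, now):
        """True once either the roll interval or the idle timeout has elapsed."""
        rolled = self.roll_interval > 0 and now - self.opened_at >= self.roll_interval
        idle = self.idle_timeout > 0 and now - self.last_write >= self.idle_timeout
        return rolled or idle


# Hourly roll plus a 60-second idle timeout: the file closes soon after writes stop.
with_idle = RollingFile(opened_at=0, roll_interval=3600, idle_timeout=60)
with_idle.write(10)

# Hourly roll only: the file stays open until the full hour has passed.
roll_only = RollingFile(opened_at=0, roll_interval=3600, idle_timeout=0)
roll_only.write(10)
```

With these sample values, with_idle is ready to close at 70 seconds (60 seconds after its last write), while roll_only stays open until 3600 seconds.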

Flume log rolling when you need to do rollups for multiple time zones

2014-07-29 Thread Gary Malouf
We are an ad tech company that buys and sells digital media. To date, we have been using Apache Flume 1.4.x to ingest all of our bid request, response, impression and attribution data. The logs currently 'roll' hourly for each data type, meaning that at some point during each hour (if Flume is be