Hi -- I have patched versions of some jars and would like to use those instead of the ones already on the Java classpath. When I pass them to 'flume-ng agent .. --classpath ..', I still see the old jars ahead of the patched jars. How can I override the default jars?
Thanks!
> ...a target like mvn dependency:copy-dependencies to copy the dependencies to ${FLUME_HOME}/lib/
>
> -Mike
>
>
> --
> *From:* Buntu Dev [buntu...@gmail.com]
> *Sent:* Thursday, January 15, 2015 12:57 PM
> *To:* user@flume.apache.org
> *Subject:* ...
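A minimal sketch of that suggestion, assuming the patched jars are declared as dependencies of a small Maven wrapper project (the plugin invocation is standard maven-dependency-plugin usage, not something taken from this thread):

~~~
# copy all of the project's dependency jars (including the patched ones) into Flume's lib dir
mvn dependency:copy-dependencies -DoutputDirectory=${FLUME_HOME}/lib

# then remove or rename the stock jars in ${FLUME_HOME}/lib that the patched jars replace,
# so only one version of each ends up on the agent's classpath
~~~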
Hi -- I'm ingesting data into HDFS via Flume and wanted to know if there are any built-in features to handle spam detection and rate limiting, to avoid any possible flooding of data. Please let me know.
Thanks!
We have Kafka->Flume->Kite Dataset sink configured to write to a Hive-backed dataset. One of the main requirements for us is to do some sessionization on the data and do funnel analysis. We are currently handling this by relying on Impala/Hive, but it's quite slow, and given that we want the reports to be...
Are there any known strategies to handle duplicate events during ingestion? I use Flume to ingest apache logs, parsing the request using Morphlines, and there are some duplicate requests with certain query params differing. I would like to handle these once I parse and split the query params into ...
> That would have to be done outside Flume, perhaps using something like
> Spark Streaming, or Storm.
>
> Thanks,
> Hari
>
>
> On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev wrote:
>
>> Are there any known strategies to handle duplicate events during
>> ingestion? I
I'm using the Kafka source and need to replay some events from the past 3 or 4 days. I do notice there is an "auto.offset.reset" option, but it seems to take only values like 'largest' or 'smallest'.
How do I go about setting the offset to some timestamp or a specific offset?
Thanks!
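Not a timestamp-based answer, but a sketch of the usual workaround with the 0.8-era consumer the Flume Kafka source uses: point the source at a brand-new consumer group and set auto.offset.reset, so there is no stored offset and it falls back to the oldest data Kafka still retains. Agent, source and host names below are placeholders, and this assumes the source passes kafka.-prefixed properties through to the consumer:

~~~
a1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-src.zookeeperConnect = zkhost:2181
a1.sources.kafka-src.topic = events
# a consumer group with no committed offsets, so auto.offset.reset actually applies
a1.sources.kafka-src.groupId = flume-replay-1
a1.sources.kafka-src.kafka.auto.offset.reset = smallest
~~~

As far as I know there is no per-timestamp or per-offset seek exposed through the source config; finer control means resetting the group's offsets in ZooKeeper outside of Flume.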
I'm using a Memory channel along with the Kite Dataset sink and keep running into this error:
~~~
ERROR kafka.KafkaSource: KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: tracksChannel}
        at org.apache.flu...
~~~
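That exception generally means the put failed because the channel was full (the source is outrunning the Kite sink). A hedged sketch of the usual knobs, with placeholder numbers rather than recommendations:

~~~
a1.channels.tracksChannel.type = memory
# total events the channel may hold, and the max events per transaction;
# the Kafka source's batchSize and the sink's batch size should not exceed transactionCapacity
a1.channels.tracksChannel.capacity = 100000
a1.channels.tracksChannel.transactionCapacity = 1000
~~~

If the sink simply can't keep up, a bigger channel only buys time; the other levers are a slower or smaller-batched source, or a faster (or additional) sink.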
I'm using a Memory channel with capacity set to 100k. Does this mean that when the flume agent restarts, it's possible that I lose about 100k events?
For any other durable channel, say the File channel, I noticed .tmp files getting created and written to, but when I restart the agent these .tmp files are left as-is...
I'm planning on implementing a tiered Flume setup with a master Flume agent that uses a Kafka source and then transforms the events via Morphlines to write to another Kafka channel without any sink. The Kafka channel of the master Flume agent will then be used as a source for other downstream Flume agents to ...
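A rough sketch of what that master tier could look like, assuming Flume 1.6-style Kafka source/channel properties and the Morphline interceptor from the morphline-solr-sink module; every name, host and path below is a placeholder:

~~~
# master agent: Kafka source -> Morphline interceptor -> Kafka channel, no sink
master.sources  = kafka-in
master.channels = kafka-out

master.sources.kafka-in.type = org.apache.flume.source.kafka.KafkaSource
master.sources.kafka-in.zookeeperConnect = zkhost:2181
master.sources.kafka-in.topic = raw-events
master.sources.kafka-in.channels = kafka-out
master.sources.kafka-in.interceptors = morphline
master.sources.kafka-in.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
master.sources.kafka-in.interceptors.morphline.morphlineFile = /etc/flume/conf/morphline.conf

master.channels.kafka-out.type = org.apache.flume.channel.kafka.KafkaChannel
master.channels.kafka-out.brokerList = kafkahost:9092
master.channels.kafka-out.zookeeperConnect = zkhost:2181
master.channels.kafka-out.topic = transformed-events
master.channels.kafka-out.parseAsFlumeEvent = true

# downstream agents then attach their own Kafka channel to the same topic
# (also with parseAsFlumeEvent = true) and hang their sinks off that channel
~~~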
I've got Flume configured to read Avro events from a Kafka source and I'm also attaching the schema like this:
~~~
f1.sources.kafka-source.interceptors.attach-f1-schema.type = static
f1.sources.kafka-source.interceptors.attach-f1-schema.key = flume.avro.schema.url
f1.sources.kafka-source.interceptors.attach-f1-schema.value = ...
~~~
I need to read Avro files using spooldir and here is how I've configured
the source:
~~~
f1.sources.src1.type = spooldir
f1.sources.src1.spoolDir = /path/to/avro/files
f1.sources.src1.deserializer = avro
~~~
But when I run the flume agent, I keep running into these exceptions:
~~~
org.apache.flume.Flume...
~~~
Don't mean to hijack this thread, but I have an issue along the same lines -- does hdfs.fileType need to be set to DataStream even if the source data (Kafka in my case instead of spooldir) is Avro?
On Mon, Sep 14, 2015 at 12:35 PM, Robin Jain wrote:
> Hi Darshan,
>
> Define the hdfs.fileType parameter ...
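As far as I understand, yes: DataStream just tells the HDFS sink not to wrap events in a SequenceFile, and that applies whether the Avro came from spooldir or Kafka. A hedged sketch of a sink that writes Avro container files, assuming the Avro event serializer that honors the flume.avro.schema.url/.literal headers is on the classpath (sink name and path are placeholders):

~~~
a1.sinks.hdfs-out.type = hdfs
a1.sinks.hdfs-out.hdfs.path = /data/events
a1.sinks.hdfs-out.hdfs.fileType = DataStream
a1.sinks.hdfs-out.hdfs.fileSuffix = .avro
# serializes event bodies into an Avro container file using the schema referenced
# by the flume.avro.schema.url (or flume.avro.schema.literal) header
a1.sinks.hdfs-out.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
~~~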
Currently I have a single flume agent that converts apache logs into Avro and writes to an HDFS sink. I'm looking for ways to create a tiered topology and want to have the Avro records available to other flume agents. I used a Kafka channel/sink to write these Avro records but was running into this error ...
> ...source to a topic that is used by a channel and not by a kafka sink
>
> Regards,
> Gonzalo
>
>
> On Sep 15, 2015 6:42 PM, "Buntu Dev" wrote:
>
>> Currently I have a single flume agent that converts apache logs into Avro
>> and writes to HDFS sink. I'
I have enabled JSON reporting and was able to get the metrics showing on the /metrics page. But based on the metrics reported, how do I go about generating some sort of throughput summary to benchmark the Source, Channel and Sink?
Here is a sample of the JSON metrics:
~~~
{
  "CHANNEL.my-file-channel": {
    ...
~~~
I've got a File channel with an HDFS sink. In the case where the sink slows down and event takes from the channel fall behind while the event puts continue at the same pace, how would one go about finding the amount of backlog, or the time it will take to clear the backlog?
Thanks!
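One hedged way to estimate it from the JSON counters: ChannelSize is the current backlog, and the gap between the take rate and the put rate (each obtained by diffing EventTakeSuccessCount / EventPutSuccessCount between two samples of /metrics) is the net drain. Purely illustrative numbers:

~~~
# illustrative sample, not from a real agent:
#   ChannelSize                          = 600000 events queued in the file channel
#   EventTakeSuccessCount delta over 60s = 120000  ->  take rate ~2000 events/sec
#   EventPutSuccessCount delta over 60s  =  60000  ->  put rate  ~1000 events/sec
#
# net drain     = 2000 - 1000   = 1000 events/sec
# time to clear = 600000 / 1000 =  600 sec (~10 minutes)
# (only meaningful while the take rate exceeds the put rate; otherwise the backlog grows)
~~~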
>> ...com.sun.management.jmxremote.port=12346
>>
>> or whatever port you choose.
>>
>>
>> --
>> *From:* Buntu Dev [buntu...@gmail.com]
>> *Sent:* Monday, November 02, 2015 4:33 PM
>> *To:* user@flume.apache.org
>> *Subject:* ...
> ...minutes and push the results to a TSDB (http://opentsdb.net/) database. TSDB is great
> for visualizing your data rates. Depending on your flume configuration, you will get greatly
> varying rates. If you are using spinning disks with a file channel you'll want to m...