I would implement a custom serializer and configure it in the standard HDFS sink.
That way you control how you build the key for each event.
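Something along these lines (an untested sketch; the sink name and the serializer class are placeholders, and if I remember correctly hdfs.writeFormat also accepts the fully qualified class name of a custom SequenceFileSerializer.Builder, besides the built-in "Text" and "Writable"):

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
# placeholder class; it would build the key/value from each event
agent.sinks.hdfsSink.hdfs.writeFormat = com.example.MyKeySerializer$Builder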
Regards,
Gonzalo
On 8 September 2015 at 06:42,
wrote:
>
> Hello,
>
> I'm using Flume's HDFS SequenceFile sink for writing data to HDFS. I'm
> looking for
ber 2015 at 13:14,
wrote:
>
>
>
> From: Gonzalo Herreros
> To: user@flume.apache.org
> Date: 08.09.2015 09:29
> Subject: Re: How to customize the key in a HDFS SequenceFile sink
> --
>
> Thanks for your prompt reply. May
I'm not sure I understand your topology and what you mean exactly by
"used Kafka channel/sink"; it would help if you sent the configuration.
My best guess about the error is that you are pointing the Kafka source to
a topic that is used by a channel and not by a Kafka sink.
Regards,
Gonzalo
O
Usually that means you are loading different versions of the servlet API.
You need to do a bit of classpath troubleshooting to find which jars
contain javax.servlet.AsyncContext and keep just the one included in
Flume/lib.
Regards,
Gonzalo
On 17 September 2015 at 11:31, Radu Gheorghe
wrote:
> Hel
og Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Sep 17, 2015 at 1:46 PM, Gonzalo Herreros
> wrote:
>
>> Usually that means you are loading different version of the servlet API.
>> You need to do a bit of classpath tro
> Best regards,
> Radu
>
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Sep 17, 2015 at 2:50 PM, Gonzalo Herreros
> wrote:
>
>> That's it.
>> Remove that jar from the
Does it happen with Oracle JDK 8 or is it only with OpenJDK?
Regards,
Gonzalo
On Sep 17, 2015 8:06 PM, "Doug McClure" wrote:
> When trying to use Flume 1.6 and the spooldir source I'm getting this
> error. Other sources work fine. Are there known issues with Java 1.8.0_51?
>
> Tks - Doug
>
>
> *a
03:14, Doug McClure wrote:
> Do you recommend I test with other versions? I'm using Cloudera's RPM
> based version so I'll need to see where that's being set.
>
> Doug
>
> On Thu, Sep 17, 2015 at 5:05 PM, Gonzalo Herreros
> wrote:
>
>> D
If the parameter is chosen from a fixed list, it's cumbersome but can be
done.
However, if you want it to be arbitrary and create topics on demand then you
need to write your own custom sink code, and it's not trivial to manage
efficiently.
Regards,
Gonzalo
On Sep 19, 2015 7:42 PM, "Hemanth Abbin
Set the same groupId in all the sources using the same topic.
Each message will be read by just one of them.
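A minimal sketch along the lines of the Flume 1.6 Kafka source properties (agent/source names, addresses and topic are placeholders); each agent would carry the same groupId:

agent1.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafkaSrc.zookeeperConnect = zk01:2181
agent1.sources.kafkaSrc.topic = mytopic
# the same groupId on every agent so Kafka balances partitions between them
agent1.sources.kafkaSrc.groupId = flume-consumers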
Regards,
Gonzalo
On Sep 24, 2015 9:59 PM, "Carlos Rojas Matas" wrote:
> Hi Guys!
>
> Thanks for accepting my request. We're using flume to ingest massive
> amount of data from a kafka sour
There are subtle but significant differences.
When you configure "batchSize" in the sink, you are specifying how many
messages are taken from the channel in one transaction (like in any
other sink).
While the Kafka property "batch.num.messages" (which in the flume config is
specified as "kaf
My guess is that the HBase serializer is not filling the payload column
correctly, i.e. there is an empty value in the property "payloadColumn".
Can you share the sink configuration?
Regards,
Gonzalo
On 29 September 2015 at 23:40, Tinte garcia, Miguel Angel <
miguel.ti...@atos.net> wrote:
> Hi,
> I am try
s.channel1.byteCapacity=134217728
>
>
>
> agent.sinks.hbaseSink.type=hbase
>
> agent.sinks.hbaseSink.channel=channel1
>
> agent.sinks.hbaseSink.channel.capacity=100
>
> agent.sinks.hbaseSink.channel.transactionCapacity=10
>
> agent.sinks.hbaseSink.table=Test_
OS disk space is usually freed some time after you delete files in HDFS (unless
the space is needed immediately); check the available space in the HDFS console
to see if it has been freed.
HDFS allocates blocks, not space, and it doesn't matter if you kill the
process that requested the blocks.
Regards
Gonzalo
On Oc
I don't think it's possible to write one HDFS file per event with the default
sink.
But it shouldn't be too hard to extend it to do what you want.
Kafka works best with small messages, not big files.
Maybe it would be a better option to send the files directly to the HDFS HTTP
server or create an NFS g
I believe you are suffering from this bug:
https://issues.apache.org/jira/browse/FLUME-2778
So while it's running it is able to keep up, but when the channel has more than
4 events queued, the sink tries to extract 100 (the default batch size) and you
get that error.
Regards,
Gonzalo
On 13 October 2015 at
Why don't you use a Kafka channel?
It would be simpler and it would meet your initial requirement of having a
fault-tolerant channel.
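Roughly like this (untested sketch; broker/zookeeper addresses and names are placeholders, properties as in Flume 1.6):

agent.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafkaCh.brokerList = kafka01:9092,kafka02:9092
agent.channels.kafkaCh.zookeeperConnect = zk01:2181
agent.channels.kafkaCh.topic = flume-channel
agent.sources.source1.channels = kafkaCh
agent.sinks.sink1.channel = kafkaCh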
Regards,
Gonzalo
On 19 October 2015 at 10:23, Simone Roselli
wrote:
> However,
>
> since the arrive order on Kafka (main sink) is not a particular problem to
> me,
le roll, other sinks..). In
> case of Kafka Channel (another separated Kafka cluster) I would exclusively
> rely on the Kafka cluster, which was my initial non-ideal situation, having
> it as a Sink.
>
>
> Thanks
> Simone
>
>
>
>
>
> On Mon, Oct 19, 2015 at 1
I would use a Hadoop distribution such as Cloudera or Hortonworks. Both
have free versions including monitoring and alert tools.
If you think that is too much, I believe Apache Ambari has that capability.
Finally, the most lightweight solution is a standard linux tool to
monitor/restart processes,
>
> On Tue, Oct 20, 2015 at 1:32 PM, Gonzalo Herreros
> wrote:
>
>> I would use a Hadoop distribution such as Cloudera or Hortonworks. Both
>> have free versions including monitoring and alert tools.
>> If you think that is too much, I believe Apache Ambari has that
Hari, I wish you every success in this new role!
On 22 October 2015 at 01:59, Ashish wrote:
> Congrats Hari !
>
> Arvind - Thanks for watching over and taking care of the community.
> Hope you would continue to do so in the future as well :)
>
> On Wed, Oct 21, 2015 at 5:50 PM, Arvind Prabhakar
>
Create a channel for each sink and then link the source to the 3 channels
instead of just one
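Something like this (a sketch only; the names are placeholders, and the replicating selector is the default anyway):

agent.sources.src1.channels = ch1 ch2 ch3
agent.sources.src1.selector.type = replicating
agent.sinks.sink1.channel = ch1
agent.sinks.sink2.channel = ch2
agent.sinks.sink3.channel = ch3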
Regards,
Gonzalo
On 27 October 2015 at 07:08, lizhenm...@163.com wrote:
>
> hi all:
> i want to split one source log to many sinks, but i don't know how and
> where to split it. Thanks for regards.
blems like
transactions or low performance).
Regards,
Gonzalo
On 27 October 2015 at 08:33, lizhenm...@163.com wrote:
>
> thank to Gonzalo,
> but that is just a sample, the one event maybe split to 10,100...
> --
> lizhenm...@163.com
>
>
> *Fro
The capacity is just the buffer size, i.e. how long the queue of events
waiting to be processed by a sink can grow.
How many events can be processed (throughput) really depends on how fast the
sink can handle them.
In other words, if the sink is not able to keep up, eventually
Hola Guillermo,
If I understand correctly you want Flume to write to Kafka as a channel and
then Spark to read from Kafka.
To do that you have two options:
- Make Spark deserialize the FlumeEvent read from Kafka, for instance in
Scala:
val parseFlumeEvent = { body: Array[Byte] =>
va
I did a custom serializer that parses the event as JSON, and the top-level
properties become columns inside a configurable column family.
I also have a custom property to configure which fields make up the
composite key (which I salt based on the number of regions).
It shouldn't be too hard having
ember 2015 at 11:07, Rani Yaroshinski
wrote:
> Any pointers to the code, as sample ?
>
> On Sat, Nov 7, 2015 at 12:45 PM, Gonzalo Herreros
> wrote:
>
>> I did a custom serializer that parses the event an json and the top level
>> properties become columns inside a confi
In public void initialize(Event event, byte[] columnFamily)
you get the event and extract the data.
In
public List<Row> getActions() throws FlumeException
you generate the HBase Put actions.
Regards,
Gonzalo
On 10 November 2015 at 14:55, Gonzalo Herreros wrote:
> I started by extending RegexHbaseEventSerializer, so I didn't have t
I think your expectations are not realistic.
The MemoryChannel adds minimal overhead but is not reliable like the
KafkaChannel.
In the first case you can lose 10k messages if you are unlucky, while with
the KafkaChannel you won't lose a single one.
With more reliability normally you have a small perf
If that is just with a single server, 600 messages per sec doesn't sound
bad to me.
Depending on the size of each message, the network could be the limiting
factor.
I would try with the null sink and an in-memory channel. If that doesn't
improve things I would say you need more nodes to go beyond
It looks fine to me. Do you get any errors?
Do you have Kerberos enabled? Maybe it is a security issue.
Are you sure the problem is HDFS and not some netcat error? Try with
another sink to confirm that.
Regards,
Gonzalo
On 16 November 2015 at 07:35, zaenal rifai wrote:
> Hello guys, i'm newbie on fl
method sink and it still
> failed.
>
> can you give me some simple example for agent configuration to collect log
> and write on hdfs ?
>
>
>
>
>
>
> On 16 November 2015 at 14:47, Gonzalo Herreros
> wrote:
>
>> I looks fine to me. Do you get any errors?
>
es
>
> and i check on hdfs, there is no file
>
> On 16 November 2015 at 15:34, Gonzalo Herreros
> wrote:
>
>> I see your problem now. You are using "memory-channel" and "memoryChannel"
>> to refer to the same thing.
>> Change the 3rd line to:
AFAIK, only the HDFS sink supports that. So you are going to have to extend
the standard HDFSEventSink to build a tar using a library like Apache
Commons Compress.
Please note the tar format is not really compressed, just concatenated; if
you want compression you need to add gzip on top of the tar so it b
For the sink, I would be surprised if the connection to Kafka is not the
same all the time.
For the http source you could create a custom source where you keep a
long-lived http connection and have some way of detecting where a batch of
events ends (e.g. a newline character).
Regards,
Gonzalo
I see two options: either run a Flume agent on Windows that spools the
local dir and either has access to HDFS or talks to other Flume agents
which do.
Or you can have a small script scheduled on a regular basis to get logs and
post them to Flume.
Regards,
Gonzalo
On 19 November 2015 at 09:07, cha
Using a distribution like Cloudera or Hortonworks. They both have free
versions.
Alternatively you can use standard linux process monitoring tools.
Regards,
Gonzalo
On 19 November 2015 at 09:11, Zhishan Li wrote:
>
>
> Is there a way to simply and conveniently monitor flume agents?
>
> Current
le thread would obviously be slow. How many messages per batch?
> The bigger your batch is, better your perf will be
>
> On Saturday, November 14, 2015, Hemanth Abbina
> wrote:
>
> Thanks Gonzalo.
>
>
>
> Yes, it’s a single server. First we would like to confirm the m
When it fails you are running it from the conf directory, so it doesn't find
the --conf conf.
In the second case you run it from the Flume home dir.
Regards,
Gonzalo
On 20 November 2015 at 22:23, Minnie Haridasa (mharidas) wrote:
> Hi,
>
>
> I am using Apache Flume 1.6 and using a simple memory ch
Hi,
Like any other Apache-licensed project, it is open source and free to use.
To use it from C/.net I would use a standard protocol such as HTTP.
Configure an HTTP source in Flume and then you can use an HTTP client in
any language.
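For example (a minimal sketch; port and names are placeholders):

agent.sources.httpSrc.type = http
agent.sources.httpSrc.bind = 0.0.0.0
agent.sources.httpSrc.port = 4141
agent.sources.httpSrc.channels = ch1

Then the C/.net client just POSTs JSON events to that port on the agent host.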
Alternatively you can have C/.net generate local files and have a
You cannot have multiple processes writing concurrently to the same HDFS
file.
What you can do is have a topology where many agents forward to an agent
that writes to HDFS, but you need a channel that allows the single HDFS
writer to lag behind without slowing the sources.
A Kafka channel might be a
43, zaenal rifai wrote:
> why not to use avro channel gonzalo ?
>
> On 26 November 2015 at 20:12, Gonzalo Herreros
> wrote:
>
>> You cannot have multiple processes writing concurrently to the same hdfs
>> file.
>> What you can do is have a topology where many agents
n 27 November 2015 at 14:52, Gonzalo Herreros
> wrote:
>
>> Hi Zaenal,
>>
>> There is no "avro channel", Flume will write by default avro to any of
>> the channels.
>> The point is that a memory channel or even a file channel will very
>> quic
Adding a library to Flume shouldn't affect hive or any other tools.
You can add the jar to the lib or plugin.d directories.
Regards,
Gonzalo
On 1 December 2015 at 10:13, yogendra reddy wrote:
> update
>
> I ran the flume agent first and then made changes to hadoop log4j
> properties file and af
din't follow. I'm adding flume libraries to hadoop classpath i.e
> hadoop-hdfs lib folder and this is causing the issue. I need these jars to
> be in hdfs lib as I have added log4j appender to hdfs log4j properties.
>
> On Tue, Dec 1, 2015 at 4:09 PM, Gonzalo Herreros
> wrot
appender.Log4jAppender.
>
>
> On Tue, Dec 1, 2015 at 4:24 PM, Gonzalo Herreros
> wrote:
>
>> That doesn't sound right:
>> -Flume should use it's own log4j.properties in the conf directory
>> -Never update the hdfs libs to add stuff you need for Flume, each pro
It might be a bug in the sink you are using.
For instance, I have a serializer for the HbaseSink so I added two custom
properties.
tier1.sinks.hbase-sink-1.serializer.numberBuckets=20
tier1.sinks.hbase-sink-1.serializer.customKey=timestamp,type,resource,hostname
Then in the configure method the
s?
>
> -R
> P.S - Sent code sample and config in separate email directly to you.
>
> ------
> *From:* Gonzalo Herreros
> *Sent:* Thursday, December 3, 2015 12:32 AM
> *To:* user
> *Subject:* Re: Context/Configuration values not passed to custom
&g
I don't know what this "Database HA" feature is, but I can tell you what I
do.
I use Kafka channels and have multiple agents with the same configuration.
In front of the agents I have an http load balancer.
That way, any agent can accept requests and any agent can process them once
in the channel.
I think the problem is in your JSON: while you are sending an array of
events, each event doesn't match what Flume expects, which is the properties
headers (optional) and body (string).
Try like this:
curl -H "Content-Type: application/json" -X POST -d '[{"body":
"{\"username\":\"shashi\",\"password
What that means is that the KafkaSource is trying to read messages from the
last time it was running (or at least the last time some client used Kafka
with the same groupId), but they have already been deleted by Kafka, so it is
warning you that there are messages that have been missed.
Even if is the f
messages lost in flume pipeline. But I don’t know the
> reason. Please do me a favour.
>
> Thanks,
>
>
>
> On 7 Dec, 2015, at 4:06 pm, Gonzalo Herreros wrote:
>
> What that means is that the KafkaSource is trying to read messages from
> the last time it was runni
Normally you configure encryption in HDFS so it works automatically, rather
than having each tool worry about it.
Otherwise, you will need to build your own custom sink.
Regards,
Gonzalo
On 7 December 2015 at 12:15, Ravi Kiran Aita
wrote:
>
>
> Hi,
>
>
>
> We are working on a prototyp
I'm thinking the groups are not needed, and also the asterisk in a regex
doesn't work like it does in Linux.
Try this:
^filedata.*\.log$|^file_post.*\.log$
If it doesn't work, list the full names of the files that aren't ignored
but should be. You can use a tool like http://myregexp.com/ to test it.
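Assuming that regex goes into the spooling directory source's ignorePattern (adjust the property name if you are using a different source, and you may need to double the backslash depending on how the properties file is parsed):

agent.sources.spoolSrc.ignorePattern = ^filedata.*\.log$|^file_post.*\.log$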
Regards,
Unless you are using a custom partitioner, the DefaultPartitioner assigns
them randomly so the content of the headers shouldn't make any difference.
The only explanation I can see for what you are seeing is that somehow the
producer thinks there are only 2 partitions.
Are the messages going just to 0 and 1 or dif
Why don't you use an AvroSink and AvroSource to link the tiers? I believe it
will preserve the headers.
You can still use Kafka as the channel if you want its reliability.
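Roughly (a sketch only; hostnames, ports and names are placeholders):

# tier 1: avro sink reading from the kafka channel
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.channel = kafkaCh
agent1.sinks.avroSink.hostname = tier2-host
agent1.sinks.avroSink.port = 4545
# tier 2: avro source receiving the events with their headers
agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4545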
Regards,
Gonzalo
On 17 December 2015 at 20:02, Jean wrote:
> Hello,
> I have this configuration :
> Source agent 1=> channel =>
I guess something is wrong with the spooling sources that don't pick up the
files.
Check the Flume log initialization for errors.
Or maybe you have some pattern that matches one of the sources' files but
not the others.
Regards,
Gonzalo
On 18 December 2015 at 14:42, Jeff Alfeld wrote:
> I am encou
Seems pretty obvious to me. You are using a class that doesn't exist
httpagent1.sources.http-source.type =
org.apache.flume.source.http.testHTTPSource
You should use:
httpagent1.sources.http-source.type = org.apache.flume.source.http.HTTPSource
which can be shortened to just "http"
You'll
It's a warning that it's not finding the log4j configuration file; the real
problem is that it's probably not finding any of the other configuration
files, but you don't see the errors.
I think the issue might be with "-c conf"; I vaguely remember it is relative to
the bin directory, like "../conf"; if
That Cloudera documentation is ancient and talks about the old Flume
(that's why it differs from what you see on the Apache website); the modern
Flume (also called Flume NG) doesn't have a master. To have HA you need
several agents with the same configuration and a load balancer in front.
In so
wrote:
> Thanks Gonzalo for quick reply.
>
> By load balancer, do you mean load balancing group of flume agents ?
> If yes, I do need to take care of HA for sources, channels and sinks too.
> Am I correct ?
>
>
>
> Regards,
> Ajay
>
>
> On 18-Jan-2016, at 2:5
You can configure rsyslog to do the failover and only send to one of them
using "$ActionExecOnlyWhenPreviousIsSuspended on", I think.
If you can live with an occasional duplicate that should do; otherwise you
need something more complex.
Regards,
Gonzalo
On 21 January 2016 at 15:05, Margus Roo wro
I don't know the internal details but I guess all those threads write to a
single file, so it will reach a point where there is no improvement.
On the other hand, having multiple sinks will create multiple files, which
should scale better, but you need to make sure the files are written in
different
I'm concerned about the warning "no brokers found when trying to rebalance".
Double check that the path in ZooKeeper is correct (zk01:2181/mesos-kafka)
and it's not the standard /kafka.
When you connect with the kafka-console-consumer, do you specify
/mesos-kafka or just zk01:2181?
You can use the zkcl
opTime" : "0",
> "KafkaCommitTimer" : "0",
> "Type" : "SOURCE",
> "AppendBatchAcceptedCount" : "0",
> "EventReceivedCount" : "0",
> "OpenConnectionCount" : "0&qu
Flume 1.6 doesn't support Kafka Kerberos.
That upgrade is work in progress; maybe there is a nightly build already
with it, but not a release.
Gonzalo
On 11 February 2016 at 10:39, manish jaiswal wrote:
> Hi,
>
>
>
> I am not able to use kafka kerberos security auth via flume 1.6.
>
> can you please he
There is no HDFS source because normally you want to bring data into Hadoop
(it's possible to have an HDFS source but I don't think anybody has had that
need).
To copy data between HDFS clusters it's better to use "distcp", included in Hadoop.
Gonzalo
On 11 February 2016 at 10:41, manish jaiswal wrote:
> Hi,
I don't think that's possible without writing/reusing custom code.
You would need an interceptor to add the header following the conditions
you describe so the multiplexer can do the routing.
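As a sketch only (the header name, regex and channel names are made up for illustration), an interceptor such as regex_extractor can populate the header and the multiplexing selector routes on it:

agent.sources.src1.interceptors = i1
agent.sources.src1.interceptors.i1.type = regex_extractor
agent.sources.src1.interceptors.i1.regex = ^(\\w+)
agent.sources.src1.interceptors.i1.serializers = s1
agent.sources.src1.interceptors.i1.serializers.s1.name = routeKey
agent.sources.src1.selector.type = multiplexing
agent.sources.src1.selector.header = routeKey
agent.sources.src1.selector.mapping.typeA = channelA
agent.sources.src1.selector.default = channelB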
Gonzalo
On 15 February 2016 at 21:06, chandra koripella wrote:
> Hi,
>
>
> Is there a way implement mul
The way I have done that is by having a copy of the Spark config folder with
the updated log4j settings and running the job with the flag that points to
that configuration folder.
The drawback is that if you change other Spark settings for the cluster,
that job won't be updated.
I guess other options
Assuming that the batches are just slow and queued up (if it makes no
progress at all, something is wrong with the job), usually you can
improve the speed by increasing the number of executors, cores or memory.
It's a bit of trial/error plus observing how the job behaves.
To avoid the queues the
Local mode will only run one executor; you are specifying 4 cores to be
used by that executor.
That affects the number of tasks an executor can run concurrently.
Please note this is the Flume distribution list, not the Spark one
Gonzalo
On 25 February 2016 at 02:22, Sutanu Das wrote:
> Community
Could it be that you are serializing avro instead of json?
On 2 March 2016 at 08:25, Baris Akgun (Garanti Teknoloji) <
barisa...@garanti.com.tr> wrote:
> Hi,
>
>
>
> When I send json data to flume with using http post, flume adds
> Content-Type: application/json for each json post.
>
>
>
> In m
tter data but in flume channel ı saw content type word
> for each tweet. Is it normal ? How can ı send just tweets json without any
> content type. I took tweets json from GNIP company.
>
> Thanks
> Sent from my iPhone
>
> 2 Mar 2016 tarihinde 10:56 saatinde, Gonzalo H
isplayName":"Twitter","link":"
> http://www.twitter.com"},"link":";
> http://twitter.com/semagokcee/statuses/642910743302668288","body":"RT
> @Hadis_Tweet: \"Kim sabah namazını kılarsa, Allah'ın garantisi
> altı
; tier1.sinks.sink1.hdfs.rollInterval=60
>
> tier1.sinks.sink1.hdfs.rollSize = 268435456
>
> tier1.sinks.sink1.hdfs.batchSize = 1
>
> tier1.sinks.sink1.hdfs.writeFormat = Text
>
> tier1.sinks.sink1.serializer = text
>
>
>
> tier1.sources.source1.channels = channel1
&g
JMS queues guarantee that only one of the clients will get each message.
Unless you build it yourself, Flume doesn't have active/passive. HA is
achieved by having multiple agents running the same configuration.
On 2 March 2016 at 18:12, samnik60 . wrote:
> Hi guys,
> I have the following queries about
e , if i use a spool directory as source i cannot have
> active/active or active/stand by HA , since active/active will result in
> race condition when two source try to process from same directory.
>
> Thanks,
> sam
>
> On Wed, Mar 2, 2016 at 1:17 PM, Gonzalo Herreros
> wrot
Looks like the HDFS sink needs to be updated to support the latest Hadoop.
In the meantime I would use an older client, which probably works with a
newer server. Alternatively you can use the Flume branch that Hortonworks
compiles for 2.7.1.
Gonzalo
On 11 March 2016 at 03:37, 2402 楊建中 joeyang wrot
For me the best practice is what the big vendors do and recommend.
Your solution of deploying multiple identical agents sharing the group
in the source is fine as long as you have a durable channel and are OK with
some messages getting delayed (or even lost) when a node goes down.
If you want f
I think you are right
Gonzalo
On 8 April 2016 at 03:39, Jeong-shik Jang wrote:
> Hi Flume team and users,
>
> User Guide document says:
>
> readSmallestOffset false When set to true, the channel will read all data
> in the topic, starting from the oldest event when false, it will read only
> ev
You cannot set a serializer on the channel; whatever you put in the topic
will be stored in HDFS as-is, so you shouldn't need it.
If you want to do some parsing then you can implement a sink serializer.
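For example, on an hdfs sink writing a DataStream, the serializer property accepts the fully qualified name of an EventSerializer.Builder (the class name here is just a placeholder):

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.serializer = com.example.MyJsonSerializer$Builder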
Gonzalo
On 8 April 2016 at 08:44, Baris Akgun (Garanti Teknoloji) <
barisa...@garanti.com.tr> w
That would depend on the channel.
AFAIK, all the channels provided are FIFO without expiration but
technically you could implement a channel that does that.
You could achieve some priority management using multiplexing.
Gonzalo
On 15 April 2016 at 11:38, Ronald Van De Kuil
wrote:
> Hello,
>
>
iority lane), and if there is a nomatch then it would route
> to the default channel. And if I would need more then I would need to make
> a code mod, right?
>
>
> 2016-04-15 13:54 GMT+02:00 Gonzalo Herreros :
>
>> That would depend on the channel.
>> AFAIK, all the chan
Seems your agent config file doesn't specify the host/port for the Sink
Gonzalo
On 28 April 2016 at 11:13, Divya Gehlot wrote:
> Hi,
> I am trying to move data from hdfs to Phoenix
> I downloaded the https://github.com/forcedotcom/phoenix/
> and build the project as per instrunctions in Apache
> Thanks for the help.
> I am just a day old to Flume
>
> Could you please help me which host/port do I need to specify ?
>
> Thanks,
> Divya
>
>
>
> On 28 April 2016 at 18:31, Gonzalo Herreros wrote:
>
>> Seems your agent config file doesn't specif
Flume 1.5.0 is pretty old.
Why don't you use version 1.6.0, included in CDH? That will ensure
library compatibility.
On 18 May 2016 at 08:43, Baris Akgun (Garanti Teknoloji) <
barisa...@garanti.com.tr> wrote:
> Hi,
>
>
>
> I am trying to make real time indexing with using flume 1.5.0 and
> mor
It should work as you say.
I wonder how you know the events are "empty"; do you get new lines in
the console consumer?
Also, the example payload you show looks like Avro but not the standard
FlumeEvent. Can you show us your agent configuration?
Gonzalo
On 2 June 2016 at 12:22, George M. wrote
Don't see any reason why it shouldn't work.
I would try without the morphline and the multiplexing, just to see what
you get in the channel and eliminate possible suspects.
My feeling is that the channel somehow is not receiving the standard
FlumeEvent; it might be something changed in the new unre
Seems the morphline is transforming the event into one without "body", are
you converting the event body into headers?
parseAsFlumeEvent only handles the body, the headers are lost.
On 2 June 2016 at 16:59, George M. wrote:
> With the morphlines and without multiplexing
> ===
The explanation could be clearer but it boils down to this:
- If you use parseAsFlumeEvent=true (the default)
Then the events in the channel are Avro FlumeEvents. So you have to
read/write Avro but you have all the event metadata (timestamp, headers,
etc.)
- If you use parseAsFlumeEvent=false
Then the events
Morphlines should be OK in both cases, but if you disable the FlumeEvents
you will lose metadata, so the morphline should be aware of that.
You can always put whatever you want in the body (e.g. a json with the
headers plus the original body)
If metadata is so important to you, it would be better i
I think what you need is "multiplexing", you can read about it in the user
guide
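A rough sketch of the idea (the header and the names are placeholders; each sink reads from its own channel):

agent.sources.avroSrc.channels = ch1 ch2 ch3
agent.sources.avroSrc.selector.type = multiplexing
agent.sources.avroSrc.selector.header = type
agent.sources.avroSrc.selector.mapping.a = ch1
agent.sources.avroSrc.selector.mapping.b = ch2
agent.sources.avroSrc.selector.default = ch3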
Gonzalo
On Jun 6, 2016 2:50 AM, "Santoshakhilesh"
wrote:
> Hi All ,
>
> I have this particular scenario
>
> Avro Source -> Memory Channel - > Sink1 , Sink2 , Sink 3
>
> Now I need to do some changes to original even
In your config, the name of the partitioner class is missing a "t"
("Paritioner"); you should be getting an exception, and maybe the sink is
reverting to partitioning by key:
relay_agent.sinks.activity_kafka_sink.kafka_partitioner.class =
org.apache.kafka.clients.producer.internals.RandomParitioner
Gonzalo
O
By default a Kafka producer will choose a random partition; however, I
believe the Kafka sink by default partitions on the message key, so if the
key is null it won't do a good job.
On 7 June 2016 at 09:43, Jason Williams wrote:
> Hey Chris,
>
> Thanks for help!
>
> Is that a limitation of the F
You need an interceptor to update/remove the topic header
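For example, a static interceptor on the source can overwrite the "topic" header so the events go to the topic you want (the names and values are placeholders):

agent.sources.kafkaSrc.interceptors = i1
agent.sources.kafkaSrc.interceptors.i1.type = static
agent.sources.kafkaSrc.interceptors.i1.preserveExisting = false
agent.sources.kafkaSrc.interceptors.i1.key = topic
agent.sources.kafkaSrc.interceptors.i1.value = destination_topic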
Gonzalo
On Jun 12, 2016 4:57 AM, "lxw" wrote:
> Hi,All:
>
>I use Kafka Source to read events from one Kafka topic and write events
> to another Topic with Kafka Sink,
> the Kafka Sink topic configuration is not work, flume still write
Apache Flume 1.6 runs on Kafka 0.8; what you have is a branch from Cloudera.
I would advise against that. The whole point of using a distribution is
that you use the component versions they have integrated and tested together.
Either use Kafka 0.9 or don't upgrade to 5.7 until you are ready.
If you st
I would avoid doing calculations in the source; that can impact the
ingestion and cause timeouts, duplicates, etc., especially for some sources
(e.g. http).
However, what I have done in the past is to have a durable channel and create
a custom sink that extends a regular sink and does additional
calculat
I think you should be able to do it using an HTTP source with a custom
handler that understands SOAP.
If not, then you need to create a plugin where you extend the standard HTTP
source.
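For instance, the handler class below is just a placeholder for your own HTTPSourceHandler implementation that parses the SOAP envelope:

agent.sources.soapSrc.type = http
agent.sources.soapSrc.port = 8181
agent.sources.soapSrc.channels = ch1
agent.sources.soapSrc.handler = com.example.SoapHandler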
Regards,
Gonzalo
On 12 September 2016 at 04:32, chen dong wrote:
> Hi guys,
>
> I am trying to load data from