Re: How to customize the key in a HDFS SequenceFile sink

2015-09-08 Thread Gonzalo Herreros
I would implement a custom serializer and configure it in the standard Hdfs sink. That way you control how you build the key for each event. Regards, Gonzalo On 8 September 2015 at 06:42, wrote: > > Hello, > > I'm using Flume's HDFS SequenceFile sink for writing data to HDFS. I'm > looking for
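
A sketch of that wiring (agent and sink names and the serializer class are hypothetical; my recollection is that hdfs.writeFormat also accepts the fully qualified class name of a SequenceFileSerializer.Builder, so verify against your Flume version):

  agent.sinks.hdfs-sink.type = hdfs
  agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/events
  # write events as a SequenceFile
  agent.sinks.hdfs-sink.hdfs.fileType = SequenceFile
  # custom serializer that builds the key for each event
  agent.sinks.hdfs-sink.hdfs.writeFormat = com.example.MyKeySerializer$Builder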

Re: Re: How to customize the key in a HDFS SequenceFile sink

2015-09-08 Thread Gonzalo Herreros
ber 2015 at 13:14, wrote: > > > > Von:Gonzalo Herreros > An:user@flume.apache.org, > Datum:08.09.2015 09:29 > Betreff:Re: How to customize the key in a HDFS SequenceFile sink > -- > > Thanks for your prompt reply. May

Re: Avro source and sink

2015-09-15 Thread Gonzalo Herreros
I'm not sure if I understand your topology and what you mean exactly by "used Kafka channel/sink"; it would help if you sent the configuration. My best guess about the error is that you are pointing the kafka source to a topic that is used by a channel and not by a kafka sink. Regards, Gonzalo O

Re: Dependency issues while starting Flume 1.6 with MorphlineSolrSink

2015-09-17 Thread Gonzalo Herreros
Usually that means you are loading different versions of the servlet API. You need to do a bit of classpath troubleshooting to find which jars contain javax.servlet.AsyncContext and keep just the one included in Flume/lib. Regards, Gonzalo On 17 September 2015 at 11:31, Radu Gheorghe wrote: > Hel

Re: Dependency issues while starting Flume 1.6 with MorphlineSolrSink

2015-09-17 Thread Gonzalo Herreros
og Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > On Thu, Sep 17, 2015 at 1:46 PM, Gonzalo Herreros > wrote: > >> Usually that means you are loading different version of the servlet API. >> You need to do a bit of classpath tro

Re: Dependency issues while starting Flume 1.6 with MorphlineSolrSink

2015-09-17 Thread Gonzalo Herreros
> Best regards, > Radu > > -- > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > On Thu, Sep 17, 2015 at 2:50 PM, Gonzalo Herreros > wrote: > >> That's it. >> Remove that jar from the

Re: Flume 1.6 and Java 1.8 with spooldir source bug?

2015-09-17 Thread Gonzalo Herreros
Does it happen with Oracle JDK 8 or is it only with OpenJDK? Regards, Gonzalo On Sep 17, 2015 8:06 PM, "Doug McClure" wrote: > When trying to use Flume 1.6 and the spooldir source I'm getting this > error. Other sources work fine. Are there known issues with Java 1.8.0_51? > > Tks - Doug > > > *a

Re: Flume 1.6 and Java 1.8 with spooldir source bug?

2015-09-18 Thread Gonzalo Herreros
03:14, Doug McClure wrote: > Do you recommend I test with other versions? I'm using Cloudera's RPM > based version so I'll need to see where that's being set. > > Doug > > On Thu, Sep 17, 2015 at 5:05 PM, Gonzalo Herreros > wrote: > >> D

Re: Flume usage with Kafka channel & HDFS sink

2015-09-19 Thread Gonzalo Herreros
If the parameter is chosen from a fixed list, it's cumbersome but can be done. However, if you want it to be arbitrary and create topics on demand, then you need to write your own custom sink code, and it's not trivial to manage efficiently. Regards, Gonzalo On Sep 19, 2015 7:42 PM, "Hemanth Abbin

Re: Multiple agents in high availability

2015-09-24 Thread Gonzalo Herreros
Set the same groupId in all the sources using the same topic. Each message will be read by just one of them. Regards, Gonzalo On Sep 24, 2015 9:59 PM, "Carlos Rojas Matas" wrote: > Hi Guys! > > Thanks for accepting my request. We're using flume to ingest massive > amount of data from a kafka sour
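
A sketch of the relevant source config, repeated identically on every agent (names and addresses are illustrative):

  agent.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
  agent.sources.kafka-src.zookeeperConnect = zk01:2181
  agent.sources.kafka-src.topic = ingest-topic
  # same groupId on all agents: Kafka balances the partitions across them,
  # so each message is consumed by only one agent
  agent.sources.kafka-src.groupId = flume-ha
  agent.sources.kafka-src.channels = ch1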

Re: Batchsize in kafka sink

2015-09-27 Thread Gonzalo Herreros
There are subtle but significant differences. When you configure "batchSize" in the sink, you are specifying how many messages are taken as a transaction from the channel at once (like in any other sink). While the Kafka property "batch.num.messages" (which in the flume config is specified as "kaf
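
A sketch of how the two settings sit side by side (agent/sink names and values are illustrative; in Flume 1.6, sink properties prefixed with "kafka." are passed through to the Kafka producer):

  agent.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
  agent.sinks.kafka-sink.brokerList = broker1:9092
  agent.sinks.kafka-sink.topic = out-topic
  # events taken from the channel per Flume transaction
  agent.sinks.kafka-sink.batchSize = 100
  # passed through to the Kafka producer: messages batched per async send
  agent.sinks.kafka-sink.kafka.batch.num.messages = 200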

Re: No columns to insert for #1 item

2015-09-30 Thread Gonzalo Herreros
My guess is that the HBase serializer is not filling the payload column correctly, i.e. there is an empty value in the property "payloadColumn". Can you share the sink configuration? Regards, Gonzalo On 29 September 2015 at 23:40, Tinte garcia, Miguel Angel < miguel.ti...@atos.net> wrote: > Hi, > I am try

Re: No columns to insert for #1 item

2015-09-30 Thread Gonzalo Herreros
s.channel1.byteCapacity=134217728 > > > > agent.sinks.hbaseSink.type=hbase > > agent.sinks.hbaseSink.channel=channel1 > > agent.sinks.hbaseSink.channel.capacity=100 > > agent.sinks.hbaseSink.channel.transactionCapacity=10 > > agent.sinks.hbaseSink.table=Test_

Re: Wrong disk space in HDFS

2015-10-07 Thread Gonzalo Herreros
OS disk space is usually freed a while after you delete files in hdfs (unless it needs it right away); check the available space on the hdfs console to see if space has been freed. Hdfs allocates blocks, not space, and it doesn't matter if you kill the process that requested the blocks. Regards Gonzalo On Oc

Re: writing avro files to hdfs

2015-10-09 Thread Gonzalo Herreros
I don't think it's possible to write one hdfs file per event with the default sink, but it shouldn't be too hard to extend it to do what you want. Kafka works best with small messages, not big files. Maybe it would be a better option to send the files directly to the HDFS Http server or create an NFS g

Re: recovery after memory transaction capacity is exceeded

2015-10-13 Thread Gonzalo Herreros
I believe you are suffering from this bug: https://issues.apache.org/jira/browse/FLUME-2778 So while it's running it is able to keep up, but when the channel has more than 4 events queued, the Sink tries to extract 100 (default batch size) and you get that error. Regards, Gonzalo On 13 October 2015 at

Re: Flume-ng 1.6 reliable setup

2015-10-19 Thread Gonzalo Herreros
Why don't you use a Kafka channel? It would be simpler and it would meet your initial requirement of having channel fault tolerance. Regards, Gonzalo On 19 October 2015 at 10:23, Simone Roselli wrote: > However, > > since the arrive order on Kafka (main sink) is not a particular problem to > me,
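
A minimal Kafka channel sketch for Flume 1.6 (broker/zookeeper addresses and topic are illustrative):

  agent.channels.kafka-ch.type = org.apache.flume.channel.kafka.KafkaChannel
  agent.channels.kafka-ch.brokerList = broker1:9092,broker2:9092
  agent.channels.kafka-ch.zookeeperConnect = zk01:2181
  # events survive an agent crash because they sit in this Kafka topic
  agent.channels.kafka-ch.topic = flume-channel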

Re: Flume-ng 1.6 reliable setup

2015-10-19 Thread Gonzalo Herreros
le roll, other sinks..). In > case of Kafka Channel (another separated Kafka cluster) I would exclusively > rely on the Kafka cluster, which was my initial non-ideal situation, having > it as a Sink. > > > Thanks > Simone > > > > > > On Mon, Oct 19, 2015 at 1

Re: Flume Agent failure Handlling

2015-10-20 Thread Gonzalo Herreros
I would use a Hadoop distribution such as Cloudera or Hortonworks. Both have free versions including monitoring and alert tools. If you think that is too much, I believe Apache Ambari has that capability. Finally, the most lightweight solution is a standard linux tool to monitor/restart processes,

Re: Flume Agent failure Handlling

2015-10-20 Thread Gonzalo Herreros
> > On Tue, Oct 20, 2015 at 1:32 PM, Gonzalo Herreros > wrote: > >> I would use a Hadoop distribution such as Cloudera or Hortonworks. Both >> have free versions including monitoring and alert tools. >> If you think that is too much, I believe Apache Ambari has that

Re: [ANNOUNCE] Change of Apache Flume PMC Chair

2015-10-22 Thread Gonzalo Herreros
Hari, I wish you every success in this new role!! On 22 October 2015 at 01:59, Ashish wrote: > Congrats Hari ! > > Arvind - Thanks for watching over and taking care of the community. > Hope you would continue to do so in the future as well :) > > On Wed, Oct 21, 2015 at 5:50 PM, Arvind Prabhakar >

Re: how to split one event to many

2015-10-27 Thread Gonzalo Herreros
Create a channel for each sink and then link the source to the 3 channels instead of just one Regards, Gonzalo On 27 October 2015 at 07:08, lizhenm...@163.com wrote: > > hi all: > i want to split one source log to many sinks, but i don't know how and > where to split it. Thanks for regards.
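
A sketch of that fan-out (names are illustrative; the default replicating selector copies every event to all channels listed on the source):

  agent.channels = ch1 ch2 ch3
  agent.sources.src1.channels = ch1 ch2 ch3
  agent.sinks.sink1.channel = ch1
  agent.sinks.sink2.channel = ch2
  agent.sinks.sink3.channel = ch3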

Re: Re: how to split one event to many

2015-10-27 Thread Gonzalo Herreros
blems like transactions or low performance). Regards, Gonzalo On 27 October 2015 at 08:33, lizhenm...@163.com wrote: > > thank to Gonzalo, > but that is just a sample, the one event maybe split to 10,100... > -- > lizhenm...@163.com > > > *Fro

Re: How to set the capacity of the memory channel ?

2015-11-04 Thread Gonzalo Herreros
The capacity is just the buffer size, i.e. how long the queue of events we can accept can grow while they wait to be processed by a sink. How many events can be processed (throughput) really depends on how fast the sink can handle them. In other words, if the sink is not able to keep up, eventually
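
A sketch of the two settings (values are illustrative):

  agent.channels.mem-ch.type = memory
  # max events buffered while they wait for a sink
  agent.channels.mem-ch.capacity = 10000
  # max events per source put / sink take transaction
  agent.channels.mem-ch.transactionCapacity = 1000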

Re: Reading with Spark from KafkaChannel

2015-11-06 Thread Gonzalo Herreros
Hi Guillermo, If I understand correctly, you want Flume to write to kafka as a channel and then Spark to read from kafka. To do that you have two options: - Make Spark deserialize the FlumeEvent read from kafka. For instance, in scala: val parseFlumeEvent = { body: Array[Byte] => va

Re: Hbase Sink

2015-11-07 Thread Gonzalo Herreros
I did a custom serializer that parses the event as json, and the top level properties become columns inside a configurable column family. I also have a custom property to configure which fields make up the composite key (which I salt based on the number of regions). It shouldn't be too hard having

Re: Hbase Sink

2015-11-10 Thread Gonzalo Herreros
ember 2015 at 11:07, Rani Yaroshinski wrote: > Any pointers to the code, as sample ? > > On Sat, Nov 7, 2015 at 12:45 PM, Gonzalo Herreros > wrote: > >> I did a custom serializer that parses the event an json and the top level >> properties become columns inside a confi

Re: Hbase Sink

2015-11-10 Thread Gonzalo Herreros
byte[] columnFamily) you get the event and extract the data. In public List getActions() throws FlumeException you generate the HBase Put actions. Regards, Gonzalo On 10 November 2015 at 14:55, Gonzalo Herreros wrote: > I started by extending RegexHbaseEventSerializer, so I didn't have t

Re: KafkaSink vs KafkaChannel performance

2015-11-12 Thread Gonzalo Herreros
I think your expectations are not realistic. The MemoryChannel adds minimal overhead but is not reliable like the KafkaChannel. In the first case you can lose 10k messages if you are unlucky, while with the KafkaChannel you won't lose a single one. With more reliability normally you pay a small perf

Re: Flume benchmarking with HTTP source & File channel

2015-11-14 Thread Gonzalo Herreros
If that is just with a single server, 600 messages per sec doesn't sound bad to me. Depending on the size of each message, the network could be the limiting factor. I would try with the null sink and in memory channel. If that doesn't improve things I would say you need more nodes to go beyond

Re: Flume-agent conf on ambari

2015-11-15 Thread Gonzalo Herreros
It looks fine to me. Do you get any errors? Do you have kerberos enabled? Maybe it is a security issue. Are you sure the problem is hdfs and not some netcat error? Try with another sink to confirm that. Regards, Gonzalo On 16 November 2015 at 07:35, zaenal rifai wrote: > Hello guys, i'm newbie on fl

Re: Flume-agent conf on ambari

2015-11-16 Thread Gonzalo Herreros
method sink and it still > failed. > > can you give me some simple example for agent configuration to collect log > and write on hdfs ? > > > > > > > On 16 November 2015 at 14:47, Gonzalo Herreros > wrote: > >> I looks fine to me. Do you get any errors? &g

Re: Flume-agent conf on ambari

2015-11-16 Thread Gonzalo Herreros
es > > and i check on hdfs, there is no file > > On 16 November 2015 at 15:34, Gonzalo Herreros > wrote: > >> I see your problem now. You are using "memory-channel" and "memoryChannel" >> to refer to the same thing. >> Change the 3rd line to:
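
A consistent version of that config, with every reference using the same channel name (source/sink names are illustrative):

  agent.channels = memoryChannel
  agent.channels.memoryChannel.type = memory
  agent.sources.netcat-src.channels = memoryChannel
  agent.sinks.hdfs-sink.channel = memoryChannel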

Re: Flume sink for writing Tar formatted files on HDFS

2015-11-17 Thread Gonzalo Herreros
AFAIK, only the Hdfs sink supports that. So you are going to have to extend the standard HDFSEventSink to build a tar using a library like Apache Commons Compress. Please note the tar format is not really compressed, just appended; if you want compression you need to add gzip on the tar so it b

Re: Possibility of persisting the connection

2015-11-17 Thread Gonzalo Herreros
For the sink, I would be surprised if the connection to kafka is not the same all the time. For the http source you could create a custom source where you keep a long lived http connection and have some way of detecting when a batch of events has been sent (e.g. a new line character). Regards, Gonzalo

Re: IIS Web server Logs

2015-11-19 Thread Gonzalo Herreros
I see two options: either run a Flume agent on windows that spools the local dir and either has access to hdfs or talks to other Flume agents which do, or have a small script scheduled on a regular basis to get logs and post them to Flume. Regards, Gonzalo On 19 November 2015 at 09:07, cha

Re: How to monitor flume agents

2015-11-19 Thread Gonzalo Herreros
Using a distribution like Cloudera or Hortonworks. They both have free versions. Alternatively you can use standard linux process monitoring tools. Regards, Gonzalo On 19 November 2015 at 09:11, Zhishan Li wrote: > > > Is there a way to simply and conveniently monitor flume agents? > > Current

RE: Flume benchmarking with HTTP source & File channel

2015-11-19 Thread Gonzalo Herreros
le thread would obviously be slow. How many messages per batch? > The bigger your batch is, better your perf will be > > On Saturday, November 14, 2015, Hemanth Abbina > wrote: > > Thanks Gonzalo. > > > > Yes, it’s a single server. First we would like to confirm the m

Re: Help! Apache Flume 1.6 Intermittently fails to function

2015-11-20 Thread Gonzalo Herreros
When it fails you are running it from the conf directory, so it doesn't find the --conf conf. In the second case you run it from the flume home dir. Regards, Gonzalo On 20 November 2015 at 22:23, Minnie Haridasa (mharidas) wrote: > Hi, > > > I am using Apache Flume 1.6 and using a simple memory ch
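
For example (the install path is illustrative; a relative --conf is resolved against the working directory):

  cd /usr/lib/flume-ng
  bin/flume-ng agent --conf conf --conf-file conf/agent.conf --name agent1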

Re: Information regarding Apache Flume

2015-11-26 Thread Gonzalo Herreros
Hi, Like any other Apache licensed project, it is open source and free to use. To use it from C/.net I would use a standard protocol such as http. Configure an http source in Flume and then you can use an http client in any language. Alternatively you can have C/.net generate local files and have a

Re: Flume Topology

2015-11-26 Thread Gonzalo Herreros
You cannot have multiple processes writing concurrently to the same hdfs file. What you can do is have a topology where many agents forward to an agent that writes to hdfs but you need a channel that allows the single hdfs writer to lag behind without slowing the sources. A kafka channel might be a

Re: Flume Topology

2015-11-26 Thread Gonzalo Herreros
43, zaenal rifai wrote: > why not to use avro channel gonzalo ? > > On 26 November 2015 at 20:12, Gonzalo Herreros > wrote: > >> You cannot have multiple processes writing concurrently to the same hdfs >> file. >> What you can do is have a topology where many agents

Re: Flume Topology

2015-11-27 Thread Gonzalo Herreros
n 27 November 2015 at 14:52, Gonzalo Herreros > wrote: > >> Hi Zaenal, >> >> There is no "avro channel", Flume will write by default avro to any of >> the channels. >> The point is that a memory channel or even a file channel will very >> quic

Re: Flume log4j Appender issue

2015-12-01 Thread Gonzalo Herreros
Adding a library to Flume shouldn't affect hive or any other tools. You can add the jar to the lib or plugin.d directories. Regards, Gonzalo On 1 December 2015 at 10:13, yogendra reddy wrote: > update > > I ran the flume agent first and then made changes to hadoop log4j > properties file and af

Re: Flume log4j Appender issue

2015-12-01 Thread Gonzalo Herreros
din't follow. I'm adding flume libraries to hadoop classpath i.e > hadoop-hdfs lib folder and this is causing the issue. I need these jars to > be in hdfs lib as I have added log4j appender to hdfs log4j properties. > > On Tue, Dec 1, 2015 at 4:09 PM, Gonzalo Herreros > wrot

Re: Flume log4j Appender issue

2015-12-01 Thread Gonzalo Herreros
appender.Log4jAppender. > > > On Tue, Dec 1, 2015 at 4:24 PM, Gonzalo Herreros > wrote: > >> That doesn't sound right: >> -Flume should use it's own log4j.properties in the conf directory >> -Never update the hdfs libs to add stuff you need for Flume, each pro

Re: Context/Configuration values not passed to custom serializer.

2015-12-03 Thread Gonzalo Herreros
It might be a bug in the sink you are using. For instance, I have a serializer for the HbaseSink so I added two custom properties. tier1.sinks.hbase-sink-1.serializer.numberBuckets=20 tier1.sinks.hbase-sink-1.serializer.customKey=timestamp,type,resource,hostname Then in the configure method the

Re: Context/Configuration values not passed to custom serializer.

2015-12-03 Thread Gonzalo Herreros
s? > > -R > P.S - Sent code sample and config in separate email directly to you. > > ------ > *From:* Gonzalo Herreros > *Sent:* Thursday, December 3, 2015 12:32 AM > *To:* user > *Subject:* Re: Context/Configuration values not passed to custom &g

Re: how to implement "database HA" feature

2015-12-04 Thread Gonzalo Herreros
I don't know what this "Database HA" feature is, but I can tell you what I do. I use Kafka channels and have multiple agents with the same configuration. In front of the agents I have an http load balancer. That way, any agent can accept requests and any agent can process them once in the channel.

Re: Flume | Curl Post is not sending data to flume using http source

2015-12-06 Thread Gonzalo Herreros
I think the problem is in your json: while you are sending an array of events, the event doesn't match what Flume expects, which is the properties headers (optional) and body (a string). Try like this: curl -H "Content-Type: application/json" -X POST -d '[{"body": "{\"username\":\"shashi\",\"password
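
A generic version of that request (port and header values are illustrative; the http source's JSON handler expects an array of objects with optional "headers" and a string "body"):

  curl -X POST -H "Content-Type: application/json" \
    -d '[{"headers": {"origin": "curl"}, "body": "hello flume"}]' \
    http://localhost:5140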

Re: Kafka Source Error

2015-12-07 Thread Gonzalo Herreros
What that means is that the KafkaSource is trying to read messages from the last time it was running (or at least the last time some client used kafka with the same groupId), but they have already been deleted by Kafka, so it is warning you that there are messages that have been missed. Even if it is the f

Re: Kafka Source Error

2015-12-07 Thread Gonzalo Herreros
messages lost in flume pipeline. But I don’t know the > reason. Please do me a favour. > > Thanks, > > > > On 7 Dec, 2015, at 4:06 pm, Gonzalo Herreros wrote: > > What that means is that the KafkaSource is trying to read messages from > the last time it was runni

Re: Option to encrypt data at HDFS Sink

2015-12-07 Thread Gonzalo Herreros
Normally you configure encryption in hdfs so it works automatically, rather than having each tool worry about it. Otherwise, you will need to build your own custom sink. Regards, Gonzalo On 7 December 2015 at 12:15, Ravi Kiran Aita wrote: > > > Hi, > > > > We are working on a prototyp

Re: Flume agent spoolDir ignorePattern

2015-12-08 Thread Gonzalo Herreros
I'm thinking the groups are not needed, and also the asterisk in regex doesn't work like in linux. Try this: ^filedata.*\.log$|^file_post.*\.log$ If it doesn't work, list the full names of the files that aren't ignored but should be. You can use a tool like http://myregexp.com/ to test it. Regards,
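
In context, that pattern would sit on the spooling directory source like this (source name and directory are illustrative; backslashes are doubled because the config is a Java properties file):

  agent.sources.spool-src.type = spooldir
  agent.sources.spool-src.spoolDir = /var/log/incoming
  # files whose names match this regex are skipped entirely
  agent.sources.spool-src.ignorePattern = ^filedata.*\\.log$|^file_post.*\\.log$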

Re: Kafka Sink, bad distribution of data in the partitions.

2015-12-15 Thread Gonzalo Herreros
Unless you are using a custom partitioner, the DefaultPartitioner assigns them randomly, so the content of the headers shouldn't make any difference. The only explanation I can see for what you are seeing is that somehow the producer thinks there are only 2 partitions. Are the msgs going just to 0 and 1 or dif

Re: Kafka Sink avro event

2015-12-18 Thread Gonzalo Herreros
Why don't you use an AvroSink and Source to link the tiers? I believe it will preserve the headers. You can still use Kafka as the channel if you want its reliability. Regards, Gonzalo On 17 December 2015 at 20:02, Jean wrote: > Hello, > I have this configuration : > Source agent 1=> channel =>
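
A sketch of the link between the two tiers (host names and port are illustrative):

  # tier 1: avro sink pointing at tier 2
  agent1.sinks.avro-sink.type = avro
  agent1.sinks.avro-sink.hostname = tier2-host
  agent1.sinks.avro-sink.port = 4545

  # tier 2: avro source listening on the same port
  agent2.sources.avro-src.type = avro
  agent2.sources.avro-src.bind = 0.0.0.0
  agent2.sources.avro-src.port = 4545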

Re: Multiple spool directory sources

2015-12-18 Thread Gonzalo Herreros
I guess something is wrong with the Spooling sources, so they don't take the files. Check the Flume log initialization for errors. Or maybe you have some pattern that matches one of the sources' files but not the others. Regards, Gonzalo On 18 December 2015 at 14:42, Jeff Alfeld wrote: > I am encou

Re: Flume HTTP source failed

2015-12-21 Thread Gonzalo Herreros
Seems pretty obvious to me. You are using a class that doesn't exist: httpagent1.sources.http-source.type = org.apache.flume.source.http.testHTTPSource You should use: httpagent1.sources.http-source.type = org.apache.flume.source.http.HTTPSource which can be shortened to just "http" You'll

Re: log4j:WARN

2016-01-12 Thread Gonzalo Herreros
It's a warning that it's not finding the log4j configuration file; the real problem is that it is probably not finding any of the other configuration files, but you don't see the errors. I think the issue might be with "-c conf", I kinda remember it is relative to the bin directory like this "../conf"; if

Re: Installing flume in distributed mode and HA

2016-01-18 Thread Gonzalo Herreros
That Cloudera documentation is ancient and talks about the old Flume (that's why it differs from what you see in the Apache website); the modern Flume (also called Flume-ng) doesn't have a master. To have HA you need several agents with the same configuration and a load balancer in front. In so

Re: Installing flume in distributed mode and HA

2016-01-18 Thread Gonzalo Herreros
wrote: > Thanks Gonzalo for quick reply. > > By load balancer, do you mean load balancing group of flume agents ? > If yes, I do need to take care of HA for sources, channels and sinks too. > Am I correct ? > > > > Regards, > Ajay > > > On 18-Jan-2016, at 2:5

Re: Two parallel agents from same source to same sink

2016-01-21 Thread Gonzalo Herreros
You can configure rsyslog to do the failover and only send to one of them using "$ActionExecOnlyWhenPreviousIsSuspended on", I think. If you can live with an occasional duplicate that should do; otherwise you need something more complex. Regards, Gonzalo On 21 January 2016 at 15:05, Margus Roo wro
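
A sketch of the classic rsyslog failover pattern that directive enables (host names and port are illustrative; @@ means TCP):

  *.* @@flume-primary:5140
  $ActionExecOnlyWhenPreviousIsSuspended on
  *.* @@flume-backup:5140
  $ActionExecOnlyWhenPreviousIsSuspended off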

Re: Problems performance with FileChannel and HDFS Sink.

2016-02-02 Thread Gonzalo Herreros
I don't know the internal details, but I guess all those threads write to a single file, so it will reach a point where there is no improvement. On the other hand, having multiple sinks will create multiple files, which should scale better, but you need to make sure the files are written in different

Re: KafkaSource not picking up any messages

2016-02-04 Thread Gonzalo Herreros
I'm concerned with the warning "no brokers found when trying to rebalance". Double check that the path in zookeeper is correct (zk01:2181/mesos-kafka) and it's not the standard /kafka. When you connect with the kafka-console-consumer, do you specify /mesos-kafka or just zk01:2181? You can use the zkcl

Re: KafkaSource not picking up any messages

2016-02-08 Thread Gonzalo Herreros
opTime" : "0", > "KafkaCommitTimer" : "0", > "Type" : "SOURCE", > "AppendBatchAcceptedCount" : "0", > "EventReceivedCount" : "0", > "OpenConnectionCount" : "0&qu

Re: flume kafka SASL(kerberos) support

2016-02-11 Thread Gonzalo Herreros
1.6 doesn't support kafka kerberos. That upgrade is work in progress; maybe there is a nightly build already with it, but not a release. Gonzalo On 11 February 2016 at 10:39, manish jaiswal wrote: > Hi, > > > > I am not able to use kafka kerberos security auth via flume 1.6. > > can you please he

Re: HDFS as Source

2016-02-11 Thread Gonzalo Herreros
There is no hdfs source because normally you want to bring data into hadoop (it's possible to have an hdfs source but I don't think anybody has had that need). To copy data between hdfs clusters, better use "distcp", included in Hadoop. Gonzalo On 11 February 2016 at 10:41, manish jaiswal wrote: > Hi,
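
Typical usage (namenode addresses and paths are illustrative):

  hadoop distcp hdfs://src-namenode:8020/data/events hdfs://dst-namenode:8020/data/events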

Re: Flume interseptor and multiplexing

2016-02-16 Thread Gonzalo Herreros
I don't think that's possible without writing/reusing custom code. You would need an interceptor to add the header following the conditions you describe so the multiplexer can do the routing. Gonzalo On 15 February 2016 at 21:06, chandra koripella wrote: > Hi, > > > Is there a way implement mul
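
A sketch of that combination (names, regex, and header values are illustrative): a regex_extractor interceptor fills a header, and a multiplexing selector routes on it:

  agent.sources.src1.interceptors = i1
  agent.sources.src1.interceptors.i1.type = regex_extractor
  agent.sources.src1.interceptors.i1.regex = ^(\\w+)
  agent.sources.src1.interceptors.i1.serializers = s1
  agent.sources.src1.interceptors.i1.serializers.s1.name = route

  agent.sources.src1.selector.type = multiplexing
  agent.sources.src1.selector.header = route
  agent.sources.src1.selector.mapping.ERROR = ch1
  agent.sources.src1.selector.mapping.INFO = ch2
  agent.sources.src1.selector.default = ch2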

Re: Spark suppress INFO messages per Streaming Job

2016-02-24 Thread Gonzalo Herreros
The way I have done that is by having a copy of the spark config folder with the updated log4j settings and running the job with the flag that points to that configuration folder. The drawback is that if you change other Spark settings for the cluster, that job won't be updated. I guess other options

Re: Spark Streaming job Slow / Qued after 12 Hours - Spark Bug ? -- HELP please

2016-02-24 Thread Gonzalo Herreros
Assuming that the batches are just slow and queued up (if it makes no progress at all, something is wrong with the job), usually you can improve the speed by increasing the number of executors, cores or memory. It's a bit of trial/error plus observing how the job behaves. To avoid the queues the
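
For example, when submitting to YARN (the values are illustrative and depend on the cluster):

  spark-submit --master yarn \
    --num-executors 8 --executor-cores 4 --executor-memory 6g \
    my-streaming-job.jar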

Re: How to increase NUMBER of Spark Executors ?

2016-02-25 Thread Gonzalo Herreros
Local will only run with one executor; you are specifying 4 cores to be used by the executor. That affects the number of tasks an executor can run concurrently. Please note this is the Flume distribution list, not the Spark one. Gonzalo On 25 February 2016 at 02:22, Sutanu Das wrote: > Community

Re: flume problem

2016-03-02 Thread Gonzalo Herreros
Could it be that you are serializing avro instead of json? On 2 March 2016 at 08:25, Baris Akgun (Garanti Teknoloji) < barisa...@garanti.com.tr> wrote: > Hi, > > > > When I send json data to flume with using http post, flume adds > Co**ntent-Typeapplication/json** for each json post. > > > > In m

Re: flume problem

2016-03-02 Thread Gonzalo Herreros
tter data but in flume channel I saw the content type word > for each tweet. Is it normal? How can I send just the tweets json without any > content type? I took the tweets json from GNIP company. > > Thanks > Sent from my iPhone > > On 2 Mar 2016, at 10:56, Gonzalo H

Re: flume problem

2016-03-02 Thread Gonzalo Herreros
isplayName":"Twitter","link":" > http://www.twitter.com"},"link":"; > http://twitter.com/semagokcee/statuses/642910743302668288","body":"RT > @Hadis_Tweet: \"Kim sabah namazını kılarsa, Allah'ın garantisi > altı

Re: flume problem

2016-03-02 Thread Gonzalo Herreros
; tier1.sinks.sink1.hdfs.rollInterval=60 > > tier1.sinks.sink1.hdfs.rollSize = 268435456 > > tier1.sinks.sink1.hdfs.batchSize = 1 > > tier1.sinks.sink1.hdfs.writeFormat = Text > > tier1.sinks.sink1.serializer = text > > > > tier1.sources.source1.channels = channel1 &g

Re: Flume NG High Availability

2016-03-02 Thread Gonzalo Herreros
Jms queues guarantee only one of the clients will get each message. Unless you build it yourself, Flume doesn't have active/passive. HA is achieved by having multiple agents running the same configuration. On 2 March 2016 at 18:12, samnik60 . wrote: > Hi guys, > I have the following queries about

Re: Flume NG High Availability

2016-03-02 Thread Gonzalo Herreros
e , if i use a spool directory as source i cannot have > active/active or active/stand by HA , since active/active will result in > race condition when two source try to process from same directory. > > Thanks, > sam > > On Wed, Mar 2, 2016 at 1:17 PM, Gonzalo Herreros > wrot

Re: writing flume to hdfs failed

2016-03-11 Thread Gonzalo Herreros
Looks like the hdfs sink needs to be updated to support the latest Hadoop. In the meanwhile I would use an older client, which will probably work with a newer server. Alternatively you can use the Flume branch that Hortonworks compile for 2.7.1. Gonzalo On 11 March 2016 at 03:37, 2402 楊建中 joeyang wrot

Re: Flume High Available Methods

2016-03-31 Thread Gonzalo Herreros
For me the best practice is what the big vendors do and recommend. Your solution of deploying multiple identical agents sharing the group in the source is fine as long as you have a durable channel and are ok with some messages getting delayed (or even lost) when a node goes down. If you want f

Re: Description about readSmallestOffset property in Kafka channel

2016-04-08 Thread Gonzalo Herreros
I think you are right Gonzalo On 8 April 2016 at 03:39, Jeong-shik Jang wrote: > Hi Flume team and users, > > User Guide document says: > > readSmallestOffset false When set to true, the channel will read all data > in the topic, starting from the oldest event when false, it will read only > ev

Re: Flume kafka channel

2016-04-08 Thread Gonzalo Herreros
You cannot set a serializer in the channel; whatever events you put in the topic will be stored in hdfs, so you shouldn't need it. If you want to do some parsing then you can implement a sink serializer. Gonzalo On 8 April 2016 at 08:44, Baris Akgun (Garanti Teknoloji) < barisa...@garanti.com.tr> w

Re: Priority and Expiry time of flume events

2016-04-15 Thread Gonzalo Herreros
That would depend on the channel. AFAIK, all the channels provided are FIFO without expiration but technically you could implement a channel that does that. You could achieve some priority management using multiplexing. Gonzalo On 15 April 2016 at 11:38, Ronald Van De Kuil wrote: > Hello, > >

Re: Priority and Expiry time of flume events

2016-04-15 Thread Gonzalo Herreros
iority lane), and if there is a nomatch then it would route > to the default channel. And if I would need more then I would need to make > a code mod, right? > > > 2016-04-15 13:54 GMT+02:00 Gonzalo Herreros : > >> That would depend on the channel. >> AFAIK, all the chan

Re: [ERROR:]Phoenix 4.4 Plugin for Flume 1.5

2016-04-28 Thread Gonzalo Herreros
Seems your agent config file doesn't specify the host/port for the Sink. Gonzalo On 28 April 2016 at 11:13, Divya Gehlot wrote: > Hi, > I am trying to move data from hdfs to Phoenix > I downloaded the https://github.com/forcedotcom/phoenix/ > and build the project as per instrunctions in Apache

Re: [ERROR:]Phoenix 4.4 Plugin for Flume 1.5

2016-04-28 Thread Gonzalo Herreros
> Thanks for the help. > I am just a day old to Flume > > Could you please help me which host/port do I need to specify ? > > Thanks, > Divya > > > > On 28 April 2016 at 18:31, Gonzalo Herreros wrote: > >> Seems your agent config file doesn't specif

Re: Flume MorphlineSolrSink

2016-05-18 Thread Gonzalo Herreros
Flume 1.5.0 is pretty old. Why don't you use version 1.6.0, included in CDH? That will ensure the library compatibility. On 18 May 2016 at 08:43, Baris Akgun (Garanti Teknoloji) < barisa...@garanti.com.tr> wrote: > Hi, > > > > I am trying to make real time indexing with using flume 1.5.0 and > mor

Re: [Flume 1.7.0] 'parseAsFlumeEvent = false' works in Kafka channel?

2016-06-02 Thread Gonzalo Herreros
It should work as you say. I wonder how you know the events are "empty"; do you get new lines in the console consumer? Also, the example payload you show looks like avro but not the standard FlumeEvent; can you show us your agent configuration? Gonzalo On 2 June 2016 at 12:22, George M. wrote

Re: [Flume 1.7.0] 'parseAsFlumeEvent = false' works in Kafka channel?

2016-06-02 Thread Gonzalo Herreros
Don't see any reason why it shouldn't work. I would try without the morphline and the multiplexing, just to see what you get in the channel and eliminate possible suspects. My feeling is that the channel somehow is not receiving the standard FlumeEvent; it might be something changed in the new unre

Re: [Flume 1.7.0] 'parseAsFlumeEvent = false' works in Kafka channel?

2016-06-02 Thread Gonzalo Herreros
Seems the morphline is transforming the event into one without "body"; are you converting the event body into headers? parseAsFlumeEvent only handles the body; the headers are lost. On 2 June 2016 at 16:59, George M. wrote: > With the morphlines and without multiplexing > ===

Re: [Flume 1.7.0] 'parseAsFlumeEvent = false' works in Kafka channel?

2016-06-03 Thread Gonzalo Herreros
The explanation could be clearer but it boils down to this: - If you use parseAsFlumeEvent=true (the default), the events in the channel are avro FlumeEvents, so you have to read/write avro but you have all the object metadata (timestamp, headers, etc). - If you use parseAsFlumeEvent=false, the events
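
In config form (the channel name is illustrative):

  agent.channels.kafka-ch.type = org.apache.flume.channel.kafka.KafkaChannel
  # false: bodies are written to the topic as plain bytes, headers are lost;
  # true (default): events are written as avro-serialized FlumeEvents
  agent.channels.kafka-ch.parseAsFlumeEvent = false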

Re: [Flume 1.7.0] 'parseAsFlumeEvent = false' works in Kafka channel?

2016-06-03 Thread Gonzalo Herreros
Morphlines should be ok in both cases, but if you disable the FlumeEvents you will lose metadata, so the morphline should be aware of that. You can always put whatever you want in the body (e.g. a json with the headers plus the original body). If metadata is so important to you, it would be better i

Re: Can I write Sink specific interceptor for a single source?

2016-06-05 Thread Gonzalo Herreros
I think what you need is "multiplexing", you can read about it in the user guide Gonzalo On Jun 6, 2016 2:50 AM, "Santoshakhilesh" wrote: > Hi All , > > I have this particular scenario > > Avro Source -> Memory Channel - > Sink1 , Sink2 , Sink 3 > > Now I need to do some changes to original even

Re: Kafka Sink random partition assignment

2016-06-07 Thread Gonzalo Herreros
In your config, the name of the partitioner is missing a T ("Paritioner"), you should be getting an exception and maybe the sink is reverting to partition by key: relay_agent.sinks.activity_kafka_sink.kafka_partitioner.class = org.apache.kafka.clients.producer.internals.RandomParitioner Gonzalo O

Re: Kafka Sink random partition assignment

2016-06-07 Thread Gonzalo Herreros
By default a Kafka producer will choose a random partition; however, I believe the Kafka sink by default partitions on the message key, so if the key is null it won't do a good job. On 7 June 2016 at 09:43, Jason Williams wrote: > Hey Chris, > > Thanks for help! > > Is that a limitation of the F

Re: Kafka Sink Topic was overwritten by Kafka Source Topic

2016-06-12 Thread Gonzalo Herreros
You need an interceptor to update/remove the topic header. Gonzalo On Jun 12, 2016 4:57 AM, "lxw" wrote: > Hi,All: > >I use Kafka Source to read events from one Kafka topic and write events > to another Topic with Kafka Sink, > the Kafka Sink topic configuration is not work, flume still write
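
For instance, a static interceptor can overwrite the "topic" header that the Kafka source sets (interceptor name and topic value are illustrative):

  agent.sources.kafka-src.interceptors = i1
  agent.sources.kafka-src.interceptors.i1.type = static
  # replace the header even though the source already set it
  agent.sources.kafka-src.interceptors.i1.preserveExisting = false
  agent.sources.kafka-src.interceptors.i1.key = topic
  agent.sources.kafka-src.interceptors.i1.value = destination-topic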

Re: Flume 1.6 support on Kafka 0.8.2

2016-07-20 Thread Gonzalo Herreros
Apache Flume 1.6 runs on kafka 0.8; what you have is a branch from Cloudera. I would advise against that: the whole point of using a distribution is that you use component versions they have integrated and tested together. Either use kafka 0.9 or don't upgrade to 5.7 until you are ready. If you st

Re: Is it a good idea to use Flume Interceptor to process data?

2016-07-28 Thread Gonzalo Herreros
I would avoid doing calculations in the source; that can impact the ingestion and cause timeouts, duplicates, etc., especially for some sources (e.g. http). However, what I have done in the past is having a durable channel and creating a custom sink that extends a regular sink and does additional calculat

Re: Is Flume suitable for this use case?

2016-09-12 Thread Gonzalo Herreros
I think you should be able to do it using an Http source with a custom handler that understands SOAP. If not, then you need to create a plugin where you extend the standard http source. Regards, Gonzalo On 12 September 2016 at 04:32, chen dong wrote: > Hi guys, > > I am trying to load data from