Hi Israel,

sorry for the delay. I tried your suggestion but it still does not work. I have 
noticed that if I do not specify the input/output encoding, the error is the 
same (it always stops in the same event, cutting it at the same character, and 
stops processing the rest of the file). However, comparing the resulting file 
with the one we get when specifying the encoding, we have noticed some 
differences. Specifically, some events are split into two events because a 
line break is introduced (this happens when specifying the encoding). It looks 
like our files are not UTF-8, but the OS recognizes them as UTF-8 (some of 
them have a BOM and others do not). However, Flume does not recognize them as 
UTF-8 because of some weird character.
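As a quick sanity check outside Flume, a minimal Python sketch like the one below can show which BOM (if any) each file starts with; the sample bytes are just an illustration, not taken from your files:

```python
# Minimal sketch (not Flume code): inspect the first bytes of a file to
# see which byte-order mark, if any, the OS is using to guess the encoding.
BOMS = {
    b"\xef\xbb\xbf": "UTF-8",
    b"\xff\xfe": "UTF-16-LE",
    b"\xfe\xff": "UTF-16-BE",
}

def detect_bom(first_bytes: bytes) -> str:
    """Return the name of the BOM the bytes start with, or 'no BOM'."""
    for bom, name in BOMS.items():
        if first_bytes.startswith(bom):
            return name
    return "no BOM"

# Example: a UTF-8 file with BOM starts with the bytes EF BB BF.
print(detect_bom(b"\xef\xbb\xbfMar\xc3\xa9s"))  # UTF-8
```

Running it over both kinds of files (with and without BOM) should confirm whether they really share one encoding.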

Thanks for your help; any other suggestion would be much appreciated.



From: Israel Ekpo <[email protected]>
Reply-To: Flume User List <[email protected]>
Date: Tuesday, 27 August 2013 17:53
To: Flume User List <[email protected]>
Subject: Re: Events being cut by flume

The default value for the available memory specified in 
$FLUME_HOME/bin/flume-ng is very small (20MB)

So, in your $FLUME_HOME/conf/flume-env.sh file

Try increasing your Java memory to a higher value (at most 50% of the 
available RAM):
JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MaxPermSize=4096m"

Then, in your agent configuration file:

Increase the maximum number of lines per event to a much higher number (like 
5000).

Also change the output encoding to UTF-8

Let's make sure that the input encoding matches the encoding of the original 
event. This can cause problems if it is not the right one.
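Putting those suggestions together, the relevant part of the agent configuration might look roughly like this (agent name `agent`, source name `rpb`, and the spool directory path are assumptions taken from the earlier messages; adjust to your setup):

```properties
# Sketch of a spooling-directory source with a larger line limit and
# explicit UTF-8 input/output encodings; paths and names are assumptions.
agent.sources.rpb.type = spooldir
agent.sources.rpb.spoolDir = /var/spool/flume
agent.sources.rpb.inputCharset = UTF-8
agent.sources.rpb.deserializer.maxLineLength = 5000
agent.sources.rpb.deserializer.outputCharset = UTF-8
```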

Let's see if these changes make a difference.


Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with Open 
Source Software
http://massivelogdata.com


On 27 August 2013 11:13, ZORAIDA HIDALGO SANCHEZ <[email protected]> wrote:
Hi Israel,

thanks for your response. We already checked this; doing :set list in the vi 
editor, our events look like this:

"line1field1";"line1field2";"line1fieldN"$
"lineNfield1";"lineNfield2";"lineNfieldN"$

There are no event delimiters ($) between the fields of an event.
I have tried forcing the encoding (because I believe these files, which are 
generated by our customer, are converted from ASCII to UTF-8 with a BOM, and 
they could contain characters with more bytes than expected):

agent.sources.rpb.inputCharset = UTF-16
agent.sources.rpb.deserializer.maxLineLength = 250
agent.sources.rpb.deserializer.outputCharset = UTF-16

but if I use a maxLineLength of this size (250), then a lot of events are 
truncated (even though the maximum number of characters per line is 250):
13/08/27 17:03:34 WARN serialization.LineDeserializer: Line length exceeds max 
(250), truncating line!

If I take a look at the generated file, there are unrecognized characters (��) 
and events have been cut in a random way (there are lines with only 3 
characters).
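For what it's worth, the �� characters and the halved, random-looking line lengths are consistent with a decoding mismatch: if UTF-8 bytes are decoded as UTF-16, each pair of bytes collapses into one (usually nonsense) character. A minimal sketch, outside Flume:

```python
# Minimal sketch (not Flume code): decoding UTF-8 bytes as UTF-16 produces
# garbage characters and halves the apparent length, which matches the
# truncated, unreadable lines described above.
utf8_bytes = "Marés".encode("utf-8")       # 6 bytes: 'é' takes two bytes
as_utf16 = utf8_bytes.decode("utf-16-le")  # 3 nonsense characters
print(len(utf8_bytes), len(as_utf16))      # 6 3
```

So forcing inputCharset to UTF-16 on files that are actually UTF-8 (with or without BOM) could itself explain these symptoms.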

I have tried increasing the maxLineLength parameter, but I end up getting a 
Java heap space exception :(

Again, thanks. Any help would be much appreciated.



From: Israel Ekpo <[email protected]>

Reply-To: Flume User List <[email protected]>
Date: Tuesday, 27 August 2013 16:29

To: Flume User List <[email protected]>
Subject: Re: Events being cut by flume

Hello Zoraida,

What sources are your events coming from?

I have a feeling they are coming from a SpoolingDirectory source and the 
events contain newline characters (the event delimiter).

If this is the case, you are going to see the events split up whenever the 
parser encounters the delimiter.


Author and Instructor for the Upcoming Book and Lecture Series
Massive Log Data Aggregation, Processing, Searching and Visualization with Open 
Source Software
http://massivelogdata.com


On 27 August 2013 06:20, ZORAIDA HIDALGO SANCHEZ <[email protected]> wrote:

Hello,

I am having a weird problem while processing events coming from a file with 
this format:
UTF-8 Unicode (with BOM) English text, with CRLF line terminators

Some of the events in the file contain this text: "Marés". While some events 
are sent correctly without being cut by Flume, others arrive incomplete. Even 
worse, once one event has been cut, the sending of further events stops, and 
we end up with incomplete files on HDFS. We have tried to isolate the problem: 
using a roll file sink instead of HDFS, removing all the interceptors, etc. 
However, we still have the same problem. Apparently, the troublesome event 
does not contain any hidden weird characters, and the files are generated 
automatically, so we would expect that if some malformed input affected one 
event, it would affect the others too.

We really appreciate any hint you could give us.

Thanks.



________________________________

This message is intended exclusively for its addressee. We only send and 
receive email on the basis of the terms set out at:
http://www.tid.es/ES/PAGINAS/disclaimer.aspx

