Hi Jonathan,

From these elements, it might be that the segfault is related to the
info messages you receive: both would boil down to some spurious
packets being generated by nfprobe. The best way to start debugging
this is to capture, in libpcap format, the NetFlow packets hitting the
collector so that I can replay them in the lab. If feasible for you,
you can generate the capture on the collector box as follows:

shell> tcpdump -i <interface to listen> -s 0 -n -w jthorpe_netflow_trace.pcap port 2101

Then mail me the jthorpe_netflow_trace.pcap file privately. You can
stop the trace after a couple of occurrences of the info message or,
even better (depending on how large the capture file becomes), once
the collector crashes.
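
If you want a quick look at the capture yourself before sending it,
below is a minimal Python sketch of the kind of consistency check
involved. It is not pmacct code, and the function name
check_v9_payload is made up for illustration. It only assumes the
standard NetFlow v9 layout (RFC 3954): a 20-byte packet header
followed by FlowSets, each starting with a 2-byte FlowSet ID and a
2-byte length that includes that 4-byte FlowSet header. It takes the
UDP payload of one packet and reports FlowSets whose declared length
does not fit in the bytes actually received:

```python
import struct

V9_HEADER_LEN = 20    # version, count, sysUptime, unix_secs, sequence, source_id
FLOWSET_HDR_LEN = 4   # flowset_id (2 bytes) + length (2 bytes)


def check_v9_payload(payload):
    """Walk the FlowSets of one UDP payload and return a list of problems.

    Hypothetical helper for illustration only; an empty list means the
    FlowSet lengths are consistent with the payload size.
    """
    if len(payload) < V9_HEADER_LEN:
        return ["payload shorter than the 20-byte NetFlow v9 header"]

    version, _count = struct.unpack_from("!HH", payload, 0)
    if version != 9:
        return ["unexpected version %d" % version]

    problems = []
    off = V9_HEADER_LEN
    while off < len(payload):
        remaining = len(payload) - off
        if remaining < FLOWSET_HDR_LEN:
            problems.append(
                "%d trailing byte(s) at offset %d: too short for a FlowSet header"
                % (remaining, off))
            break

        flowset_id, length = struct.unpack_from("!HH", payload, off)
        if length < FLOWSET_HDR_LEN:
            # A length below 4 would stall or misdirect any parser.
            problems.append(
                "FlowSet %d at offset %d declares length %d (< 4)"
                % (flowset_id, off, length))
            break
        if length > remaining:
            problems.append(
                "FlowSet %d at offset %d declares length %d but only %d byte(s) remain"
                % (flowset_id, off, length, remaining))
            break

        off += length  # length includes the FlowSet header itself

    return problems
```

Extracting the UDP payloads from the pcap file is left to whatever
tool you prefer. Any packet for which the function returns a
non-empty list is a good candidate for the spurious packets mentioned
above.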

Cheers,
Paolo 

On Thu, May 01, 2014 at 03:24:27AM +0000, Jonathan Thorpe wrote:
> Hi All,
> 
> I have nfacctd 1.5.0rc2 collecting NetFlow v9 flows from a pair of pmacctd 
> processes which send their flows to nfacctd.
> 
> Every so often, I observe segmentation faults in nfacctd requiring me to 
> restart the daemon.
> 
> According to gdb, the issue is happening here (consistently):
> 
> ----
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `nfacctd: Core Process [default]'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000041f89c in process_v9_packet (pkt=0x80005934b00d <Address 0x80005934b00d out of bounds>, 
>     pkt@entry=0x7fff5934ae40 "\t", len=len@entry=508, pptrsv=pptrsv@entry=0x7fff59339580, req=req@entry=0x7fff59338f00, version=9)
>     at nfacctd.c:1197
> (gdb) info locals
> hdr_v9 = 0x7fff5934ae40
> hdr_v10 = 0x7fff5934ae40
> template_hdr = <optimized out>
> opt_template_hdr = <optimized out>
> tpl = <optimized out>
> data_hdr = 0x80005934b00d
> pptrs = 0x7fff59339580
> fid = <optimized out>
> off = 461
> flowoff = <optimized out>
> flowsetlen = <optimized out>
> direction = 38272
> FlowSeqInc = 1
> HdrSz = <optimized out>
> SourceId = <optimized out>
> FlowSeq = <optimized out>
> (gdb) info args
> pkt = 0x80005934b00d <Address 0x80005934b00d out of bounds>
> len = 508
> pptrsv = 0x7fff59339580
> req = 0x7fff59338f00
> version = 9
> (gdb)
> ----
> 
> I'm not an expert at understanding the gdb output, but would be happy to 
> provide the gdb output if anyone would like to have a look.
> 
> It's not clear if the segfaults are in some way related to these messages, 
> which are frequently seen in the nfacctd log (but appear harmless):
> 
> ----
> May 01 03:15:46 INFO: unable to read next Data Flowset (incomplete NetFlow v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
> May 01 03:15:53 INFO: unable to read next Data Flowset (incomplete NetFlow v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
> May 01 03:16:11 INFO: unable to read next Data Flowset (incomplete NetFlow v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
> ----
> 
> There are two pmacctd (1.5.0rc2) instances serving as nfprobes, using 
> the following configuration. The configs are the same, except for a 
> different nfprobe_engine (0:1 and 0:2) in each one.
> 
> ---
> ! pmacctd configuration
> daemonize: true
> pidfile: /var/run/pmacctd.eth2.pid
> ! syslog: daemon
> logfile: /var/log/pmacct/pmacctd.eth2.log
> 
> interface: eth2
> 
> plugins: nfprobe[probe]
> !
> nfprobe_version: 9
> nfprobe_receiver: 127.0.0.1:2100
> nfprobe_source_ip: 127.0.0.1
> nfprobe_direction[probe]: tag
> nfprobe_engine[probe]: 0:2
> 
> !plugin_buffer_size: 819200
> !plugin_pipe_size: 1638400000
> 
> plugin_buffer_size: 16384 
> plugin_pipe_size: 32768000
> 
> !
> aggregate: dst_host, src_host, src_mac, dst_mac, vlan, proto, dst_port, src_port, tag
> !
> pre_tag_map: /etc/pmacct/pretag.map
> refresh_maps: true
> pre_tag_map_entries: 3840
> --- 
> 
> The nfacctd collector (which shows the above messages and segfaults) has 
> the following config:
> 
> ----
> ! nfacctd configuration
> daemonize: true
> debug: false
> pidfile: /var/run/nfacctd.collector.pid
> ! syslog: daemon
> logfile: /var/log/pmacct/nfacctd.collector.log
> 
> ! Listen locally only
> nfacctd_ip: 127.0.0.1
> nfacctd_port: 2101
> 
> nfacctd_time_new: true
> 
> plugins: mysql[inbound], mysql[outbound]
> 
> sql_optimize_clauses: true
> 
> ! Tables for traffic accounting
> aggregate[inbound]: src_mac, dst_mac, vlan, tag, tag2, dst_host
> aggregate[outbound]: src_mac, dst_mac, vlan, tag, tag2, src_host
> 
> sql_table[inbound]:  acct_v8_5m_in
> sql_table[outbound]:  acct_v8_5m_out
> 
> sql_history_roundoff[inbound]: m
> sql_history_roundoff[outbound]: m
> 
> sql_history[inbound]: 5m
> sql_refresh_time[inbound]: 300
> sql_history[outbound]: 5m
> sql_refresh_time[outbound]: 300
> 
> sql_dont_try_update[inbound]: true
> sql_dont_try_update[outbound]: true
> sql_multi_values[inbound]: 1024000
> sql_multi_values[outbound]: 1024000
> 
> ! End tables for traffic accounting
> 
> !plugin_buffer_size: 819200
> !plugin_pipe_size: 1638400000
> 
> !plugin_buffer_size: 8192
> !plugin_pipe_size: 16384000
> 
> plugin_buffer_size: 163840
> plugin_pipe_size: 32768000
> 
> pre_tag_map: /etc/pmacct/pretag-netflow.map
> 
> pre_tag_filter[inbound]: 1
> pre_tag_filter[outbound]: 2
> 
> refresh_maps: true
> pre_tag_map_entries: 3840
> 
> sql_host: localhost
> sql_user: <removed>
> sql_db: <removed>
> sql_passwd: <removed>
> 
> ! in case of emergency, log to this file
> sql_recovery_logfile[inbound]: /var/lib/pmacct/recovery-in_log
> sql_recovery_logfile[outbound]: /var/lib/pmacct/recovery-out_log
> ----
> 
> This is running from a Debian 7.4 server.
> 
> Does anyone have any thoughts as to why we might be seeing nfacctd segfault 
> occasionally and also the occasional "unable to read next Data Flowset" 
> messages?
> 
> Kind Regards,
> Jonathan
> 
> _______________________________________________
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists
