Re: [Ntop-misc] Few general questions on using nprobe as a collector with Kafka

Luca Deri Wed, 13 Dec 2017 02:42:07 -0800

Hi Mark
please see below, but first of all please move to 8.2, as we have fixed
many issues and made many improvements, in particular when collecting flows
https://www.ntop.org/category/nprobe/


On 12/12/2017 06:09 PM, Mark Petronic wrote:
> I am fairly new to nprobe and have been experimenting with its many
> command-line options. I have a few general questions on which I would
> appreciate any clarification.
>
> nprobe -v
>
> Welcome to nProbe v.8.0.171020 (r5797) for x86_64-unknown-linux-gnu
> with native PF_RING acceleration.
> Copyright 2002-17 ntop.org
>
> Build OS:      CentOS Linux release 7.3.1611 (Core)
> SystemID:      68A2B43E76056A7E
> GIT rev:       8.0-stable:478c52c6ce70feaf6c65fe4806be05f75fe0e196:20171020
> License:       Invalid nProbe license (/etc/nprobe.license) [Missing license file]
>
>
> Q1. When running on a multi-core host, will nprobe utilize all cores?
> I thought I saw somewhere that it is single-threaded, but now I cannot
> find that reference. This question bears on sizing my hardware. I am
> seeing ~5% CPU load for one router's flows (about 2500 flow
> records/sec). I will ultimately need more than 20x this volume, so I
> will eventually need to deploy N hosts in a full production setup. I
> just want to know whether any settings are needed to let nprobe
> fully utilize all cores on a given host.

nprobe uses a single core by design: with RSS you can spawn one
instance per core. From our tests, in collector mode it should be able
to handle ~20k flows per core.
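As a rough illustration of the sizing arithmetic behind Q1, the sketch below assumes that "~20k flows per core" means about 20,000 flow records per second per core (the unit is not stated explicitly) and that each nprobe instance occupies exactly one core. The function name is made up for this example.

```python
import math

# Assumption: ~20k flow records/sec per core in collector mode, read from
# the "~20k flows/core" figure above. A planning number, not a guarantee.
FLOWS_PER_CORE_PER_SEC = 20_000


def instances_needed(total_flows_per_sec: float,
                     per_core: float = FLOWS_PER_CORE_PER_SEC) -> int:
    """Hypothetical helper: number of single-core nprobe instances (one
    per RSS queue/core) needed to absorb total_flows_per_sec."""
    return max(1, math.ceil(total_flows_per_sec / per_core))
```

With the numbers from Q1 (2500 records/sec from one router, 20x that in production, i.e. 50,000 records/sec), `instances_needed(50_000)` gives 3 instances; in practice you would probably leave some spare capacity per core on top of that.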
>
> Q2. I am running with this configuration:
>
> [root@vmwdnacollector01 ~]# cat /etc/nprobe/nprobe.conf 
> --interface=none
> --collector=none
> --collector-port=2055
> --verbose=1 
> --flow-version=9 
> --hash-size=262144
> --kafka="kafka01:9092;netflow-raw;1"
> --dump-stats=/var/log/nprobe/stats.txt
> --event-log=/var/log/nprobe/events.txt
> -T="%IPV4_SRC_ADDR %IPV4_DST_ADDR %L4_SRC_PORT %L4_DST_PORT
> %IPV4_SRC_MASK %IPV4_DST_MASK %IPV4_NEXT_HOP %IN_PKTS %IN_BYTES
> %OUT_PKTS %OUT_BYTES %FIRST_SWITCHED %LAST_SWITCHED %TCP_FLAGS
> %PROTOCOL %SRC_TOS %DIRECTION %EXPORTER_IPV4_ADDRESS"
>
> I am collecting NetFlow v9 records from a Cisco router. I was
> expecting the record to include the IP address of the router,
> because I need it to know where the data came from for upstream
> enrichment.
>
> I have nprobe publishing to Kafka. But looking at the raw flows
> coming from the router, there is no field that identifies the router
> IP. So I experimented and added a -T <template> definition that
> matches the actual fields coming from the router. Then I added
> the %EXPORTER_IPV4_ADDRESS field (which is NOT in the raw record from
> the router) and voilà, the IP address of the router shows up in that
> field. So I assume that nprobe simply puts the source IP address
> of each incoming flow record into that field, and maps each
> field in the incoming flow record onto the matching field in my
> defined template, sort of "cherry-picking" the fields out of the
> source record and packing them into my template.
>
> So, my question on this point is: am I doing this correctly by
> defining my own template? It seems like the only way to do it.
This is correct. In 8.2 and later this is done automatically when
exporting via ZMQ, and as of this email we are extending it to Kafka as well.
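To make the "cherry-picking" behaviour described in Q2 concrete, here is a conceptual model of it, not nprobe's actual code: fields present in the incoming NetFlow v9 record are copied into the user-defined -T template, and EXPORTER_IPV4_ADDRESS is filled from the source address of the UDP packet that carried the record. The template here is a shortened, hypothetical subset of the one in the config above, and how nprobe treats template fields missing from the incoming record is an assumption (this sketch just skips them).

```python
from typing import Mapping

# Hypothetical, shortened -T template (field names as in the config above).
TEMPLATE: tuple[str, ...] = (
    "IPV4_SRC_ADDR", "IPV4_DST_ADDR", "L4_SRC_PORT", "L4_DST_PORT",
    "IN_PKTS", "IN_BYTES", "PROTOCOL", "EXPORTER_IPV4_ADDRESS",
)


def apply_template(record: Mapping[str, object], exporter_ip: str,
                   template: tuple[str, ...] = TEMPLATE) -> dict[str, object]:
    """Model of mapping one collected flow onto the output template."""
    out: dict[str, object] = {}
    for field in template:
        if field == "EXPORTER_IPV4_ADDRESS":
            # Not in the raw v9 record: taken from the UDP packet's source IP.
            out[field] = exporter_ip
        elif field in record:
            # "Cherry-picked" from the incoming flow record.
            out[field] = record[field]
        # Template fields absent from the record are skipped in this sketch.
    return out
```

For example, `apply_template({"IPV4_SRC_ADDR": "10.0.0.1", "IN_BYTES": 1500, "SRC_AS": 65000}, "192.0.2.1")` keeps the two template fields, drops SRC_AS (not in the template), and adds the exporter address.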
>
> Q3. It appears that, in the mode I am operating in, no license is
> required. When I run in the mode where nprobe sniffs packets from my
> local interface, it produces only 25K flows and then stops if there is
> no license. However, in collector mode, where it just receives flows
> from a router and forwards them as JSON to Kafka, it runs for millions
> of flows. So my question here is: do I need a license for this use case?
>
Yes, you need a license.

> Q4. The Kafka producer has a boatload of configuration options, but
> nprobe exposes only a couple of basic ones (topic, acks, brokers). Is
> that it, or is there some way to pass additional configuration to the
> embedded producer? For example, to properly aggregate flows, I would
> like to partition the topic on IPV4_SRC_ADDR. I am running in a
> multi-tenant environment where tenants can have overlapping private
> IP addresses that we see in the flows. So I need to aggregate the
> flows by TENANT_ID + IPV4_SRC_ADDR, for example. I see no way to
> configure this with nprobe in Kafka mode.

This is not currently possible, but Simone is the Kafka expert: if you can
agree with him on what extensions are needed, we'll implement them.
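The partitioning requested in Q4 boils down to producing each record with a message key built from TENANT_ID + IPV4_SRC_ADDR, so that a keyed partitioner sends all flows for one pair to the same partition. The sketch below only models that routing; nprobe does not expose it (per the reply above). TENANT_ID is a hypothetical field from your own enrichment, and CRC32 is used as a stand-in for Kafka's default murmur2-based partitioner, so real producers will pick different partition numbers.

```python
import zlib


def partition_key(tenant_id: str, src_ip: str) -> bytes:
    """Hypothetical message key: tenant first, so the same private IP
    seen in two tenants yields two different keys."""
    return f"{tenant_id}|{src_ip}".encode()


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Deterministic key -> partition mapping. CRC32 stands in for
    Kafka's murmur2 here; only the idea (same key, same partition)
    carries over, not the exact partition numbers."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions
```

The design point is that ordering and aggregation per key only hold within a partition, so the tenant has to be part of the key; partitioning on IPV4_SRC_ADDR alone would merge overlapping private addresses from different tenants.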
>
> Q5. Is there any way to bind nprobe to a specific interface when it is
> used as a collector in my use case? I might need to run multiple
> instances on a single host, and I want to be able to configure routers
> to send their flow records to specific IP addresses so that I can
> load-balance the flows over N nprobe instances on a single host. I
> cannot find any configuration option that binds the UDP listening port
> to a specific interface.

The -n option already accepts the IP:port format. So you would need the
same for the -3 option, correct?
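For reference, what Q5 asks of the -3 option corresponds to binding the UDP listening socket to one local address instead of all of them, so several collectors can share the same port on different IPs. This is only a plain-socket illustration of those semantics, not nprobe's implementation or its CLI parsing; the helper names and the example addresses are hypothetical, and only IPv4 "IP:port" strings are handled.

```python
import socket


def parse_endpoint(spec: str, default_port: int = 2055) -> tuple[str, int]:
    """Split 'IP:port' (IPv4 only in this sketch). A bare port, or an
    empty string, means 'listen on all addresses'."""
    if ":" in spec:
        host, port = spec.rsplit(":", 1)
        return host, int(port)
    return "0.0.0.0", (int(spec) if spec else default_port)


def open_collector_socket(spec: str) -> socket.socket:
    """Open a UDP socket that only receives datagrams sent to the given IP."""
    host, port = parse_endpoint(spec)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock
```

With per-address binding, one instance could listen on "10.1.0.11:2055" and another on "10.1.0.12:2055", and each router would be pointed at one of those addresses to spread the load.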

Regards Luca

>
> Thanks for any insights into my questions.
>
> _______________________________________________
> Ntop-misc mailing list
> [email protected]
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc
