Hello,

Let me share my recent work:
https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow

This standalone command-line tool captures network packets from
network interface devices, converts them into the Apache Arrow data
format according to the pre-defined data schema for each supported
protocol (TCP, UDP, and ICMP over IPv4 or IPv6), and then writes them
out to the destination files.

It internally uses PF_RING [*1] to support fast network interface
cards (> 10Gb) and to minimize packet loss by utilizing multi-core
CPUs.
Although I confirmed that Pcap2Arrow can write out captured network
packets at a rate of more than 50Gb/s, my test cases used artificial
and biased traffic patterns. If you can test the software in your
environment, your feedback would help improve it.
[*1] https://www.ntop.org/products/packet-capture/pf_ring/

As you may know, network traffic data tends to grow very large, so it
is not easy to import it into database systems for analytics. Once
the data is converted into Apache Arrow, we no longer need to import
the captured data. We just map the files prior to analytics.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>
