Nick,

Have you tried https://github.com/kaitoy/pcap4j?
I’ve used this in a Spark app already and didn’t have any issues. My use case was slightly different than yours, but you should give it a try.

From: Nick Allen <[email protected]>
Date: Friday, January 16, 2015 at 10:09 AM
To: "[email protected]" <[email protected]>
Subject: How to 'Pipe' Binary Data in Apache Spark

I have an RDD containing binary data. I would like to use 'RDD.pipe' to pipe that binary data to an external program that will translate it to string/text data. Unfortunately, it seems that Spark is mangling the binary data before it gets passed to the external program.

This code is representative of what I am trying to do. What am I doing wrong? How can I pipe binary data in Spark? Maybe it is getting corrupted when I read it in initially with 'textFile'?

    bin = sc.textFile("binary-data.dat")
    csv = bin.pipe("/usr/bin/binary-to-csv.sh")
    csv.saveAsTextFile("text-data.csv")

Specifically, I am trying to use Spark to transform pcap (packet capture) data to text/csv so that I can perform an analysis on it.

Thanks!

--
Nick Allen <[email protected]>
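On the 'textFile' question in the quoted message: 'textFile' reads its input as newline-delimited text, so arbitrary bytes are unlikely to survive it unchanged. Below is a minimal sketch of one alternative, not a tested solution for this exact setup. It assumes a PySpark version that provides `sc.binaryFiles`, that each input file fits in memory on an executor, and that the converter reads raw bytes on stdin and writes text to stdout. The names `run_converter`, `convert_with_spark`, and `HEX_DUMP_CMD` are hypothetical and used only for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for a real converter such as binary-to-csv.sh:
# it prints the hex encoding of whatever bytes arrive on stdin.
HEX_DUMP_CMD = [
    sys.executable, "-c",
    "import sys; print(sys.stdin.buffer.read().hex())",
]


def run_converter(raw_bytes, command):
    """Send raw bytes to the command's stdin and return its stdout as text lines."""
    result = subprocess.run(
        command,
        input=raw_bytes,          # passed through as bytes, no text decoding
        stdout=subprocess.PIPE,
        check=True,               # raise if the converter exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace").splitlines()


def convert_with_spark(sc, input_path, command):
    """Assumed usage: read whole files as bytes, then convert each one.

    sc.binaryFiles yields (path, content) pairs with content as bytes,
    so the data never goes through text-line parsing.
    """
    return sc.binaryFiles(input_path).flatMap(
        lambda path_and_bytes: run_converter(path_and_bytes[1], command)
    )
```

With a live SparkContext, something like `convert_with_spark(sc, "binary-data.dat", ["/usr/bin/binary-to-csv.sh"]).saveAsTextFile("text-data.csv")` would mirror the original three lines. The trade-off is that this launches one external process per file rather than streaming records the way 'RDD.pipe' does.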
