Storage capable of keeping up with 10G/20G packet capture doesn't have to be extremely expensive...
We built this with a commodity host, multiple 10G NICs, and multiple SAS HBAs, each attached to a JBOD enclosure of at least 36 4TB 7.2k commodity SATA3 disks. In our configuration, this delivers 58 TB per JBOD enclosure. Properly tuned, and with a little commodity SSD cache, it delivers synchronous sequential reads and writes over 2.5 GB/sec (and incredible random speeds which I can't recall off the top of my head), all for under $25k. It could yield less or much more, depending on your redundancy/striping choices.

Run out of room? Fill another JBOD shelf for ~$18k. You could opt for lower parity than we did, or fewer stripes; either one would stretch the space out by quite a bit (at least 20 TB). I didn't want to be constantly changing drives out, however.

On Apr 13, 2012 1:46 AM, "Jimmy Hess" <mysi...@gmail.com> wrote:

> On Thu, Apr 12, 2012 at 4:18 PM, Ian McDonald <i...@st-andrews.ac.uk> wrote:
> > You'll need to build an array that'll random read/write upwards of 200MB/s if you
> > want to get a semi-reliable capture to disk. That means SSD if you're very rich, or many spindles
>
> Hey,
>
> Saving packet captures to a file is a ~98% asynchronous write, 2% read workload, and ~95% sequential activity. You might also think about applying some variant of header compression to the packets during capture, to trade a little CPU and some extra RAM for storage efficiency. The PCAP format, which saves the raw packet header bits directly to disk, is not necessarily among the most I/O- or space-efficient on-disk storage formats to pick.
>
> Random writes should only occur if you are saving your captures to a fragmented file system, which is not recommended; avoiding fragmentation is important. Random reads aren't involved in archiving data, only in analyzing it.
>
> Do you make random reads into your saved capture files? You're probably more likely to be doing a sequential scan, even during analysis; random reads imply you have already indexed the dataset and are seeking a smaller number of specific records to collect information about them.
>
> Read requirements are totally dependent on your analysis workload, e.g. table scan vs. index search. Depending on what the analysis is, it may even make sense to keep extra filtered copies of the data, using more disk space, in order to avoid a random access pattern. If you are building a database of analysis results from the raw data, you can use a separate random-I/O-optimized disk subsystem for the stats database.
>
> If you really need approximately 200 MB/s with some random read performance for analysis, you should probably be looking at building a RAID50 with several 4-drive sets and 1 GB+ of writeback cache. RAID10 makes more sense when writes are not sequential, when external storage is actually shared by multiple applications, or when a disk drive failure must be truly transparent; but there is a huge capacity sacrifice in choosing mirroring over parity.
>
> There is a time vs. cost tradeoff with regard to analyzing the data. When your analysis tools start reading data, the reads increase disk access time and therefore reduce write performance, so the reads should be throttled; a higher-capability disk subsystem can absorb more of that analysis load without throttling, but the higher the capability, the higher the cost.
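To put rough numbers on the RAID50 suggestion above, here is a small back-of-the-envelope sketch. The set count, drive size, and 256 KB chunk size are illustrative assumptions, not anyone's actual configuration:

    # Back-of-the-envelope RAID50 sizing -- illustrative assumptions only,
    # not the configuration described in the posts above.

    def raid50_layout(sets, drives_per_set, drive_tb, chunk_kb):
        """Usable capacity and full-stripe write size for a RAID50 array
        built from `sets` RAID5 groups of `drives_per_set` drives each."""
        data_drives_per_set = drives_per_set - 1   # RAID5: one drive's worth of parity per set
        usable_tb = sets * data_drives_per_set * drive_tb
        # A full-stripe write touches every data chunk in every set exactly once;
        # this is the size a writeback cache tries to batch writes up to.
        full_stripe_kb = sets * data_drives_per_set * chunk_kb
        return usable_tb, full_stripe_kb

    usable, stripe = raid50_layout(sets=3, drives_per_set=4, drive_tb=4, chunk_kb=256)
    print(f"usable capacity:   {usable} TB")    # 3 * 3 * 4   = 36 TB
    print(f"full-stripe write: {stripe} KB")    # 3 * 3 * 256 = 2304 KB (~2.25 MB)

If the capture process (or the controller's writeback cache) can batch writes up to at least the full-stripe size, the array avoids read-modify-write parity updates, which is what keeps purely sequential capture traffic fast.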
> Performing your analysis ahead of time via pre-caching, or at least indexing newly captured data in small chunks on a continuous basis, may be useful to minimize the amount of searching of the raw dataset later. A small SSD or a separate mirrored drive pair for that function would avoid adding load to the "raw capture storage" disk system, if your analysis requirements are amenable to that pattern.
>
> Modern OSes cache some recent filesystem data in RAM. So if the server capturing data has sufficient RAM, analyzing data while it's still hot in the page cache, and saving that analysis in an efficient index for later use, can be useful.
>
> > (preferably 15k's) in a stripe/raid10 if you're building from your scrap pile. Bear in mind that write cache won't help you, as the io isn't going to be bursty, rather a continuous stream.
>
> Not really... A good read cache is more important for the analysis, but an efficient write cache on your array, plus the OS page cache, is still highly beneficial, especially because it can ensure that your RAID subsystem is performing full-stripe writes for maximal efficiency of sequential write activity, and it can delay the media write until the optimal moment based on platter position and sequence the read/write requests. As long as the storage system behind the cache can, on average, drain the cache faster than you fill it with data a sufficient amount of the time, the write cache serves an important function.
>
> Your I/O may be a continuous stream, but there are most certainly variations and spikes in the rate of packets and in the performance of mechanical disk drives.
>
> > Aligning your partitions with the physical disk geometry can produce surprising speedups, as can stripe block size changes, but that's generally empirical, and depends on your workload.
>
> For RAID systems, partitions should absolutely be aligned if the OS install defaults don't align them correctly; on a modern OS, the defaults are normally OK. Having an unaligned or improperly aligned partition is just a misconfiguration; a track crossing for every other sector read is an easy way of doubling the size of small I/Os.
>
> You won't notice it with this particular use case, though: when you are writing large blocks, say a 100 MB chunk, asynchronously, you won't notice a 63 kB difference; it's a small fraction of a percent of your transfer size. This is primarily a concern during analysis or database searching, which may involve small random reads and small synchronous random writes.
>
> In other words, you will probably get away with just ignoring partition alignment and filesystem block size, so there are other aspects of the configuration to be more concerned about (YMMV).
>
> --
> -JH
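As a small illustration of the partition alignment point (my own sketch, not from the thread): given a partition's starting LBA, for example as reported by /sys/block/sdX/sdX1/start on Linux, you can check whether it lands on a stripe/chunk boundary. The 256 KB chunk size below is an assumption for the example:

    # Quick partition-alignment check -- a sketch assuming a Linux-style
    # starting LBA and a hypothetical 256 KB RAID chunk size.

    SECTOR_BYTES = 512   # sysfs reports partition start offsets in 512-byte units

    def is_aligned(start_lba, chunk_bytes):
        """True if the partition's first byte falls on a chunk boundary."""
        return (start_lba * SECTOR_BYTES) % chunk_bytes == 0

    # Legacy DOS-style partitioning started the first partition at LBA 63
    # (31.5 kB in), which is misaligned for any power-of-two chunk size.
    # Modern tools default to LBA 2048 (a 1 MiB offset), which is aligned
    # for the common chunk sizes.
    for lba in (63, 2048):
        print(lba, "aligned:", is_aligned(lba, chunk_bytes=256 * 1024))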