Ken,

We don't have something like that now. It shouldn't be too hard to write one, though. You would probably need some kind of time-based partitioning into different files.
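One way to sketch that time-based partitioning (a minimal illustration, not an existing Kafka tool; the function names and the hourly bucket size are assumptions, and it assumes each message arrives with a producer-supplied millisecond timestamp):

```python
import os
import time

def partition_path(base_dir, topic, timestamp_ms):
    """Map a message's millisecond timestamp to an hourly file
    under base_dir/topic/YYYY-MM-DD/HH.dat (UTC)."""
    t = time.gmtime(timestamp_ms / 1000.0)
    day = time.strftime("%Y-%m-%d", t)
    hour = time.strftime("%H", t)
    return os.path.join(base_dir, topic, day, hour + ".dat")

def append_message(base_dir, topic, timestamp_ms, payload):
    """Append one message's raw bytes to the file for its time bucket."""
    path = partition_path(base_dir, topic, timestamp_ms)
    # Create the per-day directory on first use; safe if it already exists.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(payload)
```

Rolling by hour keeps any single file from growing without bound; the right bucket size (and whether to also roll by file size) would depend on the data rate.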
Thanks,

Jun

On Fri, May 16, 2014 at 4:40 PM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:

> Hi Jun,
>
> I was wondering if there was something out there already. GPFS appears to
> the OS as a local filesystem, so if there were a consumer that dumped to
> the local filesystem, we'd be gold.
>
> Thanks,
> --Ken
>
> On May 16, 2014, at 7:04 PM, Jun Rao <jun...@gmail.com> wrote:
>
>> You probably would have to write a consumer app to dump data in binary
>> form to GPFS or NFS, since the HDFS API is very specialized.
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, May 16, 2014 at 8:17 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
>>
>>> Hi all,
>>>
>>> Sorry for the possible repost; I hadn't seen this on the list after 18
>>> hours and figured I'd try again.
>>>
>>> We are experimenting with using Kafka as a midpoint between microscopes
>>> and a Spark cluster for data analysis. Our microscopes almost universally
>>> use Windows machines for acquisition (as do most scientific instruments),
>>> and our compute cluster (which runs Spark, among many other things) runs
>>> Linux. We primarily use Isilon for file storage, although we also have a
>>> GPFS cluster for HPC.
>>>
>>> We have a working HTTP POST system going into Kafka from the Windows
>>> acquisition machine, which is performing more reliably and faster than an
>>> SMB connection to the Isilon or GPFS clusters. Unfortunately, the Spark
>>> streaming consumer is much slower than reading from disk (Isilon or GPFS)
>>> on the Spark cluster.
>>>
>>> My proposal would be not only to improve the Spark streaming, but also to
>>> have a consumer (or multiple consumers!) that writes to disk, either over
>>> NFS or "locally" via a GPFS client.
>>>
>>> As I am a systems engineer, I'm not equipped to write this, so I'm
>>> wondering if anyone has done this sort of thing with Kafka before. I know
>>> there are HDFS consumers out there, and our Isilons can do HDFS, but the
>>> implementation on the Isilon is very limited at this time, and the
>>> ability to write to a local filesystem or NFS would give us much more
>>> flexibility.
>>>
>>> Ideally, I would like to be able to use Kafka as a high-speed transfer
>>> point between acquisition instruments (usually running Windows) and
>>> several kinds of storage, so that we could write virtually simultaneously
>>> to archive storage for the raw data and to HPC scratch for data analysis,
>>> thereby limiting the penalty incurred from data movement between storage
>>> tiers.
>>>
>>> Thanks for any input you have,
>>>
>>> --Ken
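The disk-dumping consumer proposed in the thread could be sketched roughly as follows (a minimal sketch under stated assumptions: the function name, path layout, and hourly bucketing are all illustrative; in a real deployment `records` would be fed by a Kafka consumer client polling the brokers, and `out_dir` could sit on an NFS or GPFS mount, since these are plain POSIX appends):

```python
import os
import time

def consume_to_disk(records, out_dir):
    """Append each (timestamp_ms, payload) record to an hourly file.

    `records` is any iterable of (int, bytes) pairs; in production it
    would be driven by a Kafka consumer loop. `out_dir` may be a local,
    NFS, or GPFS path, since only ordinary file appends are used.
    """
    os.makedirs(out_dir, exist_ok=True)
    for ts_ms, payload in records:
        # Bucket records into one file per UTC hour, e.g. 2014-05-16-19.dat
        bucket = time.strftime("%Y-%m-%d-%H", time.gmtime(ts_ms / 1000.0))
        path = os.path.join(out_dir, bucket + ".dat")
        with open(path, "ab") as f:
            f.write(payload)
```

Because the writes are ordinary appends, the same loop could fan out to both an archive mount and an HPC scratch mount, which is the simultaneous-write behavior the thread asks for.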