Ken,

We don't have something like that now. It shouldn't be too hard to write one,
though. You would probably need some kind of time-based partitioning into
different files.
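
Roughly, a sketch of what that consumer could look like (this uses the Java
consumer API; the broker address, topic name, group id, and output mount
point below are just placeholders for illustration):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.Properties;

// Consumes a topic and appends message payloads to hour-partitioned files
// on a mounted filesystem (NFS or GPFS). All names below are placeholders.
public class FileSinkConsumer {
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "file-sink");               // placeholder group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("microscope-data")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                // Time-based partitioning: one file per hour, e.g.
                // /mnt/gpfs/kafka-dump/microscope-data-2014-05-16-16.bin
                String bucket = LocalDateTime.now().format(HOURLY);
                String path = Paths.get("/mnt/gpfs/kafka-dump",   // placeholder mount
                        "microscope-data-" + bucket + ".bin").toString();
                try (FileOutputStream out = new FileOutputStream(path, true)) {
                    for (ConsumerRecord<byte[], byte[]> rec : records) {
                        // Payloads are appended back to back; add your own
                        // framing if message boundaries matter downstream.
                        out.write(rec.value());
                    }
                }
            }
        }
    }
}

Offsets are auto-committed by default, so this gives you at-least-once
delivery into the files; duplicates are possible after a restart.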

Thanks,

Jun


On Fri, May 16, 2014 at 4:40 PM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:

> Hi Jun,
>
> I was wondering if there was something out there already. GPFS appears to
> the OS as a local filesystem, so if there were a consumer that dumped to the
> local filesystem, we'd be gold.
>
> Thanks,
> --Ken
>
> On May 16, 2014, at 7:04 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > You probably would have to write a consumer app to dump data in binary
> > form to GPFS or NFS, since the HDFS API is quite specialized.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, May 16, 2014 at 8:17 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
> >
> >> Hi all,
> >>
> >> Sorry for the possible repost--hadn't seen this on the list after 18
> >> hours and figured I'd try again...
> >>
> >> We are experimenting with using Kafka as a midpoint between microscopes
> >> and a Spark cluster for data analysis. Our microscopes almost universally
> >> use Windows machines for acquisition (as do most scientific instruments),
> >> and our compute cluster (which runs Spark, among many other things) runs
> >> Linux. We use Isilon primarily for file storage, although we also have a
> >> GPFS cluster for HPC.
> >>
> >> We have a working HTTP POST system going into Kafka from the Windows
> >> acquisition machine, which is performing more reliably and faster than an
> >> SMB connection to the Isilon or GPFS clusters. Unfortunately, the Spark
> >> streaming consumer is much slower than reading from disk (Isilon or GPFS)
> >> on the Spark cluster.
> >>
> >> My proposal would be not only to improve the Spark streaming, but also to
> >> have a consumer (or multiple consumers!) that writes to disk, either over
> >> NFS or "locally" via a GPFS client.
> >>
> >> As I am a systems engineer, I'm not equipped to write this, so I'm
> >> wondering if anyone has done this sort of thing with Kafka before. I know
> >> there are HDFS consumers out there, and our Isilons can do HDFS, but the
> >> HDFS implementation on the Isilon is very limited at this time, and the
> >> ability to write to a local filesystem or NFS would give us much more
> >> flexibility.
> >>
> >> Ideally, I would like to be able to use Kafka as a high-speed transfer
> >> point between acquisition instruments (usually running Windows) and several
> >> kinds of storage, so that we could write virtually simultaneously to
> >> archive storage for the raw data and to HPC scratch for data analysis,
> >> thereby limiting the penalty incurred from data movement between storage
> >> tiers.
> >>
> >> Thanks for any input you have,
> >>
> >> --Ken
>
>
