need help with simple http request mapper

2014-12-21 Thread kmatzen
I have what I think is a pretty simple task, and one that works pretty well with Celery. I wanted to see how easy it was to configure for Spark, since I already run a Mesos cluster for something else. But I had a pretty hard time getting Spark configured so that i

JVM heap and native allocation questions

2014-08-19 Thread kmatzen
I'm trying to use Spark to process some data using some native functions I've integrated using JNI, and I pass around a lot of memory I've allocated inside these functions. I'm not very familiar with the JVM, so I have a couple of questions. (1) Performance seemed terrible until I LD_PRELOAD'ed l

Re: OpenCV + Spark : Where to put System.loadLibrary ?

2014-08-19 Thread kmatzen
Reviving this thread, hoping I might be able to get an exact snippet for the correct way to do this in Scala. I had a solution for OpenCV that I thought was correct, but half the time the library was not loaded by the time it was needed. Keep in mind that I am completely new to Scala, so you're going

s3:// sequence file startup time

2014-08-16 Thread kmatzen
I have some RDDs stored as s3://-backed sequence files sharded into 1000 parts. The startup time is pretty long (tens of minutes). It's communicating with S3, but I don't know what it's doing. Is it just fetching the metadata from S3 for each part? Is there a way to pipeline this with the com

No space left on device

2014-08-08 Thread kmatzen
I need some configuration/debugging recommendations to work around "no space left on device". I am completely new to Spark, but I have some experience with Hadoop. I have a task where I read images stored in sequence files from s3://, process them with a map in Scala, and write the result back