Re: How to execute a function from class in distributed jar on each worker node?

2014-11-11 Thread aaronjosephs
I'm not sure that this will work, but it makes sense to me. Basically, you write the functionality in a static block in a class and broadcast that class. Not sure what your use case is, but I need to load a native library and want to avoid running the init in mapPartitions if it's not necessary (just

Re: Keep state inside map function

2014-07-30 Thread aaronjosephs
Use `mapPartitions` to get the equivalent functionality to Hadoop. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Keep-state-inside-map-function-tp10968p10969.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Streaming with long batch / window duration

2014-07-21 Thread aaronjosephs
So I think I may end up using Hourglass (https://engineering.linkedin.com/datafu/datafus-hourglass-incremental-data-processing-hadoop), a Hadoop framework for incremental data processing. It would be very cool if Spark (not Streaming) could support something like this.

Re: Spark Streaming with long batch / window duration

2014-07-18 Thread aaronjosephs
Unfortunately, for reasons I won't go into, my options for what I can use are limited. It was more of a curiosity to see if Spark could handle a use case like this, since the functionality I wanted fit perfectly into the `reduceByKeyAndWindow` frame of thinking. Anyway, thanks for answering.

Re: NullPointerException When Reading Avro Sequence Files

2014-07-18 Thread aaronjosephs
I think you probably want to use `AvroSequenceFileInputFormat` with `newAPIHadoopFile`. I'm not even sure that in Hadoop you would use `SequenceFileInputFormat` to read an Avro sequence file.

Spark Streaming with long batch / window duration

2014-07-18 Thread aaronjosephs
Would it be a reasonable use case of Spark Streaming to have a very large window size (let's say on the scale of weeks)? In this particular case the reduce function would be invertible, so that would aid in efficiency. I assume that having a larger batch size since the window is so large would also l
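To illustrate why an invertible reduce function helps with a large window, here is a rough plain-Python sketch, not Spark code. It assumes the reduce is addition (inverse: subtraction) and that each batch has already been reduced to one number; `windowed_sums` and `window_batches` are hypothetical names.

```python
from collections import deque
from typing import Deque, Iterable, List


def windowed_sums(batches: Iterable[int], window_batches: int) -> List[int]:
    """Sliding-window sum that updates incrementally instead of re-reducing."""
    window: Deque[int] = deque()
    total = 0
    results = []
    for batch_value in batches:
        window.append(batch_value)
        total += batch_value  # reduce: fold in the batch entering the window
        if len(window) > window_batches:
            total -= window.popleft()  # inverse reduce: drop the batch leaving it
        results.append(total)
    return results


sums = windowed_sums([1, 2, 3, 4, 5], window_batches=3)
```

Each slide costs one add and one subtract regardless of window length, which is the efficiency the invertible form of `reduceByKeyAndWindow` is meant to provide.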

Re: Difference among batchDuration, windowDuration, slideDuration

2014-07-16 Thread aaronjosephs
The only other thing to keep in mind is that window duration and slide duration have to be multiples of batch duration. IDK if you made that fully clear.
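The constraint can be checked up front with a few lines of plain Python. This is only a sketch of the rule stated above; the helper `check_durations` and the millisecond units are assumptions, not part of Spark.

```python
from typing import List


def check_durations(batch_ms: int, window_ms: int, slide_ms: int) -> List[str]:
    """Return a list of problems; an empty list means the durations are consistent."""
    problems = []
    if window_ms % batch_ms != 0:
        problems.append("window duration is not a multiple of batch duration")
    if slide_ms % batch_ms != 0:
        problems.append("slide duration is not a multiple of batch duration")
    return problems


ok = check_durations(batch_ms=2000, window_ms=10000, slide_ms=4000)
bad = check_durations(batch_ms=2000, window_ms=5000, slide_ms=4000)
```

Here `ok` is empty, while `bad` reports the window duration, since 5000 ms is not a whole number of 2000 ms batches.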