spark-submit reset JVM

2016-03-20 Thread Udo Fholl
Hi all, Is there a way for spark-submit to restart the JVM on the worker machines? Thanks. Udo.

More than one StateSpec in the same application

2016-02-15 Thread Udo Fholl
Hi all, Does each StateSpec have its own state, or is the state per stream, so that all StateSpecs over the same stream share the state? Thanks. Best regards, Udo.
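
A minimal sketch of the API in question, assuming a pairStream: DStream[(String, Int)] already defined (the names and Int types are illustrative, not from this thread). As far as I can tell, StateSpec is only a configuration object; each mapWithState invocation maintains its own state store, keyed by the stream's keys.

    import org.apache.spark.streaming.{State, StateSpec}

    // The State handle passed here belongs to the mapWithState transformation
    // it is used in, keyed by the stream's keys.
    val countFunc = (key: String, value: Option[Int], state: State[Int]) => {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (key, sum)
    }

    // Two specs applied to the same pair DStream produce two independent
    // stateful streams, each with its own state store.
    val countsA = pairStream.mapWithState(StateSpec.function(countFunc))
    val countsB = pairStream.mapWithState(StateSpec.function(countFunc))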

Re: Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage

2016-02-06 Thread Udo Fholl
…thod) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157) On Fri, Feb 5, 2016 at 10:44 AM, Udo Fholl wrote: > It does not look like it. Here is the output of "grep -A2 -i waiting spark_tdump.log": > "RMI TCP Co…

Re: Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage

2016-02-05 Thread Udo Fholl
…n prio=5 tid=20 WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) -- "JobGenerator" daemon prio=5 tid=82 WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:…

Re: Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage

2016-02-04 Thread Udo Fholl
…not push data into BlockManager, there may be something wrong in BlockGenerator. Could you share the top 50 objects in the heap dump and the thread dump? On Wed, Feb 3, 2016 at 9:52 AM, Udo Fholl wrote: >> Hi all, >> I recently migrated from 'updateStateByKey'…

Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage

2016-02-03 Thread Udo Fholl
Hi all, I recently migrated from 'updateStateByKey' to 'mapWithState' and now I see a huge increase in memory. Most of it is a massive "BlockGenerator" (which points to a massive "ArrayBlockingQueue" that in turn points to a huge "Object[]"). I'm pretty sure it has to do with my code, but I barely…
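
For reference, a rough sketch of the shape of such a migration, assuming a pairStream: DStream[(String, Int)] (illustrative names and types, not the poster's actual code):

    import org.apache.spark.streaming.{State, StateSpec}

    // Old style: updateStateByKey keeps a running total per key.
    val totals = pairStream.updateStateByKey[Int] { (values: Seq[Int], current: Option[Int]) =>
      Some(current.getOrElse(0) + values.sum)
    }

    // New style: mapWithState with an explicit StateSpec and per-record mapping function.
    val mappingFunc = (key: String, value: Option[Int], state: State[Int]) => {
      val total = state.getOption.getOrElse(0) + value.getOrElse(0)
      state.update(total)
      (key, total)
    }
    val totalsWithState = pairStream.mapWithState(StateSpec.function(mappingFunc))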

Spark Streaming: Dealing with downstream services faults

2016-02-03 Thread Udo Fholl
Hi all, I need to send the results of our aggregations to an external service, and I need to make sure that these results are actually sent. My current approach is to send them in an invocation of "foreachRDD". But how is that going to work with failures? Should I instead use "mapWithState" then "tra…
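
One common shape for this is sketched below with a hypothetical ExternalServiceClient and an assumed statefulStream: DStream[String]; neither comes from the thread. Note that output actions in foreachRDD can be re-executed after a failure, so the receiving service should ideally tolerate duplicates.

    // Placeholder client; a real implementation would go here.
    class ExternalServiceClient {
      def send(record: String): Unit = ???
      def close(): Unit = ()
    }

    statefulStream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One client per partition, created on the executor rather than serialized from the driver.
        val client = new ExternalServiceClient()
        try {
          records.foreach(client.send)
        } finally {
          client.close()
        }
      }
    }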

Spark streaming archive results

2016-02-03 Thread Udo Fholl
Hi, I want to aggregate on a small window and send downstream every 30 secs, but I would also like to store the outcome in our archive every 20 min. My current approach (simplified version) is: val stream = // val statedStream = stream.mapWithState(stateSpec) val archiveStream = statedStream…
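
A possible way to complete that sketch is with two windows over the stateful stream; sendDownstream and archive below are placeholders, and the window/slide durations assume a batch interval that evenly divides them:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Minutes, Seconds}

    def sendDownstream(rdd: RDD[_]): Unit = ???   // placeholder output action
    def archive(rdd: RDD[_]): Unit = ???          // placeholder archival action

    val statedStream = stream.mapWithState(stateSpec)

    // Small window, emitted downstream every 30 seconds.
    statedStream.window(Seconds(30), Seconds(30)).foreachRDD(rdd => sendDownstream(rdd))

    // Larger view, written to the archive every 20 minutes.
    statedStream.window(Minutes(20), Minutes(20)).foreachRDD(rdd => archive(rdd))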

Re: mapWithState: remove key

2016-02-01 Thread Udo Fholl
…* Check with `exists()` whether the state exists or not before calling `get()`. * @throws java.util.NoSuchElementException If the state does not exist. */ On Fri, Jan 29, 2016 at 10:45 AM, Udo Fholl wrote: >> Hi, …
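
That quoted scaladoc is from State.get(); a minimal sketch of guarding it inside a mapping function, assuming a pairStream: DStream[(String, Int)] (illustrative names):

    import org.apache.spark.streaming.{State, StateSpec}

    val mappingFunc = (key: String, value: Option[Int], state: State[Int]) => {
      // get() throws NoSuchElementException when no state exists for the key,
      // so check exists() first (or use getOption()).
      val previous = if (state.exists()) state.get() else 0
      val sum = previous + value.getOrElse(0)
      state.update(sum)
      (key, sum)
    }

    val updated = pairStream.mapWithState(StateSpec.function(mappingFunc))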

mapWithState: remove key

2016-01-29 Thread Udo Fholl
Hi, From the signature of the "mapWithState" method I infer that by returning a "None.type" (in Scala) the key is removed from the state. Is that so? Sorry if it is in the docs, but it wasn't entirely clear to me. I'm chaining operations and calling "mapWithState" twice (once to consolidate, then…
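
For what it's worth, in the 1.6 API a key is dropped from the state store by calling remove() on the State handle rather than through the mapping function's return value; the return value only feeds the mapped stream. A small sketch, with made-up types, a made-up "DONE" marker, and an assumed pairStream: DStream[(String, String)]:

    import org.apache.spark.streaming.{State, StateSpec}

    val mappingFunc = (key: String, value: Option[String], state: State[Long]) => {
      value match {
        case Some("DONE") =>
          // Drop the key's state; returning None here only means "emit nothing".
          if (state.exists()) state.remove()
          None
        case Some(v) =>
          state.update(state.getOption.getOrElse(0L) + 1L)
          Some((key, v))
        case None =>
          None   // e.g. a timed-out key, if a timeout is configured on the StateSpec
      }
    }

    val mapped = pairStream.mapWithState(StateSpec.function(mappingFunc))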

mapWithState: multiple operations on the same stream

2016-01-29 Thread Udo Fholl
Hi, I have a stream where I need to process events, send them to another service, and then remove the key from the state. I'm storing state because some events are delayed. My current approach is to consolidate the state, storing it with a mapWithState invocation. Then, rather than using a forea…
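
One hedged sketch of that kind of pipeline, assuming made-up Event, isComplete, and sendToService helpers and an eventStream: DStream[(String, Event)]: consolidate per key, emit only completed groups (removing their state), and push them out in a single output action.

    import org.apache.spark.streaming.{State, StateSpec}

    case class Event(id: String, payload: String)                    // placeholder event type
    def isComplete(events: Seq[Event]): Boolean = events.size >= 3   // placeholder completeness check
    def sendToService(group: (String, Seq[Event])): Unit = ???       // placeholder sender

    val consolidate = (key: String, event: Option[Event], state: State[Seq[Event]]) => {
      val events = state.getOption.getOrElse(Seq.empty[Event]) ++ event
      if (isComplete(events)) {
        if (state.exists()) state.remove()   // done with this key
        Some((key, events))
      } else {
        state.update(events)                 // keep waiting for delayed events
        None
      }
    }

    val completed = eventStream.mapWithState(StateSpec.function(consolidate))

    // Only completed groups reach the external service.
    completed.foreachRDD { rdd =>
      rdd.flatMap(_.toList).foreachPartition(_.foreach(sendToService))
    }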