Re: Going it alone.

2020-04-15 Thread Matt Smith
This is so entertaining. 1. Ask for help 2. Compare those you need help from to a lower order primate. 3. Claim you provided information you did not 4. Explain that providing any information would be "too revealing" 5. ??? Can't wait to hear what comes next, but please keep it up. This is a brig

Grouping into Arrays

2016-10-24 Thread Matt Smith
I worked up the following for grouping a DataFrame by a key and aggregating into arrays. It works, but I think it is horrible. Is there a better way? Especially one that does not require RDDs? This is a common pattern we need as we often want to explode JSON arrays, do something to enrich the
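For illustration only, the result shape described above (one row per key, with the remaining columns collected into arrays) can be sketched in plain Python outside Spark. The helper name `group_into_arrays` and the sample rows are hypothetical and not taken from the original message. In Spark's DataFrame API, `groupBy` combined with the `collect_list` aggregate function is one candidate for producing this shape without dropping to RDDs, though whether it covers the full use case in the truncated message is an assumption.

```python
from collections import defaultdict


def group_into_arrays(rows, key):
    """Group dict rows by `key`, collecting every other column into a list."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != key:
                # Append this row's value to the array for its key and column.
                grouped[row[key]][column].append(value)
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {k: dict(columns) for k, columns in grouped.items()}


# Hypothetical sample data: exploded rows that share an "id" key.
rows = [
    {"id": 1, "tag": "a"},
    {"id": 1, "tag": "b"},
    {"id": 2, "tag": "c"},
]
result = group_into_arrays(rows, "id")
```

Here `result` maps each key to its collected arrays, preserving the input order of the rows within each key.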

Iterative mapWithState

2016-08-30 Thread Matt Smith
Is it possible to use mapWithState iteratively? In other words, I would like to keep calling mapWithState with the output from the last mapWithState until there is no output. For a given minibatch, mapWithState could be called anywhere from 1 to roughly 200 times, depending on the input and current state.
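The control flow being asked about, feeding each round's output back in as the next round's input until nothing is emitted, can be sketched in plain Python. This is not Spark code and does not use mapWithState; the names `iterate_until_empty` and `step`, the toy step function, and the `max_rounds` safety cap (set to 200 to mirror the upper bound mentioned above) are all assumptions made for illustration.

```python
def iterate_until_empty(step, state, inputs, max_rounds=200):
    """Apply `step` repeatedly, feeding its output back in, until no output remains.

    `step(state, inputs)` must return a `(new_state, outputs)` pair.
    `max_rounds` is a hypothetical safety cap against non-terminating loops.
    """
    rounds = 0
    while inputs and rounds < max_rounds:
        state, inputs = step(state, inputs)
        rounds += 1
    return state, rounds


def step(state, inputs):
    """Toy stateful step: add inputs to a running total, re-emit n - 1 for n > 1."""
    new_state = {"total": state["total"] + sum(inputs)}
    outputs = [n - 1 for n in inputs if n > 1]
    return new_state, outputs


final_state, rounds_used = iterate_until_empty(step, {"total": 0}, [3, 1])
```

With the inputs `[3, 1]`, the toy step emits `[2]`, then `[1]`, then nothing, so the loop stops after three rounds.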

Spark Streaming batch sequence number

2016-08-29 Thread Matt Smith
Is it possible to get a sequence number for the current batch (i.e., the first batch is 0, the second is 1, etc.)?