Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/136#issuecomment-37734634

It is hard to say what threshold to use. I couldn't think of a use case that requires a large window size, but I can't rule one out. Another possible approach is to pass all parent partitions to SlidingRDDPartition and then retrieve the tail to append in compute(). If we find that we need to scan many partitions to assemble the tail, we could log a warning. I'm not sure whether this would be more efficient than the current implementation.
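For illustration only, a rough Scala sketch of that alternative is below. The `followers` field, the `windowSize` parameter name, and the "more than one extra partition scanned" warning threshold are assumptions made for the sake of the example, not the PR's actual code:

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition that keeps its own parent partition plus references to
// every later parent partition, so compute() can pull the tail lazily.
private class SlidingRDDPartition[T](
    override val index: Int,
    val self: Partition,
    val followers: Seq[Partition]) extends Partition

// Sketch of the alternative approach: each output partition scans forward through
// the following parent partitions inside compute() until it has collected
// windowSize - 1 tail elements, and warns if the scan touched many partitions.
class SlidingRDD[T: ClassTag](
    @transient val parent: RDD[T],
    val windowSize: Int) extends RDD[Seq[T]](parent) {

  require(windowSize > 1, "Window size must be greater than 1.")

  override def getPartitions: Array[Partition] = {
    val parentPartitions = parent.partitions
    Array.tabulate[Partition](parentPartitions.length) { i =>
      new SlidingRDDPartition[T](i, parentPartitions(i), parentPartitions.drop(i + 1))
    }
  }

  override def compute(split: Partition, context: TaskContext): Iterator[Seq[T]] = {
    val part = split.asInstanceOf[SlidingRDDPartition[T]]
    val tail = ArrayBuffer.empty[T]
    var scanned = 0
    val followers = part.followers.iterator
    // Pull elements from following parent partitions until the tail is full.
    while (tail.size < windowSize - 1 && followers.hasNext) {
      val next = followers.next()
      tail ++= firstParent[T].iterator(next, context).take(windowSize - 1 - tail.size)
      scanned += 1
    }
    if (scanned > 1) {
      logWarning(s"Scanned $scanned parent partitions to assemble the tail of " +
        s"partition ${part.index}; consider repartitioning the input.")
    }
    // Emit only full windows; partial windows at the very end are dropped.
    (firstParent[T].iterator(part.self, context) ++ tail)
      .sliding(windowSize)
      .withPartial(false)
      .map(_.toSeq)
  }
}
```

The upside would be that getPartitions stays trivial; the downside is that a task may end up computing several parent partitions just to collect a short tail, which is what the warning is meant to surface.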