Hello,

This is about SPARK-3276: I would like to make MIN_REMEMBER_DURATION (currently
a constant) a configurable variable with a default value. Before spending
effort on developing something and creating a pull request, I wanted to
consult the core developers to see which approach makes the most sense and
has the higher probability of being accepted.

The constant MIN_REMEMBER_DURATION can be seen at:


https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L338

It is marked as a private member of the private[streaming] object
FileInputDStream.

Approach 1: Make MIN_REMEMBER_DURATION a variable, renaming it to
minRememberDuration, and add a new fileStream method to
JavaStreamingContext.scala:


https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala

such that the new fileStream method accepts an extra parameter, e.g.
minRememberDuration: Int (in seconds), and uses this value to set the
private minRememberDuration.
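
To make the discussion concrete, here is a rough sketch of what Approach 1
could look like from the caller's side. The extra minRememberDuration
parameter is purely my assumption and does not exist in Spark today; the
proposed overload is shown commented out, while the rest uses the existing
API:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object Approach1Sketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("approach-1-sketch").setMaster("local[2]")
      val ssc  = new StreamingContext(conf, Seconds(10))

      // Existing API: monitor a directory for new files.
      val stream = ssc.fileStream[LongWritable, Text, TextInputFormat]("/tmp/input")

      // Proposed, hypothetical overload -- it does NOT exist today. The extra
      // argument (in seconds) would be used to set the private
      // minRememberDuration instead of the hard-coded MIN_REMEMBER_DURATION:
      //
      //   val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
      //     "/tmp/input", minRememberDuration = 120)

      stream.map(_._2.toString).print()
      ssc.start()
      ssc.awaitTermination()
    }
  }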


Approach 2: Create a new, public Spark configuration property, e.g.
spark.rememberDuration.min (with a default value of 60 seconds), and set
the private variable minRememberDuration to the value of this property.
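
Similarly, a rough sketch of Approach 2 from the user's side. The property
name spark.rememberDuration.min is only my proposed name and is not read by
Spark today, and the internal snippet in the comment is an assumption about
how FileInputDStream could pick it up:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object Approach2Sketch {
    def main(args: Array[String]): Unit = {
      // "spark.rememberDuration.min" is only the proposed property name; Spark
      // does not read it today. FileInputDStream would look it up and fall back
      // to the current 60-second default when it is not set.
      val conf = new SparkConf()
        .setAppName("approach-2-sketch")
        .setMaster("local[2]")
        .set("spark.rememberDuration.min", "120")

      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.textFileStream("/tmp/input").print()

      // Internally, the private constant could then become something like:
      //   private val minRememberDuration =
      //     Seconds(ssc.conf.getInt("spark.rememberDuration.min", 60))

      ssc.start()
      ssc.awaitTermination()
    }
  }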


Approach 1 would mean adding a new method to the public API, whereas
Approach 2 would mean creating a new public Spark property. Right now,
Approach 2 seems simpler and more straightforward to me, but nevertheless I
wanted to get the opinions of other developers who know the internals of
Spark better than I do.

Kind regards,
Emre Sevinç
