Hi Matei, thanks for the comments.

On Mon, Jun 23, 2014 at 7:58 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> When we did the configuration pull request, we actually avoided having a big
> list of defaults in one class file, because this creates a file that all the
> components in the project depend on. For example, since we have some settings
> specific to streaming and the REPL, do we want those settings to appear in a
> file that’s in “core”? It might be better to just make sure we use constants
> for defaults in the code, or maybe have a separate class per project.
I'm actually following the style used in Hadoop here, but expanding it a
little bit to add type-safety too (e.g., see DFSConfigKeys.java in HDFS).
I sort of see your point of "everybody needs to depend on this file", but
then all the affected code already depends on SparkConf.

> The other problem with this kind of change is that it’s disruptive to all the
> other ongoing patches, so I wouldn’t consider it high-priority right now. We
> haven’t had a ton of problems with settings being mistyped.

Yes, large changes like this are always disruptive. But since it mostly
touches setup code, not the innards of algorithms and the like, it shouldn't
be hard to merge things.

> If you do want to do something like this though, apart from the comment above
> about modules, please make sure this is not a public API. As soon as we add
> it to the API, it means we can’t change or remove those config settings. I’d
> also suggest giving each config setting a single name instead of having “ui”,
> “shuffle”, etc objects, since the chained calls to conf.ui.port.value look
> somewhat confusing.

Making it not public is easy; but then how do we want to expose this to
users? It's normal right now, as you say, to hardcode the settings in the
code (or in config files, which this change would not help with).

In the process of writing the p.o.c., I found cases of the same setting with
different casing in different source files, and of the same setting with
different defaults. So maybe it has not affected any users yet, but it's easy
to miss things like that with the current approach.

I don't have a good suggestion for separating the module-specific configs; I
kinda liked the "everything under SparkConf" approach and tried to group
settings in a way that made sense. Creating objects in each module to hold
the configs for that module is possible, but would probably look a bit
uglier.

But in any case, this is not urgent, more of a nice thing to have in my
opinion.

--
Marcelo
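
P.S.: To make the idea a bit more concrete, here's a rough sketch of what a
typed key could look like. The ConfigKey class and the UI object below are
made-up names for illustration, not the actual p.o.c. code; the only real
APIs assumed are SparkConf.getInt / getBoolean.

    import org.apache.spark.SparkConf

    // Rough sketch only: a typed key carries its name, type and default,
    // so every call site agrees on all three.
    class ConfigKey[T](val key: String, val default: T,
        reader: (SparkConf, String, T) => T) {
      def value(conf: SparkConf): T = reader(conf, key, default)
    }

    object UI {
      // Name, type and default declared exactly once, instead of hardcoded
      // strings repeated (and occasionally mistyped) across source files.
      val port = new ConfigKey[Int]("spark.ui.port", 4040,
        (c, k, d) => c.getInt(k, d))
      val enabled = new ConfigKey[Boolean]("spark.ui.enabled", true,
        (c, k, d) => c.getBoolean(k, d))
    }

    // Usage (hypothetical): val port = UI.port.value(conf)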