I think that's a bit orthogonal: right now you can't specify continuous spaces, so the straightforward thing is to allow random sampling from a big grid. You can of course create a geometric series of values to try - 0.001, 0.01, 0.1, etc. Yes, I get that if you're sampling randomly, you can sample from a continuous space of many kinds. I just don't know whether that helps enough to justify the API change (and continuous spaces don't make as much sense for grid search). It certainly helps if you're doing a smarter search over the space, like what hyperopt does - but for that, one can already use hyperopt + Spark ML if desired.
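To make the "randomly sample a fraction of the grid" idea concrete, here is a minimal sketch, assuming Spark ML is on the classpath (the LogisticRegression estimator, the parameter choices, and the 0.25 fraction are placeholders, not a proposed API):

import scala.util.Random
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.ParamGridBuilder

val lr = new LogisticRegression()

// Build the full Cartesian grid exactly as ParamGridBuilder does today,
// using a geometric series of values for the regularization strength.
val fullGrid: Array[ParamMap] = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(1e-4, 1e-3, 1e-2, 1e-1))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()

// ...then randomly keep a fraction of the combinations.
val fraction = 0.25
val sampled: Array[ParamMap] = Random.shuffle(fullGrid.toSeq)
  .take(math.ceil(fullGrid.length * fraction).toInt)
  .toArray

The resulting Array[ParamMap] can be passed to CrossValidator.setEstimatorParamMaps just like a grid built today, so no other part of the tuning API would need to change.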
On Fri, Jan 29, 2021 at 9:01 AM Phillip Henry <londonjava...@gmail.com> wrote:

> Thanks, Sean! I hope to offer a PR next week.
>
> Not sure about a dependency on the grid search, though - but happy to hear
> your thoughts. I mean, you might want to explore logarithmic space evenly.
> For example, something like "please search 1e-7 to 1e-4" leads to a
> reasonably random sample being {3e-7, 2e-6, 9e-5}. These are (roughly)
> evenly spaced in logarithmic space but not in linear space. So, saying what
> fraction of a grid search to sample wouldn't make sense (unless the grid
> was warped, of course).
>
> Does that make sense? It might be better for me to just write the code as
> I don't think it would be very complicated.
>
> Happy to hear your thoughts.
>
> Phillip
>
>
> On Fri, Jan 29, 2021 at 1:47 PM Sean Owen <sro...@gmail.com> wrote:
>
>> I don't know of anyone working on that. Yes I think it could be useful. I
>> think it might be easiest to implement by simply having some parameter to
>> the grid search process that says what fraction of all possible
>> combinations you want to randomly test.
>>
>> On Fri, Jan 29, 2021 at 5:52 AM Phillip Henry <londonjava...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have no work at the moment so I was wondering if anybody would be
>>> interested in me contributing code that generates an Array[ParamMap] for
>>> random hyperparameters?
>>>
>>> Apparently, this technique can find a hyperparameter in the top 5% of
>>> parameter space in fewer than 60 iterations with 95% confidence [1].
>>>
>>> I notice that the Spark code base has only the brute-force
>>> ParamGridBuilder, unless I am missing something.
>>>
>>> Hyperparameter optimization is an area of interest to me, but I don't
>>> want to re-invent the wheel. So, if this work is already underway or there
>>> are libraries out there to do it, please let me know and I'll shut up :)
>>>
>>> Regards,
>>>
>>> Phillip
>>>
>>> [1]
>>> https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html
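For reference, a minimal sketch of the log-space random sampling Phillip describes above: the logUniform helper is hypothetical (not an existing Spark API), and 60 draws matches the bound cited in [1]. Sampling uniformly in log space makes values like {3e-7, 2e-6, 9e-5} roughly equally likely, which is the behavior a "fraction of the grid" parameter cannot express.

import scala.util.Random
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap

// Hypothetical helper: draw n values uniformly in log space between lo and hi,
// i.e. exp of a uniform draw on [log(lo), log(hi)].
def logUniform(lo: Double, hi: Double, n: Int, rng: Random = new Random()): Seq[Double] =
  Seq.fill(n)(math.exp(math.log(lo) + rng.nextDouble() * (math.log(hi) - math.log(lo))))

val lr = new LogisticRegression()

// 60 random candidates over [1e-7, 1e-4], matching the example in the thread.
val randomGrid: Array[ParamMap] =
  logUniform(1e-7, 1e-4, n = 60)
    .map(v => new ParamMap().put(lr.regParam, v))
    .toArray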