Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I think that's a bit orthogonal; right now you can't specify continuous spaces. The straightforward approach is to allow random sampling from a big grid. You can create a geometric series of values to try, of course: 0.001, 0.01, 0.1, etc. Yes, I get that if you're randomly choosing, you can randomly…
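You don't have to type out a geometric series such as 0.001, 0.01, 0.1 by hand; you can generate it from its endpoints. The following is a minimal Python sketch. `geometric_grid` is a hypothetical helper, not a Spark API, and behaves like `numpy.geomspace`. Its output is just a list of discrete candidate values, the kind a grid builder accepts.

```python
import math


def geometric_grid(low: float, high: float, num: int) -> list[float]:
    """Return `num` values from `low` to `high` inclusive, evenly spaced on a
    log scale, i.e. a geometric series with a constant ratio between terms.

    Hypothetical helper for illustration; not part of Spark.
    """
    if low <= 0 or high <= 0:
        raise ValueError("a geometric grid needs strictly positive bounds")
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [low]
    ratio = (high / low) ** (1.0 / (num - 1))  # constant factor between terms
    return [low * ratio ** i for i in range(num)]


# The example series from the post (values are approximate due to floating point)
print(geometric_grid(0.001, 0.1, 3))
```

In PySpark, a list like this could be passed as the values argument of `ParamGridBuilder.addGrid`. The grid still contains only discrete points, so this does not lift the restriction on continuous spaces.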

Re: Public API access to UDTs

2021-01-29 Thread Fitch, Simeon
On Fri, Jan 29, 2021 at 9:46 AM Sean Owen wrote:
> Are there implications for storing UDTs in particular engines or formats?

I've found that UDTs can be written to and read from Parquet without problems. They also work fine with PySpark, provided mirror classes are implemented. Without properly constructed mirror classes, they show…

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Thanks, Sean! I hope to offer a PR next week. I'm not sure about a dependency on the grid search, though, but I'm happy to hear your thoughts. I mean that you might want to explore a logarithmic space evenly. For example, a request like "please search 1e-7 to 1e-4" would lead to a reasonably random sample being {3…
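Exploring a logarithmic range evenly usually means drawing the exponent uniformly and then exponentiating it, which is known as log-uniform sampling. Below is a minimal Python sketch that assumes independent draws. `sample_log_uniform` is a hypothetical name, and the seeded `random.Random` is only there to make runs reproducible.

```python
import math
import random


def sample_log_uniform(low: float, high: float, n: int, rng: random.Random) -> list[float]:
    """Draw n values whose base-10 logarithm is uniform on [log10(low), log10(high)].

    Every decade in the range (e.g. 1e-7..1e-6 and 1e-5..1e-4) receives, on
    average, the same share of samples. Hypothetical helper; not a Spark API.
    """
    if low <= 0 or high <= low:
        raise ValueError("need 0 < low < high")
    lo, hi = math.log10(low), math.log10(high)
    return [10.0 ** rng.uniform(lo, hi) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sample is reproducible
print(sample_log_uniform(1e-7, 1e-4, 5, rng))
```

For the 1e-7 to 1e-4 range, each of the three decades gets about a third of the samples. Sampling uniformly on the raw scale would instead put roughly 90% of the draws between 1e-5 and 1e-4, because that decade covers (1e-4 - 1e-5) / (1e-4 - 1e-7) ≈ 0.90 of the interval.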

Re: Public API access to UDTs

2021-01-29 Thread Sean Owen
I'm also interested: are there problems with opening up this API beyond needing to freeze it and keep it stable? It's pretty stable. Could it be exposed as @DeveloperApi, at least? Are there implications for storing UDTs in particular engines or formats? Just making it public for developers, even with a 'use at your own…

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I don't know of anyone working on that. Yes, I think it could be useful. It might be easiest to implement by simply adding a parameter to the grid search process that says what fraction of all possible combinations you want to test at random.

On Fri, Jan 29, 2021 at 5:52 AM Phillip Henry…
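You can sketch the "fraction of all possible combinations" idea by enumerating the full grid and drawing a random subset of it without replacement. This is a minimal Python illustration, not Spark code. `random_subset_of_grid` and its `fraction` argument are hypothetical, and the parameter names in the example are only dictionary keys. The sketch materializes the whole Cartesian product, which is fine for modest grids but wasteful for very large ones.

```python
import itertools
import math
import random
from typing import Any, Dict, List, Sequence


def random_subset_of_grid(
    grid: Dict[str, Sequence[Any]], fraction: float, seed: int = 0
) -> List[Dict[str, Any]]:
    """Return a random `fraction` of all combinations in a full grid.

    `grid` maps each hyperparameter name to its candidate values. At least one
    combination is always returned, and no combination is repeated.
    Hypothetical helper for illustration; not a Spark API.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    names = list(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    k = max(1, math.ceil(fraction * len(combos)))
    chosen = random.Random(seed).sample(combos, k)  # sampling without replacement
    return [dict(zip(names, values)) for values in chosen]


grid = {
    "regParam": [0.001, 0.01, 0.1, 1.0],
    "elasticNetParam": [0.0, 0.5, 1.0],
    "maxIter": [50, 100],
}
subset = random_subset_of_grid(grid, fraction=0.25, seed=7)
print(len(subset))  # → 6, a quarter of the 4 * 3 * 2 = 24 combinations
```

In Spark, you could get a comparable effect by randomly sampling the array that `ParamGridBuilder.build()` returns before handing it to the tuning estimator.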

Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Hi, I have no work at the moment, so I was wondering whether anybody would be interested in my contributing code that generates an Array[ParamMap] of random hyperparameters. Apparently, this technique can find a hyperparameter combination in the top 5% of the parameter space in fewer than 60 iterations with 95% confidence…
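The "fewer than 60 iterations" figure follows from a short calculation. Assume each trial is an independent, uniform draw from the search space. A single draw then misses the top 5% with probability 0.95, so n draws all miss it with probability 0.95^n. Requiring 1 - 0.95^n ≥ 0.95 gives n ≥ ln 0.05 / ln 0.95 ≈ 58.4, i.e. 59 trials. Here is a Python sketch of the same arithmetic, with `trials_needed` as a hypothetical helper:

```python
import math


def trials_needed(top_fraction: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - top_fraction)**n >= confidence.

    Assumes each trial is an independent, uniform draw from the search space,
    so a single draw misses the top `top_fraction` with probability
    (1 - top_fraction). Hypothetical helper for illustration.
    """
    if not (0.0 < top_fraction < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


print(trials_needed(0.05, 0.95))  # → 59
```

The bound ignores how the parameter space is shaped. It only says how likely random search is to land somewhere in the best 5% of it. It does not say how close that point is to the true optimum.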