Morning everyone! The question may seem too broad, but I'll try to keep it as focused as possible:
I'm used to working with Spark SQL and DataFrames on a daily basis: grouping, computing extra counters, using built-in functions or UDFs. Now I've come to a scenario where I need to make some predictions, and linear regression seems to be the way to go. Looking through the docs, this belongs to the ML side of Spark, where I've never been before, so:

- How does working with Spark ML compare to what I'm used to? Training models, building a new one, adding more columns and so on. Is there a big change, or am I just confused and it's actually pretty easy?
- When deploying ML pipelines, is there anything to take into account compared to the usual Spark SQL jobs?
- Is it even possible to run linear regression (or any other ML method) inside a traditional pipeline, without training or any other ML-specific machinery?

Some guidelines (or articles, references to the docs) to get started would be helpful. Thanks!
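For context, after skimming the docs, this is roughly what I understand a minimal Spark ML linear regression would look like (untested sketch; the DataFrame `df` and its column names `x1`, `x2`, `y` are made up):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// Assume `df` is an existing DataFrame with numeric feature
// columns "x1", "x2" and a numeric label column "y".

// Spark ML estimators expect features packed into a single vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")

val lr = new LinearRegression()
  .setFeaturesCol("features")
  .setLabelCol("y")

// Stages run in order, like a chain of DataFrame transformations.
val pipeline = new Pipeline().setStages(Array(assembler, lr))

val model = pipeline.fit(df)          // training step
val predictions = model.transform(df) // adds a "prediction" column
```

If that's the gist of it, it doesn't look too far from the DataFrame transformations I already write, but I'd like confirmation from someone who has actually used it in production.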