Morning everyone,

The question may seem too broad, but I'll try to condense it as much as possible:

I'm used to working with Spark SQL, DataFrames and such on a daily basis:
grouping, computing extra counters, and using built-in functions or UDFs.
Now I've come to a scenario where I need to make some predictions, and
linear regression looks like the way to go.

However, looking through the docs, this belongs to the ML side of Spark,
and I've never been there before...

How does working with Spark ML compare to what I'm used to? Training
models, building new ones, adding more columns and so on... Is it really a
big change, or am I just confused and it's actually straightforward?

When deploying ML pipelines, is there anything to take into account
compared to the usual Spark SQL ones?

And... is it even possible to apply linear regression (or any other ML
method) inside a traditional pipeline, without dealing with training or
other ML-specific aspects?

Some guidelines (or articles, or pointers to the docs) would be helpful to
get started, if possible.

Thanks!
