How many iterations are you doing on the data? Like Jörn said, you don't
necessarily need a billion samples for linear regression.
On Tue, Aug 22, 2017 at 6:28 PM, Sea aj wrote:
Jörn,
My question is not about the model type but about Spark's ability to reuse
an already trained ML model when training a new model.
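To make the question concrete, here is a minimal sketch of what "reusing an already trained model" could mean for linear regression: the weights learned on one chunk become the starting point for the next chunk. This is plain Python rather than a Spark API; `sgd_pass`, `make_chunk`, the step size, and the synthetic data are all hypothetical and chosen only for illustration. Whether a given Spark estimator accepts initial weights should be checked against the documentation for the version in use.

```python
import random

def sgd_pass(weights, chunk, step=0.01):
    """Run one SGD pass over `chunk`, starting from `weights` (warm start)."""
    w = list(weights)
    for features, label in chunk:
        prediction = sum(wi * xi for wi, xi in zip(w, features))
        error = prediction - label
        # Gradient of the squared error for one row is error * features.
        for i, xi in enumerate(features):
            w[i] -= step * error * xi
    return w

rng = random.Random(42)
true_w = [2.0, -1.0, 0.5]  # hypothetical ground truth; the last entry is the bias

def make_chunk(n):
    """Generate n synthetic rows; the constant 1.0 feature carries the bias."""
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        y = sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0, 0.1)
        rows.append((x, y))
    return rows

weights = [0.0, 0.0, 0.0]
for _ in range(5):
    chunk = make_chunk(2000)              # only one chunk is held in memory
    weights = sgd_pass(weights, chunk)    # previous weights seed the next pass

print(weights)
```

Each call starts from the previous weights instead of from zero, so the data never has to be held in memory all at once.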
On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke wrote:
Is it really necessary to have one billion samples for just linear regression?
Your model would probably do equally well with far fewer samples. Have you
checked bias and variance when training on a much smaller random sample?
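One way to run the check suggested above is to fit the same model on random samples of increasing size and compare training error with error on held-out data. The following is a minimal sketch in plain Python on synthetic data, not on the poster's dataframe; `fit_line`, `mse`, the sample sizes, and the generating line y = 3x + 2 are assumptions made for illustration.

```python
import random

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, points):
    """Mean squared error of a (slope, intercept) model on `points`."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

rng = random.Random(0)
data = []
for _ in range(50000):
    x = rng.uniform(-5, 5)
    data.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)))  # synthetic stand-in

holdout, train = data[:10000], data[10000:]

results = {}
for size in (100, 1000, 10000, 40000):
    sample = rng.sample(train, size)
    model = fit_line(sample)
    results[size] = (model, mse(model, sample), mse(model, holdout))
    print(size, results[size])
```

If the held-out error stops improving well before the full sample size, additional rows are adding little. A large gap between training and held-out error points to variance; high error on both points to bias.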
On 22. Aug 2017, at 12:58, Sea aj wrote:
I have a large dataframe of 1 billion rows of type LabeledPoint. I tried to
train a linear regression model on the df, but it failed due to lack of
memory, although I'm using 9 slaves, each with 100 GB of RAM and 16 CPU
cores.
I decided to split my data into multiple chunks and train the model in
Hello Mahesh,
We have built one. You can download it here:
https://www.sparkflows.io/download
Feel free to ping me with any questions.
Best Regards,
Jayant
On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <
mahesh_sawai...@persistent.com> wrote:
> Hi,
>
>
> 1) Is anyone aware of any wor