Jorn,

My question is not about the model type but about Spark's ability to
reuse an already-trained ML model when training a new model.
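
For illustration, the kind of reuse being asked about can be sketched in
plain Python. This is not Spark code and does not use any Spark API. It
assumes a linear model trained by batch gradient descent, and the names
fit_chunk and fit_in_chunks, the learning rate, the epoch count and the toy
data are all hypothetical. The point is only that the weights learned on one
chunk become the starting weights for the next chunk instead of being
overwritten.

```python
def fit_chunk(chunk, weights=(0.0, 0.0), lr=0.5, epochs=200):
    """Fit y ~ w0 + w1 * x on one chunk by batch gradient descent.

    `weights` is the starting point: pass the result from the previous
    chunk to continue training instead of starting from scratch.
    """
    w0, w1 = weights
    n = len(chunk)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in chunk:
            err = (w0 + w1 * x) - y  # prediction error on this row
            g0 += err                # gradient w.r.t. the intercept
            g1 += err * x            # gradient w.r.t. the slope
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return (w0, w1)


def fit_in_chunks(chunks, lr=0.5, epochs=200):
    """Train over the chunks in order, carrying the weights forward."""
    weights = (0.0, 0.0)
    for chunk in chunks:
        weights = fit_chunk(chunk, weights, lr, epochs)
    return weights


# Toy data (made up for this sketch): y = 1 + 2x on 30 points in [0, 1],
# split into 3 interleaved chunks of 10 rows each.
data = [(i / 29, 1.0 + 2.0 * (i / 29)) for i in range(30)]
chunks = [data[k::3] for k in range(3)]

w0, w1 = fit_in_chunks(chunks)
print(w0, w1)
```

Only one chunk has to be in memory at a time here, which is the property
wanted for the 1 billion row case. Whether a given Spark estimator exposes
an equivalent "start from these weights" hook is exactly the open question.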




On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Is it really required to have one billion samples for just linear
> regression? Probably your model would do equally well with far fewer
> samples. Have you checked bias and variance when training on a much
> smaller random sample?
>
> On 22. Aug 2017, at 12:58, Sea aj <saj3...@gmail.com> wrote:
>
> I have a large DataFrame of 1 billion rows of type LabeledPoint. I tried
> to train a linear regression model on it, but it failed due to lack of
> memory, although I'm using 9 slaves, each with 100 GB of RAM and 16 CPU
> cores.
>
> I decided to split my data into multiple chunks and train the model in
> multiple phases, but I learned that the linear regression model in the ml
> library does not have a "setinitialmodel" function for passing the trained
> model from one chunk to the next. In other words, each time I call the fit
> function on a chunk of my data, it overwrites the previous model.
>
> So far the only solution I have found is to use Spark Streaming to split
> the data into multiple DataFrames and then train on each one individually
> to work around the memory issue.
>
> Do you know if there's any other solution?
>
>
>
>
> On Mon, Jul 10, 2017 at 7:57 AM, Jayant Shekhar <jayantbaya...@gmail.com>
> wrote:
>
>> Hello Mahesh,
>>
>> We have built one. You can download it from here:
>> https://www.sparkflows.io/download
>>
>> Feel free to ping me for any questions, etc.
>>
>> Best Regards,
>> Jayant
>>
>>
>> On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <
>> mahesh_sawai...@persistent.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> 1) Is anyone aware of any workbench-style tool for running ML jobs in
>>> Spark? Specifically, the tool could be something like a web application
>>> that is configured to connect to a Spark cluster.
>>>
>>>
>>> The user would be able to select input training sets, probably from
>>> HDFS, train a model, and then run predictions, without having to write
>>> any Scala code.
>>>
>>>
>>> 2) If there is no such tool, is there value in having one, and what
>>> would the challenges be?
>>>
>>>
>>> Thanks,
>>>
>>> Mahesh
>>>
>>>
>>>
>>
>>
>
