It depends on what model you would like to train, but models fitted by iterative optimisation can use SGD with mini-batches, so that each step only touches a sampled fraction of the data rather than the whole dataset at once. See: https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
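As a minimal sketch of what that looks like with the RDD-based mllib API the linked page describes (assuming a spark-shell session so `sc` is in scope, and a hypothetical "label,f1,f2,..." CSV file -- adapt both to your own setup):

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Hypothetical input: one "label,f1,f2,..." row per line.
    val training = sc.textFile("data.csv").map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
    }.cache()

    val lr = new LogisticRegressionWithSGD()
    lr.optimizer
      .setNumIterations(100)       // total gradient steps
      .setStepSize(0.1)            // learning rate
      .setMiniBatchFraction(0.1)   // each step samples ~10% of the data

    val model = lr.run(training)

The same `optimizer` pattern applies to the other mllib *WithSGD models (e.g. LinearRegressionWithSGD); logistic regression here is just an example choice.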
On 23 August 2017 at 14:27, Sea aj <saj3...@gmail.com> wrote:
> Hi,
>
> I am trying to feed a huge dataframe to an ML algorithm in Spark, but it
> crashes due to a shortage of memory.
>
> Is there a way to train the model on a subset of the data in multiple
> steps?
>
> Thanks

--
Mehmet Süzen, MSc, PhD <su...@acm.org>