Zhipeng Zhang created FLINK-27826:
-------------------------------------

             Summary: Support machine learning training for high dimensional 
models
                 Key: FLINK-27826
                 URL: https://issues.apache.org/jira/browse/FLINK-27826
             Project: Flink
          Issue Type: New Feature
          Components: Library / Machine Learning
            Reporter: Zhipeng Zhang
            Assignee: Zhipeng Zhang


FlinkML has limited support for training high dimensional machine learning 
models, although such models are often useful, especially in industrial use 
cases. When the model parameters cannot be held in the memory of a single 
machine, FlinkML currently crashes.

So it would be useful to support high dimensional model training in FlinkML. To 
achieve this, we probably need to do the following:
 # Survey how existing machine learning systems train large models (e.g. data 
parallel, model parallel)
 # Define and implement the infrastructure for large model training in FlinkML
 # Implement logistic regression that can train models with more than 
ten billion parameters
 # Benchmark the implementation and further improve it
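
To illustrate the model parallel idea mentioned above, here is a minimal 
sketch in plain Python. It is not FlinkML code and does not reflect any 
planned design. It assumes range partitioning of the parameter vector and 
sparse samples given as {index: value} maps; the names shard_of and 
ShardedModel are hypothetical. For scale, assuming 8-byte doubles, ten billion 
parameters take about 80 GB, which is why the model has to be split across 
machines.

```python
import math


def shard_of(index, dim, num_shards):
    """Map a global feature index to (shard_id, local_offset) by range partitioning."""
    shard_size = -(-dim // num_shards)  # ceiling division
    return index // shard_size, index % shard_size


class ShardedModel:
    """Toy logistic regression whose weights are split into shards.

    Each inner list stands in for the memory of one worker (an assumption
    made for illustration; a real system would place shards on different machines).
    """

    def __init__(self, dim, num_shards):
        self.dim = dim
        self.num_shards = num_shards
        shard_size = -(-dim // num_shards)
        self.shards = [
            [0.0] * max(0, min(shard_size, dim - s * shard_size))
            for s in range(num_shards)
        ]

    def partial_dot(self, shard_id, features):
        """Dot product restricted to the weights owned by one shard."""
        total = 0.0
        for idx, val in features.items():
            sid, off = shard_of(idx, self.dim, self.num_shards)
            if sid == shard_id:
                total += self.shards[sid][off] * val
        return total

    def predict(self, features):
        # Each shard contributes a partial sum; only scalars are combined.
        z = sum(self.partial_dot(s, features) for s in range(self.num_shards))
        return 1.0 / (1.0 + math.exp(-z))

    def sgd_step(self, features, label, lr):
        # Gradient of the log loss w.r.t. weight i is (prediction - label) * x_i,
        # so each shard only updates the weights it owns.
        error = self.predict(features) - label
        for idx, val in features.items():
            sid, off = shard_of(idx, self.dim, self.num_shards)
            self.shards[sid][off] -= lr * error * val


model = ShardedModel(dim=10, num_shards=3)
sample = {0: 1.0, 9: 2.0}
before = model.predict(sample)
model.sgd_step(sample, label=1.0, lr=0.1)
after = model.predict(sample)
```

In this sketch no single shard ever holds the full weight vector; only the 
sparse sample and a scalar partial sum cross shard boundaries.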



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
