[ 
https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kate Eri updated FLINK-5782:
----------------------------
    Description: 
This ticket continues the dev mailing list discussion thread: [New 
Flink team member - Kate Eri (Integration with DL4J 
topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]  
We recently proposed integrating 
[Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink. 
Training DL models is a resource-intensive process, so training on a CPU can 
take much longer to converge than training on a GPU.  

GPUs could be useful not only for DL training but also for speeding up graph 
analytics and other typical data manipulations. A good overview of GPU-related 
problems is given in [Accelerating Spark workloads using 
GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].

So far the community has raised the following issues to consider:
1)      Flink would like to avoid writing yet another GPU support layer of its 
own, to reduce the engineering burden. That is why libraries such as 
[ND4J|http://nd4j.org/userguide] should be considered. 
2)      Flink currently uses [Breeze|https://github.com/scalanlp/breeze] to 
optimize linear algebra calculations. ND4J cannot be integrated as is, because 
it still does not support [sparse arrays|http://nd4j.org/userguide#faq]. Perhaps 
sparse array support should simply be contributed to ND4J to enable its usage?
3)      The calculations have to work whether or not GPUs are available. If 
the system detects that GPUs are available, then ideally it would exploit them. 
GPU resource management could therefore be incorporated in 
[FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a suggestion).
4)      It was mentioned that, since Flink takes care of shipping data around 
the cluster, it would also dump the data out to the GPU for calculation and 
load the results back. In practice, the lack of a persist method for 
intermediate results makes this troublesome. This is not specific to GPUs: for 
any sort of complex algorithm we expect to be able to cache intermediate results. 
That is why 
[FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be 
implemented to solve this problem.  
5)      It was also recommended to take a look at Apache Mahout, at least to 
learn from its experience with GPU integration, and to check these modules:
https://github.com/apache/mahout/tree/master/viennacl-omp
https://github.com/apache/mahout/tree/master/viennacl 

6)      For now, GPU support is proposed only for optimizing batch 
calculations. Supporting GPUs for streaming should be tracked in a separate 
ticket, because optimizing streaming with GPUs requires additional research. 
7)      Netflix's experience in this area could also be considered: 
[Distributed Neural Networks with GPUs in the AWS 
Cloud|http://techblog.netflix.com/search/label/CUDA]   
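
To make item 3 more concrete, here is a minimal, language-agnostic sketch (written in Python only for brevity) of the "use the GPU if one is detected, otherwise fall back to the CPU" behaviour. All names in it (`select_dot_backend`, `cpu_dot`, `no_gpu_probe`) are hypothetical and are not part of Flink, ND4J or Breeze; the probe is a stand-in for whatever detection mechanism is eventually chosen.

```python
from typing import Callable, List, Optional

Vector = List[float]
DotKernel = Callable[[Vector, Vector], float]


def cpu_dot(a: Vector, b: Vector) -> float:
    """Reference CPU implementation, always available."""
    return sum(x * y for x, y in zip(a, b))


def select_dot_backend(gpu_probe: Callable[[], Optional[DotKernel]]) -> DotKernel:
    """Return the GPU kernel if the (hypothetical) probe finds one, else the CPU one.

    A probe that raises is treated the same as "no GPU available", so the
    job still runs on machines without GPU drivers installed.
    """
    try:
        gpu_kernel = gpu_probe()
    except Exception:
        gpu_kernel = None
    return gpu_kernel if gpu_kernel is not None else cpu_dot


def no_gpu_probe() -> Optional[DotKernel]:
    """Assumed probe for a machine without a GPU: reports that none was found."""
    return None


# On a machine without a GPU the calculation transparently runs on the CPU.
dot = select_dot_backend(no_gpu_probe)
result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The point of the sketch is only that the calling code does not change between the two cases; how GPUs are actually detected and scheduled is left open, as noted in item 3.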

This ticket is considered the master ticket for GPU-related tickets.




> Support GPU calculations
> ------------------------
>
>                 Key: FLINK-5782
>                 URL: https://issues.apache.org/jira/browse/FLINK-5782
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.0
>            Reporter: Kate Eri
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
