Well, one of the strengths of Spark is standardized, general-purpose distributed processing that supports many different types of workloads, such as graph processing, stream processing, etc. The limitation is that it is less performant than a system focusing on only one type of processing (e.g. graph processing).

What I miss - and this may not be Spark-specific - is some artificial intelligence to manage a cluster, e.g. predicting workloads, or estimating how long a job may run based on previously executed similar jobs. Furthermore, many optimizations have to be done manually, e.g. Bloom filters, partitioning, etc. If you could also add some intelligence here that applies these automatically, based on previously executed jobs and taking into account that the optimizations themselves change over time, that would be great. You may also explore feature interaction.
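To make the "manual optimization" point concrete: a Bloom filter is the kind of structure one currently builds and applies by hand, e.g. to pre-filter the large side of a join against the keys of the small side before shuffling. Below is a minimal, self-contained sketch in plain Python (this is not Spark API code; the bit-array size, hash count, and hashing scheme are illustrative choices, not tuned values):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a probabilistic set membership structure.

    might_contain() never returns a false negative; it may return a
    false positive with a probability controlled by size and hash count.
    """

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as an arbitrary-length bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest
        # (an illustrative scheme; real implementations often use
        # double hashing with two independent hash functions).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False => definitely absent; True => possibly present
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

In a distributed join, a filter like this built from the smaller relation's keys can be broadcast to all workers to drop non-matching rows early; deciding when that pays off (and sizing the filter) is exactly the tuning an "intelligent" layer could learn from past job runs.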
On Tue, Jul 14, 2015 at 7:19, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:

> Hi,
>
> I am doing my PhD thesis on large-scale machine learning, e.g. online
> learning, batch, and mini-batch learning.
>
> Could somebody help me with ideas, especially in the context of Spark
> and the above learning methods?
>
> For example, improvements to existing algorithms, implementing new
> features for the above learning methods, and algorithms that have not
> been implemented yet.
>
> If somebody could help me with some ideas it would really accelerate
> my work.
>
> Plus a few ideas on research papers regarding Spark or Mahout.
>
> Thanks in advance.
>
> Regards