Well, one of the strengths of Spark is standardized, general distributed
processing that supports many different workload types, such as graph
processing, stream processing, etc. The limitation is that it is less
performant than a system focused on only one type of processing (e.g.
graph processing). What I miss - and this may not be Spark-specific - is
some artificial intelligence to manage a cluster, e.g. predicting
workloads, or how long a job may run based on previously executed similar
jobs.
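The runtime-prediction idea could start as simply as fitting past job statistics. A minimal sketch, assuming only that historical (input size, runtime) pairs for similar jobs are available - all names and numbers below are illustrative, not from any real scheduler:

```python
# Hypothetical sketch: predict a job's runtime from previously executed
# similar jobs via an ordinary least-squares fit of runtime vs. input size.

def fit_line(history):
    """Fit runtime = a * input_size + b by ordinary least squares."""
    n = len(history)
    sx = sum(s for s, _ in history)
    sy = sum(t for _, t in history)
    sxx = sum(s * s for s, _ in history)
    sxy = sum(s * t for s, t in history)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def predict_runtime(history, input_size):
    a, b = fit_line(history)
    return a * input_size + b

# Illustrative history: (input size in GB, runtime in minutes).
history = [(10, 5.0), (20, 9.0), (40, 17.0), (80, 33.0)]
print(round(predict_runtime(history, 60), 1))  # -> 25.0
```

A real system would of course need richer features (cluster load, executor count, skew) and a model that is retrained as new jobs complete, but even this crude fit illustrates the direction.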
Furthermore, many optimizations you have to do manually, e.g. Bloom
filters, partitioning, etc. It would be great to find some intelligence
here as well that applies these automatically, based on previously
executed jobs, taking into account that the optimizations themselves
change over time. You may also explore feature interaction.
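To make the Bloom-filter example concrete: today a user would hand-build something like the following to pre-filter a large input against the keys of a small table before a join. This is an illustrative standalone sketch, not Spark's own API:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for illustration: k hash probes into a bit array.
    May report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 hash.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Manual optimization a user performs today: build a filter from the
# small side's join keys, then discard big-side records that cannot match.
keys = BloomFilter()
for k in ["user1", "user7", "user42"]:
    keys.add(k)

records = ["user1", "user3", "user42", "user99"]
candidates = [r for r in records if keys.might_contain(r)]
# candidates certainly contains user1 and user42; non-keys are almost
# always dropped (small false-positive probability).
```

The research opportunity is exactly that this construction, the filter size, and the decision of whether it pays off at all could be chosen automatically from statistics of previously executed jobs.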

On Tue, Jul 14, 2015 at 7:19 AM, Shashidhar Rao <raoshashidhar...@gmail.com>
wrote:

> Hi,
>
> I am doing my PhD thesis on large-scale machine learning, e.g. online
> learning, batch, and mini-batch learning.
>
> Could somebody help me with ideas, especially in the context of Spark and
> the above learning methods?
>
> Some ideas like improvements to existing algorithms, implementing new
> features for the above learning methods, algorithms that have not yet
> been implemented, etc.
>
> If somebody could help me with some ideas it would really accelerate my
> work.
>
> Plus a few pointers to research papers regarding Spark or Mahout.
>
> Thanks in advance.
>
> Regards
>