>> <https://blogs.oracle.com/BestPerf/entry/improving_algorithms_in_spark_ml>
>>
>> Background:
>>
>> Brad Carlile. Parallelism, compute intensity, and data vectorization.
>> SuperComputing'93, November 1993.
>> <https://blogs.oracle.com/BestPerf/resource/Carlile-app_compute-intensity-1993.pdf>
>>
>> John McCalpin. Memory Bandwidth and Machine Balance in Current High
>> Performance Computers. 1995.
>> <https://www.researchgate.net/publication/213876927_Memory_Bandwidth_and_Machine_Balance_in_Current_High_Performance_Computers>
Subject: Re: MLlib mission and goals
On the topic of usability, I think more effort should be put into
large-scale testing. We've encountered issues with building large models
that are not apparent in small models, and these issues have made
productizing ML/MLlib difficult.
> Another related area is SparkR. API parity between SparkR and ML/MLlib is
> important. We should also pay attention to R users' habits and experiences
> when maintaining API parity.
>
> Miao
> ----- Original message -----
> From: Stephen Boesch
> To: Sean Owen
I started working on ML/MLlib/R last year. Here are some of my thoughts
from a beginner's perspective:

Current ML/MLlib core algorithms can serve as good implementation examples,
which makes adding new algorithms easier. Even a beginner like me can pick
it up quickly and learn how to add new algorithms.
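For example, here is a minimal sketch of a custom pipeline stage in the
same shape as the built-in ones (the class and column names are
hypothetical, purely for illustration):

    import org.apache.spark.ml.Transformer
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.{DataFrame, Dataset}
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    // Hypothetical stage: doubles a numeric input column. Only the
    // skeleton mirrors the built-in algorithms; the names are made up.
    class ValueDoubler(override val uid: String) extends Transformer {
      def this() = this(Identifiable.randomUID("valueDoubler"))

      // Append the output column computed from the input column.
      override def transform(dataset: Dataset[_]): DataFrame =
        dataset.withColumn("doubled", col("value") * 2.0)

      // Declare the schema change so a Pipeline can validate stages early.
      override def transformSchema(schema: StructType): StructType =
        schema.add(StructField("doubled", DoubleType, nullable = false))

      override def copy(extra: ParamMap): ValueDoubler = defaultCopy(extra)
    }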
re: spark-packages.org and "Would these really be better in the core
project?" That was not at all the intent of my input; instead, it was to
ask "how and where do we structure/place deployment-quality code that is
*not* part of the distribution?" Spark Packages has no curation whatsoever.
I also agree with Joseph and Sean.

With respect to spark-packages, I think the issue is that you have to
manually add each package, although it basically fetches the package from
Maven Central (or a custom upload).
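For concreteness, "manually add it" means passing the Maven coordinate at
launch; a sketch (the version number and file name are illustrative):

    // Launch with the package coordinate; it is fetched from Maven Central:
    //   spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

    // The package's data source is then available by name:
    val df = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("people.csv")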
From an organizational perspective there are other issues, e.g. you have to
download it from
My $0.02, which shouldn't be weighted too much.

I believe the mission of Spark ML has been to provide the framework, and
then implementations of "the basics" only. It should have the tools that
cover ~80% of use cases, out of the box, in a pretty well-supported and
tested way. It's not a goal to implement every algorithm.
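To make "the basics" concrete, here is the kind of out-of-the-box pipeline
I have in mind (a sketch; column names and parameters are illustrative):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // Text classification from stock stages only: tokenize, hash to
    // term-frequency vectors, then fit a logistic regression.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // val model = pipeline.fit(training)  // training has "text" and "label" columns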
Along the lines of #1: the spark packages seemed to have had a good start
about two years ago, but now no more than a handful are in general use -
e.g. the Databricks CSV package. When the available packages are browsed,
the majority are incomplete, empty, unmaintained, or unclear. Any ideas on
how to improve this?
This thread is split off from the "Feedback on MLlib roadmap process
proposal" thread for discussing the high-level mission and goals for
MLlib. I hope this thread will collect feedback and ideas, not necessarily
lead to huge decisions.
Copying from the previous thread:
*Seth:*
"""
I would love