Are there any plans to generalize the type of VertexId in GraphX?
Our keys are particularly long. We could use the hashCode() trick, but the
chance of collisions is not acceptable. Given our data volume, we have
encountered hashCode() collisions more than once.
I see this Jira, but it is specific
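For reference, the hashCode() trick mentioned above is roughly the sketch below (toVertexId and buildVertices are illustrative helpers of ours, not GraphX API):

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Derive a Long VertexId from a long String key via hashCode().
// hashCode() only gives 32 bits, so distinct keys can collide --
// which is exactly the problem described above.
def toVertexId(key: String): VertexId = key.hashCode.toLong

// Pair each original key with its (possibly colliding) vertex id.
def buildVertices(keys: RDD[String]): RDD[(VertexId, String)] =
  keys.map(k => (toVertexId(k), k))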
I should mention that I have created a Jira task for it:
[SPARK-5226] Add DBSCAN Clustering Algorithm to MLlib - ASF JIRA: MLlib is all
k-means now, and I think we should add some new clustering algorithms to it
Dear all,
I think MLlib needs more clustering algorithms and DBSCAN is my first
candidate. I am starting to implement it. Any advice?
Muhammad-Ali
If it is a small collection of them on the driver, you can just use
sc.parallelize to create an RDD.
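Roughly something like the following, assuming Spark SQL Row objects (values are made up for illustration; exact import paths differ a bit across Spark versions):

import org.apache.spark.sql.Row

// A small collection of Row objects sitting on the driver.
val rows: Seq[Row] = Seq(Row(1, "alice"), Row(2, "bob"))

// sc.parallelize turns the local collection into an RDD[Row]
// that can then be used like any other RDD.
val rowRDD = sc.parallelize(rows)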
On Tue, Jan 13, 2015 at 7:56 AM, Malith Dhanushka wrote:
> Hi Reynold,
>
> Thanks for the response. I am just wondering, let's say we have a set of Row
> objects. Isn't there a straightforward way
It's not necessary; I will create a PR to remove them.
For larger dicts/lists/tuples, the pickle approach may need fewer RPC
calls and give better performance.
Davies
On Tue, Jan 13, 2015 at 4:53 AM, Meethu Mathew wrote:
> Hi all,
>
> In the python object to java conversion done in the method _py2java in
>
On Mon, Jan 12, 2015 at 8:14 PM, Meethu Mathew wrote:
> Hi,
>
> This is the function defined in PythonMLLibAPI.scala
> def findPredict(
> data: JavaRDD[Vector],
> wt: Object,
> mu: Array[Object],
> si: Array[Object]): RDD[Array[Double]] = {
> }
>
> So the parameter mu sho
FYI our git repo may be down for a few hours today.
-- Forwarded message --
From: "Tony Stevenson"
Date: Jan 13, 2015 6:49 AM
Subject: [ NOTICE ] Service Downtime Notification - R/W git repos
Folks,
Please note that on Thursday 15th at 20:00 UTC the Infrastructure team
wi
Hi everyone,
I am new to Spark and am trying to package spark-core with some
modifications. I use IDEA to package spark-core_2.10 of Spark 1.1.1.
When I encounter the following error, I check the website
http://www.scalastyle.org/maven.html, and its suggested configuration is to
modify the spark
Hi all,
In the python object to java conversion done in the method _py2java in
spark/python/pyspark/mllib/common.py, why are we doing individual
conversions using MapConverter and ListConverter? The same can be achieved
using
bytearray(PickleSerializer().dumps(obj))
obj = sc._jvm.SerDe.loads(by
Depends on what the other side is doing. You can create your own RDD
implementation by subclassing RDD, or it might work if you use
sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data
and return an iterator */ ) where n is the number of partitions.
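A rough sketch of that second option, with a toy readPartition standing in for the real code that talks to the other side:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Placeholder: in practice this would fetch the records belonging to
// partition i from the external system and return them as an iterator.
def readPartition(i: Int): Iterator[String] =
  Iterator(s"record-$i-a", s"record-$i-b")

def externalRDD(sc: SparkContext, n: Int): RDD[String] =
  // One dummy element per partition; the partition index tells us
  // which slice of the external data to read.
  sc.parallelize(1 to n, n).mapPartitionsWithIndex { (idx, _) =>
    readPartition(idx)
  }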
On Tue, Jan 13, 2015 at 12
Hi,
We have a custom datasources API, which connects to various data sources
and exposes them through a common API. We are now trying to implement the
Spark datasources API released in 1.2.0 to connect Spark for analytics.
Looking at the sources API, we figured out that we should extend a scan
cla
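To make the question concrete, this is roughly the kind of relation we expect to write (toy data instead of our real connector; the exact traits and import paths follow later Spark releases and may differ slightly in 1.2.0):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// A relation that exposes one string column and answers full-table scans.
// A real implementation would read from our custom datasources API instead
// of returning the hard-coded rows below.
class CustomRelation(path: String)(@transient val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(StructField("value", StringType, nullable = true) :: Nil)

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row("a"), Row("b")))
}

// Entry point so the relation can be created through the data sources API
// (e.g. CREATE TEMPORARY TABLE ... USING ...).
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new CustomRelation(parameters("path"))(sqlContext)
}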