value generated by Spark
with it generated by other implementations?
Regards,
Kazuaki Ishizaki
From: Sean Owen
To: jiayuanm
Cc: dev@spark.apache.org
Date: 2018/07/07 15:46
Subject: Re: [SPARK ML] Minhash integer overflow
I think it probably still does its job; the hash value can just be
negative. It is likely to be very slightly biased, though. Because the
intent doesn't seem to be to allow the overflow, it's worth changing to use
longs for the calculation.
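To make the overflow concrete, here is a minimal sketch rather than the actual Spark code. The formula shape follows what the thread describes for MinHashLSH.scala; the object name, the prime value, and the sample inputs are assumptions chosen for illustration only.

```scala
object MinHashOverflowSketch {
  // Assumed prime modulus for illustration; check the linked source for the real constant.
  val HashPrime: Int = 2038074743

  // Int arithmetic: (1 + elem) * a can exceed Int.MaxValue and wrap around,
  // so the result of % may be negative.
  def hashInt(elem: Int, a: Int, b: Int): Int =
    ((1 + elem) * a + b) % HashPrime

  // Long arithmetic: the 1L literal promotes the whole expression to Long,
  // so the product cannot overflow for Int-sized inputs.
  def hashLong(elem: Int, a: Int, b: Int): Long =
    ((1L + elem) * a + b) % HashPrime

  def main(args: Array[String]): Unit = {
    // Hypothetical coefficients, picked so that the Int product overflows.
    val (elem, a, b) = (1, 1500000000, 7)
    println(s"Int:  ${hashInt(elem, a, b)}")
    println(s"Long: ${hashLong(elem, a, b)}")
  }
}
```

With these inputs the Int version returns a negative value, while the Long version stays in the range [0, HashPrime). For small inputs that do not overflow, the two agree.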
On Fri, Jul 6, 2018, 8:36 PM jiayuanm wrote:
> Hi everyon
Sure. JIRA ticket is here: https://issues.apache.org/jira/browse/SPARK-24754.
I'll create the PR.
@spark.apache.org
Date: 2018/07/07 10:36
Subject: [SPARK ML] Minhash integer overflow
Hi everyone,
I was playing around with the LSH/MinHash module from Spark ML. I noticed
that the hash computation is done with Int (see
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala#L69).
Since "a" and "b" are from a uniform distributio