I think it probably still does its job; the hash value can just be
negative. It is likely to be very slightly biased, though. Because the
intent doesn't seem to be to allow the overflow, it's worth changing the
calculation to use longs.
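To make the overflow concrete, here is a small sketch in Java (the JVM integer semantics are the same as in the Scala code being discussed). The hash formula and the prime 2038074743 are assumed from MinHashLSH; the coefficient values are made up for illustration. With Int arithmetic the product can wrap around, so the remainder can come out negative; widening to long before multiplying avoids that.

```java
public class MinhashOverflow {
    // Prime assumed from Spark's MinHashLSH (MinHashLSH.HASH_PRIME).
    static final int PRIME = 2038074743;

    // Int arithmetic: (1 + elem) * a can overflow Int, so the % result
    // may be negative (Java/Scala % keeps the sign of the dividend).
    static int hashInt(int elem, int a, int b) {
        return ((1 + elem) * a + b) % PRIME;
    }

    // Widening to long before the multiply avoids the overflow; the
    // result of % PRIME fits back into an int and is non-negative here.
    static int hashLong(int elem, int a, int b) {
        return (int) (((1L + elem) * a + b) % PRIME);
    }

    public static void main(String[] args) {
        // Hypothetical coefficients chosen to trigger the overflow.
        int a = 2_000_000_000, b = 12345, elem = 3;
        System.out.println(hashInt(elem, a, b));   // negative: the product wrapped around
        System.out.println(hashLong(elem, a, b));  // in [0, PRIME)
    }
}
```

The fix is just doing the multiplication in 64 bits; the final value still fits in an Int because it is reduced modulo a 31-bit prime.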
On Fri, Jul 6, 2018, 8:36 PM jiayuanm wrote:
> Hi everyone,
Sure. JIRA ticket is here: https://issues.apache.org/jira/browse/SPARK-24754.
I'll create the PR.
Thank you for reporting this issue. I think this is a bug regarding
integer overflow. IMHO, it would be good to compute hashes with Long.
Would it be possible to create a JIRA entry? Do you want to submit a pull
request, too?
Regards,
Kazuaki Ishizaki
From: jiayuanm
To: dev@spark.apa
Hi everyone,
I was playing around with the LSH/MinHash module in Spark ML. I noticed
that the hash computation is done with Int (see
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala#L69).
Since "a" and "b" are from a uniform distribution
FYI, it will soon be six months since the last release. We will cut the branch
and code freeze on Aug 1st in order to get 2.4 out on time.
FYI, there's some initial exploring of what it would take to move the HDFS wire
protocol's tracing from HTrace to OpenTracing, and wire up the other
stores too:
https://issues.apache.org/jira/browse/HADOOP-15566
If anyone has any input/insight or code review capacity, it'd be welcome.