Re: [SPARK ML] Minhash integer overflow

2018-07-07 Thread Kazuaki Ishizaki
value generated by Spark with the one generated by other implementations? Regards, Kazuaki Ishizaki From: Sean Owen To: jiayuanm Cc: dev@spark.apache.org Date: 2018/07/07 15:46 Subject: Re: [SPARK ML] Minhash integer overflow I think it probably still does its job; the hash

Re: [SPARK ML] Minhash integer overflow

2018-07-06 Thread Sean Owen
I think it probably still does its job; the hash value can just be negative. It is likely to be very slightly biased, though. Because the intent doesn't seem to be to allow the overflow, it's worth changing to use longs for the calculation. On Fri, Jul 6, 2018, 8:36 PM jiayuanm wrote: > Hi everyon
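To illustrate the point about overflow and negative hash values, here is a minimal Java sketch. It assumes a linear hash of the form ((1 + elem) * a + b) mod p, which is a common MinHash construction; the class name, method names, prime modulus, and input values are all hypothetical and chosen for illustration, not quoted from the Spark source.

```java
public class MinHashOverflowDemo {
    // Assumed prime modulus below 2^31, picked for illustration only.
    static final int HASH_PRIME = 2038074743;

    // All arithmetic is done in 32-bit int, so the product can wrap
    // around and the remainder can come out negative.
    static int hashWithInt(int elem, int a, int b) {
        return ((1 + elem) * a + b) % HASH_PRIME;
    }

    // Promoting to long with 1L keeps the intermediate product exact,
    // so the result stays in [0, HASH_PRIME) for non-negative inputs.
    static long hashWithLong(int elem, int a, int b) {
        return ((1L + elem) * a + b) % HASH_PRIME;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: (1 + 99999) * 30000 = 3,000,000,000 exceeds Integer.MAX_VALUE.
        int elem = 99999;
        int a = 30000;
        int b = 7;

        System.out.println("int arithmetic:  " + hashWithInt(elem, a, b));
        System.out.println("long arithmetic: " + hashWithLong(elem, a, b));
    }
}
```

With these inputs the int version wraps and yields a negative value, while the long version yields a non-negative value below the modulus. Either can still serve as a hash, but only the long version matches the intended arithmetic.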

Re: [SPARK ML] Minhash integer overflow

2018-07-06 Thread jiayuanm
Sure. JIRA ticket is here: https://issues.apache.org/jira/browse/SPARK-24754. I'll create the PR.

Re: [SPARK ML] Minhash integer overflow

2018-07-06 Thread Kazuaki Ishizaki
Thank you for reporting this issue. I think this is a bug regarding integer overflow. IMHO, it would be good to compute hashes with Long. Would it be possible to create a JIRA entry? Do you want to submit a pull request, too? Regards, Kazuaki Ishizaki From: jiayuanm To: dev@spark.apa