Re: Practical configuration to run LSH in Spark 2.1.0

2017-02-22 Thread Seth Hendrickson
I'm looking into this a bit further, thanks for bringing it up! Right now the LSH implementation only uses OR-amplification. The practical consequence is that it selects too many candidates when doing approximate near neighbor search and approximate similarity join. When we add AND-ampl
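
To illustrate why OR-amplification alone tends to over-select candidates, here is a minimal sketch of the textbook collision-probability formulas. It is not taken from the Spark source: the function names and the parameters `p` (single-hash collision probability), `num_tables`, and `hashes_per_table` are assumptions made for illustration.

```python
def or_amplified_probability(p: float, num_tables: int) -> float:
    """Probability that two points collide in at least one of num_tables
    independent hash tables, each using a single hash function."""
    return 1.0 - (1.0 - p) ** num_tables


def and_or_amplified_probability(p: float, hashes_per_table: int, num_tables: int) -> float:
    """Same as above, but a collision in a table requires all
    hashes_per_table hash functions to agree (AND), before the OR over tables."""
    return 1.0 - (1.0 - p ** hashes_per_table) ** num_tables


if __name__ == "__main__":
    # Compare how often a dissimilar pair (low p) and a similar pair (high p)
    # become candidates under each scheme.
    for p in (0.2, 0.9):
        print(p,
              or_amplified_probability(p, num_tables=5),
              and_or_amplified_probability(p, hashes_per_table=3, num_tables=5))
```

Under these assumptions, adding tables with OR-only amplification raises the candidate probability for every pair, including dissimilar ones, whereas the AND step lowers it most sharply for pairs with small `p`.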

Re: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0

2017-02-01 Thread Seth Hendrickson
In Spark.ML the coefficients are not "pivoted", meaning that none of the coefficient sets is fixed at zero. You can read more about it here: https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions You can translate your set of coefficient
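
The message is cut off before it describes the translation, so the following is only a sketch of the standard identity, which may or may not be what the author went on to say: softmax probabilities do not change when the same vector is subtracted from every class's coefficients, so subtracting one class's coefficients and intercept from all classes yields a pivoted form with that class at zero. The function names and the plain-list representation are hypothetical and do not use the Spark API.

```python
import math


def class_probabilities(coefficients, intercepts, x):
    """Softmax class probabilities for one feature vector x.
    coefficients is a list of per-class weight lists."""
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(coefficients, intercepts)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pivot(coefficients, intercepts, reference_class=0):
    """Subtract the reference class's parameters from every class,
    so the reference class ends up with all-zero parameters."""
    ref_row = coefficients[reference_class]
    ref_b = intercepts[reference_class]
    pivoted_coefs = [[w - r for w, r in zip(row, ref_row)] for row in coefficients]
    pivoted_intercepts = [b - ref_b for b in intercepts]
    return pivoted_coefs, pivoted_intercepts


if __name__ == "__main__":
    coefs = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]  # made-up example values
    bias = [0.1, -0.2, 0.3]
    x = [1.0, 2.0]
    p_coefs, p_bias = pivot(coefs, bias, reference_class=0)
    print(class_probabilities(coefs, bias, x))
    print(class_probabilities(p_coefs, p_bias, x))
```

Both calls should print the same probabilities up to floating-point rounding, since only the parameterization changes, not the model.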