shane will be OOO 8-5-15 through 8-18-15

2015-08-04 Thread shane knapp
so i done gone and got myself hitched, and will be disappearing into the rainy island of kol chang in thailand for the next ~2 weeks. :) this means i will be completely out of contact, and have to leave jenkins in the gentle hands of jon kuroda (a sysadmin here at the lab) and matt massie (my bo

Re: How to help for 1.5 release?

2015-08-04 Thread Patrick Wendell
Hey Meihua, If you are a user of Spark, one thing that is really helpful is to run Spark 1.5 on your workload and report any issues, performance regressions, etc. - Patrick On Mon, Aug 3, 2015 at 11:49 PM, Akhil Das wrote: > I think you can start from here > https://issues.apache.org/jira/brows

Re: Have Friedman's glmnet algo running in Spark

2015-08-04 Thread mike
My friends and I are continuing work on the algorithm. You are right that there are two elements to Friedman's glmnet algorithm. One is the use of coordinate descent for minimizing penalized regression with an absolute value penalty and the other is managing the regularization parameters. Fried
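
To make the two elements named above concrete, here is a minimal single-machine sketch in plain Python. It is not the posters' Spark code: the function names, the objective scaling (1/(2n) times the squared error plus lambda times the L1 norm), and the warm-start strategy along a decreasing lambda sequence are illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (closed-form L1 coordinate update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, beta=None, n_sweeps=100, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of targets. The objective scaling is an
    assumption of this sketch, not something stated in the thread.
    """
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    # Residuals for the starting coefficients.
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / z
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Fit one model per lambda, largest first, warm-starting from the previous fit."""
    beta = None
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, beta)
        path.append((lam, beta))
    return path
```

Calling `lasso_path(X, y, [2.0, 0.25, 0.0])` returns a list of `(lambda, coefficients)` pairs. Warm starts keep each successive fit cheap, which is one way to handle the regularization-parameter side of the algorithm.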

Fwd: Writing streaming data to cassandra creates duplicates

2015-08-04 Thread Priya Ch
Yes...union would be one solution. I am not doing any aggregation, hence reduceByKey would not be useful. If I use groupByKey, messages with the same key would be obtained in a partition. But groupByKey is a very expensive operation as it involves a shuffle. My ultimate goal is to write the messag

Re: Have Friedman's glmnet algo running in Spark

2015-08-04 Thread Patrick
I have a follow up on this: I see on JIRA that the idea of having a GLMNET implementation was more or less abandoned, since an OWLQN implementation was chosen to construct a model using L1/L2 regularization. However, GLMNET has the property of "returning a multitude of models (corresponding to