Re: Are These Issues Suitable for our Senior Project?

2015-07-09 Thread Feynman Liang
Exciting, thanks for the contribution! I'm currently aware of: - SPARK-8499 is currently in progress (in a duplicate issue); I updated the JIRA to reflect that. - SPARK-5992 has a spark package linked but I'm unclear on whethe

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
There is MulticlassMetrics in MLlib; unfortunately a pipelined version hasn't yet been made for spark-ml. SPARK-7690 is tracking work on this if you are interested in following the development. On Mon, Jul 13, 2015 at 2:16 AM, Olivier Girardot < o

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
> Regards, > > Olivier. > > On Mon, Jul 13, 2015 at 21:12, Feynman Liang wrote: > >> There is MulticlassMetrics in MLlib; unfortunately a pipelined version >> hasn't yet been made for spark-ml. SPARK-7690 >> <https://issues.apache.org/jira/brow

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
y to help on that ? > > 2015-07-13 22:39 GMT+02:00 Feynman Liang : > >> That is currently tracked by SPARK-3727 >> <https://issues.apache.org/jira/browse/SPARK-3727>. >> >> On Mon, Jul 13, 2015 at 1:16 PM, Olivier Girardot < >> o.girar...@lateral-thoughts.

Re: Contribution and choice of language

2015-07-14 Thread Feynman Liang
I would suggest starting with some starter tasks

Re:

2015-08-05 Thread Feynman Liang
qualifying_function() will be executed on each partition in parallel; stopping all parallel execution after the first instance satisfying qualifying_function() would mean that you would have to effectively make the computation sequential. On Wed, Aug 5, 2015 at 9:05 AM, Sandeep Giri wrote: > Oka
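
To illustrate the trade-off described in this reply, here is a minimal sketch in plain Scala. It does not use Spark: partitions are simulated as ordinary sequences, and `qualifyingFunction` is a hypothetical stand-in for the `qualifying_function()` mentioned in the thread. The per-partition search stops early only inside its own partition; stopping globally at the first match requires walking the partitions one after another.

```scala
object PartitionSearch {
  // Hypothetical predicate standing in for qualifying_function() from the thread.
  def qualifyingFunction(x: Int): Boolean = x % 7 == 0

  // Per-partition search: find stops at the first match inside this
  // partition, but it knows nothing about what other partitions found.
  def firstMatchInPartition(partition: Iterator[Int]): Iterator[Int] =
    partition.find(qualifyingFunction).iterator

  // Global early exit: lazily walks the partitions in order and stops at the
  // first match overall, which makes the computation sequential.
  def firstMatchSequential(partitions: Seq[Seq[Int]]): Option[Int] =
    partitions.iterator.flatMap(_.iterator).find(qualifyingFunction)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2, 3), Seq(7, 8, 14), Seq(21))
    // Every partition is searched, even though the second already has a match.
    println(partitions.map(p => firstMatchInPartition(p.iterator).toList))
    println(firstMatchSequential(partitions))
  }
}
```

With Spark, a function of the same shape as `firstMatchInPartition` could be passed to `rdd.mapPartitions`; each partition would still run its own search independently.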

Re: Data frame with one column

2015-09-14 Thread Feynman Liang
For an example, see the ml-features word2vec user guide <https://spark.apache.org/docs/latest/ml-features.html#word2vec> On Mon, Sep 14, 2015 at 11:03 AM, Feynman Liang wrote: > You could use `Tuple1(x)` instead of `Hack` > > On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander

Re: Data frame with one column

2015-09-14 Thread Feynman Liang
You could use `Tuple1(x)` instead of `Hack` On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander < alexander.ula...@hpe.com> wrote: > Dear Spark developers, > > > > I would like to create a dataframe with one column. However, the > createDataFrame method accepts at least a Product: > > > > val dat
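
A minimal sketch of why `Tuple1(x)` satisfies the Product requirement, in plain Scala with no Spark dependency. The object and method names (`OneColumnRows`, `wrap`) are hypothetical, chosen only for illustration.

```scala
object OneColumnRows {
  // Wrap each value in Tuple1 so that every row is a Product with one field.
  def wrap[T](values: Seq[T]): Seq[Tuple1[T]] = values.map(Tuple1(_))

  def main(args: Array[String]): Unit = {
    val rows = wrap(Seq(1.0, 2.0, 3.0))
    // Tuple1 extends Product, so each row reports exactly one element.
    println(rows.head.productArity)
    // Assumption: with a SQLContext named sqlContext in scope, the wrapped
    // sequence could then be passed along, e.g. sqlContext.createDataFrame(rows).
  }
}
```

Using `Tuple1` avoids defining a one-field case class solely to satisfy the type bound.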

Re: ML: embed a transformer

2015-09-14 Thread Feynman Liang
Where did you read that it should be public? The traits in ml.param.shared are meant to be used across internal spark.ml transformer implementations. If your transformer could be included in spark.ml, then I would recommend implementing it there so these package private traits can be reused. Other

Re: Enum parameter in ML

2015-09-14 Thread Feynman Liang
Since PipelineStages are serializable, the params must also be serializable. We also have to keep the Java API in mind. Introducing a new enum Param type may work, but we will have to ensure that Java users can use it without dealing with ClassTags (I believe Scala will create new types for each po

Re: Enum parameter in ML

2015-09-14 Thread Feynman Liang
there will be no problems > for Java users? (I only use Scala API) > > > > Best regards, Alexander > > > > *From:* Feynman Liang [mailto:fli...@databricks.com] > *Sent:* Monday, September 14, 2015 5:27 PM > *To:* Ulanov, Alexander > *Cc:* dev@spark.apache.org

Re: [MLlib] BinaryLogisticRegressionSummary on test set

2015-09-17 Thread Feynman Liang
We have kept that private because we need to decide on a name for the method which evaluates on a test set (see the TODO comment); perhaps you could push for this to happen by creating a JIRA and pinging jkb

Re: [MLlib] BinaryLogisticRegressionSummary on test set

2015-09-18 Thread Feynman Liang
ssion ? > > Thx. > > On Thu, Sep 17, 2015 at 6:44 PM, Feynman Liang > wrote: > >> We have kept that private because we need to decide on a name for the >> method which evaluates on a test set (see the TODO comment >> <https://github.com/apache/spark/pull/7099/fi

Re: One element per node

2015-09-18 Thread Feynman Liang
rdd.mapPartitions(x => x.take(1)) On Fri, Sep 18, 2015 at 3:57 PM, Ulanov, Alexander wrote: > Dear Spark developers, > > > > Is it possible (and how to do it if possible) to pick one element per > physical node from an RDD? Let’s say the first element of any partition on > that node.
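
The per-partition pick suggested in this reply can be sketched without a cluster. The example below simulates partitions as plain Scala iterators (an assumption for illustration; it is not Spark itself) and uses `take(1)` so that an empty partition yields nothing instead of failing. Note that this selects one element per partition, not one per physical node.

```scala
object FirstPerPartition {
  // Same shape as a function passed to RDD.mapPartitions:
  // Iterator[T] => Iterator[T]. take(1) returns an empty iterator
  // for an empty partition rather than throwing.
  def firstElement[T](partition: Iterator[T]): Iterator[T] = partition.take(1)

  def main(args: Array[String]): Unit = {
    // Three simulated partitions, one of them empty.
    val partitions = Seq(Seq(1, 2, 3), Seq.empty[Int], Seq(4, 5))
    val picked = partitions.flatMap(p => firstElement(p.iterator))
    println(picked)
  }
}
```

With an actual RDD, the same function could be supplied as `rdd.mapPartitions(FirstPerPartition.firstElement)`, assuming `rdd` is an existing RDD.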

Re: One element per node

2015-09-18 Thread Feynman Liang
hat I have only one element per executor > (per worker, or per physical node)? > > > > *From:* Feynman Liang [mailto:fli...@databricks.com] > *Sent:* Friday, September 18, 2015 4:06 PM > *To:* Ulanov, Alexander > *Cc:* dev@spark.apache.org > *Subject:* Re: One element per