GitHub user manishamde opened a pull request:

    https://github.com/apache/spark/pull/79

    MLI-1 Decision Trees

    Joint work with @hirakendu, @etrain, @atalwalkar and @harsha2010.
    
    Key features:
    + Supports binary classification and regression
    + Supports gini, entropy and variance for information gain calculation
    + Supports both continuous and categorical features
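
For readers unfamiliar with the three impurity measures named above, here is a minimal sketch of how gini, entropy and variance are commonly defined and how an information gain is derived from them. It is not taken from the code in this pull request; the function names and signatures are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini impurity of a node, given per-class example counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts: Sequence[float]) -> float:
    """Entropy (in bits) of a node, given per-class example counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Empty classes contribute nothing, so skip them to avoid log2(0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def variance(count: float, label_sum: float, label_sq_sum: float) -> float:
    """Label variance for regression, from sufficient statistics."""
    if count == 0:
        return 0.0
    mean = label_sum / count
    # Clamp at zero to guard against small negative rounding errors.
    return max(label_sq_sum / count - mean * mean, 0.0)


def information_gain(parent: float, left: float, right: float,
                     left_n: float, right_n: float) -> float:
    """Parent impurity minus the size-weighted impurity of the children."""
    n = left_n + right_n
    if n == 0:
        return 0.0
    return parent - (left_n / n) * left - (right_n / n) * right
```

Under these definitions, gini and entropy apply to classification and variance to regression; each is computed from counts or sums, which is what makes them cheap to aggregate.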
    
    The algorithm has gone through several development iterations over the last few months, leading to a highly optimized implementation. Optimizations include:
    
    1. Level-wise training to reduce passes over the entire dataset.
    2. Bin-wise split calculation to reduce computation overhead.
    3. Aggregation over partitions before combining to reduce communication overhead.
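
The bin-wise and partition-wise ideas in the list above can be sketched roughly as follows. This is a simplified, single-machine illustration and not the implementation in this pull request: it assumes binary labels, one continuous feature, and precomputed candidate thresholds, and it models partitions as plain Python lists. All names (`bin_counts`, `merge`, `best_split`) are hypothetical. Level-wise training is not covered here.

```python
from bisect import bisect_left
from functools import reduce
from typing import List, Optional, Sequence, Tuple

Example = Tuple[int, float]   # (binary label 0/1, continuous feature value)
Counts = List[List[int]]      # per-bin [count of label 0, count of label 1]


def bin_counts(partition: Sequence[Example], thresholds: Sequence[float]) -> Counts:
    """Per-partition step: count labels per bin in one pass over the data.

    Bin i holds values v with thresholds[i-1] < v <= thresholds[i];
    the last bin holds values above every threshold.
    """
    counts = [[0, 0] for _ in range(len(thresholds) + 1)]
    for label, value in partition:
        counts[bisect_left(thresholds, value)][label] += 1
    return counts


def merge(a: Counts, b: Counts) -> Counts:
    """Combine step: only small count tables are exchanged, not raw examples."""
    return [[x0 + y0, x1 + y1] for (x0, x1), (y0, y1) in zip(a, b)]


def gini(c0: int, c1: int) -> float:
    n = c0 + c1
    if n == 0:
        return 0.0
    p = c0 / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(partitions: Sequence[Sequence[Example]],
               thresholds: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, gain) with the highest gini gain, or (None, 0.0)."""
    zero = [[0, 0] for _ in range(len(thresholds) + 1)]
    total = reduce(merge, (bin_counts(p, thresholds) for p in partitions), zero)
    n0 = sum(c[0] for c in total)
    n1 = sum(c[1] for c in total)
    n = n0 + n1
    parent = gini(n0, n1)
    best: Tuple[Optional[float], float] = (None, 0.0)
    l0 = l1 = 0
    # Scan bins left to right; a running sum gives the left-child counts.
    for i, t in enumerate(thresholds):
        l0 += total[i][0]
        l1 += total[i][1]
        nl, nr = l0 + l1, n - l0 - l1
        if nl == 0 or nr == 0:
            continue
        gain = parent - nl / n * gini(l0, l1) - nr / n * gini(n0 - l0, n1 - l1)
        if gain > best[1]:
            best = (t, gain)
    return best
```

In this sketch each candidate split is evaluated from the merged bin table rather than by rescanning the examples, and the amount of data combined across partitions depends on the number of bins, not on the number of examples.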

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishamde/spark tree

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #79
    
----
commit cd53eae11313fd30f71f5ec94b20fe8d4427b8cd
Author: Manish Amde <manish...@gmail.com>
Date:   2013-11-28T10:20:27Z

    skeletal framework
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 92cedce2eb5055e0164c90842d6613c618bfed94
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-02T06:52:29Z

    basic building blocks for intermediate RDD calculation. untested.
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 8bca1e20b703fd90bc6fcdbed5d36b42a0bdf66e
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-09T03:48:39Z

    additional code for creating intermediate RDD
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 0012a77eb02e0a6627b7e3e68ac4d0f29d0885e0
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-10T05:08:44Z

    basic stump working
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 03f534c2f9a8dd739945f92b98a58e93fa5b716a
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-10T06:10:46Z

    some more tests
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit dad0afc85aea64c06b4dd64504b3112c881ae4e6
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-15T08:25:58Z

    decison stump functionality working
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 4798aae63e898fed71e6240462a163ad81ccd64b
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-15T08:45:23Z

    added gain stats class
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 80e8c66dd25ad03c706f4993b10ba4caafa54c18
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-16T01:41:59Z

    working version of multi-level split calculation
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit b0eb866cfd2d98a9281127e02e0c159668ca01f4
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-16T04:42:52Z

    added logic to handle leaf nodes
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 98ec8d57a0a0897b093ced7e3284228ee21ce5f4
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-22T06:39:29Z

    tree building and prediction logic
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 02c595c65f784061b1a78d4cbd5cac5990d1881d
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-22T20:00:17Z

    added command line parsing
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 733d6ddf51ddf440efb1a17c818da6d7fd027c4b
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-22T20:20:50Z

    fixed tests
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 154aa77c925e44a92e8bbf2f55e43cab06e75006
Author: Manish Amde <manish...@gmail.com>
Date:   2013-12-23T06:51:17Z

    enums for configurations
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit b0e3e76c47b1b449c91832aee2a6e94cee0a7c6b
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-12T19:45:47Z

    adding enum for feature type
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit c8f6d60c45ec7ec8cfac94b43fb22d8c294221db
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-12T19:46:55Z

    adding enum for feature type
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit e23c2e5089a2bf2a50c5d3f52e5799bf76ca3a16
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-19T21:23:45Z

    added regression support
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 53108ed6ad241765757c1e4c68189035505b370f
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-20T00:56:15Z

    fixing index for highest bin
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 6df35b9e70701528b13b33820b687f295bcfb3a4
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-21T04:33:52Z

    regression predict logic
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit dbb7ac13d28fba0848062a7bea40c617cb5f2c80
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-23T04:44:23Z

    categorical feature support
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit d504eb1f8a3f7f06226448d42b709f2f7ec6e91c
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-23T05:59:15Z

    more tests for categorical features
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 6b7de78e3a59bef8cbb8aff8b2aeed0cd91ab4a1
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-26T01:53:41Z

    minor refactoring and tests
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit b09dc983f4f05da61479c87617526064b0e3dde8
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-26T22:54:43Z

    minor refactoring
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit c0e522b7d1f5e27c81d682e5c8c97543fb4242be
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-27T03:11:43Z

    updated predict and split threshold logic
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit f067d68f0d951e7f0f089419c506fbd5ce2c2fc1
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-27T03:36:21Z

    minor cleanup
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 5841c2838e6834fc8c767f3c83dba7ef99375fa4
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-27T06:34:49Z

    unit tests for categorical features
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 0dd7659055879be9fbb3280964f87b14c735f225
Author: manishamde <manish...@gmail.com>
Date:   2014-01-27T06:42:06Z

    basic doc
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit dd0c0d799d42c94da3f930065a6c2973143bfd75
Author: Manish Amde <manish...@gmail.com>
Date:   2014-01-27T08:01:43Z

    minor: some docs
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 937277990e80f9a97070c63d39552579f0320fd7
Author: Manish Amde <manish...@gmail.com>
Date:   2014-02-17T03:42:48Z

    code style: max line lenght <= 100
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

commit 84f85d6d0a1fe7ed60149cc6b29a9ff76ef09abd
Author: Manish Amde <manish...@gmail.com>
Date:   2014-02-28T04:57:56Z

    code documentation
    
    Signed-off-by: Manish Amde <manish...@gmail.com>

----


