Re: Improving metadata in Spark JIRA

2015-02-21 Thread Nicholas Chammas
As of right now, there are no more open JIRA issues without an assigned component! Hurray! Thanks to Sean and others

Google Summer of Code - ideas

2015-02-21 Thread Manoj Kumar
Hello, I've been working on the Spark codebase for quite some time now, especially on issues related to MLlib and, to a much smaller extent, PySpark and Spark SQL (https://github.com/apache/spark/pulls/MechCoder). I would like to extend my work with Spark as a Google Summer of Code project. I w

Spark SQL - Long running job

2015-02-21 Thread nitin
Hi All, I intend to build a long-running Spark application that fetches data/tuples from Parquet, does some time-consuming processing, and then caches the processed table (InMemoryColumnarTableScan). My use case is good retrieval time for SQL queries (benefiting from the Spark SQL optimizer) and data compres

Re: GSOC2015

2015-02-21 Thread Manoj Kumar
Hi, For starter tasks, you can filter by Documentation and "Priority-Minor" and "Priority-Trivial" here, since those are most probably the easiest things to fix: https://issues.apache.org/jira/browse/SPARK/ . You can also filter based on your expertise, i.e. MLlib (for Machine Learning), Sp

GSOC2015

2015-02-21 Thread magellane a
Hi, Since we're approaching the GSOC2015 application process, I have some questions: 1) Will your organization be a part of GSOC2015, and what are the projects that you will be interested in? 2) Since I'm not a contributor to Apache Spark, what are some starter tasks I can work on to gain facility wi