Hi All, I'm excited to announce a new module in Spark (SPARK-1251). After an initial review we've merged this into Spark as an alpha component to be included in Spark 1.0. This new component adds some exciting features, including:
- schema-aware RDD programming via an experimental DSL
- native Parquet support
- support for executing SQL against RDDs

The pull request itself contains more information: https://github.com/apache/spark/pull/146

You can also find the documentation for this new component here: http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html

This contribution was led by Michael Armbrust with work from several other contributors whom I'd like to highlight here: Yin Huai, Cheng Lian, Andre Schumacher, Timothy Chen, Henry Cook, and Mark Hamstra.

- Reynold
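For anyone curious what "executing SQL against RDDs" looks like in practice, here is a minimal sketch based on the alpha API described in the programming guide linked above. The input file `people.txt`, the `Person` case class, and the query are illustrative assumptions, not part of the announcement:

```scala
// Sketch of the new SQL-on-RDDs component (Spark 1.0 alpha API).
// The file name, schema, and query below are illustrative assumptions.
import org.apache.spark.{SparkConf, SparkContext}

case class Person(name: String, age: Int)

object SqlExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-example"))
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._  // brings in the implicit RDD-to-SchemaRDD conversion

    // Turn an ordinary RDD of case classes into a schema-aware table.
    val people = sc.textFile("people.txt")      // lines like "Michael,29"
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
    people.registerAsTable("people")

    // Run SQL directly against the RDD.
    val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.collect().foreach(println)

    sc.stop()
  }
}
```

The same `SchemaRDD` produced by `sql(...)` remains a normal RDD, so it composes with the existing RDD operators.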