Congrats! That's a really impressive and useful addition to Spark. I recently discovered a similar feature in pandas and really enjoyed using it.
Regards, Heiko

> On 21.03.2014 at 02:11, Reynold Xin <r...@databricks.com> wrote:
>
> Hi All,
>
> I'm excited to announce a new module in Spark (SPARK-1251). After an
> initial review we've merged this into Spark as an alpha component to be
> included in Spark 1.0. This new component adds some exciting features,
> including:
>
> - schema-aware RDD programming via an experimental DSL
> - native Parquet support
> - support for executing SQL against RDDs
>
> The pull request itself contains more information:
> https://github.com/apache/spark/pull/146
>
> You can also find the documentation for this new component here:
> http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>
> This contribution was led by Michael Armbrust, with work from several other
> contributors whom I'd like to highlight here: Yin Huai, Cheng Lian, Andre
> Schumacher, Timothy Chen, Henry Cook, and Mark Hamstra.
>
> - Reynold
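
For anyone curious what the three announced features look like together, here is a minimal sketch based on the linked sql-programming-guide, assuming the alpha SchemaRDD API; the Person case class and the people.txt / people.parquet paths are illustrative, not part of the announcement:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    // Illustrative schema; any Scala case class can back a SchemaRDD.
    case class Person(name: String, age: Int)

    val sc = new SparkContext("local", "sql-demo")
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // implicitly converts case-class RDDs to SchemaRDDs

    // Build a schema-aware RDD from a plain text file of "name,age" lines.
    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    // Executing SQL against an RDD: register it as a table, then query it.
    people.registerAsTable("people")
    val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)

    // The experimental DSL: the same query via Scala symbols as columns.
    val teenagers2 = people.where('age >= 13).where('age <= 19).select('name)

    // Native Parquet support: write a SchemaRDD out and read it back.
    people.saveAsParquetFile("people.parquet")
    val parquetPeople = sqlContext.parquetFile("people.parquet")

The nice part of the design is that all three paths (SQL strings, the DSL, and Parquet files) produce the same SchemaRDD type, so they compose freely with ordinary RDD operations.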