[GitHub] spark pull request: SPARK-1251 Support for optimizing and executin...

marmbrus Fri, 14 Mar 2014 16:35:25 -0700

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/146


    SPARK-1251 Support for optimizing and executing structured queries

    This pull request adds support to Spark for working with structured data 
using a simple SQL dialect, HiveQL and a Scala Query DSL.
    
    *This is being contributed as a new __alpha component__ to Spark and does 
not modify Spark core or other components.*
    
    The code is broken into three primary components:
     - Catalyst (sql/catalyst) - An implementation-agnostic framework for 
manipulating trees of relational operators and expressions.  
     - Execution (sql/core) - A query planner / execution engine for 
translating Catalyst's logical query plans into Spark RDDs.  This component 
also includes a new public interface, SqlContext, that allows users to execute 
SQL or structured Scala queries against existing RDDs and Parquet files.
     - Hive Metastore Support (sql/hive) - An extension of SqlContext called 
HiveContext that allows users to write queries using a subset of HiveQL and 
access data from a Hive Metastore using Hive SerDes.  There are also wrappers 
that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
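    
    To make the idea of "manipulating trees of relational operators and 
expressions" concrete, here is a minimal, self-contained sketch of a 
rule-based tree rewrite.  The types and names (`Expr`, `transformUp`, 
`constantFolding`) are hypothetical stand-ins chosen for illustration, not 
Catalyst's actual classes; the sketch only assumes that a rule can be modelled 
as a partial function applied bottom-up over an immutable tree.
    ```scala
    object TreeRewriteSketch {
      // Hypothetical, simplified expression tree (not Catalyst's real API).
      sealed trait Expr {
        // Rewrite the children first, then try the rule on the resulting node.
        def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
          val withNewChildren: Expr = this match {
            case Add(l, r) => Add(l.transformUp(rule), r.transformUp(rule))
            case leaf      => leaf
          }
          // Nodes the rule does not match are returned unchanged.
          rule.applyOrElse(withNewChildren, (e: Expr) => e)
        }
      }
      case class Literal(value: Int) extends Expr
      case class Attribute(name: String) extends Expr
      case class Add(left: Expr, right: Expr) extends Expr

      // An illustrative rule: fold the addition of two literals into one literal.
      val constantFolding: PartialFunction[Expr, Expr] = {
        case Add(Literal(a), Literal(b)) => Literal(a + b)
      }

      // x + (1 + 2) is rewritten to x + 3.
      val folded: Expr =
        Add(Attribute("x"), Add(Literal(1), Literal(2))).transformUp(constantFolding)
    }
    ```
    Expressing each rule as a small partial function keeps individual 
rewrites independent of one another and of any particular execution engine.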
    
    A more complete design of this new component can be found in [the 
associated JIRA](https://spark-project.atlassian.net/browse/SPARK-1251).
    
    [An updated version of the Spark documentation, including API Docs for all 
three sub-components,](http://www.cs.berkeley.edu/~marmbrus/sparkdocs/_site/sql-programming-guide.html)
is also available for review.
    
    With this PR comes support for inferring the schema of existing RDDs that 
contain case classes.  Using this information, developers can now express 
structured queries that are automatically compiled into RDD operations.
    
    ```scala
    import org.apache.spark.rdd.RDD

    // Assumes `sc` is an existing SparkContext and that the SqlContext
    // implicit conversions are in scope.

    // Define the schema using a case class.
    case class Person(name: String, age: Int)
    val people: RDD[Person] =
      sc.textFile("people.txt").map(_.split(",")).map(p => Person(p(0), p(1).toInt))
    
    // The following is the same as
    // 'SELECT name FROM people WHERE age >= 10 AND age <= 19'
    val teenagers = people.where('age >= 10).where('age <= 19).select('name).toRdd
    ```
    
    RDDs can also be registered as Tables, allowing SQL queries to be written 
over them.
    ```scala
    people.registerAsTable("people")
    val teenagers = sql("SELECT name FROM people WHERE age >= 10 AND age <= 19")
    ```
    
    The results of queries are themselves RDDs and support standard RDD 
operations:
    ```scala
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
    ```
    
    Finally, with the optional Hive support, users can read and write data 
located in existing Apache Hive deployments using HiveQL.
    ```scala
    sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    sql("LOAD DATA LOCAL INPATH 'src/main/resources/kv1.txt' INTO TABLE src")
    
    // Queries are expressed in HiveQL
    sql("SELECT key, value FROM src").collect().foreach(println)
    ```
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark catalyst

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #146
    
----
commit 5dab0bc9e94b7b81d03e8b2bc22a72897a907d37
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-28T23:31:07Z

    Merge pull request #26 from liancheng/serdeAndPartitionPruning
    
    Hive SerDe support and partition pruning optimization

commit 677eb073f635815a2aa22a49ed466b84c785d6ed
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-29T00:14:18Z

    Update test whitelist.

commit d4f539a9a7c0210b609e68a0fa49b1d2922b1205
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-29T04:15:38Z

    blacklist mr and user specific tests.

commit 4c89d6ea16c4de05d45a8336ef3808d96cc3abe4
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-29T04:43:31Z

    Merge pull request #27 from marmbrus/moreTests
    
    Update test whitelist.

commit ebb56faaec54c970fa49e8c575facfd6658e37ea
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-29T06:27:35Z

    add travis config

commit 8ee41be08034e1a66ec13a0ed66a1b59a3ad0aaa
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-01-30T14:38:45Z

    Minor refactoring

commit 2486fb71dc89f915c4f54a95e42211c79fc99e4c
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-01-30T14:39:00Z

    Fixed spelling

commit 61e729cc21afcafe64af1befee2efb54271bf6d8
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-01-30T14:39:37Z

    Added ColumnPrunings strategy and test cases

commit 605255eb979416edc19c005f0bc7b8d5f13dd44b
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-30T22:55:06Z

    Added scalastyle checker.

commit 08e4d0589056f3ae6e117689596420bbf7fbbbc2
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-30T23:59:55Z

    First round of style cleanup.

commit 7213a2c466d7e30cabb2a2fd07bc81a8d7e36cfe
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-31T00:14:32Z

    style fix for Hive.scala.

commit 5c1e60043c4b60529936f93a1536d021f28a2460
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-31T00:18:55Z

    Added hash code implementation for AttributeReference

commit 7e24436da3de67e3b33c310d0c761b2c8e3d11bd
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-31T00:34:59Z

    Removed dependency on JDK 7 (nio.file).

commit 41bbee67d888f8773a1b02ecc5abd957cda033ee
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-01-31T05:31:15Z

    Merge remote-tracking branch 'upstream/master' into exchangeOperator
    
    Conflicts:
        build.sbt
        src/main/scala/catalyst/execution/SharkInstance.scala

commit f47c2f6f3572cb15da916c0efab7839e485ec905
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-01-31T06:32:00Z

    set outputPartitioning in BroadcastNestedLoopJoin

commit d91e276fb303a878bb54ba156a3087c204f0e167
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-31T21:40:59Z

    Remove dependence on HIVE_HOME for running tests.  This was done by moving 
all the hive query tests (from branch-0.12) and data files into src/test/hive.  
These are used by default when HIVE_HOME is not set.

commit bce024d4a4d7bd8ef3443dcf9dcd367afeaf1837
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-31T22:54:10Z

    Merge remote-tracking branch 'databricks/master' into style
    Disable if brace checking as it errors in single line functional cases 
unlike the style guide.
    
    Conflicts:
        src/main/scala/catalyst/execution/TestShark.scala

commit d20b565a36533245d0357b18332e8c8658821a2e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-31T23:10:04Z

    fix if style

commit 807b2d7ce15ef78f73acfe4950a8fd14b6784545
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T00:03:46Z

    check style and publish docs with travis

commit d3a3d48d6ad2aa3562b0859f2af13dd8d8b75fd7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T00:12:33Z

    add testing to travis

commit 271e483d65dc41a4feb6f9f4018379094c4ff0bf
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T00:28:47Z

    Update build status icon.
    
    [no ci]

commit 6015f932176c291556e13d0e08abd42ad8fdddab
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T00:38:19Z

    Merge pull request #29 from rxin/style
    
    Scala style checker & style fixes

commit fc67b5078c23c88b6387cf2b948d84a99cc87e08
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-01T00:46:18Z

    Check for a Sort operator with the global flag set instead of an Exchange 
operator with a RangePartitioning.

commit 235cbb436756cfeb915fe1864b66277c067b5abd
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-01T00:57:14Z

    Merge remote-tracking branch 'upstream/master' into exchangeOperator
    
    Conflicts:
        src/main/scala/catalyst/execution/aggregates.scala
        src/main/scala/catalyst/expressions/Evaluate.scala

commit 45b334b4d06d254c3b9a8f03b2e64f14b48a3c88
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-01T01:11:07Z

    fix comments

commit e079f2b32d3391bdfe835ca66dde7eaedf5df5c0
Author: Timothy Chen <tnac...@gmail.com>
Date:   2014-01-16T06:53:00Z

    Add GenericUDAF wrapper and HiveUDAFFunction

commit 8e0931f1ca55aff597132c6a27ed058866680db5
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-28T22:15:03Z

    Cast to avoid using deprecated hive API.

commit b1151a8a13b6a3cd1dfa53115b67610955112d66
Author: Timothy Chen <tnac...@gmail.com>
Date:   2014-01-29T17:58:26Z

    Fix load data regex

commit 5b7afd8f7b2f77f3e97b94228fee6f6b92c858be
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T19:57:06Z

    Merge pull request #10 from yhuai/exchangeOperator
    
    Exchange operator

commit 6eb59608a17ace6a39638a1fdf24241403642578
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:09:02Z

    Merge remote-tracking branch 'databricks/master' into udafs
    
    Conflicts:
        src/main/scala/catalyst/execution/aggregates.scala

commit 41b41f3c6ff0b06e6ac76a6a17c929c3bae8be8a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T09:39:11Z

    Only cast unresolved inserts.

commit 63003e90fb70e13d22ad7e260e29897286a7776b
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:37:58Z

    Fix spacing.

commit 2de89d0807307f0944d79fb525d18bc2464ebf49
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:38:18Z

    Merge pull request #13 from tnachen/master
    
    Add GenericUDAF wrapper and HiveUDAFFunction

commit cb775ac99241f26461a19646b9c6db660a6a2eeb
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-12T22:15:44Z

    get rid of SharkContext singleton

commit dfb67aa73ce15d9a9c355afaa1d690b3aad41843
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-13T01:47:55Z

    add test case

commit 19bfd74f9b7a3cc9dc7b7cc6477908abbd6826d9
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-01-22T07:08:31Z

    store hive output in circular buffer

commit 1590568ddbeee565bc483ccfe089b287433643a4
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T01:57:48Z

    add log4j.properties

commit b649c20a124ef2e7cd8c026ffb06be759d608cec
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T05:13:30Z

    fix test logging / caching.

commit 784536466cc3fe69ea230f0e63f7c4cd670fdadc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T05:13:40Z

    deactivate concurrent test.

commit ea6f37f740a5dfef3ca0c2f82e4c26ed3171851c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T05:13:53Z

    fix style.

commit 82163e3e3c21804898e576e3a224e3a644e75d27
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T06:26:58Z

    special case handling of partitionKeys when casting insert into tables

commit 9c22b4ebdda3955a88800dcf0dec0d14748394e7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:13:44Z

    Support for parsing nested types.

commit efa72170ebe27d84cb5ae2efeaed4054ceca1f9c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:19:31Z

    Support for reading structs in HiveTableScan.

commit d670e41dfaf93bc322079d5e93b938c2f868932c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:19:47Z

    Print nested fields like hive does.

commit dc6463acaccfbdf3bae41ca746b678cb3b70cf9a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:20:11Z

    Support for resolving access to nested fields using "." notation.

commit 67094413d86c0d03fbb717a99916b9c906552d67
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:20:26Z

    Evaluation for accessing nested fields.

commit da7ae9da830a5260478a5d9cd4959bb5f3565df2
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:21:11Z

    Add boolean writable that was breaking udf_regexp test.  Not sure how this 
was passing before...

commit 6420c7c23b1fcbae009ce97c5dd2dc9ece75f0a0
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-01T02:28:56Z

    Memoize the ordinal in the GetField expression.

commit 1579eecca917152c542a68149eddd636131dbb2f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T09:39:11Z

    Only cast unresolved inserts.

commit cf8d99257ad87063bca4bc3a2d5a09b54a2cf2b1
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:00:51Z

    Use built in functions for creating temp directory.

commit c654f19ef6fec54537a4e704234b63c65c7e0d1e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:01:51Z

    Support for list and maps in hive table scan.

commit c3feda75938565b85ff401aeb29bdcb44e7accdc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:02:06Z

    use toArray.

commit a9388fb7274fe40b9d10eb8d4a3c97c32d365187
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:02:29Z

    printing for map types.

commit bbec500c4fc9a12cbc18b607147aa751308f4288
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T20:02:52Z

    update test coverage, new golden

commit 35a70fbfd93b83856f86ea52bc1b3a850076960f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T21:28:05Z

    multi-letter field names.

commit 2c6deb37b104b5272d99917b6933a749da99d06e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-02T21:28:23Z

    improve printing compatibility.

commit 5b33216d197ad7c649e36f9f9a2a48143120aeae
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T00:21:23Z

    work on decimal support.

commit 5b3d2c80546848a9c6bf830c22ec5f029dca790f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T00:21:40Z

    implement distinct.

commit 3f9e519a16f9dc9f3eabda3ad91d80c088e3f384
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T00:30:14Z

    use names w/ boolean args

commit 3734a9416c1156030a7c2af9e43d9209ca17aa59
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T00:31:03Z

    only quote string types.

commit 5e54aa6dab3e3ed0f2e702abc038eee5f17fcb38
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T01:38:52Z

    quotes for struct field names.

commit e4def6b2c917ebf28b3a11fc1aad690c2fddd55f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T01:39:19Z

    set dataType for HiveGenericUdfs.

commit aa430e7ba7fd748619bd4b1959ca165ec2b13a5c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T06:58:51Z

    Update .travis.yml

commit 7661b6ce6b8cb1cfc816e87d0644cfc063dce921
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T07:21:24Z

    blacklist machines specific tests

commit 72a003dd3dce58331205465fb43bbb9a412156c4
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T07:41:45Z

    revert regex change

commit 9c0677866e24293525602a8e76860b4785950c39
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-03T08:11:21Z

    fix serialization issues, add JavaStringObjectInspector.

commit 92e415878439ceb94e3d41de75bc26acfe92a24d
Author: Reynold Xin <r...@apache.org>
Date:   2014-02-03T18:30:55Z

    Merge pull request #32 from marmbrus/tooManyProjects
    
    Fix a bug in PreInsertionCasts rule.

commit 692a4779af0a269ae1f16006ab129c00af2a6c5c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:36:48Z

    Support for wrapping arrays to be written into hive tables.

commit ac9d7de4f973d4809d435d098def4de12c1c0dbc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:37:06Z

    Resolve *s in Transform clauses.

commit 7a0f543431b196f78da2f473fd2f0d3e3764d0c3
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:37:21Z

    Avoid propagating types from unresolved nodes.

commit 010accb872f179b97b6cc6e971a7e9f17ec2de73
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:37:39Z

    add tinyint to metastore type parser.

commit e7933e912356e686ce36cc8a52dc813a7cc8c430
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:38:13Z

    fix casting bug when working with fractional expressions.

commit 25288d055a0bcf251e64c8653442f1ee5b466e70
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:38:38Z

    Implement [] for arrays and maps.

commit ab9a131818884dd2258174956fdca65bd14dfd42
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:38:58Z

    when UDFs fail they should return null.

commit 1679554ae68dfc91212ebaf8401efaf6088d61a9
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:39:12Z

    add toString for if and IS NOT NULL.

commit ab5bff387f2ced791527b4c20b2c30dc7da6c190
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T02:39:28Z

    Support for get item of map types.

commit 42ec4af79020a5952bf59a5e44d6852eef5d4b41
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T03:07:17Z

    improve complex type support in hive udfs/udafs.

commit 44d343ca60aa1fbcd78217a39ea86a74098e0ef3
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T03:09:38Z

    Merge remote-tracking branch 'databricks/master' into complex
    
    Conflicts:
        src/main/scala/catalyst/analysis/Analyzer.scala

commit e3c10bd5649658995c3a347ebe1ab434fad50cdc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T08:57:55Z

    update whitelist.

commit 389525dedbc7c6c83d6686a7661c98354f60425e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T18:44:35Z

    update golden, blacklist mr.

commit 2f276049070ccd873368441e652c0d6a2d3e2551
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T19:23:12Z

    Address comments / style errors.

commit cb57459ce009bdf8e58e7eaf1c301279b5a07ce7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-04T19:24:33Z

    blacklist machine specific test.

commit 67128b8bf07a5deaacd1a9214c1fa58d0bfcba85
Author: Reynold Xin <r...@apache.org>
Date:   2014-02-04T21:16:20Z

    Merge pull request #30 from marmbrus/complex
    
    Initial support for reading / accessing / printing nested fields.

commit b4be6a5411cd3d25919bc71563da44638660ecb6
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T00:53:46Z

    better logging when applying rules.

commit ccdb07a18c62c7c955400e3253d81adbd6e8f42e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T00:54:23Z

    Fix bug where averages of strings are turned into sums of strings.  Remove 
a blank line.

commit d8cb805193f7d8ffe96efc423bb86f781ea3ef41
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T01:50:48Z

    Implement partial aggregation.

commit f94345cb0ed64b8566da623e765a04cac6739733
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T02:44:38Z

    fix doc link

commit e1999f927a41eae4a9affe2728296a1a9ee06cb8
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-05T04:38:11Z

    Use Deserializer and Serializer instead of AbstractSerDe.

commit 32b615b52e7c202b29e1242952092d09f3332745
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T09:36:12Z

    add override to asPartial.

commit 883006dd16cbd1ddb61f164ad28a8237f4c6becc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T09:46:39Z

    improve tests.

commit cab1a84b4811064fe217b0cd56d3fe9c48210b6a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T10:01:08Z

    Fix PartialAggregate inheritance.

commit dc6353be64bfe9c6522403a5a4124423cd62e22b
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T10:03:58Z

    turn off deprecation

commit 8017afb101b214635dcd1b372afcd21379c340f5
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-05T18:40:23Z

    fix copy paste error.

commit 5479066a011a8dff6da8c68c8452cdeffb4cc3e8
Author: Reynold Xin <r...@apache.org>
Date:   2014-02-05T19:22:52Z

    Merge pull request #36 from marmbrus/partialAgg
    
    Implement partial aggregation.

commit 5e4d9b453658dece7afa987ab9b07bf2c12b4999
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T00:16:19Z

    Merge pull request #35 from marmbrus/smallFixes
    
    A few small bug fixes and improvements.

commit 02ff8e4462793d8f37365f44cb2f269f619d72da
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-07T13:41:42Z

    Correctly parse the db name and table name in a CTAS query.

commit 8841eb888d16edbb1bd34175ee13b664468e78b7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:01:51Z

    Rename Transform -> ScriptTransformation.

commit acb956646de2a05475ff5086b5967e0e657f8aa0
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:03:56Z

    Correctly type attributes of CTAS.

commit 016b48990ef37b32d1bd4b1d4790afbe15e7db57
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:04:17Z

    fix typo.

commit bea4b7f1c3b091386bb8cacad8f8c2e154c579b7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:04:40Z

    Add SumDistinct.

commit ea76cf9bf5e07dfa5435fa99ae1e0623a7c89262
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:05:13Z

    Add NoRelation to planner.

commit dd00b7e8df7356be40379ec560f2f476f74e1a8e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:11:33Z

    initial implementation of generators.

commit ba8897fd60a6555d2a52ea5fb3d8c32981ed2296
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:12:16Z

    Merge remote-tracking branch 'yin/parseDBNameInCTAS' into lateralView

commit 0ce61b0f3d110567693bb340df6f5bdd6ee41a2c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:44:19Z

    Docs for GenericHiveUdtf.

commit 740febb71c94e40f436cb3ea5ebc81b0cda4db26
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T22:44:33Z

    Tests for tgfs.

commit db92adc5ff5a0712d5104aad00cad67b520070b4
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T23:58:28Z

    more tests passing. clean up logging.

commit ff5ea3f209eed028365a2b680dd7093340e355c8
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-07T23:59:41Z

    new golden

commit 5cc367cdb9946b092c53ff1473ac3f784c0112d3
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-08T01:34:34Z

    use berkeley instead of cloudbees

commit b376d15652bd0372d1713429468d874614a9dd7a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-08T01:42:32Z

    fix newlines at EOF

commit 7123225ae5e96dc7be38b13c2f2bcc86a19249ad
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-08T01:44:01Z

    Correctly parse the db name and table name in INSERT queries.

commit 2897deb146c498bfc7ebcb80e3835ecb9899cfeb
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-08T02:31:20Z

    fix scaladoc

commit 0e6c1d712f95ce0268dc71b28a64c2bd29c81b27
Author: Reynold Xin <r...@apache.org>
Date:   2014-02-08T06:40:54Z

    Merge pull request #38 from yhuai/parseDBNameInCTAS
    
    Correctly parse the db name and table name of a table

commit 341116cb450ff72af793a5bd84d73ca2203200cb
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-08T20:09:59Z

    address comments.

commit 7785ee62e47c93390213ff3f1a8a67a293d878a6
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-10T23:14:49Z

    Tighten visibility based on comments.

commit 964368f3b21c79ec86eb7c0389c43768fb4c1b01
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-11T00:04:01Z

    Merge pull request #39 from marmbrus/lateralView
    
    Add support for lateral views, TGFs and Hive UDTFs

commit dce0593034a30b802d9be2cf98590e9955df1b47
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-11T00:04:56Z

    move golden answer to the source code directory.

commit 9329820a9a85697a9bfad11b6f7266c07eb59235
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-11T00:28:23Z

    add golden answer files to repository

commit a7ad05855a376af7c7cdb89bb114cccba9e6b9b1
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-11T02:02:05Z

    Merge pull request #40 from marmbrus/includeGoldens
    
    Include golden hive answers in the source repository

commit 2407a21180d261138454d23926786dcc20e88d1e
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-12T00:29:11Z

    Added optimized logical plan to debugging output

commit cf691df0b020840be8bfaf0e29a7db4ef049b6f6
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-12T00:30:14Z

    Added the PhysicalOperation to generalize ColumnPrunings

commit f235914e3572919f5cb056b8a6794eb0623f5617
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-12T09:14:22Z

    Test case udf_regex and udf_like need BooleanWritable registered

commit f0c3742583d9a99bfc0f36c4fe9e2a497412c580
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-12T09:23:07Z

    Refactored PhysicalOperation
    
    The old version is implemented in a top down tail recursive manner, which 
cannot cover an uncommon corner case like:
    
        Filter (with aliases)
         Project ...
          MetastoreRelation
    
    In this case, the aliases are not in-lined/substituted because no aliases 
are collected yet.  It is now covered by the new version which is implemented 
in a bottom up recursive manner and collects all necessary aliases before 
in-lining/substitution.
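    
    The corner case above can be illustrated with a small, self-contained 
sketch.  Everything here (`Plan`, `Expr`, `collectBottomUp`, `substitute`) is 
a hypothetical simplification written for this note, not the PhysicalOperation 
code from the commit; it only shows why recursing into the child first makes 
the aliases of a lower Project available when the Filter above it is in-lined.
    ```scala
    object AliasInliningSketch {
      // Hypothetical, heavily simplified expression and plan types.
      sealed trait Expr
      case class Attr(name: String) extends Expr
      case class Lit(value: Int) extends Expr
      case class Plus(left: Expr, right: Expr) extends Expr
      case class GreaterThan(left: Expr, right: Expr) extends Expr
      case class Alias(child: Expr, name: String) extends Expr

      sealed trait Plan
      case class Relation(name: String) extends Plan
      case class Project(projectList: Seq[Expr], child: Plan) extends Plan
      case class Filter(condition: Expr, child: Plan) extends Plan

      // Replace references to aliased names with the expression they stand for.
      def substitute(aliases: Map[String, Expr])(e: Expr): Expr = e match {
        case Attr(n)           => aliases.getOrElse(n, e)
        case Plus(l, r)        => Plus(substitute(aliases)(l), substitute(aliases)(r))
        case GreaterThan(l, r) => GreaterThan(substitute(aliases)(l), substitute(aliases)(r))
        case Alias(c, n)       => Alias(substitute(aliases)(c), n)
        case lit: Lit          => lit
      }

      // Bottom-up: the child is processed first, so aliases introduced by a
      // Project lower in the tree are already known when a Filter is reached.
      // Returns (in-lined filter predicates, collected aliases, leaf relation).
      def collectBottomUp(plan: Plan): (Seq[Expr], Map[String, Expr], Plan) = plan match {
        case Project(list, child) =>
          val (filters, aliases, leaf) = collectBottomUp(child)
          val newAliases = list.collect { case Alias(c, n) => n -> substitute(aliases)(c) }
          (filters, aliases ++ newAliases, leaf)
        case Filter(cond, child) =>
          val (filters, aliases, leaf) = collectBottomUp(child)
          (filters :+ substitute(aliases)(cond), aliases, leaf)
        case leaf =>
          (Seq.empty[Expr], Map.empty[String, Expr], leaf)
      }

      // Filter (b > 10) over Project (a + 1 AS b) over a relation.
      val plan: Plan =
        Filter(GreaterThan(Attr("b"), Lit(10)),
          Project(Seq(Alias(Plus(Attr("a"), Lit(1)), "b")), Relation("src")))

      val (pushedFilters, collectedAliases, leafRelation) = collectBottomUp(plan)
    }
    ```
    In this sketch the predicate on `b` is rewritten in terms of the 
relation's own column `a`, which is the form a scan operator would need.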

commit 5720d2bd2cd08c2ecbff32391ed88080cecd7359
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-12T09:39:09Z

    Fixed comment typo

commit bc9a12ce63f14f34aa9d74086f3485a6d338cf66
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-13T23:18:26Z

    Move hive test files.

commit 7588a57feb1870c718be645e428d1f2371b9e722
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-13T23:19:28Z

    Break into 3 major components and move everything into the 
org.apache.spark.sql package.

commit 1f7d00aab0b9bd56dd4e4b71c9979f9e4e559d8b
Author: Reynold Xin <r...@apache.org>
Date:   2014-02-14T06:29:29Z

    Merge pull request #41 from marmbrus/splitComponents
    
    Break catalyst into 3 major components and move everything into 
org.apache.spark.sql

commit 887f928aac6f649ed5f97c644dafd715a9b450a4
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-14T10:38:57Z

    Merge remote-tracking branch 'upstream/master' into SerDeNew

commit 678341a50b793b09658b823fa1bdc61a9293d770
Author: Mark Hamstra <markhams...@gmail.com>
Date:   2014-02-14T18:21:24Z

    Replaced non-ascii text

commit 5ae010ff20ed811962e6f13920d1ef43bfc2a14b
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-14T19:14:33Z

    Merge pull request #42 from markhamstra/non-ascii
    
    Replaced non-ascii text

commit 1f6260d77223aaf23c2bbb112b52803bea061e42
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-14T20:45:29Z

    Fixed package name and test suite name in Makefile

commit b6de691f13d66dadc7b72c9eb19acccaf75b8ee9
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-14T22:15:35Z

    Merge pull request #43 from liancheng/fixMakefile
    
    Fixed package name and test suite name in Makefile

commit 7f206b5aa577bc4ca8aeb82d2438ad43316eb996
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-14T22:34:23Z

    Add support for hive TABLESAMPLE PERCENT.

commit ed3a1d15b80768817e9259e31499df53587c51b2
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-14T23:45:32Z

    Load data directly into Hive.

commit 59e37a31efba400649685c4cedf648d1b0c86d0b
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-14T23:56:06Z

    Merge remote-tracking branch 'upstream/master' into SerDeNew
    
    Conflicts:
        build.sbt
        shark/src/main/scala/org/apache/spark/sql/shark/HiveMetastoreCatalog.scala

commit 346f828dc37df3a1681e6ebf2a5940a609ead50a
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-15T00:38:52Z

    Move SharkHadoopWriter to the correct location.

commit a9c318853d4bb02965252810656999be060682dd
Author: Timothy Chen <tnac...@gmail.com>
Date:   2014-02-15T01:06:00Z

    Fix udaf struct return

commit 69adf7298edb74a9ecd704932276d988d1c8ba5d
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-15T01:22:13Z

    Set cloneRecords to false.

commit 566fd6685fec88b88223f4b47af04eb39a69d28e
Author: Timothy Chen <tnac...@apache.org>
Date:   2014-02-15T02:09:30Z

    Whitelist tests and add support for Binary type

commit 9ad474d877ae1a6dcc6a7769c2effed4c3a15029
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-15T02:56:30Z

    Merge pull request #44 from marmbrus/sampling
    
    Add support for hive TABLESAMPLE PERCENT.

commit 3cb4f2e16662c54806474d0de2fbd9021133ae08
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-15T02:57:29Z

    Merge pull request #45 from tnachen/master
    
    Fix udaf struct return

commit 8506c176f7e18011df50e25f8ea98d30a57f0ccd
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-15T03:20:41Z

    Address review feedback.

commit 3bb272ddc69472120bb0915308451576565cecf6
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-15T03:26:42Z

    move org.apache.spark.sql package.scala to the correct location.

commit 1596e1b14e8e2741758c6370bb29d32830476a7f
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-15T04:09:25Z

    Cleanup imports to make IntelliJ happy.

commit 5495faba864ee7ef1f8649bca02eacb7479a3b2a
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-15T10:01:02Z

    Remove cloneRecords which is no longer needed.

commit bdab5edd65140cd18c2dc29b00fa914d624dd999
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-15T10:03:28Z

    Add a TODO for loading data into partitioned tables.

commit 35c9a8a11fed8ae8f7aa8d345b4bc0c53f413ab8
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-15T20:57:39Z

    Merge pull request #46 from marmbrus/reviewFeedback
    
    Address review feedback from previous PR.

commit 563bb22bd30b021e2bc276e2ed454f5296877a63
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T00:26:05Z

    Set compression info in FileSinkDesc.

commit e08962779a195b991c2478647c65923f4ddd23b4
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T00:26:23Z

    Code style.

commit 45ffb86df7c877c78de0470fbb66fae6be3bcf23
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T00:28:11Z

    Merge remote-tracking branch 'upstream/master' into SerDeNew

commit eea75c522fbf9ead1ef4280e3420d3a6685b7a0c
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T11:24:15Z

    Correctly set codec.

commit 428aff5f15a1954a983f049ade8986816d87e73c
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T12:39:24Z

    Distinguish `INSERT INTO` and `INSERT OVERWRITE`.

commit a40d6d628384c172c1d1d7a4bd4011c3cb8f2b6b
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T14:09:23Z

    Loading the static partition specified in a INSERT INTO/OVERWRITE query.

commit 334aacee2432fbc6c51644df08f4899d340a2ef4
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-16T14:11:45Z

    New golden files.

commit d00260be188368ce943f2ffe7d087a7eff2f5f41
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-17T00:26:19Z

    Strips backticks from partition keys.

commit 555fb1d1e965d19c6e7dc28027361868b3492c0f
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-17T06:51:16Z

    Correctly set the extension for a text file.

commit feb022c1e77aac1f6b224cfc56bfd851762a0ca6
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-17T06:51:55Z

    Partitioning key should be case insensitive.

commit a1a47760b718bfecc7e4b1adacb3a179f936825c
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-17T10:46:13Z

    Update comments.

commit 017872cef3d771acab5fb3efc570dc1798e44f6d
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-17T10:46:31Z

    Remove stats20 from whitelist.

commit 128a9f8b8082b3ed0659dfe6c41dbd7cbf04ff71
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-18T04:58:08Z

    Minor changes.

commit f670c8c7adf6a3bc5c1e20850070b15e041f9285
Author: Yin Huai <huaiyin....@gmail.com>
Date:   2014-02-18T09:35:01Z

    Throw a NotImplementedError for not supported clauses in a CTAS query.

commit c5a4fabbe9a67c0bc3063314f7c5efd001aba52d
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-16T13:39:24Z

    Merge branch 'master' into columnPruning
    
    Conflicts:
        shark/src/test/scala/org/apache/spark/sql/shark/execution/HiveQuerySuite.scala
        shark/src/test/scala/org/apache/spark/sql/shark/execution/PartitionPruningSuite.scala
        src/main/scala/catalyst/execution/FunctionRegistry.scala
        src/main/scala/catalyst/execution/SharkInstance.scala
        src/main/scala/catalyst/execution/planningStrategies.scala

commit 2682f72adde85870de6b7bc20e0df0622340cdb0
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-18T12:14:06Z

    Merge remote-tracking branch 'origin/master' into columnPruning

commit 54f165b5f8814b9a9572f315b17505ef896b723a
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-18T12:19:26Z

    Fixed spelling typo in two golden answer file names

commit cf4db596d1ef8edcaa4f5e42648ddc57e4dc38e6
Author: Lian, Cheng <rhythm.m...@gmail.com>
Date:   2014-02-18T16:32:20Z

    Added golden answers for PruningSuite

commit f22df3aa73b75babca50ee0884bd064497bfe836
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-18T19:05:19Z

    Merge pull request #37 from yhuai/SerDe
    
    Support ORCSerDe

commit 9990ec7dcce26174f326172f1d662cc758d4e130
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-18T19:07:34Z

    Merge pull request #28 from liancheng/columnPruning
    
    Column pruning optimization together with some minor refactoring

commit 29effadbc188c5e6604a9e3a7460d9abde2c2fce
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:30:20Z

    Include alias in attributes that are produced by overridden tables.

commit c9116a6aa873e88c6b72d6ddc5d935af7c083f15
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:31:16Z

    Add combiner to avoid NPE when spark performs external aggregation.

commit 8c01c2475ef87d589263ba215f26530346b9868d
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:31:42Z

    Move definition of Row out of execution to top level sql package.

commit 4905b2b0b5f5cc8c123b41ccbb2daec117f73fad
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:33:17Z

    Add more efficient TopK that avoids global sort for logical Sort => StopAfter.

commit 532dd3748c262cdeea2f9f7977ba3a875e8b73fe
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:34:06Z

    Allow the local warehouse path to be specified.

commit a4308954350a578dae8d8d4d49ac7ec52c2d0fe7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:34:35Z

    Planning for logical Repartition operators.

commit 5fe7de411c437d958d414d5530c56aceb6f6bfc3
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:36:09Z

    Move table creation out of rule into a separate function.

commit b9225114460f9d628738b690fc0b33ba81a3c019
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:37:06Z

    Fix insertion of nested types into hive tables.

commit 18a861b108eb20afa1a87ee04324de829478b4d2
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:38:06Z

    Correctly convert nested products into nested rows when turning scala data into catalyst data.

commit df88f01e1d449433e2f149dbaea90a9611848ff9
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:38:44Z

    add a simple test for aggregation

commit 6e04e5b944113bc2c0cb528dcac1ccf3276109e2
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T21:39:14Z

    Add insertIntoTable to the DSL.

commit 24eaa79764253a2771c980728037e17bbef17b50
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T22:22:06Z

    fix > 100 chars

commit d393d2abebc03408fc43dbd835105134fa256463
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T22:41:37Z

    Review Comments: Add comment to map that adds a sub query.

commit 2225431005040fd6bb0b71f125057b40ef8c0493
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T23:18:21Z

    Merge pull request #48 from marmbrus/minorFixes
    
    Several minor fixes for bugs found during benchmarking.

commit 3ac941623b9b9cc860de890a781578b21b3accae
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-25T00:24:39Z

    Merge support for working with schema-ed RDDs using catalyst in as a spark subproject.

commit f5e7492c267758c80b7ad3e4c74b3b20b34ec9e0
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-25T22:44:02Z

    Add Apache license.  Make naming more consistent.

commit 5f2963c053f39ef4298598be918a4758c1c32a13
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-27T23:20:05Z

    naming and continuous compilation fixes.

commit 4d57d0e7b0e929d14c9d4218d5b63a03e176d04d
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-27T23:37:26Z

    Fix test execution on travis.

commit 7413ac22622a991eac5fba33cbaeee2008f324f0
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-28T00:04:41Z

    make test downloading quieter.

commit 608a29ea363e4093e605b2ecdcf3d55f4109e30d
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-28T02:22:58Z

    Add hive as a repl dependency

commit c3343868f8cc8b1054513fe6619c9bb193e8816a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-24T22:29:16Z

    Initial support for generating schemas based on case classes.

commit b33e47ede48e9803fe213ec71d9a3ccea804b69a
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-16T14:09:02Z

    First commit of Parquet import of primitive column types

commit 99a920916fa7f03669d86a9b9cf7482fedcaf318
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-16T17:54:44Z

    Expanding ParquetQueryTests to cover all primitive types

commit eb0e521572c500e79de2dc5c3aa188b222490681
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-17T13:28:37Z

    Fixing package names and other problems that came up after the rebase

commit 6ad05b34ecf9d457fd95c8e7f8f74ed979048cb9
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-19T11:06:53Z

    Moving ParquetRelation to spark.sql core

commit a11e36428f3ea166825cbeb39ea23e86046dd26a
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-19T14:12:30Z

    Adding Parquet RowWriteSupport

commit 0f17d7b6fcea76b991da1790cf39b97d5543eee1
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-19T14:26:55Z

    Rewriting ParquetRelation tests with RowWriteSupport

commit 6a6bf9844e1c25e3f3360cc4c479f5db66e2bea7
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-19T16:31:40Z

    Added column projections to ParquetTableScan

commit f347273cb9d8f6e6c43eb3ef5e54507025ecc1cd
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-20T17:01:37Z

    Adding ParquetMetaData extraction, fixing schema projection

commit 75262eec5e21400011359dbf3f2825cbd7be461d
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-24T09:27:25Z

    Integrating operations on Parquet files into SharkStrategies

commit 18fdc441ab3fc17535512f86cb77651d91596bdd
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-26T10:12:15Z

    Reworking Parquet metadata in relation and adding CREATE TABLE AS for Parquet tables

commit 3a0a552a5950f99f80bc178818103e393cfa775c
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-26T12:55:31Z

    Reorganizing Parquet table operations

commit 332119573ba934e7fd8cb1f7adcd0d3bd791a1c2
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-02-27T07:41:21Z

    Fixing one import in ParquetQueryTests.scala

commit 61e3bfbbb2fe4894fa5c2d7c27f1da6cec903819
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-02T11:45:59Z

    Adding WriteToFile operator and rewriting ParquetQuerySuite

commit c863bed3d17abf9cd3da7cee8637d77b088a192d
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-02T14:28:23Z

    Codestyle checks

commit 3ac9eb05d0cec3cca166503cb4dc417168694012
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-02T18:23:06Z

    Rebasing to new main branch

commit 3bda72db9384b0f67cfbfbe22eb2674be113ceda
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-02T20:59:23Z

    Adding license banner to new files

commit d7fbc3a591110dae76121c1095a32ab4788ae005
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-27T02:00:12Z

    Several performance enhancements and simplifications of the expression evaluation framework.
    
    * Removed the Evaluate singleton in favor of placing expression evaluation code in each expression.
    * Instead of passing in a Seq of input rows we now take a single row.  A mutable JoinedRow wrapper can be used in the relatively rare cases where expressions need to be evaluated on multiple input rows.
    * GenericRow now takes a raw Array[Any] instead of a Seq.  Since GenericRow itself is a Seq wrapper, this avoids the creation of an unnecessary object.
    * A new concept called MutableLiteral can be used to evaluate aggregate expressions in-place, instead of needing to build new literal trees for each update.  This part is more of a WIP as we still incur boxing, however this is a strict improvement over what was there before.

commit 296fe5036105b7e519501f58e0fb0204023c23f2
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-27T20:30:56Z

    Address review feedback.

commit 6fdefe65478d950d3f30f6591df361558886d187
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-03T20:33:45Z

    Port sbt improvements from master.

commit da9afbda89776602acb5dfa10d1c0a654f9d77dd
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-03T20:43:32Z

    Add byte wrappers for hive UDFS.

commit 7b9d14263a4cbf5d39216c86a41b546c607b4a20
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-03T19:41:35Z

    Update travis to increase permgen size.

commit 99e61fbfa386dc11f4b0df2134d8b714c57ad3ba
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-03T21:36:20Z

    Merge pull request #51 from marmbrus/expressionEval
    
    Several performance enhancements and simplifications of the expression evaluation framework.

commit 8d5da5ed977b1c867b5b78f05523d89d5552b387
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-02-27T03:02:17Z

    modify compute-classpath.sh to include datanucleus jars explicitly

commit 6d315bb168443eba98d978ae65c386ff27629bfc
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-03-05T03:48:37Z

    Added Row.unapplySeq to extract fields from a Row object.

commit 70e489d277470b5ed84d856af96b1167a0f892b6
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-03-05T04:13:19Z

    Fixed a spelling typo

commit 1ce01c7ad99d6c5d666c8b601c8f3527ab0ebe9f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-05T08:59:26Z

    Merge pull request #56 from liancheng/unapplySeqForRow
    
    Added Row.unapplySeq to extract fields from a Row object.

commit 0040ae6d53e4298402b1ddcbcbcea6bc2b78e7d7
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-05T09:11:54Z

    Feedback from code review

commit 9d419a632ace9064519b83f28d851dbd2707e99c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-05T19:23:51Z

    Merge remote-tracking branch 'catalyst/catalystIntegration' into parquet_support

commit 7d0f13e9c8a2c336a2089affaad594943573577d
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-05T19:28:03Z

    Update parquet support with master.

commit 3c3f9624a4c3041a0d8b68bc4e218ea6e0eef769
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-05T20:17:34Z

    Fix a bug due to array reuse.  This will need to be revisited after we merge the mutable row PR.

commit c9f8fb3fbb6b45ede70c7b2e285668fdf1e48582
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-06T01:11:30Z

    Merge pull request #53 from AndreSchumacher/parquet_support
    
    Parquet support

commit d37139320dd35c91c22903a919aa177ae68e4cf7
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-05T02:54:21Z

    Add a framework for dealing with mutable rows to reduce the number of object allocations that occur in the critical path.

commit 959bdf0bb5362d6387e1748dd16b62f6abfe4801
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-06T02:05:25Z

    Don't silently swallow all KryoExceptions, only the one that indicates the end of a stream.

commit 9049cf0d432662cb40c7e31688049d9a1db6e732
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-06T02:06:53Z

    Extend MutablePair interface to support easy syntax for in-place updates.  Also add a constructor so that it can be serialized out-of-the-box.

commit d9943336fda9c31fda202ed13e5c06b074214539
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-06T02:08:15Z

    Remove copies before shuffle, this required changing the default shuffle serialization.

commit ba28849fa9ec163dc39889cd7f3d683f28692b33
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-06T02:23:05Z

    code review comments.

commit c2a658d1d18ee821d83b89de43992f444a0d5dbb
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-06T18:07:38Z

    Merge pull request #55 from marmbrus/mutableRows
    
    Add a framework for dealing with mutable rows.

commit 54637ecce8ea9a9af3b41ce4a7a719249bcff2f2
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-09T19:11:58Z

    First part of second round of code review feedback

commit 5bacdc0e5c18bc6a4aee6bc2da8ac8d2a29751a0
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-09T20:35:39Z

    Moving towards mutable rows inside ParquetRowSupport

commit 7ca4b4e34d466fd64243b80300fab28af09936e9
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-11T17:56:40Z

    Improving checks in Parquet tests

commit aeaef544dda49dae87385f8bdd31e2a61719dfd2
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-11T18:33:00Z

    Removing unnecessary Row copying and reverting some changes to MutableRow

commit 7386a9f386298d8428055cfae5784f78cac44ada
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-11T18:34:45Z

    Initial example programs using spark sql.

commit f0ba39efd308339293b8cd4e397731f4b959ff65
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-11T18:54:52Z

    Merge remote-tracking branch 'origin/master' into maven
    
    Conflicts:
        project/SparkBuild.scala
        sbt/sbt-launch-lib.bash

commit 7233a7452fc36d3a9d7e7afcd560e9aad73bbf6c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-11T22:19:08Z

    initial support for maven builds

commit 3447c3edb7a83163a5668c68a246bc04216a0e71
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-13T19:15:50Z

    Don't override the metastore / warehouse in non-local/test hive context.

commit 3386e4fd6715c133c5fb04e7b5b3d59af4b2ae53
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-13T19:32:06Z

    Merge pull request #58 from AndreSchumacher/parquet_fixes
    
    Parquet fixes

commit 1a4bbd9f2b471e67d99cfa3e9a62406ed1b29723
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-13T20:51:55Z

    Merge pull request #60 from marmbrus/maven
    
    Basic support for maven, update spark.

commit f93aa39fdd3cabc3377c92bc650a6f23469c3291
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-03-14T16:25:21Z

    Better handling of path names in ParquetRelation
    
    Previously incomplete path names (with missing URI field) were passed
    to Parquet. Also two rules were moved from HiveStrategies to
    SparkStrategies.

commit 5d710747a2f334755bf8a72ff841e42d9344299b
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T16:59:46Z

    Merge pull request #62 from AndreSchumacher/parquet_file_fixes
    
    Better handling of path names in ParquetRelation

commit 8b35e0ac28080a4470d7e7eb6d0d3145de12d4e2
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-13T20:53:54Z

    address feedback, work on DSL

commit d2d9678a63ffa61d5a2abd37bb667371ce8641ba
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T02:08:27Z

    Make sure hive isn't in the assembly jar.  Create a separate, optional Hive assembly that is used when present.

commit 9eb029405a8ba39fe7b40736702ce1443b9b149c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T02:50:43Z

    Bring expressions implicits into SqlContext.

commit f7d992db7ba126455069f48ce3fef2f95544095d
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T05:48:59Z

    Naming / spelling.

commit ce8073b32d5a8713c5ad494baa1026c103e2882d
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T06:25:59Z

    clean up implicits.

commit 2f224546a0c3e0713de359727e92d727bd41091e
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T06:26:15Z

    WIP: Parquet example.

commit c01470fa14e75fbbea72b0c244515d1f2cdb26cb
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T17:07:50Z

    Clean up example

commit 013f62a2eb59e76510d06d6e8b2ab6a882bdb598
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T17:31:34Z

    Fix documentation / code style.

commit c2efad69d2013c4a8557874b9b1260ea7ae8dafc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T20:14:01Z

    First draft of SQL documentation.

commit e5e1d6bc80ce4faf4965b140c931ec1c277874bd
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T20:14:24Z

    Remove travis configuration.

commit 1d0eb63b2a0f0cee2924287c583e1c62a9a83784
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T20:28:40Z

    update changes with spark core

commit 6978dd8ed0b242103bb4af4c6c7c031d960b1285
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T21:03:34Z

    update docs, add apache license

commit 9dffbfa855128e31b3bed95fa9deec8fea85710a
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-14T21:51:25Z

    Style fixes. Add downloading of test cases to jenkins.

commit adcf1a46fe02dbc3b32c8997ebf50af0e5ff1555
Author: Henry Cook <henry.m.cook+git...@gmail.com>
Date:   2014-03-14T23:14:10Z

    Update sql-programming-guide.md
    
    Minor typos

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
