GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/146
SPARK-1251 Support for optimizing and executing structured queries

This pull request adds support to Spark for working with structured data using a simple SQL dialect, HiveQL, and a Scala query DSL.

*This is being contributed as a new __alpha component__ to Spark and does not modify Spark core or other components.*

The code is broken into three primary components:

- Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
- Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SqlContext, that allows users to execute SQL or structured Scala queries against existing RDDs and Parquet files.
- Hive Metastore Support (sql/hive) - An extension of SqlContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.

A more complete design of this new component can be found in [the associated JIRA](https://spark-project.atlassian.net/browse/SPARK-1251). [An updated version of the Spark documentation, including API docs for all three sub-components,](http://www.cs.berkeley.edu/~marmbrus/sparkdocs/_site/sql-programming-guide.html) is also available for review.

With this PR comes support for inferring the schema of existing RDDs that contain case classes. Using this information, developers can now express structured queries that are automatically compiled into RDD operations.

```scala
// Assumes an existing SparkContext `sc` and that the SqlContext members
// (implicit conversions, `sql`, and the 'symbol DSL) have been imported.
import org.apache.spark.rdd.RDD

// Define the schema using a case class.
case class Person(name: String, age: Int)

// Each line of people.txt is "name,age"; the age is trimmed and parsed to match the Int field.
val people: RDD[Person] =
  sc.textFile("people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))

// The following is the same as 'SELECT name FROM people WHERE age >= 10 AND age <= 19'
val teenagers = people.where('age >= 10).where('age <= 19).select('name).toRdd
```

RDDs can also be registered as tables, allowing SQL queries to be written over them.

```scala
people.registerAsTable("people")
val teenagers = sql("SELECT name FROM people WHERE age >= 10 AND age <= 19")
```

The results of queries are themselves RDDs and support standard RDD operations:

```scala
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
```

Finally, with the optional Hive support, users can read and write data located in existing Apache Hive deployments using HiveQL.

```scala
// Assumes the HiveContext members are imported, so `sql` parses HiveQL.
sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql("LOAD DATA LOCAL INPATH 'src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sql("SELECT key, value FROM src").collect().foreach(println)
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark catalyst

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/146.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #146

----
commit 5dab0bc9e94b7b81d03e8b2bc22a72897a907d37 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-28T23:31:07Z Merge pull request #26 from liancheng/serdeAndPartitionPruning Hive SerDe support and partition pruning optimization commit 677eb073f635815a2aa22a49ed466b84c785d6ed Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-29T00:14:18Z Update test whitelist. commit d4f539a9a7c0210b609e68a0fa49b1d2922b1205 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-29T04:15:38Z blacklist mr and user specific tests.
commit 4c89d6ea16c4de05d45a8336ef3808d96cc3abe4 Author: Reynold Xin <r...@apache.org> Date: 2014-01-29T04:43:31Z Merge pull request #27 from marmbrus/moreTests Update test whitelist. commit ebb56faaec54c970fa49e8c575facfd6658e37ea Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-29T06:27:35Z add travis config commit 8ee41be08034e1a66ec13a0ed66a1b59a3ad0aaa Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-01-30T14:38:45Z Minor refactoring commit 2486fb71dc89f915c4f54a95e42211c79fc99e4c Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-01-30T14:39:00Z Fixed spelling commit 61e729cc21afcafe64af1befee2efb54271bf6d8 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-01-30T14:39:37Z Added ColumnPrunings strategy and test cases commit 605255eb979416edc19c005f0bc7b8d5f13dd44b Author: Reynold Xin <r...@apache.org> Date: 2014-01-30T22:55:06Z Added scalastyle checker. commit 08e4d0589056f3ae6e117689596420bbf7fbbbc2 Author: Reynold Xin <r...@apache.org> Date: 2014-01-30T23:59:55Z First round of style cleanup. commit 7213a2c466d7e30cabb2a2fd07bc81a8d7e36cfe Author: Reynold Xin <r...@apache.org> Date: 2014-01-31T00:14:32Z style fix for Hive.scala. commit 5c1e60043c4b60529936f93a1536d021f28a2460 Author: Reynold Xin <r...@apache.org> Date: 2014-01-31T00:18:55Z Added hash code implementation for AttributeReference commit 7e24436da3de67e3b33c310d0c761b2c8e3d11bd Author: Reynold Xin <r...@apache.org> Date: 2014-01-31T00:34:59Z Removed dependency on JDK 7 (nio.file). 
commit 41bbee67d888f8773a1b02ecc5abd957cda033ee Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-01-31T05:31:15Z Merge remote-tracking branch 'upstream/master' into exchangeOperator Conflicts: build.sbt src/main/scala/catalyst/execution/SharkInstance.scala commit f47c2f6f3572cb15da916c0efab7839e485ec905 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-01-31T06:32:00Z set outputPartitioning in BroadcastNestedLoopJoin commit d91e276fb303a878bb54ba156a3087c204f0e167 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-31T21:40:59Z Remove dependence on HIVE_HOME for running tests. This was done by moving all the hive query test (from branch-0.12) and data files into src/test/hive. These are used by default when HIVE_HOME is not set. commit bce024d4a4d7bd8ef3443dcf9dcd367afeaf1837 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-31T22:54:10Z Merge remote-tracking branch 'databricks/master' into style Disable if brace checking as it errors in single line functional cases unlike the style guide. Conflicts: src/main/scala/catalyst/execution/TestShark.scala commit d20b565a36533245d0357b18332e8c8658821a2e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-31T23:10:04Z fix if style commit 807b2d7ce15ef78f73acfe4950a8fd14b6784545 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T00:03:46Z check style and publish docs with travis commit d3a3d48d6ad2aa3562b0859f2af13dd8d8b75fd7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T00:12:33Z add testing to travis commit 271e483d65dc41a4feb6f9f4018379094c4ff0bf Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T00:28:47Z Update build status icon. 
[no ci] commit 6015f932176c291556e13d0e08abd42ad8fdddab Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T00:38:19Z Merge pull request #29 from rxin/style Scala style checker & style fixes commit fc67b5078c23c88b6387cf2b948d84a99cc87e08 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-01T00:46:18Z Check for a Sort operator with the global flag set instead of an Exchange operator with a RangePartitioning. commit 235cbb436756cfeb915fe1864b66277c067b5abd Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-01T00:57:14Z Merge remote-tracking branch 'upstream/master' into exchangeOperator Conflicts: src/main/scala/catalyst/execution/aggregates.scala src/main/scala/catalyst/expressions/Evaluate.scala commit 45b334b4d06d254c3b9a8f03b2e64f14b48a3c88 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-01T01:11:07Z fix comments commit e079f2b32d3391bdfe835ca66dde7eaedf5df5c0 Author: Timothy Chen <tnac...@gmail.com> Date: 2014-01-16T06:53:00Z Add GenericUDAF wrapper and HiveUDAFFunction commit 8e0931f1ca55aff597132c6a27ed058866680db5 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-28T22:15:03Z Cast to avoid using deprecated hive API. commit b1151a8a13b6a3cd1dfa53115b67610955112d66 Author: Timothy Chen <tnac...@gmail.com> Date: 2014-01-29T17:58:26Z Fix load data regex commit 5b7afd8f7b2f77f3e97b94228fee6f6b92c858be Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T19:57:06Z Merge pull request #10 from yhuai/exchangeOperator Exchange operator commit 6eb59608a17ace6a39638a1fdf24241403642578 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:09:02Z Merge remote-tracking branch 'databricks/master' into udafs Conflicts: src/main/scala/catalyst/execution/aggregates.scala commit 41b41f3c6ff0b06e6ac76a6a17c929c3bae8be8a Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T09:39:11Z Only cast unresolved inserts. 
commit 63003e90fb70e13d22ad7e260e29897286a7776b Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:37:58Z Fix spacing. commit 2de89d0807307f0944d79fb525d18bc2464ebf49 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:38:18Z Merge pull request #13 from tnachen/master Add GenericUDAF wrapper and HiveUDAFFunction commit cb775ac99241f26461a19646b9c6db660a6a2eeb Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-12T22:15:44Z get rid of SharkContext singleton commit dfb67aa73ce15d9a9c355afaa1d690b3aad41843 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-13T01:47:55Z add test case commit 19bfd74f9b7a3cc9dc7b7cc6477908abbd6826d9 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-01-22T07:08:31Z store hive output in circular buffer commit 1590568ddbeee565bc483ccfe089b287433643a4 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T01:57:48Z add log4j.properties commit b649c20a124ef2e7cd8c026ffb06be759d608cec Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T05:13:30Z fix test logging / caching. commit 784536466cc3fe69ea230f0e63f7c4cd670fdadc Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T05:13:40Z deactivate concurrent test. commit ea6f37f740a5dfef3ca0c2f82e4c26ed3171851c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T05:13:53Z fix style. commit 82163e3e3c21804898e576e3a224e3a644e75d27 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T06:26:58Z special case handling of partitionKeys when casting insert into tables commit 9c22b4ebdda3955a88800dcf0dec0d14748394e7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:13:44Z Support for parsing nested types. commit efa72170ebe27d84cb5ae2efeaed4054ceca1f9c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:19:31Z Support for reading structs in HiveTableScan. 
commit d670e41dfaf93bc322079d5e93b938c2f868932c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:19:47Z Print nested fields like hive does. commit dc6463acaccfbdf3bae41ca746b678cb3b70cf9a Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:20:11Z Support for resolving access to nested fields using "." notation. commit 67094413d86c0d03fbb717a99916b9c906552d67 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:20:26Z Evaluation for accessing nested fields. commit da7ae9da830a5260478a5d9cd4959bb5f3565df2 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:21:11Z Add boolean writable that was breaking udf_regexp test. Not sure how this was passing before... commit 6420c7c23b1fcbae009ce97c5dd2dc9ece75f0a0 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-01T02:28:56Z Memoize the ordinal in the GetField expression. commit 1579eecca917152c542a68149eddd636131dbb2f Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T09:39:11Z Only cast unresolved inserts. commit cf8d99257ad87063bca4bc3a2d5a09b54a2cf2b1 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:00:51Z Use built in functions for creating temp directory. commit c654f19ef6fec54537a4e704234b63c65c7e0d1e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:01:51Z Support for list and maps in hive table scan. commit c3feda75938565b85ff401aeb29bdcb44e7accdc Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:02:06Z use toArray. commit a9388fb7274fe40b9d10eb8d4a3c97c32d365187 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:02:29Z printing for map types. 
commit bbec500c4fc9a12cbc18b607147aa751308f4288 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T20:02:52Z update test coverage, new golden commit 35a70fbfd93b83856f86ea52bc1b3a850076960f Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T21:28:05Z multi-letter field names. commit 2c6deb37b104b5272d99917b6933a749da99d06e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-02T21:28:23Z improve printing compatibility. commit 5b33216d197ad7c649e36f9f9a2a48143120aeae Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T00:21:23Z work on decimal support. commit 5b3d2c80546848a9c6bf830c22ec5f029dca790f Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T00:21:40Z implement distinct. commit 3f9e519a16f9dc9f3eabda3ad91d80c088e3f384 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T00:30:14Z use names w/ boolean args commit 3734a9416c1156030a7c2af9e43d9209ca17aa59 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T00:31:03Z only quote string types. commit 5e54aa6dab3e3ed0f2e702abc038eee5f17fcb38 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T01:38:52Z quotes for struct field names. commit e4def6b2c917ebf28b3a11fc1aad690c2fddd55f Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T01:39:19Z set dataType for HiveGenericUdfs. 
commit aa430e7ba7fd748619bd4b1959ca165ec2b13a5c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T06:58:51Z Update .travis.yml commit 7661b6ce6b8cb1cfc816e87d0644cfc063dce921 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T07:21:24Z blacklist machines specific tests commit 72a003dd3dce58331205465fb43bbb9a412156c4 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T07:41:45Z revert regex change commit 9c0677866e24293525602a8e76860b4785950c39 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-03T08:11:21Z fix serialization issues, add JavaStringObjectInspector. commit 92e415878439ceb94e3d41de75bc26acfe92a24d Author: Reynold Xin <r...@apache.org> Date: 2014-02-03T18:30:55Z Merge pull request #32 from marmbrus/tooManyProjects Fix a bug in PreInsertionCasts rule. commit 692a4779af0a269ae1f16006ab129c00af2a6c5c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:36:48Z Support for wrapping arrays to be written into hive tables. commit ac9d7de4f973d4809d435d098def4de12c1c0dbc Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:37:06Z Resolve *s in Transform clauses. commit 7a0f543431b196f78da2f473fd2f0d3e3764d0c3 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:37:21Z Avoid propagating types from unresolved nodes. commit 010accb872f179b97b6cc6e971a7e9f17ec2de73 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:37:39Z add tinyint to metastore type parser. commit e7933e912356e686ce36cc8a52dc813a7cc8c430 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:38:13Z fix casting bug when working with fractional expressions. commit 25288d055a0bcf251e64c8653442f1ee5b466e70 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:38:38Z Implement [] for arrays and maps. 
commit ab9a131818884dd2258174956fdca65bd14dfd42 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:38:58Z when UDFs fail they should return null. commit 1679554ae68dfc91212ebaf8401efaf6088d61a9 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:39:12Z add toString for if and IS NOT NULL. commit ab5bff387f2ced791527b4c20b2c30dc7da6c190 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T02:39:28Z Support for get item of map types. commit 42ec4af79020a5952bf59a5e44d6852eef5d4b41 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T03:07:17Z improve complex type support in hive udfs/udafs. commit 44d343ca60aa1fbcd78217a39ea86a74098e0ef3 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T03:09:38Z Merge remote-tracking branch 'databricks/master' into complex Conflicts: src/main/scala/catalyst/analysis/Analyzer.scala commit e3c10bd5649658995c3a347ebe1ab434fad50cdc Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T08:57:55Z update whitelist. commit 389525dedbc7c6c83d6686a7661c98354f60425e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T18:44:35Z update golden, blacklist mr. commit 2f276049070ccd873368441e652c0d6a2d3e2551 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T19:23:12Z Address comments / style errors. commit cb57459ce009bdf8e58e7eaf1c301279b5a07ce7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-04T19:24:33Z blacklist machine specific test. commit 67128b8bf07a5deaacd1a9214c1fa58d0bfcba85 Author: Reynold Xin <r...@apache.org> Date: 2014-02-04T21:16:20Z Merge pull request #30 from marmbrus/complex Initial support for reading / accessing / printing nested fields. commit b4be6a5411cd3d25919bc71563da44638660ecb6 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T00:53:46Z better logging when applying rules. 
commit ccdb07a18c62c7c955400e3253d81adbd6e8f42e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T00:54:23Z Fix bug where averages of strings are turned into sums of strings. Remove a blank line. commit d8cb805193f7d8ffe96efc423bb86f781ea3ef41 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T01:50:48Z Implement partial aggregation. commit f94345cb0ed64b8566da623e765a04cac6739733 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T02:44:38Z fix doc link commit e1999f927a41eae4a9affe2728296a1a9ee06cb8 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-05T04:38:11Z Use Deserializer and Serializer instead of AbstractSerDe. commit 32b615b52e7c202b29e1242952092d09f3332745 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T09:36:12Z add override to asPartial. commit 883006dd16cbd1ddb61f164ad28a8237f4c6becc Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T09:46:39Z improve tests. commit cab1a84b4811064fe217b0cd56d3fe9c48210b6a Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T10:01:08Z Fix PartialAggregate inheritance. commit dc6353be64bfe9c6522403a5a4124423cd62e22b Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T10:03:58Z turn off deprecation commit 8017afb101b214635dcd1b372afcd21379c340f5 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-05T18:40:23Z fix copy paste error. commit 5479066a011a8dff6da8c68c8452cdeffb4cc3e8 Author: Reynold Xin <r...@apache.org> Date: 2014-02-05T19:22:52Z Merge pull request #36 from marmbrus/partialAgg Implement partial aggregation. commit 5e4d9b453658dece7afa987ab9b07bf2c12b4999 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T00:16:19Z Merge pull request #35 from marmbrus/smallFixes A few small bug fixes and improvements. 
commit 02ff8e4462793d8f37365f44cb2f269f619d72da Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-07T13:41:42Z Correctly parse the db name and table name in a CTAS query. commit 8841eb888d16edbb1bd34175ee13b664468e78b7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:01:51Z Rename Transform -> ScriptTransformation. commit acb956646de2a05475ff5086b5967e0e657f8aa0 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:03:56Z Correctly type attributes of CTAS. commit 016b48990ef37b32d1bd4b1d4790afbe15e7db57 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:04:17Z fix typo. commit bea4b7f1c3b091386bb8cacad8f8c2e154c579b7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:04:40Z Add SumDistinct. commit ea76cf9bf5e07dfa5435fa99ae1e0623a7c89262 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:05:13Z Add NoRelation to planner. commit dd00b7e8df7356be40379ec560f2f476f74e1a8e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:11:33Z initial implementation of generators. commit ba8897fd60a6555d2a52ea5fb3d8c32981ed2296 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:12:16Z Merge remote-tracking branch 'yin/parseDBNameInCTAS' into lateralView commit 0ce61b0f3d110567693bb340df6f5bdd6ee41a2c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:44:19Z Docs for GenericHiveUdtf. commit 740febb71c94e40f436cb3ea5ebc81b0cda4db26 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T22:44:33Z Tests for tgfs. commit db92adc5ff5a0712d5104aad00cad67b520070b4 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T23:58:28Z more tests passing. clean up logging. 
commit ff5ea3f209eed028365a2b680dd7093340e355c8 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-07T23:59:41Z new golden commit 5cc367cdb9946b092c53ff1473ac3f784c0112d3 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-08T01:34:34Z use berkeley instead of cloudbees commit b376d15652bd0372d1713429468d874614a9dd7a Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-08T01:42:32Z fix newlines at EOF commit 7123225ae5e96dc7be38b13c2f2bcc86a19249ad Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-08T01:44:01Z Correctly parse the db name and table name in INSERT queries. commit 2897deb146c498bfc7ebcb80e3835ecb9899cfeb Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-08T02:31:20Z fix scaladoc commit 0e6c1d712f95ce0268dc71b28a64c2bd29c81b27 Author: Reynold Xin <r...@apache.org> Date: 2014-02-08T06:40:54Z Merge pull request #38 from yhuai/parseDBNameInCTAS Correctly parse the db name and table name of a table commit 341116cb450ff72af793a5bd84d73ca2203200cb Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-08T20:09:59Z address comments. commit 7785ee62e47c93390213ff3f1a8a67a293d878a6 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-10T23:14:49Z Tighten visibility based on comments. commit 964368f3b21c79ec86eb7c0389c43768fb4c1b01 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-11T00:04:01Z Merge pull request #39 from marmbrus/lateralView Add support for lateral views, TGFs and Hive UDTFs commit dce0593034a30b802d9be2cf98590e9955df1b47 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-11T00:04:56Z move golden answer to the source code directory. 
commit 9329820a9a85697a9bfad11b6f7266c07eb59235 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-11T00:28:23Z add golden answer files to repository commit a7ad05855a376af7c7cdb89bb114cccba9e6b9b1 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-11T02:02:05Z Merge pull request #40 from marmbrus/includeGoldens Include golden hive answers in the source repository commit 2407a21180d261138454d23926786dcc20e88d1e Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-12T00:29:11Z Added optimized logical plan to debugging output commit cf691df0b020840be8bfaf0e29a7db4ef049b6f6 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-12T00:30:14Z Added the PhysicalOperation to generalize ColumnPrunings commit f235914e3572919f5cb056b8a6794eb0623f5617 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-12T09:14:22Z Test case udf_regex and udf_like need BooleanWritable registered commit f0c3742583d9a99bfc0f36c4fe9e2a497412c580 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-12T09:23:07Z Refactored PhysicalOperation The old version is implemented in a top down tail recursive manner, which cannot cover an uncommon corner case like: Filter (with aliases) Project ... MetastoreRelation In this case, the aliases are not in-lined/substituted because no aliases are collected yet. It is now covered by the new version which is implemented in a bottom up recursive manner and collects all necessary aliases before in-lining/substitution. commit 5720d2bd2cd08c2ecbff32391ed88080cecd7359 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-12T09:39:09Z Fixed comment typo commit bc9a12ce63f14f34aa9d74086f3485a6d338cf66 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-13T23:18:26Z Move hive test files. commit 7588a57feb1870c718be645e428d1f2371b9e722 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-13T23:19:28Z Break into 3 major components and move everything into the org.apache.spark.sql package. 
commit 1f7d00aab0b9bd56dd4e4b71c9979f9e4e559d8b Author: Reynold Xin <r...@apache.org> Date: 2014-02-14T06:29:29Z Merge pull request #41 from marmbrus/splitComponents Break catalyst into 3 major components and move everything into org.apache.spark.sql commit 887f928aac6f649ed5f97c644dafd715a9b450a4 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-14T10:38:57Z Merge remote-tracking branch 'upstream/master' into SerDeNew commit 678341a50b793b09658b823fa1bdc61a9293d770 Author: Mark Hamstra <markhams...@gmail.com> Date: 2014-02-14T18:21:24Z Replaced non-ascii text commit 5ae010ff20ed811962e6f13920d1ef43bfc2a14b Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-14T19:14:33Z Merge pull request #42 from markhamstra/non-ascii Replaced non-ascii text commit 1f6260d77223aaf23c2bbb112b52803bea061e42 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-14T20:45:29Z Fixed package name and test suite name in Makefile commit b6de691f13d66dadc7b72c9eb19acccaf75b8ee9 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-14T22:15:35Z Merge pull request #43 from liancheng/fixMakefile Fixed package name and test suite name in Makefile commit 7f206b5aa577bc4ca8aeb82d2438ad43316eb996 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-14T22:34:23Z Add support for hive TABLESAMPLE PERCENT. commit ed3a1d15b80768817e9259e31499df53587c51b2 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-14T23:45:32Z Load data directly into Hive. commit 59e37a31efba400649685c4cedf648d1b0c86d0b Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-14T23:56:06Z Merge remote-tracking branch 'upstream/master' into SerDeNew Conflicts: build.sbt shark/src/main/scala/org/apache/spark/sql/shark/HiveMetastoreCatalog.scala commit 346f828dc37df3a1681e6ebf2a5940a609ead50a Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-15T00:38:52Z Move SharkHadoopWriter to the correct location. 
commit a9c318853d4bb02965252810656999be060682dd Author: Timothy Chen <tnac...@gmail.com> Date: 2014-02-15T01:06:00Z Fix udaf struct return commit 69adf7298edb74a9ecd704932276d988d1c8ba5d Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-15T01:22:13Z Set cloneRecords to false. commit 566fd6685fec88b88223f4b47af04eb39a69d28e Author: Timothy Chen <tnac...@apache.org> Date: 2014-02-15T02:09:30Z Whitelist tests and add support for Binary type commit 9ad474d877ae1a6dcc6a7769c2effed4c3a15029 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-15T02:56:30Z Merge pull request #44 from marmbrus/sampling Add support for hive TABLESAMPLE PERCENT. commit 3cb4f2e16662c54806474d0de2fbd9021133ae08 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-15T02:57:29Z Merge pull request #45 from tnachen/master Fix udaf struct return commit 8506c176f7e18011df50e25f8ea98d30a57f0ccd Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-15T03:20:41Z Address review feedback. commit 3bb272ddc69472120bb0915308451576565cecf6 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-15T03:26:42Z move org.apache.spark.sql package.scala to the correct location. commit 1596e1b14e8e2741758c6370bb29d32830476a7f Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-15T04:09:25Z Cleanup imports to make IntelliJ happy. commit 5495faba864ee7ef1f8649bca02eacb7479a3b2a Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-15T10:01:02Z Remove cloneRecords which is no longer needed. commit bdab5edd65140cd18c2dc29b00fa914d624dd999 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-15T10:03:28Z Add a TODO for loading data into partitioned tables. commit 35c9a8a11fed8ae8f7aa8d345b4bc0c53f413ab8 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-15T20:57:39Z Merge pull request #46 from marmbrus/reviewFeedback Address review feedback from previous PR. 
commit 563bb22bd30b021e2bc276e2ed454f5296877a63 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T00:26:05Z Set compression info in FileSinkDesc. commit e08962779a195b991c2478647c65923f4ddd23b4 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T00:26:23Z Code style. commit 45ffb86df7c877c78de0470fbb66fae6be3bcf23 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T00:28:11Z Merge remote-tracking branch 'upstream/master' into SerDeNew commit eea75c522fbf9ead1ef4280e3420d3a6685b7a0c Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T11:24:15Z Correctly set codec. commit 428aff5f15a1954a983f049ade8986816d87e73c Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T12:39:24Z Distinguish `INSERT INTO` and `INSERT OVERWRITE`. commit a40d6d628384c172c1d1d7a4bd4011c3cb8f2b6b Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T14:09:23Z Loading the static partition specified in a INSERT INTO/OVERWRITE query. commit 334aacee2432fbc6c51644df08f4899d340a2ef4 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-16T14:11:45Z New golden files. commit d00260be188368ce943f2ffe7d087a7eff2f5f41 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-17T00:26:19Z Strips backticks from partition keys. commit 555fb1d1e965d19c6e7dc28027361868b3492c0f Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-17T06:51:16Z Correctly set the extension for a text file. commit feb022c1e77aac1f6b224cfc56bfd851762a0ca6 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-17T06:51:55Z Partitioning key should be case insensitive. commit a1a47760b718bfecc7e4b1adacb3a179f936825c Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-17T10:46:13Z Update comments. commit 017872cef3d771acab5fb3efc570dc1798e44f6d Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-17T10:46:31Z Remove stats20 from whitelist. commit 128a9f8b8082b3ed0659dfe6c41dbd7cbf04ff71 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-18T04:58:08Z Minor changes. 
commit f670c8c7adf6a3bc5c1e20850070b15e041f9285 Author: Yin Huai <huaiyin....@gmail.com> Date: 2014-02-18T09:35:01Z Throw a NotImplementedError for not supported clauses in a CTAS query. commit c5a4fabbe9a67c0bc3063314f7c5efd001aba52d Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-16T13:39:24Z Merge branch 'master' into columnPruning Conflicts: shark/src/test/scala/org/apache/spark/sql/shark/execution/HiveQuerySuite.scala shark/src/test/scala/org/apache/spark/sql/shark/execution/PartitionPruningSuite.scala src/main/scala/catalyst/execution/FunctionRegistry.scala src/main/scala/catalyst/execution/SharkInstance.scala src/main/scala/catalyst/execution/planningStrategies.scala commit 2682f72adde85870de6b7bc20e0df0622340cdb0 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-18T12:14:06Z Merge remote-tracking branch 'origin/master' into columnPruning commit 54f165b5f8814b9a9572f315b17505ef896b723a Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-18T12:19:26Z Fixed spelling typo in two golden answer file names commit cf4db596d1ef8edcaa4f5e42648ddc57e4dc38e6 Author: Lian, Cheng <rhythm.m...@gmail.com> Date: 2014-02-18T16:32:20Z Added golden answers for PruningSuite commit f22df3aa73b75babca50ee0884bd064497bfe836 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-18T19:05:19Z Merge pull request #37 from yhuai/SerDe Support ORCSerDe commit 9990ec7dcce26174f326172f1d662cc758d4e130 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-18T19:07:34Z Merge pull request #28 from liancheng/columnPruning Column pruning optimization together with some minor refactoring commit 29effadbc188c5e6604a9e3a7460d9abde2c2fce Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:30:20Z Include alias in attributes that are produced by overridden tables. 
commit c9116a6aa873e88c6b72d6ddc5d935af7c083f15 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:31:16Z Add combiner to avoid NPE when spark performs external aggregation.

commit 8c01c2475ef87d589263ba215f26530346b9868d Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:31:42Z Move definition of Row out of execution to top level sql package.

commit 4905b2b0b5f5cc8c123b41ccbb2daec117f73fad Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:33:17Z Add more efficient TopK that avoids global sort for logical Sort => StopAfter.

commit 532dd3748c262cdeea2f9f7977ba3a875e8b73fe Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:34:06Z Allow the local warehouse path to be specified.

commit a4308954350a578dae8d8d4d49ac7ec52c2d0fe7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:34:35Z Planning for logical Repartition operators.

commit 5fe7de411c437d958d414d5530c56aceb6f6bfc3 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:36:09Z Move table creation out of rule into a separate function.

commit b9225114460f9d628738b690fc0b33ba81a3c019 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:37:06Z Fix insertion of nested types into hive tables.

commit 18a861b108eb20afa1a87ee04324de829478b4d2 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:38:06Z Correctly convert nested products into nested rows when turning scala data into catalyst data.

commit df88f01e1d449433e2f149dbaea90a9611848ff9 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:38:44Z add a simple test for aggregation

commit 6e04e5b944113bc2c0cb528dcac1ccf3276109e2 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T21:39:14Z Add insertIntoTable to the DSL.
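Commit 4905b2b replaces a logical Sort followed by StopAfter with a TopK that avoids a global sort. The commit message does not show the implementation, so what follows is only a minimal sketch of the general technique. It assumes plain Scala collections stand in for RDD partitions, and `TopKSketch`, `topKOfPartition`, and `topK` are hypothetical names, not Spark APIs. Each partition keeps at most k rows in a bounded max-heap, so only the (number of partitions × k) survivors reach the final sort.

```scala
import scala.collection.mutable

// Hypothetical sketch: each inner Seq stands in for one RDD partition.
object TopKSketch {
  // Keep at most k of the smallest elements seen in one partition.
  // PriorityQueue is a max-heap under `ord`, so `head` is the largest kept
  // element and the one to evict when a smaller element arrives.
  def topKOfPartition[T](part: Iterator[T], k: Int)(implicit ord: Ordering[T]): Seq[T] = {
    val heap = mutable.PriorityQueue.empty[T](ord)
    part.foreach { x =>
      if (heap.size < k) heap.enqueue(x)
      else if (k > 0 && ord.lt(x, heap.head)) {
        heap.dequeue()
        heap.enqueue(x)
      }
    }
    heap.toList
  }

  // Sort => StopAfter(k) without sorting the whole input: reduce every
  // partition to at most k rows, then sort only those candidates.
  def topK[T](partitions: Seq[Seq[T]], k: Int)(implicit ord: Ordering[T]): Seq[T] =
    partitions.flatMap(p => topKOfPartition(p.iterator, k)).sorted(ord).take(k)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(9, 3, 7), Seq(1, 8), Seq(6, 2, 5, 4))
    println(topK(partitions, 3)) // List(1, 2, 3)
  }
}
```

With three partitions and k = 3, at most nine candidate rows reach the final sort, however large each partition is.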
commit 24eaa79764253a2771c980728037e17bbef17b50 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T22:22:06Z fix > 100 chars

commit d393d2abebc03408fc43dbd835105134fa256463 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T22:41:37Z Review Comments: Add comment to map that adds a sub query.

commit 2225431005040fd6bb0b71f125057b40ef8c0493 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T23:18:21Z Merge pull request #48 from marmbrus/minorFixes Several minor fixes for bugs found during benchmarking.

commit 3ac941623b9b9cc860de890a781578b21b3accae Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-25T00:24:39Z Merge support for working with schema-ed RDDs using catalyst in as a spark subproject.

commit f5e7492c267758c80b7ad3e4c74b3b20b34ec9e0 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-25T22:44:02Z Add Apache license. Make naming more consistent.

commit 5f2963c053f39ef4298598be918a4758c1c32a13 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-27T23:20:05Z naming and continuous compilation fixes.

commit 4d57d0e7b0e929d14c9d4218d5b63a03e176d04d Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-27T23:37:26Z Fix test execution on travis.

commit 7413ac22622a991eac5fba33cbaeee2008f324f0 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-28T00:04:41Z make test downloading quieter.

commit 608a29ea363e4093e605b2ecdcf3d55f4109e30d Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-28T02:22:58Z Add hive as a repl dependency

commit c3343868f8cc8b1054513fe6619c9bb193e8816a Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-24T22:29:16Z Initial support for generating schema's based on case classes.
commit b33e47ede48e9803fe213ec71d9a3ccea804b69a Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-16T14:09:02Z First commit of Parquet import of primitive column types

commit 99a920916fa7f03669d86a9b9cf7482fedcaf318 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-16T17:54:44Z Expanding ParquetQueryTests to cover all primitive types

commit eb0e521572c500e79de2dc5c3aa188b222490681 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-17T13:28:37Z Fixing package names and other problems that came up after the rebase

commit 6ad05b34ecf9d457fd95c8e7f8f74ed979048cb9 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-19T11:06:53Z Moving ParquetRelation to spark.sql core

commit a11e36428f3ea166825cbeb39ea23e86046dd26a Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-19T14:12:30Z Adding Parquet RowWriteSupport

commit 0f17d7b6fcea76b991da1790cf39b97d5543eee1 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-19T14:26:55Z Rewriting ParquetRelation tests with RowWriteSupport

commit 6a6bf9844e1c25e3f3360cc4c479f5db66e2bea7 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-19T16:31:40Z Added column projections to ParquetTableScan

commit f347273cb9d8f6e6c43eb3ef5e54507025ecc1cd Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-20T17:01:37Z Adding ParquetMetaData extraction, fixing schema projection

commit 75262eec5e21400011359dbf3f2825cbd7be461d Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-24T09:27:25Z Integrating operations on Parquet files into SharkStrategies

commit 18fdc441ab3fc17535512f86cb77651d91596bdd Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-26T10:12:15Z Reworking Parquet metadata in relation and adding CREATE TABLE AS for Parquet tables

commit 3a0a552a5950f99f80bc178818103e393cfa775c Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-26T12:55:31Z Reorganizing Parquet table operations

commit 332119573ba934e7fd8cb1f7adcd0d3bd791a1c2 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-02-27T07:41:21Z Fixing one import in ParquetQueryTests.scala

commit 61e3bfbbb2fe4894fa5c2d7c27f1da6cec903819 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-02T11:45:59Z Adding WriteToFile operator and rewriting ParquetQuerySuite

commit c863bed3d17abf9cd3da7cee8637d77b088a192d Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-02T14:28:23Z Codestyle checks

commit 3ac9eb05d0cec3cca166503cb4dc417168694012 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-02T18:23:06Z Rebasing to new main branch

commit 3bda72db9384b0f67cfbfbe22eb2674be113ceda Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-02T20:59:23Z Adding license banner to new files

commit d7fbc3a591110dae76121c1095a32ab4788ae005 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-27T02:00:12Z Several performance enhancements and simplifications of the expression evaluation framework.
* Removed the Evaluate singleton in favor of placing expression evaluation code in each expression.
* Instead of passing in a Seq of input rows we now take a single row. A mutable JoinedRow wrapper can be used in the relatively rare cases where expressions need to be evaluated on multiple input rows.
* GenericRow now takes a raw Array[Any] instead of a Seq. Since GenericRow itself is a Seq wrapper, this avoids the creation of an unnecessary object.
* A new concept called MutableLiteral can be used to evaluate aggregate expressions in-place, instead of needing to build new literal trees for each update. This part is more of a WIP as we still incur boxing, however this is a strict improvement over what was there before.

commit 296fe5036105b7e519501f58e0fb0204023c23f2 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-27T20:30:56Z Address review feedback.
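Commit d7fbc3a explains that expressions now carry their own evaluation code and take a single input row, with a mutable JoinedRow wrapper for the rarer cases that need two. The sketch below is a simplified model of that design, not the Catalyst code. `Row`, `GenericRow`, and `JoinedRow` borrow names from the commit message, while `Expr`, `ColumnRef`, and `Equal` are hypothetical. It illustrates how a join condition can be evaluated against two rows without copying their fields into a new one.

```scala
// Simplified, hypothetical model of the single-input-row evaluation design.
trait Row {
  def apply(i: Int): Any
  def length: Int
}

// Backed directly by an Array[Any], so no extra Seq object is allocated.
final class GenericRow(values: Array[Any]) extends Row {
  def apply(i: Int): Any = values(i)
  def length: Int = values.length
}

// A reusable view that presents two rows as one: ordinals below
// left.length read from `left`, the rest read from `right`.
final class JoinedRow extends Row {
  private var left: Row = null
  private var right: Row = null

  // Rebind in place instead of allocating a new wrapper per row pair.
  def withRows(l: Row, r: Row): JoinedRow = { left = l; right = r; this }

  def apply(i: Int): Any = if (i < left.length) left(i) else right(i - left.length)
  def length: Int = left.length + right.length
}

// Each expression evaluates itself against exactly one input row.
sealed trait Expr { def eval(input: Row): Any }

final case class ColumnRef(ordinal: Int) extends Expr {
  def eval(input: Row): Any = input(ordinal)
}

final case class Equal(left: Expr, right: Expr) extends Expr {
  def eval(input: Row): Any = left.eval(input) == right.eval(input)
}

object JoinedRowDemo {
  def main(args: Array[String]): Unit = {
    val orders = new GenericRow(Array[Any](7, "order-a"))
    val customers = new GenericRow(Array[Any](7, "customer-x"))
    val joined = new JoinedRow
    // Compare column 0 of the left row with column 0 of the right row
    // (ordinal 2 once the two-column left row comes first).
    val cond = Equal(ColumnRef(0), ColumnRef(2))
    println(cond.eval(joined.withRows(orders, customers))) // true
  }
}
```

Because a single JoinedRow instance is rebound with `withRows`, a caller that needs to keep a joined row beyond the next rebind has to copy its fields first.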
commit 6fdefe65478d950d3f30f6591df361558886d187 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-03T20:33:45Z Port sbt improvements from master.

commit da9afbda89776602acb5dfa10d1c0a654f9d77dd Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-03T20:43:32Z Add byte wrappers for hive UDFS.

commit 7b9d14263a4cbf5d39216c86a41b546c607b4a20 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-03T19:41:35Z Update travis to increase permgen size.

commit 99e61fbfa386dc11f4b0df2134d8b714c57ad3ba Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-03T21:36:20Z Merge pull request #51 from marmbrus/expressionEval Several performance enhancements and simplifications of the expression evaluation framework.

commit 8d5da5ed977b1c867b5b78f05523d89d5552b387 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-02-27T03:02:17Z modify compute-classpath.sh to include datanucleus jars explicitly

commit 6d315bb168443eba98d978ae65c386ff27629bfc Author: Cheng Lian <lian.cs....@gmail.com> Date: 2014-03-05T03:48:37Z Added Row.unapplySeq to extract fields from a Row object.

commit 70e489d277470b5ed84d856af96b1167a0f892b6 Author: Cheng Lian <lian.cs....@gmail.com> Date: 2014-03-05T04:13:19Z Fixed a spelling typo

commit 1ce01c7ad99d6c5d666c8b601c8f3527ab0ebe9f Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-05T08:59:26Z Merge pull request #56 from liancheng/unapplySeqForRow Added Row.unapplySeq to extract fields from a Row object.
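Commits 6d315bb and 1ce01c7 add Row.unapplySeq so that fields can be pulled out of a Row by pattern matching. The Catalyst source is not part of this email, so the following is a standalone sketch of the Scala extractor mechanism itself. It uses a hypothetical `SimpleRow` class rather than Spark's Row. Defining `unapplySeq` on the companion object is what makes `case SimpleRow(a, b) =>` bind fields by position.

```scala
// Hypothetical stand-in for a row: an ordered sequence of untyped fields.
class SimpleRow(private val values: Array[Any]) {
  def apply(i: Int): Any = values(i)
  def length: Int = values.length
  def toSeq: Seq[Any] = values.toSeq
}

object SimpleRow {
  def apply(values: Any*): SimpleRow = new SimpleRow(values.toArray)

  // Extractor: exposes the fields as a Seq, so patterns such as
  // `case SimpleRow(a, b) =>` match rows with exactly two fields.
  def unapplySeq(row: SimpleRow): Some[Seq[Any]] = Some(row.toSeq)
}

object UnapplySeqDemo {
  def describe(row: SimpleRow): String = row match {
    case SimpleRow(name: String, age: Int) => s"$name is $age"
    case SimpleRow(single)                 => s"one field: $single"
    case _                                 => "other shape"
  }

  def main(args: Array[String]): Unit = {
    println(describe(SimpleRow("Alice", 15))) // Alice is 15
    println(describe(SimpleRow(42)))          // one field: 42
    println(describe(SimpleRow(1, 2, 3)))     // other shape
  }
}
```

A pattern matches only when the row has exactly as many fields as the pattern lists, which is why the three-field row falls through to the default case.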
commit 0040ae6d53e4298402b1ddcbcbcea6bc2b78e7d7 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-05T09:11:54Z Feedback from code review

commit 9d419a632ace9064519b83f28d851dbd2707e99c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-05T19:23:51Z Merge remote-tracking branch 'catalyst/catalystIntegration' into parquet_support

commit 7d0f13e9c8a2c336a2089affaad594943573577d Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-05T19:28:03Z Update parquet support with master.

commit 3c3f9624a4c3041a0d8b68bc4e218ea6e0eef769 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-05T20:17:34Z Fix a bug due to array reuse. This will need to be revisited after we merge the mutable row PR.

commit c9f8fb3fbb6b45ede70c7b2e285668fdf1e48582 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-06T01:11:30Z Merge pull request #53 from AndreSchumacher/parquet_support Parquet support

commit d37139320dd35c91c22903a919aa177ae68e4cf7 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-05T02:54:21Z Add a framework for dealing with mutable rows to reduce the number of object allocations that occur in the critical path.

commit 959bdf0bb5362d6387e1748dd16b62f6abfe4801 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-06T02:05:25Z Don't silently swallow all KryoExceptions, only the one that indicates the end of a stream.

commit 9049cf0d432662cb40c7e31688049d9a1db6e732 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-06T02:06:53Z Extend MutablePair interface to support easy syntax for in-place updates. Also add a constructor so that it can be serialized out-of-the-box.

commit d9943336fda9c31fda202ed13e5c06b074214539 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-06T02:08:15Z Remove copies before shuffle, this required changing the default shuffle serialization.
commit ba28849fa9ec163dc39889cd7f3d683f28692b33 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-06T02:23:05Z code review comments.

commit c2a658d1d18ee821d83b89de43992f444a0d5dbb Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-06T18:07:38Z Merge pull request #55 from marmbrus/mutableRows Add a framework for dealing with mutable rows.

commit 54637ecce8ea9a9af3b41ce4a7a719249bcff2f2 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-09T19:11:58Z First part of second round of code review feedback

commit 5bacdc0e5c18bc6a4aee6bc2da8ac8d2a29751a0 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-09T20:35:39Z Moving towards mutable rows inside ParquetRowSupport

commit 7ca4b4e34d466fd64243b80300fab28af09936e9 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-11T17:56:40Z Improving checks in Parquet tests

commit aeaef544dda49dae87385f8bdd31e2a61719dfd2 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-11T18:33:00Z Removing unnecessary Row copying and reverting some changes to MutableRow

commit 7386a9f386298d8428055cfae5784f78cac44ada Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-11T18:34:45Z Initial example programs using spark sql.

commit f0ba39efd308339293b8cd4e397731f4b959ff65 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-11T18:54:52Z Merge remote-tracking branch 'origin/master' into maven Conflicts: project/SparkBuild.scala sbt/sbt-launch-lib.bash

commit 7233a7452fc36d3a9d7e7afcd560e9aad73bbf6c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-11T22:19:08Z initial support for maven builds

commit 3447c3edb7a83163a5668c68a246bc04216a0e71 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-13T19:15:50Z Don't override the metastore / warehouse in non-local/test hive context.
commit 3386e4fd6715c133c5fb04e7b5b3d59af4b2ae53 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-13T19:32:06Z Merge pull request #58 from AndreSchumacher/parquet_fixes Parquet fixes

commit 1a4bbd9f2b471e67d99cfa3e9a62406ed1b29723 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-13T20:51:55Z Merge pull request #60 from marmbrus/maven Basic support for maven, update spark.

commit f93aa39fdd3cabc3377c92bc650a6f23469c3291 Author: Andre Schumacher <andre.schumac...@iki.fi> Date: 2014-03-14T16:25:21Z Better handling of path names in ParquetRelation Previously incomplete path names (with missing URI field) were passed to Parquet. Also two rules were moved from HiveStrategies to SparkStrategies.

commit 5d710747a2f334755bf8a72ff841e42d9344299b Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T16:59:46Z Merge pull request #62 from AndreSchumacher/parquet_file_fixes Better handling of path names in ParquetRelation

commit 8b35e0ac28080a4470d7e7eb6d0d3145de12d4e2 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-13T20:53:54Z address feedback, work on DSL

commit d2d9678a63ffa61d5a2abd37bb667371ce8641ba Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T02:08:27Z Make sure hive isn't in the assembly jar. Create a separate, optional Hive assembly that is used when present.

commit 9eb029405a8ba39fe7b40736702ce1443b9b149c Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T02:50:43Z Bring expressions implicits into SqlContext.

commit f7d992db7ba126455069f48ce3fef2f95544095d Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T05:48:59Z Naming / spelling.

commit ce8073b32d5a8713c5ad494baa1026c103e2882d Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T06:25:59Z clean up implicits.

commit 2f224546a0c3e0713de359727e92d727bd41091e Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T06:26:15Z WIP: Parquet example.
commit c01470fa14e75fbbea72b0c244515d1f2cdb26cb Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T17:07:50Z Clean up example

commit 013f62a2eb59e76510d06d6e8b2ab6a882bdb598 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T17:31:34Z Fix documentation / code style.

commit c2efad69d2013c4a8557874b9b1260ea7ae8dafc Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T20:14:01Z First draft of SQL documentation.

commit e5e1d6bc80ce4faf4965b140c931ec1c277874bd Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T20:14:24Z Remove travis configuration.

commit 1d0eb63b2a0f0cee2924287c583e1c62a9a83784 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T20:28:40Z update changes with spark core

commit 6978dd8ed0b242103bb4af4c6c7c031d960b1285 Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T21:03:34Z update docs, add apache license

commit 9dffbfa855128e31b3bed95fa9deec8fea85710a Author: Michael Armbrust <mich...@databricks.com> Date: 2014-03-14T21:51:25Z Style fixes. Add downloading of test cases to jenkins.

commit adcf1a46fe02dbc3b32c8997ebf50af0e5ff1555 Author: Henry Cook <henry.m.cook+git...@gmail.com> Date: 2014-03-14T23:14:10Z Update sql-programming-guide.md Minor typos

----

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---