GitHub user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/146#issuecomment-37751624

Hey Michael, I really like the docs and API for this! I tried this out in spark-shell, though, and saw a few errors:

* The built-in SQL seems to be case-sensitive -- it complained about "select * from foo" but not "SELECT * FROM foo".
* When trying to do `[ExecutedQuery].rdd.collect()`, I got `NotSerializableException: SqlContext` (a guess at the cause is sketched below).

Maybe these are just due to the shell environment; I'm not sure whether they'd happen in standalone jobs.

Also, I have some comments on the API to make it more consistent with our other stuff (more later):

* Capitalize SQL in SQLContext, to match some of the other names where we capitalize acronyms.
* Instead of making registerAsTable an implicit conversion on RDDs, make it a method of SQLContext, so that the API looks the same in Java and Python. (There may be other places where we can do this.) A rough sketch of this shape follows below.
* Do we really want loadFile to infer the file type? It might be better to have loadParquetFile and allow other types in the future.

Finally, add ScalaDoc comments to all the public methods on things like SQLContext.
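For context on the second error, here's a minimal sketch of one common cause -- `ExecutedQuery`'s shape here is my assumption, not code from this PR. Anything a query result references gets pulled into the task closure when `.rdd.collect()` runs, so a back-reference to a non-serializable context fails serialization; marking that reference `@transient` is one usual way out:

```scala
// Hypothetical sketch -- ExecutedQuery's fields are assumptions, not the
// actual class in this PR. A back-reference to the (non-serializable)
// SQL context would be captured when tasks are shipped to executors;
// @transient keeps it out of the serialized closure.
class ExecutedQuery(
    @transient val context: AnyRef // assumed back-reference to the SQL context
) extends Serializable {
  // plan, output schema, rdd, etc. would live here
}
```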
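To make the registerAsTable and loadFile suggestions concrete, here's a rough sketch of the surface I have in mind -- the names (`registerRDDAsTable`, `loadParquetFile`) are placeholders, not a claim about what this PR actually defines:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical sketch of the suggested API surface; names and signatures
// are placeholders rather than the PR's actual code.
class SQLContext(/* SparkContext, configuration, ... */) {
  /** Register an RDD under a table name so SQL queries can reference it.
    * A plain method (instead of an implicit conversion on RDD) keeps the
    * call identical in Scala, Java, and Python. */
  def registerRDDAsTable(rdd: RDD[_], tableName: String): Unit = ???

  /** Load a Parquet file explicitly rather than inferring the format;
    * other formats can get their own loaders later. */
  def loadParquetFile(path: String): RDD[_] = ???
}
```

With the methods on the context, the Java and Python wrappers can forward straight to them instead of each needing its own implicit-conversion machinery.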