Re: aliasing aggregate columns?

2015-04-17 Thread elliott cordo
*DataFrame[count: bigint]* On Thu, Apr 16, 2015 at 9:32 PM, elliott cordo wrote: > FYI.. the problem is that column names spark generates are not able to be referenced within SQL or dataframe operations (i.e. "SUM(cool_cnt#725)").. any idea how to alias these final aggregate …

Re: aliasing aggregate columns?

2015-04-17 Thread elliott cordo
do: .agg({"cool_cnt":"sum".alias("cool_cnt"),"*":"count".alias("cnt")}) On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo wrote: > Hi Guys - > > Having trouble figuring out the semantics for using the alias function on > …
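For reference: the dict form of agg() only maps a column name to the name of an aggregate function, so there is nowhere to attach an alias. A minimal sketch of the same query using the pyspark.sql.functions API instead (assuming Spark 1.3+; reviews and the cool_cnt expression are as defined earlier in the thread):

from pyspark.sql import functions as F

cool_summary = (
    reviews
    .select(reviews.user_id, cool_cnt("votes.cool").alias("cool_cnt"))
    .groupBy("user_id")
    .agg(
        F.sum("cool_cnt").alias("cool_cnt"),  # explicit name instead of the generated SUM(...)
        F.count("*").alias("cnt"),            # explicit name instead of the generated COUNT(...)
    )
)

Because each aggregate carries its own alias, the resulting columns can be referenced directly in later DataFrame operations or SQL.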

aliasing aggregate columns?

2015-04-15 Thread elliott cordo
Hi Guys - Having trouble figuring out the semantics for using the alias function on the final sum and count aggregations? >>> cool_summary = reviews.select(reviews.user_id, cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt":"sum","*":"count"}) >>> cool_summary DataFrame…

Re: trouble with jdbc df in python

2015-03-25 Thread elliott cordo
[Image: Inline image 1] On Wed, Mar 25, 2015 at 6:12 PM, Michael Armbrust wrote: > Try: > > db = sqlContext.load(source="jdbc", url="jdbc:postgresql://localhost/xx", > dbtables="mstr.d_customer") > > > On Wed, Mar 25, 2015 at 2:19 PM, elliott …
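For context, the original error happens because load()'s first positional parameter is a path, so a bare "jdbc" argument is treated as a local file. A minimal sketch of the keyword form suggested above (assuming the Spark 1.3 Python API and that the PostgreSQL JDBC driver is on the classpath; note the JDBC source documents the option as dbtable, where the message above spells it dbtables):

db = sqlContext.load(
    source="jdbc",                         # keyword argument, so it is not mistaken for a file path
    url="jdbc:postgresql://localhost/xx",
    dbtable="mstr.d_customer",             # documented option name is "dbtable"
)

Later releases express the same thing as:

# db = (sqlContext.read.format("jdbc")
#       .options(url="jdbc:postgresql://localhost/xx", dbtable="mstr.d_customer")
#       .load())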

trouble with jdbc df in python

2015-03-25 Thread elliott cordo
If I run the following: db = sqlContext.load("jdbc", url="jdbc:postgresql://localhost/xx", dbtables="mstr.d_customer") I get the error: py4j.protocol.Py4JJavaError: An error occurred while calling o28.load. : java.io.FileNotFoundException: File file:/Users/elliottcordo/jdbc does not exist Seem…

Re: JdbcRdd for Python

2015-01-02 Thread elliott cordo
's the case I'd love to > see it implemented. > > From: elliott cordo > Date: Friday, January 2, 2015 at 8:17 AM > To: "user@spark.apache.org" > Subject: JdbcRdd for Python > > Hi All - > > Is JdbcRdd currently supported? Having trouble finding any info or > examples? > > >

JdbcRdd for Python

2015-01-02 Thread elliott cordo
Hi All - Is JdbcRdd currently supported? Having trouble finding any info or examples?
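JdbcRDD is a Scala/Java API and is not exposed through PySpark, so one workaround of that era was to split the key range by hand and read each slice with a Python DB-API driver inside mapPartitions. A rough sketch under that assumption; the connection string, key column, and bounds below are made-up placeholders, and psycopg2 is assumed to be installed on the workers:

def read_partition(bounds):
    import psycopg2  # imported on the worker, one connection per partition
    conn = psycopg2.connect("dbname=xx host=localhost")
    cur = conn.cursor()
    for lo, hi in bounds:
        cur.execute(
            "SELECT customer_id, name FROM mstr.d_customer "
            "WHERE customer_id >= %s AND customer_id < %s",
            (lo, hi),
        )
        for row in cur:
            yield row
    conn.close()

# One (lo, hi) key range per partition.
bounds = [(i, i + 10000) for i in range(0, 100000, 10000)]
rows = sc.parallelize(bounds, len(bounds)).mapPartitions(read_partition)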

hiveContext.jsonFile fails with "Unexpected close marker ']'"

2014-12-24 Thread elliott cordo
I have generally been impressed with the way jsonFile "eats" just about any json data model.. but getting this error when I try to ingest this file: "Unexpected close marker ']': expected '}'" Here are the commands from the pyspark shell: from pyspark.sql import HiveContext hiveContext = HiveCont…
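That error is typical of a pretty-printed document: jsonFile expects one complete JSON object per line, so a file containing a single multi-line JSON array trips the parser at the closing ']'. A rough workaround sketch, assuming the file really is one JSON array and using a placeholder path: read each file whole, parse it with the json module, and hand one JSON string per record to jsonRDD.

import json
from pyspark.sql import HiveContext

hiveContext = HiveContext(sc)

records = (
    sc.wholeTextFiles("/path/to/file.json")     # (filename, full_text) pairs
      .flatMap(lambda kv: json.loads(kv[1]))    # assumes each document is a JSON array of objects
      .map(json.dumps)                          # back to one JSON string per record
)

df = hiveContext.jsonRDD(records)

Note that wholeTextFiles materializes each file in one piece, so this only suits files of moderate size.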