DataFrame[count: bigint]
On Thu, Apr 16, 2015 at 9:32 PM, elliott cordo wrote:
> FYI.. the problem is that the column names Spark generates cannot be
> referenced within SQL or DataFrame operations (i.e. "SUM(cool_cnt#725)")..
> any idea how to alias these final aggregates?
do (with from pyspark.sql import functions as F):

.agg(F.sum("cool_cnt").alias("cool_cnt"), F.count(F.lit(1)).alias("cnt"))
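A minimal end-to-end sketch of that approach (assuming Spark 1.3+; the reviews
DataFrame and the cool_cnt UDF come from the original question and are not
defined here):

# Sketch only: alias the aggregates up front so the result carries plain,
# referenceable names instead of generated ones like SUM(cool_cnt#725).
from pyspark.sql import functions as F

cool_summary = (reviews
    .select(reviews.user_id, cool_cnt("votes.cool").alias("cool_cnt"))
    .groupBy("user_id")
    .agg(F.sum("cool_cnt").alias("cool_cnt"),   # referenceable as cool_cnt
         F.count(F.lit(1)).alias("cnt")))       # count(*) equivalent, named cnt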
On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo wrote:
> Hi Guys -
>
> Having trouble figuring out the semantics for using the alias function on
> the final sum and count aggregations?
>
Hi Guys -
Having trouble figuring out the semantics for using the alias function on
the final sum and count aggregations?
>>> cool_summary = reviews.select(reviews.user_id,
cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt":"sum","*":"count"})
>>> cool_summary
DataFrame[...]
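Once the aggregates carry plain aliases as in the reply above, the columns
behave like any other in later DataFrame operations or SQL. A hedged sketch
(assumes the aliased cool_summary from that note and an existing sqlContext):

# Sketch only: plain column names resolve in expressions and in SQL.
top_cool = (cool_summary
    .filter(cool_summary.cnt > 10)
    .orderBy(cool_summary.cool_cnt.desc())
    .limit(20))
top_cool.show()

cool_summary.registerTempTable("cool_summary")
sqlContext.sql("SELECT user_id, cool_cnt FROM cool_summary ORDER BY cool_cnt DESC")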
On Wed, Mar 25, 2015 at 6:12 PM, Michael Armbrust wrote:
> Try:
>
> db = sqlContext.load(source="jdbc", url="jdbc:postgresql://localhost/xx",
> dbtables="mstr.d_customer")
>
>
> On Wed, Mar 25, 2015 at 2:19 PM, elliott cordo wrote:
If I run the following:
db = sqlContext.load("jdbc", url="jdbc:postgresql://localhost/xx",
dbtables="mstr.d_customer")
I get the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o28.load.
: java.io.FileNotFoundException: File file:/Users/elliottcordo/jdbc does
not exist
Seems like load is interpreting "jdbc" as a file path?
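For what it's worth, that traceback is consistent with the first positional
argument being read as a path instead of a data source name, which is what the
keyword form in the reply above avoids. A hedged sketch (Spark 1.3 style; the
URL and table are the placeholders from this thread, and note the JDBC source
spells its table option dbtable, singular):

# Sketch only: keyword arguments so "jdbc" is taken as the source name, not a
# file path; the Postgres JDBC driver jar still has to be on the classpath.
db = sqlContext.load(source="jdbc",
                     url="jdbc:postgresql://localhost/xx",
                     dbtable="mstr.d_customer")
db.printSchema()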
> if that's the case I'd love to
> see it implemented.
>
> From: elliott cordo
> Date: Friday, January 2, 2015 at 8:17 AM
> To: "user@spark.apache.org"
> Subject: JdbcRdd for Python
>
> Hi All -
>
> Is JdbcRdd currently supported? Having trouble finding any info or
> examples?
>
>
>
Hi All -
Is JdbcRdd currently supported? Having trouble finding any info or
examples?
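As far as I can tell JdbcRDD itself is not exposed in PySpark; from Python the
usual route (Spark 1.3+) is the jdbc data source shown earlier in this digest,
and the result can be dropped back down to an RDD if needed. A hedged sketch
with placeholder connection and partitioning values:

# Sketch only: partitioned JDBC read via the data source API; the partition
# column and bounds below are made-up placeholders, not values from the thread.
df = sqlContext.load(source="jdbc",
                     url="jdbc:postgresql://localhost/xx",
                     dbtable="mstr.d_customer",
                     partitionColumn="customer_id",
                     lowerBound="1",
                     upperBound="1000000",
                     numPartitions="4")
rows = df.rdd   # RDD of Rows, if an RDD rather than a DataFrame is needed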
I have generally been impressed with the way jsonFile "eats" just about any
JSON data model.. but I'm getting this error when I try to ingest this file:
"Unexpected close marker ']': expected '}'"
Here are the commands from the pyspark shell:
from pyspark.sql import HiveContext
hiveContext = HiveContext(sc)
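That error usually shows up when the file is a single pretty-printed document
or a top-level JSON array: jsonFile expects one complete JSON object per line.
A hedged workaround sketch (assumes the hiveContext built above, an existing
SparkContext sc, and that the file really is one top-level array; the path is
a placeholder):

import json

# Sketch only: read the whole file, split the top-level array into records,
# and hand jsonRDD one JSON object per string.
raw = sc.wholeTextFiles("/path/to/file.json")
records = raw.flatMap(lambda kv: json.loads(kv[1]))
df = hiveContext.jsonRDD(records.map(json.dumps))
df.printSchema()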