I've hit a roadblock trying to understand why Spark doesn't work for a
colleague of mine on his Windows 7 laptop.
I have pretty much the same setup and everything works fine.
I googled the error message and found nothing that resolved it.
Here is the exception message (after running Spark 1
Hi
I'm looking for benchmarks on joining DataFrames where most of the
data is in HDFS (e.g. in Parquet) and some "reference" or "metadata" is
still in an RDBMS. I am only looking at the very first join, before any caching
happens, and I assume there will be some loss of parallelism because JDBCRDD
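For concreteness, here is roughly the setup I have in mind (the path, JDBC
URL, and table names below are made up):

import java.util.Properties

// Bulk data: Parquet in HDFS, scanned in parallel across the cluster.
val facts = sqlContext.read.parquet("hdfs:///data/facts")

// Small reference table in the RDBMS. With this url/table overload,
// JDBCRDD reads it through a single partition, i.e. a single connection.
val accounts = sqlContext.read.jdbc(
  "jdbc:postgresql://db:5432/ref", "accounts", new Properties())

// The very first join, before any caching. If accounts is small enough
// (under spark.sql.autoBroadcastJoinThreshold), Spark should broadcast it.
val joined = facts.join(accounts, "account")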
I understand that the following are equivalent
df.filter('account === "acct1")
sql("select * from tempTableName where account = 'acct1'")
But is Spark SQL "smart" enough to also push filter predicates down for the
initial load?
e.g.
sqlContext.read.jdbc(…).filter('account === "acct1")
to
> generate a new query with "where account = acct1"
>
> Thanks.
>
> Zhan Zhang
>
> On Nov 18, 2015, at 11:36 AM, Eran Medan wrote:
>
> I understand that the following are equivalent
>
> df.filter('account === "acct1")
>
> sql("selec
Remember that article that went viral on HN? (Where a guy showed how GraphX
/ Giraph / GraphLab / Spark have worse performance on a 128-core cluster than
on a single-threaded machine? If not, here is the article -
http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
Well, as you may recall
Hi everyone,
I had a lot of questions today; sorry if I'm spamming the list, but I
thought it's better than posting all questions in one thread. Let me know
if I should throttle my posts ;)
Here is my question:
When I try to use a case class that has Any in it (e.g. I have a property
map and va
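To make this concrete, here is a minimal sketch of the kind of thing I mean
(the Record name and its fields are made up):

case class Record(id: Int, props: Map[String, Any])

val rdd = sc.parallelize(Seq(Record(1, Map("colour" -> "red", "size" -> 2))))
// Schema inference reflects over the case class and has no SQL type for Any,
// so this fails (in 1.x, with something like "Schema for type Any is not
// supported"):
val df = sqlContext.createDataFrame(rdd)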
> , but I don't think it's some kind
> of argument against distributed computing.
>
>
> On Fri, Mar 27, 2015 at 6:32 PM, Eran Medan
> wrote:
> > Remember that article that went viral on HN? (Where a guy showed how
> > GraphX / Giraph / GraphLab / Spark have
e.
>
> Thanks for the PR btw :)
>
> On Fri, Mar 27, 2015 at 2:31 PM, Eran Medan
> wrote:
>
>> Hi everyone,
>>
>> I had a lot of questions today, sorry if I'm spamming the list, but I
>> thought it's better than posting all questions in one thread
Hi Everyone!
I'm trying to understand how Spark's cache works.
Here is my naive understanding; please let me know if I'm missing something:
val rdd1 = sc.textFile("some data")
rdd1.cache() // marks rdd1 as cached; nothing is computed yet
val rdd2 = rdd1.filter(...)
val rdd3 = rdd1.map(...)
rdd2.saveAsTextFile("...") // first action: computes rdd1 and caches its partitions
rdd3.saveAsTextFile("...") // second action: should reuse the cached rdd1
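And a quick way I'd check whether that is what actually happens (the exact
output varies by Spark version):

// getStorageLevel reflects the cache() call even before any action runs;
// after the first action, toDebugString should show rdd1 as cached.
println(rdd1.getStorageLevel)
println(rdd3.toDebugString)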