Hi Spark community,
We're excited about Spark at Adobe Research and have just
open-sourced an example project that writes and reads Thrift
objects to Parquet with Spark.
The project is on GitHub, and we'd welcome any feedback:
https://github.com/adobe-research/spark-parquet-thrift-example
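In outline, the write and read paths look roughly like this, where
SampleThriftObject stands in for a class generated by the Thrift
compiler, objects for a Seq of them, and sc for an existing
SparkContext (see the repository for the complete, working code):

    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.SparkContext._  // pair RDD functions
    import parquet.hadoop.thrift.{ParquetThriftInputFormat, ParquetThriftOutputFormat}

    val job = new Job()
    // Tell the Parquet output format which Thrift class it is writing.
    ParquetThriftOutputFormat.setThriftClass(job, classOf[SampleThriftObject])

    // Parquet's Hadoop output formats consume (key, value) pairs with a
    // Void key, so pair each object with null before saving.
    sc.parallelize(objects)
      .map(obj => (null.asInstanceOf[Void], obj))
      .saveAsNewAPIHadoopFile("parquet-store", classOf[Void],
        classOf[SampleThriftObject],
        classOf[ParquetThriftOutputFormat[SampleThriftObject]],
        job.getConfiguration)

    // Read the objects back, keeping only the values.
    val records = sc.newAPIHadoopFile("parquet-store",
        classOf[ParquetThriftInputFormat[SampleThriftObject]],
        classOf[Void], classOf[SampleThriftObject],
        job.getConfiguration)
      .map(_._2)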
Regards,
Hi Spark community,
We're excited about Spark at Adobe Research and have just
open-sourced a project we use to automatically provision a
Spark cluster and submit applications.
The project is on GitHub, and we'd welcome any feedback
from the community:
https://github.com/adobe-research/spark-clu
Sean Owen wrote:
> Can you not just filter the range you want, then groupBy
> timestamp/86400 ? That sounds like your solution 1 and is about as
> fast as it gets, I think. Are you thinking you would have to filter
> out each day individually from there, and that's why it would be slow?
> I don't
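In code, that suggestion amounts to something like the following
(Record and its unix-time field are illustrative stand-ins for the
actual data; times are in seconds):

    import org.apache.spark.rdd.RDD

    case class Record(time: Long, payload: String)

    // Keep only the requested subinterval, then group by day number.
    def partitionByDay(rdd: RDD[Record], start: Long, end: Long) =
      rdd.filter(r => r.time >= start && r.time < end)
         .groupBy(r => r.time / 86400)  // 86400 seconds per day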
ssimanta wrote:
>> Solution 2 is to map the objects into a pair RDD where the
>> key is the number of the day in the interval, then group by
>> key, collect, and parallelize the resulting grouped data.
>> However, I worry collecting large data sets is going to be
>> a serious performance bottleneck.
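Spelled out, Solution 2 looks roughly like this (same illustrative
Record as above, with rdd, start, and sc already in scope); the
collect() is the step that pulls every group onto the driver:

    import org.apache.spark.SparkContext._  // pair RDD functions

    // Key each record by its day number within the interval, group,
    // and collect the grouped data to the driver.
    val grouped = rdd
      .map(r => ((r.time - start) / 86400, r))
      .groupByKey()
      .collect()  // materializes all groups on the driver

    // Re-parallelize each day's records as its own RDD.
    val perDay = grouped.map { case (day, records) =>
      (day, sc.parallelize(records.toSeq))
    }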
Hi, I have an RDD that represents data over a time interval, and I want
to select a subinterval of the data and partition it by day
based on a Unix time field in the records.
What is the best way to do this with Spark?
I have currently implemented two solutions, both of which seem suboptimal.
Solution 1
Hi, I think this is a bug in Spark, because changing my program to use
a main method instead of the App trait fixes the problem.
I've filed this as SPARK-2175; apologies if it turns out to be a
duplicate.
https://issues.apache.org/jira/browse/SPARK-2175
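For reference, the change is essentially the following (names here are
illustrative; scala.App defers field initialization via DelayedInit,
which appears to be why the vals end up null on the workers):

    import org.apache.spark.{SparkConf, SparkContext}

    // Fails with NullPointerException: `label` can still be
    // uninitialized when the closure runs on a worker.
    object BrokenJob extends App {
      val label = "value"
      val sc = new SparkContext(new SparkConf().setAppName("example"))
      sc.parallelize(1 to 10).map(i => label.length + i).collect()
    }

    // Works: a plain main method initializes the vals normally.
    object WorkingJob {
      def main(args: Array[String]) {
        val label = "value"
        val sc = new SparkContext(new SparkConf().setAppName("example"))
        sc.parallelize(1 to 10).map(i => label.length + i).collect()
      }
    }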
Regards,
Brandon.
Hi, I'm consistently getting NullPointerExceptions when trying to use
String val objects defined in my main application -- even for broadcast
vals!
I'm deploying on a standalone cluster with a master and 4 workers on the
same machine, which is not the machine I'm submitting from.
The following example demonstrates the problem.
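A minimal sketch of the pattern (names and the master URL are
placeholders, not the original program):

    import org.apache.spark.{SparkConf, SparkContext}

    object NpeExample extends App {
      val sc = new SparkContext(new SparkConf()
        .setAppName("npe-example")
        .setMaster("spark://master:7077"))

      val str = "defined on the driver"     // plain String val
      val broadcastStr = sc.broadcast(str)  // broadcast val

      // Both of these hit NullPointerExceptions on the workers:
      sc.parallelize(1 to 100).map(i => str.length + i).collect()
      sc.parallelize(1 to 100).map(i => broadcastStr.value.length + i).collect()
    }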