The GitHub repo is https://github.com/datastax/spark-cassandra-connector
The talk video and slides should be uploaded to the Spark Summit website soon.
On Wednesday, June 8, 2016, Chanh Le wrote:
> Thanks, I'll look into it. Any luck getting a link related to it?
>
> On Thu, Jun 9, 2016, 12
Try using the DataStax package. There was a great talk about it at Spark
Summit. It takes care of the boilerplate code so you can focus on the
real business value.
On Wednesday, June 8, 2016, Chanh Le wrote:
> Hi everyone,
> I tested partitioning the DataFrame by columns, but it's not good. I me
I am executing a Spark job on a cluster in yarn-client mode (yarn-cluster
mode is not an option due to permission issues).
- num-executors=800
- spark.akka.frameSize=1024
- spark.default.parallelism=25600
- driver-memory=4G
- executor-memory=32G
- My input size is around 1.5TB.
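A quick sanity check on the settings above: this is a minimal back-of-the-envelope sketch, assuming binary units (1 TB = 1024 GB, 1 G = 1 GiB) and assuming the data ends up evenly spread over spark.default.parallelism partitions, which in practice mostly holds after a shuffle rather than for the initial input splits. The variable names are illustrative only.

```python
# Back-of-the-envelope sizing for the job settings listed above.
# Assumption: binary units (TiB/GiB/MiB); decimal units shift results by ~5-10%.

num_executors = 800
executor_memory_gib = 32
default_parallelism = 25600
input_tib = 1.5

# Total executor heap requested from YARN (excludes the per-executor memory overhead).
total_executor_memory_tib = num_executors * executor_memory_gib / 1024

# Average number of partitions each executor handles at this parallelism.
partitions_per_executor = default_parallelism // num_executors

# Average partition size if the data were split evenly across default_parallelism partitions.
input_mib = input_tib * 1024 * 1024
mib_per_partition = input_mib / default_parallelism

print(f"total executor memory: {total_executor_memory_tib:.1f} TiB")  # 25.0 TiB
print(f"partitions per executor: {partitions_per_executor}")          # 32
print(f"average partition size: {mib_per_partition:.2f} MiB")         # 61.44 MiB
```

So the parallelism works out to exactly 32 partitions per executor, and each partition averages roughly 61 MiB, small relative to a 32G executor heap.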
My problem