Thanks for the recommendations. I had been focused on solving the problem
"within Spark" but a distributed database sounds like a better solution.
Jeff
On Sat, Aug 29, 2015 at 11:47 PM, Ted Yu wrote:
> Not sure if the race condition you mentioned is related to Cassandra's
> data consistency mod
We are using Cassandra for a similar kind of problem and it works well... You
need to take care of the race condition between updating the store and
looking up the store...
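
One way to guard against that kind of update/lookup race is a versioned
conditional write, where an update is applied only if the entry has not
changed since it was read. The sketch below is a minimal in-memory
illustration, not Cassandra code: VersionedStore, compare_and_set and the
"model" key are hypothetical names made up for the example. (Cassandra's
lightweight transactions provide a conditional write in a similar spirit.)

```python
import threading


class VersionedStore:
    """Hypothetical in-memory store; each key maps to (version, value)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        # Return (version, value); unseen keys start at version 0.
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write only if nobody else has written since we read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # stale read: caller should re-read and retry
            self._data[key] = (current_version + 1, value)
            return True


store = VersionedStore()
version, _ = store.get("model")
first = store.compare_and_set("model", version, "v1")  # version matches
stale = store.compare_and_set("model", version, "v2")  # version is now stale
```

A writer whose compare_and_set returns False re-reads the entry and retries,
so a lookup never sees an update that was based on stale data.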
On Aug 29, 2015 1:31 AM, "Ted Yu" wrote:
+1 on Jason's suggestion.
bq. this large variable is broadcast many times during the lifetime
Please consider making this large variable more granular, meaning: reduce
the amount of data transferred between the key value store and your app
during each update.
Cheers
On Fri, Aug 28, 2015 at 12:44 PM,
You could try using an external key value store (like HBase, Redis) and
perform lookups/updates inside of your mappers (you'd need to create the
connection within a mapPartitions code block to avoid the connection
setup/teardown overhead)?
I haven't done this myself though, so I'm just throwing th
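
The per-partition connection idea above can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not tested against
a real cluster: InMemoryStore is a hypothetical stand-in for an HBase or
Redis client, and REFERENCE_DATA and enrich_partition are names invented
for the example. The only Spark call assumed is rdd.mapPartitions, which
passes each partition to the function as an iterator.

```python
class InMemoryStore:
    """Hypothetical stand-in for an HBase/Redis client; counts connections."""

    connections_opened = 0

    def __init__(self, data):
        InMemoryStore.connections_opened += 1
        self._data = data

    def get(self, key):
        return self._data.get(key)

    def close(self):
        pass  # a real client would release its socket here


# Assumed sample contents of the external store.
REFERENCE_DATA = {"a": 1, "b": 2}


def enrich_partition(records):
    """Look up every record using ONE connection for the whole partition."""
    store = InMemoryStore(REFERENCE_DATA)  # opened once, not once per record
    try:
        for key in records:
            yield key, store.get(key)
    finally:
        store.close()


# With Spark this would be: enriched = rdd.mapPartitions(enrich_partition)
# Without Spark, a partition is just an iterator over its records:
result = list(enrich_partition(iter(["a", "b", "c"])))
```

Opening the connection inside the function rather than on the driver also
sidesteps having to serialize a client object out to the executors.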