Hi Michael,
Yes, you can use Alluxio to share Spark RDDs. Here is a blog post about
getting started with Spark and Alluxio (
http://www.alluxio.com/2016/04/getting-started-with-alluxio-and-spark/),
and some documentation (
http://alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html).
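As a rough illustration of the idea, here is a minimal PySpark-style sketch: one application writes an RDD to an alluxio:// path, and a second application reads it back. The helper names (alluxio_uri, write_shared, read_shared), the host, and the path are hypothetical placeholders, not anything from your setup. It also assumes the Alluxio client jar is on Spark's classpath and that the master listens on the default port 19998.

```python
# Sketch only: helper names, host, and paths are illustrative assumptions.

def alluxio_uri(host, path, port=19998):
    """Build an alluxio:// URI (19998 is the default Alluxio master port)."""
    return "alluxio://{}:{}/{}".format(host, port, path.lstrip("/"))


def write_shared(rdd, uri):
    """Application A: store an RDD in Alluxio so other applications can read it.

    saveAsTextFile fails if the target path already exists.
    """
    rdd.saveAsTextFile(uri)


def read_shared(sc, uri):
    """Application B (a separate SparkContext): load the stored data as an RDD of lines."""
    return sc.textFile(uri)
```

From a pyspark shell, where sc already exists, usage might look like write_shared(sc.parallelize(["a", "b"]), alluxio_uri("localhost", "/shared/my_rdd")) in one application and read_shared(sc, alluxio_uri("localhost", "/shared/my_rdd")) in another. Note that the second application gets its own RDD backed by the data in Alluxio, not a handle to the first application's in-memory RDD.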
On 5/16/2016 12:12 PM, Michael Segel wrote:
For one use case, we were considering using the Thrift server as a way to
allow multiple clients to access shared RDDs.
Within the Thrift context, we create an RDD and expose it as a Hive table.
The question is: where does the RDD exist? On the Thrift
Thanks for the response.
That’s what I thought, but I didn’t want to assume anything.
(You know what happens when you ass u me … :-)
Not sure about Tachyon though. It’s a thought, but I’m very conservative when
it comes to design choices.
> On May 16, 2016, at 5:21 PM, John Trengrove
>
If you want to share RDDs, it might be a good idea to check out
Tachyon / Alluxio.
For the Thrift server, I believe the datasets are located in your Spark
cluster as RDDs and you just communicate with it via the Thrift
JDBC Distributed Query Engine connector.
2016-05-17 5:12 GMT+10:00 Micha
For one use case, we were considering using the Thrift server as a way to
allow multiple clients to access shared RDDs.
Within the Thrift context, we create an RDD and expose it as a Hive table.
The question is: where does the RDD exist? On the Thrift service node itself,
or is that just a ref