Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Jake Russ
Hi Mich, Thank you for the explanation; that makes sense and helps me understand the bigger picture of how Spark works with an RDBMS. Happy to know I’m already following best practice. Cheers, Jake From: Mich Talebzadeh Date: Monday, August 21, 2017 at 6:44 PM To: Jake Russ Cc: "

Update MySQL table via Spark/SparkR?

2017-08-21 Thread Jake Russ
Hi everyone, I’m currently using SparkR to read data from a MySQL database, perform some calculations, and then write the results back to MySQL. Is it still true that Spark does not support UPDATE queries via JDBC? I’ve seen many posts on the internet saying that Spark’s DataFrameWriter does not suppo
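Spark’s DataFrameWriter only exposes insert-style save modes (append, overwrite, error, ignore); there is no UPDATE mode over JDBC. A minimal Scala sketch of the usual workaround, writing the computed results to a staging table and letting MySQL apply the UPDATE; the table names, URL, and credentials below are placeholders, not details from the thread:

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-write-sketch").getOrCreate()
    val jdbcUrl = "jdbc:mysql://dbhost:3306/mydb"   // placeholder URL
    val props = new Properties()
    props.setProperty("user", "spark_user")         // placeholder credentials
    props.setProperty("password", "secret")

    val source  = spark.read.jdbc(jdbcUrl, "source_table", props)
    val results = source                            // calculations would go here

    // The JDBC writer can only insert; write to a staging table and run an
    // UPDATE ... JOIN against it inside MySQL (or overwrite the target outright).
    results.write.mode("append").jdbc(jdbcUrl, "results_staging", props)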

Apparent bug in KryoSerializer

2015-12-31 Thread Russ
The ScalaTest code that is enclosed at the end of this email message demonstrates what appears to be a bug in the KryoSerializer. This code was executed from IntelliJ IDEA (community edition) under Mac OS X 10.11.2. The KryoSerializer is enabled by updating the original SparkContext (that is su
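The snippet is cut off, but the usual way to enable Kryo is through the SparkConf before the context is created, rather than on a live SparkContext. A minimal Scala sketch (application name and master are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.serializer.KryoSerializer

    // spark.serializer must be set before the SparkContext is constructed;
    // registrationRequired makes unregistered classes fail loudly, which is
    // handy when hunting serializer problems.
    val conf = new SparkConf()
      .setAppName("kryo-sketch")
      .setMaster("local[*]")
      .set("spark.serializer", classOf[KryoSerializer].getName)
      .set("spark.kryo.registrationRequired", "true")

    val sc = new SparkContext(conf)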

How to register a Tuple3 with KryoSerializer?

2015-12-30 Thread Russ
I need to register with the KryoSerializer a Tuple3 that is generated by a call to the sortBy() method, which eventually calls collect() from Partitioner.RangePartitioner.sketch(). The IntelliJ IDEA debugger indicates that the type parameters for the Tuple3 are java.lang.Integer, java.lang.Integer and long[]. So, th
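Since Tuple3 is generic, Kryo only sees the erased class scala.Tuple3 at runtime; the element classes reported by the debugger are registered separately. A sketch of one way to do the registration (the surrounding SparkConf setup is assumed, not taken from the thread):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true")
      .registerKryoClasses(Array[Class[_]](
        classOf[Tuple3[_, _, _]],      // erased tuple class
        classOf[java.lang.Integer],    // first two element types from the debugger
        classOf[Array[Long]]           // the long[] element
      ))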

building a distributed k-d tree with spark

2015-12-22 Thread Russ
associated source code, so if anyone has suggestions for improvement, please feel free to communicate them to me. Thanks, Russ Brown

Re: Indexing Support

2015-10-18 Thread Russ Weeks
Distributed R-Trees are not very common. Most "big data" spatial solutions collapse multi-dimensional data into a distributed one-dimensional index using a space-filling curve. Many implementations exist outside of Spark, e.g. for HBase or Accumulo. It's simple enough to write a map function that tak
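As an illustration of that kind of map function (not code from the thread): a Z-order / Morton encoding that interleaves the bits of two 32-bit coordinates into a single 64-bit key, which can then serve as the one-dimensional index key.

    // Spread the low 32 bits of v so a zero bit sits between each original bit.
    def interleave(v: Long): Long = {
      var x = v & 0xFFFFFFFFL
      x = (x | (x << 16)) & 0x0000FFFF0000FFFFL
      x = (x | (x << 8))  & 0x00FF00FF00FF00FFL
      x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL
      x = (x | (x << 2))  & 0x3333333333333333L
      x = (x | (x << 1))  & 0x5555555555555555L
      x
    }

    // Z-order key for two unsigned 32-bit coordinates; nearby (x, y) points tend
    // to land near each other on the curve, which is what the 1-D index exploits.
    def zOrder(x: Int, y: Int): Long =
      (interleave(y.toLong & 0xFFFFFFFFL) << 1) | interleave(x.toLong & 0xFFFFFFFFL)

    // e.g. points.map(p => (zOrder(p.x, p.y), p)).sortByKey()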

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread Russ Weeks
an encoded representation of an entire logical row; it's a useful convenience if you can be sure that your rows always fit in memory. I haven't tested it since Spark 1.0.1 but I doubt anything important has changed. Regards, -Russ On Thu, Mar 26, 2015 at 11:41 AM, David Holiday wrote: >
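The snippet appears to describe Accumulo's WholeRowIterator, which packs an entire logical row into one Key/Value pair. A hedged Scala sketch of unpacking such a value on the Spark side (the qualifier-to-string mapping is just an illustration):

    import org.apache.accumulo.core.data.{Key, Value}
    import org.apache.accumulo.core.iterators.user.WholeRowIterator
    import scala.collection.JavaConverters._

    // decodeRow turns the encoded row back into its individual cells; this only
    // works if the whole row fits in memory, as the email points out.
    def decode(rowKey: Key, rowValue: Value): Map[String, String] =
      WholeRowIterator.decodeRow(rowKey, rowValue).asScala.map {
        case (k, v) => k.getColumnQualifier.toString -> new String(v.get(), "UTF-8")
      }.toMap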

Re: Reading from HBase is too slow

2014-09-29 Thread Russ Weeks
ll be better for your cluster. -Russ On Mon, Sep 29, 2014 at 7:43 PM, Nan Zhu wrote: > can you look at your HBase UI to check whether your job is just reading > from a single region server? > > Best, > > -- > Nan Zhu > > On Monday, September 29, 2014 at 10:21 PM, Tao X

Re: Does anyone have experience with using Hadoop InputFormats?

2014-09-24 Thread Russ Weeks
No, they do not implement Serializable. There are a couple of places where I've had to do a Text->String conversion but generally it hasn't been a problem. -Russ On Wed, Sep 24, 2014 at 10:27 AM, Steve Lewis wrote: > Do your custom Writable classes implement Serializable - I thi
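Because Hadoop Writables are not java.io.Serializable, the usual pattern is to convert them to plain types right after reading, so nothing downstream (shuffles, caching, collect) ever has to serialize a Writable. A generic sketch of the Text-to-String conversion mentioned above, assuming a SparkContext sc; the path and input format are illustrative, not from the thread:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Convert Writables to plain types immediately; the resulting RDD[(Long, String)]
    // serializes without any special handling.
    val lines = sc.newAPIHadoopFile(
        "hdfs:///some/path",       // illustrative path
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text])
      .map { case (offset, text) => (offset.get(), text.toString) }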

Re: Does anyone have experience with using Hadoop InputFormats?

2014-09-24 Thread Russ Weeks
I use newAPIHadoopRDD with AccumuloInputFormat. It produces a PairRDD using Accumulo's Key and Value classes, both of which extend Writable. Works like a charm. I use the same InputFormat for all my MR jobs. -Russ On Wed, Sep 24, 2014 at 9:33 AM, Steve Lewis wrote: > I tried newAPIHa
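A Scala sketch of that call; configuring the AccumuloInputFormat itself (connector info, table name, instance, scan authorizations) is assumed to have been done on the Job and is omitted here:

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.data.{Key, Value}
    import org.apache.hadoop.mapreduce.Job

    val job = Job.getInstance(sc.hadoopConfiguration)
    // AccumuloInputFormat.setConnectorInfo(job, ...), setInputTableName(job, ...),
    // setZooKeeperInstance(job, ...), setScanAuthorizations(job, ...) go here.

    // Yields an RDD[(Key, Value)]; both classes are Writables, so convert early
    // if the data needs to be shuffled, cached, or collected.
    val pairs = sc.newAPIHadoopRDD(
      job.getConfiguration,
      classOf[AccumuloInputFormat],
      classOf[Key],
      classOf[Value])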

Re: Spark + AccumuloInputFormat

2014-09-10 Thread Russ Weeks
query time down to 30s from 18 minutes and I'm seeing much better utilization of my Accumulo tablet servers. -Russ On Tue, Sep 9, 2014 at 5:13 PM, Russ Weeks wrote: > Hi, > > I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat. > Not sure if I shoul
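One common way to put Spark SQL on top of that RDD (a sketch written against the later Spark 1.3+ DataFrame API rather than the 1.0.1 API in the thread; the record shape and table name are made up):

    import org.apache.spark.sql.SQLContext

    // Illustrative record type; the real schema depends on how values are encoded.
    case class Cell(row: String, family: String, qualifier: String, value: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // `pairs` is the RDD[(Key, Value)] produced by AccumuloInputFormat.
    val df = pairs.map { case (k, v) =>
      Cell(k.getRow.toString, k.getColumnFamily.toString,
           k.getColumnQualifier.toString, new String(v.get(), "UTF-8"))
    }.toDF()

    df.registerTempTable("cells")
    sqlContext.sql("SELECT family, COUNT(*) FROM cells GROUP BY family").show()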

Re: Accumulo and Spark

2014-09-10 Thread Russ Weeks
ation(), AccumuloInputFormat.class, Key.class, Value.class); } There's tons of docs around how to operate on a JavaPairRDD. But you're right, there's hardly anything at all re. how to plug Accumulo into Spark. -Russ On Wed, Sep 10, 2014 at 1:17 PM, Megavolt wrote: > I've

Spark + AccumuloInputFormat

2014-09-09 Thread Russ Weeks
, I only ever see a maximum of 2 tablet servers with active scans. Since the data is spread across all the tablet servers, I hoped to see 8! I realize there are a lot of moving parts here but I'd any advice about where to start looking. Using Spark 1.0.1 with Accumulo 1.6. Thanks! -Russ