Hi Mich,
Thank you for the explanation; that makes sense and helps me understand
the bigger picture of how Spark and the RDBMS fit together.
Happy to know I’m already following best practice.
Cheers,
Jake
From: Mich Talebzadeh
Date: Monday, August 21, 2017 at 6:44 PM
To: Jake Russ
Cc: "
Hi everyone,
I’m currently using SparkR to read data from a MySQL database, perform some
calculations, and then write the results back to MySQL. Is it still true that
Spark does not support UPDATE queries via JDBC? I've seen many posts on the
internet saying that Spark's DataFrameWriter does not support them.
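As a hedged illustration (shown in Scala; the SparkR calls mirror it, and the
URL, credentials, and table names below are all made up), the JDBC writer only
offers insert-style save modes, so one common workaround is to write results to
a staging table and run the UPDATE on the MySQL side:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("mysql-roundtrip").getOrCreate()

    // Read the input table over JDBC (URL/credentials are placeholders).
    val results = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/mydb")
      .option("dbtable", "input_table")
      .option("user", "spark")
      .option("password", "secret")
      .load()

    // DataFrameWriter has no UPDATE mode, only Append/Overwrite/ErrorIfExists/
    // Ignore, so write to a staging table and merge inside MySQL, e.g.:
    //   UPDATE target t JOIN results_staging s ON t.id = s.id SET t.val = s.val;
    results.write.format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/mydb")
      .option("dbtable", "results_staging")
      .option("user", "spark")
      .option("password", "secret")
      .mode(SaveMode.Overwrite)
      .save()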
The ScalaTest code that is enclosed at the end of this email message
demonstrates what appears to be a bug in the KryoSerializer. This code was
executed from IntelliJ IDEA (Community Edition) under Mac OS X 10.11.2.
The KryoSerializer is enabled by updating the original SparkContext (that is
su[…]
I need to register with the KryoSerializer a Tuple3 that is generated by a call
to the sortBy() method, which eventually calls collect() from
Partitioner.RangePartitioner.sketch().
The IntelliJ IDEA debugger indicates that the types for the Tuple3 are
java.lang.Integer, java.lang.Integer, and long[]. So, th[…]
[…] associated source code, so if anyone has suggestions for improvement,
please feel free to communicate them to me.
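For reference, a minimal sketch of registering such classes with Kryo; the app
name is arbitrary and the registration list is illustrative:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kryo-tuple3-repro")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Fail fast on any class that has not been registered.
      .set("spark.kryo.registrationRequired", "true")
      // Tuple3 is erased at runtime, so one registration covers all element
      // types; the long[] array class must be registered separately.
      .registerKryoClasses(Array(classOf[Tuple3[_, _, _]], classOf[Array[Long]]))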
Thanks,
Russ Brown
Distributed R-Trees are not very common. Most "big data" spatial solutions
collapse multi-dimensional data into a distributed one-dimensional index
using a space-filling curve. Many implementations exist outside of Spark,
e.g. for HBase or Accumulo. It's simple enough to write a map function that
tak[…]
[…] an encoded representation of
an entire logical row; it's a useful convenience if you can be sure that
your rows always fit in memory.
I haven't tested it since Spark 1.0.1 but I doubt anything important has
changed.
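To make the space-filling-curve idea mentioned above concrete, here is a
minimal illustrative sketch of a Z-order (Morton) encoding; the helper names
are mine, not from any particular library:

    // Interleave the bits of two 32-bit coordinates into one 64-bit Z-order key.
    def interleave(v: Int): Long = {
      var r = v.toLong & 0xFFFFFFFFL
      r = (r | (r << 16)) & 0x0000FFFF0000FFFFL
      r = (r | (r << 8))  & 0x00FF00FF00FF00FFL
      r = (r | (r << 4))  & 0x0F0F0F0F0F0F0F0FL
      r = (r | (r << 2))  & 0x3333333333333333L
      r = (r | (r << 1))  & 0x5555555555555555L
      r
    }

    def zOrder(x: Int, y: Int): Long = (interleave(x) << 1) | interleave(y)

    // e.g.: points.map { case (x, y) => (zOrder(x, y), (x, y)) }.sortByKey()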
Regards,
-Russ
On Thu, Mar 26, 2015 at 11:41 AM, David Holiday
wrote:
>
[…]ll be better for your cluster.
-Russ
On Mon, Sep 29, 2014 at 7:43 PM, Nan Zhu wrote:
> can you look at your HBase UI to check whether your job is just reading
> from a single region server?
>
> Best,
>
> --
> Nan Zhu
>
> On Monday, September 29, 2014 at 10:21 PM, Tao X[…]
No, they do not implement Serializable. There are a couple of places where
I've had to do a Text->String conversion but generally it hasn't been a
problem.
-Russ
On Wed, Sep 24, 2014 at 10:27 AM, Steve Lewis wrote:
> Do your custom Writable classes implement Serializable? I think […]
I use newAPIHadoopRDD with AccumuloInputFormat. It produces a PairRDD using
Accumulo's Key and Value classes, both of which extend Writable. Works like
a charm. I use the same InputFormat for all my MR jobs.
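For anyone following along, a rough Scala sketch of that setup (the Accumulo
instance, credentials, and table configuration on the Job are elided, and the
variable names are mine):

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.data.{Key, Value}
    import org.apache.hadoop.mapreduce.Job

    val job = Job.getInstance()
    // ... configure the instance, credentials, and table on `job` via
    // AccumuloInputFormat's static setters ...
    val pairs = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[AccumuloInputFormat], classOf[Key], classOf[Value])

    // Key and Value are Writable, not Serializable, so convert early if the
    // data will be shuffled downstream:
    val rows = pairs.map { case (k, v) => (k.getRow.toString, new String(v.get)) }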
-Russ
On Wed, Sep 24, 2014 at 9:33 AM, Steve Lewis wrote:
> I tried newAPIHadoopRDD […]
[…] query time down to 30s from 18 minutes, and I'm seeing much better
utilization of my Accumulo tablet servers.
-Russ
On Tue, Sep 9, 2014 at 5:13 PM, Russ Weeks wrote:
> Hi,
>
> I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat.
> Not sure if I shoul[…]
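A minimal sketch of the pattern under discussion, using the Spark 1.0.x-era API
since the thread mentions 1.0.1; the Cell schema and all names are
hypothetical, and `pairs` stands for the (Key, Value) PairRDD described above:

    import org.apache.spark.sql.SQLContext

    case class Cell(row: String, family: String, qualifier: String, value: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

    // Project the Accumulo entries into a flat, Serializable schema.
    val cells = pairs.map { case (k, v) =>
      Cell(k.getRow.toString, k.getColumnFamily.toString,
           k.getColumnQualifier.toString, new String(v.get))
    }
    cells.registerAsTable("cells")
    val result = sqlContext.sql("SELECT row, value FROM cells WHERE family = 'f1'")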
    // [Snippet truncated by the archive; the first argument is assumed to be
    //  the job's Hadoop Configuration.]
    JavaPairRDD<Key, Value> pairRDD = sc.newAPIHadoopRDD(job.getConfiguration(),
            AccumuloInputFormat.class, Key.class, Value.class);
}
There are tons of docs about how to operate on a JavaPairRDD. But you're
right, there's hardly anything at all about how to plug Accumulo into Spark.
-Russ
On Wed, Sep 10, 2014 at 1:17 PM, Megavolt wrote:
> I've […]
[…] I only ever
see a maximum of 2 tablet servers with active scans. Since the data is
spread across all the tablet servers, I hoped to see 8!
I realize there are a lot of moving parts here, but I'd appreciate any advice
about where to start looking.
Using Spark 1.0.1 with Accumulo 1.6.
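As an illustrative sketch (not from this thread): one way to get scans to fan
out across tablet servers is to hand AccumuloInputFormat one Range per tablet,
assuming a `connector` and a configured `job` are already available:

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.data.Range
    import scala.collection.JavaConverters._

    // Build one Range per tablet from the table's split points; a null bound
    // in Range means negative/positive infinity.
    val splits = connector.tableOperations().listSplits("spatial_table").asScala.toList
    val lowers = None :: splits.map(Option(_))
    val uppers = splits.map(Option(_)) ::: List(None)
    val ranges = lowers.zip(uppers).map { case (lo, hi) =>
      new Range(lo.orNull, false, hi.orNull, true)
    }
    AccumuloInputFormat.setRanges(job, ranges.asJava)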
Thanks!
-Russ