>> Using large memory for executors (*--executor-memory 120g*).
Not really good advice.
On Thu, Apr 2, 2015 at 9:17 AM, Cheng, Hao wrote:
> Spark SQL tries to load the entire partition's data and organize it as
> in-memory HashMaps; it does eat a lot of memory if there are not many
> duplicated gro
enough space and you have no idea why the application reports "no
space left on device".
Just a guess.
-Vladimir Rodionov
On Tue, Feb 24, 2015 at 8:34 AM, Joe Wass wrote:
> I'm running a cluster of 3 Amazon EC2 machines (small number because it's
> expensive when exper
>> We service templated queries from the appserver, i.e. the user fills
>> out some forms and dropdowns, and we translate them to a query.
and
>>The target data
>>size is about a billion records, 20-ish fields, distributed throughout a
>>year (about 50GB on disk as CSV, uncompressed).
tells me that proprietary i
.
One of the options here is to try to reduce the JVM heap size and the data
size per JVM instance.
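As a hedged sketch of that option (all sizes and the jar name below are illustrative placeholders, not taken from this thread), the same total memory can be spread across more, smaller executors:

```shell
# Illustrative sizing only -- tune for your own cluster.
# Several 16g heaps usually GC far better than a single 120g heap.
spark-submit \
  --num-executors 8 \
  --executor-memory 16g \
  --executor-cores 4 \
  your-application.jar
```

Smaller heaps also reduce the data volume each JVM instance must hold, which is the point of the advice above.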
-Vladimir Rodionov
On Thu, Oct 30, 2014 at 5:22 AM, Ilya Ganelin wrote:
> The split is something like 30 million into 2 million partitions. The
> reason that it becomes tractable is that
There is a doc on MapR:
http://doc.mapr.com/display/MapR/Accessing+MapR-FS+in+Java+Applications
-Vladimir Rodionov
On Wed, Oct 1, 2014 at 3:00 PM, Addanki, Santosh Kumar <
santosh.kumar.adda...@sap.com> wrote:
> Hi
>
>
>
> We were using Horton 2.4.1 as our Hadoop distribu
Yes, it's in 0.98. CDH is free (w/o subscription) and sometimes it's worth
upgrading to the latest version (which is 0.98-based).
-Vladimir Rodionov
On Wed, Oct 1, 2014 at 9:52 AM, Ted Yu wrote:
> As far as I know, that feature is not in CDH 5.0.0
>
> FYI
>
> On Wed, Oct 1,
Using TableInputFormat is not the fastest way of reading data from HBase.
Do not expect hundreds of MB per second. You should probably take a look at
M/R over HBase snapshots:
https://issues.apache.org/jira/browse/HBASE-8369
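A rough sketch of the snapshot-based setup (the snapshot name and restore directory are hypothetical placeholders; this is job configuration against the HBase client libraries and needs a cluster to actually run):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create());
        // Read the snapshot's HFiles directly from HDFS,
        // bypassing the RegionServers entirely.
        TableSnapshotInputFormat.setInput(job, "my_snapshot",
                new Path("/tmp/snapshot_restore"));
        job.setInputFormatClass(TableSnapshotInputFormat.class);
        // ... set mapper, reducer, output format, etc. as usual.
    }
}
```

Because the scan bypasses the RegionServer RPC path, throughput is bounded by HDFS read speed rather than by RegionServer handlers.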
-Vladimir Rodionov
On Wed, Oct 1, 2014 at 8:17 AM, Tao Xiao wrote:
> I
HBase TableInputFormat creates one input split per region. You cannot
achieve a high level of parallelism unless you have at least 5-10 regions
per RegionServer. What does that mean? You probably have too few regions.
You can verify that in the HBase Web UI.
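To make the arithmetic concrete (the RegionServer count below is hypothetical, not from this thread): with TableInputFormat, the number of Spark tasks equals the number of regions, so a low region count directly caps parallelism.

```java
public class RegionParallelism {
    public static void main(String[] args) {
        // Hypothetical cluster: 5 RegionServers.
        int regionServers = 5;

        // TableInputFormat => one input split (one task) per region.
        int tasksWithOneRegionPerRS = regionServers * 1;   // too few
        int tasksWithTenRegionsPerRS = regionServers * 10; // healthy

        System.out.println("1 region/RS   -> " + tasksWithOneRegionPerRS + " tasks");
        System.out.println("10 regions/RS -> " + tasksWithTenRegionsPerRS + " tasks");
    }
}
```

With only one region per server, at most five tasks can scan in parallel no matter how many Spark cores are available.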
-Vladimir Rodionov
On Mon, Sep 29, 2014 at 7:21
Hi, users
1. Disk-based cache eviction policy? The same LRU?
2. What is the scope of a cached RDD? Does it survive the application? What
happens if I run the Java app next time? Will the RDD be created anew or read
from the cache? If the answer is YES, then ...
3. Is there any way to invalidate a cached RDD automat