Thanks Steve,
For the NN it all depends on how fast you want start-up to be. 1GB of NameNode
memory accommodates around 42TB, so if you are talking about 100GB of NN
memory then SSD may make sense to speed up the start-up. RAID 10 is the
best one can get, assuming all internal disks.
In general it is
On 11 Mar 2016, at 16:25, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
Thank you for the info Steve.
I have always believed (IMO) that there is an optimal position where one can
plot the projected NN memory (assuming 1GB --> 40TB of data) against the number
of nodes. For example, heuristically, how many nodes would be sufficient for
1PB of storage with nodes each having 512GB of memory?
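As a back-of-the-envelope sketch of that heuristic in Python (the per-node
disk capacity below is purely an assumed figure for illustration, not
something from this thread):

# NN heap projection using the 1GB --> 40TB rule of thumb
data_tb = 1024                     # 1PB of logical data, in TB
nn_heap_gb = data_tb / 40          # ~26GB of NameNode heap
print(f"Projected NN heap: {nn_heap_gb:.0f} GB")

# Node count depends entirely on per-node disk, so this part is hypothetical
disk_per_node_tb = 48              # assumption: e.g. 12 x 4TB disks per node
replication = 3
nodes_needed = data_tb * replication / disk_per_node_tb
print(f"Data nodes needed (ignoring overhead): {nodes_needed:.0f}")   # ~64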
Hi Steve,
My argument has always been that if one is going to use Solid State Disks
(SSD), it makes sense to use them for the NN disks, to speed up start-up from
the fsimage etc. Obviously the NN lives in memory. Would you also recommend
RAID10 (mirroring & striping) for NN disks?
Thanks
Dr Mich Talebzadeh
LinkedIn
On 10 Mar 2016, at 22:15, Ashok Kumar
<ashok34...@yahoo.com.invalid> wrote:
Hi,
We intend to use 5 servers which will be utilized for building a Big Data Hadoop
data warehouse system (not using any proprietary distribution like Hortonworks or
Cloudera or others).
I'd argue that life is i
Hi,
Bear in mind that you typically need 1GB of NameNode memory per 1 million
blocks. So with a 128MB block size, you can store 128 * 1E6 / (3 * 1024) =
41,666GB of data for every 1GB of NameNode memory. The 3 comes from the fact
that each block is replicated three times. In other words, just under 42TB of
data.
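For what it's worth, here is that arithmetic as a small Python sketch; the
figures are just the rule-of-thumb numbers quoted above, not measurements:

blocks_per_gb_heap = 1_000_000   # rule of thumb: ~1M blocks per 1GB of NN heap
block_size_mb = 128
replication = 3

# Logical (post-replication) data addressable per 1GB of NameNode heap
data_gb = blocks_per_gb_heap * block_size_mb / (replication * 1024)
print(f"{data_gb:,.0f} GB per 1GB of NN heap")   # ~41,667 GB, just under 42TB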
Ashok,
The cluster nodes have enough memory but relatively few CPU cores: 512GB / 16
cores = 32GB per core. Either there should be more cores available to use the
available memory efficiently, or don't configure a high executor memory, which
will cause a lot of GC.
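A rough Python sketch of that arithmetic (the 4-cores / 32GB executor split
below is only an assumed example, not a recommendation from this thread):

node_memory_gb = 512
node_cores = 16
print(f"{node_memory_gb / node_cores:.0f} GB per core")        # 32 GB/core

# Illustrative sizing: modest ~32GB heaps to limit GC pauses
cores_per_executor = 4                                         # assumed
executor_heap_gb = 32                                          # assumed
executors_per_node = node_cores // cores_per_executor          # 4
memory_used_gb = executors_per_node * executor_heap_gb         # 128
print(f"{memory_used_gb} GB of {node_memory_gb} GB used; "
      f"{node_memory_gb - memory_used_gb} GB would sit idle")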
Thanks,
Prabhu