-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49877/
-----------------------------------------------------------

(Updated July 14, 2016, 2:37 a.m.)


Review request for samza, Boris Shkolnik, Chris Pettitt, Fred Ji, Jake Maes, Yi 
Pan (Data Infrastructure), and Navina Ramesh.


Repository: samza


Description
-------

This feature introduces physical memory monitoring in SamzaContainer.

Context:
The memory used by the SamzaContainer process typically includes:
A. JVM heap memory: This is where objects allocated by the JVM live.
B. Native memory: This memory lives outside the JVM heap and is not visible to 
the JVM. Examples include memory used by RocksDB and by native libraries that 
user code depends on.

User jobs can be killed by YARN if their total memory (A+B) exceeds the 
configured maximum, yarn.container.memory.mb.

Currently, our existing metrics provide visibility into [A] via JMX, but we 
have no visibility into [B], as it is entirely external to the JVM.

This feature uses Linux ProcFS to provide a complete view of the memory (both A 
and B), to help Samza users understand their jobs' memory usage. Schedulers 
like Apache YARN that require a holistic view of memory (A+B) also use ProcFS. 
For the curious, here is the YARN implementation that inspired this idea:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-yarn-common/0.23.1/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
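
To illustrate the ProcFS idea, here is a minimal sketch that reads the resident 
set size (physical memory, covering both A and B) of the current process from 
/proc/self/status. It is not the code in this patch: the class name 
ProcfsRssSketch, its method names, and the choice of the VmRSS field are 
assumptions made for illustration, and the actual ProcfsBasedStatisticsGetter 
may read different files or fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical illustration only; not part of the Samza patch.
final class ProcfsRssSketch {

    /**
     * Extracts the VmRSS value (in kB) from the lines of a
     * /proc/<pid>/status file. Returns -1 if the field is absent.
     */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // A typical line looks like "VmRSS:\t  204800 kB".
                String rest = line.substring("VmRSS:".length()).trim();
                return Long.parseLong(rest.split("\\s+")[0]);
            }
        }
        return -1L;
    }

    /** Reads the resident set size of the calling process (Linux only). */
    static long readSelfRssKb() throws IOException {
        List<String> lines = Files.readAllLines(
            Paths.get("/proc/self/status"), StandardCharsets.UTF_8);
        return parseVmRssKb(lines);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Resident set size (kB): " + readSelfRssKb());
    }
}
```

Keeping the parsing separate from the file read means the parser can be 
exercised with canned ProcFS snapshots, without needing a Linux host.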

Scope: This RB covers Linux distributions only. (A Mac-based implementation 
may come in a separate change list.)


Diffs (updated)
-----

  build.gradle ba4a9d14fe24e1ff170873920cd5eeef656955af 
  checkstyle/import-control.xml 325c38131047836dc8aedaea4187598ef3ba7666 
  samza-core/src/main/java/org/apache/samza/container/disk/PollingScanDiskSpaceMonitor.java 50c85007123dd568ef90cf028af33a93a4470cb6 
  samza-core/src/main/java/org/apache/samza/container/host/PosixCommandBasedStatisticsGetter.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/host/ProcfsBasedStatisticsGetter.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/host/StatisticsMonitorImpl.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/host/SystemStatistics.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/host/SystemStatisticsGetter.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/container/host/SystemStatisticsMonitor.java PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 18c09224bbae959342daf9b2b7a7d971cc224f48 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala 2044ce01ffded8434e762d99355d5df43642c66b 
  samza-core/src/test/java/org/apache/samza/container/host/TestStatisticsMonitorImpl.java PRE-CREATION 

Diff: https://reviews.apache.org/r/49877/diff/


Testing
-------

1. Unit tests with mock ProcFS snapshots of processes.
2. Deployed actual jobs on my dev box.
   2.1 Obtained the operating system's view of the container memory using 'ps' 
       and other tools.
   2.2 Verified that the total memory reported by the monitor matches the 
       OS's view of memory [2.1].
3. Tested on the Linux distributions I could find internally:
    - RHEL release 6.4, 6.5, 6.6 (Santiago)


Thanks,

Jagadish Venkatraman
