Thanks for the updates! Best,
Haoyuan

On Fri, May 8, 2015 at 8:40 AM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote:

> Just a follow-up on this thread.
>
> I tried Hierarchical Storage on Tachyon (
> http://tachyon-project.org/Hierarchy-Storage-on-Tachyon.html ), and that
> seems to have worked: I did not see any Spark job fail due to
> BlockNotFoundException.
>
> Below are my Hierarchical Storage settings:
>
> -Dtachyon.worker.hierarchystore.level.max=2
> -Dtachyon.worker.hierarchystore.level0.alias=MEM
> -Dtachyon.worker.hierarchystore.level0.dirs.path=$TACHYON_RAM_FOLDER
> -Dtachyon.worker.hierarchystore.level0.dirs.quota=$TACHYON_WORKER_MEMORY_SIZE
> -Dtachyon.worker.hierarchystore.level1.alias=HDD
> -Dtachyon.worker.hierarchystore.level1.dirs.path=/mnt/tachyon
> -Dtachyon.worker.hierarchystore.level1.dirs.quota=50GB
> -Dtachyon.worker.allocate.strategy=MAX_FREE
> -Dtachyon.worker.evict.strategy=LRU
>
> Regards,
> Dibyendu
>
> On Thu, May 7, 2015 at 1:46 PM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote:
>
> > Dear All,
> >
> > I have been playing with Spark Streaming on Tachyon as the OFF_HEAP
> > block store. My primary reason for evaluating Tachyon is to find out
> > whether it can solve the Spark BlockNotFoundException.
> >
> > With the traditional MEMORY_ONLY StorageLevel, jobs fail with a "block
> > not found" exception when blocks are evicted, and storing blocks with
> > MEMORY_AND_DISK is not a good option either, as it hurts throughput
> > badly.
> >
> > To test how Tachyon behaves, I took the latest Spark 1.4 from master,
> > used Tachyon 0.6.4, and configured Tachyon in fault-tolerant mode.
> > Tachyon is running on a 3-node AWS x-large cluster, and Spark is
> > running on a 3-node AWS x-large cluster.
> >
> > I used the low-level receiver-based Kafka consumer (
> > https://github.com/dibbhatt/kafka-spark-consumer ), which I wrote, to
> > pull from Kafka and write blocks to Tachyon.
> >
> > I found a throughput improvement similar to the MEMORY_ONLY case, but
> > with very good overall memory utilization (since it is an off-heap
> > store).
> >
> > But I found one issue on which I need clarification.
> >
> > In the Tachyon case I also see BlockNotFoundException, but for a
> > different reason. What I see is that TachyonBlockManager.scala puts
> > the blocks with the WriteType.TRY_CACHE configuration. Because of
> > this, blocks are evicted from the Tachyon cache, and when Spark tries
> > to find a block it throws BlockNotFoundException.
> >
> > I see a pull request which discusses the same issue:
> >
> > https://github.com/apache/spark/pull/158#discussion_r11195271
> >
> > When I modified the WriteType to CACHE_THROUGH, the BlockDropException
> > was gone, but throughput again suffered.
> >
> > Just curious to know: does Tachyon have any setting which can handle
> > block eviction from cache to disk, other than explicitly setting
> > CACHE_THROUGH?
> >
> > Regards,
> > Dibyendu

--
Haoyuan Li
CEO, Tachyon Nexus <http://www.tachyonnexus.com/>
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
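
A minimal sketch of the Spark side of the setup discussed above, using the
Spark 1.4-era API in which StorageLevel.OFF_HEAP meant "store serialized
blocks in Tachyon" and was configured through the spark.tachyonStore.*
properties. The master URL, base directory, and input path below are
placeholders, not values taken from this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("tachyon-offheap-test")
  // Spark 1.x routed the OFF_HEAP storage level to Tachyon via these settings.
  .set("spark.tachyonStore.url", "tachyon://tachyon-master:19998")
  .set("spark.tachyonStore.baseDir", "/tmp_spark_tachyon")
val sc = new SparkContext(conf)

// Blocks persisted OFF_HEAP live in Tachyon's in-memory cache; under
// TRY_CACHE write semantics they can be evicted, which is the failure
// mode described in the thread.
val data = sc.textFile("hdfs:///input/events")
data.persist(StorageLevel.OFF_HEAP)
println(data.count())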
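
And a sketch of the WriteType trade-off described in the second message,
written against the Tachyon 0.6.x-era client API. Class names and
signatures here follow that API line from memory and may differ in other
versions; the host name and file paths are placeholders:

import tachyon.TachyonURI
import tachyon.client.{TachyonFS, WriteType}

val tfs = TachyonFS.get(new TachyonURI("tachyon://tachyon-master:19998"))

// TRY_CACHE writes to worker memory only; the block may later be evicted,
// which is what surfaces as BlockNotFoundException on the Spark side.
val tryId = tfs.createFile(new TachyonURI("/blocks/batch-0001"))
val tryOut = tfs.getFile(tryId).getOutStream(WriteType.TRY_CACHE)
tryOut.write("payload".getBytes("UTF-8"))
tryOut.close()

// CACHE_THROUGH also writes synchronously to the under-filesystem, so an
// evicted block can still be read back, at the cost of write throughput.
val thrId = tfs.createFile(new TachyonURI("/blocks/batch-0002"))
val thrOut = tfs.getFile(thrId).getOutStream(WriteType.CACHE_THROUGH)
thrOut.write("payload".getBytes("UTF-8"))
thrOut.close()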