our one cached RDD in this run has id 3


******************* onStageSubmitted **********************
rddInfo: RDD "2" (2) Storage: StorageLevel(false, false, false, false, 1);
CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B;TachyonSize: 0.0
B; DiskSize: 0.0 B
_rddInfoMap: Map(2 -> RDD "2" (2) Storage: StorageLevel(false, false,
false, false, 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0
B;TachyonSize: 0.0 B; DiskSize: 0.0 B)


******************* onTaskEnd **********************
_rddInfoMap: Map(2 -> RDD "2" (2) Storage: StorageLevel(false, false,
false, false, 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0
B;TachyonSize: 0.0 B; DiskSize: 0.0 B)
storageStatusList: List(StorageStatus(BlockManagerId(<driver>,
192.168.3.169, 34330, 0),579325132,Map()))


******************* onStageCompleted **********************
_rddInfoMap: Map()


******************* onStageSubmitted **********************
rddInfo: RDD "7" (7) Storage: StorageLevel(false, false, false, false, 1);
CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B;TachyonSize: 0.0
B; DiskSize: 0.0 B
_rddInfoMap: Map(7 -> RDD "7" (7) Storage: StorageLevel(false, false,
false, false, 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0
B;TachyonSize: 0.0 B; DiskSize: 0.0 B)


******************* updateRDDInfo **********************


******************* onTaskEnd **********************
_rddInfoMap: Map(7 -> RDD "7" (7) Storage: StorageLevel(false, false,
false, false, 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0
B;TachyonSize: 0.0 B; DiskSize: 0.0 B)
storageStatusList: List(StorageStatus(BlockManagerId(<driver>,
192.168.3.169, 34330, 0),579325132,Map(rdd_3_0 ->
BlockStatus(StorageLevel(false, true, false, true, 1),19944,0,0))))


******************* onStageCompleted **********************
_rddInfoMap: Map()


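(For context on how output like the above is produced: the callbacks are the standard SparkListener hooks, and the printlns go at the end of each one, after the bookkeeping has run. The sketch below shows the shape of that instrumentation only; `_rddInfoMap` and `storageStatusList` are internals of BlockManagerUI / StorageStatusListener, so this standalone listener just prints what it can see from the public events.)

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted,
  SparkListenerStageCompleted, SparkListenerTaskEnd}

// Sketch: print at the END of each callback, after any bookkeeping.
// Register with sc.addSparkListener(new StorageDebugListener).
class StorageDebugListener extends SparkListener {

  override def onStageSubmitted(event: SparkListenerStageSubmitted) {
    println("******************* onStageSubmitted **********************")
    // Stand-in for the UI's per-stage RDDInfo; inside BlockManagerUI the
    // actual _rddInfoMap would be dumped here as well.
    println("stageInfo: " + event.stageInfo)
  }

  override def onTaskEnd(event: SparkListenerTaskEnd) {
    println("******************* onTaskEnd **********************")
    // Inside the UI code one would also dump the StorageStatusListener's
    // storageStatusList here, as in the output above.
  }

  override def onStageCompleted(event: SparkListenerStageCompleted) {
    println("******************* onStageCompleted **********************")
  }
}
```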

On Tue, Apr 8, 2014 at 4:20 PM, Koert Kuipers <ko...@tresata.com> wrote:

> 1) at the end of the callback
>
> 2) yes we simply expose sc.getRDDStorageInfo to the user via REST
>
> 3) yes exactly. we define the RDDs at startup, all of them are cached.
> from that point on we only do calculations on these cached RDDs.
>
> i will add some more println statements for storageStatusList
>
>
>
> On Tue, Apr 8, 2014 at 4:01 PM, Andrew Or <and...@databricks.com> wrote:
>
>> Hi Koert,
>>
>> Thanks for pointing this out. However, I am unable to reproduce this
>> locally. It seems that there is a discrepancy between what the
>> BlockManagerUI and the SparkContext think is persisted. This is strange
>> because both sources ultimately derive this information from the same place
>> - by doing sc.getExecutorStorageStatus. I have a couple of questions for
>> you:
>>
>> 1) In your print statements, do you print them in the beginning or at the
>> end of each callback? It would be good to keep them at the end, since in
>> the beginning the data structures have not been processed yet.
>> 2) You mention that you get the RDD info through your own API. How do you
>> get this information? Is it through sc.getRDDStorageInfo?
>> 3) What did your application do to produce this behavior? Did you make an
>> RDD, persist it once, and then use it many times afterwards or something
>> similar?
>>
>> It would be super helpful if you could also print out what
>> StorageStatusListener's storageStatusList looks like by the end of each
>> onTaskEnd. I will continue to look into this on my side, but do let me know
>> once you have any updates.
>>
>> Andrew
>>
>>
>> On Tue, Apr 8, 2014 at 11:26 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> yet at same time i can see via our own api:
>>>
>>>     "storageInfo": {
>>>         "diskSize": 0,
>>>         "memSize": 19944,
>>>         "numCachedPartitions": 1,
>>>         "numPartitions": 1
>>>     }
>>>
>>>
>>>
>>> On Tue, Apr 8, 2014 at 2:25 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> i put some println statements in BlockManagerUI
>>>>
>>>> i have RDDs that are cached in memory. I see this:
>>>>
>>>>
>>>> ******************* onStageSubmitted **********************
>>>> rddInfo: RDD "2" (2) Storage: StorageLevel(false, false, false, false,
>>>> 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B;TachyonSize:
>>>> 0.0 B; DiskSize: 0.0 B
>>>> _rddInfoMap: Map(2 -> RDD "2" (2) Storage: StorageLevel(false, false,
>>>> false, false, 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0
>>>> B;TachyonSize: 0.0 B; DiskSize: 0.0 B)
>>>>
>>>>
>>>> ******************* onTaskEnd **********************
>>>> Map(2 -> RDD "2" (2) Storage: StorageLevel(false, false, false, false,
>>>> 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B;TachyonSize:
>>>> 0.0 B; DiskSize: 0.0 B)
>>>>
>>>>
>>>> ******************* onStageCompleted **********************
>>>> Map()
>>>>
>>>> ******************* onStageSubmitted **********************
>>>> rddInfo: RDD "7" (7) Storage: StorageLevel(false, false, false, false,
>>>> 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B;TachyonSize:
>>>> 0.0 B; DiskSize: 0.0 B
>>>> _rddInfoMap: Map(7 -> RDD "7" (7) Storage: StorageLevel(false, false,
>>>> false, false, 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0
>>>> B;TachyonSize: 0.0 B; DiskSize: 0.0 B)
>>>>
>>>> ******************* onTaskEnd **********************
>>>> Map(7 -> RDD "7" (7) Storage: StorageLevel(false, false, false, false,
>>>> 1); CachedPartitions: 0; TotalPartitions: 1; MemorySize: 0.0 B;TachyonSize:
>>>> 0.0 B; DiskSize: 0.0 B)
>>>>
>>>> ******************* onStageCompleted **********************
>>>> Map()
>>>>
>>>>
>>>> The storage levels you see here are never the ones of my RDDs, and
>>>> apparently updateRDDInfo never gets called (i had a println in there too).
>>>>
>>>>
>>>> On Tue, Apr 8, 2014 at 2:13 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> yes i am definitely using latest
>>>>>
>>>>>
>>>>> On Tue, Apr 8, 2014 at 1:07 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>
>>>>>> That commit fixed the exact problem you described. That is why I want
>>>>>> to confirm that you switched to the master branch. bin/spark-shell 
>>>>>> doesn't
>>>>>> detect code changes, so you need to run ./make-distribution.sh to
>>>>>> re-compile Spark first. -Xiangrui
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 8, 2014 at 9:57 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>
>>>>>>> sorry, i meant to say: note that for a cached rdd in the spark shell
>>>>>>> it all works fine. but something is going wrong with the
>>>>>>> SPARK-APPLICATION-UI in our applications that extensively cache and 
>>>>>>> re-use
>>>>>>> RDDs
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 8, 2014 at 12:55 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>
>>>>>>>> note that for a cached rdd in the spark shell it all works fine.
>>>>>>>> but something is going wrong with the spark-shell in our applications 
>>>>>>>> that
>>>>>>>> extensively cache and re-use RDDs
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 8, 2014 at 12:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>
>>>>>>>>> i tried again with latest master, which includes commit below, but
>>>>>>>>> ui page still shows nothing on storage tab.
>>>>>>>>>  koert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> commit ada310a9d3d5419e101b24d9b41398f609da1ad3
>>>>>>>>> Author: Andrew Or <andrewo...@gmail.com>
>>>>>>>>> Date:   Mon Mar 31 23:01:14 2014 -0700
>>>>>>>>>
>>>>>>>>>     [Hot Fix #42] Persisted RDD disappears on storage page if
>>>>>>>>> re-used
>>>>>>>>>
>>>>>>>>>     If a previously persisted RDD is re-used, its information
>>>>>>>>> disappears from the Storage page.
>>>>>>>>>
>>>>>>>>>     This is because the tasks associated with re-using the RDD do
>>>>>>>>> not report the RDD's blocks as updated (which is correct). On stage 
>>>>>>>>> submit,
>>>>>>>>> however, we overwrite any existing
>>>>>>>>>
>>>>>>>>>     Author: Andrew Or <andrewo...@gmail.com>
>>>>>>>>>
>>>>>>>>>     Closes #281 from andrewor14/ui-storage-fix and squashes the
>>>>>>>>> following commits:
>>>>>>>>>
>>>>>>>>>     408585a [Andrew Or] Fix storage UI bug
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Apr 7, 2014 at 4:21 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>>
>>>>>>>>>> got it thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 7, 2014 at 4:08 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is fixed in https://github.com/apache/spark/pull/281.
>>>>>>>>>>> Please try
>>>>>>>>>>> again with the latest master. -Xiangrui
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 7, 2014 at 1:06 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> > i noticed that for spark 1.0.0-SNAPSHOT which i checked out a
>>>>>>>>>>> few days ago
>>>>>>>>>>> > (apr 5) that the "application detail ui" no longer shows any
>>>>>>>>>>> RDDs on the
>>>>>>>>>>> > storage tab, despite the fact that they are definitely cached.
>>>>>>>>>>> >
>>>>>>>>>>> > i am running spark in standalone mode.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>