Just thought I would report back a bit more on this - 

The Unicode change doesn’t work in my case (and possibly not for command-line 
Pharo either), as I get an error where OS filenames need Unicode support 
(actually I think this is where it’s trying to write to stdout, but I didn’t 
dig any further into this):

Error: Instances of UndefinedObject are not indexable
UndefinedObject(Object)>>error:
UndefinedObject(Object)>>errorNotIndexable
UndefinedObject(Object)>>size
Unicode class>>isLetter:
Character>>isLetter
Path class>>isAbsoluteWindowsPath:
Path class>>from:delimiter:
MacStore(FileSystemStore)>>pathFromString:
FileSystem>>pathFromString:
ByteString(String)>>asPathWith:
FileSystem>>pathFromObject:
FileSystem>>referenceTo:
ByteString(String)>>asFileReference
FileStream class>>fullName:
FileStream class>>fileNamed:
SmalltalkImage>>openLog

I was able to improve on Guille’s warning about how to safely clean up 
Monticello/Metacello (and not use become: String new) with the following (I 
actually think Metacello should provide a #cleanUp method, so I raised a PR for 
consideration):
logger cr; nextPutAll: '>Removing Clearing MC Registry'.
MetacelloProjectRegistration resetRegistry.

I’m then able to get my image down from 22mb to 13.8mb (which is pretty good).
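
A minimal sketch of how the cleanup steps mentioned in this thread could be 
combined into one script, run just before the image is saved. Every call is 
taken from the messages in this thread; the order in which they run is an 
assumption and has not been measured separately.

```smalltalk
"Sketch only: the ordering of these steps is an assumption."
"General release cleanup, as quoted from Guille's script below."
Smalltalk cleanUp: true except: #() confirming: false.
"Drop the Metacello project registrations left over from bootstrapping."
MetacelloProjectRegistration resetRegistry.
"Reclaim whatever the steps above made unreachable."
Smalltalk garbageCollect.
```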

As a further experiment I also noticed that there is a fair amount of space 
trapped in Protocols and ClassOrganization - so I tried clearing those out (as 
they are lazily cached) with:
Smalltalk allClassesAndTraits do: [:c | c basicOrganization: nil ].
This seems to give me a further 1mb back (I haven’t tried performance tests 
on this, but my naive assumption is that a running system that isn’t 
adding/manipulating code doesn’t use Protocols). So I’m now 
at 21mb.
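
To check how much the organization reset actually frees, a rough sketch is to 
count Protocol instances before and after. This assumes Protocol is the class 
listed in the SpaceTally output quoted further down, and it only counts 
instances rather than measuring bytes.

```smalltalk
"Sketch only: counts instances, it does not measure reclaimed bytes."
| before after |
Smalltalk garbageCollect.
before := Protocol allInstances size.
"Drop the lazily rebuilt class organizations, as in the line above."
Smalltalk allClassesAndTraits do: [ :c | c basicOrganization: nil ].
Smalltalk garbageCollect.
after := Protocol allInstances size.
"Answer both counts so an eval prints them."
before -> after
```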

Tim

> On 16 Aug 2017, at 10:53, Guillermo Polito <guillermopol...@gmail.com> wrote:
> 
> 
> 
> On Wed, Aug 16, 2017 at 11:46 AM, Tim Mackinnon <tim@testit.works> wrote:
> Hi, tracing through your changes - it looks like:
> 
> Smalltalk cleanUp: true except: #() confirming: false.
> Takes care of all the non-Unicode changes you proposed (and it seems like it’s 
> a known cleanup protocol).
> 
> I based my script on #cleanupForRelease ^^. But I did not just blindly 
> execute it as is because I wanted to understand the implications of each line.
>  
> I wonder if the Unicode change is worth it/risky as many web based services I 
> might connect to with Zinc do support Unicode so maybe I should keep that one 
> in. (I will for now - might verify how much of a difference it really makes)
> 
> No, it should not break any encoding/decoding. The changes I proposed will 
> just nil out two things:
> 
>  - the uppercase/lowercase mapping unicode tables that says for each 
> codepoint if the codepoint is uppercase/lowercase and allows transformations 
> from/to uppercase/lowercase. This means that these may not work as expected:
> 
>          aChar asLowercase
>          aChar asUppercase
>          aChar toLowercase
>          aChar toUppercase
> 
>  - the unicode classification table that says if a character is letter or 
> digit, and so on. This means that these may not work as expected:
>      
>         aChar isLetter
>         aChar isDigit
>         aChar isAlphaNumeric
>  
> I think my next port of call is cleanUp for Monticello/Metacello as I see a 
> fair amount of that stuff floating around in my image (after I’ve used it to 
> bootstrap my code).
> 
> Tim
> 
>> On 16 Aug 2017, at 02:32, Guillermo Polito <guillermopol...@gmail.com> wrote:
>> 
>> Actually it happens first that monticello is "nicely" coupled with the 
>> changeset system and logs all the source code loaded in change sets :D :/ 
>> ¬¬. Also, the first two strings in terms of size are related to unicode 
>> tables (we should put them in files instead of in the image and load them on 
>> demand), and the two biggest arrays also to unicode. I just tried the 
>> following in a clean bootstrapped "minimal" image (metacello):
>> 
>> "Careful, this will make that #isLetter, #isUppercase #isLowercase, 
>> #toLowercase and #toUppercase only work on ascii"
>> Character characterSet: nil.
>> Unicode classPool at: #GeneralCategory put: nil.
>> Unicode classPool at: #DecimalProperty put: nil.
>> 
>> UnicodeDefinition removeFromSystem.
>> ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ].
>> ChangeSet resetCurrentToNewUnnamedChangeSet.
>> MCDefinition clearInstances.
>> Undeclared removeUnreferencedKeys.
>> Smalltalk garbageCollect.
>> 
>> like this:
>> 
>> ./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image eval --save "Character 
>> characterSet: nil. Unicode classPool at: #GeneralCategory put: nil. Unicode 
>> classPool at: #DecimalProperty put: nil. UnicodeDefinitions 
>> removeFromSystem. ChangeSet removeChangeSetsNamedSuchThat: [ :each | true ]. 
>> ChangeSet resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. 
>> Undeclared removeUnreferencedKeys. Smalltalk garbageCollect."
>> 
>> and my image went down from 11MB to 6.6MB (7.0 MB if I don't change back to 
>> ascii with the first three lines)
>> 
>> Then I tried a tally:
>> 
>> ./vm/pharo Pharo7.0-metacello-32bit-fa236b7.image save spacetally
>> 
>> ./vm/pharo spacetally.image eval --save "repo := MCFileTreeRepository new 
>> directory: '../src' asFileReference. version := repo 
>> loadVersionFromFileNamed: 'Tool-Profilers.package'. version load."
>> 
>> Re-clean, since I loaded some packages:
>> 
>> ./vm/pharo spacetally.image eval --save "ChangeSet 
>> removeChangeSetsNamedSuchThat: [ :each | true ]. ChangeSet 
>> resetCurrentToNewUnnamedChangeSet. MCDefinition clearInstances. Undeclared 
>> removeUnreferencedKeys. Smalltalk garbageCollect."
>> 
>> This image is now 6.6MB (7.1MB with the large Unicode arrays), with strings 
>> at 4.1% (274k), which seems reasonable. Remaining big strings are Pharo's 
>> licence, the buffer of the changes file and then some class comments 
>> (shouldn't they be fetched from disk as any other method source code?).
>> 
>> Making again a tally shows that ~30% of the space is taken by Arrays and 
>> 21.9% by compiled methods. But, BUT! :) I have ~30k arrays and lots of 
>> collections also:
>> 
>> MethodDictionary       2872 +
>> IdentitySet           12781 +
>> OrderedCollection      4398 +
>> Set                    2959 +
>> Dictionary             1997 +
>> IdentityDictionary      454
>> ----------------------------
>>                       25461
>> 
>> So there are ~5k arrays that are used outside collections.
>> 
>> Worth exploring a bit more I think.
>> 
>> On Wed, Aug 16, 2017 at 1:23 AM, Guillermo Polito <guillermopol...@gmail.com> wrote:
>> 
>> 
>> On Tue, Aug 15, 2017 at 11:26 PM, Tim Mackinnon <tim@testit.works> wrote:
>> Hi Guille/Ben - I got a quick moment to try the SpaceTally (aside: it seems 
>> very convoluted to load a single package into the image, I was trying to 
>> avoid having to create a baselineOf for something so simple - I ended up 
>> with:
>> 
>> I know, I also believe we have to simplify this. In any case, baselines are 
>> healthy as they allow to also express dependencies. Otherwise you'll end up 
>> loading dependencies by hand. We'll fix this soon I hope.
>>  
>> 
>> repo := MCFileTreeRepository new directory: './bootstrap' asFileReference.
>> version := repo loadVersionFromFileNamed: 'Tool-Profilers.package'.
>> version load.
>> 
>> Anyway - in my minimal image, like in the fat image there seems to be a 
>> surprising amount of bytestrings (4mb worth?). I think that might need some 
>> digging into? It seems like a lot somehow. Although Ben’s neat experiment of 
>> zipping strings shows that’s not a real route.
>> 
>> In a deployed minimal image - maybe I can get rid of some other things like 
>> MethodChangeRecords or MCMethodDefinitions (they are smaller wins, but 
>> noticeable).
>> 
>> Class               code space  # instances  inst space  percent  inst average size
>> ByteString                2640        37365     4823848    21.50             129.10
>> Array                     3742        53002     3961944    17.60              74.75
>> CompiledMethod           19159        30481     2912968    13.00              95.57
>> Association               1148        58348     1867136     8.30              32.00
>> MethodChangeRecord         431        34312     1097984     4.90              32.00
>> ByteArray                 4605          290      908728     4.00            3133.54
>> ByteSymbol                1698        22689      840168     3.70              37.03
>> IdentitySet                408        19076      610432     2.70              32.00
>> MethodDictionary          3310         3520      608688     2.70             172.92
>> WeakArray                 1758         3024      597824     2.70             197.69
>> MCMethodDefinition        4318         6659      426176     1.90              64.00
>> Protocol                  1679         8382      268224     1.20              32.00
>> OrderedCollection         6555         5509      220360     1.00              40.00
>> 
>> As an aside - my Gitlab project is public, the scripts that load things up 
>> are in ./scripts (build.sh, minimal.st and loadlocal.st)
>> 
>> Tim
>> 
>>> On 15 Aug 2017, at 08:02, Guillermo Polito <guillermopol...@gmail.com> wrote:
>>> 
>>> 
>>> 
>>> On Mon, Aug 14, 2017 at 4:42 PM, Tim Mackinnon <tim@testit.works> wrote:
>>> Hi Guille - just running SpaceTally on my dev image to get a feel for it. 
>>> It turns out that in the minimal images you’ve been creating, it’s not 
>>> loaded (makes sense).
>>> 
>>> Yup, it's loaded afterwards.
>>> 
>>> All packages are loaded through Metacello baselines. We should start 
>>> refactoring and making standalone projects, each one with a baseline of 
>>> its own and its own dependencies described.
>>> 
>>> I was checking on your gitlab and I have probably no access: how are you 
>>> finally loading packages in the bootstrap image? Can you share that with us 
>>> in text? I'd like to improve that situation.
>>>  
>>> I’m wondering if there is an easy way to import it in (I guess that package 
>>> should be in the Pharo git tree I cloned to get Fuel loaded right? Or is 
>>> there a separate standalone source?).
>>> 
>>> Yes it is, you can get the package programmatically by doing 
>>> 
>>> SpaceTally package name
>>> 
>>> And furthermore, get the baseline that currently loads it by doing
>>> 
>>> package := SpaceTally package name.
>>> BaselineOf subclasses select: [ :e | 
>>>     e project version packages anySatisfy: [ :p | p name = package ]].
>>>  
>>> 
>>> Thanks for all the support, and your email about why the contexts stack up 
>>> is very well received (I will comment over there).
>>> 
>>> By the way - it looks like Martin Fowler picked up on this announcement - 
>>> so maybe we might get some interest from his mass of followers.
>>> 
>>> Tim
>>> 
>>>> On 14 Aug 2017, at 10:49, Guillermo Polito <guillermopol...@gmail.com> wrote:
>>>> 
>>>> Hi Tim,
>>>> 
>>>> On Mon, Aug 14, 2017 at 11:41 AM, Tim Mackinnon <tim@testit.works> wrote:
>>>> Hey guys, thanks for your enthusiasm around this - and I cannot stress 
>>>> enough how this was only possible because of the work that has gone into 
>>>> making Pharo (in particular the 64bit image, as well as having a minimal 
>>>> image, and some great blog posts on serialising contexts) as well as the 
>>>> patience from everyone in answering questions and helping me get it all 
>>>> working.
>>>> 
>>>> I’m still quite keen to get my execution time back down under 800ms and 
>>>> I’d like to actually get back to writing a few skills to automate a few 
>>>> things around my house.
>>>> 
>>>> To Answer Denis’ question - 
>>>> 
>>>> My final footprint is 30.4mb - that’s composed of a 22mb image (with a 
>>>> simple example that pulls in Fuel, ZTimestamp and the S3 Library which 
>>>> depends on XMLParser) and then the VM (from which I removed obvious DLLs).
>>>> 
>>>> In my original experiments with a 6.0 minimal image - I did manage to get 
>>>> to a 13.4mb image (which started out as 12mb original size, and then 
>>>> loaded in STON and had only a simple clock example). I think the sweet 
>>>> spot is around 20mb total footprint as that seems to get me into the 
>>>> 450ms-900ms range.
>>>> 
>>>> The 7.0 min image now starts out at 15mb and then I’m not sure why loading 
>>>> Fuel, S3 and XMLParser takes 7mb (it seems big to me - but I’ve not dug 
>>>> into that).
>>>> 
>>>> You can do further space analysis using the following expression
>>>> 
>>>> SpaceTally  new printSpaceAnalysis
>>>> 
>>>> You can do that in an eval and check what's taking space. With measures we 
>>>> can iterate and improve :).
>>>>  
>>>> I’ve also found (and this on the back of unserialising the context in my 
>>>> example) that the way we build images has 15+ saved stack sessions that 
>>>> have saved on top of each other from the way we build up the images. I 
>>>> don’t yet know the implications of size/speed of these - but we need a 
>>>> better way of folding executions when we snapshot headless images. I’m 
>>>> also not clear if there are any other startup tasks that take precious 
>>>> time (this also has implications for our fat development images as they 
>>>> take much longer to appear than they really should).
>>>> 
>>>> I'm working on this as I'm writing this mail ;)
>>>> 
>>>> https://pharo.fogbugz.com/f/cases/20309
>>>> https://github.com/pharo-project/pharo/pull/196
>>>> 
>>>> I'll write down the implications further in a different thread.
>>>> 
>>>> 
>>>> I’ll be exploring some of these size/speed tradeoffs in follow-on 
>>>> messages.
>>>> 
>>>> But once again, a big thanks - I’ve not enjoyed programming like this for 
>>>> ages.
>>>> 
>>>> Tim
>>>> 
>>>>> On 12 Aug 2017, at 16:26, Ben Coman <b...@openinworld.com> wrote:
>>>>> 
>>>>> hi Tim,  
>>>>> 
>>>>> That is.....      AWESOME!
>>>>> 
>>>>> Very nice delivery - it flowed well with great narration. 
>>>>> 
>>>>> I loved @2:17 "this is the interesting piece, because PharoLambda has 
>>>>> serialized the execution context of its application and saved it into [my 
>>>>> S3 bucket] ... [then on the local machine] rematerializes a debugger [on 
>>>>> that context]."
>>>>> 
>>>>> There is a clarity in your video presentation that really may intrigue 
>>>>> outsiders. As a community we should push this on the usual hacker forums 
>>>>> - ycombinator could be a good starting point (but I'm locked out of my 
>>>>> account there).  
>>>>> An enticing title could be...
>>>>> "Debugging Lambdas by re-materializing saved execution contexts on your 
>>>>> local machine."
>>>>> 
>>>>> cheers -ben
>>>>> 
>>>>> On Fri, Aug 11, 2017 at 3:37 PM, Denis Kudriashov <dionisi...@gmail.com> wrote:
>>>>> This is cool Tim.
>>>>> 
>>>>> So what image size did you deploy in the end?
>>>>> 
>>>>> 2017-08-10 15:47 GMT+02:00 Tim Mackinnon <tim@testit.works>:
>>>>> I just wanted to thank everyone for their help in getting my pet project 
>>>>> further along, so that now I can announce that PharoLambda is now working 
>>>>> with the V7 minimal image and also supports post mortem debugging by 
>>>>> saving a zipped fuel context onto S3.
>>>>> 
>>>>> This latter item is particularly satisfying as at a recent serverless 
>>>>> conference (JeffConf) there was a panel where poor development tools on 
>>>>> serverless platforms was highlighted as a real problem.
>>>>> 
>>>>> In our community we’ve had these kinds of tools at our fingertips for 
>>>>> ages - but I don’t think the wider development community has really 
>>>>> noticed. Debugging something short lived like a Lambda execution is quite 
>>>>> startling, as the current answer is “add more logging”, and we all know 
>>>>> that sucks. To this end, I’ve created a little screencast showing this in 
>>>>> action - and it was pretty cool because it was a real example I 
>>>>> encountered when I got everything working and was trying my test 
>>>>> application out.
>>>>> 
>>>>> I’ve also put a bit of work into tuning the excellent GitLab CI tools, so 
>>>>> that I can cache many of the artefacts used between different build runs 
>>>>> (this might also be of interest to others using CI systems).
>>>>> 
>>>>> The Gitlab project is on: https://gitlab.com/macta/PharoLambda
>>>>> And the screencast: https://www.youtube.com/watch?v=bNNCT1hLA3E
>>>>> 
>>>>> Tim
>>>>> 
>>>>> 
>>>>>> On 15 Jul 2017, at 00:39, Tim Mackinnon <tim@testit.works> wrote:
>>>>>> 
>>>>>> Hi - I’ve been playing around with getting Pharo to run well on AWS 
>>>>>> Lambda. It’s early days, but I thought it might be interesting to share 
>>>>>> what I’ve learned so far.
>>>>>> 
>>>>>> Usage examples and code at https://gitlab.com/macta/PharoLambda
>>>>>> 
>>>>>> With help from many of the folks here, I’ve been able to get a simple 
>>>>>> example to run in 500ms-1200ms with a minimal Pharo 6 image. You can 
>>>>>> easily try it out yourself. This seems slightly better than what the 
>>>>>> GoLang folks have been able to do.
>>>>>> 
>>>>>> Tim
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>>    
>>>> Guille Polito
>>>> 
>>>> Research Engineer
>>>> French National Center for Scientific Research - http://www.cnrs.fr
>>>> 
>>>> 
>>>> Web: http://guillep.github.io
>>>> Phone: +33 06 52 70 66 13
