Don Stewart wrote:
manlio_perillo:
Hi.
After some work I have managed to implement two simple programs that
parse the Netflix Prize data set.
For details about the Netflix Prize, there was a post by Kenneth Hoste
some time ago.
I have cabalized the program and made it available here:
Manlio Perillo wrote:
Manlio Perillo wrote:
[...]
I have executed the program, using the same RTS flags as yours:
real    6m13.523s
user    0m53.931s
sys     0m7.812s
815 MB usage
This is a huge improvement!
Using UArray and empty + insert:
real 5m40.732s
user 0m5
Manlio Perillo wrote:
[...]
Now I have to check if using insert will further improve memory usage.
And ... surprise!
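Manlio's "UArray and empty + insert" variant isn't shown in the snippet; here is a minimal sketch of what such a representation could look like, an IntMap from movie ID to an unboxed array of ratings (the types, names, and layout are my assumptions, not his actual code):

import Data.List (foldl')
import Data.Word (Word8)
import Data.Array.Unboxed (UArray, listArray)
import qualified Data.IntMap as IM

-- One packed, unboxed array of ratings per movie, built with
-- empty + insert; ratings are 1..5, so Word8 is enough.
type Ratings = IM.IntMap (UArray Int Word8)

mkRatings :: [(Int, [Word8])] -> Ratings
mkRatings = foldl' step IM.empty
  where
    step m (movie, rs) =
      IM.insert movie (listArray (0, length rs - 1) rs) m

Packing each movie's ratings into a UArray avoids one boxed cons cell and one boxed Int per rating, which is where most of the list-based overhead goes.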
Kenneth Hoste wrote:
[...]
The 26m/700MB I mentioned on my blog was on my ancient PowerBook G4
(1.5GHz PowerPC G4, 1.25 GB RAM).
I redid the same experiment on our iMac (Core2 Duo, 2.0 GHz, 3.0 GB RAM), i.e.:
- read in all the data
- count the number of keys in the IntMap (which should be 17,770, i.e. the
  number of movies)
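A minimal sketch of that sanity check, assuming the parsed data lives in an IntMap keyed by movie ID (the function name is mine):

import qualified Data.IntMap as IM

-- The training set covers movie IDs 1..17770, so after parsing
-- the map should contain exactly one entry per movie.
hasAllMovies :: IM.IntMap a -> Bool
hasAllMovies m = IM.size m == 17770

Forcing the size also forces the spine of the map, so it doubles as a cheap way to make sure the whole structure was actually built.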
Hello Austin,
Monday, March 2, 2009, 11:51:52 PM, you wrote:
>> let's calculate. if at the GC moment your program has allocated 100 mb of
>> memory and only 50 mb was not garbage, then memory usage will be 150
>> mb
> ? A copying collector allocates a piece of memory (say 10mb) which is
> used as
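(In plain terms: a copying collector must copy the 50 MB of live data into fresh space before the old 100 MB heap can be released, so the peak footprint is roughly allocated + live = 100 MB + 50 MB = 150 MB.)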
Hi Kenneth,
I've thrown my current code online at
http://boegel.kejo.be/files/Netflix_read-and-parse_24-02-2009.hs,
let me know if it's helpful in any way...
Maybe you could set up a darcs repo for this, such that we can submit
patches against your code?
-- Andy
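For readers unfamiliar with the workflow Andy is suggesting, a sketch with hypothetical names (darcs get fetches a repository, darcs send mails patches back to it):

$ darcs get http://example.org/netflix-parser    # hypothetical URL
$ cd netflix-parser
$ darcs record -a -m "my patch"                  # record local changes
$ darcs send                                     # mail the patch upstream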
Hello Kenneth,
Monday, March 2, 2009, 11:14:27 PM, you wrote:
> I think my approach is turning out better because I'm:
> - building up the IntMap using 'empty' and 'insert', instead of
> combining 17,770 'singleton' IntMaps
> (which probably results in better GC behavior)
i don't read into de
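A minimal sketch of the two construction strategies Kenneth is contrasting (function names are mine, and the real code accumulates ratings per movie, which I elide here):

import Data.List (foldl')
import qualified Data.IntMap as IM

-- Strategy 1: a single left fold using empty + insert.
fromInserts :: [(Int, v)] -> IM.IntMap v
fromInserts = foldl' (\m (k, v) -> IM.insert k v m) IM.empty

-- Strategy 2: build 17,770 one-element maps and merge them all.
fromSingletons :: [(Int, v)] -> IM.IntMap v
fromSingletons = IM.unions . map (uncurry IM.singleton)

Both produce the same map; the singleton version allocates thousands of short-lived intermediate maps along the way, which is the extra GC pressure being discussed.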
Manlio Perillo wrote:
[...]
> moreover, you may set up a "growing factor". with a g.f. of
> 1.5, for example, memory will be collected once the heap becomes 1.5x
> larger than the real memory usage after the last GC. this effectively
> guarantees that memory overhead will never be over this factor
Thanks
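The "growing factor" Bulat describes corresponds to GHC's RTS flag -F, which sets the heap size as a factor of the live data after GC. A sketch of passing it (the program name is from this thread; the data directory argument is hypothetical):

$ ./process-data-1 +RTS -F1.5 -RTS training_set/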
Hello Manlio,
Monday, March 2, 2009, 8:16:10 PM, you wrote:
> By the way: I have written the first version of the program to parse
> the Netflix training data set in D.
> I also used ncpu * 1.5 threads to parse files concurrently.
> However execution was *really* slow, due to garbage collection.
Hello Manlio,
Monday, March 2, 2009, 8:16:10 PM, you wrote:
> 1) With default collection algorithm, I have:
> 2) With -c option:
> So, nothing changed.
you should look into the +RTS -s stats. those 409 vs 418 mb are just
somewhat random values, since the GCs in those 2 invocations are not
synchronized
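Concretely, the statistics Bulat points at come from the -s RTS flag, and the alternative collector mentioned above is enabled with -c (program name from the thread, argument hypothetical):

$ ./process-data-1 +RTS -s -RTS training_set/      # default copying GC, print GC stats
$ ./process-data-1 +RTS -s -c -RTS training_set/   # compacting GC, print GC stats

The -s summary reports maximum residency and bytes copied, which are far more meaningful than eyeballing the process size in top.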
Hello Manlio,
Monday, March 2, 2009, 6:30:51 PM, you wrote:
> The process-data-1 program parses the entire dataset using about 1.4 GB
> of memory (a 3x increase).
> This is strange.
> The memory required is proportional to the number of ratings.
> It may be that IntMap is the culprit, or the garbage collector
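As a rough back-of-the-envelope check (my numbers, not from the thread): the training set holds 100,480,507 ratings, so even packed at 4 bytes per rating the data is about 400 MB, and 1.4 GB is roughly 3x that minimum. Boxed structures such as list cells and IntMap nodes cost several machine words per element, which easily accounts for the gap.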
Hi.
After some work I have managed to implement two simple programs that
parse the Netflix Prize data set.
For details about the Netflix Prize, there was a post by Kenneth Hoste
some time ago.
I have cabalized the program and made it available here:
http://haskell.mperillo.ath.cx/netflix-0.0.
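Building a cabalized package of that era typically follows the standard Setup workflow; a sketch, assuming the package ships a Setup.hs:

$ runhaskell Setup.hs configure --user
$ runhaskell Setup.hs build
$ runhaskell Setup.hs install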