On Fri, Aug 6, 2010 at 9:53 AM, Alexander Sack <a...@linaro.org> wrote:
> Hi,
>
> On Fri, Aug 6, 2010 at 3:28 AM, Christian Robottom Reis <k...@linaro.org>
> wrote:
>>
>> Hi there!
>>
>>    I unpacked our minimal release image and ran an xdiskusage on it,
>> mostly to see what we're shipping -- and I was surprised to see that a
>> fourth of the image is actually apt package caches and lists.  Can we
>> put into the image generation script something to strip them out before
>> generating the image?
>
> If there really are .debs shipped in the tarball then this is definitely
> waste and a bug.
>
> However, if it's just the lists and pkg cache then I am not so convinced,
> unless we say we remove apt (and dpkg) from our images (i.e. don't allow
> easy install/upgrade etc.).
>
> Those files would come back when running apt-get update etc., so the only
> thing we would win is a smaller initial download, while I think we are
> really after general/lasting disk footprint savings.

We could remove these files, but I agree it may be a false
optimisation: once they are stripped, the size of the release
filesystem is no longer representative of the steady-state size of the
filesystem when it's in use.
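
To put a number on how much of an unpacked image the apt data takes up,
something like the following could be run against the extracted tree.
This is only a minimal sketch: it assumes the standard Debian/Ubuntu
locations (var/cache/apt and var/lib/apt/lists), counts apparent file
sizes only, and the function names are made up for illustration.

```python
import os

# Assumed locations of apt's package cache and index lists, relative to the
# root of the unpacked image (the standard Debian/Ubuntu paths).
APT_DIRS = ("var/cache/apt", "var/lib/apt/lists")


def tree_size(path):
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; count regular files only.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def apt_share(image_root):
    """Return (apt_bytes, total_bytes) for an unpacked image tree."""
    total = tree_size(image_root)
    apt = sum(tree_size(os.path.join(image_root, d)) for d in APT_DIRS)
    return apt, total
```

Running the same measurement again after an "apt-get update" on a live
system would show how much of any saving survives in steady state.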

Out of interest, does anyone know why dpkg/apt never migrated from the
"massive sequential text file" approach to something more
database-oriented?  I've often thought that the current system's
scalability has been under pressure for a long time, and that there is
potential for substantial improvements in footprint and performance -
though the Debian and Ubuntu communities would need to give their
support for such an approach, unless we wanted to switch to a
different packaging system.

> One thing we could do is remove universe from our default apt line. This
> would probably reduce the size of that directory by > 50% ...
>
> Long term we could have our own archive with fewer packages ... this could
> reduce the size of those indexes etc. even further.
>
>>
>> The untarring also suggests a number of places where we could further
>> trim the image, some of which are probably pretty hard to do:
>>
>>   * stripping /usr/share/doc out (but everybody knew that)
>
> ack. We plan to do that using pitti's dpkg improvements; last time I
> checked they hadn't landed in the archive yet, but I will check the
> status again soon.

It's interesting to note that because /usr/share/doc contains mostly
nearly-empty directories and tiny files, the filesystem overhead may
be a significant part of the overall consumption here - I estimate
about 20-30% of the overall space, assuming a typical filesystem with
4KB blocksize.
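
An estimate like this can be checked with a rough model. The sketch
below is an approximation under stated assumptions, not a description
of any particular filesystem: every regular file is rounded up to whole
blocks, every directory is charged one block, and inode tables,
symlinks and tail packing are ignored. The names are hypothetical.

```python
import os

BLOCK_SIZE = 4096  # assumed filesystem block size, as in the estimate above


def rounded(size, block=BLOCK_SIZE):
    """Round a byte count up to a whole number of blocks."""
    return -(-size // block) * block


def doc_overhead(root, block=BLOCK_SIZE):
    """Return (data_bytes, allocated_bytes) for the tree under root.

    Simplified model: files occupy whole blocks, each directory costs
    one block, and everything else (inodes, symlinks) is ignored.
    """
    data = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        allocated += block  # one block charged per directory
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            size = os.path.getsize(full)
            data += size
            allocated += rounded(size, block)
    return data, allocated
```

The overhead fraction is then (allocated - data) / allocated for
whatever tree is passed in, e.g. the image's usr/share/doc.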

If we have to keep /usr/share/doc/ (for copyright notices and so on),
maybe it would be feasible to replace each /usr/share/doc/<package>/
with a tarball?  This would eliminate most of the overhead as well as
making the actual data smaller.  Since /usr/share/doc/ is not accessed
often, and not accessed by many automated tools, this might not cause
much disruption.
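
To get a feel for what that would save, the per-package tarballs could
be generated into a scratch directory and their sizes compared with the
originals. A minimal sketch, assuming gzip compression and a
<package>.tar.gz naming scheme (both my own choices for illustration);
it is non-destructive and leaves /usr/share/doc untouched:

```python
import os
import tarfile


def pack_doc_dirs(doc_root, out_dir):
    """Write one <package>.tar.gz per subdirectory of doc_root into out_dir.

    Non-destructive: the original directories are left in place.
    Returns the list of archive paths created.
    """
    os.makedirs(out_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(doc_root)):
        src = os.path.join(doc_root, name)
        if os.path.islink(src) or not os.path.isdir(src):
            continue  # skip symlinked doc directories and stray files
        dest = os.path.join(out_dir, name + ".tar.gz")
        with tarfile.open(dest, "w:gz") as tar:
            # Store paths relative to the package name, not the full path.
            tar.add(src, arcname=name)
        created.append(dest)
    return created
```

Symlinks are skipped because some packages' doc directories are just
links to another package's directory, and those cost almost nothing
already.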

[...]

>>
>>   * stripping out modules for devices that won't ever be on
>>     this ARM device
>
> yeah, this seems to make sense. However, I am not sure where to draw the
> line. Maybe this is something the kernel WG can take a look at and come up
> with a reduced list of modules?

Classifying drivers by bus, and throwing out anything that can't be
physically connected (such as PCI/AGP/ISA), might be an approach here.
Also, peripherals which can only be connected to on-SoC buses, but are
not present in a given platform's silicon, could be excluded.  We would
still have to keep a lot though... anything which can be connected via
USB, for example.
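
One possible starting point for the by-bus classification is the
modules.alias file that depmod generates under /lib/modules/<version>/,
where most device aliases carry a bus prefix such as "pci:" or "usb:".
The sketch below is illustrative only: the excluded-bus set is a
hypothetical example, the function names are invented, and modules with
no alias at all never show up in the file, so they are always kept.

```python
from collections import defaultdict

# Hypothetical exclusion list: buses assumed to be absent on the target board.
EXCLUDED_BUSES = {"pci", "pcmcia"}


def buses_by_module(alias_text):
    """Map module name -> set of bus prefixes found in its aliases."""
    buses = defaultdict(set)
    for line in alias_text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "alias":
            continue  # skip comments, blanks and anything unexpected
        pattern, module = parts[1], parts[2]
        # Aliases without a "bus:" prefix are lumped together as "other".
        bus = pattern.split(":", 1)[0] if ":" in pattern else "other"
        buses[module].add(bus)
    return dict(buses)


def removable_modules(alias_text, excluded=EXCLUDED_BUSES):
    """Return modules whose every alias is on an excluded bus."""
    return sorted(
        module
        for module, buses in buses_by_module(alias_text).items()
        if buses <= set(excluded)
    )
```

A module that also has, say, a "platform:" or "usb:" alias is kept, which
errs on the side of shipping too much rather than too little.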

A more ambitious solution might be to allow for dynamic installation
of missing modules, but that's probably a separate project since it
would impact on the way the kernel is packaged.

Currently we have no choice but to install absolutely everything "just
in case" (much like the way /dev used to contain 1000s of device
nodes that were never used).

Cheers
---Dave

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev
