On Fri, Sep 04, 2015 at 10:20:01AM +0200, Adam Wolk wrote:
| Hi misc@
| 
| I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
| the server stuck delivering mail in the morning with SpamAssassin
| churning at 90% CPU usage.
| 
| A quick investigation led me to a huge bayes_toks file of 65.3G in
| /var/spampd/.spamassasin/.
| 
| $ ls -alh
| total 4738352
| drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
| drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
| -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
| -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
| -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
| 
| $ file bayes_toks
| bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)
| 
| 
| Interestingly, I don't see that much space used with df (does anyone
| know why?):

You should read up on sparse files.  Here's a quick trick from the
sparse files book of tricks:

# First we create a file 'bigfile' using dd:
[weerd@despair] $ dd if=/dev/zero of=bigfile bs=1048576 count=10 seek=1024
10+0 records in
10+0 records out
10485760 bytes transferred in 0.178 secs (58799094 bytes/sec)

# ls will tell us how big this file is:
[weerd@despair] $ ls -lh bigfile
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:51 bigfile

# du will tell us how much space is in use by this file:
[weerd@despair] $ du -sh bigfile
10.1M   bigfile

# cp is even better at the sparse files game:
[weerd@despair] $ cp bigfile bigfile2

# bigfile2 is the same as bigfile:
[weerd@despair] $ ls -lh bigfile2
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:54 bigfile2

# No, really .. exactly the same:
[weerd@despair] $ md5 bigfile*
MD5 (bigfile) = 5ec6988d232a445bc40b9dca003b95f7
MD5 (bigfile2) = 5ec6988d232a445bc40b9dca003b95f7

# However, it uses a lot less disk space:
[weerd@despair] $ du -sh bigfile2
48.0K   bigfile2


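TL;DR: files with lots of emptiness (consecutive ranges of all-zero data)
are stored efficiently as "sparse files".  ls reports the logical size of
the file, while du and df only count the blocks that are actually
allocated, which is why the numbers disagree.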
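
If you want to check a suspect file without eyeballing ls and du side by
side, here is a minimal sketch of the same comparison as a script.  It
assumes a POSIX sh with dd, du, wc, awk and mktemp available;
sparse_report is a hypothetical helper name made up for this example, and
the 65M test file is arbitrary:

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print the apparent size and
# the allocated size of a file, both in KiB.
sparse_report() {
    f=$1
    # Apparent (logical) size in bytes; tr strips the padding some wc
    # implementations add in front of the number.
    apparent_bytes=$(wc -c < "$f" | tr -d '[:space:]')
    apparent_kb=$(( (apparent_bytes + 1023) / 1024 ))
    # Space actually allocated on disk, in KiB.
    allocated_kb=$(du -k "$f" | awk '{print $1}')
    echo "$f: apparent ${apparent_kb}K, allocated ${allocated_kb}K"
}

# Scratch directory, removed again when the script exits.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Skip 64 blocks of 1M, then write a single 1M block of zeroes.
dd if=/dev/zero of="$tmp/bigfile" bs=1048576 count=1 seek=64 2>/dev/null

sparse_report "$tmp/bigfile"
```

On a filesystem that supports holes, the allocated figure should come out
far below the apparent one.  Pointing sparse_report at a file like
bayes_toks should show the same kind of gap.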

| $ df -h
| Filesystem     Size    Used   Avail Capacity  Mounted on
| /dev/sd0a     1008M   90.1M    868M     9%    /
| /dev/sd0k      9.8G   80.3M    9.3G     1%    /home
| /dev/sd0d      3.9G    118K    3.7G     0%    /tmp
| /dev/sd0f      3.9G    1.0G    2.7G    28%    /usr
| /dev/sd0g     1001M    212M    738M    22%    /usr/X11R6
| /dev/sd0h      9.8G    572M    8.8G     6%    /usr/local
| /dev/sd0j      3.9G    2.0K    3.7G     0%    /usr/obj
| /dev/sd0i      2.0G    2.0K    1.9G     0%    /usr/src
| /dev/sd0e      598G    4.3G    564G     1%    /var
| 
| I removed the file and disk usage dropped by 2.3G on /var.
| 
| 
| Has anyone experienced issues with SpamAssassin/spampd similar to the
| one reported above?
| 
| p5-Mail-SpamAssassin-3.4.1p2 (installed)
| spampd-2.30p3 (installed)
| 
| After deleting the file and restarting the service, processing a single
| email brought the DB to a reported size of 37.9M. A few emails later it
| was already reported as 113M. I have a hunch that it will bloat again
| really fast.
| 
| Regards,
| Adam
| 

-- 
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/                 
