On Fri, Sep 04, 2015 at 10:20:01AM +0200, Adam Wolk wrote:
| Hi misc@
|
| I upgraded my mail server to an amd64 snapshot from Sep 2nd and found
| the server stuck delivering mail in the morning with spamassassin
| churning at 90% CPU usage.
|
| A quick investigation led me to a huge bayes_toks file of 65.3G in
| /var/spampd/.spamassasin/.
|
| $ ls -alh
| total 4738352
| drwx------  2 _spampd  _spampd   512B Sep  4 10:00 .
| drwxr-xr-x  3 _spampd  _spampd   512B Sep  3 15:57 ..
| -rw-------  1 _spampd  _spampd    36B Sep  4 09:53 bayes.lock
| -rw-------  1 _spampd  _spampd   9.8M Sep  3 22:52 bayes_seen
| -rw-------  1 _spampd  _spampd  65.3G Sep  3 22:55 bayes_toks
|
| $ file bayes_toks
| bayes_toks: Berkeley DB 1.85 (Hash, version 2, native byte-order)
|
|
| Interestingly, I don't see that much space used with df (does anyone
| know why?):

You should read up on sparse files. Here's a quick trick from the sparse
files book of tricks:

# First we create a file 'bigfile' using dd:
[weerd@despair] $ dd if=/dev/zero of=bigfile bs=1048576 count=10 seek=1024
10+0 records in
10+0 records out
10485760 bytes transferred in 0.178 secs (58799094 bytes/sec)

# ls will tell us how big this file is:
[weerd@despair] $ ls -lh bigfile
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:51 bigfile

# du will tell us how much space is in use by this file:
[weerd@despair] $ du -sh bigfile
10.1M    bigfile

# cp is even better at the sparse files game:
[weerd@despair] $ cp bigfile bigfile2

# bigfile2 is the same as bigfile:
[weerd@despair] $ ls -lh bigfile2
-rw-r--r--  1 weerd  weerd   1.0G Sep  4 19:54 bigfile2

# No, really .. exactly the same:
[weerd@despair] $ md5 bigfile*
MD5 (bigfile) = 5ec6988d232a445bc40b9dca003b95f7
MD5 (bigfile2) = 5ec6988d232a445bc40b9dca003b95f7

# However, it uses a lot less disk space:
[weerd@despair] $ du -sh bigfile2
48.0K    bigfile2

TL;DR: files with lots of emptiness (consecutive ranges of all-zero data)
can be stored efficiently as "sparse files": the empty ranges are not
allocated on disk, so ls shows the full file size while du and df only
count the blocks actually in use.

| $ df -h
| Filesystem     Size    Used   Avail Capacity  Mounted on
| /dev/sd0a     1008M   90.1M    868M       9%  /
| /dev/sd0k      9.8G   80.3M    9.3G       1%  /home
| /dev/sd0d      3.9G    118K    3.7G       0%  /tmp
| /dev/sd0f      3.9G    1.0G    2.7G      28%  /usr
| /dev/sd0g     1001M    212M    738M      22%  /usr/X11R6
| /dev/sd0h      9.8G    572M    8.8G       6%  /usr/local
| /dev/sd0j      3.9G    2.0K    3.7G       0%  /usr/obj
| /dev/sd0i      2.0G    2.0K    1.9G       0%  /usr/src
| /dev/sd0e      598G    4.3G    564G       1%  /var
|
| I removed the file and disk usage dropped by 2.3G on /var.
|
|
| Has anyone experienced issues with spamassassin/spampd similar to the
| one reported above?
|
| p5-Mail-SpamAssassin-3.4.1p2 (installed)
| spampd-2.30p3 (installed)
|
| After deleting the file and restarting the service, processing a single
| email brought the DB to a reported size of 37.9M; a few emails later it's
| already reported as 113M. I have a hunch that it will bloat again really
| fast.
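
The dd/ls/du comparison above can be condensed into a small script that
prints the apparent size and the allocated size side by side. This is a
minimal sketch, assuming a POSIX sh and a filesystem that supports holes
(FFS on OpenBSD, ext4 or tmpfs on Linux); the temporary file name
"sparse" is made up for illustration:

```shell
#!/bin/sh
# Sketch: create a sparse file, then compare its apparent size (what ls
# reports) with the space actually allocated for it (what du reports).
set -e

tmpdir=$(mktemp -d)
f="$tmpdir/sparse"   # hypothetical file name, for illustration only

# Skip 1024 MiB, then write one 1 MiB block of zeros. The skipped range
# is never written, so the filesystem leaves it as a hole.
dd if=/dev/zero of="$f" bs=1048576 count=1 seek=1024 2>/dev/null

# Field 5 of `ls -ln` is the file size in bytes (the apparent size).
apparent=$(ls -ln "$f" | awk '{print $5}')
# First field of `du -k` is the allocated space in KiB.
allocated_kb=$(du -k "$f" | awk '{print $1}')

echo "apparent size:  $apparent bytes"
echo "allocated size: $allocated_kb KiB"

rm -rf "$tmpdir"
```

The apparent size comes out as 1025 MiB, while du only counts the single
megabyte that was actually written (plus a little filesystem overhead).
The listing in the original report shows the same gap: ls prints
bayes_toks as 65.3G, while the "total 4738352" line counts allocated
blocks. Assuming ls's default 512-byte block unit, that total is about
2.3G, which lines up with what was freed on /var.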
|
| Regards,
| Adam

-- 
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/