Hi, I also maintain some mailing list archives with wilma (and run Debian). I'm wondering if you can tell me about your configuration. We seem to have hit a size limit or something similar: we can't keep a glimpse index that doesn't get corrupted, even after reindexing our archives completely from scratch.
With what arguments do you call glimpseindex? What does your crontab line look like? What versions of the wilma software are you running? Have you made any changes to it?
On our setup, the raw ASCII archives are ~275MB, and the HTML archives are ~590MB. We are running pretty much the stock wilma configuration of glimpse 3.6, perl 5.0, and mhonarc 2.1.0. Sendmail pushes the mail. The box is running a pretty rough installation of Red Hat 5.2, kernel 2.0.36. (Not my machine.) It has 0.5GB of memory and a P2 CPU.
I have a feeling there might be some piece of text that's causing striphtml to hang. I've had to 'kill -9' that subprocess at some point during every full reindex I've tried in the last few months.
Anyway, I've been thrashing, so I figured I'd check with someone else.
Thanks,
morgan