On 05/24/2010 05:28 AM, Nathan Gibbs wrote:
> * Török Edwin wrote:
>> On 05/23/2010 02:46 AM, Nathan Gibbs wrote:
>>> I've wondered about a similar idea for speeding up file scanning,
>>> especially with regard to daily system scans.
>>>
>>> After an initial scan, record a file's name, checksum, and time scanned.
>>> On rescan,
>>> If a file's checksum hasn't changed from time scanned {
>>>    Scan it with the DB sigs that have changed since then
>>
>> There are some exceptions here: some databases (daily.ftm, daily.ign(2),
>> daily.fp, daily.wdb, daily.idb) affect other signatures by their nature.
>> So if any of these change, it should be considered that the entire DB
>> changed.
>
> I figured that there would be some things in the internal workings that I
> didn't know about.
>
>> Also signature removals need to be considered: a previously
>> infected file can become clean.
>>
>
> My mistake there. What I was thinking was: after an initial scan, if the file
> is clean, make a record.
> Enumerate good, not enumerate everything.
> Infected stuff would never get recorded, & would get hit with everything on
> the next scan. If it became clean, then it would get recorded. If something
> previously marked as clean became infected after the next "partial when
> possible" scan, its record would be dropped.
> I hope that clarifies what I meant, but didn't say very well.
> :-)
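A minimal sketch of the "cache clean" rescan decision described in the thread: only clean files are recorded, a recorded file whose checksum is unchanged is rescanned with just the signatures added since it was recorded, and any change to daily.ftm, daily.ign/daily.ign2, daily.fp, daily.wdb or daily.idb forces a full scan. It assumes a toy model in which a signature DB is a dict mapping DB file names to sets of byte-string signatures and "scanning" is plain substring matching. All names here (`rescan`, `signatures_to_use`, `toy_scan`, `WHOLE_DB_FILES`) and the (checksum, DB version) record layout are hypothetical illustrations, not ClamAV APIs.

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

# From the thread: changes to these DBs affect other signatures, so any change
# to one of them forces a full rescan ("daily.ign(2)" = daily.ign and daily.ign2).
WHOLE_DB_FILES = {"daily.ftm", "daily.ign", "daily.ign2",
                  "daily.fp", "daily.wdb", "daily.idb"}

# Hypothetical toy model: {database file name: set of signatures}.
SigDB = Dict[str, Set[bytes]]

# Hypothetical clean-file record: name -> (MD5 hex, DB version it was clean with).
Records = Dict[str, Tuple[str, int]]


def checksum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def signatures_to_use(old_db: SigDB, new_db: SigDB) -> Optional[Set[bytes]]:
    """Signatures added since old_db, or None if a full scan is required."""
    for name in WHOLE_DB_FILES:
        if old_db.get(name, set()) != new_db.get(name, set()):
            return None  # a signature-modifying DB changed: treat whole DB as changed
    added: Set[bytes] = set()
    for name, sigs in new_db.items():
        added |= sigs - old_db.get(name, set())
    # Removed signatures are ignored on purpose: only clean files are recorded,
    # and removing a signature cannot turn a clean file into an infected one.
    return added


def toy_scan(data: bytes, sigs: Set[bytes]) -> bool:
    """Toy stand-in for the engine: 'infected' if any signature is a substring."""
    return any(sig in data for sig in sigs)


def rescan(name: str, data: bytes, records: Records,
           old_db: SigDB, old_version: int,
           new_db: SigDB, new_version: int) -> Tuple[bool, str]:
    """Scan one file; return (infected, mode) where mode is "partial" or "full"."""
    # Match patterns come only from ordinary DBs; the WHOLE_DB_FILES entries
    # modify other signatures rather than being patterns themselves.
    all_sigs = set().union(*(s for n, s in new_db.items()
                             if n not in WHOLE_DB_FILES))
    sigs, mode = all_sigs, "full"
    rec = records.get(name)
    # Partial scan only if the file is unchanged AND was found clean with old_db.
    if rec is not None and rec == (checksum(data), old_version):
        partial = signatures_to_use(old_db, new_db)
        if partial is not None:
            sigs, mode = partial, "partial"
    infected = toy_scan(data, sigs)
    if infected:
        records.pop(name, None)  # infected files are never recorded
    else:
        records[name] = (checksum(data), new_version)  # "enumerate good"
    return infected, mode
```

For example, with `old_db = {"daily.ndb": {b"EVIL1"}}` and `new_db = {"daily.ndb": {b"EVIL1", b"EVIL2"}}`, a previously recorded clean file is rescanned with `{b"EVIL2"}` only. The toy matcher deliberately ignores how a real engine would build its matcher for the partial set.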

You are right, signature removals are not a problem in the "cache clean"
scenario.

>
>> This would also mean more memory usage since we would need 2 AC tries:
>> one for the full DB, and one for the partial DB since last reload.
>> The partial DB would probably not use that much memory though.
>>
>
> True.
> Possible solutions:
> 1.
> Would it be possible to point the engine at a start sig in the full DB?
> If we have a clean checksum, start scanning at sig X instead of from the
> beginning.
> There is a lot I don't understand about how ClamAV works internally, so this
> may simply not work. But if it did, that would solve the memory issue.
> 2.
> Store an array of pointers to sigs needed to do a partial scan.
> Hand the engine the list on partial scans.
> That might be the more elegant solution.

It might be possible to do this for the MD5 signatures, but not for the NDB
signatures. An AC (Aho-Corasick) trie needs to be built out of the full set
of signatures you intend to scan with. The trie is actually more like an
automaton, so you need a new one each time you add or remove a signature.
I'm sure there is a way to do it; it is just not that simple :)

Actually, for full system scans there might be a way to do it that doesn't
involve many (or in fact any) changes to the engine: store the CVD used, and
a DB of clean file hashes/sizes. When you want a rescan, you give the old CVD
and the new CVD to a script, which compares the 2 databases and builds a
partial DB in a temporary directory (if a partial scan is possible; see the
corner cases above). Then you can start a scan with just that temporary DB.

In 0.96 we started with something simple that works (cache clean results,
erase the cache on reload). We can always improve that later. Even the simple
solution has some corner cases though (for example, recursion depth needs to
be taken into account), so care needs to be taken.

>
>> I think this might speed up full-system scheduled scans (compared to the
>> caching we already do, see below).
>>
>>> else
>>>    Scan it like it does now
>>>    ( with everything in the DB, I assume. )
>>> }
>>
>> A simpler form of this is already implemented in 0.96 :)
>>
>> If a file is determined to be clean, its MD5 is added to an in-memory cache.
>> When scanning a new file, its MD5 is computed and looked up in the
>> cache. If found, it is considered clean.
>
> Cool, they read my mind a few months back.
> That's exactly what my idea looked like initially.
> My current idea just stores that in a file for later reference, after
> reload.
> :-)
>
> Is that what happens when you run ClamAV on your brain and turn Stats
> reporting on?
> While scanning, it mines any ideas about ClamAV improvement and sends them to
> the ClamAV Team. LOL.

Best regards,
--Edwin
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml