[clamav-users] ClamScan does how much of this (heuristical analysis/sandboxes)?

Swudu Susuwu via clamav-users Wed, 20 Mar 2024 09:52:23 -0700

To better secure us, could future versions of ClamScan add artificial neural 
networks (artificial CNS)?


GitHub has lots of FLOSS (free/libre open source software) simulators of CNS 
(at https://github.com/topics/artificial-neural-network , such as 
https://github.com/Rober-t/apxr_run/ ), which virus scanners could use to do 
this:

Just have training data inputs = samples of infected files/programs, and 
outputs = original files/programs (or null if no fresh programs to revert to), 
to produce artificial CNS to undo infections from files/programs.
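
A minimal sketch of how such training pairs could be prepared, assuming 
fixed-length inputs and an all-zero vector to stand for "null" (the names 
VECTOR_LEN, to_vector and make_pair are made up for illustration, and no 
particular neural network library is implied):

```python
from typing import List, Optional, Tuple

VECTOR_LEN = 4096  # assumed fixed input size; real files would need chunking


def to_vector(data: Optional[bytes], length: int = VECTOR_LEN) -> List[float]:
    """Scale bytes to [0, 1], truncating or zero-padding to a fixed length."""
    raw = (data or b"")[:length]
    return [b / 255.0 for b in raw] + [0.0] * (length - len(raw))


def make_pair(infected: bytes,
              original: Optional[bytes]) -> Tuple[List[float], List[float]]:
    """Input = infected sample; target = clean original, or zeros for null."""
    return to_vector(infected), to_vector(original)
```

Each pair would then go to whatever network you train; the hard part is 
still getting the sample databases.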

I assume most antivirus programs have heuristical analysis and sandboxes, but 
if not, here is how to do this:
Search for open source (FLOSS) virus scanners 
(https://github.com/topics/virus-scanner has lots) and look at how those scan 
executables to figure out what programs do to your OS.

Most look for opcodes that enter the OS (such as “int” or “syscall”), or look 
at what libraries the programs load and search for instructions such as “jmp” 
or “call” that go to system libraries, to flag programs that alter other 
programs and programs that alter page flags to have W+X (lots of malware makes 
pages both writable and executable, so virus scanners block such programs).
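
A minimal sketch of the opcode search, assuming x86/x86-64 code and plain 
byte matching (the table and function name are made up for illustration). 
Raw byte matches give false positives, since the same bytes can sit inside 
data or inside longer instructions; a real scanner would disassemble first.

```python
# x86/x86-64 encodings of instructions that enter the kernel directly.
KERNEL_ENTRY_OPCODES = {
    "syscall": b"\x0f\x05",
    "sysenter": b"\x0f\x34",
    "int 0x80": b"\xcd\x80",
}


def find_kernel_entries(code: bytes) -> dict:
    """Return {mnemonic: [offsets]} for every raw byte match in `code`."""
    hits = {}
    for name, pattern in KERNEL_ENTRY_OPCODES.items():
        offsets = []
        pos = code.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = code.find(pattern, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits
```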

To figure out what libraries a program loads, refer to the specifications of 
the OS’s executable format -- “Portable Executable” for Windows ( 
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format 
https://wikipedia.org/wiki/Portable_Executable ), “Executable and Linkable 
Format” for most others such as UNIX and Linuxes ( 
https://wikipedia.org/wiki/Executable_and_Linkable_Format ) -- which would 
allow you to know what libraries a program loads at startup, plus which of 
those libraries’ functions the program imports.
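
A minimal sketch of reading that list from an ELF file, assuming a 
little-endian 64-bit image held in memory (the function name 
needed_libraries is made up for illustration; PE import tables would need 
their own parser). It walks the program headers to the dynamic section and 
collects the DT_NEEDED entries, which name the libraries loaded at startup.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5


def needed_libraries(elf: bytes) -> list:
    """List the DT_NEEDED library names of a little-endian ELF64 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)

    loads, dynamic = [], None
    for i in range(e_phnum):
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # no dynamic section, e.g. a statically linked program

    def to_offset(vaddr):
        # DT_STRTAB holds a virtual address; map it back to a file offset.
        for seg_vaddr, seg_offset, seg_size in loads:
            if seg_vaddr <= vaddr < seg_vaddr + seg_size:
                return seg_offset + (vaddr - seg_vaddr)
        raise ValueError("address not inside any PT_LOAD segment")

    name_offsets, strtab_vaddr = [], None
    for pos in range(dynamic[0], dynamic[0] + dynamic[1], 16):
        tag, val = struct.unpack_from("<qQ", elf, pos)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            name_offsets.append(val)
        elif tag == DT_STRTAB:
            strtab_vaddr = val
    if strtab_vaddr is None:
        return []

    strtab = to_offset(strtab_vaddr)
    names = []
    for off in name_offsets:
        end = elf.index(b"\0", strtab + off)
        names.append(elf[strtab + off:end].decode("ascii", "replace"))
    return names
```

For a file on disk you would pass in open(path, "rb").read().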
 
Virus scanners should also look at dynamic loads of functions ( 
https://www.codeproject.com/Questions/338807/How-to-get-list-of-all-imported-functions-invoked 
) such as from GetProcAddress, or just flag functions such as GetProcAddress.
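
A minimal sketch of the "just flag it" option, assuming a simple search for 
identifier-like strings in the file (the watchlist and the function name 
flag_dynamic_loaders are illustrative, not from any scanner). Packed or 
obfuscated programs hide these names, so this is a first filter only.

```python
import re

# Illustrative watchlist of functions that resolve other functions at run time.
SENSITIVE_IMPORTS = {
    b"GetProcAddress", b"LoadLibraryA", b"LoadLibraryW", b"dlopen", b"dlsym",
}


def flag_dynamic_loaders(image: bytes) -> list:
    """Return the watchlisted names that appear as whole identifiers in `image`."""
    names = set(re.findall(rb"[A-Za-z_][A-Za-z0-9_]{3,}", image))
    return sorted(n.decode("ascii") for n in names & SENSITIVE_IMPORTS)
```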

For virus scanners to have better heuristical analysis, they should flag 
programs that perform raw network accesses (versus OS functions to 
download/upload files), or that alter files of which the program is not the 
owner.

Some of this requires that you not just look at what functions the program 
calls, but also look at (if just constant parameters) or guess (if 
registers/addresses as parameters; antiviruses should use sandboxes or just 
flag all non-constant parameters to sensitive functions) what parameters the 
program passes to those functions.

Example outputs from Fdroid through sandbox/analysis:
https://www.virustotal.com/gui/file/dc3bb88f6419ee7dde7d1547a41569aa03282fe00e0dc43ce035efd7c9d27d75
https://www.virustotal.com/ui/file_behaviours/dc3bb88f6419ee7dde7d1547a41569aa03282fe00e0dc43ce035efd7c9d27d75_VirusTotal%20R2DBox/html
https://www.virustotal.com/ui/file_behaviours/dc3bb88f6419ee7dde7d1547a41569aa03282fe00e0dc43ce035efd7c9d27d75_Zenbox/html

The false positive outputs (from Zenbox) show the purpose of manual reviews for 
programs that your sandbox flags.

For the scanners with heuristical analysis and sandboxes,
the next logical move is to add artificial CNS.

Not all scanners have such analysis and sandboxes,
but it is clear that some, such as VirusTotal, do, as the URLs show us.

Update: this adds how to use chroot/strace for sandboxes, and LLVM static 
analysis.
The earlier URLs show VirusTotal's heuristical analysis of Fdroid's package 
manager and its behaviour in two sandboxes.

A UNIX-like OS such as Linux has chroot (run `man chroot` for instructions), 
which confines the programs you test to one directory tree so that they cannot 
alter files outside of the test (chroot limits only the filesystem view, so 
treat it as a first layer),
and has the strace tool (run `man strace` for instructions, or look at 
https://opensource.com/article/19/10/strace
https://www.geeksforgeeks.org/strace-command-in-linux-with-examples/ ), which 
hooks all system calls and saves logs.
Simple sandboxes just launch programs for a few seconds and dump such logs,
with additional heuristics to guess which calls should go to logs (so reviewers 
have to look through less).
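
A minimal sketch of such a simple sandbox, assuming strace is installed and 
the program already sits inside whatever confinement you use (the function 
names trace and summarize are made up for illustration). It runs the program 
under `strace -f -o` for a few seconds, then counts the system calls in the 
log so a reviewer sees a summary first.

```python
import re
import subprocess
from collections import Counter

# With "-f -o", each log line starts with a PID, then "name(args) = result".
SYSCALL_LINE = re.compile(r"^(?:\d+\s+)?(\w+)\(")


def trace(argv, log_path, seconds=5):
    """Run argv under strace for a few seconds, logging syscalls to log_path."""
    try:
        subprocess.run(["strace", "-f", "-o", log_path, *argv], timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills strace itself on timeout; a real sandbox would
        # also have to clean up the traced process.
        pass


def summarize(log_text):
    """Count how often each system call name appears in an strace log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SYSCALL_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage would be trace(["./sample"], "sample.log") followed by 
summarize(open("sample.log").read()), where ./sample is a placeholder path.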

Autonomous sandboxes use the ideas from the first post to flag programs that 
do system calls that would alter resources that are not part of the program 
under test.
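
A minimal sketch of one such rule, assuming an strace-style log and one 
directory that belongs to the program under test (the names outside_writes, 
OPEN_CALL and WRITE_FLAGS are made up for illustration). It flags open/openat 
calls that ask for write access to paths outside that directory. Relative 
paths are resolved against the reviewer's current directory here, which a 
real tool would have to handle properly.

```python
import os
import re

# Matches open("path", FLAGS...) and openat(dirfd, "path", FLAGS...).
OPEN_CALL = re.compile(r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]*)",\s*([A-Z_|]+)')
WRITE_FLAGS = {"O_WRONLY", "O_RDWR", "O_CREAT", "O_TRUNC", "O_APPEND"}


def outside_writes(log_text, allowed_dir):
    """Return paths opened for writing that lie outside allowed_dir."""
    allowed = os.path.abspath(allowed_dir) + os.sep
    flagged = []
    for line in log_text.splitlines():
        match = OPEN_CALL.search(line)
        if not match:
            continue
        path, flags = match.group(1), set(match.group(2).split("|"))
        if flags & WRITE_FLAGS and not os.path.abspath(path).startswith(allowed):
            flagged.append(path)
    return flagged
```

Anything this returns would go to manual review, the same as the other flags.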

Heuristical analysis is similar to Clang/LLVM's static analysis tools (static 
analysis checks programs for accidental security threats such as buffer 
over-runs/under-runs), and you could use the FLOSS static analysis tools as a 
first basis for virus scanners: just add checks for deliberate security 
threats, flag those for manual reviews, and warn not to run such programs 
before the reviews.
https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer is part 
of LLVM and is a FLOSS basis for analysis that has uses for virus scanners.
If you don't want Clang's analyzer, 
https://github.com/secure-software-engineering/phasar is a separate analysis 
framework (it is itself built on LLVM).

As for the artificial neurons/CNS,
those are as simple to use for this as the original post says.
If you want to, this would not require too much effort,
but who has access to large sample databases for the artificial CNS?
Examples of how to set up APXR as artificial CNS: 
https://github.com/Rober-t/apxr_run/blob/master/src/examples/
Examples of how to set up HSOM as artificial CNS: 
https://github.com/CarsonScott/HSOM/tree/master/examples
Simple to set up once you have access to databases.
Just as (if humans grew trillions of neurons plus thousands of layers of 
cortices) one of us could pore through all databases of infections (plus 
samples of fresh programs) to set up our synapses to revert (from hex dumps) 
all infections to fresh programs,
so too could artificial CNS with trillions of artificial neurons do this.

From howtos ( 
https://silvercross.quora.com/Howto-produce-better-virus-scanners-through-heuristical-analysis-sandboxes-plus-artificial-CNS-Search-for-open-source
https://open.substack.com/pub/swudususuwu/p/howto-produce-better-virus-scanners 
)

_______________________________________________

Manage your clamav-users mailing list subscription / unsubscribe:
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation

https://docs.clamav.net/#mailing-lists-and-chat
