You’ve probably seen “AV is dead” headlines in several security blog posts. What new-age security product companies really mean is that relying only on static signatures is not quite as effective as it used to be.
So what the heck is a static signature, you might ask?
Well, it’s basically a simple “rule” that an analyst or an automated system creates to look for strings, binary opcodes, or anything else an AV engine can examine in a file’s un-executed (aka “static”) state. While it’s easy to say AV is dead, static signatures are still very effective, just more so over finite time intervals. AV should be one cog in the wheel among your plethora of layered defenses. Hence the gov’y term “Defense in Depth.”
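To make the idea concrete, here is a minimal sketch of what a static signature boils down to. It is not how any particular AV engine works; the class name, the rule name, the mutex string, and the byte pattern are all made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class StaticSignature:
    """Toy static rule: every literal string AND the byte pattern must be present."""
    name: str
    strings: list          # literal byte strings that must all appear in the file
    byte_pattern: re.Pattern  # opcode-style pattern, with wildcards written as regex

    def matches(self, data: bytes) -> bool:
        has_strings = all(s in data for s in self.strings)
        has_pattern = self.byte_pattern.search(data) is not None
        return has_strings and has_pattern


# Hypothetical rule: a marker string plus a short opcode sequence.
SIG = StaticSignature(
    name="Example.Dropper.A",
    strings=[b"evil-mutex-01"],
    # 55 8B EC = push ebp; mov ebp, esp, then any two bytes, then E8 (call rel32)
    byte_pattern=re.compile(rb"\x55\x8b\xec.{2}\xe8", re.DOTALL),
)

# Synthetic "file" contents, never executed, only scanned.
sample = b"MZ\x00\x00" + b"\x55\x8b\xec\x83\xec\xe8\x10\x00\x00\x00" + b"evil-mutex-01"
variant = sample.replace(b"evil-mutex-01", b"evil-mutex-02")

print(SIG.matches(sample))
print(SIG.matches(variant))
```

The rule fires on the first buffer and misses the second, which differs by a single character. That brittleness is the whole story behind the time-interval limitation.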
So why are static signatures limited, especially in time intervals?
I’m going to throw a machine learning term out there: “Concept Drift.” In machine learning, concept drift describes how concepts learned through feature training give less accurate results when the underlying population is non-stationary. In terms of malware, it’s an evolving population that may cause learned concepts (static signatures) to become less accurate over time.
OK, it seems pretty obvious that threats change over time, but what specifically do I mean by it?
There is a nice paper called “Tracking Concept Drift in Malware Families” from the University of Louisiana that I thought summarized what I see on a daily basis pretty well:
Slight changes in adding features and other code refactoring by the malware authors.
Changes in the development environments such as compilers and referenced libraries. This also may include compression, encryption, packing, and compartmentalization (e.g. Plugx’s side-loading).
This is a term used more frequently lately. It means that the malware is generated with automated obfuscations. Consider long-tail theory: there is a large population with a high degree of diversity, which makes it harder for static signatures to be effective (e.g. Angler EK, Upatre, Zbot).
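A quick way to see why packing and automated obfuscation hurt static signatures: even a trivial single-byte XOR "packer" changes every byte a literal-string rule depends on. The sketch below is a toy, with a made-up marker string and key, and real packers are far more involved.

```python
def xor_pack(payload: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every byte with a one-byte key (applying it twice unpacks)."""
    return bytes(b ^ key for b in payload)


MARKER = b"evil-mutex-01"            # hypothetical string a static rule looks for
original = b"\x90\x90" + MARKER + b"\x00"
packed = xor_pack(original, 0x5A)    # 0x5A is an arbitrary key chosen for the demo

print(MARKER in original)                  # True: the rule would fire
print(MARKER in packed)                    # False: same payload, rule misses it
print(xor_pack(packed, 0x5A) == original)  # True: nothing was lost, only hidden
```

Change the key per build and you get a new population member for every sample, which is exactly the long tail described above.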
Take Away: AV is still hella useful, it just needs helpful security product friends to fill the gaps.
So what do you need to use with AV?
Dynamic Behavior & Anomaly analysis.
The paper I mentioned earlier shows that using mnemonic n-grams as features is effective, but the approach may be limited by packing, encryption, and compartmentalization evasion techniques. Taking this a step further, I find that capturing dynamic behavior, such as by hooking Windows APIs, can provide more reliable features for machine learning or even rule generation. I tried an experiment: I collected API calls captured from Upatre and Zbot binaries on malwr.com and ran them through a speech recognition/sequence tagging machine learning algorithm (SVM-HMM). The results were at least 86% accurate, but it could use some more research. But hey, I’m not a data scientist… reversing malware is my day job.
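For anyone wondering what "n-grams as features" looks like in practice, here is a rough sketch that turns an ordered API call trace into n-gram counts. This is not the SVM-HMM pipeline from my experiment, just the feature-extraction idea; the function names and the trace itself are invented for illustration (the API names are real Windows calls, but the sequence is not from any actual sample).

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-length windows of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_features(seq, n=2):
    """Count how often each n-gram occurs; the counts act as a sparse feature vector."""
    return Counter(ngrams(seq, n))


# Hypothetical trace of hooked API calls, in the order they were observed.
trace = [
    "CreateFileW", "WriteFile", "CloseHandle",
    "CreateFileW", "WriteFile", "CloseHandle",
    "CreateProcessW",
]

features = ngram_features(trace, n=2)
for gram, count in features.most_common():
    print(count, " -> ".join(gram))
```

The same two functions work on disassembly mnemonics (mov, push, call, ...) if you feed them an instruction stream instead of an API trace. The difference is that the API trace is recorded after any unpacking has happened at runtime.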
FYI, if you want to see my paper, just look in the “ME” page.