Intro to Malware Analysis
Every once in a while, when you are working as a SOC analyst, you will come across content (a file or traffic) that seems suspicious, and you will have to decide whether that content is malicious or not. It is normal to feel confused by all the mixed signals that such content provides. This can be overwhelming for somebody who is just starting in Cybersecurity, and it is common to begin second-guessing oneself. Knowing what steps to take in such a scenario is helpful. This room will lay down some steps to help you reach an initial conclusion about a particular suspicious file.
Notably, in this room, you will learn:
What is malware?
How to start analyzing malware
Static and Dynamic malware analysis
Resources to help you analyze malware
The word malware is derived from the term MALicious softWARE. Therefore, any software that has a malicious purpose can be considered malware. Malware is further classified into different categories based on its behavior; however, we will not go into the details of those in this room. Instead, we will walk through the steps to take if we suspect that we have found malware on a machine. So, let's get started.
Malware Analysis is an important skill to have. As a quick overview, Malware Analysis is performed by the following people in the Security Industry:
Security Operations teams analyze malware to write detections for malicious activity in their networks.
Incident Response teams analyze malware to determine what damage has been done to an environment to remediate and revert that damage.
Threat Hunt teams analyze malware to identify IOCs, which they use to hunt for malware in a network.
Malware Researchers in security product vendor teams analyze malware to add detections for them in their security products.
Threat Research teams in OS Vendors like Microsoft and Google analyze malware to discover the vulnerabilities exploited and add more security features to the OS/applications.
Overall, many different people analyze malware for many compelling reasons. So let's see how to start!
Please note that malware is like a weapon because it can produce great harm if not handled with care. For this reason, always take the following precautions while analyzing malware:
Never analyze malware or suspected malware on a machine that is not dedicated solely to malware analysis.
When you are not analyzing malware samples, or when you are moving them to different locations, always keep them in password-protected zip/rar or other archives to avoid accidental detonation.
Only extract the malware from this password-protected archive inside the isolated environment, and only when analyzing it.
Create an isolated VM specifically for malware analysis, which has the capability of being reverted to a clean slate once you are done.
Ensure that all internet connections are closed or at least monitored.
Once you are done with malware analysis, revert the VM to its clean slate for the next malware analysis session to avoid residue from a previous malware execution corrupting the next one.
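As a minimal sketch of the snapshot-and-revert routine above, assuming the analysis VM runs in VirtualBox (an assumption; VMware and other hypervisors have their own equivalents), the following Python builds the VBoxManage commands for taking a clean snapshot and restoring it after a session. The VM and snapshot names are hypothetical placeholders.

```python
import subprocess
import sys

# Hypothetical names: replace them with your own analysis VM and snapshot.
VM_NAME = "malware-analysis-vm"
CLEAN_SNAPSHOT = "clean-slate"


def take_snapshot_cmd(vm: str = VM_NAME, snapshot: str = CLEAN_SNAPSHOT) -> list:
    """VBoxManage command that records the current (clean) state of the VM."""
    return ["VBoxManage", "snapshot", vm, "take", snapshot]


def poweroff_cmd(vm: str = VM_NAME) -> list:
    """VBoxManage command that powers the VM off; a running VM cannot be restored."""
    return ["VBoxManage", "controlvm", vm, "poweroff"]


def restore_snapshot_cmd(vm: str = VM_NAME, snapshot: str = CLEAN_SNAPSHOT) -> list:
    """VBoxManage command that reverts the VM to the named snapshot."""
    return ["VBoxManage", "snapshot", vm, "restore", snapshot]


def revert_to_clean_slate(vm: str = VM_NAME, snapshot: str = CLEAN_SNAPSHOT) -> None:
    # check=False: powering off fails harmlessly if the VM is already off.
    subprocess.run(poweroff_cmd(vm), check=False)
    # check=True raises CalledProcessError if the restore itself fails.
    subprocess.run(restore_snapshot_cmd(vm, snapshot), check=True)


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "--revert":
    revert_to_clean_slate()
```

Take the snapshot once, right after setting up the VM and before any sample is copied in, so that every restore returns to a state the malware has never touched.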
Answer the questions below
Which team uses malware analysis to look for IOCs and hunt for malware in a network?
Threat Hunt Team
Malware Analysis is like solving a puzzle. Different tools and techniques are used to find the pieces of this puzzle, and joining those pieces gives us the complete picture of what the malware is trying to do. Most of the time, you will have an executable file (also called a binary or a PE file; PE stands for Portable Executable), a malicious document file, or a Network Packet Capture (Pcap). The Portable Executable is the most prevalent type of file analyzed while performing Malware Analysis.
To find the different puzzle pieces, you will often use various tools, tricks, and shortcuts. These techniques can be grouped into the following two categories:
Static Analysis
Dynamic Analysis
When malware is analyzed without being executed, it is called Static Analysis. In this case, the different properties of the PE file are analyzed without running it. Similarly, in the case of a malicious document, exploring the document's properties without opening it is considered Static Analysis. Examples of static analysis include checking for strings in malware, checking the PE header for information related to different sections, or looking at the code using a disassembler. We will look at some of these techniques later in the room.
Malware often uses techniques to avoid static analysis, such as obfuscation, packing, or other means of hiding its properties. To circumvent these techniques, we often use dynamic analysis.
Static analysis might provide us with crucial information regarding malware, but sometimes that is not enough. We might need to run the malware in a controlled environment to observe what it does in these cases. Malware can often hide its properties to thwart Static Analysis. However, in most of those cases, Dynamic Analysis can prove fruitful. Dynamic analysis techniques include running the malware in a VM, either in a manual fashion with tools installed to monitor the malware's activity or in the form of sandboxes that perform this task automatically. We will learn about some of these techniques later in this room. Once we run the malware in a controlled environment, we can use our knowledge from the Windows Forensics rooms to identify what it did in our environment. The advantage here is that since we control the environment, we can configure it to avoid noise, like activity from a legitimate user or Windows Services. Thus, everything we observe in such an environment points to malware activity, making it easier to identify what the malware did in this scenario.
Malware, however, often uses techniques to prevent an analyst from performing dynamic analysis. Since most dynamic analysis is performed in a controlled environment, most methods to bypass it involve detecting the environment in which the malware is being run. In these cases, the malware takes a different, benign code path if it identifies that it is being run in a controlled environment.
Advanced malware analysis techniques are used to analyze malware that evades basic static and dynamic analysis. For performing advanced malware analysis, disassemblers and debuggers are used. Disassemblers convert the malware's code from binary to assembly so that an analyst can look at the instructions of the malware statically. Debuggers attach to a program and allow the analyst to monitor the instructions in malware while it is running. A debugger allows the analyst to stop and run the malware at different points to identify interesting pieces of information while also providing an overview of the memory and CPU of the system. We will not cover advanced malware analysis in this room. However, it will be covered in a future module targeting malware analysis.
Answer the questions below
Which technique is used for analyzing malware without executing it?
Static Analysis
Which technique is used for analyzing malware by executing it and observing its behavior in a controlled environment?
Dynamic Analysis
When analyzing a new piece of malware, the first step is usually performing basic static analysis. Basic static analysis can be considered sizing up the malware, trying to find its properties before diving deep into analysis. It provides us with an overview of what we are dealing with. Sometimes it might give us some critical information, for example, what API calls the malware is making or whether it's packed or not. However, other times, it might only give us information to help us size the malware up and give us an idea of the effort required to analyze it.
So without further ado, let's see some of the techniques we can use to perform basic static analysis.
Although static analysis is performed without running the malware, it is highly recommended that you perform malware analysis in an isolated Virtual Machine. You can create a clean snapshot of your Virtual Machine before performing any malware analysis and revert it to start from a clean state again after every analysis. Don't perform malware analysis on a live machine not purpose-built for malware analysis. For this room, we will be using the attached Remnux VM. Remnux (Reverse Engineering Malware Linux) is a Linux distribution purpose-built for malware analysis. It has many tools required for malware analysis already installed on it.
The machine will start in the split view. Alternatively, you can access the machine using the following credentials:
Username: ubuntu
Password: 123456
Though the file type of malware is often obvious from its file extension, malware authors sometimes try to trick users by using misleading file extensions. In such scenarios, it is helpful to know how to find the actual file type of a file without depending on its extension. In Linux, we can find the file type of a file using the file command. To understand what the file command does, we can read its man page or use the --help option:
man file
or file --help
We will find out that it is a simple command to use. We can use the following command to find the file type of a file:
file <filename>
There is a folder named Samples on the Desktop in the attached VM. We will be using the samples present in that folder for our analysis. The above terminal shows the file command being run on the 'wannacry' sample. The output shows a PE32 executable file with a Graphical User Interface, which was compiled for a system that runs Microsoft Windows with an Intel 80386-based processor. The Intel 80386 was Intel's first 32-bit x86 processor, and the 32-bit instruction set it introduced (commonly called x86, or IA-32) is still used by 32-bit code on Intel processors today. The "80386" in the output above therefore tells us that this application was designed for 32-bit Intel processors.
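To illustrate how a tool like file can recognize a PE file without trusting its extension, here is a minimal Python sketch that reads a few header fields directly. It only checks the MZ and PE signatures, the Machine field, and the optional-header magic; the real file command relies on a much larger magic database and reports more detail, such as the GUI subsystem.

```python
import struct
import sys

# IMAGE_FILE_HEADER.Machine values (small subset).
MACHINE_NAMES = {0x014C: "Intel 80386 (x86)", 0x8664: "x86-64 (AMD64)"}
# IMAGE_OPTIONAL_HEADER.Magic values.
OPTIONAL_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}


def identify_pe(data: bytes) -> str:
    """Describe a file from its leading bytes, ignoring its extension."""
    if data[:2] != b"MZ":
        return "not a PE file (no MZ signature)"
    if len(data) < 0x40:
        return "truncated DOS header"
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return "MZ file without a PE signature"
    if len(data) < e_lfanew + 26:
        return "truncated PE header"
    # The Machine field directly follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    # The optional header starts after the signature and the 20-byte COFF header.
    (magic,) = struct.unpack_from("<H", data, e_lfanew + 24)
    fmt = OPTIONAL_MAGIC.get(magic, f"unknown format (magic {magic:#x})")
    arch = MACHINE_NAMES.get(machine, f"machine {machine:#06x}")
    return f"{fmt} executable for {arch}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The first 4096 bytes usually contain the PE header of ordinary binaries.
    with open(sys.argv[1], "rb") as f:
        print(identify_pe(f.read(4096)))
```

For a sample like the one above, Machine 0x014C is what the file command reports as "Intel 80386".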
Another really important command that provides us with useful information about a file is the strings command. This command lists the strings present in a file. To understand what the strings command does, we can read its man page or use the --help option:
man strings
or strings --help
We will find that it is also a simple command to use. We can use the following command to find the strings in a file:
strings <filename>
Looking at strings in a file can often give clues related to the behavior of malware. For example, if we see URLDownloadToFile in the output of the strings command, we will know that this malware is doing something with the URLDownloadToFile Windows API. Most likely, it is downloading a file from the internet and saving it on the disk. Similarly, strings might also provide contextual information that helps us later during malware analysis.
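The core idea behind strings is simple enough to sketch in a few lines of Python: scan the raw bytes for runs of printable characters. This is an approximation, not a reimplementation. By default, strings reports ASCII runs of at least 4 characters; the wide-string function below roughly mimics what strings -e l does for the UTF-16LE text that is common in Windows binaries.

```python
import re
import sys


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """ASCII runs of printable characters (space through tilde), like plain `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def extract_wide_strings(data: bytes, min_len: int = 4) -> list:
    """UTF-16LE runs (printable byte followed by 0x00), roughly like `strings -e l`."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16le") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for s in extract_strings(data) + extract_wide_strings(data):
        print(s)
```

Searching for a specific API name then becomes a one-liner, e.g. [s for s in extract_strings(data) if "URLDownloadToFile" in s].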
Here we can see the strings command being run against the 'wannacry' sample. The output starts with the DOS Stub, which is the text that says !This program cannot be run in DOS mode. Some values don't make much sense and look like garbage, but you will also see useful output. For example, some strings look like Windows APIs, such as CloseHandle, GetExitCodeProcess, TerminateProcess, and so on. Similarly, we can see text that says inflate 1.1.3 Copyright 1995-1998 Mark Adler. A quick search shows that this is part of the zlib data compression library, which tells us that the sample might be using this library.
Tip: Sometimes, the output of the strings command is too big to be shown on the terminal completely. We can redirect it to a file and read it using vim or any other tool. The below terminal shows the output being redirected to a file named str:
Alternatively, you can pipe the output to the more or less command to page through it more comfortably:
We can use the space key to scroll down the list of strings here. If you are interested, this room contains more information about strings.
File Hashing provides us with a fixed-size value that, for practical purposes, uniquely identifies a file. A File Hash can therefore be considered a unique identifier for a file, similar to Social Security Numbers or National Identification Numbers used for the citizens of a country. Hashing is an important concept in malware analysis because a hash can be used as an identifier for specific malware. As we will see later in this task, this identifier can then be shared with other analysts or searched online for information-sharing purposes. Please note that a single bit of difference between two files will result in different hashes, so changing the hash of a file is as simple as changing one bit in it.
Commonly, md5sum, sha1sum, and sha256sum hashes are used for file hashing. We can calculate file hashes by using a simple command in Linux, as shown below for the md5sum hash:
md5sum <filename>
Above, we can see the md5sum hash is calculated for the file named 'wannacry.'
Similarly, the sha1sum and sha256sum commands can be used for calculating the sha1sum or sha256sum of a file. (Hashes are often referred to without the sum at the end, e.g. md5 instead of md5sum and so on.)
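The same digests can be computed in Python with the standard hashlib module, which is handy when hashing many samples in a script. This is a minimal sketch; for the same file, its digests should match the output of md5sum, sha1sum, and sha256sum.

```python
import hashlib
import sys

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_chunks(chunks) -> dict:
    """Feed an iterable of byte chunks to MD5, SHA-1 and SHA-256 at once."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def file_hashes(path: str, chunk_size: int = 65536) -> dict:
    """Hash a file in fixed-size chunks so large samples need not fit in memory."""
    with open(path, "rb") as f:
        return hash_chunks(iter(lambda: f.read(chunk_size), b""))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print each digest next to its algorithm name.
    for name, digest in file_hashes(sys.argv[1]).items():
        print(f"{name}: {digest}")
```

Hashing the bytes b"\x00" and b"\x01" with hash_chunks gives completely different digests, which illustrates the single-bit point made earlier in this task.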
If you are interested in learning more about hashes, you can check out this room.
Scanning a file using AVs or searching for its hash on VirusTotal can also provide useful information about how security researchers have classified the malware. However, when using an online scanner, it is recommended to search for the malware's hash instead of uploading the sample, to avoid leaking sensitive information online. Only upload a sample if you are sure of what you are doing.
Let's see what it says about the sample we calculated the hash for above. We can search for the md5sum we calculated for the wannacry sample on the VirusTotal homepage:
VirusTotal has a mix of handy features. It provides scan results from 60+ AV vendors, along with each vendor's classification of the sample.
The details tab lists the history of the sample, the first submission, the last submission, and the metadata of the sample.
We can also find comments about the sample by the community on VirusTotal, which can sometimes provide additional context about the sample.
It is clear from the above screenshots that we are looking at a sample of wannacry ransomware.
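The hash lookup can also be scripted. The sketch below assumes you have a VirusTotal API key and uses the public v3 REST endpoint for file reports (GET /api/v3/files/{hash}, authenticated with an x-apikey header). Only the hash is sent, never the sample, in line with the advice above. Check VirusTotal's API documentation for quotas and the exact response fields before relying on it.

```python
import json
import urllib.request

VT_FILE_REPORT = "https://www.virustotal.com/api/v3/files/{}"


def build_lookup_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Prepare a report lookup by MD5, SHA-1 or SHA-256; nothing is uploaded."""
    return urllib.request.Request(
        VT_FILE_REPORT.format(file_hash.strip().lower()),
        headers={"x-apikey": api_key},
        method="GET",
    )


def lookup(file_hash: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the lookup and return the parsed JSON report.

    urlopen raises urllib.error.HTTPError for error responses, for example
    when VirusTotal has no report for the hash.
    """
    with urllib.request.urlopen(build_lookup_request(file_hash, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

For example, lookup("ca2dc5a3f94c4f19334cc8b68f256259", key) would request the report for the 'redline' sample; in the v3 response, the per-engine verdict counts are expected under data.attributes.last_analysis_stats.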
Answer the questions below
In the attached VM, there is a sample named 'redline' in the Desktop/Samples directory. What is the md5sum of this sample?
ca2dc5a3f94c4f19334cc8b68f256259
What is the creation time of this sample?
Search the hash on Virustotal and check the Details tab
https://www.virustotal.com/gui/file/e8ba49a75de083cb786e8ed84972affa11542dd913f1a07b0d44e1d45e5e22e9/details
2020-08-01 02:44:18 UTC
The PE File Header contains the metadata about a Portable Executable file. This data can provide a lot of helpful information for our analysis. We will go into detail about the PE header and the information it contains in the upcoming Malware Analysis module for the advanced path. However, some of the vital information found in the PE header is explained below:
A PE file seldom contains all the code that it needs to run on a system on its own. Most of the time, it re-uses code provided by the Operating System. This is done to use less space and to leverage the framework the Operating System provides for performing tasks instead of re-inventing the wheel. Imports are the functions that a PE file imports from external libraries to perform different tasks.
For example, if a developer wants to query a Windows Registry value, they will import the RegQueryValue function provided by Microsoft instead of writing the code themselves. It is understood that this function will be present on any Windows machine on which the developer's code is going to run, so it does not need to be included in the PE file itself. Conversely, the functions a PE file exports are exposed to other binaries, which can use them instead of implementing them themselves. Exports are generally associated with Dynamically-Linked Libraries (DLL files), and it is not typical for a non-DLL PE file to have a lot of exports.
Since most PE files use the Windows API to perform the bulk of their jobs, a PE file's imports provide us with crucial information on what that PE file will do. It becomes evident that a PE file that is importing the InternetOpen function will communicate with the internet, a URLDownloadToFile function shows that a PE file will download something from the internet, and so on. Names of Windows APIs are generally intuitive and self-explanatory. However, we can always consult Microsoft Documentation to verify the purpose of a particular Windows function.
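Triage of imports often starts with a lookup table like the one sketched below. The table is a small, hypothetical sample of API names chosen for illustration, not a curated detection list. Because many Windows APIs have ANSI (...A) and wide (...W) variants, and some have an extended ...Ex form, the sketch strips those suffixes before looking a name up. Microsoft Documentation remains the authority on what each function actually does.

```python
# Hypothetical, deliberately small table of Windows APIs and the behavior
# they usually suggest. Real triage lists are far longer.
API_HINTS = {
    "InternetOpen": "communicates with the internet",
    "URLDownloadToFile": "downloads a file from the internet",
    "RegOpenKey": "opens registry keys",
    "RegQueryValue": "reads registry values",
    "CreateProcess": "starts other processes",
    "TerminateProcess": "terminates processes",
}


def base_api_name(name: str):
    """Return the API_HINTS key for an imported name, or None if unknown."""
    candidates = [name]
    # ANSI/wide variants: InternetOpenA, InternetOpenW -> InternetOpen
    if name[-1:] in ("A", "W"):
        candidates.append(name[:-1])
    # Extended variants: RegOpenKeyEx -> RegOpenKey
    candidates += [c[:-2] for c in candidates if c.endswith("Ex")]
    return next((c for c in candidates if c in API_HINTS), None)


def describe_imports(imports: list) -> dict:
    """Map each recognized import to the behavior it hints at."""
    findings = {}
    for name in imports:
        key = base_api_name(name)
        if key is not None:
            findings[name] = API_HINTS[key]
    return findings
```

Names that are not in the table, such as CloseHandle, are simply skipped, so an empty result means "nothing recognized", not "benign".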
Another useful piece of information available in the PE file header is the information about sections in the PE file. A PE file is divided into different sections which have different purposes. Although the sections in a PE file depend on the compiler or packer used to compile or pack the binary, the following are the most commonly seen sections in a PE file.
.text: This Section generally contains the CPU instructions executed when the PE file is run. This section is marked as executable.
.data: This Section contains the global variables and other global data used by the PE file.
.rsrc: This Section contains resources that are used by the PE file, for example, images, icons, etc.
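To show where this section information comes from, here is a minimal sketch that walks the section table of a PE file with Python's struct module, following the documented PE/COFF layout. It skips the many validity checks a real parser needs; in practice, a library such as pefile (which pecheck is built on) is the safer choice.

```python
import struct
import sys

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed as code
IMAGE_SCN_MEM_WRITE = 0x80000000    # section can be written to


def list_sections(data: bytes) -> list:
    """Return name, sizes and permissions for each entry in the section table."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    # IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    table = coff + 20 + opt_size  # the section table follows the optional header
    sections = []
    for i in range(num_sections):
        off = table + 40 * i  # each IMAGE_SECTION_HEADER is 40 bytes
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<8sIIII", data, off)
        (characteristics,) = struct.unpack_from("<I", data, off + 36)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_address": vaddr,
            "virtual_size": vsize,
            "raw_offset": raw_ptr,
            "raw_size": raw_size,
            "executable": bool(characteristics & IMAGE_SCN_MEM_EXECUTE),
            "writable": bool(characteristics & IMAGE_SCN_MEM_WRITE),
        })
    return sections


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for s in list_sections(f.read()):
            print(s)
```

A typical .text section is executable but not writable, while .data is writable but not executable; departures from that pattern are worth a closer look.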
We can use the pecheck utility present in the Remnux VM attached with the room to check the PE header.
Here we can see the information pecheck has extracted from the PE header of the wannacry sample. We see that the sample has 4 sections (.text, .rdata, .data, and .rsrc), along with their respective entropy. It also shows us the different hashes of the sample. Pecheck also shows us the functions that a PE file imports. In the above terminal window, we can see the IMAGE_IMPORT_DESCRIPTOR, which shows the functions the sample imports from the ADVAPI32.dll linked library. We will see similar descriptors for all the other linked libraries whose functions are imported by the sample.
We can see that pecheck shows us a lot more information than what we discussed in this task; however, discussing all of it is out of the scope of this room. We will dive into further details in the upcoming malware analysis module. For now, we will focus on what we need from this output, namely, the section information and the imports of our samples.
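The entropy values pecheck reports are Shannon entropy measured in bits per byte, ranging from 0 (a single repeated byte value) to 8 (all 256 byte values equally likely). A minimal Python sketch of the calculation follows. Applied to a section's raw bytes (data[raw_offset:raw_offset + raw_size], using the offsets in its section header), it should give values comparable to pecheck's, although tools can differ slightly in exactly which bytes they include.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: sum of p * log2(1/p) over byte values."""
    if not data:
        return 0.0
    total = len(data)
    # Counter(data) counts how often each byte value (0-255) occurs.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())
```

Values close to 8 are a common hint of compressed or encrypted content, which is one reason entropy is shown next to each section.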
ChatGPT
ADVAPI32.dll is a Microsoft Windows system file that provides access to a set of application programming interface (API) functions related to security and the system registry. These functions are often used by applications and system programs to perform specialized tasks and to access system resources and settings.
Some of the tasks that the functions in ADVAPI32.dll can perform include:
Managing user accounts and groups
Managing passwords and user authentication
Accessing system registry keys
Controlling access to files and folders
Managing system events
If you need to use one of these functions in your application, you may have to reference the ADVAPI32.dll file in your code. This is often done by declaring a "dynamic-link library" (DLL) in your application's source code.
Answer the questions below
In the attached VM, there is a sample named 'redline' in the directory Desktop/Samples. What is the entropy of the .text section of this sample?
Use pecheck to find this information
6.453919
The sample named 'redline' has five sections. .text, .rdata, .data and .rsrc are four of them. What is the name of the fifth section?
Use pecheck to find this information
.ndata
From which dll file does the sample named 'redline' import the RegOpenKeyExW function?
Check the image import descriptor in pecheck output. The name of the function is appended to the name of the dll it is imported from.
ADVAPI32.dll
Check out the GUI-based pe-tree tool and see what information it shows. You can use the following command to analyze the 'redline' malware with pe-tree. (The pe-tree tool might take some time to start.)
pe-tree redline
While basic static analysis provides us with useful information about a sample, most of the time we need to perform additional analysis to move further in our analysis procedure. One quick and dirty way to find more clues about a malware's behavior is by performing basic dynamic analysis. Many of the properties of a malware sample can be hidden when it's not running. However, when we perform dynamic analysis, we can lay these properties bare and learn more about the behavior of a malware sample.
Dynamic analysis requires running live malware samples that can be destructive. It is highly recommended that you perform malware analysis in an isolated Virtual Machine. You can create a clean snapshot of your Virtual Machine before performing any malware analysis and revert it to start from a clean state again after every analysis. Don't perform malware analysis on a live machine not purpose-built for malware analysis.
Sandbox is a term borrowed from the military. As the name suggests, a sandbox is a box of sand modeling the terrain where an operation will take place, which a military team uses to dry-run its scenarios and identify possible outcomes. In malware analysis, a sandbox is an isolated environment mimicking the actual target environment of a malware sample, where an analyst runs the sample to learn more about it. Malware analysis sandboxes rely heavily on Virtual Machines and their ability to take snapshots and revert to a clean state when required.
For malware analysis using sandboxes, the following considerations make the malware analysis effective:
Virtual Machine mimicking the actual target environment of the malware sample
Ability to take snapshots and revert to clean state
OS monitoring software, for example, Procmon, Process Explorer, Regshot, etc.
Network monitoring software, for example, Wireshark, tcpdump, etc.
Control over the network through a dummy DNS server and webserver.
A mechanism to move analysis logs and malware samples in and out of the Virtual Machine without compromising the host (Be careful with this one. If you have a shared directory with your malware analysis VM that remains accessible when running malware, you might risk malware affecting all files in your shared directory)
Though it is good to understand what makes a good sandbox, building one from scratch is not always necessary. One can always set up an open-source sandbox instead. These sandboxes provide the framework for performing basic dynamic analysis and can also be customized to a significant extent by those with a more adventurous mindset.
Cuckoo Sandbox is the most widely known sandbox in the malware analysis community. It was developed as part of a Google Summer of Code project in 2010. It is an open-source project that you will often see deployed in SOC environments and in enthusiasts' home labs. Advantages of Cuckoo Sandbox include huge community support, easy-to-understand documentation, and lots of customizations. You can deploy it on your network and, thanks to the vast corpus of community signatures that comes with it, let those signatures guide you in identifying which files are malicious and which are benign.
Cuckoo Sandbox has been archived, and an update is pending. It also doesn't support Python 3, which makes it obsolete for now. However, all is not lost because we have alternatives.
CAPE Sandbox is a more advanced derivative of Cuckoo Sandbox. It supports debugging and memory dumping to help unpack packed malware (we will learn more about packing and unpacking in the advanced malware analysis module). Though beginners can use this sandbox, making full use of it requires advanced knowledge. A community version of this sandbox is available online, which can be used to test-run it before installing. At the time of writing, CAPE Sandbox is actively developed and supports Python 3.
Setting up and maintaining a sandbox can be a time-consuming task. Keeping that in view, online sandboxes can be of great help. Some of the most commonly used online sandboxes are as follows:
Though online sandboxes provide a useful utility, it is best not to submit a sample online unless you are sure of what you are doing. A better approach is to search for the sample's hash on the service you are using to see if someone has already submitted it. Let's look at Hybrid Analysis to see what interesting analysis it provides for our sample.
On its homepage, we are greeted with the following screen:
As we mentioned, we will not be submitting a sample. Instead, we will search for the hash of our sample. Therefore, we will search for the md5sum of the wannacry sample from the attached VM. We will see that it is already submitted multiple times, and we can choose from the submitted results.
Let's open the one submitted on Windows 7 64 bit from among these.
We will see the above interface when we click on the sample. We can see a navigation pane on the right that highlights different parts of the report. We can also see that the verdict is malicious, with a threat score of 100/100 and AV detection of 95%. Below that, we see the overview of the sample's behavior, followed by the mapping to MITRE ATT&CK techniques. We will see the following mapping when we click view all details:
Below that, we will see some indicators and context information and some static analysis information for the sample. The dynamic analysis part comes below that:
This part provides us with a lot of information about the behavior of the sample when it was run in a sandbox. We can click each process to find more detail about it. In the above screenshot, the executions of cmd.exe are of particular interest. We can see that the sample runs script files and deletes backups and volume shadow copies, something often done by ransomware operators to stop the victim from restoring their files from these sources.
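Command lines of the kind highlighted above can be flagged automatically. The sketch below matches a few well-known ways of deleting shadow copies and backups or disabling Windows recovery. The pattern list is illustrative rather than exhaustive, and administrators occasionally run these commands legitimately, so a match is a lead to investigate, not a verdict.

```python
import re

# Illustrative patterns for commands that remove recovery options.
RECOVERY_TAMPERING = {
    "vssadmin shadow copy deletion": re.compile(r"vssadmin(\.exe)?\s+delete\s+shadows", re.I),
    "wmic shadow copy deletion": re.compile(r"wmic(\.exe)?\s+shadowcopy\s+delete", re.I),
    "wbadmin backup catalog deletion": re.compile(r"wbadmin(\.exe)?\s+delete\s+catalog", re.I),
    "bcdedit recovery disabled": re.compile(
        r"bcdedit(\.exe)?\s+/set\s+\S+\s+recoveryenabled\s+no", re.I
    ),
}


def flag_command_line(command_line: str) -> list:
    """Return the labels of all recovery-tampering patterns found in a command line."""
    return [label for label, pattern in RECOVERY_TAMPERING.items() if pattern.search(command_line)]
```

A single cmd.exe invocation often chains several of these commands with &, which is why every pattern is checked rather than stopping at the first match.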
Below this section, we will see network analysis of the sample:
Extracted strings and extracted files are also available in the report. These can provide information about the batch scripts we saw in the processes above.
And there are comments from the community at the very end. As we have seen, the techniques discussed so far can uncover many pieces of the puzzle that a malware sample is. However, in some cases, these techniques can prove insufficient to reach a decision. Let's move to the next task to see what scenarios can make it challenging to analyze malware.
Answer the questions below
Check the hash of the sample 'redline' on Hybrid analysis and check out the report generated on 14 March 2022. Check the Incident Response section of the report. How many domains were contacted by the sample?
Check the Risk Assessment tab
https://www.hybrid-analysis.com/sample/e8ba49a75de083cb786e8ed84972affa11542dd913f1a07b0d44e1d45e5e22e9/622f708708751066e8250d8e
8
In the report mentioned above, a text file is created by the sample. What is the name of that text file?
Text files have .txt extension
fj4ghga23_fsa.txt
While security researchers devise techniques and tools to analyze malware, malware authors work on rendering these tools and techniques ineffective. We found a great deal of information about the malware we analyzed in the previous tasks. However, malware authors have ways to make our life difficult. Below are some of the techniques they use to do so.
Malware authors often use packing and obfuscation to make an analyst's life difficult. A packer obfuscates, compresses, or encrypts the contents of malware. These techniques make it difficult to analyze malware statically. Specifically, packed malware will not show important information when we run a string search against it. For example, let's run a string search against the file named zmsuz3pinwl in the Samples folder in the attached VM.
We will notice that this sample contains mainly garbage strings that don't provide much value to us. Let's run pecheck on the sample to see what else we get.
As suspected, we see that the executable has characteristics typical of a packed executable, as per pecheck. We will notice that there is no .text section in the sample, and other sections have execute permissions, which shows that these sections contain executable instructions or will be populated with executable instructions during execution. We will also see that this sample does not have many imports that might show us its functionality, as we saw with the previous sample.
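The observations above can be turned into a rough checklist. The sketch below takes per-section details (name, executable flag, entropy, raw and virtual size) plus the number of imports and lists which packing indicators are present. The thresholds are hypothetical values chosen for illustration; packed files do not always trip them, and unpacked files sometimes do.

```python
# Hypothetical thresholds for illustration; tune them on your own samples.
HIGH_ENTROPY = 7.0   # bits per byte
FEW_IMPORTS = 10


def packing_indicators(sections: list, import_count: int) -> list:
    """Return human-readable reasons a PE file might be packed.

    Each section is a dict with the keys: name, executable, entropy,
    raw_size and virtual_size.
    """
    reasons = []
    if not any(s["name"] == ".text" for s in sections):
        reasons.append("no .text section")
    for s in sections:
        if s["executable"] and s["name"] != ".text":
            reasons.append(f"{s['name']} is executable")
        if s["entropy"] >= HIGH_ENTROPY:
            reasons.append(f"{s['name']} has high entropy ({s['entropy']:.2f})")
        if s["raw_size"] == 0 and s["virtual_size"] > 0:
            # Empty on disk but allocated in memory: likely filled in at run time.
            reasons.append(f"{s['name']} is empty on disk but reserved in memory")
    if import_count < FEW_IMPORTS:
        reasons.append(f"only {import_count} imports")
    return reasons
```

An empty list does not prove a file is unpacked; it only means none of these particular heuristics fired.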
For analysis of packed executables, the first step is generally to unpack the sample. This is an advanced topic that will be covered in the upcoming rooms.
As we have seen previously, we can always run a sample in a sandbox to analyze it. In many cases, that might help us analyze samples that evade our basic static analysis techniques. However, malware authors have some tricks up their sleeves that hamper that effort. Some of these techniques are as follows:
Long sleep calls: Malware authors know that sandboxes run for a limited time. Therefore, they program the malware not to perform any activity for a long time after execution. This is often accomplished through long sleep calls. The purpose of this technique is to time out the sandbox.
User activity detection: Some malware samples will wait for user activity before performing malicious activity. The premise of this technique is that there will be no user in a sandbox, and therefore no mouse movement or typing on the keyboard. Advanced malware also detects patterns in mouse movements that are often used in automated sandboxes. This technique is designed to evade analysis in automated sandboxes.
Footprinting user activity: Some malware checks for user files or activity, like if there are any files in the MS Office history or internet browsing history. If no or little activity is found, the malware will consider the machine as a sandbox and quit.
Detecting VMs: Sandboxes run on virtual machines, and virtual machines leave artifacts that malware can identify. For example, some drivers installed in VMs running on VMware or VirtualBox give away the fact that the machine is a VM. Malware authors often associate VMs with sandboxes and program the malware to terminate if a VM is detected.
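As another example of the VM artifacts described above, a virtual network adapter usually gets a MAC address from a prefix (OUI) registered to the hypervisor vendor. The sketch below shows how trivially such a check can be written, which is also why analysts often change the MAC address of their analysis VM. The prefix table lists a few well-known VMware and VirtualBox OUIs and is not exhaustive.

```python
from typing import Optional

# A few MAC address prefixes (OUIs) registered to hypervisor vendors.
VM_MAC_PREFIXES = {
    "00:05:69": "VMware",
    "00:0C:29": "VMware",
    "00:1C:14": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
}


def vm_vendor_from_mac(mac: str) -> Optional[str]:
    """Return the hypervisor vendor whose OUI matches the MAC address, if any."""
    # Accept both "aa:bb:cc:..." and "AA-BB-CC-..." notations.
    normalized = mac.strip().upper().replace("-", ":")
    return VM_MAC_PREFIXES.get(normalized[:8])
```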
The above list is not exhaustive but gives us an idea of what to expect when analyzing malware. In a future module dedicated to malware analysis, we will discuss these techniques and ways to detect malware that employs them.
Answer the questions below
Which of the techniques discussed above is used to bypass static analysis?
string search is static analysis.
Packing
Which technique discussed above is used to time out a sandbox?
Long sleep calls
That was a primer on malware analysis. However, this was just scratching the surface. So far, we learned:
Static and Dynamic analysis of malware
Finding strings, calculating hashes, and running AV scans on malware
Introduction to the PE header and how to use information from it in malware analysis
Sandboxing and different online sandboxes that we can use
How malware evades the techniques we just discussed
We will be working on a malware analysis module that will cover ways to counter the anti-analysis techniques just discussed.
Let us know what you found interesting in this room on our Discord channel or Twitter account.
[[TheHive Project]]
Malware faces a dilemma. It has to execute to fulfill its purpose, and no matter how much obfuscation is added to the code, it becomes an easy target for detection once it runs.
For the following steps, we will use the attached VM. Start the machine by clicking on the Start Machine button in the top-right corner of this task.
Sometimes it also provides information about the behavior of a sample and its relations as seen in different environments online.