I was recently browsing on Hybrid Analysis and decided to take poke at this sample, what made me interested in it was the call of “certutil.exe”. I was browsing the various subroutines when I find a fairly long subroutine which had some obvious indicators to deobfuscating strings, the repeated use of a ‘key’ was one clue I saw. I call this a ‘banker’ as I used that as a search term from Hyrbid Analysis, I have yet to confirm this is used to target a user logging into a bank. We could use dynamic analysis to figure out every single string but this would be laborious in logging every return compared to copying the string table and working from there. The sample also seems to have various anti-analysis techniques implemented within it, although not in this case, it can sometimes be best to compartmentalise and be realistic about what is the best option in getting information on the sample.
We push four arguements to our calls, these are the length of the input string we wish to deobfuscate, the input string offset, the length of the key and the offset to our key. I’m using key loosely because the individual characters are used as a key, key pool would be more an appropriate term. It is fairly small function which sometimes isn’t the case with malware, it is an obvious for loop as we are looping and producing a comparison in the early stages to check whether the length of our inputString has been reached or not. We set a char pointer for one of our characters in the key pool, dependent on EDX and iterate through these by the length of the key string. It is modulated in every iteration, which is fairly common in cryptographic algorithms, so that it wraps back around to potentially have any length of input. We also get a single character from our input string in preperation for the main operation, every character from our input string is treated seperately and has its own iteration.
We then use a famous XOR operation to manipulate the value that we return. Quite common to see this used in malware samples and is fairly trivial to rewrite code to deobfuscate such strings, but we have an interesting choice from the developers in which most of the strings also include complete junk which we do not want to return when deobfuscating our string, this may be used to defeat automatic methods of malware detection. By XOR’ing the values with the initial string, you will get exceptions to the code. The string has every number from 1-9 but we need to also include 0 when we try to reproduce our string deobfuscation as well due to modulation, this is something static analysis may not of clocked. If we print our strings without exception handling we get results like this.
The actual subroutine where this all gets called from is rather wasteful, instead of implementing some sort of loop towards an array of these strings we are instead presented with a long branch of code which repeatedly calls the same function with a seperate input string. This leads to IDA having a very scary looking function, but in fact, is very simplistic, and could be shortened considerably.
I have decided to use Python as I am very comfortable with it at the moment and the script I produce can be implemented into an IDA script. Reproducing the values that IDA had provided, appending the 0, and ensuring when we hit an exception to return the string, the deobfuscation worked. I confirmed that the deobfuscation worked with strings that had hit an exception by live debugging the sample and replacing the origin of the EIP. Here I was able to give a comparison of the compared results in which I saw the junk characters were ignored. The finished result was produced by copying all the string table lines into a text file, removing parts of it that were not relevant and then feeding this by reading it from the Python script. The end result was sucessful.