The idea behind file encrypting in Python
In this article, we will create a simple script python that uses XOR to encrypt a file.
Xor is one of the most simple encryption methods malware developers use to obfuscate their code thanks to its significant advantages:
simplicity (no need for an easy-to-find external encryption library)
minimal length
Given that, it can be hard for an analyst to find, especially in a fast static analysis like string research, import table analysis or shellcode analysis.
Prerequisites
For this mini project, the only thing you need is a working installation of python3
What is XOR
The XOR operator is a logical operation that takes two operands and returns a true or false value based on whether the operands are different or the same.
It has the property that XORing two times an operand with the same one, we will have the starting one, it makes possible to use this operator for symmetric encryption.
The code
We want to pass a file and XOR it char by char with a number passed in input:
def xor_content(content, key:int):
return [x^key for x in content]
if __name__ == "__main__":
if len(sys.argv) < 4:
print("USAGE:")
print("python3 xor_file.py <KEY> <INPUT_FILENAME> <OUTPUT_FILENAME>")
exit(0)
with open(sys.argv[2], 'rb') as i_file:
in_content = i_file.read()
output = xor_content(in_content, int(sys.argv[1]))
with open(sys.argv[3], 'wb') as o_file:
o_file.write(bytearray(output))
This code takes in a key and two file names as input. It then reads the first file, xors every byte with the key, and writes the result to the second file.
We can launch it in this way:
python3 main.py 65 input.txt output.txt
Basically, launching that it will xor char by char of the file “input.txt” with 65 and then write the output onto “output.txt“.
Improvements
Encrypting a file in Python seems really easy, but we can do something more.
The XOR represents a single-byte encoding, and due to the fact that a null byte xor with the key, returns the key, we can see a potential weakness. In fact, usually, there are a lot of null bytes and it can see that the encoded file shows a bunch of bytes equal to the key.
The same problem occurs with a byte equal to the key.
The malware author mitigates that using a NULL-preserving single-byte XOR encoding scheme.
It simply means to skip NULL bytes and the ones equal to the key.
So let’s try to make it, by changing the xor_content method:
def xor_content(content, key:int):
return [x^key for x in content if x!=key and x != 0]
Bruteforcing XOR encoding
Our work seems to be ok, but it works just if we have found the key in assembly.
Now let’s try to code a tool that brute forces the encrypted file in order to find a specific set of bytes.
First of all, we need to write a method that checks if a sub is a subarray of arr.
def is_subarray(sub, arr):
if len(sub) > len(arr):
return False
for i in range(len(arr)-len(sub)+1):
if arr[i:i+len(sub)] == sub:
return True
return False
This code block opens a file and reads its contents into a list.
It then uses a for loop to iterate through every possible key value from 1-128.
For each key, it xor’s the list of file contents with the key and converts the result into a string.
If the string contains the substring (in hex) passed as the third argument, it prints the key and exits the program;
otherwise, it prints “Key not found” and exits the program.
Put all together
Now let's see the complete code for both scripts
xor_file.pyimport sys
def xor_content(content, key:int):
return [x^key for x in content if x!=key and x != 0]
if __name__ == "__main__":
if len(sys.argv) < 4:
print("USAGE:")
print("python3 xor_file.py <KEY> <INPUT_FILENAME> <OUTPUT_FILENAME>")
exit(0)
with open(sys.argv[2], 'rb') as i_file:
in_content = i_file.read()
output = xor_content(in_content, int(sys.argv[1]))
with open(sys.argv[3], 'wb') as o_file:
o_file.write(bytearray(output))
bruteforce.pyimport sys
def xor_content(content, key:int):
return [x^key for x in content if x!=key and x != 0]
def is_subarray(sub, arr):
if len(sub) > len(arr):
return False
for i in range(len(arr)-len(sub)+1):
if arr[i:i+len(sub)] == sub:
return True
return False
if __name__ == "__main__":
if len(sys.argv) < 3:
print("USAGE:")
print("python3 bruteforce.py <INPUT_FILENAME> <KEY_TO_SEARCH>")
exit(0)
with open(sys.argv[1], 'rb') as i_file:
in_content = [x for x in i_file.read()]
for k in range(1,128):
output = xor_content(in_content, k)
string = [x for x in bytearray.fromhex(sys.argv[2])]
if is_subarray(string, output):
print(f"The key is {k}")
sys.exit(0)
print("Key not found")
Conclusion
At this point, we have all that we need, and we can test it with these simple commands:
python3 xor_file.py 88 in.txt out.txt
python3 bruteforce.py out.txt 4d5a
If everything is right the output should be:
The key is 88
Note: “4d5a” passed as input in our example is the hexadecimal ASCII value of the initial two bytes of a DOS MZ EXECUTABLE.
This is a technique to hinder static analysis, if you want to know some simple ideas to do the same with dynamic analysis, take a look at those posts: