How to easily encrypt files in Python

How to easily encrypt files in Python

·

4 min read

The idea behind file encrypting in Python

In this article, we will create a simple script python that uses XOR to encrypt a file.
Xor is one of the most simple encryption methods malware developers use to obfuscate their code thanks to its significant advantages:

  • simplicity (no need for an easy-to-find external encryption library)

  • minimal length

Given that, it can be hard for an analyst to find, especially in a fast static analysis like string research, import table analysis or shellcode analysis.

Prerequisites

For this mini project, the only thing you need is a working installation of python3

What is XOR

The XOR operator is a logical operation that takes two operands and returns a true or false value based on whether the operands are different or the same.

It has the property that XORing two times an operand with the same one, we will have the starting one, it makes possible to use this operator for symmetric encryption.

The code

We want to pass a file and XOR it char by char with a number passed in input:

def xor_content(content, key:int):
    return [x^key for x in content]

if __name__ == "__main__":

    if len(sys.argv) < 4:
        print("USAGE:")
        print("python3 xor_file.py <KEY> <INPUT_FILENAME> <OUTPUT_FILENAME>")
        exit(0)

    with open(sys.argv[2], 'rb') as i_file:
        in_content =  i_file.read()
        output = xor_content(in_content, int(sys.argv[1])) 
        with open(sys.argv[3], 'wb') as o_file: 
            o_file.write(bytearray(output))

This code takes in a key and two file names as input. It then reads the first file, xors every byte with the key, and writes the result to the second file.

We can launch it in this way:

python3 main.py 65 input.txt output.txt

Basically, launching that it will xor char by char of the file “input.txt” with 65 and then write the output onto “output.txt“.

Improvements

Encrypting a file in Python seems really easy, but we can do something more.

The XOR represents a single-byte encoding, and due to the fact that a null byte xor with the key, returns the key, we can see a potential weakness. In fact, usually, there are a lot of null bytes and it can see that the encoded file shows a bunch of bytes equal to the key.
The same problem occurs with a byte equal to the key.

The malware author mitigates that using a NULL-preserving single-byte XOR encoding scheme.
It simply means to skip NULL bytes and the ones equal to the key.
So let’s try to make it, by changing the xor_content method:

def xor_content(content, key:int):
    return [x^key for x in content if x!=key and x != 0]

Bruteforcing XOR encoding

Our work seems to be ok, but it works just if we have found the key in assembly.
Now let’s try to code a tool that brute forces the encrypted file in order to find a specific set of bytes.

First of all, we need to write a method that checks if a sub is a subarray of arr.

def is_subarray(sub, arr):
    if len(sub) > len(arr):
        return False
    for i in range(len(arr)-len(sub)+1):
        if arr[i:i+len(sub)] == sub:
            return True
    return False

This code block opens a file and reads its contents into a list.
It then uses a for loop to iterate through every possible key value from 1-128.
For each key, it xor’s the list of file contents with the key and converts the result into a string.
If the string contains the substring (in hex) passed as the third argument, it prints the key and exits the program;
otherwise, it prints “Key not found” and exits the program.

Put all together

Now let's see the complete code for both scripts

xor_file.pyimport sys


def xor_content(content, key:int):
    return [x^key for x in content if x!=key and x != 0]

if __name__ == "__main__":

    if len(sys.argv) < 4:
        print("USAGE:")
        print("python3 xor_file.py <KEY> <INPUT_FILENAME> <OUTPUT_FILENAME>")
        exit(0)

    with open(sys.argv[2], 'rb') as i_file:
        in_content =  i_file.read()
        output = xor_content(in_content, int(sys.argv[1])) 
        with open(sys.argv[3], 'wb') as o_file: 
            o_file.write(bytearray(output))
bruteforce.pyimport sys


def xor_content(content, key:int):
    return [x^key for x in content if x!=key and x != 0]

def is_subarray(sub, arr):
    if len(sub) > len(arr):
        return False
    for i in range(len(arr)-len(sub)+1):
        if arr[i:i+len(sub)] == sub:
            return True
    return False

if __name__ == "__main__":

    if len(sys.argv) < 3:
        print("USAGE:")
        print("python3 bruteforce.py <INPUT_FILENAME> <KEY_TO_SEARCH>")
        exit(0)

    with open(sys.argv[1], 'rb') as i_file:
        in_content =  [x for x in i_file.read()]
        for k in range(1,128):
            output = xor_content(in_content, k)
            string = [x for x in bytearray.fromhex(sys.argv[2])]
            if is_subarray(string, output):
                print(f"The key is {k}")
                sys.exit(0)
        print("Key not found")

Conclusion

At this point, we have all that we need, and we can test it with these simple commands:

python3 xor_file.py 88 in.txt out.txt 
python3 bruteforce.py out.txt 4d5a

If everything is right the output should be:

The key is 88

Note: “4d5a” passed as input in our example is the hexadecimal ASCII value of the initial two bytes of a DOS MZ EXECUTABLE.

This is a technique to hinder static analysis, if you want to know some simple ideas to do the same with dynamic analysis, take a look at those posts:

Did you find this article valuable?

Support Stackzero's blog by becoming a sponsor. Any amount is appreciated!