Subdomain scanner with Python? Why?
Do you want to build your own subdomain scanner with Python? But why?
There are many reasons why you might want to develop your own subdomain scanner.
Maybe you’re a security researcher who wants to find vulnerabilities in websites.
Maybe you’re a penetration tester who needs to assess the security of a client’s website.
Maybe you’re just curious about how websites work and want to learn more about how to find vulnerabilities in them.
Whatever your reason, developing your own subdomain scanner is not especially challenging, but it is a rewarding process. It requires a bit of knowledge about networking and coding.
But the end result can be a powerful tool that can help you find vulnerabilities in websites and protect yourself and others from cyber attacks.
Prerequisites
To write our own tool you don’t need much; you just need to:
install python3
install the “requests” library
find a file with a set of possible subdomains
get a working connection
If we are on a Kali Linux virtual machine, we probably have everything already, but here are the terminal commands for an Ubuntu machine:
sudo apt install python3
sudo apt install python3-pip
pip3 install requests
(optparse ships with the Python standard library, so there is nothing to install for it.)
The installation of a progress bar is optional:
pip3 install progress
Now that we’ve installed everything we need, let’s get a list of possible subdomains to iterate over. For this tutorial I’ll use subdomains-top1million-5000.txt from the SecListRepo repository, more precisely at this address:
SecListsFile.
Introduction
Now that we are ready, we want our program to take as input the main domain and a file containing the list of subdomains to iterate over. It might also be interesting to add the option of saving the output to a file. You can find a few versions of a subdomain scanner online, but they all turn out to be quite slow, so let’s use threads to do something better, perhaps with the option of passing the number of threads as a parameter. Let’s see how to use the optparse library to collect the arguments we need.
def get_args():
    parser = optparse.OptionParser()
    parser.add_option('-d', '--main', dest='domain_name',
                      help='The domain name', metavar='DOMAIN_NAME')
    parser.add_option("-i", "--input", dest="input_list",
                      help="read the list from INPUT_FILE", metavar="INPUT_FILE")
    parser.add_option("-f", "--file", dest="output_file", default="",
                      help="write report to FILE", metavar="FILE")
    parser.add_option("-t", "--threads", type=int, dest='n_threads',
                      help="Set the number of threads", metavar="N_THREADS", default=1)
    return parser.parse_args()
As we can see, the get_args method collects the arguments we need:
-d the main domain
-i the input file
-f the output file, if we want to save everything to a file
-t the number of threads to launch
Finally, it returns the arguments that will be used in the main.
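To see what get_args actually hands back, here is a minimal sketch of the same option parsing, fed a hand-built argument list instead of the real command line (the option names match the article’s; the sample values are made up):

```python
import optparse

# Same options as the article's get_args
parser = optparse.OptionParser()
parser.add_option('-d', '--main', dest='domain_name', metavar='DOMAIN_NAME')
parser.add_option('-i', '--input', dest='input_list', metavar='INPUT_FILE')
parser.add_option('-f', '--file', dest='output_file', default='', metavar='FILE')
parser.add_option('-t', '--threads', type=int, dest='n_threads', default=1)

# parse_args accepts an explicit list, handy for testing
options, args = parser.parse_args(['-d', 'example.com', '-i', 'subs.txt', '-t', '8'])
print(options.domain_name, options.n_threads)  # example.com 8
```

Options we don’t pass keep their defaults, so options.output_file is the empty string here, which is exactly what the main later checks to decide between printing and saving.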
Auxiliary methods
Before writing the whole main, let’s define the methods and global variables we might need:
q = queue.Queue()
bar = None
active_domains = []
lock = threading.Lock()

def from_file(filename):
    with open(filename, 'r') as f:
        subdomains = f.read().split('\n')
    return subdomains
This method reads a list of subdomains from a given file and returns it to the caller. Now let’s look at something more interesting:
def check_subdomain(domain, sub):
    subdomain = f"http://{sub.strip()}.{domain}"
    try:
        requests.get(subdomain, timeout=2)
    except (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout):
        return False
    return True
The check_subdomain method takes a domain and a subdomain as arguments. It then sends a GET request through the requests library, waiting up to 2 seconds for a response.
If the request succeeds, the method returns True; if it raises an exception instead, the return value is False.
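The same reachability idea can be tried in isolation. Here is a small sketch (not part of the scanner; is_reachable and the URL are made up for illustration) that treats any connection failure or timeout as “inactive”, using the requests base exception to cover both cases:

```python
import requests

def is_reachable(url, timeout=2):
    """Return True if a GET to url completes within the timeout."""
    try:
        requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Covers ConnectionError, ReadTimeout and friends
        return False
    return True

# .invalid is a reserved TLD that never resolves, so this always fails fast
print(is_reachable("http://nonexistent-sub.invalid"))  # False
```

Note that a 404 or 500 response still counts as “reachable”: requests only raises here when no HTTP response comes back at all, which matches the article’s check_subdomain behaviour.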
The last auxiliary method is append_if_exists, which will be used to insert a subdomain into a global list of existing domains.
It also uses a lock in order to avoid concurrency errors:
def append_if_exists(host, sub):
    if check_subdomain(host, sub):
        with lock:
            active_domains.append(f"{sub}.{host}")
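To see the lock doing its job, here is a standalone sketch (not part of the scanner) where several threads append to a shared list. Acquiring the lock with a with statement serializes the critical section, so no results are lost or interleaved mid-operation:

```python
import threading

results = []
lock = threading.Lock()

def add_range(start):
    # Each thread contributes 100 values to the shared list
    for n in range(start, start + 100):
        with lock:  # only one thread touches results at a time
            results.append(n)

threads = [threading.Thread(target=add_range, args=(i * 100,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 400: every append made it in
```

The with lock: block is equivalent to lock.acquire() / lock.release() in a try/finally, which is why it is the idiomatic form.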
Finally we have the get_active method.
def get_active():
    global q
    while True:
        i = q.get()
        append_if_exists(domain_name, i)
        bar.next()
        q.task_done()
This method iterates over a queue until it is empty. Since the queue is shared by all threads, we want to avoid race conditions, even if they are unlikely and not very dangerous here; the queue.Queue class handles that for us, because its get and put operations are thread-safe.
Inside the loop, the method first pops an element, appends the domain if it exists, updates the bar, and then notifies the queue that the task is done.
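The queue mechanics can be seen in a minimal sketch of the same worker pattern (toy data, not the scanner itself): daemon workers drain a shared queue, each q.get() is paired with a q.task_done(), and q.join() in the main thread blocks until every item has been accounted for.

```python
import queue
import threading

q = queue.Queue()
results = queue.Queue()  # also thread-safe, so no explicit lock needed here

def worker():
    while True:
        item = q.get()           # blocks until an item is available
        results.put(item * 2)    # stand-in for the real check
        q.task_done()            # tell the queue this item is finished

for n in range(5):
    q.put(n)

for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

q.join()  # returns only once all five items have been task_done()
print(sorted(results.queue))  # [0, 2, 4, 6, 8]
```

Because the workers are daemon threads stuck in an infinite loop, they simply die with the main thread once q.join() returns, which is the same shutdown strategy the article’s scanner uses.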
Putting it all together: the Python subdomain scanner
In the main we’ll put everything together, calling all the methods we have defined, whose behaviour we already know.
The queue will contain all the subdomains from which the threads will take the next value to check, and active_domains will be the list into which each thread inserts positive results.
In the for loop we create all the threads, set thread.daemon to True (so each thread ends with the main) and have each one call the get_active method.
With t.start() we launch the threads and then wait for the queue to empty with q.join().
We wrap everything in a try/except so we can stop the scan with CTRL+C without losing the results.
And finally, we decide whether to print the output to the screen or save it to a file.
Having covered everything, let’s see the main inside the complete code (ready for a simple copy and paste for the lazy ones).
import requests
import threading
import time
import queue
from progress.bar import Bar
import optparse

q = queue.Queue()
bar = None
active_domains = []
lock = threading.Lock()

def from_file(filename):
    with open(filename, 'r') as f:
        subdomains = f.read().split('\n')
    return subdomains

def check_subdomain(domain, sub):
    subdomain = f"http://{sub.strip()}.{domain}"
    try:
        requests.get(subdomain, timeout=2)
    except (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout):
        return False
    return True

def append_if_exists(host, sub):
    if check_subdomain(host, sub):
        with lock:
            active_domains.append(f"{sub}.{host}")

def get_args():
    parser = optparse.OptionParser()
    parser.add_option('-d', '--main', dest='domain_name',
                      help='The domain name', metavar='DOMAIN_NAME')
    parser.add_option("-i", "--input", dest="input_list",
                      help="read the list from INPUT_FILE", metavar="INPUT_FILE")
    parser.add_option("-f", "--file", dest="output_file", default="",
                      help="write report to FILE", metavar="FILE")
    parser.add_option("-t", "--threads", type=int, dest='n_threads',
                      help="Set the number of threads", metavar="N_THREADS", default=12)
    return parser.parse_args()

def get_active():
    global q
    while True:
        i = q.get()
        append_if_exists(domain_name, i)
        bar.next()
        q.task_done()

if __name__ == "__main__":
    options, args = get_args()
    for s in from_file(options.input_list):
        q.put(s)
    bar = Bar("Subdomain scanning...", max=q.qsize())
    domain_name = options.domain_name
    try:
        pre_time = time.time()
        for i in range(options.n_threads):
            t = threading.Thread(target=get_active)
            t.daemon = True
            t.start()
        q.join()
    except KeyboardInterrupt:
        pass
    finally:
        if options.output_file:
            with open(options.output_file, 'w') as f:
                f.write("\n".join(active_domains))
        else:
            print("\n")
            for e in active_domains:
                print(e)
        print(f"\nFound {len(active_domains)} subdomains")
        print("Executed in %s seconds" % (time.time() - pre_time))
Now, supposing we call the file main.py, this is how to use it:
python3 main.py -d <MAIN_DOMAIN> -i <SUBDOMAIN_INPUT_FILE> -f <OUTPUT_FILE> -t <THREAD_NUMBER>
# Example:
python3 main.py -d google.com -i subdomains.txt -f output.txt -t 30
Further readings
If you found this interesting to read, I recommend the following articles: