In this article, we will learn how to efficiently parse Nginx logs using a Python script. Nginx is a widely used web server known for its performance and reliability. Its logs can provide valuable insights into web traffic and server performance, making them a crucial resource for system administrators and developers. We will explore a Python script that processes Nginx log files, extracting key information and writing it to a CSV file for further analysis.
Prerequisites
Step #1: Installing Python and Nginx
First, update the package list.
sudo apt update
![Python Script to Parse Nginx Log Files 1](https://www.fosstechnix.com/wp-content/uploads/2024/06/1-7.png)
Then install Python and pip.
sudo apt install python3 python3-pip
![Python Script to Parse Nginx Log Files 2](https://www.fosstechnix.com/wp-content/uploads/2024/06/2-7.png)
Install Nginx on the system.
sudo apt install nginx
![Python Script to Parse Nginx Log Files 3](https://www.fosstechnix.com/wp-content/uploads/2024/06/3-7.png)
Verify the Nginx status to confirm it is running properly.
sudo systemctl status nginx
![Python Script to Parse Nginx Log Files 4](https://www.fosstechnix.com/wp-content/uploads/2024/06/4-7-1024x270.png)
Step #2: Write a Python Script to Parse Nginx Logs
First, create a file for the script.
nano parse_nginx_logs.py
![Python Script to Parse Nginx Log Files 5](https://www.fosstechnix.com/wp-content/uploads/2024/06/5-6.png)
Below is a Python script to parse Nginx logs. This script reads log files, extracts relevant fields, and writes the information to a CSV file.
import argparse
import csv
import re
import glob
from datetime import datetime
import os
import gzip

# Regular expression matching a line in the default "combined" Nginx log format
line_format = re.compile(r'(\S+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"')

# Format a byte count as a human-readable string with a unit
def format_bytes(bytes):
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if bytes < 1024.0:
            return f"{bytes:.2f} {unit}"
        bytes /= 1024.0
    return f"{bytes:.2f} PB"

# Process nginx log files
def process_logs(log_path, output_file):
    # Check if log_path is a directory or a file, and build the list of files to process
    if os.path.isdir(log_path):
        files = glob.glob(log_path + "/*.gz") + glob.glob(log_path + "/*.log")
    elif os.path.isfile(log_path):
        files = [log_path]
    else:
        print("Invalid log path")
        return

    # Variables to store summary statistics
    ip_counts = {}
    status_counts = {}
    status403_ips = {}
    referrer_counts = {}
    bytes_sent_total = 0

    # Open the output file and write the header row
    with open(output_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["IP", "Timestamp", "Method", "URL", "Status", "Bytes Sent", "Referrer", "User Agent"])
        # Loop over each file to process
        for filename in files:
            # Check if the file is gzipped or not, and open it accordingly
            if filename.endswith('.gz'):
                open_fn = gzip.open
            else:
                open_fn = open
            # Open the file and loop over each line
            with open_fn(filename, 'rt', encoding='utf-8') as file:
                for line in file:
                    # Match the line against the regular expression pattern
                    match = line_format.match(line.strip())
                    # If there is a match, extract the relevant fields and write them to the output file
                    if match:
                        ip, date_str, request, status, bytes_sent, referrer, user_agent = match.groups()
                        dt = datetime.strptime(date_str, '%d/%b/%Y:%H:%M:%S %z')
                        try:
                            method, url = request.split()[0], " ".join(request.split()[1:])
                        except IndexError:
                            method, url = request, ''
                        writer.writerow([ip, dt, method, url, status, bytes_sent, referrer, user_agent])
                        # Update summary statistics
                        ip_counts[ip] = ip_counts.get(ip, 0) + 1
                        status_counts[status] = status_counts.get(status, 0) + 1
                        bytes_sent_total += int(bytes_sent)
                        if status == '403':
                            status403_ips[ip] = status403_ips.get(ip, 0) + 1
                        referrer_counts[referrer] = referrer_counts.get(referrer, 0) + 1

    # Print summary statistics (ANSI escape codes render the headings bold and red)
    print("\033[1m\033[91mTotal number of log entries:\033[0m", sum(ip_counts.values()))
    print("\033[1m\033[91mNumber of unique IP addresses:\033[0m", len(ip_counts))
    print("\033[1m\033[91mNumber of unique status codes:\033[0m", len(status_counts))
    print("\033[1m\033[91mBytes sent in total:\033[0m", format_bytes(bytes_sent_total))
    print("\033[1m\033[91mTop 10 IP addresses by request count:\033[0m")
    for ip, count in sorted(ip_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{ip}: {count}")
    print("\033[1m\033[91mTop 10 status codes by count:\033[0m")
    for status, count in sorted(status_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{status}: {count}")
    print("\033[1m\033[91mTop 10 IP addresses with status code 403:\033[0m")
    for ip, count in sorted(status403_ips.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{ip}: {count}")
    print("\033[1m\033[91mTop 10 referrers by count:\033[0m")
    for referrer, count in sorted(referrer_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{referrer}: {count}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process nginx log files.')
    parser.add_argument('--log_path', metavar='LOG_PATH', type=str, required=True,
                        help='Path to the log file or directory')
    parser.add_argument('--output_file', metavar='OUTPUT_FILE', type=str, required=True,
                        help='Path to the output CSV file')
    args = parser.parse_args()
    process_logs(args.log_path, args.output_file)
![Python Script to Parse Nginx Log Files 6](https://www.fosstechnix.com/wp-content/uploads/2024/06/7-4.png)
Save the file and exit nano (Ctrl+X, then Y, then Enter).
Execute the script with the desired log path and output file location.
python3 parse_nginx_logs.py --log_path /var/log/nginx --output_file output.csv
![Python Script to Parse Nginx Log Files 7](https://www.fosstechnix.com/wp-content/uploads/2024/06/Nginx-Log-1024x224.png)
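Once the script has run, you can explore the generated CSV with Python's built-in csv module. The sketch below uses made-up sample rows in place of a real output.csv so it is self-contained; in practice, replace the `io.StringIO` with `open("output.csv", newline="")`.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for output.csv; replace with open("output.csv")
sample_csv = """IP,Timestamp,Method,URL,Status,Bytes Sent,Referrer,User Agent
203.0.113.5,2023-10-10 13:55:36+00:00,GET,/index.html,200,612,-,curl/7.68.0
203.0.113.5,2023-10-10 13:55:37+00:00,GET,/missing,404,153,-,curl/7.68.0
198.51.100.7,2023-10-10 13:55:38+00:00,GET,/index.html,200,612,-,Mozilla/5.0
"""

# Read rows as dictionaries keyed by the header columns
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(len(rows), "entries")
# Count requests per status code, most frequent first
print(Counter(r["Status"] for r in rows).most_common())
```

This is handy for ad-hoc questions the script's built-in summary does not cover, such as grouping by URL or user agent.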
How the Code Works:
The script parses Nginx logs in the following steps:
- Define the log format: A regular expression matches the standard Nginx "combined" log format, capturing fields like the IP address, timestamp, HTTP request, status code, bytes sent, referrer, and user agent.
- Process and parse the logs: The `process_logs` function checks whether the provided log path is a file or a directory. It then processes each file, including gzipped files, extracts the relevant fields using the regular expression, and writes the data to a CSV file.
- Generate summary statistics: The script also calculates and prints summary statistics such as the total number of log entries, unique IP addresses, unique status codes, total bytes sent, and the top entries by request count and status code.
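To see the regular expression from the first step in action, you can match it against a single sample line (the values below are made up, but follow the default "combined" log format):

```python
import re

# Same pattern as in parse_nginx_logs.py
line_format = re.compile(r'(\S+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"')

# A made-up line in the default "combined" Nginx log format
sample = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0"'

match = line_format.match(sample)
# Each capture group corresponds to one column of the CSV
ip, timestamp, request, status, bytes_sent, referrer, user_agent = match.groups()
print(ip, status, bytes_sent)  # 127.0.0.1 200 612
```

If your server uses a custom `log_format` directive, the pattern must be adjusted to match it, otherwise lines will silently be skipped.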
Conclusion
In conclusion, parsing Nginx logs using a Python script can provide valuable insights into web traffic and server performance. This script simplifies the process by extracting relevant fields from Nginx logs and writing them to a CSV file for easy analysis. With a few steps, you can efficiently analyze your web server logs and gain a deeper understanding of your web traffic and server behavior.
Related Articles:
Python Script to Create Jira Ticket using GitHub Events