The NGINX Access Log: A Primer


Getting Started

The following techniques require SSH access to your Linux host. You will also likely need sudo privileges, which usually means membership in the sudo or wheel group, depending on the distribution. If you are new to Linux, it is worth brushing up on the basics first.

By default, you will find the NGINX access log at /var/log/nginx/access.log. Linux distributions handle log permissions differently, but it is common for read access to the access log to be restricted to root, which is why the examples below use sudo.
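
To confirm where your logs live and how they are permissioned, a quick listing of the log directory shows the owner and group, which vary by distribution:

sudo ls -l /var/log/nginx/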

In multi-domain setups, you may find that each domain logs to a separate file. These will also typically live within the /var/log/nginx/ directory. For the purposes of this tutorial, I will assume the target log file is the default access.log.
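
For reference, per-domain logging is configured with the access_log directive inside each server block. A minimal sketch, using a hypothetical domain and log path:

server {
    server_name example.com;
    access_log /var/log/nginx/example.com.access.log;
    # ...the rest of the server configuration
}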

How Do I Detect Traffic on My Webserver?

Whether you're moving DNS from one server to another or simply checking that your server is receiving the traffic you expect, the go-to command for watching live traffic is the following:

sudo tail -f /var/log/nginx/access.log

tail -f follows the end of a file as it grows, in this case the NGINX access log. If you don't see activity immediately, you can create some by navigating to the site in a browser. To break out of tail, hit Ctrl + C.

Wait, That's Too Much Traffic

On a high-traffic site, tailing the access log produces a deluge of information. If you want to monitor a single IP address, perhaps your own, you can pipe tail into grep to filter down to just that IP.

sudo tail -f /var/log/nginx/access.log | grep "158.69.242.81"

The same technique can be used to filter by user agent, page, or URL parameter; anything written to the log is fair game.

sudo tail -f /var/log/nginx/access.log | grep "bingbot"

Accessing Historical Data

While the tail method gives you a live feed of traffic, it is often necessary to look back in the logs. More often than not you will want the largest traffic sample you can get, but if a recent sample is enough, you can simply replace tail in the previous example with more.
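
Using the earlier IP filter as an example:

sudo more /var/log/nginx/access.log | grep "158.69.242.81"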

By default, rotated NGINX logs are stored gzip-compressed in the same directory as access.log, named access.log.[number].gz (the most recently rotated file may still be uncompressed). Here you will want zgrep, which searches plain and gzip-compressed files alike, along with the path wildcard * to cover them all.

Say you want to see all 500 server errors; you would execute the following command (the escaped quote, \", anchors the match to the status code field, so responses that merely happen to be 500 bytes long don't show up as false positives):

sudo more /var/log/nginx/access.log.* | zgrep "\" 500 "

It's not uncommon for this command to take a minute.
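
For reference, the default "combined" log format places the status code immediately after the quoted request line, which is why the escaped-quote pattern lands on the right field. A typical entry looks roughly like this (the timestamp, path, and sizes here are invented):

158.69.242.81 - - [12/Mar/2024:10:15:32 +0000] "GET /some/page HTTP/1.1" 500 1024 "-" "Mozilla/5.0"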

Wait a Second. How Do I Get the Data out of SSH?

While some cases only require scanning the results visually, more often the data is saved to a file so it can be processed further. The most straightforward way to do this is to redirect stdout to a file, e.g.:

sudo more /var/log/nginx/access.log.* | zgrep "url/path/of/interest/." > ~/myresults.txt

If you're unfamiliar with the tilde, it is simply a shortcut to your user's home directory. You can cd ~ to get there, but more importantly you can use scp from your local machine to grab the file from that directory without any hassle. To get the results to your desktop from macOS Terminal or a Windows Subsystem for Linux shell:

scp user@example.com:~/myresults.txt .

Getting Counts

Sometimes you don't need a data feed; you just want an idea of how many times something happened. A simple trick (doctors hate me!) is to pipe the output into wc -l for a line count. For example, to count 404s in the historical logs, you could use the following command:

sudo more /var/log/nginx/access.log.* | zgrep "\" 404 " | wc -l

Advanced Usage

There are two paths to leveling up on access logs. The first is to generate the data feeds in Linux, then parse and process them in a dynamic language with REPL support, such as Python. Linux die-hards will tell you this is 90% unnecessary and that you just need to master sed. The best way, as always, is whatever works best for you.
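
As a small taste of the sed path, here is a sketch that pulls just the requested URL path out of each line of the historical logs; the expression assumes the default combined log format and quietly skips lines it cannot parse:

sudo zcat -f /var/log/nginx/access.log.* | sed -nE 's/.*"(GET|POST|HEAD|PUT|DELETE) ([^ ]+) HTTP[^"]*".*/\2/p'

One path per line is exactly the kind of feed that is easy to hand off to Python, or to pipe straight into the counting tricks above.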


InterroBot is a web crawler and devtool for Windows 10/11.
Want to learn more? Check out our help section or download the latest build.