The NGINX Access Log: A Primer
The following techniques require SSH access to your Linux host. You will likely also need sudo privileges, typically granted through membership in a group such as sudo or wheel that the sudoers configuration allows to run commands as root.
By default, you will find your NGINX access log at /var/log/nginx/access.log. Linux distributions handle log permissions differently; however, read access to the NGINX access log is commonly restricted to root or a privileged group. This is why the following examples use sudo.
In multi-domain setups, each domain may log to a separate file. These files will also likely be found within the /var/log/nginx/ directory. For the purposes of this tutorial, I will assume the target log file is the default one.
How Do I Detect Traffic on My Webserver?
Whether you're changing DNS from one server to another, or checking whether your server is receiving traffic in the way you anticipate, a common command to see current traffic is the following:
sudo tail -f /var/log/nginx/access.log
tail -f follows the end of a file as it grows, in this case the NGINX access log. If you don't see activity immediately, you can generate some by visiting the site in a browser. To "break" out of tail, press Ctrl + C.
Wait, That's Too Much Traffic
On a high-traffic site, tailing the access log can generate a deluge of information. If you want to monitor a single IP address, perhaps your own, you can pipe the output to grep to filter for that one IP.
sudo tail -f /var/log/nginx/access.log | grep "18.104.22.168"
The same technique can be used to filter to a particular user-agent, webpage, or URL argument. Any information output to the log is fair game.
sudo tail -f /var/log/nginx/access.log | grep "bingbot"
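Before filtering to one IP, it can help to know which addresses are the busiest. The sketch below assumes NGINX's default combined log format, in which the client address ($remote_addr) is the first whitespace-separated field; top_ips is a hypothetical helper name, and the sample lines use documentation-reserved addresses rather than real traffic.

```shell
#!/usr/bin/env bash
# Sketch: rank client IPs by request count, busiest first.
# Assumes the default "combined" log format, where $remote_addr is field 1.
# top_ips is a hypothetical helper; it reads log lines from stdin.
top_ips() {
  local n="${1:-10}"   # how many addresses to show (default 10)
  awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "$n"
}

# Demo on made-up sample lines instead of a real log.
top_ips 5 <<'EOF'
203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
198.51.100.2 - - [10/Oct/2023:13:55:37 +0000] "GET /about HTTP/1.1" 200 1024 "-" "bingbot"
203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "-" "curl/8.0"
EOF
# Prints 203.0.113.7 with a count of 2, then 198.51.100.2 with a count of 1.
```

Because sudo cannot run a shell function, feed the log in through a privileged reader instead, e.g. sudo cat /var/log/nginx/access.log | top_ips 10.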
Accessing Historical Data
While the tail method gives you a live feed of traffic, it is often necessary to look back through the logs. If you only need a recent sample, you can replace tail -f in the previous example with more, which reads the current (not yet rotated) access.log once instead of following it. More commonly, however, you will want the largest traffic sample possible.
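On most distributions, logrotate moves historical NGINX logs into the same directory as access.log and compresses them with gzip. The files are typically named access.log.[number].gz, although the most recent one, access.log.1, is often left uncompressed. Here you will want to use zgrep, which searches gzipped and plain files alike, along with the path wildcard * to extract the maximum data.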
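Say you want to see all 500 (Internal Server Error) responses. You would execute the following command (note that the \" before 500 anchors the match to the status code, which directly follows the quoted request line, so responses whose body happens to be 500 bytes aren't included as false positives):
sudo zgrep "\" 500 " /var/log/nginx/access.log.*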
It's not uncommon for this command to take a minute.
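To check the status-code pattern and the wildcard without touching production logs, here is a minimal sketch that builds a throwaway directory mimicking a common logrotate layout: an uncompressed access.log.1 next to an older, gzipped access.log.2.gz. The file contents are made-up samples. It relies on zgrep decompressing .gz files while passing plain files through unchanged, and on its -h flag, which drops the file-name prefix zgrep otherwise adds when searching several files.

```shell
#!/usr/bin/env bash
# Sketch: search plain and gzipped rotated logs in one pass.
# Everything below uses a made-up sample directory, not /var/log/nginx.
dir="$(mktemp -d)"
trap 'rm -rf "$dir"' EXIT

# Most recent rotated log, left uncompressed: one real 500 error and one
# 200 response whose body happens to be 500 bytes.
cat > "$dir/access.log.1" <<'EOF'
203.0.113.7 - - [12/Oct/2023:09:00:01 +0000] "GET /api HTTP/1.1" 500 97 "-" "curl/8.0"
203.0.113.7 - - [12/Oct/2023:09:00:02 +0000] "GET /small HTTP/1.1" 200 500 "-" "curl/8.0"
EOF

# Older rotated log, compressed.
cat > "$dir/access.log.2" <<'EOF'
198.51.100.2 - - [10/Oct/2023:18:30:00 +0000] "POST /login HTTP/1.1" 500 0 "-" "Mozilla/5.0"
EOF
gzip "$dir/access.log.2"   # replaces it with access.log.2.gz

# The leading \" ties the match to the status field, so the 500-byte
# response is skipped; both genuine 500s are printed.
zgrep -h "\" 500 " "$dir"/access.log.*
```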
Wait a Second. How Do I Get the Data out of SSH?
While some cases will require scanning the results visually, most often the data is saved to a file so it can be processed further. The most straightforward way to do this is to redirect stdout to a file, e.g.:
sudo zgrep "url/path/of/interest/." /var/log/nginx/access.log.* > ~/myresults.txt
If you're unfamiliar with the tilde, it is simply a shortcut to your user's home directory. You can cd ~ to get there, but more importantly, you can use scp from your local machine to grab the file from that directory without any hassle. Because the > redirect is performed by your own shell rather than by sudo, the file is owned by your user, not root. To copy the results to your desktop using macOS Terminal or a shell on Windows Subsystem for Linux:
scp email@example.com:~/myresults.txt .
Sometimes you don't need a data feed; you just want an idea of how many times something happened. A simple trick (Doctors hate me!) is to pipe the output into wc -l to get a line count. For example, to count the 404s in the historical logs, you could use the following command:
sudo zgrep "\" 404 " /var/log/nginx/access.log.* | wc -l
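When you want counts for every status code at once rather than one code at a time, awk, sort, and uniq can build the whole tally. This sketch assumes the default combined log format, in which the status is the ninth whitespace-separated field; status_counts is a hypothetical helper name and the sample lines are made up.

```shell
#!/usr/bin/env bash
# Sketch: tally responses by HTTP status code, most frequent first.
# Assumes the default "combined" log format, where field 9 is the status.
# status_counts is a hypothetical helper; it reads log lines from stdin.
status_counts() {
  awk '{ print $9 }' | sort | uniq -c | sort -rn
}

# Demo on made-up sample lines.
status_counts <<'EOF'
203.0.113.7 - - [12/Oct/2023:09:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
203.0.113.7 - - [12/Oct/2023:09:00:02 +0000] "GET /nope HTTP/1.1" 404 153 "-" "curl/8.0"
198.51.100.2 - - [12/Oct/2023:09:00:03 +0000] "GET /gone HTTP/1.1" 404 153 "-" "bingbot"
EOF
# Prints 404 with a count of 2, then 200 with a count of 1.
```

On a server, zcat -f decompresses gzipped files and passes plain ones through, so something like sudo zcat -f /var/log/nginx/access.log.* | status_counts covers every rotated log. Malformed request lines (for example a bare "-") shift the fields, so expect a few stray, non-numeric entries in the tally.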
There are two paths to leveling up on working with access logs. The first is to generate the data feeds in Linux and then parse and process them in a dynamic language with REPL support, such as Python. Linux die-hards will tell you this is 90% unnecessary, and that you just need to master sed. The best way, as always, will be whatever works best for you.
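As a small taste of the sed route, the sketch below prints the requested path of every 404 using a single substitution with a capture group. It assumes the default combined log format, where the quoted request line ("METHOD PATH PROTOCOL") is immediately followed by the status code; paths_404 is a hypothetical helper name and the sample lines are made up.

```shell
#!/usr/bin/env bash
# Sketch: print the request path of every 404 response.
# Assumes the default "combined" log format: ... "METHOD PATH PROTOCOL" STATUS ...
# paths_404 is a hypothetical helper; it reads log lines from stdin.
paths_404() {
  # -n suppresses default output; the trailing p prints only lines where the
  # substitution matched (404s), replaced by the captured path (\1).
  sed -n 's/^[^"]*"[A-Z]* \([^ ]*\) [^"]*" 404 .*$/\1/p'
}

# Demo on made-up sample lines: only /nope is printed. The second line is a
# 200 whose body happens to be 404 bytes, so it does not match.
paths_404 <<'EOF'
203.0.113.7 - - [12/Oct/2023:09:00:02 +0000] "GET /nope HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.7 - - [12/Oct/2023:09:00:03 +0000] "GET /ok HTTP/1.1" 200 404 "-" "curl/8.0"
EOF
```

On a server, something like sudo zcat -f /var/log/nginx/access.log.* | paths_404 | sort | uniq -c | sort -rn would rank the most frequently requested missing paths (zcat -f decompresses .gz files and passes plain ones through).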