How much traffic comes from the front page of Hacker News?

This is one of those posts I didn’t really want to write at first, since it’s meta-blogging, and it reminds me of those sad blogs that started five years ago with a “How I write my blog” post, after which the stream of ideas fizzled out and the blog never got anywhere.

A lot of people wonder how much traffic a post on the front page of Hacker News brings, or how to prepare their server in case one of their posts ever goes viral. Hopefully this post will shed some light on the topic. Somehow, two of my posts appeared on the front page at the same time (posts 4 and 5 in the screenshot below), and I was able to get the traffic numbers from the server logs.

Two of my blog posts on the front page of Hacker News, in 4th and 5th place

Traffic generated on the first day on the Hacker News front page

In the first 20 hours, we served ~85,000 requests, of which ~43,000 were unique. Keep in mind that some of them came from bots, so our pages reached fewer than 86,000 eyeballs. All requests up to this point came from Hacker News, which means that each of the two posts on the front page brought roughly 20,000 unique requests. After the first 20 hours, one of the posts was also reposted on Reddit, which further increased the traffic numbers described in the next section.

The bandwidth reported in the screenshot is wrong: it only covers the bandwidth spent on serving HTML pages, since requests for static assets are not logged.

Traffic

The following tables contain data from 2021-02-14 to 2021-02-15 and include the traffic from both the Hacker News and the Reddit raid.

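+-----------------------+---------+
| Traffic in numbers    | Value   |
+-----------------------+---------+
| Requests (2021-02-14) |  93,275 |
| Requests (2021-02-15) |  46,390 |
| Total Requests        | 139,665 |
| Unique Requests       |  68,123 |
| Outgoing Bandwidth    |   ~6 GB |
+-----------------------+---------+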

The bandwidth number comes from the HAProxy dashboard. It’s an estimate (if anything, slightly too high), since the dashboard counter also includes requests from before this event, but the regular traffic is much smaller.

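+----------------------------------------------+----------+---------+
| Page                                         | Requests | Total   |
+----------------------------------------------+----------+---------+
| /posts/the-complexity-that-lives-in-the-gui/ |   69,422 | 50.53 % |
| /posts/on-navigating-a-large-codebase/       |   49,803 | 36.25 % |
| /posts/                                      |     4218 |  3.07 % |
| /posts/index.html                            |     2050 |  1.49 % |
| /posts/its-just-a-button/                    |     1586 |  1.15 % |
+----------------------------------------------+----------+---------+

+----------------------+----------+---------+
| Referring Sites      | Requests | Total   |
+----------------------+----------+---------+
| news.ycombinator.com |   33,918 | 59.60 % |
| blog.royalsloth.eu   |   10,817 | 19.01 % |
| reddit.com           |     3701 |  6.50 % |
| google.com           |     2375 |  4.67 % |
| t.co (Twitter)       |     1444 |  2.54 % |
+----------------------+----------+---------+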

The numbers for referring sites are much smaller than the ones in the page table, since not all requests contain the Referer header. While it’s nice to know where the traffic is coming from, the Referer header can also leak information. The server logs often contain a Referer that points to a company’s internal wiki pages, as in the entry below. Not the best way to keep your future plans secret.

<redacted ip> - - [25/Mar/2021:13:56:39 +0100] 
"GET /posts/the-complexity-that-lives-in-the-gui/ HTTP/1.1" 200 16807 
"https://<redacted>/display/RD/2021-03-25+Architectural+Meeting+Notes" 
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"
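Referrer counts like the ones in the table can be pulled out of a raw access log. The entry above looks like the standard "combined" log format used by Nginx and Apache, so the minimal sketch below assumes that format; it is not the tool that produced the tables in this post. The IP addresses, response sizes, and the `wiki.example.com` host in the sample lines are made-up placeholders.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed "combined" access-log format (Nginx/Apache):
# IP - user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_hosts(lines):
    """Count requests per Referer host, skipping lines without a Referer."""
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # not a combined-format line
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # the client sent no Referer header
        host = urlsplit(referer).hostname
        if host:
            counts[host] += 1
    return counts


# Hypothetical log lines; IPs, sizes and wiki.example.com are placeholders.
SAMPLE_LOG = [
    '203.0.113.7 - - [25/Mar/2021:13:56:39 +0100] '
    '"GET /posts/the-complexity-that-lives-in-the-gui/ HTTP/1.1" 200 16807 '
    '"https://wiki.example.com/display/RD/Meeting+Notes" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"',
    '198.51.100.2 - - [14/Feb/2021:15:02:11 +0100] '
    '"GET /posts/on-navigating-a-large-codebase/ HTTP/1.1" 200 15000 '
    '"https://news.ycombinator.com/" "Mozilla/5.0"',
    '198.51.100.3 - - [14/Feb/2021:15:02:12 +0100] '
    '"GET /posts/ HTTP/1.1" 200 4000 "-" "curl/7.68.0"',
]

if __name__ == "__main__":
    counts = referrer_hosts(SAMPLE_LOG)
    total = sum(counts.values())
    for host, n in counts.most_common():
        print(f"{host:25} {n:5} {n / total:7.2%}")
```

Lines with an empty or `-` Referer are skipped, so any percentages computed from these counts are shares of referred requests only; that is also why the referring-site numbers are smaller than the page numbers.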

Apparently, Google also prioritises your pages during the raid, which leads to an additional influx of traffic; in this case, google.com accounted for 4.67 % of the requests that carried a Referer header.

Traffic by location

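+-----------------------+----------+---------+
| Traffic per continent | Requests | Total   |
+-----------------------+----------+---------+
| North America         |   72,401 | 51.84 % |
| Europe                |   50,965 | 36.49 % |
| Asia                  |     9038 |  6.47 % |
| Oceania               |     3950 |  2.83 % |
| South America         |     1472 |  1.05 % |
| Unknown               |     1087 |  0.78 % |
| Africa                |      752 |  0.54 % |
+-----------------------+----------+---------+

+---------------+----------+---------+
| North America | Requests | Total   |
+---------------+----------+---------+
| United States |   66,039 | 47.28 % |
| Canada        |     5825 |  4.17 % |
| Mexico        |      241 |  0.17 % |
| Costa Rica    |       74 |  0.05 % |
+---------------+----------+---------+

+----------------+----------+---------+
| Europe         | Requests | Total   |
+----------------+----------+---------+
| Germany        |   10,494 |  7.84 % |
| United Kingdom |     7909 |  5.66 % |
| France         |     4877 |  3.49 % |
| Netherlands    |     3073 |  2.20 % |
| Sweden         |     2597 |  1.86 % |
+----------------+----------+---------+

+-----------+----------+---------+
| Asia      | Requests | Total   |
+-----------+----------+---------+
| India     |     2887 |  6.47 % |
| Israel    |      999 |  0.72 % |
| Japan     |      944 |  0.68 % |
| Singapore |      820 |  0.59 % |
+-----------+----------+---------+

+--------------------------------------+----------+---------+
| Oceania                              | Requests | Total   |
+--------------------------------------+----------+---------+
| Australia                            |     3154 |  2.26 % |
| New Zealand                          |     1939 |  2.85 % |
| United States Minor Outlying Islands |        2 |  0.00 % |
| French Polynesia                     |        1 |  0.00 % |
| Papua New Guinea                     |        1 |  0.00 % |
+--------------------------------------+----------+---------+

+--------------+----------+---------+
| Africa       | Requests | Total   |
+--------------+----------+---------+
| South Africa |      385 |  0.28 % |
| Egypt        |       92 |  0.07 % |
| Kenya        |       57 |  0.04 % |
| Nigeria      |       42 |  0.03 % |
| Algeria      |       31 |  0.02 % |
+--------------+----------+---------+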

Traffic by time

+------+----------+----------+-------------------------------------+
| Hour | Requests | Total    | Visualization                       |
+------+----------+----------+-------------------------------------+
   00    3387       2.43 %   ||||||||||||
   01    2719       1.95 %   ||||||||||
   02    2682       1.92 %   ||||||||||
   03    1987       1.42 %   |||||||
   04    2756       1.97 %   ||||||||||
   05    5125       3.67 %   ||||||||||||||||||
   06    5198       3.72 %   ||||||||||||||||||
   07    5748       4.12 %   ||||||||||||||||||||
   08    6668       4.77 %   |||||||||||||||||||||||
   09    6807       4.87 %   ||||||||||||||||||||||||
   10    5754       4.12 %   ||||||||||||||||||||
   11    4807       3.44 %   |||||||||||||||||
   12    6916       4.95 %   ||||||||||||||||||||||||
   13    7956       5.70 %   ||||||||||||||||||||||||||||
   14    9048       6.48 %   |||||||||||||||||||||||||||||||
   15    9354       6.70 %   |||||||||||||||||||||||||||||||||
   16    8548       6.12 %   ||||||||||||||||||||||||||||||
   17    8593       6.15 %   ||||||||||||||||||||||||||||||
   18    7392       5.29 %   ||||||||||||||||||||||||||
   19    6817       4.88 %   ||||||||||||||||||||||||
   20    5653       4.05 %   ||||||||||||||||||||
   21    5371       3.85 %   |||||||||||||||||||
   22    5588       4.00 %   ||||||||||||||||||||
   23    4791       3.43 %   ||||||||||||||||

The busiest hour was 15:00-16:00 UTC, with 9354 requests, or about 2.6 requests/s on average. Requests don’t arrive evenly spaced, but even when they come in bursts, this load is close to nothing compared to what a modern server can handle.
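The peak rate is easy to sanity-check against the hourly table. The short sketch below copies the counts from the table and computes the busiest hour; note that it only yields an hourly average, so it says nothing about the short bursts within that hour.

```python
# Hourly request counts copied from the "Traffic by time" table (index = hour 00-23).
HOURLY_REQUESTS = [
    3387, 2719, 2682, 1987, 2756, 5125, 5198, 5748,
    6668, 6807, 5754, 4807, 6916, 7956, 9048, 9354,
    8548, 8593, 7392, 6817, 5653, 5371, 5588, 4791,
]


def peak_hour(hourly):
    """Return (hour, requests, average requests per second) for the busiest hour."""
    hour = max(range(len(hourly)), key=hourly.__getitem__)
    requests = hourly[hour]
    return hour, requests, requests / 3600  # 3600 seconds per hour


if __name__ == "__main__":
    total = sum(HOURLY_REQUESTS)
    hour, requests, rate = peak_hour(HOURLY_REQUESTS)
    print(f"total: {total} requests")
    print(f"peak: {hour:02d}:00-{hour + 1:02d}:00, {requests} requests, {rate:.1f} req/s")
    print(f"peak share: {requests / total:.2%}")
```

Running it gives a total of 139665 requests, which matches the total in the first table, and a peak of 9354 requests at 15:00-16:00, about 2.6 req/s or 6.70 % of all requests.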

Traffic by users

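+-------------------+----------+---------+
| Operating Systems | Requests | Total   |
+-------------------+----------+---------+
| Darwin            |   32,264 | 23.10 % |
| Macintosh         |   23,959 | 17.15 % |
| Android           |   19,644 | 14.07 % |
| Windows           |   19,271 | 13.80 % |
| iOS               |   17,479 | 12.51 % |
| Unknown           |   14,508 | 10.39 % |
| Linux             |   12,011 |  8.60 % |
| Known bots        |      251 |  0.18 % |
| Chrome OS         |      222 |  0.16 % |
| BSD               |       56 |  0.04 % |
+-------------------+----------+---------+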

I don’t know which devices identify themselves with a Darwin user agent, but judging by the server logs, these requests probably came from a broken iOS application that sent a huge number of requests in a short period of time. See the Infrastructure section for more details.

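+----------+----------+---------+
| Browsers | Requests | Total   |
+----------+----------+---------+
| Unknown  |   41,257 | 29.54 % |
| Chrome   |   38,983 | 27.91 % |
| Safari   |   25,276 | 18.10 % |
| Firefox  |   23,971 | 17.16 % |
| Other    |     3940 |  2.82 % |
| Crawlers |     3601 |  2.58 % |
| Feeds    |      854 |  0.61 % |
+----------+----------+---------+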

Judging by the browser version breakdown, the vast majority of visitors were using the latest browser version, either because of the tech-savvy audience or because modern browsers update themselves automatically. That’s wonderful news for programmers who build websites.

Infrastructure

The server is hosted on the cheapest Hetzner [1] plan, at 3.04 €/month (about $3.7 at current rates), and is located in Germany. The server specifications for this plan are:

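+---------+----------------------------+
| Part    | Specification              |
+---------+----------------------------+
| CPU     | 1 vCPU                     |
| RAM     | 2 GB                       |
| Disk    | 20 GB SSD                  |
| Traffic | 20 TB outgoing, ∞ incoming |
+---------+----------------------------+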

Hetzner definitely has the best deal for renting a small virtual machine, and I don’t see why other providers (AWS, Google Cloud, Azure, Linode, Digital Ocean) would charge more for objectively worse hardware and less outgoing traffic. The drawback is that you can only get a virtual machine in Germany or Finland, which may or may not suit your use case.

This blog is generated with Hugo and uses a custom template to keep the page size small. Unfortunately, the vast majority of Hugo templates out there are cobbled together with megabytes of cruft that you don’t really need for serving a simple blog. A typical blog post currently weighs around 100 kB on the first request (half of which is fonts). After the first request, the assets are cached on the client, so every subsequent request only has to transfer the HTML file, which is usually quite small (about 20 kB).
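A back-of-envelope sketch shows why client-side caching matters for the bandwidth bill. It assumes the rough page weights quoted above (~100 kB for a cold first view, ~20 kB for each later view), decimal units, and that every visitor arrives with an empty cache; the visitor and page counts in the example are hypothetical, not numbers from this post.

```python
# Page weights quoted in the post: a cold first view pulls HTML, CSS and
# fonts (~100 kB); later views reuse the browser cache and fetch only the
# HTML (~20 kB). Decimal units are assumed (1 GB = 1,000,000 kB).
FIRST_VIEW_KB = 100
CACHED_VIEW_KB = 20


def estimated_transfer_kb(visitors: int, pages_per_visitor: int) -> int:
    """Estimate outgoing kB, assuming every visitor starts with an empty cache."""
    if pages_per_visitor < 1:
        return 0
    per_visitor = FIRST_VIEW_KB + (pages_per_visitor - 1) * CACHED_VIEW_KB
    return visitors * per_visitor


if __name__ == "__main__":
    # Hypothetical traffic, not measurements from this post.
    for pages in (1, 2, 5):
        kb = estimated_transfer_kb(10_000, pages)
        print(f"10,000 visitors x {pages} page(s): {kb / 1_000_000:.2f} GB")
```

With five page views per visitor, caching brings the cost down to 180 kB per visitor instead of 500 kB without it; the uncached first view dominates, which is why keeping fonts and CSS small pays off.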

All the static files (fonts, CSS, images) are served locally by Nginx, which sits behind HAProxy. If there is ever a need to scale this blog beyond one server, adding a new server into the mix is a one-line change in the HAProxy config. The main reason I use HAProxy instead of Nginx for proxying the traffic is its excellent dashboard page.

The following screenshots were taken directly from Hetzner’s server dashboard on the first day of the raid:

CPU and disk throughput graphs during the Hacker News front-page drive-by

The CPU was mostly idling at 0-5 %. The CPU spike at around 18:00 was apparently caused by a broken iOS Hacker News client that kept making requests until fail2ban banned its IP. Hordes of people kept hammering the poor server, but the server didn’t care.
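fail2ban watches log files and bans an IP (usually via firewall rules) once a filter matches it too many times (`maxretry`) within a time window (`findtime`). The post doesn’t show the jail configuration, so the sketch below is only an illustration of that sliding-window idea in plain Python, not fail2ban’s code. The `FloodDetector` class, its thresholds, and the IP addresses are hypothetical.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Ban a client after more than `max_requests` requests within `window` seconds.

    Hypothetical illustration of the sliding-window idea behind fail2ban's
    maxretry/findtime settings; it is not fail2ban's implementation.
    Timestamps are assumed to arrive in non-decreasing order.
    """

    def __init__(self, max_requests: int, window: float) -> None:
        self.max_requests = max_requests
        self.window = window
        self.recent = defaultdict(deque)  # ip -> timestamps inside the window
        self.banned = set()

    def record(self, ip: str, timestamp: float) -> bool:
        """Register one request and return True if `ip` is banned."""
        if ip in self.banned:
            return True
        times = self.recent[ip]
        times.append(timestamp)
        # Forget requests older than the window (the newest one always stays).
        while timestamp - times[0] > self.window:
            times.popleft()
        if len(times) > self.max_requests:
            self.banned.add(ip)
            del self.recent[ip]
            return True
        return False


if __name__ == "__main__":
    # Hypothetical threshold: more than 20 requests in 10 seconds gets banned.
    detector = FloodDetector(max_requests=20, window=10.0)
    # A misbehaving client firing a request every 100 ms.
    broken = [detector.record("198.51.100.9", i * 0.1) for i in range(50)]
    # A human reader opening three pages a few seconds apart.
    reader = [detector.record("203.0.113.4", t) for t in (0.0, 4.0, 9.0)]
    print("broken client banned at request", broken.index(True) + 1)
    print("reader banned:", any(reader))
```

The misbehaving client is banned at its 21st request, while the human reader never comes close to the limit.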

Network traffic during the Hacker News front-page drive-by

Network traffic peaked at 150 KB/s, which is close to nothing for a server sitting in a data center with a 10 Gbps connection.
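To put the peak in perspective, here is the utilization arithmetic as a tiny function. It assumes that "KBps" on the Hetzner graph means kilobytes (1000 bytes) per second and that the full 10 Gbps is available to the server; both are assumptions, and the real per-VM limit may be lower, so treat the result as an order of magnitude.

```python
def link_utilization(peak_kb_per_s: float, link_gbit_per_s: float) -> float:
    """Return the fraction of a link's capacity used at the given peak rate.

    Assumes 1 KB = 1000 bytes and 1 Gbit = 10**9 bits.
    """
    peak_bits_per_s = peak_kb_per_s * 1000 * 8  # kilobytes/s -> bits/s
    link_bits_per_s = link_gbit_per_s * 10**9
    return peak_bits_per_s / link_bits_per_s


if __name__ == "__main__":
    share = link_utilization(peak_kb_per_s=150, link_gbit_per_s=10)
    print(f"150 KB/s is {share:.4%} of a 10 Gbps link")
```

150 KB/s is 1.2 Mbit/s, or about 0.012 % of a 10 Gbps link; reading KB as 1024 bytes barely changes that (about 0.0123 %).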

It’s a wrap

Occasionally I see blog posts about someone’s infrastructure for serving a blog, and it boggles my mind how much money they are throwing down the drain to serve a few pages. If you can’t handle 100,000 requests per day on a small virtual machine, you are doing it wrong. Tune your websites!

No request is faster than the one that was not made.

Ilya Grigorik, High Performance Browser Networking

Notes


1. Referral link: you get $20 for signing up, and I get some free server credits for every person who signs up with this link.