If you have visited Ektoplazm anytime in the last two months you have probably experienced slow transfer rates, corrupted downloads, and/or significant downtime and unavailability. There has also been a problem with individual release download counts spiralling wildly out of control. These issues have been plaguing the site since early August. Usually, such things are easily remedied with an email to the technical support staff at my web hosting company, but not this time. Unfortunately I was completely consumed by the last of my undergrad degree requirements in late summer and was unable to devote my full attention to the problem. I have, since then, spent an inordinate amount of time trying to address the underlying issues. Now, having alleviated the worst of the problems, I figure it is time to post an “official” update about the situation. Be warned: things get a bit technical from here on in.
First, some details about the server. For the last several years Ektoplazm has been hosted by Dreamhost on a virtual private server (VPS) provisioned with the usual LAMP stack (Linux/Apache/MySQL/PHP). The entire site is a custom WordPress installation with caching provided by WP Super Cache and XCache. Some performance tuning of the front-end application was accomplished several years ago; since then I’ve mainly been adding content and leaving the underlying code alone. In terms of resource usage, I was chewing through something close to 20 TB of bandwidth per month and gobbling up nearly 1 TB of disk space. Ektoplazm has grown to be a bit of a monster as far as quasi-personal web sites go!
Back in late August it seemed as if the problem was simply that the site had become too popular: too many people were attempting to access the site at the same time and the web server was unable to keep up with demand. Technical support staff at my web host suggested several courses of action: switch to a lightweight web server, reduce hotlinking (e.g. direct linking to release packages from pirate blogs) and other abusive and/or illegitimate traffic, and address potential disk I/O bottlenecking.
After some research I decided to switch from Apache to Nginx, a lean web server noted for its high performance under load. With some help from this Nginx configuration for WordPress the site was up and running again without much downtime. It took some wrangling to get Nginx tuned for the specific needs of the site but it seems to be running relatively smoothly now. Switching to Nginx immediately solved the problem I had been having with Apache floundering under demand, so that’s good. I still have some things to test and a few bugs to chase down but progress has certainly been made.
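For the curious, the kind of tuning involved looks something like the following. This is a minimal, hypothetical sketch of an Nginx setup for a download-heavy WordPress site, not Ektoplazm’s actual configuration; the paths, domain, and values are illustrative only.

```nginx
# Hypothetical excerpt from nginx.conf; values are illustrative.
worker_processes  4;              # roughly one per CPU core

events {
    worker_connections  2048;     # concurrent clients per worker
}

http {
    sendfile           on;        # kernel-level file transfer, less copying
    tcp_nopush         on;        # send headers and file start together
    keepalive_timeout  15;        # don't hold idle connections for long

    server {
        listen      80;
        server_name example.org;      # placeholder domain
        root        /var/www/site;    # placeholder docroot

        # Serve WordPress: try the file, then the directory, then index.php
        location / {
            try_files $uri $uri/ /index.php?$args;
        }

        location ~ \.php$ {
            include       fastcgi_params;
            fastcgi_pass  unix:/var/run/php-fpm.sock;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }
    }
}
```

The key difference from a stock Apache setup is that Nginx handles static file requests in a small, fixed pool of worker processes rather than spawning a process or thread per connection, which is why it holds up better under heavy concurrent download traffic.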
Hotlink protection under Nginx was easy to implement. Serving 403s to users surfing in off of pirate blogs and forums did not sit well with me, however. To address this issue I rigged up a script to (hopefully) redirect users to the appropriate release download page with a bit of RegEx. It won’t help those who right-click and “save as…” but this should catch a good chunk of traffic and bring those users back into the fold. There is also a whitelist of sites from which direct links will be honoured. If you wish for your domain to be added to this list please contact me by email (particularly if you are a label partner that I have inadvertently blocked). Similarly, if you are having trouble downloading releases from a web-based RSS reader please email me with the domain name of your service of choice.
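The general approach can be sketched with Nginx’s referrer module. To be clear, this is not my actual configuration; the URL structure, domains, and file extensions below are hypothetical stand-ins.

```nginx
# Hypothetical referrer-based hotlink handling. Instead of a bare 403,
# visitors arriving from unrecognized referrers are redirected to the
# release's download page.
location ~ ^/files/(?<release>[^/]+)/.*\.(zip|mp3|flac)$ {
    # Whitelist: no referrer at all, stripped/blocked referrers,
    # the site itself, and approved partner domains
    valid_referers none blocked example.org *.example.org partner-label.com;

    if ($invalid_referer) {
        # Send hotlinked visitors to the download page for that release
        return 302 /free-music/$release/;
    }
}
```

The `none` entry matters: many download managers and privacy tools send no referrer header at all, and blocking those would lock out legitimate users along with the hotlinkers.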
Dealing with “illegitimate” traffic has turned out to be more of a hassle than I had initially thought. At first I experimented with Nginx’s Limit Zone and Limit Req modules. Together, these modules limit the number of connections and the frequency of requests from individual IPs, ostensibly to block abusive traffic. Here’s the catch: download managers (e.g. FlashGot) don’t play nice with these modules. It seems as if many people use automated browser plugins and third party applications to pillage web sites these days. Some of these download managers don’t take no for an answer; when their requests are denied they simply try again, over and over. This continues to trigger the limiting threshold, locking client and server in a struggle of wills that can last for hours.
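A setup along these lines looks roughly like this. The zone names and thresholds are hypothetical, and newer Nginx releases spell the connection-limiting directives `limit_conn_zone`/`limit_conn` rather than the older Limit Zone names.

```nginx
# Hypothetical rate-limiting setup using the limit modules discussed above.
http {
    # Track clients by IP: cap simultaneous downloads per address...
    limit_conn_zone $binary_remote_addr zone=perip:10m;
    # ...and cap new requests to 1 per second, with a small burst allowance
    limit_req_zone  $binary_remote_addr zone=reqs:10m rate=1r/s;

    server {
        location /files/ {
            limit_conn perip 2;
            limit_req  zone=reqs burst=5;
            # Rejected requests get an error response by default; an
            # aggressive download manager retries immediately, which
            # re-triggers the limit and keeps the cycle going.
        }
    }
}
```

The comment at the bottom is exactly the failure mode described above: the limits work as designed, but clients that retry in a tight loop turn each rejection into yet another request.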
Apart from needlessly tying up resources, this vicious cycle also has the effect of inflating download counts. Since labels and artists rely on these counts I’ve received many, many concerned emails about suspicious activity, most of which can be traced to the showdown between greedy download managers and the limiting modules I was testing out for a few weeks. The counts have since been reset, leading to more confusion, but there was no other way to get everything back to normal. In the process, I altered the script responsible for increasing the download counts, and such abusive traffic should no longer cause a problem. (I am being a bit vague here so that it isn’t obvious how to game the counts in the future.) Monthly and weekly download charts will take some time to regenerate as I had to clear the logs. In any case, the situation is now stable: the download counts are reasonably accurate and should remain that way.
While the Nginx limit modules certainly have some utility they turned out to be impractical for what I had in mind. The entire idea was a bit of a kludge anyhow. What I really wanted to do was to install and configure a firewall like iptables, but I was not able to do so while running on the old VPS. Since then I have upgraded to dedicated hosting and plan to configure a proper firewall to limit bad traffic before it even hits my web server. I am also looking into using CloudFlare; does anyone out there have any experience with it?
Finally, we arrive at the crux of the matter: disk I/O bottlenecking. Hard drives are slow. Think about how long it takes to copy a file from one hard drive to another. Now imagine copying hundreds of files simultaneously. Obviously the transfer rate for individual files will slow down considerably. This is essentially what has been going on in the last few months: demand reached such a high level in August that the hard disk serving files was unable to keep up. Transfer rates have been slow because the web server has had to wait a while for disk access.
Disk I/O bottlenecking can be addressed in a number of ways. Perhaps the most common approach is to reduce disk I/O operations and cache popular files in memory. I did my best to optimize Nginx for reduced disk I/O operations and higher static file throughput but this did not yield results; transfer rates remained slow. Caching won’t help; the files I am serving are too large to be stored in memory. Solid state drives and content distribution networks are prohibitively expensive. Simply mirroring content might work; I’ll be looking into that next. BitTorrent is not an option, and using third party file-sharing services like RapidShare would be terribly impractical and not at all future-proof.
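To give a concrete sense of what “optimizing Nginx for reduced disk I/O” means in practice, here is a hypothetical sketch of the relevant directives; the paths, cache sizes, and rate caps are placeholders, not my real settings.

```nginx
# Hypothetical directives aimed at the disk I/O problem described above.
location /files/ {
    # Cache open file descriptors and metadata so repeated requests for
    # popular releases skip the open()/stat() system calls
    open_file_cache          max=1000 inactive=60s;
    open_file_cache_valid    120s;
    open_file_cache_min_uses 2;

    # Hand transfers to the kernel and use larger output buffers
    sendfile        on;
    output_buffers  2 512k;

    # Throttle each connection after the first 10 MB so a handful of
    # greedy clients cannot monopolize the disk
    limit_rate_after 10m;
    limit_rate       500k;
}
```

Tuning like this reduces how often the server touches the disk per request, but it cannot conjure more throughput out of a saturated drive, which is why it only goes so far when the disk itself is the bottleneck.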
In the end I bit the bullet and upgraded to dedicated hosting at the cost of $209 per month. I no longer share a hard drive with other users. (Access to physical media on a VPS is often shared; this likely contributed to the slowdown.) Of course, pretty much all of the problems I’ve been having could have been addressed by feeding money into the machine but I have been very reluctant to do so given my personal financial situation. Ektoplazm is entirely supported by the community at this point; I did not wish to burden the community with increased monthly expenses without trying out other options. Still, as long as donations remain strong the site will live on. If it reaches a point where there is a shortfall I may have to scale things back somehow, but I am not seriously thinking about that. There is room for cautious optimism; support through the dark months that preceded this post has been strong. I can only hope that it will remain so in the months to come.
I’d like to thank everyone who stepped in to assist me in debugging the issues I’ve been having with the site, from providing simple speed reports to the expert help I received from a few sysadmins and developers. I’m not sure I would have straightened things out without the support of all the free music lovers out there. I still have a few server-related tasks to perform but the worst of it is now in the past.
Next up: a post about the future of the site. Now that I have graduated from university I plan to travel in the next few months. This will impact how frequently I am able to update the site with new music. I am also considering how feasible it might be to transform Ektoplazm into something much greater than it is today. I have a vision… I simply need to decide whether or not to pursue it as I enter into the next chapter of my life.
Photo credit: Maximum 40.