In this guest post, Allan Flores of Pinoy.FM shares his experience in managing high-traffic sites. Should you pay for dedicated hosting or use Amazon AWS or other grid server solutions for your web application? Pinoy.FM, launched in February 2008, is the fastest-growing music web application for Filipinos. When you’re in the mood for good music, listen to pinoy.fm.
If you have read Marie’s post of our interview, you are probably wondering about the logical and technical aspects of using Amazon Web Services and how you can apply this technology to your growing startup. Let me share how we use these services and the factors that led us to rely on this new (or not so new) technology. Most of the details are based on my experience as a technical lead for high-traffic websites such as TorrentSpy, MySpace and Fandango, and on how we apply those lessons to Pinoy.Fm’s infrastructure.
Defining the Problem
Virtualization, or the use of grid computing (sometimes referred to as cloud computing), is not for everyone. There are advantages and disadvantages you have to weigh before deciding whether this platform is good for your business. What we needed for Pinoy.Fm was purely computing power. Not storage. Not bandwidth (although bandwidth comes as a prerequisite for any web-related application). In fact, we needed not just extra computing power but computing power that is scalable: capacity we can expand or reduce as we find necessary, paying only for what we use.
Finding the right solution
If you are facing a similar challenge, then virtual computing may be right for you. We came up with three possible solutions to address the issue:
- Buy extra dedicated servers to enhance our streaming capability.
- Build our own servers and get a reliable co-location service.
- Use grid computing.
Solution #1: Buying Extra Dedicated Servers
Sure, this approach can easily address the problem. We don’t have to worry about hardware maintenance. We just need to find a reliable host and buy maybe 5 to 10 servers; we might even get a discount and 5-star customer support. But this approach comes with a reliability problem. If we buy all our servers from just one host and its network goes down, so does our business. Buying from multiple hosts solves that issue, but then we have to deal with a lot of people whenever we need to upgrade or do maintenance on these servers.
We are a startup. Our time is better spent doing something productive, not calling different technical support centers because we upgraded our servers and they need a reboot. Another downside is maintenance: with 10 dedicated servers, every time we upgrade our algorithm we need to push those changes to 10 servers. If you’re a technical person like me, you know how painful it is to push changes to more than one server.
Our website is a user-generated content site. This means that most (if not all) items you see on our site are user-submitted, including audio uploads. Having 10 dedicated servers would mean pushing all of that data to each server to keep them synchronized. That alone would probably consume our monthly bandwidth allowance, since we get 200 mp3 uploads per day, not even counting around 300 avatars. So this approach quickly dropped off the list.
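To put rough numbers on that replication cost, here is a back-of-the-envelope sketch in Python. The upload counts and the 10 servers come from the paragraph above; the average file sizes (5 MB per mp3, 50 KB per avatar), the 30-day month, and the function name are illustrative assumptions, not measured figures.

```python
def monthly_transfer_gb(mp3_per_day, avatars_per_day, copies_per_file,
                        mp3_mb=5.0, avatar_mb=0.05, days=30):
    """Estimate monthly transfer (decimal GB) for moving uploads around.

    mp3_mb and avatar_mb are assumed average sizes, not measured values.
    copies_per_file is how many times each uploaded file must be sent.
    """
    daily_upload_mb = mp3_per_day * mp3_mb + avatars_per_day * avatar_mb
    return daily_upload_mb * copies_per_file * days / 1000.0


# Ten synchronized servers: each upload lands on one server and must be
# pushed to the other nine.
replicated = monthly_transfer_gb(200, 300, copies_per_file=9)

# One shared store: each upload is moved exactly once.
centralized = monthly_transfer_gb(200, 300, copies_per_file=1)

print(f"replicated: {replicated:.1f} GB/month")
print(f"centralized: {centralized:.1f} GB/month")
```

Whatever the real average file size turns out to be, keeping ten servers in sync moves nine times the data that a single shared store would.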
Solution #2: Building Our Own Servers
This is the fanciest idea of all. I, myself, have never worked at a company that didn’t build its own servers. At TorrentSpy, for example, we built our servers, tested them by running them as game servers for a month, and then deployed them to production. How fun is that?
But we don’t have the cash. We are a startup, remember? A decent server can easily run $6,000 to $10,000, and that’s not the best you can get. The colocation service itself might save you some money; a nice 10 Mbps connection runs $88 per month. This seems feasible, except that after spending all that money on servers we would face the same problems as with dedicated servers. Worse, colocation providers normally do not offer support; at most you get two free reboots per month.
Solution #3: Grid Computing to the Rescue
All these issues are gone when you talk about Amazon Elastic Compute Cloud (EC2). No hardware problems, nor any issues related to them. Pushing changes to your application is very simple too: you rebuild your base image once, then restart all your instances from that image, and you’re done. Synchronization of data was a problem at first, until we learned that you can use Amazon’s other service, Amazon Simple Storage Service (S3), as a backup platform and mount it on any instance you wish. So we built a base image, attached our S3 storage to it, and now we just fire up any instance we need. That gives us enough computing power to stream on par with Last.Fm, except that we have only 3 people looking after all this on a part-time basis, compared to their 100+ employees. And we only have to move our files once, to a central location mounted on any of our server instances.
Amazon EC2 is highly reliable. It is the same infrastructure amazon.com runs its business on. Google App Engine, by contrast, is in its infancy and has yet to prove its reliability. Sure, Google’s network is reliable, but does Google run its own business on the same platform it is selling you?
Taking it to the next level
One thing I’ve learned about online business is to keep things simple. We try to follow that at Pinoy.Fm. At TorrentSpy, we used to have 14 million page views per day and only 3 people maintaining the site. That’s a lot of traffic for such a small team. How did we do it? Automation. Almost everything was automated, and that is what we are doing with Pinoy.Fm. Another great service from Amazon is SQS, which we use for our automation. At Pinoy.Fm, we collect hundreds of thousands of pieces of information every day: who befriends whom, who listens to what, who rated which tracks, which track is popular today, who sent what to whom. All of these require processing power beyond the regular request-and-response cycle, and we use SQS and SimpleDB to process them. Bundling Amazon’s web services into our system also saves us money, since the exchange of information between EC2 instances and S3 is FREE.
The next thing we are doing is automating the scaling of our server instances. The image below shows the performance metrics for one of our server instances.
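The queue-and-worker pattern described above can be sketched in a few lines of Python. This is only an illustration, not Pinoy.Fm’s actual code: Python’s in-process `queue.Queue` stands in for a hosted queue such as SQS, and the event fields (`kind`, `user`, `track`, `stars`) and function names are hypothetical.

```python
import json
import queue
from collections import Counter


def record_event(q, kind, user, **details):
    """Request-handler side: serialize the event, enqueue it, and return."""
    q.put(json.dumps({"kind": kind, "user": user, **details}))


def count_plays(q):
    """Worker side: drain the queue and tally plays per track."""
    plays = Counter()
    while True:
        try:
            body = q.get_nowait()
        except queue.Empty:
            break  # queue drained; a real worker would poll again later
        event = json.loads(body)
        if event["kind"] == "listen":
            plays[event["track"]] += 1
    return plays


events = queue.Queue()  # local stand-in for a hosted message queue
record_event(events, "listen", "ana", track="track-1")
record_event(events, "rate", "ben", track="track-2", stars=4)
record_event(events, "listen", "ben", track="track-1")
record_event(events, "listen", "cara", track="track-2")

plays = count_plays(events)
top_track, top_count = plays.most_common(1)[0]
print(top_track, top_count)
```

The point of the design is that the web request only pays for the enqueue; the counting happens later, on whichever worker picks the messages up.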

- Green – streams being processed
- Blue – requests in the queue
- Red – CPU consumption
- Yellow – I/O operations on current streams
We are planning to take grid computing to the next level by analyzing the red line. If at any time the red line goes over, say, 50%, and this is true for all running instances, our system can automatically fire up a new server to meet the demand. Likewise, if it falls below 30%, we can program our system to automatically kill the server with the lowest CPU usage. Once in place, this tool can save us a bunch of money and give us more time for things other than worrying about our infrastructure. The tool is currently being developed, and we plan to release it as an open source project.
There are a lot of grid computing products out there, and looking at the full picture, this may not seem to be the right product for you. But if you’re facing the same challenges we have, it might be well worth your time to try it and play with it. With some technical skills and the determination to save some money, you can make technology work for you, for less.
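As a rough illustration of that rule (not the tool itself, which is still in development), the decision step could look like the Python sketch below. The 50% and 30% thresholds come from the text; the function name, the instance ids, the `min_instances` floor, and the reading that scale-down also requires every instance to be below the low mark are assumptions.

```python
def scaling_decision(cpu_by_instance, high=50.0, low=30.0, min_instances=1):
    """Return ("launch", None), ("terminate", instance_id) or ("hold", None).

    cpu_by_instance maps an instance id to its CPU usage in percent.
    """
    if not cpu_by_instance:
        return ("hold", None)  # nothing to measure, so take no action

    readings = cpu_by_instance.values()

    # Scale up only when every running instance is above the high mark.
    if all(cpu > high for cpu in readings):
        return ("launch", None)

    # Scale down when every instance is below the low mark (an assumed
    # reading of the rule), but never drop below min_instances.
    if len(cpu_by_instance) > min_instances and all(cpu < low for cpu in readings):
        idlest = min(cpu_by_instance, key=cpu_by_instance.get)
        return ("terminate", idlest)

    return ("hold", None)


print(scaling_decision({"i-a": 72.0, "i-b": 55.5}))
print(scaling_decision({"i-a": 12.0, "i-b": 25.0}))
print(scaling_decision({"i-a": 72.0, "i-b": 25.0}))
```

Keeping a gap between the two thresholds means a fleet hovering around one value does not launch and kill servers in a loop.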
More info on Amazon SQS, SimpleDB, Amazon EC2.

Comments
Alvin Tan
July 9, 2008 9:29 pm
Great post, Allan! Thanks for sharing your experience.
Andre Marcelo-Tanner
July 11, 2008 9:37 am
Very interesting post. How hard is it to integrate your systems with Amazon Web Services? I mean common systems like PHP and MySQL; doesn’t AWS use a different format?
Also, for dedicated hosting, it’s true that pushing updates, especially large file updates, to 3+ servers is not the ideal way to do things; you’d probably want to set up a Network File System. Also, in a dedicated setup all your servers should be internally connected or in the same cabinet, so your data doesn’t go outside your host’s network, and they shouldn’t charge you for intra-network data.
Allan
July 11, 2008 10:55 am
@alvin : Thanks.
@andre : at the bottom of the article, there is a link to amazon ec2. from there you can find the page listing their public images. if you are running a LAMP stack, there are lots of pre-built images you can just run, maybe modify to fit your needs, and then save as your own private image. there are different distro flavors too.
i agree with the dedicated server setup that you mention. and as i said, the problem is in that very same setup: if the network hosting those servers goes down, all your servers go down as well. at pinoy.fm, we need redundancy with the least dependency among streaming servers.
Andre
July 20, 2008 2:32 pm
Though what happens when Amazon EC2 goes down?
Allan
July 20, 2008 10:59 pm
i knew you’d ask that. if amazon ec2 goes down, since amazon.com uses ec2 as well, they will lose $1.8 million per hour. so i think i will let them worry about keeping the service up. last year, ec2 downtime was close to 4 hours (for the whole year), mostly due to maintenance. this year it has been 2 hours (maintenance).
http://mashraqi.com/2008/06/amazon-down-costing-company-36-million.html
Allan
July 31, 2008 3:04 am
Hi all.
We just crossed our 100,000th member registration and 400 million streams. With this, we are releasing a new feature, “indie music”, where you can download mp3s for free.
We have secured Creative Commons licenses from independent artists and are starting to upload a few albums from the more than 3,000 that we collected.
We are kicking off with 5 albums from Allison Crowe. If you are a fan, you can get them now on our website for free.
We will be uploading 5-10 more albums per day as we try to balance the volume of downloads and internet traffic (and not overwhelm our servers). These are 192 kbps mp3s, and the quality is pretty good.
We incorporated a lot of features as well in our version 3 release yesterday. Check us out and get your free mp3s. No strings attached. Promise.
polhen
July 31, 2008 5:32 pm
Pareng Allan,
How are you doing?
Polhen