In This Series...
- Prelude to Eternity
- The Tale of Eternity: Part 1
- The Tale of Eternity: Part 2
- Interlude: See-Invisibility Exploit
- The Tale of Eternity: Part 3
- (more to come)
If you only care about the DDoS attack, scroll down.
I remember reading Paul Graham's essays, which kept stressing the importance of finding compatible co-founders. One or two of them also mentioned that poor founder relations are often a reason startups die. All praise the oracle.
The Golden Years
EternityRO started with a bang. Our IRC channel was flooded with regulars, and frequented by random players here and there who needed support. The work never seemed to stop. We barely slept, and implicitly divided up our availability so there was always at least one person taking charge. Our dedication paid off, and Eternity Version 1 grew to about 1,200 simultaneous players. Boom, bang--whatever you want to call it, we really couldn't have hoped for more. I was even scared we might go over our allocated 1 TB of bandwidth for the month.
The Dilemma
Alas, all good things must come to an end. After several weeks of operation, the stress etched away Ayumi's sanity, and I witnessed a cascade of poor decisions. From freaking out at players to disappearing randomly, it was obvious that something was wrong. Like any friend, I tried to talk to her - but I was met with hostility and anger.
If this were a smaller project, I would've said, "Screw the project. You're stressed out and overreacting, and I want to help. Sit down." That's what any good friend would have done. But thousands of people were depending on us, and by this time, almost two thousand dollars had already been invested by the population. I had to weigh an emotionally unstable co-founder against a semi-business, and the enjoyment of thousands of people.
Enter Snow and Aisha - two of the five women responsible for EternityRO's booming success. Aisha's experience and personality made her well suited for management, and Snow was a development powerhouse. After trying to lure them into Eternity for a few days, they offered their assistance... but at a price. They wanted me to fire Ayumi and Prodigy.
The Decision
How does one co-founder fire the others? I guess you don't - but there was no equity in this case. After deliberating and arguing with myself for a night, I begrudgingly agreed. In retrospect, I can say I made the right decision. While it may sound morally abhorrent, it might be better to look at this through a consequentialist lens. The alternative would've been to let Eternity die. Hey - I saw The Social Network! I'm not the only one who did this!
Removing influential community members is usually a delicate process. You need to have finesse, and yet generate enough momentum to carry the motion through. It's actually remarkably similar to the Needle Through Glass technique I learned. The trick to using a needle to penetrate a pane of glass involves throwing the needle straight, yet with enough force to pierce the glass. A fault on either end will either bounce the needle back or shatter the glass. I'm getting a little bit off topic; I'll explain in another post some time how this and other techniques like breaking bricks and metal bars work. For now, here's a picture from when I was learning:
I changed the passwords on every system, and even the ssh ports. I closed down their forum accounts, disabled their SVN access, and redirected their emails to a new support email I set up - all while they were asleep. The last thing I needed was an emotional overreaction. Eternity was going strong, and I wasn't going to let it break apart from the inside. I announced their departures as mutual, and announced Snow and Aisha shortly after. The community didn't care all that much.
The Hiring Process
Ayumi and the others freaked and eventually disappeared from the internet (not an easy task). To compensate for the missing manpower, I charged Aisha with recruiting new support staff. She solicited applications, and to my surprise, recruited 20 new support GMs (Game Masters). Twenty is an exorbitant number of people to introduce at once - but my objections were moot. Although Aisha was actually very small, cute, and huggable in real life, she scared me online.
Here's an approximate breakdown of what happened to the twenty keen recruits:
- 10 of them quit or stopped logging in after the first month due to stress, incompetence, and other factors
- 5 of them dropped out between the first and second month due to stress, and real life
- 2 of them dropped out between the second month and fourth month
- 1 of them was fired for outright cheating
- 1 of them was fired for conspiring with enemies/competitors
- 1 of them stayed on and proved to be both exceptionally useful and intelligent. Hello Griffin!
Training twenty people at once was downright painful, especially since they were all volunteers. We didn't pay any of our staff, so you might be curious why anyone would work for us. Well, people like to have a sense of power, and that's really why people become Game Masters - even though they'll tell you it's because they want to help the community.
So why did 85% (17/20) of our new recruits drop out? Some of you HR fanatics have probably already decided that we had a bad selection or orientation process. While this might be true of this specific incident, our future open-hiring showed similar patterns. No, the problem is deeper rooted than our practices.
To understand why we had such a ridiculous churn rate, we need to look at motivation. People are only motivated by power for so long; once the honeymoon rush is gone, they lose their incentive. Add constant player-abuse, cheating accusations, and pressure to perform - and all of a sudden the job isn't so appealing anymore. Since they don't have much to lose, most people either quit or simply disappear.
The Irony
However, there was one out of the 20 who stayed on board, firm and committed. In fact, after version 1, I gave him administrative powers and started sharing all my plans with him. He essentially took on the role of co-founder, and at times even put me in my place. But why didn't he quit? Why didn't he lose motivation? You could say it was because he loved the community, but I have a better answer.
After version 3 died, we had a reminiscent chat. Apparently the only reason he applied for a Game Master position was so he could cheat - but not in the classic way. Because all Game Masters could see which other Game Masters were online, Griffin used this to his advantage. He ran a small bot network on the server, and linked the software to his Game Master account. His script worked something like this (pseudocode):
if (GameMasters.Online.Count() > 0) { AllBots.logoff(); }
Basically, because his bots avoided the inherent cheat protection, a Game Master was required to catch them. If they were never logged on when Game Masters were online, no one could ever prove he was cheating. I laughed when I found out. Obviously, Griffin stopped doing this eventually, and probably stayed on for the same reason as I did: intellectual curiosity. Several of our other staff members were also previously cheaters - ha.
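For the curious, here is the same rule as a small runnable Python sketch. It is only my restatement of the pseudocode, not Griffin's actual script: the `Bot` class, the `hide_bots_if_watched` function, and the idea of excluding his own Game Master account from the count are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bot:
    """Hypothetical stand-in for one automated character."""
    name: str
    online: bool = True

    def logoff(self) -> None:
        self.online = False


def hide_bots_if_watched(online_game_masters: list[str], own_account: str, bots: list[Bot]) -> bool:
    """Log every bot off if any Game Master other than the owner is online.

    Returns True when the bots were logged off.
    Assumption: the owner's own account appears in the online list and
    must be ignored, otherwise the bots could never run.
    """
    watchers = [gm for gm in online_game_masters if gm != own_account]
    if watchers:
        for bot in bots:
            bot.logoff()
        return True
    return False
```

Called with only his own account online, nothing happens; as soon as a second Game Master shows up in the list, every bot goes offline.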
The DDoS Attack
Recently, a wise man told me "when you run a successful tech startup, there are two truths: you will be DDoS'd and you will be sued." One night, I was sitting peacefully at my computer, when the server froze. After being pelted with close to a hundred calls/texts/emails/forum PMs/IRC messages and frantically trying to SSH into my box, I decided to log into my control panel to check my bandwidth. Oh my f***. Here's what I saw:
Distributed Denial of Service Attack Explained
The Distributed Denial of Service Attack--otherwise known as a DDoS attack--is a cyber attack in which many different computers flood a target with data at the same time. The theory behind a DDoS attack is that if you can overwhelm the victim's resources while making your requests indistinguishable from legitimate requests, then the host can't protect against it.
DDoS attacks are usually done through compromised machines. For example, someone may send you a virus which stays dormant until it receives a command. Upon activation, the virus will use your computer to send packets of data to whatever server the controller specifies. This happens on all compromised machines, which means the originating IPs of these attacks could be spread all over the world. As the victim, you can't filter out IPs or subnets without risking filtering out your legitimate users.
At a certain point, the victim is overwhelmed with data. Either the attack overloads the bandwidth capability of the server, or it strains the hardware or software in the machine. Eventually, if the attack is successful, the server is unable to operate properly. In our case, the bandwidth capability was overloaded.
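To make the "distributed" part concrete, here is a back-of-the-envelope sketch in Python. Every number in it is made up for illustration (I don't know how many machines hit us or how fast each one sent), and `flood_summary` is a hypothetical helper. The point is only that many slow senders can saturate a link while each one stays under any sensible per-IP limit.

```python
def flood_summary(bots: int, mbps_per_bot: float, link_mbps: float, per_ip_limit_mbps: float) -> dict:
    """Compare the combined traffic of many senders against the victim's link.

    All parameters are hypothetical inputs, not measurements from the attack.
    """
    total = bots * mbps_per_bot
    return {
        "total_mbps": total,
        # The link is overwhelmed once the combined traffic exceeds its capacity.
        "link_saturated": total > link_mbps,
        # A per-IP limit only flags senders that individually exceed it.
        "bots_flagged_by_per_ip_limit": bots if mbps_per_bot > per_ip_limit_mbps else 0,
    }


# 2,000 machines sending 0.5 Mbps each against a 100 Mbps uplink,
# with a per-IP limit of 1 Mbps (all numbers invented).
summary = flood_summary(bots=2000, mbps_per_bot=0.5, link_mbps=100, per_ip_limit_mbps=1.0)
```

In this made-up scenario the link receives ten times what it can carry, yet not a single sender trips the per-IP limit.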
Containment Efforts
I googled this, asked friends, and even emailed professors and experts in security, asking how I could stop these DDoS attacks. I remember a lot of jargon about APF, firewalls, etc., but here's one of the more helpful responses I received:
So I had a brief chat with Dr. McHue about your problem and depending on how severity the DDoS attacks, you might be screwed. He actually recommend the IP caching filtering approach that I suggested (I was surprised... I was right) however he indicated that you will probably need some dedicated hardware for it because the lookups will eat an entire CPU, possibly more. You also want to make sure you store the IP lookup table entirely in memory. Finally, this machine has to be located far enough up the network so you minimize the packet lost due to high traffic; this might mean putting the machine at the ISP even.
Other things that he suggest is simply changing the IP of the machines you have and see how long it takes for the attackers to update. Depending on how long it take, you might get some clues as to how professional these attackers are. The second suggestion he made is to incorporate some sort of distributed architecture - multiple machines for logging in and clusters for the game state. The greater the distribution, the more difficult it is to get flooded with traffic.
What did you guys do to piss someone off enough to DDoS you?
Basically... we needed specialized hardware, or (and this is key) we were screwed. Even more important is the last sentence of that email!
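The email never spells out what the "IP caching filtering approach" actually looked like, so the following is only one possible reading, sketched in Python: keep an in-memory table of addresses that behaved legitimately before the attack, and during an attack let only those through. The class name `KnownGoodIpCache` and its methods are hypothetical, and the addresses are from the reserved documentation ranges. A real filter would sit on dedicated hardware upstream and work on raw packets, not in a Python process on the game server.

```python
import ipaddress


class KnownGoodIpCache:
    """In-memory set of IPv4 addresses that previously behaved legitimately."""

    def __init__(self) -> None:
        # Addresses are stored as integers so each lookup is a single set probe.
        self._known: set[int] = set()

    def remember(self, ip: str) -> None:
        """Record an address, e.g. after a successful login (assumed trigger)."""
        self._known.add(int(ipaddress.IPv4Address(ip)))

    def allows(self, ip: str, under_attack: bool) -> bool:
        """Accept everyone in normal operation; only known addresses during an attack."""
        if not under_attack:
            return True
        return int(ipaddress.IPv4Address(ip)) in self._known


cache = KnownGoodIpCache()
cache.remember("203.0.113.7")  # a returning player
```

The trade-off is obvious: brand-new players get locked out for as long as the attack lasts, which is part of why I never went down this road.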
My... Roundabout Solution
Well, I was stumped, but I made some assumptions that proved useful in solving the problem:
- EternityRO is a game, so whoever is DDoSing me probably isn't really all that professional.
- If I make it more expensive and risky to DDoS me, they might stop.
- Additional bandwidth probably costs me less than acquiring new compromised machines costs them. If they instead pushed more packets out of each machine, their attacks would become detectable and I could filter them out.
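The last assumption leans on per-machine detection: if a single address suddenly sends far more packets than a player ever would, it stands out. Here is a minimal Python sketch of that idea, a sliding-window counter per IP. I never actually built this; `PerIpRateDetector`, the threshold, and the window length are all hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class PerIpRateDetector:
    """Flag an IP once it sends more than max_packets within window_seconds."""

    def __init__(self, max_packets: int, window_seconds: float) -> None:
        self.max_packets = max_packets
        self.window_seconds = window_seconds
        self._seen: dict[str, deque] = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one packet; return True if this IP now exceeds the limit."""
        times = self._seen[ip]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) > self.max_packets


detector = PerIpRateDetector(max_packets=3, window_seconds=1.0)
flags = [detector.record("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
```

With a limit of three packets per second, the fourth packet inside the same second is the one that gets the address flagged. An attacker who stays under the limit on every machine needs more machines, which is exactly the cost I was betting they wouldn't pay.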
Now, at this point, I had consulted Razor Servers on what I should do. Unfortunately, their response to stemming a DDoS attack was to null route my IP - which is a fancy term for turning my machine's connection off. This really pissed me off, but I never stopped using Razor Servers. Their services are generally fantastic and well priced, but their customer service needs a lot of work. Rather than keep my main box there, I rented out several virtual private servers for secondary functionality - but I'll get to that later.
I did a quick Google search and found that Softlayer offered large uplink ports, a Cisco Guard, and a server in Washington DC. Having no alternative, I moved my server to Softlayer. They gave me a 1 Gbps--1,000 Mbps--line for just $10/mo! Obviously, I couldn't use that much all the time, but I also doubted DDoS attackers could keep their attacks up for all that long. I essentially called the bluff and won. After a while, we stopped receiving DDoS attacks altogether.
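To see why a 1 Gbps port is not something you run flat out, here is a quick calculation in Python. It borrows the 1 TB monthly allowance I mentioned for the original server purely as a reference point; I'm not claiming that was Softlayer's quota, and `seconds_to_transfer` is just a helper for this example. Units are decimal (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second).

```python
def seconds_to_transfer(terabytes: float, link_gbps: float) -> float:
    """Time to move a given volume over a fully saturated link."""
    bits = terabytes * 1e12 * 8      # decimal terabytes -> bits
    return bits / (link_gbps * 1e9)  # bits / (bits per second)


seconds = seconds_to_transfer(terabytes=1, link_gbps=1)
hours = seconds / 3600
```

A saturated 1 Gbps line would chew through a terabyte in 8,000 seconds, a little over two hours. The big port was headroom for riding out spikes, not capacity I could afford to use continuously.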
I guess this wasn't the best solution, but it was the solution we were looking for. I'd like to think running any project is about judgment calls - and it wouldn't have been worth our effort to find a technical solution to such a big problem. For a larger corporation, that may not be the case.
More to come... I'll post more as I write it. Follow me on twitter (@zeteg) for updates, or email me if you have questions.