The making of StackEgg

April 06, 2015

A bit of history

Ever since I joined Stack Exchange in 2010, I've been the developer tasked with implementing the April 1st happenings on Stack Overflow and the other Stack Exchange sites. In fact, even before I was hired, I had a large part in the 2010 April Fool's gag, where for a day all users' avatars were replaced with random unicorns, created by my unicorn generator Unicornify, which itself has its roots on the Stack Exchange network.

My first April Fool's project as an employee, in 2011, provided unicorn animations whenever you upvoted or downvoted a post. Those unicorns were again created with Unicornify's rendering engine, which I had tweaked to allow me to animate the unicorns.

2012 saw a unicorn again, this time in the form of Clippycorn, a helpful assistant based on Clippy, but in the shape of a bent-wire unicorn that gave more or less helpful advice to the users.

In 2013 we introduced Chat with an expert, a chatterbot based on the principles of ELIZA, but with responses that were customized to problem-solving situations instead of psychiatry clichés, and with some additional more sophisticated features. The “expert” bot, codenamed Adviza (both a play on the word “advisor” and a contraction of “advanzed Eliza”), was not really very helpful, though it seemed convincing to some.

For 2014, we offered our own virtual currency called Unicoins, allowing users to mine coins and buy fairly useless power-ups.

And now we're in 2015, and this is the story of this year's April 1st feature, StackEgg. I say “feature” because it wasn't really a gag or prank with an even remote chance of being considered a serious new feature; rather, it was a game that offered a little bit of fun for the two days it existed.

The idea

Every year, around the beginning of March, somebody announces that “it's time to start thinking about April 1st”. This year, if I recall correctly, Laura was the first to bring it up. We had a Trello card on the Core Team board to drop ideas on. One of the first comments was from Jon, who suggested

MMO Pacman - regular users gain fake rep as they eat dots on the map. Fruits are badges. Mods/high-rep users can be ghosts. Stackman! Pacoverflow! Websockets!

Well, good thing we didn't end up going there, because Google did almost the same thing this year. I then dropped a comment with an idea that I had, although I couldn't yet see turning it into an April 1st thing:

I had a similar idea to Jon: A Tamagotchi that you have to keep alive for the day. (Probably not going to happen, but we're just brainstorming here.)

Turns out that the idea stuck anyway. During the next weekly team meeting (this was on March 9th), we created a Google doc that contained the basic idea at the top and then a brainstorming section for everyone to drop ideas in. Here's how the doc started:

Basic Idea:

  • Every site gets one Tamagotchi which represents the site. Your goal is to keep your ‘site’ alive & healthy together. Mechanics don’t have any effect on the site, but rather metaphorically represent a site (feed it with upvotes, clean up the poop via review, etc.)

  • Actions are named after site actions and roughly correspond to the things that make a site healthy,

    • e.g. upvotes = feeding (make it happier but fatter), close votes (or review actions) = clean up the poop (if you don’t it gets sick)

    • Note: these are not actual actions, just “themed” options in the UI. It’s all a metaphor, man.

  • Users vote together on the next action to take. Every X seconds (30?) the top-voted action is taken

Here are some examples of brainstorming ideas that did not actually make it into the final game (I only had three weeks after all):

  • Tamagotchis have a branching evolution path, based on the care they receive

    • So more Digimon than Tamagotchi

    • E.g. evolve from a pony to a unicorn or maybe to a narwhal

  • Give hints on the console

  • different looks depending on the reputation level. high rep users see a much cooler version. Eg: LED -> grayscale -> 8bit color -> true color

  • Every X minutes (30?) two site Tamagotchis are paired up and fight (“A challenger appears”)

Research and prototype

And so I started working on it. The first thing I did was installing a Tamagotchi app on my phone as a refresher of how the real thing worked. I did own an actual first-generation Tamagotchi back in the 90s, but that's been a while.

Then I began trying to figure out how stats and actions would work concretely. I attempted to model it after the real evolution of various Stack Exchange sites, looking at site analytics and creating tons of Excel spreadsheets that simulated various different models (including one in which I invented three numerical indicators – Engagement, Maturity, and Relevance – based on actual sites, which was not really helpful at all).

Long story short, simulating something close to reality with realistic numbers wasn't really going to work, so I went back to a simple zero-to-four-hearts model, and made a simple JavaScript prototype in which I played with how exactly stats should be influenced by actions and by other stats. At this point, the stats (Questions, Answers, Users, Quality, Traffic) and actions (Ask, Answer, Upvote, Downvote, Close) were already those that made it into the final game; only flagging for moderator attention didn't exist yet.

It still took a while until I ended up with game dynamics that I felt made sense, including a breakdown into game phases of increasing difficulty and increasing amounts of available stats and actions. These phases, based on the actual Stack Exchange site “life cycles”, were private beta, public beta, and launched (later renamed to graduated), with the final winning state dubbed “winning the internet”. On March 18th, I gave this playable prototype to my coworkers to try, and the feedback was very positive, and thus the core game logic was born. Except for some very small tweaks in numbers, the game at this point was identical to what actually ended up being played on April 1st.

Server-side implementation

And so I ported the prototype to C#. You can check out the details yourself; I published the core game on GitHub after April 1st was over. At the same time (we're on March 23rd now), Marc was extremely helpful in creating the backend implementation of recording and storing the voting data (the core game is a single-player game, but what really happened on April 1st was that many users at once played the game by voting on the next action to be taken), and also of creating the infrastructure for repeatedly evaluating the votes after each voting round. We piggybacked on the heartbeat functionality in one of our backend services called StackServer. This “heart” beats every ten seconds, and on every third heartbeat it would then evaluate the past voting round, for a round length of thirty seconds (we later reduced this to twenty).
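To make the round mechanics concrete, here is a minimal sketch of a heartbeat-driven voting round. It is written in JavaScript for illustration only; it is not the C# code from StackServer. The names createRound, castVote, topVotedAction and onHeartbeat are hypothetical, as are the assumptions that a user's latest vote replaces an earlier one and that ties go to the action that was voted for first.

```javascript
// Three heartbeats of ten seconds each make one thirty-second round.
const HEARTBEATS_PER_ROUND = 3;

function createRound() {
  // votes maps a user id to the action that user currently votes for.
  return { beats: 0, votes: new Map() };
}

function castVote(round, userId, action) {
  // Assumption: a later vote by the same user replaces the earlier one.
  round.votes.set(userId, action);
}

function topVotedAction(votes, quorum) {
  if (votes.size < quorum) return null; // not enough voters this round
  const counts = new Map();
  for (const action of votes.values()) {
    counts.set(action, (counts.get(action) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [action, count] of counts) {
    // Strictly greater: on a tie, the action seen first wins (assumption).
    if (count > bestCount) {
      best = action;
      bestCount = count;
    }
  }
  return best;
}

// Called on every heartbeat. Returns true when a round was evaluated.
function onHeartbeat(round, quorum, applyAction) {
  round.beats += 1;
  if (round.beats % HEARTBEATS_PER_ROUND !== 0) return false;
  const action = topVotedAction(round.votes, quorum);
  if (action !== null) applyAction(action);
  round.votes.clear(); // start collecting votes for the next round
  return true;
}
```

Shortening the round from thirty to twenty seconds in this sketch would mean changing HEARTBEATS_PER_ROUND to 2.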

I created a wrapper class (StackEggGameWrapper) that held a core game object (the StackEggGame), but that also stored and handled things that were not part of the core game: the time at which the current voting round would end, the quorum required during this voting round, the phases in which the actual game was no longer active (like after winning), and the current message that would be displayed in the UI.

We stored all the game state in Redis, serialized via ProtoBuf. That way we didn't need to make any database changes for data that we only needed for a few days, not to mention the fact that it's also blazingly fast – because Redis itself is very fast, and also because our Redis infrastructure has built-in webserver-local caching so that most of the time it didn't even have to call out to the Redis server.

Finally, I made sure that the current state of the game would both be available via an AJAX route, and be regularly broadcast over websockets, so the UI has something to display. I was able to easily use our existing websocket infrastructure (which we use to give realtime updates about new questions and similar things), so this required very little work.

Client-side UI

Turning the clock back a few days, on March 20th I also started working on the client, that is, the popup in which you played the game. I usually don't get to design any major new features (we have an awesome design team that does these things a lot better than I ever could), so here was a welcome opportunity to create something new. I started by making this mockup in Inkscape, which looks pretty close to what the final thing ended up like:

You'll notice that the primary color here was orange, and it actually stayed orange until a few hours before StackEgg launched, when I decided to go with something that looked a little less Stack Overflow-y.

The image of the keychain toy with the LCD in it was a quick Blender hack. I'm an amateur in Blender, and if you look closely (and especially if you're a Blender pro) you'll notice a lot of issues, but I think it turned out fine for the purpose.

Most of the animations in the UI (moving vote percentage bars, highlight when you clicked a vote button, etc.) used CSS3 transitions. One of the nice things about working on the April Fool's project is that it's absolutely okay to exclude anything below IE 10, and thus I could rely on transitions being available, which was awesome. They are a much less fragile way to do many things that you'd otherwise have to do in JavaScript.

Speaking of JavaScript, the client was extremely flexible. For example, if during the course of the day we had come up with another stat and another action for users to take, we would only have to make changes on the server, and the client would happily display the new stuff. The client had some default texts, for example for the descriptions that were displayed next to the vote buttons, but the server could override them in case it wanted to.
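As an illustration of that override behavior, here is a minimal sketch, assuming the server sends a list of actions that may or may not carry their own description. The state shape, the function name buildVoteButtons and the default texts are all hypothetical.

```javascript
// Hypothetical default texts shipped with the client.
const DEFAULT_DESCRIPTIONS = {
  ask: "Ask a question",
  answer: "Answer a question",
};

// Builds the button list purely from what the server sent, so an action
// the client has never heard of still gets a button.
function buildVoteButtons(serverState) {
  return serverState.actions.map((action) => ({
    id: action.id,
    // Server text wins, then the client default, then the bare id.
    description:
      action.description ?? DEFAULT_DESCRIPTIONS[action.id] ?? action.id,
  }));
}
```

Because the client iterates over whatever the server lists, adding an action in this sketch needs no client change.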

On March 25th the client was in a state where I could give it to my coworkers to try it out. It worked well, and the most important feedback was that 30-second voting rounds were just too long, and so we reduced the time to 20 seconds.

Animations

From the beginning it was clear that we needed tiny pixelated LCD animations as a nod to the real Tamagotchi. I decided to use a 32x16 pixel screen with three colors, “light” (the greenish LCD background), “dark” (the blueish “LCD on” color), and “medium”, in the middle between the two. Having a third color in addition to the classic on-and-off-only made it a bit easier to draw somewhat recognizable very-low-resolution images. It also came down to a nice two bits per pixel, with the fourth value being transparent.

On March 17th, I wrote a very simple image editor to draw the pictures (using a real image editing program would have made it much more complicated to work with these particular constraints and also to turn the images into a format that I could easily use for the animations). This image editor gave me a 171-character Base64-encoded string of the image bits, which was the serialization format I used to store the images in the JavaScript file.
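The length of 171 characters follows from the numbers: 32 x 16 pixels at two bits each are 1024 bits, and Base64 carries six bits per character, so 171 characters are needed when the trailing "=" padding is left off. The sketch below shows one way such an encoding could work. The bit order (row by row, two bits per pixel, most significant bit first) and the meaning of the four values are assumptions, not necessarily the format StackEgg used.

```javascript
const WIDTH = 32;
const HEIGHT = 16;
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// pixels: array of WIDTH * HEIGHT values in 0..3. Assumed meaning:
// 0 = transparent, 1 = light, 2 = medium, 3 = dark.
function encodeImage(pixels) {
  // Concatenate two bits per pixel into one long bit string.
  let bits = "";
  for (const p of pixels) bits += p.toString(2).padStart(2, "0");
  // Pad the tail with zero bits up to a multiple of six.
  while (bits.length % 6 !== 0) bits += "0";
  let out = "";
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }
  return out; // no "=" padding appended
}

// Inverse of encodeImage. Assumes the input only contains characters
// from ALPHABET; there is no error handling in this sketch.
function decodeImage(encoded) {
  let bits = "";
  for (const ch of encoded) {
    bits += ALPHABET.indexOf(ch).toString(2).padStart(6, "0");
  }
  const pixels = [];
  for (let i = 0; i < WIDTH * HEIGHT; i++) {
    pixels.push(parseInt(bits.slice(i * 2, i * 2 + 2), 2));
  }
  return pixels;
}
```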

On March 19th I created a simple programming language to define the animations in, and a compiler that converted programs into an instruction code format (simply a JavaScript array) that the client understood how to play. Why did I create this custom language, and not just write the animations directly in JavaScript? Several reasons:

  • Because I wanted to! Creating a programming language is fun, after all.

  • The animations were smaller that way, although admittedly chances are that gzip would've eaten up a lot of the disadvantage.

  • I needed some way to abstract away pauses. In my little language I could just write wait 300 to pause for 0.3 seconds until the next animation step. But of course this had to be translated into asynchronous execution when the animation was playing. And since wait instructions were some of the most frequent ones, pure JavaScript would have been a little awkward to use.

  • I like to put things into our April Fool's JavaScript for our enthusiastic users to reverse-engineer (which used to happen quite a lot, but in recent years has been a bit disappointing). I thought maybe someone would try to decipher the instruction code and create their own animations that the UI could play. I know of no such attempts, though.

  • Being constrained by this simple language prevented me from being too fancy with the animations, which wouldn't have been fitting for the idea of a tiny keychain LCD.

I tried hard not to make the language Turing-complete, but I failed in the end when I added the GotoTimesVar instruction, which can be abused to create a conditional. The source code of the simplest animation, the one where the StackEgg just idled back and forth on the screen when nothing else was there to be displayed, looked like this:

var eggx 4
var eggy 2

setxy eggx eggy

label walking
    label loopright
        clear
        inc eggx
        setx eggx
        picxy egg
        picxy eggeyes_right
        wait 300
    gototimes 11 loopright

    picxy eggeyes_down
    wait 300

    label loopleft
        clear
        dec eggx
        setx eggx
        picxy egg
        picxy eggeyes_left
        wait 300
    gototimes 11 loopleft

    picxy eggeyes_down
    wait 300
goto walking
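To show how wait instructions can be turned into asynchronous execution, here is a minimal sketch of a player for a compiled program. The instruction format (arrays of opcode and operands, with labels already resolved to instruction indexes) is hypothetical and much smaller than the real one, and the assumed meaning of gototimes is "jump back this many times, then fall through".

```javascript
// A shortened, hypothetical compiled form of a walking loop.
const sampleProgram = [
  ["set", "eggx", 4],       // 0: var eggx 4
  ["inc", "eggx"],          // 1: label loopright
  ["draw", "egg"],          // 2
  ["wait", 300],            // 3
  ["gototimes", 2, 1],      // 4: jump to 1 twice, then fall through
  ["draw", "eggeyes_down"], // 5
  ["wait", 300],            // 6
];

function createPlayer(program, draw) {
  return { program, draw, pc: 0, vars: {}, counters: {} };
}

// Runs instructions until the next wait. Returns the pause in
// milliseconds, or null when the program has ended.
function runUntilWait(player) {
  while (player.pc < player.program.length) {
    const index = player.pc++;
    const [op, a, b] = player.program[index];
    switch (op) {
      case "set": player.vars[a] = b; break;
      case "inc": player.vars[a] += 1; break;
      case "dec": player.vars[a] -= 1; break;
      case "draw": player.draw(a, { ...player.vars }); break;
      case "goto": player.pc = a; break;
      case "gototimes": {
        const used = player.counters[index] || 0;
        if (used < a) {
          player.counters[index] = used + 1;
          player.pc = b;
        } else {
          player.counters[index] = 0; // reset so an outer loop can rerun it
        }
        break;
      }
      case "wait": return a;
      default: throw new Error("unknown opcode: " + op);
    }
  }
  return null;
}

// The only asynchronous part: sleep, then continue where we stopped.
function play(player) {
  const pause = runUntilWait(player);
  if (pause !== null) setTimeout(() => play(player), pause);
}
```

Calling play(createPlayer(sampleProgram, (name, vars) => console.log(name, vars.eggx))) would run the animation in real time. Keeping the synchronous part in runUntilWait means the program source never has to mention callbacks or timers.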

By the way, while I created the image editor and the animator fairly early, the creation of the animations went on continuously. In fact I created the final good-bye animation only when people had already been playing StackEgg for more than a day – it was possible to push new animations to the client that weren't in the original JavaScript file (as mentioned, the client was very flexible). I also used this to make some animations “secret”, so a potential reverse engineer wouldn't be able to look at the animation until the server told the client to actually play it.

I have created a page on which you can see all animations that existed.

Approaching the deadline

The last few days before the launch consisted mostly of finishing up small but important details like

  • user settings, so fun-haters could disable the StackEgg, and also so that we could remember whether the user has ever interacted with the popup (before that, we would not animate the StackEgg widget in the sidebar; our very strict “no animated ads” policy counts even for things like this),

  • copywriting – there needed to be a help text,

  • compiling a list of example places for all full-hour timezones, in order to display “it is currently such-and-such time on April 1st in such-and-such place”, pre-empting user complaints that it's not April 1st yet (or anymore) where they live – because of the collaborative nature of the game, we had decided to enable StackEgg for everybody for the whole time it was April 1st anywhere in the world, and not, like in the previous years, only while it was April 1st for the particular user,

  • adding my library ByTheWay to enable multiple browser tabs that a user has open on a site to share data that comes in, reducing the amount of necessary server communication,

  • creating the leaderboard page that ranks all Stack Exchange sites by their StackEgg performance,

  • creating missing animations,

  • and general polishing, tweaks, and fixes.
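The timezone item in the list above boils down to a small calculation: for a given moment, find the full-hour UTC offsets at which the local date is April 1st. Here is a minimal sketch of that calculation, assuming offsets from -12 to +14 hours. The function name aprilFirstOffsets is hypothetical, and the mapping from offsets to example places is left out.

```javascript
// Returns every full-hour UTC offset at which the local date is
// April 1st of the given year at the instant `now`.
function aprilFirstOffsets(now, year) {
  const result = [];
  for (let offset = -12; offset <= 14; offset++) {
    // Shift the instant by the offset and read it back with the UTC
    // getters, which then yield the local wall-clock fields.
    const local = new Date(now.getTime() + offset * 3600 * 1000);
    if (
      local.getUTCFullYear() === year &&
      local.getUTCMonth() === 3 && // months are zero-based: 3 = April
      local.getUTCDate() === 1
    ) {
      result.push({
        offset,
        hours: local.getUTCHours(),
        minutes: local.getUTCMinutes(),
      });
    }
  }
  return result;
}
```

An empty result means April 1st is over (or has not started) everywhere. With these offsets, the window during which the result is non-empty lasts 50 hours, which fits a game that ran for about two days.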

Finally, and a little bit too late (it was already 23 minutes after midnight on April 1st in Samoa), I enabled StackEgg for everyone. And it was noticed quickly: Within minutes, there were over 60 users playing the game on Stack Overflow.

And then Stack Overflow went down.

Self-inflicted DDoS

It didn't go down with a bang; rather we noticed that page loads were becoming slower and slower until the site was unresponsive most of the time. You can read a bunch of details in the post-mortem on our status blog, but it's missing one detail (and deliberately so, because we believe in blameless post-mortems): the fact that the outage was completely my fault.

In my quest to add as little work as possible to the page request (because we want our pages to render as quickly as possible), I achieved quite the opposite. My idea was to load the initial data that StackEgg needed after the page itself had been rendered (because the page itself is what the user actually came for), and thus I loaded that initial data via an AJAX request.

What I didn't consider is that when you make an AJAX request so soon after the page load, the connection between the client and the server is kept alive, because it looks like there may be more requests coming. An AJAX request shortly after page load also didn't strike me as unusual, because I see them all the time on Stack Overflow. But of course that's because I'm a developer, and those AJAX requests are for things like MiniProfiler; page views by normal users don't initiate any immediate AJAX requests.

And so all those persistent connections quickly exhausted the maximum that was configured in our load balancer, and things went south.

In addition, I made two other mistakes: For one, there was no way to tell existing clients to stop polling the server for data (most updates came over websockets, but there was still some AJAX polling). Lesson learned here: If you have recurring AJAX requests in the page, have a kill switch so the server can tell the client to stop, because people keep pages open a lot (which, of course, they are not to blame for at all) even when not actively looking at them anymore.
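A kill switch of this kind can be very small. The following is a minimal sketch, not the fix that was actually deployed: the poller stops for good as soon as a response carries a stopPolling flag. The flag name, createPoller, and the injected fetchState and schedule functions are all hypothetical; on a real page, schedule would be setTimeout and fetchState an AJAX call.

```javascript
// fetchState(callback): performs the request, then calls callback(response).
// schedule(fn, ms): runs fn after ms milliseconds, for example setTimeout.
function createPoller(fetchState, onState, schedule, intervalMs) {
  let stopped = false;

  function poll() {
    if (stopped) return;
    fetchState((response) => {
      if (response.stopPolling) {
        // Kill switch: the server asked us to stop, so we never
        // schedule another request from this page.
        stopped = true;
        return;
      }
      onState(response);
      schedule(poll, intervalMs);
    });
  }

  return { start: poll, isStopped: () => stopped };
}
```

Injecting the scheduler and the request function keeps the sketch independent of any particular AJAX library.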

Funny thing: There actually was a way we could have done it, and we utilized it later, but in the stress of the immediate situation it didn't come to my mind at first. Remember how I said earlier that the client could load new animations after the fact? It did so by loading and executing a new JavaScript file that added the missing animation to the list of the client's known ones. But of course it was still just a JavaScript file, so who says it could only add that new animation, and not also, say, monkey-patch jQuery's $.get to not make the request if the URL matched a certain pattern?
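The monkey-patching idea can be sketched as follows. This is an illustration, not the script that was actually pushed to clients; the function name blockRequests and the URL pattern are hypothetical, and a plain stand-in object takes the place of jQuery so that the sketch runs anywhere.

```javascript
// Wraps `get` on the given jQuery-like object so that requests whose URL
// matches `pattern` are silently dropped.
function blockRequests(jq, pattern) {
  const originalGet = jq.get;
  jq.get = function (url, ...rest) {
    if (pattern.test(String(url))) {
      // Swallow the request. Caveat: callers that chain on the return
      // value (such as .done() on a real jQuery request object) would
      // need a harmless stub returned here instead of undefined.
      return undefined;
    }
    return originalGet.call(this, url, ...rest);
  };
}

// Stand-in for jQuery, for demonstration only.
const fakeJquery = {
  get(url) {
    return "requested " + url;
  },
};
blockRequests(fakeJquery, /^\/stackegg\//); // hypothetical URL pattern
```

After the patch, fakeJquery.get("/stackegg/state") does nothing, while other URLs still reach the original function.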

The other mistake was that the JavaScript was too aggressive in retrying requests after a failure, because I mostly considered this code to be handling bad connections at the user's end, not a series of non-200 responses from our side. And that's why temporarily disabling StackEgg made the situation even worse, because the resulting 404s caused even more requests to be made. The JavaScript throttled the retries back pretty quickly, but the initial retry was too fast.
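A common remedy for overly eager retries is exponential backoff with jitter. This is a general technique, not what StackEgg's client did. Here is a minimal sketch; the function name retryDelayMs, the five-second base and the five-minute cap are hypothetical values chosen for illustration.

```javascript
const BASE_MS = 5000;         // first retry waits up to five seconds
const MAX_MS = 5 * 60 * 1000; // never wait longer than five minutes

// failures: number of consecutive failed requests so far (1, 2, 3, ...).
// random: injectable source of randomness in [0, 1), for testing.
function retryDelayMs(failures, random = Math.random) {
  // Double the ceiling with every failure, up to the cap.
  const ceiling = Math.min(MAX_MS, BASE_MS * 2 ** (failures - 1));
  // Half of the delay is fixed and half is random, so many clients that
  // failed at the same moment do not all retry at the same moment.
  return ceiling / 2 + random() * (ceiling / 2);
}
```

The important property for an incident like this one is that even the first retry is slow, and that the same delay applies to non-200 responses from the server as to network errors on the user's end.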

Thanks to the awesome work of our sysadmin team (special thanks to Nick and Kyle), and through a few quickfixes and then later more thorough fixes in both StackEgg's server and client code, the crisis was averted quickly, and people could finally enjoy two days of playing StackEgg. In total, almost 460,000 votes were cast by more than 15,000 users, and the internet was won 454 times across the network.

And I could finally relax after three weeks. Don't get me wrong, this was a very cool project to work on. But it was also nice when it was finally done.

