Monday, March 21, 2011

TowerMadness Retrospective Part 2


Hey everyone,

welcome back! This week, I'm going to write about our decision to go to three dimensions, and how we did that technically.

Why 3D?

As you may remember from my last post, after a few weeks we actually had a fully working 2D game that was ready to ship, modulo artwork. We then decided to pimp the look of the game by bringing Arash onto the team. At this point, we made the decision to go 3D, but it involved a lot of thinking. Some of the core arguments:

  • We were 3 people on the team now, all of us having a degree in 3D Computer Graphics
  • The artist was significantly more experienced in creating 3D artwork than creating 2D artwork
  • At that time, there weren't any 3D RTS games on the AppStore
  • Last, but not least, we love 3D, that's why we studied it. Since TowerMadness was a fun project, we figured we wanna pick what is most fun to us.
There is one key problem with the third argument, being the only 3D RTS game on the AppStore. That is, we got beaten to the market by Star Defense. Another Tower Defense game. In 3D. Featured at a Steve Jobs keynote. And they were using spherical maps. You can imagine we were pretty shattered for a short bit. But we didn't give up, we took TowerMadness and made it more beautiful, faster, and more fun. And now, almost two years later, we're still here and kickin'. Seems like our passion helped us over that hurdle. Our lesson: don't give up, even if (for the moment) it looks like you just got beaten in every possible regard.

Anyways, back to 3D engines.

3D Engine Overview

I'm reluctant to even call what we have in TowerMadness an "engine". What it really is: a very pragmatic collection of utilities for, and a wrapper around, OpenGL ES 1. The features:

  • It caches the OpenGL states, and allows data-driven modification of these through custom material files, like (yes, we love JSON :)
"lightning": {
"tex": "lightning",
"depthmask": 0,
"color": [1.0, 1.0, 1.0, 1.0],
"cull": "none",
"blend": "alpha"
}
  • It can load .pvr textures (by the way, check out my little PVR Quicklook plugin: https://github.com/Volcore/quickpvr)
  • It loads a .texture file, which has additional information about textures, such as filter modes, like
{
"type": "pvr",
"file": "landmineA",
"mag": "linear",
"min": "linear_mipmap_linear"
}
  • It loads .vbo files, which are nothing more than externally pre-compiled vertex and index buffer objects that get loaded straight onto the GPU. We wrote a little C++ tool that loads the .obj and outputs the .vbo file
  • It loads .model files, which combine all of the above, linking a material with a vbo file, like:
"landmineA": {
"vbo": "landmineA",
"material": "landmineA"
}
  • It supports a point sprite cache, where you can queue up point sprites with certain materials, locations, sizes, and they get all rendered in one batch per material later
  • It doesn't free anything. Since the RAM and GPU footprint of TowerMadness is really small, we get away with that
  • Everything is handle based. This turned out to be great. If a model couldn't be loaded, or we forgot to load it, it didn't crash the game. Rather, it always showed a little colored cube. In one instance, we forgot to add a .model and .vbo file to Xcode, and the game shipped without the flamethrower level 3 model. Instead, it would show the "flaming cube of death" (see below), as its fans called it. Imagine it had crashed the game instead.
And that's it, our "Engine". In a nutshell, to the game code it's not more than PGL_loadModel and PGL_drawModel, with a lot of OpenGL transforming and matrix pushing. But that's all it needs. There is no in-engine culling (although we experimented with that, more on this later), no post processing, no lighting, no multi-texturing, no particle systems.
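To make the handle idea a bit more concrete, here is a minimal sketch of how such a lookup could work. This is not the actual TowerMadness code: every name here (models_init, model_load, model_resolve, the file_exists flag standing in for real disk access) is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* All names are hypothetical; the real PGL_* internals are not shown in the post. */
#define MAX_MODELS 64
#define FALLBACK_MODEL 0            /* handle 0 is always the placeholder cube */

typedef int ModelHandle;

typedef struct {
    char name[32];
    int  loaded;                    /* 1 if the geometry was found */
} Model;

static Model g_models[MAX_MODELS];
static int   g_model_count = 0;

/* Register the placeholder cube as handle 0. Call once at startup. */
void models_init(void) {
    memset(g_models, 0, sizeof g_models);
    strcpy(g_models[FALLBACK_MODEL].name, "fallback_cube");
    g_models[FALLBACK_MODEL].loaded = 1;
    g_model_count = 1;
}

/* Always returns a usable handle. file_exists stands in for real file I/O. */
ModelHandle model_load(const char *name, int file_exists) {
    assert(name != NULL);
    if (!file_exists || g_model_count >= MAX_MODELS) return FALLBACK_MODEL;
    ModelHandle h = g_model_count++;
    strncpy(g_models[h].name, name, sizeof g_models[h].name - 1);
    g_models[h].loaded = 1;
    return h;
}

/* Resolve any handle, even a garbage one, to something drawable. */
const Model *model_resolve(ModelHandle h) {
    if (h < 0 || h >= g_model_count || !g_models[h].loaded)
        return &g_models[FALLBACK_MODEL];
    return &g_models[h];
}
```

The point of the design is that the game code never sees a NULL pointer or an error code. A missing file simply resolves to the cube, which is ugly but obvious and harmless.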

And that's a little bit odd. When thinking about 3D engines, it always seems like this magical thing that has so many fascinating features. But those features often clutter the most important one, which is to place a certain model at a certain position. And that's what you need most of the time, for a regular game. As such, the engine should be optimized to make that very easy to do.

Pipeline

To make it easy to place objects into the game, having a good art pipeline is extremely important. This includes every step the artist needs to do in order to put his new model into the game. This is so important because the longer the pipeline, the longer the iteration times.

As such, ideally, you'd have something where a plugin in the DCC tool exports the object right into the game. But since we're indies, and we normally don't have time or money to develop these tools ourselves, we need to improvise. And since we're agile, we start out somewhere, and then iterate with artist feedback until it's good enough.

Initially in our engine, the artists had to export their models as .obj, convert them to .vbo, create a .model, convert the texture to .pvr, create a .texture, create a .material file, then add everything to Xcode and add the PGL_loadModel/PGL_drawModel commands. That's a lot of steps, especially considering that most of them will be identical. We realized that (way too late in the development, though), and optimized this a little. The .model, .texture and .material files looked the same for most of the objects: solid, trilinearly textured models, where the filenames of the texture and vbo are the same, modulo the file extension. Hence, we made those files optional, and saved the artist a lot of work, while also avoiding a lot of very small files.
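As a rough sketch of what "optional files" can mean in code: when no .model, .texture or .material file exists, the loader just derives everything from the model name. The struct, the function name and the "none" blend value below are my assumptions for illustration; only the filter strings and file extensions come from the examples above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical description of everything needed to draw one model. */
typedef struct {
    char        vbo_file[64];
    char        texture_file[64];
    const char *min_filter;
    const char *mag_filter;
    const char *blend;
} ModelDesc;

/* Fill in the defaults used when the optional files are missing:
 * same base name for geometry and texture, trilinear filtering, no blending. */
void model_desc_defaults(ModelDesc *d, const char *name) {
    assert(d != NULL && name != NULL);
    snprintf(d->vbo_file, sizeof d->vbo_file, "%s.vbo", name);
    snprintf(d->texture_file, sizeof d->texture_file, "%s.pvr", name);
    d->min_filter = "linear_mipmap_linear";
    d->mag_filter = "linear";
    d->blend      = "none";   /* assumed spelling for a solid, non-blended material */
}
```

A loader would call this first and then overwrite individual fields only if one of the optional files is actually present, so the special cases (like the lightning material) still work.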

Performance

I'm a performance fetishist. At least to some degree. I can't stand if a game doesn't run at 30/60 hz (depending on the genre). I will push the team for days to find the cause of stuttering, slow downs, etc.

But I'm also refraining from low level optimization as much as possible.

I believe it's much more important to properly bound and control the amount of stuff rendered than to get the last 5% out of the rendering code to be able to render 5% more stuff on screen, especially if it comes at the cost of ease of use and maintainability.

This is what went terribly wrong in TowerMadness. Not only were the new levels constantly pushing the size, number of spawn points (and hence number of aliens), build-able spaces and doodads on the screen, and not only was the art constantly getting a few more triangles here, a few more triangles there. We also defied all reason, and added an endless mode to the game. Before the endless mode, every level was limited in time and complexity. Because we knew how much money you were getting in total, we knew how many towers you would be able to build at most. And we knew how many aliens were there at most, because even if you slowed all of them, there was a finite and usually small number. But we (and the fans) wanted more. More towers, more aliens, more everything. So we added that endless mode, and suddenly there was no upper bound anymore. The players could send wave over wave, slow them down, populate large maps with many many towers, until everything was maxed with (rendering and cost) expensive nuke towers. As such, we're regularly getting complaints about the game becoming unplayably slow after some absurdly high wave that the game was never designed for, in endless mode. To date I'm not sure if adding the endless mode was a good idea.

In general though, for TowerMadness the main performance problem in rendering is that we have a lot of stuff on the screen. There are many doodads (trees, barn, sheep, etc), aliens and towers. Just to give you some estimated numbers, I think there are about 40-50 tree groves, 1-50 towers, 0-400 aliens and 10-20 doodads on the screen. And the opportunity for batching is very limited. However, I think the most important reason for why the game runs as fast as it does is that we're doing a lot of "smart" batching. Eg. when rendering the towers, we first render all the tower bases, then render all the towers of one type, towers of the other type, etc. That way we minimize the state changes, but without actually performing any form of sorting on the engine side. It's all pre-sorted.
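Here is a small sketch of that "pre-sorted" idea, assuming a renderer that only touches GL state when the material actually changes. The Renderer struct merely counts state changes and draw calls instead of calling OpenGL, and all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { TOWER_TYPES = 3 };               /* hypothetical number of tower types */

typedef struct { int type; float x, y; } Tower;

/* Stand-in for the GL state cache: counts material switches and draws. */
typedef struct { int bound_material; int state_changes; int draw_calls; } Renderer;

void renderer_reset(Renderer *r) {
    r->bound_material = -1;
    r->state_changes  = 0;
    r->draw_calls     = 0;
}

static void draw_tower(Renderer *r, const Tower *t) {
    if (r->bound_material != t->type) {  /* only touch state when it differs */
        r->bound_material = t->type;
        r->state_changes++;
    }
    r->draw_calls++;
}

/* Naive: draw towers in whatever order they were built. */
void draw_towers_in_storage_order(Renderer *r, const Tower *towers, size_t n) {
    assert(towers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) draw_tower(r, &towers[i]);
}

/* Grouped: one pass per type. No sorting, the loop order does the batching. */
void draw_towers_by_type(Renderer *r, const Tower *towers, size_t n) {
    assert(towers != NULL || n == 0);
    for (int type = 0; type < TOWER_TYPES; type++)
        for (size_t i = 0; i < n; i++)
            if (towers[i].type == type) draw_tower(r, &towers[i]);
}
```

With six towers of types 0, 1, 0, 1, 2, 0, the storage-order pass switches material six times, while the grouped pass switches three times for the same six draw calls. The price is walking the tower list once per type, which is cheap for a few dozen towers.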

We actually spent quite a while on optimizing. One other approach was to add frustum culling to every PGL_drawModel call. While it worked and culled a lot of stuff, the problem was that the generic culling test was about as expensive as many of the rendering calls it saved, which gave us no speedup at all. In the worst case (fully zoomed out, so nothing gets culled), the game ran at about 50% of the speed it had without culling.

Then we did a "smarter" culling approach, where we just cull certain things, like towers, which have several draw calls. This actually gave a nice speedup.

The main problem, though, was the trees. Here we tried several things:

  1. Just render each tree grove separately
  2. Same as above, but with frustum culling
  3. Put all groves into one Vertex Array (VA) and render that (preprocessed, avoids state changes)
  4. Put all groves into one VBO and render that (preprocessed, avoids state changes)
  5. Put all groves into one VBO with dynamic culling (on the fly)
I sadly don't have the numbers anymore, and they'd probably be outdated by now, because it was optimized on my old iTouch2G, with an MBX. There is one curious result though. On the MBX, using a VA was as fast as using a VBO, even if uploaded as a preprocessing step and then not changed in any frame. Apparently, the VBOs were _retransmitted every frame_ on the old MBX hardware! This is (luckily) no longer true for the SGX.

Eventually, we ended up using the second technique, rendering every grove separately and using a simple form of frustum culling.
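I can't reproduce the exact test we used here, but a "simple form of frustum culling" usually boils down to something like the following sketch: one bounding sphere per grove, tested against the six frustum planes. The types, the function names and the inward-facing normalized plane convention are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Plane a*x + b*y + c*z + d = 0 with a unit-length normal pointing into the frustum. */
typedef struct { float a, b, c, d; } Plane;
typedef struct { float x, y, z, radius; } Sphere;

/* Returns 0 only if the sphere is completely behind at least one plane. */
int sphere_visible(const Plane planes[6], const Sphere *s) {
    assert(planes != NULL && s != NULL);
    for (int i = 0; i < 6; i++) {
        float dist = planes[i].a * s->x + planes[i].b * s->y
                   + planes[i].c * s->z + planes[i].d;
        if (dist < -s->radius) return 0;
    }
    return 1;
}

/* Counts the groves that would be drawn; a real renderer would draw them here. */
size_t count_visible_groves(const Plane planes[6], const Sphere *groves, size_t n) {
    size_t visible = 0;
    for (size_t i = 0; i < n; i++)
        if (sphere_visible(planes, &groves[i])) visible++;
    return visible;
}
```

The test is conservative: a sphere near a frustum corner can pass even though it is off screen, but nothing visible is ever dropped. Doing this once per grove rather than once per draw call is what keeps the cost of the test below the cost of the rendering it saves.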

3D Engine: The Bad

We have a very curious thing that comes up every once in a while. Some players, and reviewers, complain about how we're not really using the 3D engine to do crazy things. Apparently, having a 3D engine means to them that it has to be used. However, I beg to differ. In the end, I think it didn't do us much harm.
The other problem with 3D is that it's most likely going to be slower and more work than making a 2D game, so I guess you need to know what you're doing and you should like to suffer when you do this.

3D Engine: The Good

One thing I haven't mentioned yet is that since we made the game in 3D, it scales up very well to different display sizes. Going to the iPad and later Retina display, we had very little work to do. As far as I remember, it was just UI stuff that needed to be scaled up.
Other than that, we just love 3D, so that's why we did it.

Would I do it again?

Yes. I think it turned out really well, and matched perfectly to the skills of the team. And remember that this was designed for the iPhone hardware of the first generation. With the A5 out now, this is going to be a wild place. We've done some very cool stuff in our upcoming titles, but I can't talk about that. Not today :-)

Next time I'll be writing about my little AppEngine-bag-of-tricks and how I got the TowerMadness server to scale to millions of requests per day.

Monday, March 7, 2011

TowerMadness Retrospective Part 1

[Note: Blogger owned me a little bit when I tried to put the formatted post in here, so the version that went up on iDevBlogADay had some issues. They've since been fixed.]


I’m excited to finally be able to contribute to this collective of great iPhone indie wisdom! Over the next weeks, I’ll talk about some technical decisions and features of our first game, TowerMadness, and how they turned out in the end. While this post is meant to give others with similar design decisions more input, it also helps me to realize what was and is going on.

But first...

…let me give you an idea of who I am, and what qualifies me to write on the #iDevBlogADay list. I’m Volker, co-founder, Shogun of Technology, Emperor of Web Development and King of Game Design at Limbic Software. In short, I’m the one to blame if any of our apps doesn’t perform well, either on the performance, stability or gameplay side. I met the other two co-founders while doing my master’s at UCSD (for which, after 2 years, I finally submitted a paper, yay!), where the company was born in early 2009. After a very short PhD period in late 2009 in Aachen, I’m also proud to be able to call myself a college dropout. It was at the time when TowerMadness was starting to get really successful, and I had the chance to become a full-time indie gamedev. Now, in 2011, we’ve been to #1 on the App Store, earned many awards, and we’ve got more than 5 million downloads. What a crazy journey.

TowerMadness

Our first game, TowerMadness, was initially really just a little fun project, thought up by Iman and me on surfboards on the La Jolla beach. Because of time constraints (60h/week M.S. thesis), we were mostly working on it on the weekends, after surfing. Inspired by the countless Flash and Warcraft 3 Tower Defenses we had played, and a passionate love for them, we actually had an almost complete game after only a few weeks.

The iTD Prototype

We lovingly called it iTD, and it was a lot of fun. But the problem was that neither Iman nor I were artists at all (I gloriously painted all the tower icons up there in Gimp, the sheep too). Combine that with the fact that Fieldrunners had just come out and was taking the AppStore by storm. And although we were up to par in terms of gameplay, our graphics were atrocious. So we got Arash on board, who is not only a great programmer with a strong 3D background, but also a great 3D artist, founded the company, and started from scratch.

TowerMadness as you can find it on the AppStore

This was also the point in time where a lot of important design decisions were made.
  • The original iTD prototype was written in almost 100% Objective C. We tried our best at proper OOP, every little dude had its own class. And it was slow. Because we didn’t know the platform very well and wanted to play it safe, we decided to go with a minimalistic approach and use C99 to develop TowerMadness. It turned out to work really well, as the performance is great and we never had any issues with crash bugs, even in early alpha stage. Of course, it may have turned out the same with any other programming language, but the end result is proof that C99 worked out well for us.
  • Back in iTD you were shooting the sheep. However after an intervention by my girlfriend we decided that we don’t want to hurt animals, and started shooting at aliens.
  • Because Arash, Iman and I are all 3D Computer Graphics Master’s or PhD candidates, we decided that making the game 3D would be worth a shot, especially since there hadn’t been any 3D Tower Defense games on the AppStore. I’ll talk about the 3D stuff in more detail in the coming weeks.
  • We decided to launch with only a few maps. The first alpha had 1 map, 3 towers, beta up to 3 maps, 9 towers, the release then had 4 maps and 9 towers. The game now, after almost two years of updating, has something around 60 maps and 12 towers.
  • We started out with a Dreamhost'd php script for the server side, but before launch we switched to AppEngine -- a great move, considering we're getting millions of requests a day now.
In retrospect, we used a very pragmatic and iterative development strategy, which probably was the reason the game ever actually got finished. At the time we were all still doing something else full-time.

In the rest of this post, I’ll talk about a special little feature that so far I believe no other RTS game in the AppStore has, the replays.

Replays

Replays are a very interesting aspect of an RTS game. You couldn’t imagine Starcraft 2 or Warcraft 3 without the ability to watch replays of yourself or your favourite players.

The idea is that every game a player plays is recorded, and can be played back. You can then take that replay, and share it on the internet, analyze it with an automated tool, use it for debugging, and to run cheat-proof competitions.

Sadly this topic can get very boring and specific very quickly, so I’m trying to focus on the important design-relevant parts and the pros and cons. If you’d like to know more about this topic, let me know in the comments.

How does it work?

How this works depends on your game. If you have a game with a few entities (eg. classical shooter like Quake), you can get away with just storing the positions of the players, items, bullets in snapshots, and then storing delta updates and important events in between. For an RTS game, like TowerMadness or even Starcraft, this doesn’t work because you have so many units and buildings. It would be an immense amount of data.

Instead, in an RTS game, you only record the game commands. For TowerMadness, that sounds pretty simple, since there are only 4 such commands: Build Tower, Upgrade Tower, Sell Tower, Start Game/Send Next Wave. When playing back a game, the game simulates each and every step that the original game did, and applies the commands exactly at the same times. This process is where it gets really tricky, as every computation needs to be exactly as it was when the original game was played. This synchronicity is one of the key challenges in implementing replays in an RTS game.
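For the curious, here is a toy sketch of what command recording and playback can look like. The four command types come from the list above; everything else (the struct layouts, the tower cost, the "each wave pays 1 money per tick" rule) is invented to keep the example small and is not the real TowerMadness simulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_BUILD, CMD_UPGRADE, CMD_SELL, CMD_NEXT_WAVE } CmdType;

typedef struct {
    uint32_t tick;       /* simulation step at which the command was issued */
    CmdType  type;
    int16_t  x, y;       /* grid cell, unused for CMD_NEXT_WAVE */
} Command;

enum { MAX_COMMANDS = 256, TOWER_COST = 50 };   /* toy values */

typedef struct { Command cmds[MAX_COMMANDS]; size_t count; } Replay;
typedef struct { uint32_t tick; int32_t money, towers, wave; } Game;

static void apply_command(Game *g, const Command *c) {
    switch (c->type) {
    case CMD_BUILD:
        if (g->money >= TOWER_COST) { g->money -= TOWER_COST; g->towers++; }
        break;
    case CMD_SELL:
        if (g->towers > 0) { g->towers--; g->money += TOWER_COST / 2; }
        break;
    case CMD_NEXT_WAVE:
        g->wave++;
        break;
    default:             /* CMD_UPGRADE is left out of the toy rules */
        break;
    }
}

/* Advance the toy simulation: each started wave pays 1 money per tick. */
void game_run(Game *g, uint32_t ticks) {
    while (ticks--) { g->tick++; g->money += g->wave; }
}

/* Live game: stamp the command with the current tick, record it, apply it. */
void game_issue(Game *g, Replay *r, CmdType type, int16_t x, int16_t y) {
    assert(r->count < MAX_COMMANDS);
    Command c = { g->tick, type, x, y };
    r->cmds[r->count++] = c;
    apply_command(g, &c);
}

/* Playback: re-simulate from the initial state, applying commands at their ticks. */
Game replay_run(const Replay *r, Game initial, uint32_t end_tick) {
    Game g = initial;
    size_t next = 0;
    while (g.tick < end_tick) {
        while (next < r->count && r->cmds[next].tick == g.tick)
            apply_command(&g, &r->cmds[next++]);
        game_run(&g, 1);
    }
    return g;
}
```

Note that the replay stores only a handful of small records no matter how many aliens are on screen. The catch is that apply_command and game_run must behave identically during recording and playback.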

Keeping it Synchronous

On the one hand, if you follow a few simple rules, such as “the gamestate may never depend on external factors besides the input”, it seems pretty easy to keep a game in sync. On the other hand, the problem is that every tiny bug has the potential to break the replay feature.

When the synchronicity breaks, the result can be compared to a butterfly effect. What happens is that the game slowly starts to become “weird”. As an example, imagine your computation for which enemy to shoot at depends on an external input (such as rand()). When you play back a replay, instead of aiming at the little alien as in the original simulation, which gives money very quickly, the replayed game aims at a large alien. That way, the player doesn’t get the money in time to execute the next recorded build command, so the replayed game refuses to execute that command. And without that second tower, you’re in trouble, and the game will inevitably take a dramatically different and mostly fatal outcome.
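One common way around the rand() problem is to keep a small random number generator inside the game state and seed it from the replay, so the sequence is identical on every playback. The sketch below uses a plain linear congruential generator with the well-known Numerical Recipes constants. Whether TowerMadness does it this way is not stated here, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The generator state lives in the game state, not in the C library. */
typedef struct { uint32_t state; } GameRng;

void game_rng_seed(GameRng *r, uint32_t seed) { r->state = seed; }

/* Linear congruential step; unsigned 32-bit arithmetic wraps the same way everywhere. */
uint32_t game_rng_next(GameRng *r) {
    r->state = r->state * 1664525u + 1013904223u;
    return r->state;
}

/* Pick a target index in [0, count). Uses the high bits, which are better mixed. */
uint32_t pick_target(GameRng *r, uint32_t count) {
    assert(count > 0);
    return (game_rng_next(r) >> 16) % count;
}
```

The seed gets stored in the replay header next to the commands. As long as every random decision in the simulation goes through this generator, and nothing outside the simulation (effects, UI) touches it, the recorded and replayed games make the same choices.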

To detect such issues instantly, every few frames, and every time we execute a command, we compute a checksum of the game state. This checksum is then verified when the replay is played back. The checksum doesn’t tell you what goes wrong, but it tells you _that_ something went wrong. And this is usually a great indicator for a bug somewhere in the code.
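A checksum like this can be as simple as hashing the fields that matter. Below is a sketch using the 32-bit FNV-1a hash over a made-up game state. The struct layout is an assumption, and the real state obviously has a lot more in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_ALIENS = 8 };                 /* toy limit */

typedef struct { int32_t x, y, hp; } Alien;
typedef struct {
    uint32_t tick;
    int32_t  money;
    Alien    aliens[MAX_ALIENS];
    size_t   alien_count;
} GameState;

/* Mix one 32-bit value into an FNV-1a hash, low byte first. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Hash field by field instead of raw struct bytes, so padding and
 * byte order cannot make two identical states look different. */
uint32_t game_checksum(const GameState *g) {
    assert(g != NULL && g->alien_count <= MAX_ALIENS);
    uint32_t h = 2166136261u;            /* FNV offset basis */
    h = fnv1a_u32(h, g->tick);
    h = fnv1a_u32(h, (uint32_t)g->money);
    for (size_t i = 0; i < g->alien_count; i++) {
        h = fnv1a_u32(h, (uint32_t)g->aliens[i].x);
        h = fnv1a_u32(h, (uint32_t)g->aliens[i].y);
        h = fnv1a_u32(h, (uint32_t)g->aliens[i].hp);
    }
    return h;
}
```

During recording the checksum is written into the replay every few ticks. During playback it is recomputed and compared, and the first mismatch tells you roughly when the two simulations diverged.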

One method I use frequently to keep the replay feature sane is to run the replay at the same time you’re running the real game during development, in lockstep, and triggering a breakpoint if the game goes out of sync.

What can you do with it

There is the obvious feature of giving players the opportunity to watch their own, their friends’ and other people’s games to learn about other strategies. Some usually counter this by saying that the replay feature makes the game boring, because you know the “solution” to a level. While this is true to some degree, I don’t see why players shouldn’t be able to see the right solution if they want to. And the replay feature stirs the competition. We’ve had many people on our leaderboards that push the strategies forward, beating each other by just a few points at a time.

Another interesting feature is that you can analyze the uploaded replays on the server side, to learn about popular towers, easy and hard maps, and compute statistics. This is what we did for the first year-or-so. But after a while it became apparent that the sheer amount of replays submitted was too much even for my quad-core i7 running 8 instances of the stats bot at the same time. So now we actually compute the stats on the client side and submit them with the replay. We can then randomly and selectively verify if someone is cheating, especially if they have high scores on the leaderboards.

As we’re all gamers, and love competition (I’m totally into Starcraft 2 as you can see from my account) another interesting feature is running competitions. We did so, by releasing new maps for which the public replay option was disabled, and everyone could submit their replay for a week. After the week, whoever got the highest legit score won. With replays you can actually verify that every game submitted was legit. Whatever “cheating” you may do, it has to result in a game that is played back correctly.

One of TowerMadness’ key features is the competitive play. Because the leaderboards are the heart of the competitiveness, we regularly clean them from any cheating attempts using a little verification bot.

Replay Cons
Here is a summary of what is not so great about planning and implementing a replay feature:
  • Players can get easily confused if the replays are out of sync. In general, I would say that no replay feature is better than a broken replay feature.
  • It can be really difficult to get this feature right, mainly because of the non-reproducibility of IEEE754 floats: math libraries and trig functions have tiny differences across platforms. I’ve managed to get our code working on iOS (all devices), MacOS 32 and 64, and Linux x32 (x64 had a few issues), but it was a long, non-trivial process.
  • It took us about half a year post launch to iron out all the issues from the replay feature.
  • If you work on the game code with several people, it gets even harder, because you have to educate them about writing synchronous code. Here, it really helps to have MS/PhD colleagues though :-)
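Regarding the float issue in the list above: one well-known way to sidestep it entirely is to run the simulation on integers, using fixed-point numbers. To be clear, this is an alternative technique and not what is described here for TowerMadness, which got its float code to agree across platforms. The 16.16 format and the function names below are just one possible choice.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16;                   /* 16.16 fixed point: value * 65536 */
#define FIX16_ONE 65536

fix16 fix16_from_int(int32_t v) {
    assert(v > -32768 && v < 32768);     /* must fit into the 16 integer bits */
    return v * FIX16_ONE;
}

/* Multiply in 64 bits, then scale back. C99 integer division truncates
 * toward zero on every platform, so the result is fully reproducible. */
fix16 fix16_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) / FIX16_ONE);
}

fix16 fix16_div(fix16 a, fix16 b) {
    assert(b != 0);
    return (fix16)(((int64_t)a * FIX16_ONE) / b);
}
```

The trade-off is limited range and precision, plus having to supply your own sin/cos (usually lookup tables). In exchange, two machines can no longer disagree about the last bit of a result.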
Replay Pros
And here are the reasons why you would want it anyway:
  • Although the replay feature itself is somewhat hidden in TowerMadness (we could’ve done a better job at integrating it), it still has a profound impact on many aspects of the game and game design. Additionally, players that know about it love it.
  • The replay feature is a strong driver for the competitiveness of our players. We’ve got people that battle it out on our leaderboards for days.
  • Replays can be a great help in balancing the game: by looking at how players play, you can detect abuses and game mechanics that are too strong or too weak.
  • Replays enable cheat-proof competitions. However, in the end these didn’t really matter, and didn’t drive the sales any more than the leaderboards already did, so we abandoned them.

Was it worth it?

Now, in the end it all comes down to these questions: Was the replay feature worth its time and effort, and would I implement it again?

I must confess I am a little bit biased here, because I’m also running a broadcasting service for Warcraft 3, Waaagh!TV, where we take the replay data and broadcast it to thousands of viewers in real-time. As such, I probably value this kind of technology a little higher than the average person does.

However, I do believe that the replay feature has an impact on the success of TowerMadness. It’s one of the most competitive Tower Defense games on the AppStore, and the competitiveness is amplified by giving the players more than just a score they have to beat.

Replays are also a valuable tool for the development process, both for finding bugs and for assisting the game design by helping you understand how players play your game.

As such, I would say that the feature was definitely worth it for us, considering our focus was on creating a competitive game.

Conclusion

Wow, that was a pretty long post. I hope it didn’t bore you guys too much, in case anyone actually even reads all the way down to here. Next week I will write another tech retrospective, probably about our 3D “engine”.