Monday, May 30, 2011

Nuts performance levels

Hey iDevBlogADay,

Since my last blog post, I've arrived at the Limbic HQ in Palo Alto, CA. We've also launched our latest game, Nuts!


Today I'm going to write about how we managed to make Nuts! a beautiful 3D game running at 60hz even on the 3GS. I'll go into the details of many common optimizations and try to analyze how much they actually gained us.

Measuring Performance

The most important thing for optimizing performance is a way to measure the performance, and its change as you modify the application.

The tool of choice for us is a little plug-in module we call the performance monitor. It records wall-clock render times, game update times, and idle times. Idle time is the period between the update and the next render, usually spent sleeping, updating Cocoa, etc. You can see an annotated example here.

It's a very valuable tool: because it plots the performance of individual parts of the app over time, it helps you correlate performance events with potential causes. In our games, it's only compiled into the developer version, and it can be activated by double-tapping the top left corner of the screen.
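A minimal sketch of what such a monitor could look like, in Python for brevity. This is not the actual Limbic module: the class, the method names, and the phase labels are all made up for illustration. It only records the three wall-clock durations described above (update, idle, render) and leaves the plotting out.

```python
import time
from collections import deque


class PerformanceMonitor:
    """Hypothetical sketch: records update, idle and render time per frame."""

    def __init__(self, history=120, clock=time.perf_counter):
        self.clock = clock                    # injectable so it can be tested
        self.frames = deque(maxlen=history)   # keeps only the most recent frames
        self._marks = {}

    def mark(self, name):
        # Remember the wall-clock time at which a phase boundary was crossed.
        self._marks[name] = self.clock()

    def end_frame(self):
        m = self._marks
        self.frames.append({
            "update": m["update_end"] - m["update_start"],
            # Idle is the gap between the end of the update and the next render.
            "idle": m["render_start"] - m["update_end"],
            "render": m["render_end"] - m["render_start"],
        })
        self._marks = {}

    def slow_frames(self, budget):
        """Indices of recorded frames whose update + render time exceeds budget."""
        return [i for i, f in enumerate(self.frames)
                if f["update"] + f["render"] > budget]
```

In a game loop you would call `mark("update_start")`, `mark("update_end")`, `mark("render_start")` and `mark("render_end")` around the respective phases, then `end_frame()`. Something like `slow_frames(1 / 60)` then lists the frames that blew a 60hz budget.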

30hz vs 60hz?

This is a question I'm very passionate about. Recently at the Santa Cruz iOS Game Dev meeting, the great Graeme Devine mentioned this as well. It's very important that your game runs smoothly. Although most users will not admit it, sluggishness, unresponsiveness, and stuttering in a game are huge factors for instantly putting it away.

Keeping this in mind, when you're at the early stage of a new project, and you have a working prototype, you need to make an important decision: Do you want to go for 30hz or 60hz?

Let's think about this for a second. If you decide to go for 30hz, you get twice as much time per frame (about 33 ms instead of about 17 ms), which means you can have roughly twice as much stuff in one frame, compared to a 60hz game.
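The arithmetic behind that trade-off, as a tiny sketch (the function name is just for illustration):

```python
def frame_budget_ms(refresh_hz):
    """Time available for update + render of one frame, in milliseconds."""
    return 1000.0 / refresh_hz


for hz in (30, 60):
    print(f"{hz}hz -> {frame_budget_ms(hz):.1f} ms per frame")
# 30hz -> 33.3 ms per frame
# 60hz -> 16.7 ms per frame
```

Everything the game does in a frame, update and render together, has to fit into that budget.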

Many people will also argue that no one will notice the difference between 30hz and 60hz. They could not be more wrong. It really depends on the game you're making. For Nuts!, we experimented with both 30hz and 60hz, and 30hz, although it had a very smooth and stable framerate, just didn't feel right. It was less responsive, and it didn't play well. Plus, people were more likely to get motion sickness from it, which is a big factor as the game involves a camera that is constantly rotating around a tree. Hence, we knew the game had to be 60hz, and we took this into consideration for all the art and further engineering.

For another game that we're currently working on, it's a completely different story. It is a very different kind of game and 30hz is completely fine. And because we're 30hz, it means we can show more stuff, and at higher quality.

General Engine Design

To start out, I'd like to give you a small overview of how our game and engine are structured. Our OpenGL ES 2.0 engine is really simple and "dumb": it doesn't do any kind of automatic batch sorting. All we do is load models, which are OBJs plus a set of OpenGL states. There is only one shader, which is quite simple and highly optimized to do everything we need.

The problems

In the week before launch, the game actually ran pretty well, mostly exceeding 60hz on both the iPhone 4 and the 3GS. However, we had random stutters here and there that were really distracting and could even cause you to crash into a branch and lose.

Optimization 1: Vertex Array Objects

At WWDC last year, the Apple engineers recommended having a look at VAOs, as they can lead to a significantly reduced overhead when drawing a lot of batches. Hence, I went ahead and updated our engine. In principle, this is very easy, but there are some pitfalls and the implementation is very unforgiving. If you make a mistake, the code is very likely to crash, often by some form of memory corruption, deep inside the OpenGL code. After it all worked, we even saw a moderate performance gain, but it wasn't anything significant.

However, considering how simple this extension is, and how easily it can be built into an engine, I strongly recommend everyone to use it. There is nothing to lose here. UPDATE: Actually, there is something to lose. Every single VAO takes up 28 KiB of memory. For Nuts!, that's 2.5 MiB just for the VAOs. It heavily penalizes VBO animation. It seems to be a good combination with skeletal animation, though.
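To get a feeling for these numbers, here is a back-of-the-envelope sketch. The VAO count is not stated in the post. It is only inferred here from the two quoted figures (28 KiB per VAO, 2.5 MiB total), so treat it as an estimate.

```python
KIB = 1024
MIB = 1024 * KIB

BYTES_PER_VAO = 28 * KIB  # per-VAO cost quoted in the post


def vao_memory_mib(vao_count, bytes_per_vao=BYTES_PER_VAO):
    """Total memory taken by vao_count VAOs, in MiB."""
    return vao_count * bytes_per_vao / MIB


# Inferred, not stated in the post: how many VAOs add up to about 2.5 MiB?
implied_vao_count = 2.5 * MIB / BYTES_PER_VAO
print(round(implied_vao_count))        # 91
print(f"{vao_memory_mib(91):.2f}")     # 2.49
```

So roughly 90 VAOs are enough to reach that footprint, which is why the cost adds up quickly once every mesh gets its own VAO.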

Optimization 2: State Caching

Before my final optimization pass, we were already caching many states, so I can't really give any feedback on that. But we basically didn't cache any of the OpenGL ES 2.0 states: shaders, uniform bindings, uniforms, etc. In every draw call, we were re-enabling the same shader, loading all uniform locations for that shader, and setting them to the right values. That sounded very much like an opportunity to optimize.

However, after I implemented this, I did not notice any improvement in performance. I don't know if the driver is now "smart enough" to do the state caching itself, but it seems to not have much effect on the overall performance. As such, I would still recommend caching for any of the easy stuff (glEnable states for example), but caching each individual uniform value seems to be overkill.
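For reference, the basic idea of state caching fits in a few lines. This is a generic sketch, not the engine's code: `StateCache` and its string keys are hypothetical, and `apply_fn` stands in for whatever real call (for example `glEnable` or `glUseProgram`) would change the state.

```python
_UNSET = object()  # sentinel for "this state was never set through the cache"


class StateCache:
    """Hypothetical sketch: forwards a state change only if the value differs."""

    def __init__(self, apply_fn):
        self._apply = apply_fn   # stand-in for the real graphics API call
        self._current = {}
        self.skipped = 0         # number of redundant changes filtered out

    def set(self, key, value):
        if self._current.get(key, _UNSET) == value:
            self.skipped += 1
            return False         # redundant, nothing sent to the driver
        self._apply(key, value)
        self._current[key] = value
        return True


calls = []
cache = StateCache(lambda key, value: calls.append((key, value)))
cache.set("blend", True)
cache.set("blend", True)    # filtered out
cache.set("blend", False)
print(len(calls), cache.skipped)   # 2 1
```

One caveat that applies to any cache like this: it is only correct if every state change goes through it. Code that talks to the API directly leaves the cache with a stale view.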

Optimization 3: Instruments

Instruments is a double-edged sword. On the one hand, I love the leak checking and the new driver analysis. On the other hand, I think the CPU and GPU performance monitors and the driver analysis are mostly useless. You may have noticed that I mentioned the driver analysis on both sides. That's because, while it gives you a lot of cool insights and may catch a few bugs, it didn't tell us much about how to make the rendering faster. For the most part, the things it was very obsessed about didn't have any effect at all. But that may also be because I've been doing this for too long.

Optimization 4: Alpha Sorting

Initially, we rendered the scene kind-of arbitrarily. We would render the tree, the squirrel, then render some transparent effect, then the branches. We were more concerned about depth-correct rendering than about performance at that point. However, the way the iPhone GPU works, it's actually more beneficial to completely separate the solid from the transparent rendering.

To help implement this, I added a two-pass mode to the engine. The first pass would only allow solid objects to be rendered, and it would complain if any rendering call tried to enable alpha blending. For the second pass, it was the other way around.

This actually helped the performance, especially in the peaks, which were sometimes caused by displaying a lot of transparent effects that would be alternated with solid render calls, like the fireball nuts and their particle effects.

I strongly recommend designing the whole renderer in this way. First, render all solid objects, then come back to render all non-solid objects. And enforce it, in case the artists try to be smart and fancy about something.
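Here is a minimal sketch of such an enforced two-pass mode. All names (`Pass`, `PassViolation`, `TwoPassRenderer`, `render_scene`) are hypothetical, and the draw call only records what was submitted instead of talking to OpenGL.

```python
from enum import Enum


class Pass(Enum):
    SOLID = 1
    TRANSPARENT = 2


class PassViolation(Exception):
    """Raised when a draw call does not match the current pass."""


class TwoPassRenderer:
    def __init__(self):
        self.current_pass = None
        self.submitted = []   # (pass, object name), in submission order

    def begin_pass(self, render_pass):
        self.current_pass = render_pass

    def draw(self, name, blended):
        if self.current_pass is None:
            raise PassViolation(f"{name}: draw call outside of any pass")
        # Solid pass: no blending allowed. Transparent pass: blending required.
        if blended != (self.current_pass is Pass.TRANSPARENT):
            raise PassViolation(
                f"{name}: blended={blended} not allowed in {self.current_pass.name} pass")
        self.submitted.append((self.current_pass, name))


def render_scene(renderer, objects):
    """objects: iterable of (name, blended) pairs, in arbitrary order."""
    objects = list(objects)
    renderer.begin_pass(Pass.SOLID)
    for name, blended in objects:
        if not blended:
            renderer.draw(name, blended)
    renderer.begin_pass(Pass.TRANSPARENT)
    for name, blended in objects:
        if blended:
            renderer.draw(name, blended)
```

With a scene submitted as tree, fireball effect, squirrel, branches, `render_scene` draws the three solid objects first and the effect last. A stray blended draw call during the solid pass raises `PassViolation` instead of silently interleaving.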

Optimization 5: High-level optimizations

By far the most significant improvement came from the higher-level optimizations. Usually, the performance issues came down to rendering too many things of one kind, or a model that was weirdly engineered to thrash the texture cache with every single one of its hundreds of triangles.

The performance monitor and A/B testing really helped a lot in pinpointing the causes and fixing them.

Often, when you're getting stuttering, the performance monitor will tell you that it's because the frame time is just a little bit too long every other frame, so the system keeps missing one out of every few draw events.
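A simplified model shows why "just a little bit too long" hurts so much. The assumptions here are mine, not measurements from Nuts!: a 60hz display, every frame starting right at a refresh, and a finished frame being shown at the first refresh after it completes. The frame times are invented.

```python
def refreshes_per_frame(frame_times_us, refresh_us=16_667):
    """Number of display refreshes each frame occupies (times in microseconds).

    Simplified model: a frame starts at a refresh and is shown at the first
    refresh after it finishes, so the result is a rounded-up division.
    """
    return [max(1, -(-t // refresh_us)) for t in frame_times_us]


def missed_refreshes(frame_times_us, refresh_us=16_667):
    """Refreshes on which no new frame could be shown."""
    return sum(n - 1 for n in refreshes_per_frame(frame_times_us, refresh_us))


# Hypothetical frame times: every other frame is slightly over the ~16.7 ms budget.
times = [15_000, 18_000] * 3
print(refreshes_per_frame(times))   # [1, 2, 1, 2, 1, 2]
print(missed_refreshes(times))      # 3
```

In this model, a frame that is only about 1.3 ms over budget stays on screen twice as long as its neighbors, and that unevenness is what reads as stutter.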

One other important thing to note is that once you know what performance target and visual quality level you're aiming for, you should figure out the limits of what you can display, and enforce them. If you don't, players will most definitely take your game, clump up all enemies in one spot, and blow them up in some crazy, unanticipated way that will completely destroy the performance. And it will become the norm. We learned that the hard way in TowerMadness.

Hence, if you implement an effect system that keeps track of and animates effects, also make sure that it has a cap on how many effects it will show, and that it gracefully handles a situation where too many effects are present.
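A minimal sketch of such a cap. The eviction policy is an assumption of mine, not something taken from Nuts!: when the system is full, the new effect replaces the least important active one only if it has a higher priority, and otherwise it is simply not shown. `EffectSystem` and the `priority` parameter are hypothetical.

```python
class EffectSystem:
    """Hypothetical sketch: never keeps more than max_effects effects alive."""

    def __init__(self, max_effects):
        self.max_effects = max_effects
        self.active = []    # list of (priority, name)
        self.dropped = 0    # effects rejected or evicted because of the cap

    def spawn(self, name, priority=0):
        """Try to start an effect; returns True if it ended up active."""
        if len(self.active) < self.max_effects:
            self.active.append((priority, name))
            return True
        self.dropped += 1   # at the cap, exactly one effect loses out
        if not self.active:
            return False    # cap of zero: nothing can be shown
        lowest = min(range(len(self.active)), key=lambda i: self.active[i][0])
        if self.active[lowest][0] >= priority:
            return False    # the new effect is not more important, skip it
        self.active[lowest] = (priority, name)   # evict the least important one
        return True
```

The useful property is that the effect count, and with it the rendering cost, stays bounded no matter what the player does. The `dropped` counter tells you during development how often the cap is actually hit.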

Also, if you were to make a Zombie game, don't just allow unlimited Zombies to spawn. Make sure the numbers are limited, and design the game to work with that number. If the game is only fun through excess that can't be sustained, you should go back to the drawing board. That's also good lifestyle advice, now that I think about it.

Summary

As you may have noticed, none of the optimizations did the job by itself. It was the mix that made our game run at 60hz no matter what the player does, even on the 3GS.

There are also many things left to optimize. The math library, for example, is not optimized at all. But there is no need for that, as it's not the bottleneck of the game. Optimizing it would probably take a long time and, by our estimate for Nuts!, only reduce the total CPU usage by 2-5%. Having a good profiler helps a lot.

I hope summarizing my notes on the Nuts! performance tuning process gave you some ideas about what to optimize and what is probably not worth it, and I hope it makes your life easier in the future. And hopefully mine too, since I thought about this a lot while writing the article.

In case you're there, see you at WWDC! We'll be wearing Limbic shirts most of the time and my MacBook Air has a Yoshi on it, so we're easy to see. Don't hesitate to come over and say hi!

Monday, May 16, 2011

Guest Post: Virtual Game Development

Hey iDevBlogADay,


Since I have very little time, because I'm leaving for Palo Alto in a few hours to start this year's WWDC trip, I have asked my fellow Limbic co-founders Iman and Arash to write a little guest post. They're writing about the problems we face as a company working in two time zones 9 hours apart, and being almost "purely virtual". Here it goes:


Unlike many startups, Limbic operates as a virtual company. In our case, we collaborate with team members in seven locations across the globe (Palo Alto, Davis, San Diego, Burbank, Germany, the Netherlands, and New Zealand). As one can imagine, operating in this fashion brings many challenges, but in our experience it comes with substantial benefits as well.


In order to support this arrangement, some degree of planning is essential, as meetings across multiple time zones must be coordinated. For our projects, we use a slew of tools for communication and task management:


* Skype, IRC, and iChat for chat, voice, and video conferencing

* Skitch and Dropbox for sharing images and videos

* BananaScrum and Lighthouse for project planning and task management

* GitHub for source code hosting and collaborative development

* Doodle.com for scheduling


The most common problem with working across multiple time zones is finding overlaps in the availability of US and European team members to meet. This inevitably leads to late-night or extremely early-morning meetings. When working on dependent project tasks, we have found it is important to sync up daily and hand off to other team members to ensure smooth and continuous development. If a team member cannot attend a voice or video meeting on a particular day, they communicate their progress to the team via email. Also, because team members aren't able to communicate casually throughout the day and all discussion happens during meetings, the meetings tend to run quite long in order to cover all issues.


One of the difficulties with virtual collaboration is that it can be slower than face-to-face communication for rapid iteration. We minimize this by using screen sharing, chat, and video conferencing whenever necessary. A tremendous advantage of working virtually is that it allows everyone to work from their own favorite environment (coffee shop, home, etc.). In addition to the environmental benefits, commute time is reduced or eliminated in many cases, allowing more productive time for work. Finally, with no office expenses to pay, the operational overhead of the company can be reduced.


Recently at Limbic, we have moved towards capturing the benefits of a shared workspace by establishing a small studio in Palo Alto as a hub for physical collaboration, while maintaining the flexibility provided by continuing to operate primarily virtually. We also like to bridge the gap between all team members by periodically planning retreats where we all meet up face to face to have fun, brainstorm, and help kick-off new projects.




That's it! I'm rather excited for the next post, as it's going to be one week after we launch our new game, Nuts!

Monday, May 2, 2011

German iOS Developers

Hey #iDevBlogADay,

I'd like to write today's post on a more personal and local note. However, if you're interested in technical stuff, check out my post from two days ago, courtesy of Keith of Imangi.

As most of you guys know, there are a lot of iOS developers and get-togethers/meetings in the US. I never had the honor of attending one, but my Limbic co-founders Iman and Arash spoke highly of them.

I'm German, living in Cologne, and so far I've barely met any German iOS developers, let alone iOS game developers. Even at GDC Europe, it seemed like iOS wasn't that big of a deal yet. However, there are a few big shops, like EA, even in Cologne, and I really wonder whether there are no German (indie) iOS developers, or whether it's just hard to find them because there are no proper communication channels.

So, my hopes are that I can find some German (or even European) iOS developers through this channel. Maybe we can even have a few get-togethers somewhere in the region to talk about tech, share knowledge, contacts, etc.

Please let me know (in the comments or by email to volker@limbic.com) if this speaks to you and you're interested!

Cheers,
Volker