Tuesday, October 27, 2009

The Latency Elephant.

When I started in the games industry back in 2000, one of my first major tasks (with the help of one other eager masochist) was to rewrite that company’s graphics engine for the PS2 (while a game was being written using it – a recipe for pain if I’ve ever seen one). I designed and built it using the knowledge I’d gained from working in the Mining and Defence industries as well as what I’d learnt in academia. It was basically an object oriented engine – you had renderable objects that contained a lot of information about themselves: their state, size, orientation and position, references to vertex and texture data, etc. These renderable objects were stored in a fairly flat hierarchy, and DMA chains were constructed from the visible objects and used for rendering on the PS2. It was a simple engine (I don’t believe in overcomplicating things) and it did its job well enough, but over the years its functionality grew, and so did the performance demanded of it.


The obvious bottlenecks were optimised and eventually what we were left with was an engine that, while functional enough, just didn’t run fast enough. Profiling at this point showed no tightly contained bottlenecks; it was as if a miasma of inefficiency was spread throughout the rendering system – everything was just a little bit slow. I bit the bullet and completely rewrote it, pulling out the object oriented sensibilities and replacing everything with flat homogeneous arrays – the static world was still rendered in parcels, but each parcel was a lean set of arrays of bounding boxes, DMA chains and data already prepared to be sent directly to the HW. With data neatly laid out in such a fashion, performance leapt by almost an order of magnitude – a test scene I was profiling with went from taking 17ms to render to 2ms. Loading of the data from disc also sped up dramatically. Note that this was an engine that had been optimised and improved over 5 years.
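To make the idea of a parcel as a lean set of arrays a little more concrete, here is a minimal C++ sketch of what such a layout might look like. It is not the original PS2 code: the struct and function names are hypothetical, a simple axis-aligned view box stands in for real frustum culling, and the DMA chains are represented by plain integer handles.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for one static-world parcel: parallel arrays with one
// entry per renderable chunk. Names are illustrative, not the original engine's.
struct Parcel {
    std::vector<float> minX, minY, minZ;   // bounding box minimum corners
    std::vector<float> maxX, maxY, maxZ;   // bounding box maximum corners
    std::vector<std::uint32_t> dmaChain;   // handle to a prebuilt DMA chain per chunk
};

// Axis-aligned view volume (an assumed stand-in for a real view frustum).
struct ViewBox { float minX, minY, minZ, maxX, maxY, maxZ; };

// Walks the bounding-box arrays front to back and collects the DMA chain
// handles of every chunk that overlaps the view volume.
std::vector<std::uint32_t> collectVisible(const Parcel& p, const ViewBox& v) {
    std::vector<std::uint32_t> out;
    const std::size_t n = p.dmaChain.size();
    for (std::size_t i = 0; i < n; ++i) {
        const bool overlaps =
            p.minX[i] <= v.maxX && p.maxX[i] >= v.minX &&
            p.minY[i] <= v.maxY && p.maxY[i] >= v.minY &&
            p.minZ[i] <= v.maxZ && p.maxZ[i] >= v.minZ;
        if (overlaps) out.push_back(p.dmaChain[i]);
    }
    return out;
}
```

The culling loop touches nothing but bounding boxes and handles, read in order, so almost every cache line it pulls in is full of data it actually uses.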

I had to learn a lot to extract this level of performance – I had to understand how the I and D caches worked, how the compiler transformed code, and how the data flowed through the hardware and software I was using and writing. I also learnt that in order to gain a high level of performance you will probably have to throw away your OO design and replace it with a design that considers data and the flow of that data as a primary concern.

This is even more evident in today’s machines – it can cost up to 600 cycles to fetch a piece of data from outside the L2 cache on a PowerPC processor! Do you have any idea how much processing you can do in 600 cycles? In order to extract a high level of performance, a programmer *must* consider the data over the processing of that data. If your data is not in cache-friendly, coherent streams then it doesn’t matter how few cycles your code takes to execute; all that matters is how fast you can get your data to your instructions. Prefetching your data helps, but you still have to be able to look 400 cycles or so ahead to ensure that the required data is ready in the cache when you need it.
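As a rough illustration of what looking ahead means in code, here is a minimal C++ sketch of software prefetching on an indirect (gathered) read, where the hardware has no way to guess the next address. It assumes GCC or Clang, because __builtin_prefetch is a compiler extension rather than standard C++, and the prefetch distance of 16 iterations is an arbitrary placeholder that would have to be tuned against the real miss latency.

```cpp
#include <cstddef>
#include <vector>

// How many iterations ahead to prefetch. This value is an assumption; in
// practice it is tuned so the request goes out a few hundred cycles before
// the data is needed.
constexpr std::size_t kPrefetchDistance = 16;

// Sums values[indices[i]] for every i. Because the reads are indirect, we ask
// for the element we will need kPrefetchDistance iterations from now.
// __builtin_prefetch (GCC/Clang extension) is only a hint to the CPU.
float sumIndexed(const std::vector<float>& values,
                 const std::vector<std::size_t>& indices) {
    float total = 0.0f;
    const std::size_t n = indices.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n) {
            __builtin_prefetch(&values[indices[i + kPrefetchDistance]]);
        }
        total += values[indices[i]];
    }
    return total;
}
```

If the prefetched line has not arrived by the time it is needed, the loop simply stalls as it would have anyway; the hint can only help if you know far enough in advance what you will read.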

This isn’t a new problem, but it is one that has been slowly creeping up on us. In the 80s, accessing main memory took on the order of a single cycle, so naturally the design focus in such a system was on the instructions. Do you know what was written in the 80s? C++ (well, started in ’79 but first released in ’85). Since the 80s, CPU speeds have been increasing by 60% per year, while memory performance has crawled along at a measly 10% per year.
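Taking those two rates at face value, the gap between processor and memory compounds every year. The cost of a memory access measured in CPU cycles grows with the ratio of the two speeds, so a rough worked example is:

```latex
\frac{\text{CPU speed}}{\text{memory speed}} \propto \left(\frac{1.6}{1.1}\right)^{n} \approx 1.45^{n},
\qquad
\left(\frac{1.6}{1.1}\right)^{10} \approx 42
```

In other words, after a single decade at those rates, the same trip to main memory costs roughly 42 times as many CPU cycles as it did at the start of that decade.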

[Figure: CPU vs. memory performance comparison]

What this means is that this problem will only get worse. Adding extra levels of cache will help, better and bigger caches will help, but in the end you still need to get your data from the relatively slow main memory into your pipeline. And if you want your system to perform well, you will need to think very carefully about where the data that you want is, how much there is of it and how long it will take to get it.

The reason that OO design is so bad for modern (console) architectures is that it treats data and code as being equally important. Bundling up all the associated data into a single contiguous chunk may be convenient for debugging and for your traditional OO programming mindset, but it will run badly: every loop that touches one field of those objects drags all the other fields through the cache with it. You are far better off allocating this data into homogeneous pools (avoiding heavy malloc() calls is always a good idea anyway) or at least keeping the data that is used together contiguous (spatial and temporal locality of data is a necessary goal here).
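Here is a minimal C++ sketch of the difference, using a hypothetical entity type (the field names are illustrative only). The first version bundles everything into one object; the second keeps each field in its own homogeneous array, with the rarely touched data kept out of the update loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Object-oriented style: every field lives inside the object, so a loop that
// only needs positions and velocities still drags names and resource ids
// through the cache.
struct EntityAoS {
    std::string name;
    float px, py, pz;
    float vx, vy, vz;
    int meshId;
    int textureId;
};

void integrateAoS(std::vector<EntityAoS>& entities, float dt) {
    for (EntityAoS& e : entities) {
        e.px += e.vx * dt;
        e.py += e.vy * dt;
        e.pz += e.vz * dt;
    }
}

// Data-oriented style: one homogeneous array per field. The integration loop
// streams through exactly the data it uses and nothing else.
struct EntitiesSoA {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<std::string> name;       // cold data, never touched by the hot loop
    std::vector<int> meshId, textureId;
};

void integrateSoA(EntitiesSoA& e, float dt) {
    const std::size_t n = e.px.size();
    for (std::size_t i = 0; i < n; ++i) {
        e.px[i] += e.vx[i] * dt;
        e.py[i] += e.vy[i] * dt;
        e.pz[i] += e.vz[i] * dt;
    }
}
```

Both functions compute the same thing. The difference is that in the first version each entity also carries a string and two ids the loop never reads, so part of every cache line fetched is wasted, whereas the second reads six tightly packed float arrays.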

The other benefit of considering data in this manner is that it becomes much easier to parallelise. Your code is generally simpler (it is doing fewer things at once), dependencies are more obvious and functionality more delineated (making it easier to break up into independent tasks). You also know what you will be doing 400 or 500 cycles in the future, so prefetching becomes easier too. Not to mention how much easier it becomes to migrate this code to the SPUs (assuming you have them).
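Once the data sits in flat arrays, splitting the work up is mostly a matter of handing out disjoint index ranges. Here is a minimal C++ sketch, using std::thread purely for illustration; in a real engine these ranges would more likely be fed to an existing job system or, on PS3, to SPU jobs. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Adds velocity * dt to each position in the index range [begin, end).
// No element is touched by more than one range, so no locks are needed.
void integrateRange(std::vector<float>& pos, const std::vector<float>& vel,
                    float dt, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) pos[i] += vel[i] * dt;
}

// Splits the arrays into contiguous, disjoint chunks and runs one task per chunk.
void integrateParallel(std::vector<float>& pos, const std::vector<float>& vel,
                       float dt, unsigned workers) {
    const std::size_t n = pos.size();
    if (workers == 0) workers = 1;
    const std::size_t chunk = (n + workers - 1) / workers;   // round up
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(integrateRange, std::ref(pos), std::cref(vel),
                             dt, begin, end);
    }
    for (std::thread& t : threads) t.join();
}
```

Because no two ranges overlap, the workers never write to shared data, so there is nothing to protect with a critical section.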

There is still a place for object oriented design in games, most definitely. C++ provides some very convenient ways to manage large systems of code, and 80% of your codebase isn’t going to be the bottleneck anyway. It’s the 20% that gets executed 80% of the time that you need to worry about. If you aren’t clear on how data will flow through your system, or can’t know how it flows, then by all means build that system in an OO fashion, but be aware that you may (will) have to rewrite this code at a later date. Keep an eye on the data in the classes, be aware of how this data is used, note which data is used the most – and when that system becomes a bottleneck, refactor it so that it works efficiently under the hood. If your design is adequate then you should be able to maintain a similar interface and protect the rest of the game code from too much disruption. But, in order to make things easier on yourself, you should be considering the design of your data over the design of your code, and you should be doing it now.
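A small sketch of refactoring under the hood, assuming a hypothetical particle system whose callers only ever use handles and a handful of methods. The public interface looks like ordinary OO code, but the storage behind it is a pair of homogeneous arrays (kept one-dimensional here for brevity), so it could have started life as an array of particle objects and been switched to this layout without the rest of the game noticing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle system: callers see an object-like interface...
class ParticleSystem {
public:
    // Returns a handle (an index) that callers keep instead of an object pointer.
    std::size_t add(float x, float vx) {
        x_.push_back(x);
        vx_.push_back(vx);
        return x_.size() - 1;
    }

    float position(std::size_t handle) const { return x_[handle]; }

    // The hot loop runs over flat arrays rather than over individual objects.
    void update(float dt) {
        for (std::size_t i = 0; i < x_.size(); ++i) x_[i] += vx_[i] * dt;
    }

private:
    // ...while the storage underneath is a set of homogeneous arrays.
    std::vector<float> x_;
    std::vector<float> vx_;
};
```

The important design choice is handing out handles rather than pointers to individual objects; that is what leaves you free to rearrange the data later.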

Some of the game development industry’s top programmers have been talking (and in at least one case, ranting) about this for years. Christer Ericson talked about it in his GDC 2003 presentation. Mike Acton persistently proclaims that C++ programming is Bullshit and his Three Big Lies are fundamentally about designing around data instead of code. Recently, Noel Llopis published an excellent article on Data-Oriented Design in the September issue of Game Developer magazine.

Memory access speeds have been the elephant in the room for years now, but now either the elephant is getting bigger or the room is getting smaller. Either way, we can’t afford to ignore it anymore.

Sunday, May 03, 2009

Remote Control

As the game development industry gets older, so do its employees. These employees have families that are becoming increasingly important, and these same families are putting extra demands on the time and even the location of these maturing developers. One increasingly appealing option is to work remotely – you get to spend more time with your family and more time at work too – you get to have your cake and eat it too. But trust me, it’s not all that simple – sometimes there is a little too much cake.


I’ve spent the last 2 and a bit years working remotely for a couple of companies with half a dozen teams in almost as many time zones and it would have been a damn sight easier to have been working with them in the same building.

“But…” you say, “but you get to code with no pants on…”

“No!” I interject. “Well, yes, but No! Shut up. Listen to me first, here’s why it’s more difficult:”

Communication is much, much harder. You can’t just turn around and abuse the dickhead behind you who thought that thread-safe programming merely meant adding volatile to variable declarations. You can’t bump into the office graphics guru and strike up a conversation that leads to a new, more optimal way of performing motion blur. Emails will be misunderstood and misinterpreted – there is no substitute for face to face communication. You can’t underestimate the power and clarity provided by body language. Communication via instant messaging is a pain in the arse for anything of any detail.

You will get overlooked for meetings and announcements – you are out of sight and out of mind. And the larger the team, the more likely it is that you will be forgotten. Large meetings are awful over the phone – you pick up a lot of ambient noise, and different speakers in different positions mean that you hardly ever hear what is going on. Plus it is an order of magnitude more boring not being there. I mean, it’s often hard enough to stay awake in some team meetings when you are there, let alone sitting in your room alone, eyes closed as you concentrate on the current speaker, leaning back on your comfy chair, thinking about the code you’re working on, or what you’ll be doing after work, or the last episode of True Blood you’ve just watched… well, you get the idea.

Testing your code becomes far more laborious – and it is also far more important that your code is robust. The first person to get blamed for code not working is the person who recently checked something in and isn’t there. Which, when you are working remotely, is always you. Here’s the typical cycle for submitting some code while you are working remotely:

You spend a week writing your code. You’ve been very cautious and carefully verified that it works flawlessly with your data. So you go ahead and check out the latest version of the code in the main branch (which has been automatically tested, so you assume that it works) and merge it with your own. You attempt to test it against your own data but realise that you need the latest data. So, you do a grab of the latest data in the art repository, hoping that there aren’t too many extra assets to download. When that finally finishes you munge the new assets, only to discover that you need the latest version of the editor to munge all of your data as there have been some fundamental changes. Without the benefit of an office full of machines that can be utilised for a distributed munge, you know that you’re going to be waiting for 4 or more hours before you can test again (BTW, the munge process is sometimes called cooking because the ambient temperature of your office rises by 10°C while the machines you do have churn through gigabytes of data). Finally, you test your code against the newly munged data only to learn that it doesn’t work. QA assures you (over the phone or IM) that the latest build is working, so you spend a couple of hours trying to debug your code, fruitlessly. You stagger to bed, say hi to the wife, and sleep the restless sleep of a coder without functioning code. The next morning you spend some more time on your code, then chat with various people about your problem, only to find out that the version of the editor that you grabbed didn’t work with the version of the code you had and that it was fixed not long after you checked it out (or that you have to roll it back to an earlier version). At that point you check out the working version of the editor, check out the latest version of the code, merge, check out the latest data, compile, fire off a munge, kick the cat, put your pants on and head down the pub.


Yes, it can be that painful – I’ve spent days trying to check in working code. Even with continuous integration and a good QA team, the lag between your code and data and the main branch can be troublesome to say the least.

Another issue is that when coding you are pretty much on your own. If you are stuck on a programming problem, it is very hard to get someone to help you remotely. Applications like UltraVNC are excellent and I recommend that you do peer reviewed code checkins using something like this, but it is (at least initially) a large intrusion on a co-worker’s time to get them to remote into your machine and help you with your code. It becomes less of an issue as they become more used to it, but it is still a hurdle to cross.

Sure, you have more privacy and fewer distractions – actually, no. You have more distractions – family, a fridge with (theoretically) more food, alcohol, TV, games (not in the fridge), the internet with no-one looking over your shoulder, it’s just started raining and you’ve noticed that the guttering is leaking so you get up on the roof to fix it ’cos you can’t see where it’s leaking when it’s not raining… and did I mention games? The temptation to work your own hours “when you feel like it” is quite high – I mean, you’re not impacting on anyone else’s schedule, are you? And anyway, you work better at 3am, right? That is, until you pull an all-nighter, sleep most of the next day and then can’t be arsed to work the night away again, and before you know it you’ve lost a day of work.

Time zones complicate things somewhat also. It’s not too bad if you are only an hour or so out, but when working from Australia with the US or UK, you have a very large discrepancy in work hours. My current employer is in the UK and our regular office hours do not overlap at all – in order to communicate directly I need to spend part of my evening working. Sure, that means that I manage to avoid watching some crap TV with The Wife(TM), but part of the reason to work from home was to spend time with the family, wasn’t it?

One of the hardest things I’ve found is the lack of human interaction. I miss the idle banter you get in an office, the pointless chats while making coffee, the new friends you make while arguing over lunch, the things you learn when asking a co-worker to help you with a coding problem.

So you can see that there is some bad thrown in with the benefits of working remote and pantless. There is a lot of good though – when you hit the zone there is nothing to break you out of it – the flow keeps on going and going…. You can modify your working hours (a little) without affecting your routine too much – an hour here or there to take the kids to the doctor or swimming – and you can easily put in more hours when the need arises. The 5-second commute is awesome.

There are a few things that you can do to help you deal with working remotely.

You must know the team you are working with. Programmers are notorious for not trusting other people’s code. If you’ve not worked locally with a team before, I would advise that you meet with them and spend at least a week or two working on site, learning the ropes, appreciating your workmates’ strengths and personalities and, importantly, socialising with them. It is important that everyone understands each other’s sense of humour (or lack thereof) – it helps with textual communication. I would advise that the remote worker and local team have at least one video conference a week – cover what you have worked on, what you will be working on, any problems you’ve been having, as well as sprinkling the meeting with idle banter. You need to maintain a social connection with the team – if people like you then they are more likely to respond to your emails earlier and help you more.

Be disciplined with your working hours. Start at the same time, lunch at the same time, try and finish at the same time. Regular work hours will help you to maintain a sense of work life and home life – you need to maintain a sense of separation, otherwise you’ll either end up working all the time or, alternatively, watching Oprah and Dr Phil instead of working. Take short breaks like you would in a workplace – cultivate some rituals; for instance, make a coffee (not instant mouth rot coffee, make something a little more involved. I use a stove top percolator to make beautiful coffee from beans bought at the local market). This will give you the type of break that you would get naturally at work, and give your brain a little time to work on problems in the background. Be careful that these rituals don’t become a form of procrastination though.


If you have a significant timezone difference, try to schedule some regular hours where you overlap with the team you are working with. Take those hours out of your regular work day, but be consistent. If your workmates expect you to be working within their work hours on a regular basis then they are more likely to instigate communication during those regular “cross over” hours. Make sure that your workmates are aware that you are working – running an instant messaging client will ensure that your coworkers know when you are online.

Given the latency involved in checking in and building or cooking your data, the best solution I’ve found is to get the local team’s QA to check in and label munged data and binaries when they test a specific code set. This means that you won’t have to worry about building the data yourself, and you remove a level of complexity and another source of potential errors. Also, with the size of modern games’ data sets, it’s often quicker to download 5GB of data than it is to munge it (assuming, of course, that you have a decent internet connection. If you don’t, then I suggest you relocate until you do). Of course, this doesn’t work when you are changing the munging process yourself.

Be proactive with communication. Answer all of your emails promptly (if you work in a dramatically different timezone then you have the benefit of getting a full day’s emails when you log in in the morning). Regularly email team members with questions and even simple communication – you need to cultivate your relationship. Maintain an online presence via instant messaging. Don’t let yourself be forgotten. If you are being forgotten for meetings, ring the meeting room yourself. Make sure that management realise how important it is that you are included. I’ll mention it again as it’s so important: video conference at least once a week. The best remote relationship I had with a team was one where we had a video conference every morning. It was just a short scrum type stand up meeting, but invaluable as far as building a relationship with the team and understanding what everyone was doing and, even more importantly, letting your team mates know what you are working on.

So, if you have read this far, congratulations. This is a big topic, one that I deal with daily and one that I think is becoming more and more relevant to the modern programmer. I recommend you read The Pond for an excellent article on working remotely from the point of view of a manager. I’ve managed a remote worker before, and the best thing I can recommend is to call regularly, both for updates and to catch issues early, and simply to maintain a base level of interaction, letting the remote worker know that you know they are there and working and that they are appreciated. It is also imperative that the manager sets realistic milestones for the remote worker and gets regular status updates - it is all too easy for a remote worker to drift off into a little corner of the codebase which is incredibly interesting, yet ultimately irrelevant.

Working remotely is hard. It is more work for manager and worker alike, but it can be very successful and rewarding as long as both parties are willing to address problems as soon as they arise. I'm privileged to have been able to work from home for these last few years - my children know nothing different. Daddy has always been there, and if things work out, Daddy will always be there.

Monday, February 09, 2009

The Cost of Redundancy

As I'm still waiting for my new job to kick off (I'm quietly building card cubes - not even halfway done yet), I thought I'd pen a few thoughts on the effects of redundancy on the individuals involved.


This particular redundancy went relatively smoothly for me, I saw it coming a few months ahead and was able to prepare for it. That doesn't mean that I was happy for it to happen - not at all. With any project you work on you form a close bond with the people you work with. They are your friends, your surrogate family - you spend more waking time with them than any other group. You spend years working together toward a common goal; the thrill of building a new project, drunken discussions at 3am over how a certain feature should be implemented to improve the gameplay experience, or how a crucial piece of tech should be rewritten to make it faster, more stable, simpler, smaller, better. You strain your relationships outside of your workplace with neglect - all your focus is on building the next big thing.

It's no wonder that when that gets taken away from you there is a grieving process. Having a game canned is bad enough, but when you lose your game, your friends, and your job, it's even harder. And with the current financial problems in the industry (not just the games industry) it's getting harder to find work without relocating. Moving yourself is hard enough; moving yourself and your family interstate or internationally to a job that you hope is more stable than your last is incredibly stressful.


My first redundancy was like that - I had no idea it was coming. I'd heard a rumour a few days before it happened and refused to believe it. When the axe fell and the entire studio was disbanded I was devastated. I was in the process of organising an extension on my house, I had a very young daughter, my wife was pregnant and all of a sudden I was without a source of income. And all this just before Christmas. That was hard. And the outcome was to uproot my family and move interstate.

This time, however, I was far more prepared and a bit more senior, so finding work was easier. I really feel for the people that have just managed to break into the industry and have been laid off, especially since the market is flooded with good people that have been made redundant from any number of studios that have been recently shut down by EA. Or THQ. Or Midway. Or pretty much anywhere. It's a very volatile industry now. Forewarned is forearmed - keep your eyes open for delayed milestones, delayed milestone payments, sudden dramatic scope changes in the game, groups of experienced people leaving, anything that should make you nervous. It is surprisingly easy to ignore this stuff, to blindly tell yourself that it'll be OK and it'll all work out fine. I know, I've done it myself.

I'm not saying that you should jump ship at the first sign of difficulty, just that when things start to go awry, you might want to sharpen your resume, polish up your LinkedIn profile or talk to mates in the industry and see how their workplace is faring.

Now, there is an upside to redundancies and studio closures, and that's new studios starting up. Give a group of enthusiastic, experienced developers a nice fat redundancy payout and they might just decide to band together and form a startup. Or another cash flush studio may decide to buy up some of the newly available talent and start a new branch of their own in the city where yours closed down. With these new opportunities come new friendships, new bonds, new projects and goals. On top of that, your old friends will have scattered to the four corners, widening your industry network, potentially providing you with a source of employment in the future (and a place to crash when you go on holiday).

I have no regrets in choosing Pandemic Brisbane for my previous place of employment - I've met some fantastically talented people, made some great new friends and learnt a lot about all aspects of the game development process. I've also learnt a lot about myself, my relationship with my family and what I want to do with my life. I'm proud of what we built, even though it never saw the light of day.

To my friends and ex-coworkers - it's been a pleasure. Good luck with whatever you're doing now - working in the industry again, or looking for work there or elsewhere. Hopefully I'll see you around.

Friday, January 23, 2009

Card Blanche

So, I'm between jobs. I've finished Mirror's Edge (Test of Faith on my first playthrough), played Wipeout HD until my eyeballs dried out, done some work around the house, reorganised my game collection (by platform then genre, with surprisingly smooth transitions between genres), rearranged the lounge room and have got to the point where there is nothing left to do other than tidy up my office. I mean, I've sent my work equipment back (I work from home BTW) and have reformatted and reinstalled the HW I have (2 years of application accretion will slow your machine down somewhat, even if it does have 4 cores) but the physical space is still a bloody mess.

In between some boxes of ancient computer peripherals I found a box of boxes of business cards - all up, about 1,000 Pandemic business cards. I think I used maybe 40 cards in my 3 years at Pandemic - why do we need so many? (For the internet rumour mongers amongst you, that's the real reason for Pandemic's troubles - too many business cards per person).

Now, to the point of this blog: What am I to do with these cards? I'm sure there is something cool and geeky that could be done with them (I have one idea that probably won't work), so I'm looking for (polite) suggestions from the internets. Drop me a comment if you have a good idea - if it's within my ability to execute it, I will.

Wednesday, January 14, 2009

For those of you that are wondering...

yes. I have been made redundant. But that's OK, I feel pretty good about it.
Now I'm free to do something really cool...

[Update: Just to clarify - I'm confirming my own redundancy here, nothing more or less. EA had made an announcement in December that headcount would be reduced. I'm just one of those heads.]

Thursday, December 11, 2008

Beyond the Critical Section

I've been working in a few different codebases in the last couple of years, all of which are supposedly 'Next Generation', and I've been consistently let down by the quality of parallel programming that I've seen. Most programmers seem to just want to slap a critical section around any and all code that has even the most remote chance of possibly being involved in some sort of thread-unsafe behaviour. Or, even worse, a spinlock.

So, what I tried to do for my talk at GCAP was to provide a simple introduction to the parallel primitives - mutex, semaphore, barrier, et al. - and provide some vocabulary and foresight of some of the basic problems intrinsic to parallel programming. I also briefly raced over some of the patterns discussed in "Patterns for Parallel Programming" and then sped through some lock free programming examples (based on "The Art of Multiprocessor Programming").

When I say 'sped' and 'raced' I mean it - the presentation was 184 slides long and I had 45min to cover them all (it took me 46min - out of all the attendees I think only Yahtzee appreciated the magnitude of that effort (or, more likely, he'd fallen asleep in a previous lecture and had just woken up and thought that spontaneous applause would cover his faux pas)).

My presentation is hopefully embedded above, and you should be able to view it there, or you can head over to SlideShare and browse it there.

Below are some interesting links that I stumbled across in my research for this talk. Hopefully there's something interesting for some of you in there;

I can thoroughly recommend the two books I mentioned earlier in this post, particularly The Art of Multiprocessor Programming. It has heaps of Java examples, and, as with everything in programming, you learn the most when you code it up yourself.
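In that spirit, here is a small C++ sketch of the difference between slapping a critical section around some shared state and updating it lock-free with a compare-and-swap (CAS) retry loop. The example (tracking the largest value submitted by many threads) and the class names are hypothetical, not taken from the talk, and both versions assume submitted values are non-negative because they start from 0.

```cpp
#include <atomic>
#include <mutex>

// The "critical section around everything" approach: correct, but every
// caller serialises on the lock, even when its value would not change anything.
class LockedMax {
public:
    void submit(int v) {
        std::lock_guard<std::mutex> guard(m_);
        if (v > max_) max_ = v;
    }
    int get() {
        std::lock_guard<std::mutex> guard(m_);
        return max_;
    }
private:
    std::mutex m_;
    int max_ = 0;
};

// A lock-free alternative built on a CAS retry loop.
class AtomicMax {
public:
    void submit(int v) {
        int current = max_.load(std::memory_order_relaxed);
        // If another thread changes max_ between our load and the CAS,
        // compare_exchange_weak fails, writes the new value into 'current',
        // and we re-check whether v is still larger before retrying.
        while (v > current &&
               !max_.compare_exchange_weak(current, v, std::memory_order_relaxed)) {
        }
    }
    int get() const { return max_.load(std::memory_order_relaxed); }
private:
    std::atomic<int> max_{0};
};
```

Relaxed memory ordering is enough here only because the maximum is not used to publish any other data; if other threads went on to read data based on this value, you would need acquire/release ordering instead.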