tag:blogger.com,1999:blog-363483972024-03-13T11:49:21.516+10:30Seven Degrees Of FreedomProgramming without pants.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.comBlogger76125tag:blogger.com,1999:blog-36348397.post-30687244728945816382011-10-22T08:09:00.002+10:302011-10-22T08:09:55.807+10:30The Final Entry?Those of you who regularly read my work here may have noticed that I've not been posting very often (an understatement to say the least). I now have a company blog (my own company - <a href="http://overbyte.com.au/">Overbyte</a>) and I'll be posting regularly there for the foreseeable future and this site will be only used for posts that don't really fit the company profile.<br />
<br />
I'll also be revisiting some of the more popular posts from this blog and updating them there.<br />
<br />
So, thank you for reading my work here and I hope to see you over on the <a href="http://overbyte.com.au/">Overbyte</a> site. The first <a href="http://overbyte.com.au/2011/09/03/optimisationmasterclass1/">blog entry</a> is up there now - the first in a series of articles on optimisation based on the Masterclasses I taught in Paris and Shanghai this year.<br />
<br />
I hope you like it.<br />
<br />
-TonyAnonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com4tag:blogger.com,1999:blog-36348397.post-65956274373197659782011-03-14T23:56:00.002+10:302011-03-15T00:06:48.598+10:30The Root of All Evil<span class="Apple-style-span" style="font-family: Georgia, serif; font-size: 14px; line-height: 22px;"></span><br />
<div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">Your game is running too slow – what should you do? Should you wait until all features are in place and then throw your smarty pants engine coders at it then? Should you dedicate resources to optimising the game now? Or should you have considered performance much earlier and risked the Wrath of Knuth?</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;"><span id="more-1956" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"></span></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">When writing a 
game you have a number of responsibilities to meet as a coder. Your code must be</div><ol style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; list-style-type: decimal; margin-bottom: 1.5em; margin-left: 1.5em; margin-right: 0px; margin-top: 0.5em; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 20px; padding-right: 0px; padding-top: 0px;"><li style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Correct,</li>
<li style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">performant, and</li>
<li style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">maintainable.</li>
</ol><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">Correct code is code that does what it is supposed to do. Most programmers are aware of this constraint and it is their primary (sometimes sole) goal – it’s also the most easily verified. Possible symptoms of incorrect code include crashing, disembodied limbs and an infinitely slow framerate.</div><div class="wp-caption aligncenter" id="attachment_1959" style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-left-radius: 3px 3px; border-bottom-right-radius: 3px 3px; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-left-radius: 3px 3px; border-top-right-radius: 3px 3px; border-top-width: 0px; display: block; font-size: 14px; margin-bottom: 1em; margin-left: auto; margin-right: auto; margin-top: 0px; max-width: 99%; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 4px; padding-left: 4px; padding-right: 4px; padding-top: 4px; text-align: center; width: 310px;"><a href="http://altdevblogaday.org/wp-content/uploads/2011/03/poly-acne.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #528f6c; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; 
padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"><img alt="" class="size-medium wp-image-1959 " src="http://altdevblogaday.org/wp-content/uploads/2011/03/poly-acne-300x168.jpg" style="border-bottom-style: none; border-bottom-width: 0px; border-color: initial; border-color: initial; border-color: initial; border-left-style: none; border-left-width: 0px; border-right-style: none; border-right-width: 0px; border-style: initial; border-top-style: none; border-top-width: 0px; border-width: initial; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 5px; max-width: 100%; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" width="300" /></a><br />
<div class="wp-caption-text" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; font-style: italic; line-height: 16px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 2px; padding-left: 3px; padding-right: 3px; padding-top: 6px; text-align: center;">Incorrect code</div></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">Performant code is a necessity for console game programming. Console code should be performant with respect to memory and execution time. Consoles have a fixed amount of memory and cycles available and code which ignores those constraints is, at best, not finished yet. A game will not function on a console if it uses too much memory, thereby failing responsibility number 1. Overly slow code reduces the quality of the game – it is still correct, yet there is less for a designer to work with and less that an artist can display.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">Code must also be maintainable. 
This doesn’t necessarily mean that you should produce code that will be used by thousands for generations to come, but that the code that you have written is understandable and modifiable by another programmer, or even yourself. I still remember coming back to a Space Invaders clone that I wrote in comment-free 68000 assembly after a few months away. I cursed myself vehemently over that and learnt that just because you understand it now doesn’t mean that you will six months down the track.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left;">Most coders I know wouldn’t argue with points 1 and 3, but so many resist (or ignore) the generation of performant code with a passion. Some even seem to actively pursue the production of non-performant code. 
When discussing the performance of someone’s code, the quote that is almost guaranteed to rear its 30-odd year old head is</div><blockquote style="background-attachment: initial; background-clip: initial; background-color: initial; background-image: url(http://altdevblogaday.org/wp-content/themes/suffusion/images/blockquote-l.png); background-origin: initial; background-position: 0% 0%; background-repeat: no-repeat no-repeat; border-bottom-left-radius: 5px 5px; border-bottom-right-radius: 5px 5px; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-left-radius: 5px 5px; border-top-right-radius: 5px 5px; border-top-width: 0px; font-size: 1em; margin-bottom: 1em; margin-left: 3em; margin-right: 3em; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 10px; padding-left: 15px; padding-right: 15px; padding-top: 10px; text-indent: 2em;"><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">“Premature optimization is the root of all evil”<br />
- Knuth, Computing Surveys, Vol. 6, No.4, December 1974.</div></blockquote><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">You can read it <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.6084&rep=rep1&type=pdf" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #528f6c; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;">here</a> in its original format. Skip to page 268 if you don’t want to read it all – but you should at least read the paragraphs around it to see the context it’s used in. He’s not saying don’t optimise, he’s saying make sure you optimise the right stuff. Far be it from me to disagree with Dr. Knuth – I fully agree with him. Premature optimisation can be bad. 
Just like premature ejaculation, premature optimisation can leave you with a sticky mess that you’re just going to have to clean up later.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;"><a href="http://altdevblogaday.org/wp-content/uploads/2011/03/premature-ejaculation-1.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #528f6c; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"><img alt="" class="aligncenter size-full wp-image-1966" src="http://altdevblogaday.org/wp-content/uploads/2011/03/premature-ejaculation-1.jpg" style="border-bottom-style: none; border-bottom-width: 0px; border-color: initial; border-color: initial; border-left-style: none; border-left-width: 0px; border-right-style: none; border-right-width: 0px; border-style: initial; border-top-style: none; border-top-width: 0px; border-width: initial; display: block; font-size: 14px; margin-bottom: 0px; margin-left: auto; margin-right: auto; margin-top: 0px; max-width: 99%; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" width="320" /></a></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; 
margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: center;"></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">The big question is <em style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">when</em> is it too soon to optimise code?</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">Quite simply, it is premature to optimise code before you know what it does.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 
0px; padding-top: 0px; text-align: justify;">It is not premature to consider performance when designing your code. You should have an idea of how much time your code will take to execute – you should at least know if it will be a potential bottleneck or not. In the cases where it is likely to be a bottleneck you most definitely should consider its performance during the design phase – in fact it should be a key influence on the design.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">It is, however, premature to optimise purely on random program sample hits or obviously inefficient assembly without considering how that code is used.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">I love optimising code. I love the quantifiable results – seeing the number of milliseconds spent in a function drop by an order of magnitude, seeing the frame rate climb back into double figures, seeing god-awful code morph into something simple, efficient, neat. I optimise for a living now – my children’s education, clothing and video games depend on me making someone else’s code run fast – and yet I still feel the pull of premature optimisation. I see a load-hit-store (LHS) penalty and want to fix it immediately, even though it’s in code that is only called twice a frame. 
I see a linked list and I want to beat it with a stick until it’s a nice sensible flat array. And the less we mention of scene trees the better.</div><div class="wp-caption aligncenter" id="attachment_1962" style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-left-radius: 3px 3px; border-bottom-right-radius: 3px 3px; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-left-radius: 3px 3px; border-top-right-radius: 3px 3px; border-top-width: 0px; display: block; font-size: 14px; margin-bottom: 1em; margin-left: auto; margin-right: auto; margin-top: 0px; max-width: 99%; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 4px; padding-left: 4px; padding-right: 4px; padding-top: 4px; text-align: center; width: 435px;"><a href="http://altdevblogaday.org/wp-content/uploads/2011/03/office-space-movie.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #528f6c; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"><img alt="" class="size-full wp-image-1962" src="http://altdevblogaday.org/wp-content/uploads/2011/03/office-space-movie.jpg" style="border-bottom-style: none; border-bottom-width: 0px; border-color: initial; border-color: initial; border-color: initial; border-left-style: none; border-left-width: 0px; border-right-style: none; border-right-width: 0px; border-style: initial; border-top-style: none; border-top-width: 0px; border-width: initial; font-size: 14px; 
margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 5px; max-width: 100%; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" width="425" /></a><br />
<div class="wp-caption-text" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; font-style: italic; line-height: 16px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 2px; padding-left: 3px; padding-right: 3px; padding-top: 6px; text-align: center;">Optimising a scene tree.</div></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: center;"></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">To effectively optimise you need to be able to see the big picture. You need to grok the higher-level flow of code and data – what happens, how it happens, how long it takes to happen, what data is used and how that data is laid out. Optimisations at the high level can get you big wins with minimal code/data changes – but the big investment is the time you spend understanding that code in the first place. 
You need to own it – only then can you effectively optimise it.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">As bad as premature optimisation is, last-ditch optimisation is worse. Three months before shipping is too late to optimise; you need to do it much, much earlier. Last-ditch optimisation is dangerous – the code towards the end of development is as complex as it can get. Even small changes can have far-reaching effects, and so the larger changes required to make dramatic improvements in performance are often too risky at that stage of development. Once you’ve fixed the obvious bottlenecks you’re left with <a href="http://www.c2.com/cgi/wiki?UniformlySlowCode" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #528f6c; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;">Uniformly Slow Code</a>. The next stage of optimisation is then asset reduction (optimise textures, meshes, numbers of objects etc.), followed by feature reduction. 
The final stage is studio reduction.</div><div class="wp-caption aligncenter" id="attachment_1960" style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-left-radius: 3px 3px; border-bottom-right-radius: 3px 3px; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-left-radius: 3px 3px; border-top-right-radius: 3px 3px; border-top-width: 0px; display: block; font-size: 14px; margin-bottom: 1em; margin-left: auto; margin-right: auto; margin-top: 0px; max-width: 99%; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 4px; padding-left: 4px; padding-right: 4px; padding-top: 4px; text-align: center; width: 529px;"><a href="http://altdevblogaday.org/wp-content/uploads/2011/03/BFProfiling.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #528f6c; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"><img alt="" class="size-full wp-image-1960 " src="http://altdevblogaday.org/wp-content/uploads/2011/03/BFProfiling.jpg" style="border-bottom-style: none; border-bottom-width: 0px; border-color: initial; border-color: initial; border-color: initial; border-left-style: none; border-left-width: 0px; border-right-style: none; border-right-width: 0px; border-style: initial; border-top-style: none; border-top-width: 0px; border-width: initial; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 5px; max-width: 100%; outline-color: initial; 
outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" width="519" /></a><br />
<div class="wp-caption-text" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; font-style: italic; line-height: 16px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 2px; padding-left: 3px; padding-right: 3px; padding-top: 6px; text-align: center;">Image from DICE's http://www.slideshare.net/repii/parallel-futures-of-a-game-engine-v20</div></div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">My advice? Profile continuously. Provide as much performance information as you can to your programmers, designers, artists, QA and producers as soon as you can. All are responsible for the performance of the game, but if they can’t see the effect their work has on the game’s performance then how can they fix it? Put profile bars on screen, set budgets, log performance and performance warnings against those budgets – you need to see the performance issues sooner rather than later. 
Performance logs are great for comparing the relative performance between builds or finding performance issues within a part of a level.</div><div style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 14px; margin-bottom: 1.5em; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: justify;">Neglect performance at your peril – it’s an integral part of your deliverable, and just like any problem the sooner you address it the easier it is to fix.<br />
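<br />
As a rough sketch of the budgets and performance logs suggested above (not code from any particular engine), the Python below flags systems that exceed a per-frame budget and compares average timings between two builds. The system names, the budget values, the 10% tolerance and both function names are assumptions made up for illustration.

```python
# Hypothetical per-system frame budgets in milliseconds (illustrative values).
BUDGETS_MS = {"render": 12.0, "physics": 4.0, "ai": 2.0}


def budget_warnings(frame, timings_ms, budgets_ms=BUDGETS_MS):
    """Return one warning string per system that exceeded its budget.

    Systems without a budget entry are ignored rather than reported.
    """
    warnings = []
    for system, elapsed in sorted(timings_ms.items()):
        limit = budgets_ms.get(system)
        if limit is not None and elapsed > limit:
            warnings.append(
                f"frame {frame}: {system} took {elapsed:.2f} ms "
                f"(budget {limit:.2f} ms)"
            )
    return warnings


def compare_builds(old_avg_ms, new_avg_ms, tolerance=0.10):
    """Return systems whose average time grew by more than `tolerance`.

    Both arguments map system name to average milliseconds per frame.
    The result maps each regressed system to an (old, new) pair.
    """
    regressions = {}
    for system, old in old_avg_ms.items():
        new = new_avg_ms.get(system)
        if new is not None and old > 0 and (new - old) / old > tolerance:
            regressions[system] = (old, new)
    return regressions
```

In practice the timings would come from whatever profiler or log format your engine already produces. The point is only that the budgets are written down and checked on every build, so an overrun shows up the day it is introduced.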
<br />
This post is one of the <a href="http://altdevblogaday.org/">#AltDevBlogADay</a> posts. Check it out for some great Game Dev articles.</div>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-23201343611177313272011-02-12T12:01:00.001+10:302011-02-12T12:01:00.483+10:30The Beauty Within<div style="text-align: center;"></div><div class="MsoNormal" style="text-align: center;">A traditional sculptor works with wood or stone striving to release the art inside the medium. It takes great skill and experience to mold such an unforgiving medium into something of beauty; something that anyone, even those of us who cannot sculpt, can appreciate.<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkR1tomcoU7lH3Et7a7u_1MrdjfTAaxDjqPCZHulJXgUX2iDez-Cr7yCB9cmGW47RiDsulmzhki_ZXVN3EeOnL1nc9zG4ufNVgCECjX0nexkJgbOQmZoCU7nCVQyBlzBPuSG11/s1600/Lego.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkR1tomcoU7lH3Et7a7u_1MrdjfTAaxDjqPCZHulJXgUX2iDez-Cr7yCB9cmGW47RiDsulmzhki_ZXVN3EeOnL1nc9zG4ufNVgCECjX0nexkJgbOQmZoCU7nCVQyBlzBPuSG11/s320/Lego.jpg" width="306" /></a></div><div class="MsoNormal" style="text-align: center;"></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div style="text-align: center;">Not all art is immediately appreciable by the artistically uneducated – many (myself included) struggle to see the beauty in particular forms of art. One should, however, appreciate that some can see the beauty, expression and skill within forms of art that others cannot.</div><o:p></o:p><br />
<div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="http://www.beatmuseum.org/pollock/bluepoles.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="163" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQM2PUO0E5xf2NUsgHyFFQ9euUaTJ-1sJgoOvZsenrGMl6PE4OTptY2LFjTeR2BhoHsElrT49MCKHSNRzrDCRLdUIsIxxg_p7S_rsTdrMhoGySc242O6molPDSgRkxj8lPhmeG/s400/BluePoles.jpg" width="400" /></a></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">There is one field which is rarely singled out for its intrinsic beauty. Primarily because it requires an uncommon skill, and without that skill one is unable to perceive any of that beauty at all. It is ironic then that this uncommon field, with all its hidden beauty, does itself produce beauty that can be admired by the unskilled. Much like a master writer will craft a sentence of great beauty which itself describes a scene of beauty.<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="http://bit.ly/fh11tD" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="215" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGLxWIbmN8NYrgImcZL0TlM8P45-rZ3gObtjoplr7XSOS_DoaHVP842Ma4Ng_a6lGNQTWJvAFyV-6PA7FyA9QpnZeS_fKZosEtCExHkDTfdOkx_-8RojX5nxZG1x09HydVvDrs/s320/Slisesix.jpg" width="320" /></a></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">The field that I allude to is of course programming, and in particular, console game programming (I use the term ‘console’ loosely here to mean any form of fixed hardware, whether it is an iPhone, a C64, a PlayStation 3 or anything in between). Programmers reading this article will undoubtedly have seen code which they consider beautiful or more likely, will have written code that they themselves perceive as being beautiful. What programmer hasn’t sat back after coding up a new system, and thought “Hmm, that’s pretty nice.”<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">It just <i style="mso-bidi-font-style: normal;">feels</i> good.</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">So what is beautiful code? What makes code art?</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">As with all art, the answer is subjective. One man’s masterpiece of meta-templates is another’s pit of recursive despair. One woman’s inner loop of perfectly balanced stall free assembly is another’s indecipherable method of maintenance hell.</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3BBZMOYD0eXxvRZRxGcQjM8x_-QAzqym2P3T0uACm_pqI-ytEk3Vo6euxZro06awpiQRn6qJ_4bq6jJbOMwVIuWxKt6uFtymUx-h6y2K2Tcrf_37TKbgvrTJIKL1niyRE0mXH/s1600/Dante.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3BBZMOYD0eXxvRZRxGcQjM8x_-QAzqym2P3T0uACm_pqI-ytEk3Vo6euxZro06awpiQRn6qJ_4bq6jJbOMwVIuWxKt6uFtymUx-h6y2K2Tcrf_37TKbgvrTJIKL1niyRE0mXH/s320/Dante.jpg" width="320" /></a></div><div style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;"><b style="mso-bidi-font-weight: normal;"><i style="mso-bidi-font-style: normal;">Beautiful code is elegant</i>.<o:p></o:p></b></div><div class="MsoNormal" style="text-align: center;">Elegant in exactly the same way that a mathematical solution can be elegant – it is <a href="http://dictionary.reference.com/browse/elegant">gracefully concise and succinct</a>. It does much without seeming to - effortless. There are often many solutions to a mathematical problem, but the most elegant one will often be the one that does it in the simplest way with the fewest steps. Note that the quest for that simple solution may be long and arduous, but once seen you exclaim “Oh, <i>of course</i>!”<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">Beauty in code spills from that same font.<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcujiKHP9E1smsNn7_0PBs2tcPKX9YueAi5PCiz6KM8M0WfXdJF6CVIBSITFMOVhW305chmpdf9m80Kszj2MHUiPIPa-btUG7iRSEAaAYEE6p1Kr13n60HY3hWi58N4ZkPWQb0/s1600/basic.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcujiKHP9E1smsNn7_0PBs2tcPKX9YueAi5PCiz6KM8M0WfXdJF6CVIBSITFMOVhW305chmpdf9m80Kszj2MHUiPIPa-btUG7iRSEAaAYEE6p1Kr13n60HY3hWi58N4ZkPWQb0/s320/basic.jpg" width="320" /></a></div><div style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;"><b style="mso-bidi-font-weight: normal;"><i style="mso-bidi-font-style: normal;">Beautiful code works with its medium.<o:p></o:p></i></b></div><div class="MsoNormal" style="text-align: center;">Computer hardware is designed with a use in mind – each system will have its strengths and weaknesses and beautiful code works well with that hardware, with that medium, making the most of its strengths, compensating for its weaknesses. It works with the grain. Consider if you will, code running on a multicore system; minimal locking, no stalling, all cores running at maximum capacity. A thing of rare beauty indeed.<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7fApojZYupjX152UF4KdZcPRLDpRSTT6s1d5ygMeXixNWRV1kz-k25ds9V_oAciUTQJFdjh6e23CWAZ9SR3lWUQUOFKJtNoZBXY3fRHHKRTZLw1SRNHBSXnJ2Ud7epnxi6eZd/s1600/woodCarving.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="264" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7fApojZYupjX152UF4KdZcPRLDpRSTT6s1d5ygMeXixNWRV1kz-k25ds9V_oAciUTQJFdjh6e23CWAZ9SR3lWUQUOFKJtNoZBXY3fRHHKRTZLw1SRNHBSXnJ2Ud7epnxi6eZd/s320/woodCarving.jpg" width="320" /></a></div><div style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;"><b style="mso-bidi-font-weight: normal;"><i style="mso-bidi-font-style: normal;">Beautiful code educates.<o:p></o:p></i></b></div><div class="MsoNormal" style="text-align: center;">It’s rarely the <a href="http://bit.ly/dYnaPk">code itself</a> which is beautiful – no more than the individual brush strokes in a painting are beautiful. The art, the beauty is in the talent that places those many strokes in just the right places in just the right way with just the right colour. True beauty in code can rarely be appreciated at a glance – study and comprehension are required. And study followed by comprehension <i>is</i> learning. Beautiful code shows us how it should be done; it gives us something to aspire to.<o:p></o:p></div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1oio7b9L8V9GaboefflZSEV2Urbf2bJ-v6n28UcWKC5o3g1NHTu6AQMfYxmKTSsFIK9cxFTLhkseeQrwPY3aQ1v6odj-WWXwEH-OTB-mMiNja6ewItXFtw5Rme-OwJNLyJCfW/s1600/Dali.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="281" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1oio7b9L8V9GaboefflZSEV2Urbf2bJ-v6n28UcWKC5o3g1NHTu6AQMfYxmKTSsFIK9cxFTLhkseeQrwPY3aQ1v6odj-WWXwEH-OTB-mMiNja6ewItXFtw5Rme-OwJNLyJCfW/s320/Dali.jpg" width="320" /></a></div><div style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">Beauty in code is something to strive for. It is why we love programming; it’s why we dream it, why we neglect our families for it, why we chose this career. But all too often the realities of deadlines, the limitations of the codebase we work within or the fact that it just needs to bloody well work <i>now</i> keep us from dedicating that little extra time to produce that perfect, beautiful code.</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">I posit that we as programmers must pursue beauty in our craft, at least occasionally.</div><div class="MsoNormal" style="text-align: center;"><br />
</div><div class="MsoNormal" style="text-align: center;">At worst it feeds our egos, but at best, it soothes our industry-soiled souls.</div><div class="MsoNormal" style="text-align: center;"><br />
<i><span class="Apple-style-span" style="font-size: x-small;">(This article is a part of the #<a href="http://altdevblogaday.com/">AltDevBlogADay </a>group)</span></i></div>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-79013093113192222222011-01-28T20:53:00.001+10:302011-01-30T15:05:59.882+10:30The Pain of Debuggery<div>(This post is part of <a href="http://altdevblogaday.com/">#AltDevBlogADay</a> - make sure you head over there and check out the many excellent game dev related posts) </div><div><br />
</div><div>As one of the older guys in the studios I work in (This is for #AltDevBlogADay right? For grumpy old game devs?), I occasionally hear young programmers whinging about how hard debugging is. About how their debugger doesn't integrate quite the way they would prefer it to, about the complexity of multi-threaded debugging or about how the froth on their latte isn't quite the right consistency. When I hear this type of conversation I saunter over with my strong black coffee, introduce myself, tell them to pull their pants up and then tell them about a job I once had and a debugging session I once survived.<br />
<br />
<div align="center" style="font-family: Arial, Helvetica, sans-serif; font-size: 13px; font-weight: normal;"><img src="http://cache.gawkerassets.com/assets/images/7/2010/03/340x_a499d2be1d1fe4c22c_01.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px;" /></div><br />
This was in a time before I worked in the game industry, back in the early to mid '90s. It was my first job where I was being actually paid to program - I'd been there for a couple of years writing code for mining and some visualisation applications for defence - you know, helping big organisations to destroy both the earth and the humans on it. It was fun stuff - I was working on Silicon Graphics machines and writing 3D graphics in OpenGL - it was as close to writing games as I could get without actually writing games.<br />
<br />
<br />
Then I was moved onto an unusual new task, one that involved porting some PC code to a Silicon Graphics Indy. Now that in itself wasn't that unusual. The unusual bit of this job was that it was code for a program that was to be run in an abattoir. It was designed to take snapshots of the halved carcasses of the cattle as they moved down the production line, hung by their hooves on a hook, and then analyse those snapshots and determine meat grades, highlight any abnormalities in the flesh (tumors, severe bruising or the like) and then file the images and associated data to be used for stored cataloging. The code was awful - written by physicists (I can say that, I used to be one. Physicists tend to believe that you don't need to be taught to code, that you just need to transcribe the maths into code and it will all just work. Anyway, I digress...) The code was full of hardcoded floats, painfully copied out to far too many decimal places, and obscure mathematics written with complete disregard of the physical limitations of floating point numbers on that platform. It was impossible to debug, but I was assured that it was working.<br />
<div align="center" style="font-family: Arial, Helvetica, sans-serif; font-size: 13px; font-weight: normal;"><img height="332" src="http://upload.wikimedia.org/wikipedia/commons/f/f9/SGI_Indy_CRT_Keyboard_Mouse.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px;" width="464" /></div><br />
The Indy that was to be used wasn't hardy enough to be on the slaughter floor where the camera was (apparently they weren't designed to be immersed in blood), so that was kept in a concrete room not too far away. This room had only one window, and just outside that window was where the newly sloughed cow skin was taken and boiled in preparation for tanning. The smell from that neighbouring room was unpleasant, to say the least. This concrete room with the computer in it was the room where I had the fragrant pleasure of programming from.<br />
<br />
A small podium was built on the slaughter floor, onto which the camera was mounted. On the side of the podium (accessible from the ground) was a small LCD display with a keypad on it. This display would show the image of the carcass when it was captured and then flash up analysed results as a text overlay as it was saved to disk in the other room. That LCD and keypad also acted as an interface to the application (connected to the Indy via 20 or 30 meters of cable) - I could tweak the program settings from there as the carcasses moved past.<br />
<br />
Just behind the camera's podium (to my right as I faced the LCD) was where the cows were sliced in twain by a large mustachioed guy with a chainsaw. Yes, there was splattering blood, grinding bones and lots of noise - but that wasn't an issue. As the cow was cut in two, the insides of that cow *usually* ended up on a conveyor belt which shipped the steaming internals off to somewhere unknown. Occasionally, the stomach missed the belt or was split by the chainsaw - usually resulting in me having to leap to safety as the semi digested contents splashed to the ground about me. Or, I'd end up buying new shoes.<br />
<br />
The workers there seemed to take great pleasure in making me as bloody and gore splattered as possible. As the carcasses rolled past on their hooks in front of the computer's camera, there was a guy who would ensure that if the carcass was too damaged that it would be pushed off onto another track where it could be trimmed further (if possible) before returning it to the track with the other meat. This other track happened to pass through exactly where I would stand in front of the LCD, so if I wasn't paying attention then I'd be hit by 300kg of still quivering steak (this only happened once, to the great amusement of those around me).<br />
<div align="center" style="font-family: Arial, Helvetica, sans-serif; font-size: 13px; font-weight: normal;"><img height="364" src="http://images.lightstalkers.org/images/681623/RTR1VBHH_large.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px;" width="520" /></div><br />
So, yes, this was an unusual work environment for a programmer. To make it even more of a challenge, I would have to start work at 4am with the rest of the guys on the slaughter floor. I'd do my 8 hours there, checking readings, collecting data, doing a bit of coding in the stinky room, dodging carcasses and bursting stomachs on the slaughter floor and then drive the hour or so back to my office and code for a few more hours - 12+ hour days were fairly normal (and no, I didn't get overtime).<br />
<br />
I lost count of the number of weeks that I did this for. At some stage I was involved in a 5 car pile up on my way to the abattoir, with me being car number 5. I was fine, but my car and the $14,000 SGI Indy on the passenger seat didn't fare so well. The Indy worked, but not for long - the motherboard had cracked when the Indy hit the car floor. Luckily it was still under warranty and Silicon Graphics replaced it for us (don't tell them it was due to the car crash - they might want their money back).<br />
<br />
As the code matured I was needed less at the abattoir, so I could drop in at the start of the day and install a new build, then head back to my office to code more or work on other tasks. There was no internet connection to the abattoir, so the only way to change anything was to drive there with the code on a 3.5" floppy. There were two technicians who were trained to start up the computers and set things running, but they weren't programmers and had no understanding of the OS or anything other than what we'd trained them for.<br />
<br />
One day I was at the office and received a call that the software was crashing consistently. I managed to duplicate the problem locally, and it was a quick and easy fix. But I didn't want to do the 2 hour round trip drive to the abattoir to install it, so I figured that I could "Airport '75" the tech guys and talk them through making the code changes required, compile it and set it running. I mean, how hard could it be?<br />
<div align="center" style="font-family: Arial, Helvetica, sans-serif; font-size: 13px; font-weight: normal;"><img src="http://www.cinemademerde.com/Airport_75-black2.gif" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px;" /></div>The first problem I had was that the mobile (cell) phone didn't work in the stinky concrete room - there was no signal at all. So, I directed Technician 'B' (a guy with no computing skills) to be the phone guy and he had to stand at the doorway of the stinky room and relay my instructions to Technician 'A' who would type in what he was told. This started fine - we managed to get emacs up and running from a console window which was a great start.<br />
<br />
Then the fun started.<br />
<br />
"You need to press Control X Control F directory_name slash filename.cpp"<br />
<br />
"Press what?"<br />
<br />
"The control key - bottom left corner of the keyboard. Hold it down and then press the 'x' key then the 'f' key, then let go of the control key and type in the file name... What can you see on the screen now? Ok, now press Control Meta Cokebottle..."<br />
<br />
...and so on...<br />
<div align="center" style="font-family: Arial, Helvetica, sans-serif; font-size: 13px; font-weight: normal;"><img height="355" src="http://www.ibm.com/developerworks/linux/tutorials/l-emacs/findfile.jpg" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px;" width="459" /></div><br />
This would have been hard enough at the best of times; it was like coding via Chinese Whispers - a noisy environment via a conduit who had no idea what I was talking about relaying my instructions to someone who had only the vaguest notion of what I was talking about. I had to direct the remote programmer to the correct point in the correct line in the correct file, make the changes required and then repeat until the code *should* have been fixed. Fortunately, it was an easy fix that required only a couple of changes - it only took us about half an hour to make the necessary code changes.<br />
<br />
Compilation was easy: "Hold the Alt key and press Enter and that will build it. Wait until it stops printing and then read me what you see on screen". I was hoping for success first time.<br />
<br />
"What, all of it?" came the delayed relayed reply.<br />
<br />
"Shit"<br />
<br />
I spent the next hour deciphering the relayed compilation errors, reverse engineering them to try and figure what could possibly be the problem in the first place and then directing the remote programmer to the point in the file that might have been the problem in order to fix it. It was fun in almost exactly the same way that drunkenly falling over and smashing face first into concrete is fun - a moment of blissful free fall when you think you might have gotten away with it, followed by the realisation that this will take a long time to fix.<br />
<br />
But fix it I did, and it only took slightly longer than it would have for me to drive there and back. Which in my mind was a win.<br />
<br />
So next time you're feeling hard done by because you have to use printfs to debug some code, or you're having trouble understanding what a particular error relates to, just remember. It could be worse. Much worse. At least you're not standing ankle deep in blood and bile while you're doing it.<br />
<br />
And what happened to the program I ported? Well I ended up proving that the original code never worked reliably. It was too heavily dependent on the ambient lighting conditions and would give different readings depending on the time of day.<br />
<br />
So, what was your worst debugging session?</div>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com2tag:blogger.com,1999:blog-36348397.post-13961711119802026632011-01-08T15:30:00.004+10:302011-01-08T23:47:56.278+10:30Job Security through Code Obscurity<div class="separator" style="clear: both; text-align: center;"><a href="http://i.dailymail.co.uk/i/pix/2010/01/01/article-1239837-07BC600A000005DC-275_634x431.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="217" src="http://i.dailymail.co.uk/i/pix/2010/01/01/article-1239837-07BC600A000005DC-275_634x431.jpg" width="320" /></a></div><br />
The frantic world of the game programmer is fraught with danger. If I had a dollar for each time I’d seen a programmer thrown from their swivel chair by the force of an aneurysm caused by thinking too hard about their code I’d have almost one whole dollar. As a programmer you must take great care of your brain – it is your primary asset after all. You must feed it carefully with caffeine and chocolate and regularly cleanse it with alcohol. Overuse will only wear it out, so here are some hints to minimise wear and tear on your brain, while at the same time, making your code so complex that no-one else will be able to step in and replace you.<br />
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Use objects to obfuscate your code.</span><br />
This one is very important and there are many tried and true ways to do this. Start designing your system by thinking about the problem in terms of real world analogies. Break it down to a finite number of discrete objects (you could try an infinite number but that can take a long time) and then try and find some commonality between all those objects. Don’t worry if there is no common functionality, you can always create some. The best way is to create an empty abstract Object class that everything can inherit from – put a next pointer and some form of ID in there too. Ooh, and add reference counting so that you don’t have to think too hard about memory management (that stuff can be tricky). Best to put as many virtual functions as you can at the base level so you can always call the same methods on objects regardless of their type – everything needs an Update(), right? The side effect of this approach is that code that does similar things to similar data is spread widely through the code base and at run time is spread throughout memory (making it safer from memory leaks. Safety through distribution is the same reason that you never see two nuclear power plants next to each other).<br />
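<br />
For anyone who wants to see the recipe above written down, here is a minimal C++ sketch of such a base class. It is an illustration only, not code from any real engine: the names <i>Object</i>, <i>Car</i>, <i>AddRef</i>, <i>Release</i> and the fields are all assumptions made up for the example.<br />

```cpp
#include <cstdint>

// Hypothetical base class following the recipe above: an ID, a next
// pointer, a reference count and a virtual Update() for everything.
class Object {
public:
    explicit Object(std::uint32_t id) : id_(id) {}
    virtual ~Object() = default;

    // Every derived type must implement this, whether it needs to or not.
    virtual void Update(float dt) = 0;

    void AddRef() { ++refCount_; }
    // Returns true when the last reference has been dropped.
    bool Release() { return --refCount_ == 0; }

    std::uint32_t Id() const { return id_; }

    Object* next = nullptr;  // intrusive list link

private:
    std::uint32_t id_;
    int refCount_ = 1;  // the creator holds the first reference
};

// Hypothetical derived type; the name and fields are assumptions.
class Car : public Object {
public:
    using Object::Object;  // reuse the base constructor
    void Update(float dt) override { position += speed * dt; }

    float position = 0.0f;
    float speed = 10.0f;
};
```

Walking a list of these through the <i>next</i> pointer and calling <i>Update()</i> on each is a virtual call and, most likely, a fresh cache line per object, which is the spreading through memory described above.<br />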
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Specialise via inheritance</span><br />
Imagine you have an object, say a car, maybe a 4 door sedan, and you want to specialise that to have an extra door. Make sure you inherit from that Sedan class and reimplement at least the bits that are different. Extra points for calling the same virtual function in the parent class from that virtual function. The more you inherit, the more files you have, the more it looks like you are working hard, producing lots of code. Don’t forget to add extra functions that might be used in the future and code them with the best intentions (You can leave this for future-debugging).<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4hIlGkiEy-_QfltBE0h2K9hczWE2t6YHJASHAE9o9L1C3c40RfUn4Pf6LkAyDuX4zZVClga4TuzZxL79RoYO0nXekR4VX1pddCOPyMPr_74sFWPmpAphOVW3k8xTqj5Z5-hxZ/s1600/Obj-Hier.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="76" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4hIlGkiEy-_QfltBE0h2K9hczWE2t6YHJASHAE9o9L1C3c40RfUn4Pf6LkAyDuX4zZVClga4TuzZxL79RoYO0nXekR4VX1pddCOPyMPr_74sFWPmpAphOVW3k8xTqj5Z5-hxZ/s320/Obj-Hier.JPG" width="320" /></a></div><br />
This process works very well with utility classes too. If a utility class doesn’t allow you to use it in exactly the way you want to, then inherit from it and change the API. Extra points for reimplementing small parts of the utility via virtuals which will further obscure the code flow. Once you’ve done that, make sure that you forget how to use it and just cut and paste from the one place you initially used it but don’t forget to change the variable names as required! If at a later date you’ve changed your mind, just inherit from it again and change it again – that way your old code will still work (probably) and you get a nice new interface.<br />
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Use lots of patterns</span><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj21xNcOeJasY4fPd3kcb2yvIIKU8347CErnha5aQVpOLuxhAukPO0isc8BmGHsqL6hHKL_pd_IIbY9B9TmROPFxJTGJQ_2AGeAH34NtXDqb1VpHnz89aHq6Y7uH0DajD4-URNn/s1600/design-patterns.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj21xNcOeJasY4fPd3kcb2yvIIKU8347CErnha5aQVpOLuxhAukPO0isc8BmGHsqL6hHKL_pd_IIbY9B9TmROPFxJTGJQ_2AGeAH34NtXDqb1VpHnz89aHq6Y7uH0DajD4-URNn/s1600/design-patterns.jpg" /></a></div><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;"><br />
</span><br />
The Gang of Four spent a lot of time and effort writing Design Patterns and you can look really clever by simply implementing a few of their patterns (or even better, copying someone else’s) and then using that code over and over again wherever it might be even mildly applicable. You don't really have to think at all! This has the added benefit of allowing you to use some clever sounding pattern buzz words “Oh, it’s just my implementation of Flyweight Factory with a templatized Memento behaviour.” (That single sentence will guarantee you a raise whilst simultaneously spreading confusion and insecurity). It doesn't matter if you don't really understand what it does or how it works, as long as you remember its name and how to use at least a small part of it. This has the side benefit of allowing you to insert lots of code into a code base. And now it’s easier than ever as you can do it in the style of a “<a href="http://www.jamessiddle.net/docs/cyoa.pdf">Choose your Own Adventure</a>” book, making it fun too.<br />
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Obscure code flow with virtuals and templates</span><br />
The best thing about virtual functions and templates is that they can really obscure the flow of your code, so that the only way someone can tell what is happening is by stepping through your code in a debugger (Note that this includes you too, but hey, you get paid by the hour right?). They do however provide a nice neat interface at the top level, so all you have to do is make sure your code works first time and you’re set.<br />
<br />
Meta-template programming is another great way to write code that no-one understands – use it liberally. Be sure to use templates for what they were designed for – producing indecipherable error messages and bloating code.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCNhOtCVK91FGE3bS7KvN9IAJyn1du6mC3UipemaaofxvVlYlgF0Z79okBrJrBo-9yy4AvOFRJ5sehjQX4OlTaXNAJ_Ujrhvjmro9EDJXOKAH0O_djQ8Vww2lqv9VyFGhRD_6F/s1600/awesome-c_plusplus-template-error.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="246" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCNhOtCVK91FGE3bS7KvN9IAJyn1du6mC3UipemaaofxvVlYlgF0Z79okBrJrBo-9yy4AvOFRJ5sehjQX4OlTaXNAJ_Ujrhvjmro9EDJXOKAH0O_djQ8Vww2lqv9VyFGhRD_6F/s320/awesome-c_plusplus-template-error.png" width="320" /></a></div><br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Future proof your code</span><br />
Make sure that when you design a class that you think about all of the things that it might possibly have to do in the future and write functions for every single one of them. It doesn’t matter if that code is never used or tested, just as long as it compiles. It’s future proof and you’ve increased the amount of code that you’ve produced. Also, it pays to make at least some part of your new classes templates because, well, what happens when your processor is 64 bit and you’ve designed for 32? And the best part? With some compilers template code that isn't called isn't compiled! So it doesn't even have to compile and yet it will be future proof!<br />
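<br />
The remark about uncalled template code can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions: <i>FutureProof</i>, <i>teleport()</i>, <i>engageWarpDrive()</i> and <i>useOnlyTheTestedPart()</i> are hypothetical names invented for the example. Member functions of a class template are generally only instantiated when they are used, so a body that makes no sense for a given type can sit there unnoticed.<br />

```cpp
// Hypothetical "future proof" wrapper, invented for illustration only.
template <typename T>
class FutureProof {
public:
    explicit FutureProof(T value) : value_(value) {}

    // Called below, so this member is instantiated and fully checked.
    T get() const { return value_; }

    // Never called for T = int. The expression depends on T, so it is
    // only checked if this member is actually instantiated.
    void teleport() { value_.engageWarpDrive(); }

private:
    T value_;
};

// Uses FutureProof<int> but only ever touches get().
inline int useOnlyTheTestedPart() {
    FutureProof<int> fp(42);
    return fp.get();
}
```

Add a call to <i>fp.teleport()</i> and the build should fail, since <i>int</i> has no such member; until then the untested code stays invisible.<br />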
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Make your code thread safe</span><br />
This is nowhere near as hard as some of the technical engine programmer boffins would have you believe. All you have to do is put critical sections in all of your functions, ensuring that no two bits of code are run at the same time. This is great for high performance container classes.<br />
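<br />
Taken literally, the advice above might look something like this sketch. It uses the standard <i>std::mutex</i> in place of a platform critical section, and the <i>LockedIntBag</i> class is a hypothetical name made up for illustration. Every member takes the same lock, so any number of threads can use the container as long as they are happy to do so one at a time.<br />

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical container, invented for illustration: every member takes
// the same lock, so all callers are fully serialised.
class LockedIntBag {
public:
    void push(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(v);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

    long long sum() const {
        std::lock_guard<std::mutex> lock(mutex_);
        long long total = 0;
        for (int v : items_) total += v;
        return total;
    }

private:
    mutable std::mutex mutex_;  // mutable so const members can lock it
    std::vector<int> items_;
};
```

Note that each call is safe on its own, but a sequence such as checking <i>size()</i> and then calling <i>push()</i> is not atomic, so the locks buy less than they cost.<br />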
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Include everything</span><br />
Is there anything worse than adding some code to a class and discovering when you compile that a header was missing? This can be avoided by including everything in every header – like this<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">#include "all.h"</span><br />
<br />
Then, whenever you add a new class, just insert the include into this single header and voilà! Problem solved. This has the side benefit of dramatically increasing the time to compile, leaving you time to legitimately surf the web or clean between your toes.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://imgs.xkcd.com/comics/compiling.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="278" src="http://imgs.xkcd.com/comics/compiling.png" width="320" /></a></div><br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Avoid Data Oriented Design</span><br />
DOD is so 2010. For those of you that have avoided the discussions because you were too busy copying and pasting some more variations of your favourite design patterns into your code base or waiting for your latest change to compile, DOD is like a factory production line. You have very simple functions working on data in a very cache friendly manner, resulting in simple code which is highly maintainable and optimisable. This type of code will in no way benefit you, and anyway, learning new ways to think about programming will only wear your brain out even more. If you've finished Uni then you’ve learnt enough and deserve a rest in a high paid job.<br />
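<br />
For those determined to wear their brains out anyway, the production line idea can be sketched in C++ like this. It is a minimal example under stated assumptions, not a definitive implementation: the <i>Particles</i> struct and the <i>integrate()</i> function are hypothetical names, and the layout (parallel arrays of plain floats) is just one way to keep the data contiguous.<br />

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle data, laid out as parallel arrays so that each
// pass reads and writes contiguous memory.
struct Particles {
    std::vector<float> x;   // positions
    std::vector<float> vx;  // velocities
};

// One simple function, one linear pass over the data.
inline void integrate(Particles& p, float dt) {
    assert(p.x.size() == p.vx.size());
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
    }
}
```

There are no virtual calls and no pointers to chase, and the loop is simple enough that a compiler has a fair chance of vectorising it.<br />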
<br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Optimise assets, not code</span><br />
You've spent a lot of time (and some effort) building a beautiful system, with an elegant massive inheritance hierarchy, beautifully templatised routines, heaps of ambiguously overloaded operators and with code paths that would make Escher blush; why should you change that code because some non-programmer wants more than 3 particle systems on screen at once? Your code is art too – more complex and beautiful than they can comprehend. Optimisation is the responsibility of the designer and artist – the best games are built under strict constraints, so any performance issues will only serve to make a better game. (Plus, performance related criticism can be avoided by blaming the hardware.) Additionally, this is preparing you for coding on the next generation of consoles where memory accesses will be free and every instruction will be a single cycle.<br />
<br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">If you've written your code well it will be voluminous and uniformly non-performant, showing no obvious bottlenecks, which will result in other programmers leaving your code alone for easier optimisation pickings.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: large;">Exit Strategy</span><br />
Following the above tips will help you to minimise wear and tear on the old grey matter, while keeping you gainfully employed for almost an entire game's development cycle.<br />
<br />
Good luck!<br />
<br />
A Question of Sorts (posted 2010-07-23, updated 2012-04-22)<br />
<br />
Which is faster on the same data; an O(n) sort or an O(nLog(n)) sort?<br />
<br />
Answer: It depends. It depends on the amount of data being sorted. It depends on the hardware it's being run on, and it depends on the implementation of the algorithms.<br />
<br />
I was recently doing some research into sorting implementations and their performance characteristics on the PlayStation®3 for my work at SCEE, and I came across some interesting results which Sony Computer Entertainment Limited (SCEE) has graciously given me permission to share with you here. Registered PlayStation®3 developers will be able to download the full source and documentation of my experiments through devnet, and the documentation with some source is available from <a bitly="BITLY_PROCESSED" href="http://research.scee.net/articles">here</a> right now.<br />
<br />
Most of you should be familiar with big ‘O’ notation. Basically, it describes how an algorithm's running time grows in relation to some bound ‘n’ – the faster the function inside the Big O grows, the slower the algorithm becomes as n increases. <a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/Big_O_notation">Wikipedia</a> has a <a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities">useful table</a> showing the different common time complexities, ranging from O(1) (constant time) through O(n) (linear time) to O(n!) (factorial time). Many of the common sorting algorithms that you are taught at Uni are O(n log n) or slower, with the best, the radix sort, being O(n). Given what we know about the speed of memory access versus CPU speed, I thought it would be interesting to see how different sort algorithms compare on console hardware. So I implemented about 17 different sort algorithms (some implemented from code found on the Internet and some from scratch), ran them on different amounts and sizes of data, and then examined their performance on the PlayStation®3.<br />
<br />
<h3>
The Basic Sorts</h3>
First up, I looked at some basic sorts, all working on 65,536 contiguous floats.<br />
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXaFKe9jI/AAAAAAAACEU/ReRf3lw9wOA/s1600-h/image12.png"><img alt="image" border="0" height="274" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXa5KlTOI/AAAAAAAACEc/m7MeWajYzN8/image_thumb6.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="362" /></a><br />
For a full discussion of all these sorts (and more), see the SCEE white paper, but for now it’s worth noting that the standard qsort (70ms) performs particularly badly, bitonic (90ms) is awful (due to lots of small recursive function calls causing load-hit-store (LHS) penalties) and even std::sort (35ms) isn’t particularly good. The iterative implementations of the quicksort algorithm (26ms) and the merge sort (23ms) perform the best of this set. It is also worth noting that the Shellsort (47ms), which is O(n<sup>2</sup>), is faster than qsort(), which is O(n log n).<br />
<br />
<h3>
The O(n) Sorts</h3>
Next, let’s look at the radix sort. This sort is a very impressive algorithm, and I would recommend that all programmers have a look at how an orthogonal approach to a problem can provide very impressive results. There is a great explanation by Pierre Terdiman <a bitly="BITLY_PROCESSED" href="http://codercorner.com/RadixSortRevisited.htm">here</a> with an improvement on the algorithm implementation <a bitly="BITLY_PROCESSED" href="http://www.stereopsis.com/radix.html">here</a> by Michael Herf, on which my code was based.<br />
<div align="center">
<a bitly="BITLY_PROCESSED" href="http://lh4.ggpht.com/_hk4pJ9VK3zU/TEjXblJp1II/AAAAAAAACEg/wv77UfMQzSU/s1600-h/image201.png"><img alt="image" border="0" height="208" src="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXcbudMXI/AAAAAAAACEk/czsrXtwIIzM/image_thumb10.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="342" /></a> </div>
The Radix 8 sort (4.9ms) uses a histogram of 2<sup>8</sup> entries and sorts the floats as raw bitfields, honouring sign by performing selective bitflips as explained in the second link mentioned above. This means that the source data is passed through 5 times: once for the histogram and then 4 more times, once for each 8 bits. The radix 11 sort (3.6ms) uses a histogram of 2<sup>11</sup> entries and only has to pass through the source data 4 times: once for the histogram and then 3 more times, once for each 11 bits. The times are impressive, less than 5ms for the radix 8 compared to 23ms for the iterative merge sort. Still, you’d expect that an O(n) sort would be faster than an O(n log n) sort, wouldn’t you?<br />
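<br />
To make the bitflip idea concrete, here is a minimal scalar sketch of a radix 8 sort on floats. It is not the code that was timed above: FloatToKey, KeyToFloat and RadixSort8 are hypothetical names, NaNs are not treated specially, and for clarity it converts keys back to floats in a separate final loop rather than folding that into the last scatter pass.<br />
<br />
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a float's bits to an unsigned key whose integer order matches float order:
// negative values get every bit flipped, non-negative values only the sign bit.
inline std::uint32_t FloatToKey(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t mask = (bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return bits ^ mask;
}

// Inverse of FloatToKey.
inline float KeyToFloat(std::uint32_t key)
{
    const std::uint32_t mask = (key & 0x80000000u) ? 0x80000000u : 0xFFFFFFFFu;
    const std::uint32_t bits = key ^ mask;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: least-significant-byte-first radix sort, 8 bits per pass.
inline void RadixSort8(std::vector<float>& values)
{
    const std::size_t n = values.size();
    std::vector<std::uint32_t> keys(n), temp(n);

    // Histogram pass: convert to keys and count all four bytes at once.
    std::uint32_t histogram[4][256] = {};
    for (std::size_t i = 0; i < n; ++i)
    {
        keys[i] = FloatToKey(values[i]);
        for (int b = 0; b < 4; ++b)
            ++histogram[b][(keys[i] >> (8 * b)) & 0xFF];
    }

    // Turn the counts into starting offsets for each bucket.
    for (int b = 0; b < 4; ++b)
    {
        std::uint32_t sum = 0;
        for (int j = 0; j < 256; ++j)
        {
            const std::uint32_t count = histogram[b][j];
            histogram[b][j] = sum;
            sum += count;
        }
    }

    // Four stable scatter passes, one per byte.
    for (int b = 0; b < 4; ++b)
    {
        for (std::size_t i = 0; i < n; ++i)
            temp[histogram[b][(keys[i] >> (8 * b)) & 0xFF]++] = keys[i];
        keys.swap(temp);
    }

    // Convert the sorted keys back to floats.
    for (std::size_t i = 0; i < n; ++i)
        values[i] = KeyToFloat(keys[i]);
}
```

Note that every scatter pass writes to effectively random locations in the destination buffer, which is where the cache cost of this algorithm comes from.<br />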
<br />
<h3>
Improving the Merge sort</h3>
The merge sort is a nice simple sort. It simply ‘zips’ two sorted buffers together to produce a sorted buffer of twice the size. It reads from both input buffers in a linear fashion and writes to the output buffer in a linear fashion so it is quite cache friendly in that respect.<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> template &lt;class TYPE&gt;
void Merge(TYPE *bufferA, TYPE *bufferB, TYPE *dest, int m)
{
TYPE* bufferAEnd = bufferA+m;
TYPE* bufferBEnd = bufferB+m;
while (bufferA < bufferAEnd && bufferB < bufferBEnd)
{
if( *bufferA < *bufferB )
*dest++ = *bufferA++;
else
*dest++ = *bufferB++;
}
while (bufferA < bufferAEnd)
*dest++ = *bufferA++;
while (bufferB < bufferBEnd)
*dest++ = *bufferB++;
}
</code></pre>
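<br />
For context, a Merge() like the one above is typically driven bottom-up: merge runs of 1 into runs of 2, runs of 2 into runs of 4, and so on, ping-ponging between two buffers. The sketch below is one way that could look, not the implementation that was timed. IterativeMergeSort is a hypothetical name, and it assumes the element count is a power of two so that every pair of runs has the same length m.<br />
<br />
```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Merge() repeated from the listing above so that this sketch is self-contained.
template <class TYPE>
void Merge(TYPE* bufferA, TYPE* bufferB, TYPE* dest, int m)
{
    TYPE* bufferAEnd = bufferA + m;
    TYPE* bufferBEnd = bufferB + m;
    while (bufferA < bufferAEnd && bufferB < bufferBEnd)
    {
        if (*bufferA < *bufferB)
            *dest++ = *bufferA++;
        else
            *dest++ = *bufferB++;
    }
    while (bufferA < bufferAEnd)
        *dest++ = *bufferA++;
    while (bufferB < bufferBEnd)
        *dest++ = *bufferB++;
}

// Hypothetical driver: bottom-up merge sort with no recursion.
// Assumption: data.size() is a power of two.
template <class TYPE>
void IterativeMergeSort(std::vector<TYPE>& data)
{
    const int count = static_cast<int>(data.size());
    assert(count > 0 && (count & (count - 1)) == 0);

    std::vector<TYPE> scratch(data.size());
    TYPE* src = data.data();
    TYPE* dst = scratch.data();

    // Each pass doubles the length of the sorted runs.
    for (int run = 1; run < count; run *= 2)
    {
        for (int i = 0; i < count; i += 2 * run)
            Merge(src + i, src + i + run, dst + i, run);
        std::swap(src, dst);   // the output of this pass feeds the next one
    }

    // After an odd number of passes the sorted result lives in scratch.
    if (src != data.data())
        data.swap(scratch);
}
```

Every pass reads and writes its buffers linearly, which is what makes this form of the sort cache friendly.<br />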
<br />
<br />
Now we know that cache is important, but once the cache has been considered instructions can become the bottleneck. If we can optimise the above code to do more than one float at a time and minimise branching then we should see an improvement in performance. <br />
<br />
The first step is to take as much unsorted data as we can and sort that into sets of sorted floats. A simple way to do that is to take 4 qwords at a time (16 floats) and sort their data in columns using vec_min() and vec_max(). (I’ll use VMX instructions here to make it more generally useful; porting to SPU intrinsics is straightforward, and VMX documentation can be found <a bitly="BITLY_PROCESSED" href="http://www.freescale.com/files/32bit/doc/ref_manual/ALTIVECPIM.pdf">here</a>.) Those columns can then be transposed to provide 4 qwords that contain four sorted floats each.<br />
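<br />
As a rough illustration of that first step, here is a scalar sketch that stands in for the VMX version. Vec4 is a plain array playing the role of a qword, Min and Max play the roles of vec_min() and vec_max(), and all of the names are hypothetical. The five compare-exchanges are the standard 4-input sorting network applied to whole rows, so each one sorts four columns at once.<br />
<br />
```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

using Vec4 = std::array<float, 4>;   // scalar stand-in for one qword (vec_float4)

// Lane-wise minimum and maximum, standing in for vec_min() and vec_max().
inline Vec4 Min(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i) r[i] = std::min(a[i], b[i]);
    return r;
}

inline Vec4 Max(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i) r[i] = std::max(a[i], b[i]);
    return r;
}

// After this call every lane of lo is <= the same lane of hi. No branches needed.
inline void CompareExchange(Vec4& lo, Vec4& hi)
{
    const Vec4 mn = Min(lo, hi);
    hi = Max(lo, hi);
    lo = mn;
}

// Sort the four columns of a 4x4 block, then transpose so that each row
// (each "qword") holds four floats in ascending order.
inline void SortColumnsAndTranspose(std::array<Vec4, 4>& rows)
{
    // 4-input sorting network: (0,1) (2,3) (0,2) (1,3) (1,2).
    CompareExchange(rows[0], rows[1]);
    CompareExchange(rows[2], rows[3]);
    CompareExchange(rows[0], rows[2]);
    CompareExchange(rows[1], rows[3]);
    CompareExchange(rows[1], rows[2]);

    // Transpose: sorted columns become sorted rows.
    for (int r = 0; r < 4; ++r)
        for (int c = r + 1; c < 4; ++c)
            std::swap(rows[r][c], rows[c][r]);
}
```

On real hardware the transpose would be done with permute or merge instructions rather than scalar swaps; the scalar loop here only shows the data movement.<br />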
<br />
Those internally sorted qwords can then be merged using a sort merge algorithm first mentioned in a <a bitly="BITLY_PROCESSED" href="http://www.trl.ibm.com/people/inouehrs/pdf/PACT2007-SIMDsort.pdf">paper from 2007</a> describing an Access-Aligned Sort (AA-Sort) algorithm. To explain it I’ll use sorting network diagrams.<br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;"><br />
</span><br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Sorting Networks</span><br />
<br />
A sorting network is an intuitive diagrammatic way of describing a sorting algorithm (it’s another cool example of using diagrams to simply describe complex mathematical or algorithmic functions, just like <a bitly="BITLY_PROCESSED" href="http://en.wikipedia.org/wiki/Feynman_diagram">Feynman diagrams</a>). The diagram below shows a sort network which takes 4 unsorted inputs and sorts them.<br />
<br />
<br />
<div align="center">
<a bitly="BITLY_PROCESSED" href="http://lh4.ggpht.com/_hk4pJ9VK3zU/TEjXdAI7EII/AAAAAAAACEo/hugDMnsxKmo/s1600-h/image13.png"><img alt="image" border="0" height="152" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXdgyyaPI/AAAAAAAACEs/lc7Wg9VhToU/image_thumb11.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="555" /></a><span style="font-size: xx-small;">4 element sort network - Diagram from Wikipedia</span></div>
<br />
Each horizontal line is an input, and each vertical line is a compare that swaps its two values if they are out of order. Taking it a step further, let's look at a sort network for 2 sets of 4 sorted values – I prefer to draw the sort networks on their side, running from top to bottom, so that I can transcribe code easily beside it.<br />
<br />
<div align="center">
<a bitly="BITLY_PROCESSED" href="http://lh6.ggpht.com/_hk4pJ9VK3zU/TEjXeRX02vI/AAAAAAAACEw/0xKRvnwO_GQ/s1600-h/image19.png"><img alt="image" border="0" height="323" src="http://lh4.ggpht.com/_hk4pJ9VK3zU/TEjXfk1lMnI/AAAAAAAACE0/UGuv85QGdbI/image_thumb18.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="246" /></a><span style="font-size: xx-small;">8 element sort network</span></div>
<br />
The inputs at the top correspond to 2 quad words, X->W and A->D, and both of those already contain sorted values internally. The horizontal lines now represent compare and swap, and the output at the bottom will be 8 sorted floats. Note the symmetry of the top 4 compares – each element in the first quad word is compared to the corresponding element in the second quad word. The following VMX instructions will perform that for us:<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> vec_float4 newLeft = vec_min(left,right);
vec_float4 newRight = vec_max(left,right);
</code></pre>
<br />
The second part of the sort network requires intra-qword comparisons and the best way to do that is to rearrange the elements via the vec_perm() instruction. Diagrammatically we can depict that by permuting the vertical input lines as we have in the following diagram;<br />
<a bitly="BITLY_PROCESSED" href="http://lh4.ggpht.com/_hk4pJ9VK3zU/TEjXgtDZxvI/AAAAAAAACE4/2gLdPRwobfM/s1600-h/image25.png"><img alt="image" border="0" height="412" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXhYBHU0I/AAAAAAAACE8/50pFNcIwRRU/image_thumb22.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="453" /></a> <br />
<br />
The dotted red lines correspond to extra compares that aren’t strictly necessary according to the initial sort network, but nevertheless are required if we are to use SIMD min/max functions. In all the above cases, the comparisons will not change their inputs. In the first case, we know that X is the smallest value (as it’s the result of the comparison of the two smallest inputs, X and A), therefore, comparing X with anything will result in X still being the smallest value. Similarly for the second case, D is, by that stage, the largest value out of the 2 input vectors, so any comparison with D will result in D always being the largest value.<br />
<br />
This leads us to the following code which corresponds directly with the diagram above.<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">inline void SORT_MERGE(vec_float4 &a,vec_float4 &b)
{
vec_float4 A = vec_min(a,b);
vec_float4 B = vec_max(a,b);
B = vec_perm(A,B,PERM(C,D,A,B));
vec_float4 min2 = vec_min(A,B);
vec_float4 max2 = vec_max(A,B);
A = vec_perm(min2,max2,PERM(X,Y,W,D));
B = vec_perm(min2,max2,PERM(B,Z,C,A));
vec_float4 min3 = vec_min(A,B);
vec_float4 max3 = vec_max(A,B);
a = vec_perm(min3,max3,PERM(X,Y,B,Z));
b = vec_perm(min3,max3,PERM(C,W,D,A));
}
</code></pre>
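<br />
If you don't have VMX hardware to hand, the network can be checked with a scalar emulation. The sketch below mirrors SORT_MERGE step for step under one assumption about the PERM macro, which isn't shown in this article: that X, Y, Z, W select lanes 0 to 3 of the first operand and A, B, C, D select lanes 0 to 3 of the second. Vec4, Lane, Perm and SortMerge are hypothetical names.<br />
<br />
```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;   // scalar stand-in for vec_float4

// Assumed lane naming: X..W come from the first operand, A..D from the second.
enum Lane { X, Y, Z, W, A, B, C, D };

inline Vec4 Min(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i) r[i] = std::min(a[i], b[i]);
    return r;
}

inline Vec4 Max(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i) r[i] = std::max(a[i], b[i]);
    return r;
}

// Scalar stand-in for vec_perm() combined with PERM().
inline Vec4 Perm(const Vec4& first, const Vec4& second, Lane l0, Lane l1, Lane l2, Lane l3)
{
    const Lane lanes[4] = { l0, l1, l2, l3 };
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r[i] = (lanes[i] < A) ? first[lanes[i]] : second[lanes[i] - A];
    return r;
}

// Mirrors SORT_MERGE: a and b are each sorted on entry; on exit a holds the
// four smallest values and b the four largest, all eight in ascending order.
inline void SortMerge(Vec4& a, Vec4& b)
{
    Vec4 lo = Min(a, b);                    // stage 1: lane i against lane i
    Vec4 hi = Max(a, b);
    hi = Perm(lo, hi, C, D, A, B);          // rotate hi by two lanes
    const Vec4 min2 = Min(lo, hi);          // stage 2
    const Vec4 max2 = Max(lo, hi);
    lo = Perm(min2, max2, X, Y, W, D);
    hi = Perm(min2, max2, B, Z, C, A);
    const Vec4 min3 = Min(lo, hi);          // stage 3
    const Vec4 max3 = Max(lo, hi);
    a = Perm(min3, max3, X, Y, B, Z);
    b = Perm(min3, max3, C, W, D, A);
}
```

For example, merging (1, 3, 5, 7) with (2, 4, 6, 8) leaves (1, 2, 3, 4) in a and (5, 6, 7, 8) in b.<br />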
<br />
If you remember we had originally sorted random data into sets of 4 sorted qwords via vec_min/max and transposition. This SORT_MERGE function can then be used to take 2 of those qwords and sort them into a set of 8 sorted floats. If we run it on all of the 4 sorted qwords we end up with two sets of 8 sorted floats.<br />
<br />
This SORT_MERGE function can be used as a replacement for the single float compare, allowing us to step through our data (which now contains qwords which are sorted internally) and sort and merge 8 values at a time instead of 2. An implementation of this algorithm was timed as taking 7.22 ms – still slower than the radix sorts, but much faster than the original merge sort at 23ms.<br />
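<br />
The outer loop that drives such a kernel can be sketched as follows. This is my reading of the merge loop described in the AA-Sort paper rather than the code that was timed: MergeRuns is a hypothetical name, and SortMerge here is a scalar stand-in built on std::merge so the example compiles anywhere. Two blocks are kept in hand; after each kernel call the low block is written out and replaced with the next block from whichever run has the smaller leading value.<br />
<br />
```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;   // one internally sorted block of 4 floats

// Scalar stand-in for the SIMD kernel: on exit lo holds the 4 smallest of the
// 8 inputs and hi the 4 largest, each in ascending order.
inline void SortMerge(Vec4& lo, Vec4& hi)
{
    std::array<float, 8> all;
    std::merge(lo.begin(), lo.end(), hi.begin(), hi.end(), all.begin());
    std::copy(all.begin(), all.begin() + 4, lo.begin());
    std::copy(all.begin() + 4, all.end(), hi.begin());
}

// Hypothetical merge of two sorted runs, each stored as blocks of 4 floats.
inline std::vector<Vec4> MergeRuns(const std::vector<Vec4>& runA, const std::vector<Vec4>& runB)
{
    assert(!runA.empty() && !runB.empty());
    std::vector<Vec4> out;
    out.reserve(runA.size() + runB.size());

    std::size_t ia = 1, ib = 1;
    Vec4 lo = runA[0];
    Vec4 hi = runB[0];
    for (;;)
    {
        SortMerge(lo, hi);
        out.push_back(lo);           // the 4 smallest values in hand are final

        const bool moreA = ia < runA.size();
        const bool moreB = ib < runB.size();
        if (!moreA && !moreB)
            break;

        // Refill from the run whose next block starts with the smaller value.
        if (moreA && (!moreB || runA[ia][0] < runB[ib][0]))
            lo = runA[ia++];
        else
            lo = runB[ib++];
    }
    out.push_back(hi);               // the 4 largest values overall
    return out;
}
```

The only data-dependent branch left is the choice of which run to refill from, and it is taken once per 4 floats rather than once per float.<br />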
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;"><br />
</span><br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Extending the SIMD SORT_MERGE</span><br />
<br />
Let’s take it a step further and merge two sets of 8 floats at the same time. Consider the following sorting network which depicts two sets of 8 sorted values being merged into 16 sorted values;<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXifF1lsI/AAAAAAAACFA/W9fwsDMfWQ0/s1600-h/image34.png"><img alt="image" border="0" height="340" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXjG7QOnI/AAAAAAAACFE/m1Ab0OMfIQI/image_thumb29.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="319" /></a><br />
<br />
We can reverse the second set of 8 sorted floats to produce the following sorting network which is easier to implement using SIMD instructions;<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXjtzCvgI/AAAAAAAACFI/NC4omZ1NzCM/s1600-h/image45.png"><img alt="image" border="0" height="370" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/TEjXkv1zCnI/AAAAAAAACFM/tMTl7LYb6HA/image_thumb38.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="328" /></a>Note the symmetry in the above diagram;<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXlURZGzI/AAAAAAAACFQ/ANZPhjI0_rk/s1600-h/image50.png"><img alt="image" border="0" height="325" src="http://lh4.ggpht.com/_hk4pJ9VK3zU/TEjXmPu_J7I/AAAAAAAACFU/ZA-Dm8n9nLw/image_thumb41.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="406" /></a><br />
<br />
The code for Part 3 is straightforward – it merely has to reverse the second set of 8 sorted floats and then compare the relevant qwords via vec_min/max. The following code does just that;<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">inline void BitMerge3(vec_float4& a, vec_float4& b, vec_float4& c, vec_float4& d)
{
vec_float4 cr = vec_perm(d,d,(vec_uchar16)PERM(D,C,B,A)); // d reversed
vec_float4 dr = vec_perm(c,c,(vec_uchar16)PERM(D,C,B,A)); // c reversed
c = vec_max(a,cr);
d = vec_max(b,dr);
a = vec_min(a,cr);
b = vec_min(b,dr);
}
</code></pre>
<br />
Part 2 is also straightforward;<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> inline void BitMerge2(vec_float4& a, vec_float4& b)
{
vec_float4 ta = vec_min(a,b);
b = vec_max(a,b);
a = ta;
}
</code></pre>
<br />
Part 1 is a little more complex in that it requires intra qword compares. Judicious use of vec_perm() helps;<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">inline void BitMerge1(vec_float4& XYZW)
{
vec_float4 ZWXY = vec_perm(XYZW,XYZW,(vec_uchar16)PERM(Z,W,X,Y)); // roll it instead?
vec_float4 min = vec_min(XYZW,ZWXY);
vec_float4 max = vec_max(XYZW,ZWXY);
vec_float4 minmax = vec_perm(min,max,(vec_uchar16)PERM(X,Y,A,B));
vec_float4 YXBA = vec_perm(min,max,(vec_uchar16)PERM(Y,X,B,A));
vec_float4 mins = vec_min(minmax,YXBA);
vec_float4 maxs = vec_max(minmax,YXBA);
XYZW = vec_perm(mins,maxs,(vec_uchar16)PERM(X,A,Z,C));
}
</code></pre>
<br />
We can put the entire system together like this (where a, b, c and d are all sorted qwords, with a &lt; b and c &lt; d);<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">inline void BitonicMerge(vec_float4& a, vec_float4& b, vec_float4& c, vec_float4& d)
{
BitMerge3(a,b,c,d);
BitMerge2(a,b);
BitMerge2(c,d);
BitMerge1(a);
BitMerge1(b);
BitMerge1(c);
BitMerge1(d);
}
</code></pre>
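<br />
A scalar reference is handy for checking a SIMD version like this. The sketch below is not the timed code: BitonicMerge16 is a hypothetical name, and it works on a flat array of 16 floats instead of four qwords. It performs the same three parts: reverse the second half, then compare-exchange at a distance of 8 (Part 3), 4 (Part 2), and finally 2 and 1 (Part 1).<br />
<br />
```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

// Hypothetical scalar reference for the 16-float merge.
// Precondition: v[0..7] and v[8..15] are each sorted ascending.
inline void BitonicMerge16(std::array<float, 16>& v)
{
    // Reversing the second half makes the whole sequence bitonic
    // (ascending, then descending).
    std::reverse(v.begin() + 8, v.end());

    // stride 8 = Part 3, stride 4 = Part 2, strides 2 and 1 = Part 1.
    for (int stride = 8; stride >= 1; stride /= 2)
    {
        for (int i = 0; i < 16; ++i)
        {
            // Only the lower element of each pair initiates the compare.
            if ((i & stride) == 0 && v[i] > v[i + stride])
                std::swap(v[i], v[i + stride]);
        }
    }
}
```

The pattern of compares is fixed and never depends on the data, which is why the SIMD version can be written entirely with min, max and permute instructions.<br />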
<br />
What this gives us is a better merge implementation than the SORT_MERGE routine. We can now take 16 floats at once and merge and sort them in one pass with no branching. This algorithm sorts 65,536 floats in 4.15 ms, which is comparable to the radix sorts.<br />
<h3>
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXm85fmsI/AAAAAAAACFY/GlLFWmpWTvE/s1600-h/image59.png"><img alt="image" border="0" height="216" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXnnYKFDI/AAAAAAAACFc/apnYquS9UwI/image_thumb46.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="355" /></a> AA-Sorts vs Radix sorts</h3>
What I’ve shown here is how an efficient implementation of an algorithm with a higher time complexity can run faster than (or at least approach the speed of) one with a lower complexity. The AA-Sorts are cache friendly and SIMD instruction set friendly and, because of that, run very efficiently. In fact, for 32,768 values or fewer, the AA-sort 16 implementation runs faster than both of the radix sorts.<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXoCk8wgI/AAAAAAAACFg/xhStGDwKxsM/s1600-h/image71.png"><img alt="image" border="0" height="281" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXpCDk9HI/AAAAAAAACFk/G3miJXiSyWg/image_thumb52.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="578" /></a> <br />
<br />
This is all very interesting, but why would you bother with an AA-sort over a radix sort? One very good reason is if you have multiple cores available to run your sort on. The AA-sort is very parallelisable – multiple merges can run in parallel on different parts of the data with no interaction other than waiting for earlier passes to complete.<br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;"><br />
</span><br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Parallel AA-Sort</span><br />
<br />
<em>I’ll touch on the parallel AA-sort here, but for more detail please refer to the detailed document I put together for SCEE on the presentation site.</em><br />
<br />
I ported the AA-sort algorithm running on 16 input values to SPU in two parts. The first pass divides the unsorted input data up into 64KB chunks and passes them to SPUs, which sort that data using the transposition and merge pass mentioned for the AA-sort algorithm, outputting a 64KB chunk of sorted floats. A second SPU task then runs, streaming those 64KB buffers and merging them using the bitonic merge algorithm. Merging is done as many times as is necessary to give a single sorted buffer. And the execution time? A shade over one millisecond (running on 4 SPUs).<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh5.ggpht.com/_hk4pJ9VK3zU/TEjXps36UEI/AAAAAAAACFo/QHomKagyKm8/s1600-h/image76.png"><img alt="image" border="0" height="281" src="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXqXpN7RI/AAAAAAAACFs/LOvnACeOIaI/image_thumb55.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="462" /></a> <br />
<br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Parallel Radix sorts</span><br />
<br />
The radix sort is difficult to implement efficiently for parallel systems. It is also difficult to implement efficiently in SIMD instructions – there are implicit race conditions due to the random access nature of the algorithm which limit pipelining of instructions. My first attempt at an SPU radix implementation used a double buffered DMA slot for each histogram entry, so writes corresponding to each histogram entry were cached on SPU until that buffer was full and was then DMA’d out. This implementation was instructionally complex, with lots of branching and, while ported to SPU intrinsics, was still slower than the PPU implementation while running on a single SPU, clocking in at 8.3ms.<br />
<br />
My second attempt took a two stage approach, running a radix sort on 64KB of input data and then using the AA-Sort merge passes to complete the sort. The radix portion of that task completed in 670us, compared to 436us for the initial AA-sort pass (transposition and bitonic merge), so even that was slower. Now, an implementation coded to work on a single float key in a full qword of data could possibly run faster than the equivalent AA-Sort, as you’d skip the race conditions inherent in storing floats into qwords, but I haven’t implemented a version to test that yet.<br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;"><br />
</span><br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Applying the Sorts to real data</span><br />
<br />
Sorting floats is great for university exams, but what game developers need is sorts on useful data. I modified the AA-sorts and the Radix sorts to run on key/data pairs, providing enough flexibility to sort a float with a pointer to a structure of some form, while minimising cache damage as much as possible. The Radix sort modification is trivial but the AA-sort requires a little explanation.<br />
<br />
Consider the start of the SIMD SORT_MERGE function;<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> vec_float4 A = vec_min(a,b);
vec_float4 B = vec_max(a,b);
B = vec_perm(A,B,PERM(C,D,A,B));
</code></pre>
<br />
This can be rewritten without vec_min/max like this;<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> vec_bint4 mask1 = vec_cmpgt(a,b);
vec_float4 A = vec_sel(a,b,mask1);
vec_float4 B = vec_sel(b,a,mask1);
B = vec_perm(A,B,PERM(C,D,A,B));
</code></pre>
<br />
Now, this code is a little slower on VMX (same speed on SPU as there is no equivalent single instruction to vec_min or vec_max), but it gives us a mask that defines the reorganisation of the values being sorted. If we had a quad word that corresponded to the data parts of the key/data pairs, then we could call vec_sel() on those with that same mask at the cost of just a few cycles:<br />
<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> vec_bint4 mask1 = vec_cmpgt(a,b);
vec_float4 A = vec_sel(a,b,mask1);
vec_float4 B = vec_sel(b,a,mask1);
vec_float4 Adata = vec_sel(aData,bData,mask1);
vec_float4 Bdata = vec_sel(bData,aData,mask1);
B = vec_perm(A,B,PERM(C,D,A,B));
Bdata = vec_perm(Adata,Bdata,PERM(C,D,A,B));
</code></pre>
<br />
We can do this as part of the initial sort pass on the raw data where we sort into columns and transpose into sorted quad words, and we can keep our data in that format for the duration of the sort then re-interleave as part of the final pass (or as an extra final pass). The same procedure (using masks and running vec_sel on both key and data) can be used throughout the algorithm whenever comparisons and merges are required.<br />
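<br />
Here is the same mask-and-select idea as a scalar sketch, for anyone who wants to experiment off-target. CmpGt and Sel stand in for vec_cmpgt() and vec_sel(); Keys4, Data4, Mask4 and CompareExchangeWithData are hypothetical names, and the payload is assumed to be a 32 bit value (an index or handle) per key.<br />
<br />
```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Keys4 = std::array<float, 4>;           // four float keys (one qword)
using Data4 = std::array<std::uint32_t, 4>;   // four 32 bit payloads (one qword)
using Mask4 = std::array<bool, 4>;            // per-lane compare result

// Stand-in for vec_cmpgt(): a lane is true where a > b.
inline Mask4 CmpGt(const Keys4& a, const Keys4& b)
{
    Mask4 m;
    for (int i = 0; i < 4; ++i) m[i] = a[i] > b[i];
    return m;
}

// Stand-in for vec_sel(a, b, mask): takes b where the mask is set, otherwise a.
template <class T>
inline T Sel(const T& a, const T& b, const Mask4& mask)
{
    T r;
    for (int i = 0; i < 4; ++i) r[i] = mask[i] ? b[i] : a[i];
    return r;
}

// Compare-exchange on the keys, with the payloads following their keys.
inline void CompareExchangeWithData(Keys4& a, Keys4& b, Data4& aData, Data4& bData)
{
    const Mask4 mask = CmpGt(a, b);
    const Keys4 lo = Sel(a, b, mask);              // per-lane minimum
    const Keys4 hi = Sel(b, a, mask);              // per-lane maximum
    const Data4 loData = Sel(aData, bData, mask);  // same mask moves the payloads
    const Data4 hiData = Sel(bData, aData, mask);
    a = lo;
    b = hi;
    aData = loData;
    bData = hiData;
}
```

One compare produces the mask, and every later select reuses it, so the payload costs two extra selects per compare-exchange and no extra compares.<br />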
<br />
Here are the times for the key/data sorts, running on 65,536 elements compared to the same algorithms running on floats only;<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/TEjXrEcODbI/AAAAAAAACFw/e7oCV1ysv3Q/s1600-h/image81.png"><img alt="image" border="0" height="266" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/TEjXr8NOp8I/AAAAAAAACF0/GMfS6gAsJcw/image_thumb58.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="456" /></a> <br />
<br />
In both cases we’re looking at about a 50% decrease in performance to handle twice the volume of source data.<br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;"><br />
</span><br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Summary</span><br />
<br />
Algorithmic complexity is not the only factor in determining the run time of an algorithm. The hardware has a large impact and knowing how to use that hardware efficiently and tailoring your algorithm’s implementation to that HW and instruction set can provide excellent performance. In this case we’ve shown an O(n log n) sort that is faster than the O(n) radix sort for domains less than 32,768 elements in size and which parallelises very well, giving excellent performance with minimal synchronisation.<br />
<br />
All of the sorts mentioned in this article will soon be available in source form through devnet for PlayStation®3 developers and there is also a longer document with more information on performance plus more implementation details soon available through the SCEE research site. I’ll update this blog when they are available. Most of these sorts can be optimised further with a little work – I’ve endeavoured to show the essence of my point here, rather than to provide the absolute fastest and most robust and generic sorting code available. If you have improvements or alternatives to suggest, then please let me know – similarly, if you have questions, ask me here and I’ll help you out as much as I can.<br />
<br />
I hope I’ve given you some food for thought when it comes to not just sorting, but to algorithm implementation in general.<br />
<br />
<span class="Apple-style-span" style="font-size: 19px; font-weight: bold;">Flagrant self promotion</span><br />
<br />
As of June 30 I finished working for SCEE Developer Services and have set up a company to do contract programming work, specialising in PlayStation®3 programming with a particular focus on performance. So, if you have a game that needs an experienced hand to look at it to try to shave off a few milliseconds per frame or would just like advice on how to design your engine to run efficiently on your console then please contact me at <a bitly="BITLY_PROCESSED" href="mailto:Tony at overbyte dot com dot au">Tony at overbyte dot com dot au</a> and let's see if I can help.<br />
<br />
<span class="Apple-style-span" style="font-size: x-small;">Edit: Added active link to Sony <a bitly="BITLY_PROCESSED" href="http://research.scee.net/articles">site</a> for source and documentation.</span>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com15tag:blogger.com,1999:blog-36348397.post-33133255051890697302010-03-18T16:52:00.006+10:302010-03-20T17:11:30.224+10:30Right of ReplyI’m proud of the <a bitly="BITLY_PROCESSED" href="https://docs.google.com/viewer?url=http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf" target="_blank">Pitfalls</a> presentation I did last December at GCAP – it’s done what it was designed to do: incite discussion about the way you use your data in performance specific applications. Most of what I’ve read online about it has been positive (outside of <a bitly="BITLY_PROCESSED" href="http://www.reddit.com/r/programming/comments/ag43j/pitfalls_of_object_oriented_programming_pdf/" target="_blank">Reddit</a> anyway) until <a bitly="BITLY_PROCESSED" href="http://oleganza.tumblr.com/post/452679094/pitfalls-of-lack-of-encapsulation" target="_blank">this blog</a> by Oleg Andreev was brought to my attention (You can go and read it if you like, I’ll wait for you). As there is no way to comment directly on his page, I thought I’d exercise my right of reply here.<br />
<br />
First, some background. This presentation was aimed at console programmers who are primarily concerned with performance. Those of you who have read the presentation or attended my talk would know that at no point have I said that you shouldn’t use OO at all. It has its benefits just as it has its problems – my goal was to highlight an issue that I see very often in game engines, an issue that can be particularly difficult to fix once a complete engine has been built around it. Forewarned is forearmed I always say. <br />
<br />
<div align="left">Let’s have a look at some of Oleg’s issues with my presentation:</div><div align="left"><br />
</div><pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: 104px; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 95.2%;"><code style="color: black; word-wrap: normal;"> for (int k=0; k < innerSize; k++, wmat++, mat++, bs++, wbs++)
{
*wmat = (*parentTransform)*(*mat);
*wbs = bs->Transform(wmat);
}
</code></pre><blockquote><span class="Apple-style-span" style="font-family: Arial;">Do you see those wmat, mat, bs, wbs pointers? These are private things pulled out of node objects under the claim of “excessive encapsulation is BAD”. Now object does not control its data and once you’d like to add another special-effects matrix over the node, you’ll have to learn not only the Node class, but the entire rendering codebase!</span></blockquote>Here he is correct, I <em>have</em> pulled out the internals of the Node objects and am publicly fiddling with them. Oh the horror! The object isn’t controlling its data! Now this may shock some of you, so please pause and take a deep breath: <em>there is no node object</em>. You could consider it to be a component system if you like. You instead have Nodes with components which are bounding spheres and transforms and all of these components are updated in the same way. You don’t care what owns the bounding spheres or transforms – you only want to process them. To transform them. If you want decent performance on current gen hardware you need to be transforming your homogeneous data homogeneously. Your I and D caches will love you for it. So will your coworkers who depend on your engine performing optimally. Nodes as a structure only exist (in this system) for convenience, a way of loosely binding the relevant bits of data together.<br />
<br />
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/S6HGnY_0FBI/AAAAAAAACCM/sgz4Rt6jKls/s1600-h/matrix2%5B3%5D.jpg"><img alt="matrix2" border="0" height="272" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/S6HGoAFQz6I/AAAAAAAACCQ/WgWYozXbmF8/matrix2_thumb%5B1%5D.jpg?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="matrix2" width="364" /></a> <br />
<br />
And as for having to learn the entire rendering code base to add in a new feature, if you don’t understand the implications of adding in a new feature to a rendering engine then you shouldn’t be adding it in. You <em>need</em> to know the impact of your code addition. I would argue that in the case presented here, the addition of a special effects matrix would have a more easily measurable impact and would be easier to do than embedding another data structure into a multiply-inherited object with an unknown data layout and interleaving its processing with whatever else is going on. For one, it could be done orthogonally to the existing processing, in which case it would be easy to measure its performance. If it has to be (or should be for performance reasons) interleaved with the rest of the processing, then you can see exactly how the compiler deals with your code and you can plan to mitigate its D$ and I$ usage. Just because it’s not OO doesn’t mean that it’s not maintainable.<br />
<br />
Now, as for Oleg’s solution:<br />
<pre style="background: url(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLdwwaTz9Kp76gfrNwkeMxSLhQCN-jtLlRh-rLzsAhQz5zSAofyJNNEX6ZCjcZ0YorTvdfZVpHvDO9YYGGq_c8SV4JpmULWgD18TmwjAsKBZQe2qNf45-AkjGL-9XLWEsbwMMm_w/s320/codebg.gif) #f0f0f0; border-bottom: #cccccc 1px dashed; border-left: #cccccc 1px dashed; border-right: #cccccc 1px dashed; border-top: #cccccc 1px dashed; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> for (int k=0; k < innerSize; k++)
{
children[k]->updateWithParentTransform(*parentTransform);
}
</code></pre><blockquote><span style="font-family: Arial; font-size: 100%;">Where updateWithParentTransform does the job involving wmat, mat, wbs and bs and gives you guarantee that this is the single file where these variables are accessed directly.</span></blockquote>What does updateWithParentTransform() do? Does it just update the parent transform? If there is documentation provided I could read that – hopefully it’s up to date. It might be in the header. If it’s not there, it might be in the implementation. Or I might just have to read the code (more likely, knowing games programmers). So I have to navigate to another file to read the same code which was originally <em>encapsulated </em>within the render class which did the original update – the code has just been moved to another less accessible area, but is functionally identical. Sure, making it virtual will mean that you could have different implementations (potentially in different files), but then we are <em>implicitly</em> impinging on performance with I and D cache damage, not to mention having multiple sets of code to optimise. <br />
<a bitly="BITLY_PROCESSED" href="http://lh3.ggpht.com/_hk4pJ9VK3zU/S6HGo6MqffI/AAAAAAAACCU/9G6dowAFr_k/s1600-h/jungle3%5B6%5D.jpg"><img alt="jungle3" border="0" height="274" src="http://lh4.ggpht.com/_hk4pJ9VK3zU/S6HGpkOgmqI/AAAAAAAACCY/iw2NFXZkb3Y/jungle3_thumb%5B4%5D.jpg?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="jungle3" width="364" /></a> <br />
<br />
This is important, so I’ll reiterate; there is nothing worse than trying to determine what a function does and having to navigate through a jungle of single-line function calls in classes that call other methods that call virtual functions that can do any one of a number of things which are only determined at run time. Sure, the methods may collapse down to a single line of asm upon compilation, but I as a programmer want to easily determine what is going on and I definitely don’t want to have to search through your rubbish. Additionally, onion code like this performs notoriously badly in debug builds where the functions generally aren’t inlined. My solution is transparent – you can see exactly what is going on there. It’s immediately evident. And don’t get me started on template metaprogramming. Anyway, where was I?<br />
<br />
<div>As for “gives you guarantee that this is the single file where these variables are accessed directly.”; if having only one file access the object’s data is so important to you, you could change the system around by building a NodeManager which could encapsulate the transforming, culling and processing of nodes along with their creation, modification and deletion. This manager could be an object if you like and that way you could say that your system is still object oriented (but really, Nodes in this case are not really objects – they are loosely coupled flyweight entities that are managed by the NodeManager. Additions to Nodes would then be limited via API to the manager class providing separation from the rest of the engine). The benefits of a single NodeManager over many smaller self processing nodes are manifold – clarity of data, clarity of processing, ease of optimisation, ease of comprehension.<br />
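As a rough illustration of what such a manager might look like, here is a minimal sketch. Everything in it is an assumption made for the example: the NodeManager interface, the NodeId handle, and the Transform type, which is reduced to a translation so the code stays short. Culling and deletion are left out, and this is not the code from the presentation.<br />

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 centre; float radius; };

// Simplification for this sketch: a "transform" is only a translation.
struct Transform { Vec3 translation; };

inline Transform combine(const Transform& parent, const Transform& local) {
    return Transform{{parent.translation.x + local.translation.x,
                      parent.translation.y + local.translation.y,
                      parent.translation.z + local.translation.z}};
}

inline Sphere transformSphere(const Sphere& s, const Transform& t) {
    return Sphere{{s.centre.x + t.translation.x,
                   s.centre.y + t.translation.y,
                   s.centre.z + t.translation.z}, s.radius};
}

// The manager owns every node's data in flat arrays; callers only hold ids.
class NodeManager {
public:
    using NodeId = std::size_t;

    NodeId create(const Transform& local, const Sphere& bounds) {
        local_.push_back(local);
        world_.push_back(local);
        bounds_.push_back(bounds);
        worldBounds_.push_back(bounds);
        return local_.size() - 1;
    }

    void setLocal(NodeId id, const Transform& local) { local_[id] = local; }

    // One linear pass over homogeneous data: the only place the arrays are touched.
    void updateAll(const Transform& parent) {
        for (std::size_t k = 0; k < local_.size(); ++k) {
            world_[k] = combine(parent, local_[k]);
            worldBounds_[k] = transformSphere(bounds_[k], world_[k]);
        }
    }

    const Transform& world(NodeId id) const { return world_[id]; }
    const Sphere& worldBounds(NodeId id) const { return worldBounds_[id]; }

private:
    std::vector<Transform> local_, world_;
    std::vector<Sphere> bounds_, worldBounds_;
};
```

The arrays are private, so the “single file” guarantee holds, yet the update is still one visible loop over contiguous data rather than a call per node.<br />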
<br />
</div><div>The astute among you would have noticed that I’m suggesting a solution that involves C++ and objects. This is because I am quite happy to use C++ – I like it in fact. I just don’t like how it can be misused. Anyway, back to Oleg’s comments;<br />
<blockquote><span style="font-family: Arial; font-size: 100%;">Also note that this method will be perfectly inlined by C++ compiler or smart dynamic Smalltalk/Self/JVM system, so the result code will do the same operations and memory accesses as the manually inlined code with “naked” private pointers.</span></blockquote>Sure, it will be inlined as long as it’s not virtual. Or too long. Or isn’t in the header. Or if the compiler decides for some unfathomable reason that it shouldn’t be inlined. But the issue isn’t inlining (which, yes, you’d hope it would do and you’d check your disassembly to ensure was happening), it’s transparency of implementation. I <em>need</em> to know what that function does with its data. And if you are chasing performance, you <em>need</em> to know too. In the case of a NodeManager class (or even my original Renderer interface), you know that all of your nodes are being processed in sequence and code like I presented makes absolute sense and its functionality is completely transparent.<br />
<br />
Now, on to Oleg’s second point;<br />
<blockquote><span style="font-family: Arial;">The second claim is to “Make the processing global rather than local” (slide 73). This is also awfully wrong. Tony suggests splitting the tree of nodes into arrays of nodes sorted by level. It is not only inflexible (or requires quite complicated algorithms to maintain the invariant), but is also pointless.</span></blockquote>Inflexible, yes. Awfully wrong? Maybe. Pointless? Hmmm. Oleg must have missed the slide that showed the performance boost gained by that process. Let me summarise: <br />
<blockquote><em>Performance improved from 12.9ms to 4.8ms.</em> </blockquote>Now that is a vast improvement. And that includes 3.3ms spent navigating the scenetree hierarchy in an inefficient OO fashion for rendering. So, if we remove that overhead (which is constant for all the implementations I presented) then we get an improvement from <strong>9.6ms to 1.5ms</strong>. Now I would say that a performance improvement of that magnitude is worth pursuing.<br />
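For anyone who wants the arithmetic spelled out, using only the figures quoted above (the subscripts are just labels for this note):<br />

```latex
\begin{align*}
t_{\text{before}} &= 12.9\,\text{ms} - 3.3\,\text{ms} = 9.6\,\text{ms} \\
t_{\text{after}}  &= 4.8\,\text{ms} - 3.3\,\text{ms} = 1.5\,\text{ms} \\
\frac{t_{\text{before}}}{t_{\text{after}}} &= \frac{9.6}{1.5} = 6.4
\end{align*}
```

So with the constant traversal overhead removed, the update itself runs 6.4 times faster.<br />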
<div align="center"><a bitly="BITLY_PROCESSED" href="http://www.inkart.com/pages/travel/tortoise_and_hare.html"><img alt="Scratchboard_Tortoise_And_Hare" border="0" height="362" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/S6HGqb2-eNI/AAAAAAAACCc/HCkN2VF8ZWI/Scratchboard_Tortoise_And_Hare%5B4%5D.jpg?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="Scratchboard_Tortoise_And_Hare" width="364" /></a> <span style="font-size: 78%;">from </span><a bitly="BITLY_PROCESSED" href="http://www.inkart.com/" title="http://www.inkart.com"><span style="font-size: 78%;">http://www.inkart.com</span></a> </div><br />
Now as to the inflexibility; I agree that it is inflexible as it stands, but that doesn’t mean that it’s useless. Most games have levels and scenery which are incidental and don’t change at all. These scenes are ideally suited to such an inflexible yet dramatically fast (or awfully wrong and pointless) solution. Continuously moving elements should be processed in a manner better suited to their needs – if there weren’t too many of them, a scene tree may even be suitable. It’s all about choosing the correct solution for the problem at hand. Just because some data doesn’t want to behave well doesn’t mean that all the data has to suffer.<br />
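For static scenery, the “arrays sorted by level” layout can be sketched as below. This is an illustrative assumption rather than the presentation’s code: FlatHierarchy, Offset and kNoParent are made-up names, and a 2D offset stands in for a full matrix. The invariant that matters is that a parent is always stored before its children, which appending nodes level by level gives you for free.<br />

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Offset { float x, y; };  // simplified stand-in for a transform matrix

constexpr std::size_t kNoParent = static_cast<std::size_t>(-1);

struct FlatHierarchy {
    // Invariant: every node is stored after its parent (e.g. sorted by level).
    std::vector<std::size_t> parent;
    std::vector<Offset> local;
    std::vector<Offset> world;

    std::size_t add(std::size_t parentIndex, Offset localOffset) {
        // The parent must already exist, which preserves the ordering invariant.
        assert(parentIndex == kNoParent || parentIndex < parent.size());
        parent.push_back(parentIndex);
        local.push_back(localOffset);
        world.push_back(localOffset);
        return parent.size() - 1;
    }

    // No recursion: by the time node i is reached, world[parent[i]] is final.
    void updateWorld() {
        for (std::size_t i = 0; i < parent.size(); ++i) {
            if (parent[i] == kNoParent) {
                world[i] = local[i];
            } else {
                const Offset p = world[parent[i]];
                world[i] = Offset{p.x + local[i].x, p.y + local[i].y};
            }
        }
    }
};
```

The inflexibility is visible in add(): inserting a node anywhere but the end, or reparenting one, means re-establishing the ordering, which is why this suits scenery that never changes.<br />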
<br />
<blockquote><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;">But still, he claims some performance gain out of the fact that nodes are not traversed recursively, but rather linearly using quite a brittle memory layout.</span><br />
<span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;">There is no point in that since node objects are so small that most of the data you need to update children using parent’s transformation matrix is already in the cache. And for the cached data there’s no difference how it is positioned: the access time is constant.</span><br />
<br />
</blockquote>No point? I wrote the code and I tested the code and then I profiled the code. Then I showed everyone the results. And the results showed a dramatic improvement in performance. That was the point. An improvement of performance. I apologise if I didn’t make that clear enough.<br />
<blockquote><span style="font-family: Arial;">But he did not only traded nothing for more complicated code, but also made his life harder to move from a single CPU to multiple CPUs (say, GPU): only recursive algorithms and encapsulation may give you an option to parallelize computation. By flattening algorithms and breaking encapsulation Tony cut himself a way to scale the performance horizontally (or, equally, made it harder to automatic parallelizing compiler to do its job).</span></blockquote>I found this comment interesting. I <em>think</em> the implication here is that in order to parallelise a computation, algorithms need to be encapsulated in an OO fashion (please correct me if I’m wrong). But from my experience (which consists of working with multiprocessor systems for the last 10 years) the easiest construct to parallelise is a linearly iterated array of independent data elements. Which is what I have produced (albeit in two passes as the secondary pass is dependent on the first). <br />
<br />
And, in actual fact, I have parallelised this algorithm to run on the PS3 using an arbitrary number of SPUs. Would you like to know the performance details? Running on SPUs dropped the time to execute from 1.5ms to 0.7ms (DMA bound as the processing required was minuscule). It was pretty simple to break the data up into a collection of parents with their children and process them in parallel in exactly the same way as I did on the PPU – as sets of arrays and bounding spheres. Note that I’ve not optimised this at all – it was purely a proof of concept, a direct port. An optimal version would run much faster again.<br />
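To show why a flat array of independent elements is so easy to split up, here is a small sketch using std::thread as a stand-in for SPU jobs. It is not the PS3 code: there is no DMA, the Offset type is a simplified transform, and transformRange and transformParallel are names invented for this example.<br />

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Offset { float x, y; };  // simplified stand-in for a transform

// Processes elements [begin, end). Each output depends only on its own input,
// so ranges can be handed to different workers without any locking.
inline void transformRange(const Offset& parent, const std::vector<Offset>& local,
                           std::vector<Offset>& world,
                           std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        world[i] = Offset{parent.x + local[i].x, parent.y + local[i].y};
    }
}

// Splits the array into contiguous chunks, one per worker, and joins them all.
inline void transformParallel(const Offset& parent, const std::vector<Offset>& local,
                              std::vector<Offset>& world, std::size_t workerCount) {
    assert(workerCount > 0);
    world.resize(local.size());
    const std::size_t chunk = (local.size() + workerCount - 1) / workerCount;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < local.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, local.size());
        workers.emplace_back([&, begin, end] {
            transformRange(parent, local, world, begin, end);
        });
    }
    for (std::thread& worker : workers) worker.join();
}
```

The worker function is the same loop as the serial version; the only parallel-specific code is choosing the chunk boundaries.<br />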
<br />
And as for an “automatic parallelizing compiler”; let me make two points. Firstly, an “automatic parallelizing compiler” would find it far easier to optimise the final code I produced than some arbitrarily nested set of functions intertwined in an OO hierarchy (remember my comment about linearly iterated arrays of independent data elements?). And secondly, a compiler has to deal with the general case and as such will never produce optimal code for your special cases. You as the programmer know far more about your system and its data dependencies than a compiler ever could and this allows you to either provide hints to the compiler or optimise your code and data to allow the compiler to process it quickly. An “automatic parallelizing compiler” may give you an initial boost in performance but it will never provide you with the performance that a smart coder can extract manually.<br />
<br />
The whole point of the Pitfalls presentation was to show how encapsulation can impact negatively on your performance - if you do it naively. I’m not saying OO is totally bad, and I’m not saying encapsulation should never be used. I’m saying be aware of what you are doing and of how the HW is interpreting what you’re trying to do (and what the limitations of that HW are). I appreciate the convenience of OO for design and implementation, but sometimes performance is more important than convenience.<br />
Your comments are welcome on this blog, regardless of which side of the OO fence you sit on. Let me know your opinion.</div>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com22tag:blogger.com,1999:blog-36348397.post-23523793412112577382009-12-23T15:26:00.001+10:302009-12-23T15:26:46.085+10:30Pitfalls of Object Oriented Programming<p><a href="http://lh5.ggpht.com/_hk4pJ9VK3zU/SzGjCVueuoI/AAAAAAAACAA/DMPknXrkYCM/s1600-h/bargraph4%5B8%5D.jpg"><img title="bargraph4" style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" height="222" alt="bargraph4" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/SzGjDUSv3sI/AAAAAAAACAE/Qo2gKko40vQ/bargraph4_thumb%5B5%5D.jpg?imgmax=800" width="377" border="0" /></a> I recently spoke at Game Connect: Asia Pacific 2009 [<a href="http://www.gameconnectap.com/" target="_blank">GCAP</a>] and one of my talks was on the impact of naive OO code on the cache and an alternative data oriented solution (with an equivalent inspection of the cache impact). It’s basically an extrapolation from my last blog entry, <a href="http://seven-degrees-of-freedom.blogspot.com/2009/10/latency-elephant.html" target="_blank">The Latency Elephant</a>, and as such I won’t say much more about it but for those of you who haven’t read it already (it’s been circulating around Twitter and on <a href="http://www.reddit.com/r/programming/comments/ag43j/pitfalls_of_object_oriented_programming_pdf/" target="_blank">Reddit</a> for a couple of days now) here’s a <a href="http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf" target="_blank">link</a> to it.</p> <p>Note this is a pdf version of PowerPoint slides used for an oral presentation and so animations etc are not present. 
But the essence of the talk is there – feel free to comment or ask for clarifications here.</p> Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-15995317948553994972009-10-27T09:14:00.003+10:302009-10-27T22:43:42.936+10:30The Latency Elephant.<p>When I started in the games industry back in 2000, one of my first major tasks (with the help of one other eager masochist) was to rewrite that company’s graphics engine for the PS2 (while a game was being written using it – a recipe for pain if I’ve ever seen one). I designed and built it using the knowledge I’d attained from working in the Mining and Defence industries as well as what I’d learnt in academia. It was basically an object oriented engine – you had renderable objects that contained a lot of information about themselves; their state, size, orientation and position, references to vertex and texture data etc. These renderable objects were stored in a fairly flat hierarchy and DMA chains were constructed from the visible objects and used for rendering on the PS2. It was a simple engine (I don’t believe in overcomplicating things) and it did its job well enough, but over the years it increased in functionality and the performance demanded of it increased also.</p> <p><a href="http://lh5.ggpht.com/_hk4pJ9VK3zU/SuYmKJVOCmI/AAAAAAAAB-s/NgwT3074Cig/s1600-h/image16.png"><img title="image" style="border-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" alt="image" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/SuYmLKe_rTI/AAAAAAAAB-w/qG9XmvjALWU/image_thumb10.png?imgmax=800" border="0" height="220" width="420" /></a> </p> <p>The obvious bottlenecks were optimised and eventually what we were left with was an engine that, while functional enough, just didn’t run fast enough. 
Profiling at this point showed no tightly contained bottlenecks; it was as if a miasma of inefficiency was spread throughout the rendering system – everything was just a little bit slow. I bit the bullet and completely rewrote it, pulling out the object oriented sensibilities and replacing everything with flat homogeneous arrays – the static world was still rendered in parcels, but each parcel was a lean set of arrays of bounding boxes, DMA chains and data already prepared for sending directly to the HW. With data neatly laid out in such a fashion performance leapt by an order of magnitude – a test scene I was profiling with went from taking 17ms to render to 2ms. Loading of the data from disc also sped up dramatically. Note that this was an engine that had been optimised and improved over 5 years.</p> <p>I had to learn a lot to extract this level of performance – I had to understand how the I and D caches worked, how the compiler transformed code, and how the data flowed through the hardware and software I was using and writing. I also learnt that in order to gain a high level of performance you will probably have to throw away your OO design and replace it with a design that considers data and the flow of that data as a primary concern.</p> <p>This is even more evident in today’s machines – it can cost up to 600 cycles to extract a piece of data from outside of the L2 cache on a PowerPC processor! Do you have any idea how much processing you can do in 600 cycles? In order to extract a high level of performance, a programmer *must* consider the data over the processing of that data. If your data is not in cache friendly coherent streams then it doesn’t matter how few cycles your code takes to execute, all that matters is how fast you can get your data to your instructions. 
Precaching your data helps, but you still have to be able to look 400 cycles or so ahead to ensure that the required data is ready in the cache when you need it.</p> <p>This isn’t a new problem, but it is one that has been slowly creeping up on us. In the 80’s we had the pleasure of access to main memory being in the order of a single cycle or so – obviously the focus on design in such a system is on the instructions. Do you know what was written in the 80’s? C++ (well, started in ‘79 but first released in ‘85). Since the 80s CPU speeds have been increasing by 60% per year and memory performance has relatively crawled along at a measly 10% increase in performance per year. </p> <p><a href="http://www.cs.washington.edu/homes/tom/pubs/iram-micro.pdf" target="_blank"><img title="CPU_Memory_Comparison" style="border-width: 0px; margin: 0px auto; display: block; float: none; width: 546px; height: 304px;" alt="CPU_Memory_Comparison" src="http://lh5.ggpht.com/_hk4pJ9VK3zU/SuYmL2EL36I/AAAAAAAAB-0/f77gd6dWl3E/CPU_Memory_Comparison16.jpg?imgmax=800" border="0" /></a></p> <p> </p> <p>What this means is that this problem will only get worse. Adding extra levels of cache will help, better and bigger caches will help, but in the end you still need to get your data from the relatively slow main memory into your pipeline. And if you want your system to perform well, you will need to think very carefully about where the data that you want is, how much there is of it and how long it will take to get it.</p> <p>The reason that OO design is so bad for modern (console) architectures is that it treats data and code as being equally important. Bundling up all the associated data into a single contiguous chunk may be convenient for debugging and for your traditional OO programming mind set, but it will run badly. 
You are far better off allocating this data into homogenous pools (avoiding heavy malloc() calls is always a good idea anyway) or at least keeping the data that is used together contiguous (spatial and temporal locality of data is a necessary goal here).</p> <p>The other benefit of considering data in this manner is that it becomes much easier to parallelise. Your code is generally simpler (it is doing fewer things at once), dependencies are more obvious and functionality more delineated (making it easier to break up into independent tasks). You also know what you will be doing 400 or 500 cycles in the future so prefetching becomes easier too. Not to mention the ease of the migration of this code to SPU (assuming you have them).</p> <p>There is still a place for object oriented design in games, most definitely. C++ provides some very convenient ways to manage large systems of code, and 80% of your codebase isn’t going to be the bottleneck anyway. Its the 20% that gets executed 80% of the time that you need to worry about. If you aren’t clear on how data will flow through your system, or can’t know how it flows, then by all means build that system in an OO fashion, but be aware that you may (will) have to rewrite this code at a later date. Keep an eye on the data in the classes, be aware of how this data is used, note which data is used the most – and when that system becomes a bottleneck, refactor it so that it works efficiently under the hood. If your design is adequate then you should be able to maintain a similar interface and protect the rest of the game code from too much disruption. But, in order to make things easier on yourself, you should be considering the design of your data over the design of your code and you should be doing it now.</p> <p></p> <p>Some of the game development industry’s top programmers have been talking (and in at least one case, ranting) about this for years. 
Christer Ericson talked about it in his <a href="http://realtimecollisiondetection.net/pubs/GDC03_Ericson_Memory_Optimization.ppt" target="_blank">GDC 2003</a> presentation. Mike Acton persistently proclaims that <a href="http://macton.smugmug.com/gallery/8936708_T6zQX#593426709_ZX4pZ" target="_blank">C++ programming is Bullshit</a> and his <a href="http://cellperformance.beyond3d.com/articles/2008/03/three-big-lies.html" target="_blank">Three Big Lies</a> are fundamentally about <a href="http://cellperformance.beyond3d.com/articles/2006/04/performance-and-good-data-design.html" target="_blank">designing around data</a> instead of code. Recently, Noel Llopis published an excellent article on <a href="http://gamedeveloper.texterity.com/gamedeveloper/200909/?folio=43#pg45" target="_blank">Data-Oriented Design</a> in the September issue of Game Developer magazine. <a href="http://lh6.ggpht.com/_hk4pJ9VK3zU/SuYmN6A0a4I/AAAAAAAAB-4/RIfjAJ3kGyQ/s1600-h/image17%5B1%5D.png"><img title="image" style="border: 0px none ; display: block; float: none; margin-left: auto; margin-right: auto;" alt="image" src="http://lh3.ggpht.com/_hk4pJ9VK3zU/SuYmPRMz1-I/AAAAAAAAB-8/7fGEenJvl0I/image17_thumb.png?imgmax=800" border="0" height="325" width="420" /></a></p> <p>Memory access speeds have been the elephant in the room for years now, but now either the elephant is getting bigger or the room is getting smaller. Either way, we can’t afford to ignore it anymore.</p>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com1tag:blogger.com,1999:blog-36348397.post-56360823546140992062009-05-03T21:41:00.003+09:302009-05-11T22:10:30.455+09:30Remote Control<p>As the game development industry is getting older, so are it’s employees. These employees have families that are becoming increasingly important and these same families are putting extra demands on the time and even the location of these maturing developers. 
One increasingly appealing option is to work remotely – you get to spend more time with your family and spend more time at work too – you get your cake and eat it too. But trust me, its not all that simple – sometimes there is a little too much cake.</p> <p><a href="http://lh6.ggpht.com/_hk4pJ9VK3zU/Sf2J4pQm4EI/AAAAAAAAB8Y/efneF49Fqkg/s1600-h/bloodymesscake1thu%5B7%5D.jpg"><img title="bloodymesscake1thu" style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" height="160" alt="bloodymesscake1thu" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/Sf2J6Jvg9MI/AAAAAAAAB8c/lDIuv9BQ_r0/bloodymesscake1thu_thumb%5B5%5D.jpg?imgmax=800" width="244" border="0" /></a> </p> <p>I’ve spent the last 2 and a bit years working remotely for a couple of companies with half a dozen teams in almost as many time zones and it would have been a damn sight easier to have been working with them in the same building. </p> <blockquote> <p>“But…” you say, “but you get to code with no pants on…” </p> </blockquote> <blockquote> <p>“No!” I interject. “Well, yes, but No! Shut up. Listen to me first, here’s why its more difficult”;</p> </blockquote> <p>Communication is much, much harder. You can’t just turn around and abuse the dickhead behind you who thought that thread safe programming merely meant adding volatile to variable declarations. You can’t bump into the office graphics guru and strike up a conversation that leads to a new, more optimal way of performing motion blur. Emails <strong>will</strong> be misunderstood and misinterpreted – there is no substitute for face to face communication. You can’t under estimate the power and clarity provided by body language. Communication via instant messaging is a pain in the arse for anything of any detail.</p> <p>You will get overlooked for meetings and announcements – you are out of sight and out of mind. 
And the larger the team, the more likely it is that you will be forgotten. Large meetings are awful over the phone – you pick up a lot of ambient noise and different speakers in different positions mean that you hardly ever hear what is going on. Plus it is an order of magnitude more boring not being there. I mean, it’s often hard enough to stay awake in some team meetings when you are there, let alone sitting in your room alone, eyes closed as you concentrate on the current speaker, leaning back on your comfy chair, thinking about the code you’re working on, or what you’ll be doing after work, or the last episode of True Blood you’ve just watched… well, you get the idea.</p> <p>Testing your code becomes far more laborious – and it is also far more important that your code is robust. The first person to get blamed for code not working is the person that’s recently checked something in who isn’t there. Which, when you are working remotely, is always you. Here’s the typical cycle for submitting some code while you are working remotely:</p> <blockquote> <p>You spend a week writing your code. You’ve been very cautious and carefully verified that it works flawlessly with your data. So you go ahead and check out the latest version of the code in the main branch (which has been automatically tested, so you assume that it works) and merge it with your own. You attempt to test it against your own data but realise that you need the latest data. So, you do a grab of the latest data in the art repository, hoping that there aren’t too many extra assets to download. When that finally finishes you munge the new assets only to discover that you need the latest version of the editor to munge all of your data as there have been some fundamental changes. 
Without the benefit of an office full of machines that can be utilised for a distributed munge, you know that you’re going to be waiting for 4 or more hours before you can test again (BTW, the munge process is sometimes called cooking because the ambient temperature of your office rises by 10C while the machines you do have churn through gigabytes of data). Finally, you test your code against the newly munged data only to learn that it doesn’t work. QA assures you (over the phone or IM) that the latest build is working, so you spend a couple of hours trying to debug your code, fruitlessly. You stagger to bed, say hi to the wife, and sleep the restless sleep of a coder without functioning code. The next morning you spend some more time on your code then chat with various people about your problem only to find out that the version of the editor that you grabbed didn’t work with the version of the code you had and that it was fixed not long after you checked it out (or that you have to roll it back to an earlier version). At that point you check out the working version of the editor, check out the latest version of the code, merge, check out the latest data, compile, fire off a munge, kick the cat, put your pants on and head down the pub. </p> <p><img title="image" style="border-top-width: 0px; display: block; border-left-width: 0px; float: none; border-bottom-width: 0px; margin-left: auto; margin-right: auto; border-right-width: 0px" height="164" alt="image" src="http://lh3.ggpht.com/_hk4pJ9VK3zU/Sf2J7NMB_QI/AAAAAAAAB8g/SsN5S6CrQ-o/image_thumb%5B8%5D.png?imgmax=800" width="244" border="0" /></p> </blockquote> <p>Yes, it can be that painful – I’ve spent days trying to check in working code. Even with continuous integration and a good QA team, the lag between your code and data and the main branch can be troublesome to say the least.</p> <p>Another issue with coding is that you are pretty much on your own. 
If you are having trouble with a programming problem it is very hard to get someone to help you remotely. Applications like <a href="http://www.uvnc.com/" target="_blank">UltraVNC</a> are excellent and I recommend that you do peer reviewed code checkins using something like this, but it is (at least initially) a large intrusion on a co-worker’s time to get them to remote into your machine and help you with your code. It becomes less of an issue as they become more used to it but it is still a hurdle to cross.</p> <p>Sure, you have more privacy and fewer distractions – actually, no. You have more distractions – family, a fridge with (theoretically) more food, alcohol, TV, games (not in the fridge), the internet with no-one looking over your shoulder, it’s just started raining and you’ve noticed that the guttering is leaking so you get up on the roof to fix it ‘cos you can’t see where it’s leaking when it’s not raining…and did I mention games? The temptation to work your own hours “when you feel like it” is quite high – I mean, you’re not impacting on anyone else’s schedule, are you, and anyway, you work better at 3am, right? That is until you pull an all nighter, sleep most of the next day and then can’t be arsed to work the night away again and before you know it you’ve lost a day of work. </p> <p>Time zones complicate things somewhat also. It’s not too bad if you are only an hour or so out, but when working from Australia with the US or UK, you have a very large discrepancy in work hours. My current employer is in the UK and our regular office hours do not overlap at all – in order to communicate directly I need to spend part of my evening working. Sure that means that I manage to avoid watching some crap TV with The Wife(TM), but part of the reason to work from home was to spend time with the family, wasn’t it?</p> <p>One of the hardest things I’ve found is the lack of human interaction. 
I miss the idle banter you get in an office, the pointless chats while making coffee, the new friends you make while arguing over lunch, the things you learn when asking a co-worker to help you with a coding problem. </p> <p>So you can see that there is some bad thrown in with the benefits of working remotely and pantless. There is a lot of good though – when you hit the zone there is nothing to break you out of it – the flow keeps on going and going…. You can modify your working hours (a little) without affecting your routine too much – an hour here or there to take the kids to the doctor or swimming – and you can easily put in more hours when the need arises. The 5 second commute is awesome. <br /></p> <p>There are a few things that you can do to help you deal with working remotely.</p> <p>You <strong>must</strong> know the team you are working with. Programmers are notorious for not trusting other people’s code. If you’ve not worked locally with a team before I would advise that you meet with them and spend at least a week or two working on site, learning the ropes, appreciating your workmates’ strengths and personalities and, importantly, socialising with them. It is important that everyone understands each other’s sense of humour (or lack thereof) – it helps with textual communication. I would advise that the remote worker and local team have at least one video conference a week – cover what you have worked on, what you will be working on, any problems you’ve been having as well as sprinkling the meeting with idle banter. You need to maintain a social connection with the team – if people like you then they are more likely to respond to your emails earlier and help you more.</p> <p>Be disciplined with your working hours. Start at the same time, lunch at the same time, try and finish at the same time. 
Regular work hours will help you to maintain a sense of work life and home life – you need to maintain a sense of separation otherwise you’ll either end up working all the time or alternatively, watching Oprah and Dr Phil instead of working. Take short breaks like you would in a workplace – cultivate some rituals; for instance make a coffee (not instant mouth rot coffee, make something a little more involved. I use a stove top percolator to make beautiful coffee from beans bought at the local market). This will give you the type of break that you would get naturally at work, and give your brain a little time to work in the background. Be careful that these rituals don’t become a form of procrastination though. </p> <p><a href="http://lh4.ggpht.com/_hk4pJ9VK3zU/SggcuBL76lI/AAAAAAAAB8s/l81B5_iizcM/s1600-h/image%5B2%5D.png"><img title="image" style="border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; border-left: 0px; margin-right: auto; border-bottom: 0px" height="244" alt="image" src="http://lh6.ggpht.com/_hk4pJ9VK3zU/SggcvMqBcyI/AAAAAAAAB8w/rj6Z91FM4qI/image_thumb.png?imgmax=800" width="211" border="0" /></a> </p> <p>If you have a significant timezone difference, try to schedule some regular hours where you overlap with the team you are working with. Take those hours out of your regular work day but be consistent. If your workmates expect you to be working within their work hours on a regular basis then they are more likely to instigate communication during those regular “cross over” hours. Make sure that your workmates are aware that you are working – running an instant messaging client will ensure that your coworkers know when you are online. </p> <p>With the problems involved with the latency in checking in and building or cooking your data, the best solution I’ve found is to get the local team’s QA to check in and label munged data and binaries when they test a specific code set. 
This means that you won’t have to worry about building the data yourself and you remove a level of complexity and another source of potential errors. Also, with the size of modern games’ data sets it’s often quicker to download 5GB of data than it is to munge it (assuming of course that you have a decent internet connection. If you don’t then I suggest you relocate until you do). Of course this doesn’t work when you are changing the munging yourself. </p> <p>Be proactive with communication. Answer all of your emails promptly (if you work in a dramatically different timezone then you have the benefit of getting a full day’s emails when you log in in the morning). Regularly email team members with questions and even simple communication – you need to cultivate your relationship. Maintain an online presence via instant messaging. Don’t let yourself be forgotten. If you are being forgotten for meetings, ring the meeting room yourself. Make sure that management realise how important it is that you are included. I’ll mention it again as it’s so important; video conference at least once a week. The best remote relationship I had with a team was one where we had a video conference every morning. It was just a short scrum type stand up meeting, but invaluable as far as building a relationship with the team and understanding what everyone was doing and, even more importantly, letting your team mates know what you are working on. <br /></p> <p>So, if you have read this far, congratulations. This is a big topic, one that I deal with daily and one that I think is becoming more and more relevant to the modern programmer. I recommend you read <a href="http://www.randsinrepose.com/archives/2009/04/15/the_pond.html" target="_blank">The Pond</a> for an excellent article on working remotely from the point of view of a manager. 
I’ve managed a remote worker before and the best thing I can recommend is to call regularly for updates and to catch issues early and to just maintain a base level of interaction, letting the remote worker know that you know they are there and working and that they are appreciated. It is also imperative that the manager sets realistic milestones for the remote worker and gets regular status updates - it is all too easy for a remote worker to drift off into a little corner of the codebase which is incredibly interesting, yet ultimately irrelevant.</p> <p>Working remotely is hard. It is more work for manager and worker alike, but it can be very successful and rewarding as long as both parties are willing to address problems as soon as they arise. I'm privileged to have been able to work from home for these last few years - my children know nothing different. Daddy has always been there, and if things work out, Daddy will always be there. </p> Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-74064582478262787592009-02-09T09:48:00.007+10:302009-02-09T17:27:14.882+10:30The Cost of RedundancyAs I'm still waiting for my new job to kick off (I'm quietly building card cubes - not even halfway done yet), I thought I'd pen a few thoughts on the effects of redundancy on the individuals involved.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixTOCXhjKt1Rj1uUs8cbh8CTCVJ0zuyvSPeg4v46i48254hCTWeLyNcQKDGu97NhYWUdkvyTVdzLrHnoM0ewD7qhfaTl38x18k-fXQzs8-UOTmtpWxQ__OJsu6JTypz3GmibTc/s1600-h/Pandemic_Tower.JPG"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 214px; height: 320px;" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixTOCXhjKt1Rj1uUs8cbh8CTCVJ0zuyvSPeg4v46i48254hCTWeLyNcQKDGu97NhYWUdkvyTVdzLrHnoM0ewD7qhfaTl38x18k-fXQzs8-UOTmtpWxQ__OJsu6JTypz3GmibTc/s320/Pandemic_Tower.JPG" alt="" id="BLOGGER_PHOTO_ID_5300681134297066418" border="0" /></a><br />This particular redundancy went relatively smoothly for me, I saw it coming a few months ahead and was able to prepare for it. That doesn't mean that I was happy for it to happen - not at all. With any project you work on you form a close bond with the people you work with. They are your friends, your surrogate family - you spend more waking time with them than any other group. You spend years working together toward a common goal; the thrill of building a new project, drunken discussions at 3am over how a certain feature should be implemented to improve the gameplay experience, or how a crucial piece of tech should be rewritten to make it faster, more stable, simpler, smaller, better. You strain your relationships outside of your workplace with neglect - all your focus is on building the next big thing.<br /><br />It's no wonder that when that gets taken away from you there is a grieving process. Having a game canned is bad enough, but when you lose your game, your friends, and your job, it's even harder. And with the current financial problems in the industry (not just the games industry) it's getting harder to find work without relocating. 
Moving yourself is hard enough, moving yourself and your family interstate or internationally to a job that you hope is more stable than your last is incredibly stressful.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1BP6hFhMd6hek6huNhv6VC49HCt_pNKZ3DxSEwJbPKOx8FYJWN3DXDO4eVD1DZa4dZ_wJnT7iHnc7Z2EN3SvZkzpkqyE4YGnKQZoekyJAkbffEO9kk0XvxK77_Tl9Tgxy2UAy/s1600-h/BWHOTF.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 192px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1BP6hFhMd6hek6huNhv6VC49HCt_pNKZ3DxSEwJbPKOx8FYJWN3DXDO4eVD1DZa4dZ_wJnT7iHnc7Z2EN3SvZkzpkqyE4YGnKQZoekyJAkbffEO9kk0XvxK77_Tl9Tgxy2UAy/s320/BWHOTF.jpg" alt="" id="BLOGGER_PHOTO_ID_5300686740357728290" border="0" /></a><br />My first redundancy was like that - I had no idea it was coming. I'd heard a rumour a few days before it happened and refused to believe it. When the axe fell and the entire studio was disbanded I was devastated. I was in the process of organising an extension on my house, I had a very young daughter, my wife was pregnant and all of a sudden I was without a source of income. And all this just before Christmas. That was hard. And the outcome was to uproot my family and move interstate.<br /><br />This time, however, I was far more prepared and a bit more senior, so finding work was easier. I really feel for the people that have just managed to break into the industry and have been laid off, especially since the market is flooded with good people that have been made redundant from any number of studios that have been recently shut down by EA. Or THQ. Or Midway. Or pretty much anywhere. It's a very volatile industry now. 
Forewarned is forearmed - keep your eyes open for delayed milestones, delayed milestone payments, sudden dramatic scope changes in the game, groups of experienced people leaving, anything that should make you nervous. It is surprisingly easy to ignore this stuff, to blindly tell yourself that it'll be OK and it'll all work out fine. I know, I've done it myself.<br /><br />I'm not saying that you should jump ship at the first sign of difficulty, just that when things start to go awry, you might want to sharpen your resume, polish up your LinkedIn profile or talk to mates in the industry and see how their workplace is faring.<br /><br />Now, there is an upside to redundancies and studio closures, and that's new studios starting up. Give a group of enthusiastic, experienced developers a nice fat redundancy payout and they might just decide to band together and form a startup. Or another cash flush studio may decide to buy up some of the newly available talent and start a new branch of their own in the city where yours closed down. With these new opportunities comes new friendships, new bonds, new projects and goals. On top of that, your old friends will have scattered to the four corners, widening your industry network, potentially providing you with a source of employment in the future (and a place to crash when you go on holiday).<br /><br />I have no regrets in choosing Pandemic Brisbane for my previous place of employment - I've met some fantastically talented people, made some great new friends and learnt a lot about all aspects of the game development process. I've also learnt a lot about myself, my relationship with my family and what I want to do with my life. I'm proud of what we built, even though it never saw the light of day.<br /><br />To my friends and ex-coworkers - it's been a pleasure. Good luck with whatever you're doing now - working in the industry again, or looking for work there or elsewhere. 
Hopefully I'll see you around.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-68810311938416891852009-01-23T15:40:00.008+10:302009-01-23T16:27:12.160+10:30Card BlancheSo, I'm between jobs. I've finished Mirror's Edge (Test of Faith on my first play through), played Wipeout HD until my eyeballs dried out, done some work around the house, reorganised my game collection (by platform then genre, with surprisingly smooth transitions between genres), rearranged the lounge room and have got to the point where there is nothing left to do other than tidy up my office. I mean, I've sent my work equipment back (I work from home BTW) and have reformatted and reinstalled the HW I have (2 years of application accretion will slow your machine down somewhat, even if it does have 4 cores) but the physical space is still a bloody mess.<span style="text-decoration: underline;"><br /><br /></span><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHrYs4VQpJOrMsKm85lSJOjjtJuYrgTu3QoS0B4eUUfubJTYqCE4xt9DbycecwRPUlGTBAWVxoi5dii5F_XndVh68mWDx_Dva59Lz2ay4srsWyqaGD3240tR1InchFooApMDMY/s1600-h/Pandemic_tower_of+_cards_2.jpg"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 214px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHrYs4VQpJOrMsKm85lSJOjjtJuYrgTu3QoS0B4eUUfubJTYqCE4xt9DbycecwRPUlGTBAWVxoi5dii5F_XndVh68mWDx_Dva59Lz2ay4srsWyqaGD3240tR1InchFooApMDMY/s320/Pandemic_tower_of+_cards_2.jpg" alt="" id="BLOGGER_PHOTO_ID_5294363059746382290" border="0" /></a>In between some boxes of ancient computer peripherals I found a box of boxes of business cards - all up, about 1,000 Pandemic business cards. I think I used maybe 40 cards in my 3 years at Pandemic - why do we need so many? 
(For the internet rumour mongers amongst you, that's the real reason for Pandemic's troubles - too many business cards per person).<br /><br />Now, to the point of this blog: What am I to do with these cards? I'm sure there is something cool and geeky that could be done with them (I have one idea that probably won't work), so I'm looking for (polite) suggestions from the internets. Drop me a comment if you have a good idea - if it's within my ability to execute it I will.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-54793458350504898202009-01-14T15:34:00.004+10:302009-01-15T07:56:08.713+10:30For those of you that are wondering...yes. I have been made redundant. But that's OK, I feel pretty good about it. <a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.concepttshirts.co.uk/geek/will-code-for-food.gif"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 190px; height: 190px;" src="http://www.concepttshirts.co.uk/geek/will-code-for-food.gif" alt="" border="0" /></a><br />Now I'm free to do something really cool...<br /><br />[Update: Just to clarify - I'm confirming my own redundancy here, nothing more or less. EA had made an announcement in December that headcount would be reduced. I'm just one of those heads.]Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-85062813230127434992008-12-11T21:44:00.003+10:302008-12-11T22:49:46.341+10:30Beyond the Critical SectionI've been working in a few different codebases in the last couple of years, all of which are supposedly 'Next Generation', and I've been consistently let down with the quality of parallel programming that I've seen. 
Most programmers seem to just want to slap a critical section around any and all code that has even the most remote chance of possibly being involved in some sort of thread-unsafe behaviour. Or even worse, inserting spinlocks instead.<br /><br />So, what I tried to do for my talk at GCAP was to provide a simple introduction to the parallel primitives - mutex, semaphore, barrier, et al. - and provide some vocabulary and foresight of some of the basic problems intrinsic to parallel programming. I also briefly raced over some of the patterns discussed in "<a href="http://www.amazon.com/Patterns-Parallel-Programming-Software/dp/0321228111">Patterns for Parallel Programming</a>" and then sped through some lock free programming examples (based on "<a href="http://www.amazon.com/Art-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916">The Art of Multiprocessor Programming</a>").<br /><br />When I say 'sped' and 'raced' I meant it - the presentation was 184 slides long and I had 45min to cover them all (it took me 46min - out of all the attendees I think only Yahtzee appreciated the magnitude of that effort (or, more likely, he'd fallen asleep in a previous lecture and had just woken up and thought that spontaneous applause would cover his faux pas)).<br /><br /><br /><div style="width:425px;text-align:left" id="__ss_793120"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/Tony.Albrecht/parallel-programming-beyond-the-critical-section-presentation?type=powerpoint" title="Parallel Programming: Beyond the Critical Section">Parallel Programming: Beyond the Critical Section</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=parallel-programming-1228978587101190-1&rel=0&stripped_title=parallel-programming-beyond-the-critical-section-presentation" /><param name="allowFullScreen" value="true"/><param 
name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=parallel-programming-1228978587101190-1&rel=0&stripped_title=parallel-programming-beyond-the-critical-section-presentation" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object><div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View SlideShare <a style="text-decoration:underline;" href="http://www.slideshare.net/Tony.Albrecht/parallel-programming-beyond-the-critical-section-presentation?type=powerpoint" title="View Parallel Programming: Beyond the Critical Section on SlideShare">presentation</a> or <a style="text-decoration:underline;" href="http://www.slideshare.net/upload?type=powerpoint">Upload</a> your own. (tags: <a style="text-decoration:underline;" href="http://slideshare.net/tag/gcap">gcap</a> <a style="text-decoration:underline;" href="http://slideshare.net/tag/free">free</a>)</div></div><br /><br />My presentation is hopefully embedded above and you should be able to view it embedded there or you can head through to <a href="http://www.slideshare.net/Tony.Albrecht/parallel-programming-beyond-the-critical-section-presentation">Slide Share</a> and browse it there.<br /><br />Below are some interesting links that I stumbled across in my research for this talk. 
Hopefully there's something interesting for some of you in there:<br /><br /><ul><li><a href="http://parlab.eecs.berkeley.edu/pubs/EECS-2006-183.pdf">The Seven Dwarves of Parallel Computing</a></li><li><a href="http://www.top500.org/">Top 500</a> - the top 500 Super Computers.</li><li><a href="http://www.jpaulmorrison.com/fbp/index.shtml">Flow-Based Programming</a></li><li><a href="http://blip.tv/file/136679/">Design Patterns for Parallel Programming: Learning to Think Parallel</a></li><li><a href="http://www.infoq.com/presentations/erlang-software-for-a-concurrent-world">Erlang - software for a concurrent world</a></li><li><a href="http://aigamedev.com/architecture/hierarchical-logic-multi-threading">Hierarchical Logic and Multi-threaded Game AI</a></li><li><a href="http://www.boyet.com/Articles/LockfreeQueue.html">Lock-free Data Structures: The Queue</a></li><li><a href="http://www.valvesoftware.com/publications/2007/GDC2007_SourceMulticore.pdf">Source Multicore</a> (GDC 2007)</li><li><a href="http://beautifulpixels.blogspot.com/2008/08/multi-platform-multi-core-architecture.html">Multi-Platform Multi-Core Architecture Comparison</a></li><li><a href="http://media.cs.uiuc.edu/Apresos/seminars/UPCRC/2008-09-19/UPCRC__2008-09-19_02-58-PM_files/flash_index.htm">Parallel Programming Patterns</a></li><li><a href="http://msdn.microsoft.com/en-us/magazine/cc872852.aspx">Design Considerations For Parallel Programming</a></li><li><a href="http://www.gamasutra.com/visualcomputing/blog/2008/10/sponsored_video_robust_ncore_c.php">Robust N-Core Capable Game Engine Design</a></li><li><a href="http://www-users.cs.umn.edu/%7Ekarypis/parbook/">Introduction to Parallel Computing</a></li><li><a href="http://forvo.com/word/dijkstra/">How to pronounce Dijkstra</a><br /></li><li><a href="http://www.netrino.com/node/202">Mutexes and Semaphores Demystified</a></li><li><a href="http://www.microsoft.com/whdc/driver/kernel/locks.mspx">Locks, 
Deadlocks and Synchronization</a></li><li><a href="http://msdn.microsoft.com/en-au/magazine/cc817398.aspx">Solving 11 Likely Problems In Your Multithreaded Code</a></li><li><a href="http://blogs.intel.com/research/2007/08/what_makes_parallel_programmin.php">What Makes Parallel Programming Hard?</a> - read the comments</li></ul>I can thoroughly recommend the two books I mentioned earlier in this post, particularly The Art of Multiprocessor Programming. It has heaps of Java examples, and, as with everything in programming, you learn the most when you code it up yourself.<br /><a href="http://www.microsoft.com/whdc/driver/kernel/locks.mspx"><br /><br /><br /><br /></a>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-11305624734849479502008-11-24T09:32:00.005+10:302008-11-24T11:26:49.060+10:30GCAP recap<span><span>I wasn’t sure what to expect from this year's <a href="http://www.gameconnectap.com/index.html">GCAP</a>. Previous years had been declining in quality while the price of entry was generally climbing, so I thought that this year would be more of the same. This year was better than I expected, in some areas at least. The quality of the sessions that I attended was definitely up on previous years – I went to GDC in San Francisco earlier this year and the sessions I attended here would not have been out of place there.<br /><br />The opening keynote was interesting – <a href="http://www.animallogic.com/#Home">Animal Logic</a>’s Zareh Nalbandian’s support for the <a href="http://www.gdaa.com.au/">GDAA</a>’s push for tax incentives was good to hear. Kevin and Stuart of <a href="http://www.torusgames.com/">Torus </a>presented their highly sensible approach to cross platform engines – from a company that has released over 90 titles, theirs was an approach bred from a vast pool of experience. 
“10 ways to Loose your team” (good spelling guys) was a packed session with a lot of</span></span><span><span> audience participation – it showed that the Australian industry is finally growing up when it comes to team management. <a href="http://www.epicgames.com/">Epic</a>’s Jay Wilbur told us to buy his middleware and Evan Spytma from <a href="http://www.popcap.com/">PopCap </a>showed us how to make a game that will sell 75 million copies (as long as it’s Bejeweled).<br /><br />The parties this year were short and sweet - The Epic Welcome party on Thursday night was totally Epic - a storm swept in and gave us an awesome (if somewhat damp) view of the constant lightning and torrential downpour from the Hilton balcony. Friday night's awards dinner had, for the first time ever, a comedian who was actually funny. <a href="http://www.deblob.com/">De Blob</a> deservedly won many of the awards, but there were disappointingly few comp</span></span><span><span>etitors to challenge it. Drinks after the dinner progressed extremely well, so well that attending the closing of the conference the next morning proved far beyond my ability. I did make it to the Game On party that night, but that proved to be rather quiet and free booze was the last thing I wanted after the previous night (Thanks to <a href="http://www.australiangamer.com/about_matt.html">Matt </a>for offering me a ticket to Game On).<br /><br />The most disappointing thing about this year’s conference for me was the lack of attendees. From my estimate there were only around 300 people there – quite a few students and a noticeable lack of actual game developers. The Australian game development industry seems to have lost faith in the GDAA’s conference. This is not just the senior and management staff – it’s all staff from junior through to seniors. Cost is an issue as always - $500 for 2 days of sessions, regardless of their quality, is a lot of money. 
</span></span><span><span>Not to mention the added expense of travel and accommodation.<br /></span></span><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIlikd1gYScTjZOsHMhayLxYZ2uHlBDNZNvhUy_0vKhHBx8klCjLRzD-93YI87BfJkOEZObaZFZjSf2Iya6vesFwJFUMaocj8GifWqkORhVsb-xwYi3P1DzkAyFX4PZidaYRnM/s1600-h/AGDC_bag.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 214px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIlikd1gYScTjZOsHMhayLxYZ2uHlBDNZNvhUy_0vKhHBx8klCjLRzD-93YI87BfJkOEZObaZFZjSf2Iya6vesFwJFUMaocj8GifWqkORhVsb-xwYi3P1DzkAyFX4PZidaYRnM/s320/AGDC_bag.jpg" alt="" id="BLOGGER_PHOTO_ID_5272020420829216658" border="0" /></a><br /><span><span>My advice? Drop the fluff – do we really need the bags? I know they are tradition, but so was the nerf battle and that was dropped years ago. I have all of the bags since ’99 and the only one I still use is the cloth bag from ’99 which I keep clothes pegs in. The free food is nice, sure, but most would be happy to be able to buy what they want from an on site café. What we need are some more big name speakers – many of the Aussie studios have OS counterparts. EA, Pandemic, Creative Assembly, 2K games, THQ and many of those studios are sure to have speakers that would jump at a chance to travel to and speak in Australia. There are also many ex-pats that have gone on to work with some top studios world wide and I have no doubt that a trip home for the cost of presenting a session is a good deal.<br /><br />So, in a nutshell, GCAP needs to encourage local attendees, and to do that it needs to be seen to be providing content that is worthy of the price of entry. Unfortunately, Aussies rarely believe that they can learn from their local peers so OS talent and celebrities need to be brought in to provide a justification to attend. And/or the price needs to drop. 
Would a leaner GCAP with more international weight succeed? Or have Australian developers given up on GCAP as a lost cause? Time will tell, but for the first time in years I can see a glimmer of quality shining from the GCAP sessions. Hopefully this will grow and we can have a local conference that is worth attending.<br /><br /></span></span>Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-61077744914292181012008-09-25T01:11:00.004+09:302008-09-25T06:31:47.642+09:30GCAP 2008Finally the GCAP '08 <a href="http://www.gameconnectap.com/pdf/2008/GCAP%2008_Program.pdf">program</a> has been announced and the good news is that the GCAP organising committee has listened to reason and speakers have free registration for the entire conference. Well done guys.<br /><br />In an <a href="http://seven-degrees-of-freedom.blogspot.com/2008/05/what-i-have-to-pay-to-speak.html">earlier blog entry</a> I offered some advice on how to make the conference more of a success. My recommendations were:<br /><ol><li><span style="font-weight: bold;">Big Names</span>: Well they have <a href="http://www.escapistmagazine.com/videos/view/zero-punctuation">Yahtzee</a>. That's good from a media and marketing point of view, but not really relevant as far as providing world class sessions (Although I'm keen to see him). <a href="http://www.popcap.com/">Popcap </a>will be there, cool, plus the usual veteran Aussie presenters (including myself) but no real big names.<br /></li><li><span style="font-weight: bold;">Call for abstracts early</span> : Early call was good, but it took a long time to get back to the submitters - nearly 6 weeks. I thought they'd forgotten about me.<br /></li><li><span style="font-weight: bold;">Don't make the speakers pay: </span>Big tick here.<br /></li><li><span style="font-weight: bold;">Keep the registration fee as low as possible</span>: Registration is $400 to $500. 
I'm not convinced that GCAP is offering that much value but it is within $200 at least. It'll be easier to judge after the conference.<br /></li><li><span style="font-weight: bold;">Throw a decent party</span>: No details there yet, just a 'Gamers Night'. We'll have to wait and see.<br /></li></ol>All in all, I think the GCAP committee are trying to do the right thing - I'd not like to be organising a conference like this. It's a vast amount of work. My only complaint is that there don't seem to be many sessions - I'd like to have seen another stream running rather than just the three they have now.<br /><br />Overall I think this year's conference has potential - there are a few sessions I'm keen to attend. My fear is that the registration costs will keep developers away - especially those that are not being sponsored by their employers.<br /><br />I look forward to meeting some of you there - I'll be at the bar.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-82416749614038003512008-08-28T21:37:00.002+09:302008-08-28T22:07:37.102+09:30CLI FTWThe command line interface can be a very powerful way to interact with a computer. Sure, it's not very intuitive to the great unwashed masses, but who cares - right? Microsoft, in its quest to provide an interface that even an idiot can use, has created an interface that is perfectly suited for idiots. But now, it looks like things are coming full circle.<br /><br /><a href="http://labs.mozilla.com/2008/08/introducing-ubiquity/">Ubiquity </a>is an awesome new interface from Mozilla Labs providing a simple yet powerful keyboard interface to do, well, lots of stuff. 
Check out the video...<br /><br /><center><br /><object width="400" height="298"> <param name="allowfullscreen" value="true"> <param name="allowscriptaccess" value="always"> <param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=1561578&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1"> <embed src="http://vimeo.com/moogaloop.swf?clip_id=1561578&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="298"></embed></object></center><br /><br />I thought that <a href="http://goosh.org/">goosh </a>was cool, but this easily eclipses that. And I love the fact that they provide you with details on how to build your own commands. I might even give it a shot myself...Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-31078090492008526962008-08-11T21:58:00.006+09:302008-08-18T06:32:50.065+09:30The ZoneThe Zone. If you are a programmer, a gamer, an athlete, musician, author, <a href="http://www.paulgraham.com/hp.html">artist </a>- any activity which requires a high level of focus and training then you know what it is. It's those dilated periods of time where your level of concentration is so intense, so pure, that you perform at your peak. It's almost an emotion, a controlled sense of elation. It's a direct connection between what you are and what you do. 
To think is to act...<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.sega-universe.de/img/articles/6157/Giga-Wing-2-jap--4.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 289px; height: 231px;" src="http://www.sega-universe.de/img/articles/6157/Giga-Wing-2-jap--4.jpg" alt="" border="0" /></a><br />A gamer playing Ikaruga, Mars Matrix, GigaWing - the screen an impenetrable mass of bullets, indecipherable to the uninitiated - feet twitching, eyes unblinking, mind and body performing a complex dance it's been trained to do over many, many midnight sessions. An athlete sprinting, throwing, leaping - no random thought, pure focus, pure action. An author, words flowing from fingertips, worlds of imagination building, collapsing, evolving. The joy of such a state, of such a level of performance is what keeps the performer returning, training, improving, excelling.<br /><br />Such a state of mind is Nirvana for a coder. It's a state where right brain intuition seems to complement the left brain logic, providing leaps of logic, moments of brilliance. You can see not just the code, but the systems you work within - how they interact, where they work well, where they don't. An almost palpable web of logic - of cause and effect - with you as the spider at the centre, tweaking, modifying, deleting, rebuilding. Your focus is perfect, you can hold everything in your head.<br /><br />It takes time to get into The Zone. Time to reach that point where it all flows effortlessly. Music helps - it cushions the mind from distraction, but for the coder it's more than just that. It complements the art of coding, it balances the mind - logical left brain and musical right brain firing in unison.<br /><br />Interruption, however, can spell disaster. Email alerts, phone calls, co-workers, life - they all break the flow of The Zone. 
So, how can you ensure you spend as much time in The Zone as possible?<br /><br />Discipline is one way - the strength of mind to focus completely, shutting out all outside temptations or irritations. But that's hard.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.funnyjunksite.com/wp-content/uploads/2007/08/programmer.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px;" src="http://www.funnyjunksite.com/wp-content/uploads/2007/08/programmer.jpg" alt="" border="0" /></a><br />Another way is to become completely anti-social - stop bathing, eat high sugar, high fat foods with heaps of preservatives that make you flatulent. Never answer your phone, grunt when addressed directly. If you have to talk to someone, never look them in the eye. Don't wear shoes and don't cut your toenails or fingernails. People will soon leave you alone and you can transcend into the Zone in peace. Your codebase will love you for it. Your partner will leave you, but hey, they were just a distraction anyway, weren't they?<br /><br />The middle ground? Reduce your distractions and interruptions. Email, IM and the Interweb are all enemies of The Zone. Once you're in The Zone these distractions are irrelevant, but they can keep you from getting there. But, the biggest, fattest Zone killer? <a href="http://powerof2games.com/node/32">Long compile times</a>. The gap between hitting F5 and actually running your code is the time when you drift away from that sweet spot of concentration. Minimising that gap will not only keep you close to The Zone but will also improve your productivity and save your company (assuming you are working for money) a fortune.<br /><br />Consider 20 programmers with a compile/link time of 5 minutes, and let's guess that they compile 5 times an hour, 8 hours a day. That's 4,000 minutes, or more than 8 programmer days (at 8 hours each), wasted <span style="font-style: italic;">per</span> day. 
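<br /><br />For the sceptics, here is the back-of-the-envelope sum as a quick Python sketch. The inputs are the guesses from the paragraph above; the function and variable names are made up for illustration, and the only assumption I've added is that a "programmer day" can be counted either as a full 24 hours or as an 8-hour working day.

```python
# Back-of-the-envelope cost of build times, using the guessed figures above.
programmers = 20
build_minutes = 5        # one compile/link cycle
builds_per_hour = 5
hours_per_day = 8


def wasted_minutes_per_day(programmers, build_minutes, builds_per_hour, hours_per_day):
    """Total minutes the whole team spends waiting on builds in one working day."""
    return programmers * build_minutes * builds_per_hour * hours_per_day


total_minutes = wasted_minutes_per_day(
    programmers, build_minutes, builds_per_hour, hours_per_day
)
total_hours = total_minutes / 60
round_the_clock_days = total_hours / 24        # counted in 24-hour days
working_days = total_hours / hours_per_day     # counted in 8-hour programmer days

print(f"{total_minutes} minutes = {total_hours:.1f} hours waiting on builds per day")
print(f"{round_the_clock_days:.2f} round-the-clock days, {working_days:.2f} eight-hour days")
```

Counted around the clock that is a bit under 3 days; counted in 8-hour working days it is a bit over 8. Either way the team loses more than one full-time programmer to the build every single day.<br /><br />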
You're better off permanently dedicating one programmer to improving build times - not only will the code/compile/debug cycle be quicker, you'll keep your programmers in The Zone for longer.<br /><br />Another option is <a href="http://unittest-cpp.sourceforge.net/">unit testing</a> - not only is it a great way of ensuring that your code is working but it also provides a great place to rapidly implement and test new features in isolation.<br /><br />So, I ask you, do you hit The Zone at work or only at Home? And, what <a href="http://www.di.fm/mp3/goapsy.pls">music</a> best transports you there?Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-88154117727525844262008-06-21T20:29:00.008+09:302008-12-10T02:30:18.962+10:30A Parallel Future<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.top500.org/files/imagecache/gallery/files/systems/Roadrunner_1207.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px;" src="http://www.top500.org/files/imagecache/gallery/files/systems/Roadrunner_1207.jpg" alt="" border="0" /></a><br />I was recently doing some research on parallel programming and stumbled across <a href="http://www.top500.org/">Top500</a>, a site which catalogs (twice a year) the top 500 supercomputers in the world. As of June 18 the fastest computer in the world is the IBM cluster, "<a href="http://www.top500.org/system/9485">Roadrunner</a>" which hit 1.026 petaflop/s. That's 1,026,000,000,000,000 floating point operations per second, or over a million gigaflops. To put that into <a href="http://www.tgdaily.com/content/view/37621/128/">context</a> the PS3 is rated as having 218 GFLOPS (I'm not counting the GPU here), X360 hits 115 GFLOPS and the PS2 stretches to almost 6.2 GFLOPS while the old Xbox hit 7.3 GFLOPS (including the GPU).<br /><br />Now, this Roadrunner sounds like one hefty piece of machinery. 
It cost about US$100 million, consists of 278 refrigerator-sized racks occupying about 480 square meters of floor space, has 10,000 connections requiring over 88km of optic cable, and weighs over 226 tonnes. For its processors it's using 6,562 dual-core AMD Opteron chips as well as 12,240 Cell chips, all feeding off 98 terabytes of memory, and it is running Red Hat Linux.<br /><br />Yup, you heard right, this baby is running off of what is effectively 12,240 souped up PS3s.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.top500.org/files/imagecache/gallery/files/systems/Roadrunner%20supercomputer.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px;" src="http://www.top500.org/files/imagecache/gallery/files/systems/Roadrunner%20supercomputer.jpg" alt="" border="0" /></a><br />The Roadrunner is also incredibly energy efficient, using only 2.35 megawatts or 437 million calculations per watt (compare this to my PS3 devkit which I use to <a href="http://www.choice.com.au/viewArticle.aspx?id=106346&catId=100245&tid=100008&p=5&title=Computers%27+energy+costs">heat my office</a>).<br /><br />So, what are they using it for? Personally, I'd be playing 1,026,000 games of Super Stardust HD at the same time. But no, not those guys. They're using it (primarily) to watch the US's nuclear weapons stockpile (plus a little bit of astronomy, energy, human genome, and climate change research).<br /><br />Am I the only one that is a little concerned that you need a million gigaflop machine to watch a big pile of nuclear weapons?<br /><br />This info also got me thinking, "How powerful will the next generation of consoles be?". 
For that, we need to look at some graphs...<br /><br /><img class="image preview" src="http://www.top500.org/static/lists/2008/06/perfdevel/Performance_Development.png" alt="" title="static/lists/2008/06/perfdevel/Performance_Development.png" /><br /><br />This is a logarithmic graph of the processing power of the world's top supercomputers over the last 15 years. Notice how scarily linear it is? We're looking at a tenfold increase in flops every 3.5 years (approximately). So, from this graph, the PS3 is faster than the world's fastest supercomputer back in 1995 - the Fujitsu VPP-500.<br /><br />So, do consoles exhibit the same type of exponential trend in power? Let's have a look:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTK5AdPDIY1PJXPXnMcZF-W6g8rFzJRxqnWspsDUdMLBjZD-_ZX5iRfPJfn46b4_-Xu8yFm-prdSbSh9cZqiL6mIgsdpy4eSffMgiPyLhY9RNsmlgOq8x8SX9GoA8KeZ-okGOM/s1600-h/Console_mflops.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTK5AdPDIY1PJXPXnMcZF-W6g8rFzJRxqnWspsDUdMLBjZD-_ZX5iRfPJfn46b4_-Xu8yFm-prdSbSh9cZqiL6mIgsdpy4eSffMgiPyLhY9RNsmlgOq8x8SX9GoA8KeZ-okGOM/s400/Console_mflops.jpg" alt="" id="BLOGGER_PHOTO_ID_5214327974425642322" border="0" /></a>Now, extrapolating (madly), I've added in the next Xbox and Playstation. Looking at previous release dates and performance I've put the next Xbox being released in 2010 with approximately 2 Teraflops (2,000 GFLOPS) and the PS4 at 2012 with 10 Teraflops. I readily admit that those last two numbers and dates have been pulled out of my arse, but they'll do for a guesstimate. 
So, let's average those last two numbers and dates and get a new console being released in 2011 by the Sony/Nintendo/Microsoft conglomerate with about 6 teraflops (yes, I know FLOPS are not a great indication of performance but they'll do for this exercise).<br /><br />How many processors would we need to produce the 6 TFLOPS we're talking about here? If we look at the Roadrunner, we can see that each tri-blade runs at up to 400GFLOPS, so that's as good a number as any to use. That gives us 6,000/400 = 15 tri-blades (let's make it 16 because programmers like powers of 2). Each tri-blade is, you guessed it, 3 cells and each cell is 8 SPUs. That gives us 16*3*8 = 384 current day processors! Crazy I know. Let's assume some improvement via Moore's Law over the next 5 years so we can halve that 384 twice to get 96 processors.<br /><br />So, where does that leave my predictions for the next generation of consoles? I'm picking between 64 and 128 processors pumping out around 6TFlops (about 380 <a href="http://seven-degrees-of-freedom.blogspot.com/2007/11/tony-units.html">Tony Units</a>). And for memory (just guessing here) I'd say we're looking at between 4 and 8GB (factor of 16 change in memory for each generation). And of course content will be delivered digitally.<br /><br />So, programmers, if you want to be useful in the next generation of games development, you'll need to be fluent in the language of parallel programming. That's one prediction I can guarantee.<br /><br /><br /><br />Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-81885179702176163542008-05-20T19:08:00.008+09:302008-05-23T12:24:45.464+09:30What? 
I have to PAY to speak?<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.gameconnectap.com.au/pix/logos/logo_GCAP2K8.gif"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px;" src="http://www.gameconnectap.com.au/pix/logos/logo_GCAP2K8.gif" alt="" border="0" /></a><br />It's coming up to that time of year again - <a href="http://www.gameconnectap.com.au/index.html">GCAP'08</a> is bearing down upon us. This will be the tenth annual game developers conference in Australia - AGDC for many of those years, and now, the inaccurately named GCAP (I'm not counting the other gatherings like FreePlay).<br /><br />This year they seem to be a little more organised, putting out a <a href="http://www.gameconnectap.com.au/speak.html">call for abstracts</a> now - due by July 08, well before the November conference starts. This is a good thing. But, this year, in order to encourage talent to speak at this conference they have stated the following<br /><br /><blockquote>Potential presenters should note that all costs to attend the Conference must be met from their own resources. As a commitment to attend and support the Conference, accepting presenters are required to pay a submission fee for each paper which is then subtracted from their registration fee. All speakers must be registered for the full Conference program.</blockquote>So, if I want to speak at this conference, I have to pay full conference registration <span style="font-weight: bold;">and</span> I have to pay up front part of that registration for the privilege. Now, I understand that this is to prevent people from dropping out after they've committed to speaking, but is this really the right way to do it? <span style="font-style: italic;">Threaten</span> us?<br /><br />If you want good speakers, provide good benefits. 
Make people <span style="font-style: italic;">want</span> to donate a considerable amount of their own time in order to present a good session. Yes, it means that you won't get registration fees from those 20 or so speakers, but if so few people are registering that you <span style="font-style: italic;">really need </span>the fees from those people that actually make the show worth attending, then I think you've got greater problems than just financial ones.<br /><br />So, how can you make GCAP worth attending?<br /><ol><li><span style="font-weight: bold;">Big Names</span>: Announce that some World Class Talent is speaking. Make a buzz. Do the ground work early - if <a href="http://www.go3.com.au/">GO3 </a>can do it why can't GCAP/GDAA?<br /></li><li><span style="font-weight: bold;">Call for abstracts early</span> (they got that one right this time) and make sure that the people that choose the sessions know what the sessions are actually about. Use the talent in the industry to help choose - I'm sure that there are a few senior/lead artists, designers, programmers and managers that would donate a small amount of time to select the most interesting sessions. To make sure that those seniors are interested in helping, make sure that Number 1 is out there. Also, release the titles of the sessions and their abstracts as early as possible.<br /></li><li><span style="font-weight: bold;">Don't make the speakers pay</span>. It's stupid. You don't need to make it harder to get good sessions - you need to make it easier. Offer incentives - booze, strippers, or transport around Brisbane in a rickshaw pulled by Chuck Norris.<br /></li><li><span style="font-weight: bold;">Keep the registration fee as low as possible</span>. The first AGDC in 1999 was about $300 and it slowly climbed after that to nearly $1000. Then GCAP took over and promised to lower the registration fee - which it did. And it has been slowly climbing since then. 
Now I know it's expensive to run a conference, and a lot of hard work, but if you have some Big Names and the fees aren't too bad then you'll get a lot more attendees. Offer studio deals for sending large numbers of people from a single studio. I'm sure Krome and Pandemic would jump at the deal. When you factor in the cost of transportation from, say, Melbourne to Brisbane, plus accommodation for 3 nights, you're looking at around $900 per person before the registration and 2 days off work are even considered (although there is a bridge nearby that you can sleep under if you want to save on accommodation and you can earn easy money in the Valley if you're not picky).<br /></li><li><span style="font-weight: bold;">Throw a decent party</span>. The first AGDC threw a great dinner party (that didn't cost the ridiculous amounts that the current ones do) and each year there was a great party in one form or another. The party last year in Melbourne was crap - a bad venue with hardly any drinks and nowhere to talk. One of the things that is best about GCAP is the social and career networking, and the best social lubricant I know is alcohol. Make sure that the speakers are there (and want to be there) and provide a chance for people at all levels of experience to mix. Make it memorable - Sony's rent a date at the GCAP before last was memorable, the one with the mechanical surf board and sumo suits was memorable. But don't try and make developers dance - that way lies disaster.<br /></li></ol>All of the above will help rebuild the credibility of the GCAP conference, something which it is lacking. It needs to be a collection of great talent - a place for locals to strut their stuff and to learn from the best in the world. If just one GCAP turns out to be even "really good" it'll make it easier to get more people attending future conferences.<br /><br />One of the key issues is that the games developers themselves don't see much value in attending GCAP. 
And for the most part they are correct. The above points will help to convince the developers that GCAP is a conference worth attending, and once the local games industry starts taking it seriously and is willing to contribute, both parties will reap the benefits.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-2406355218576917942008-05-06T21:49:00.004+09:302008-05-07T20:14:28.824+09:30Multithreaded Programming is Magic<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.vansciverbobbinlace.com/MagicThreads.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px;" src="http://www.vansciverbobbinlace.com/MagicThreads.jpg" alt="" border="0" /></a><br /><br />I've been playing around a bit with programming with multiple threads recently. I've been hearing a lot of moaning and groaning from programmers about the problems with this type of programming - the horror of deadlocks, the tragedy of race conditions and the impossibility of debugging. I'm quite enjoying it.<br /><br />I think the main reason that people avoid multithreading is that you can't just hack it to make it work. You actually have to understand what you are doing. You have to be able to step through multiple threads that are running at the same time, potentially modifying the same variables, at slightly different times. You need to be able to 'see' the potential problems inherent with multiple threads running the same code at the same time, to be able to step through the critical sections either in your mind or on paper and try every possible combination of statements executed in every possible order. You can see why programmers avoid it as much as possible.<br /><br />I had the good fortune to stumble across this little gem: <a href="http://greenteapress.com/semaphores/">The Little Book of Semaphores</a>. 
It is a treasure trove of the fundamentals of multithreading synchronisation as well as full of problems, hints and solutions to ensure that you completely understand the constructs you are playing with. It was while doing one of these problems (although they feel like puzzles) that I realised something; it felt like playing Magic The Gathering.<br /><br />I used to play a bit of Magic the Gathering way back in the day. I came in at about the 3rd revision and spent far too much money buying far too many little bits of cardboard with pictures on them. What I loved about the game was sitting there planning your next move and the almost infinite possibilities that lay before you. You'd have a series of options, a number of cards on the table, and hopefully, a strategy. The cards had strict rules as to how they could be played - what their effects were and the order that they could be played in. Your task was not only to attack your opponent but to be able to defend yourself - you had to anticipate reactions to any action you could take as well as the reaction to the reaction - and there were always so many ways you could screw or be screwed by your opponent. And when you introduced even more opponents it got exponentially harder. It used to give me splitting headaches - just like multithreaded programming.<br /><br />But for some reason I enjoy it - a chance to stretch the old grey matter I suppose.<br /><br />Anyway, the Little Book of Semaphores (which isn't that little, weighing in at 279 pages) is quite useful. It has a collection of basic synchronisation patterns - simple constructs that you can use to help you solve more complex problems. It should be compulsory reading for any programmer working on the current generation of consoles (and PCs I suppose). To make the most of the system you are working on you need to understand how to use it to the full extent of its capabilities. 
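<br /><br />To give a flavour of those constructs, here is a minimal sketch of about the simplest one: a semaphore initialised to 1 acting as a mutex around a shared counter. It is written in Python using the standard <code>threading</code> module rather than in console C++, and the names <code>run_counter</code>, <code>worker</code> and <code>state</code> are my own for illustration, not taken from the book.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Have several threads bump one shared counter, guarded by a semaphore."""
    mutex = threading.Semaphore(1)   # initial value 1 makes it behave as a mutex
    state = {"count": 0}

    def worker():
        for _ in range(increments):
            mutex.acquire()          # wait: only one thread gets past this point
            state["count"] += 1      # critical section: a read-modify-write
            mutex.release()          # signal: let the next thread in

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


print(run_counter())  # prints 40000
```

Take away the acquire/release pair and the increment is no longer protected, which is exactly the kind of race condition that everyone moans about. The point of the pattern is that you reason about it once and then reuse it.<br /><br />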
You can program a multiprocessor system without using much in the way of multithreading, but you are hamstringing yourself in the same way as if you program in C++ using only procedural C constructs and ignoring OO.<br /><br />Which brings me to my next point. The main problem I see with parallel programming is the lack of solid patterns. Contrary to Christer's blog entry "<a href="http://realtimecollisiondetection.net/blog/?p=44">Design Patterns are from Hell!</a>", patterns can be damn useful when used appropriately as tools to help you to solve complex problems. Sure, patterns used without fully comprehending them can result in over engineered, confusing, inappropriate, inefficient solutions. Like any tool, they have to be used correctly. If you don't understand the problem you are going to botch the solution.<br /><br />I think that programming on multiprocessor machines requires a new paradigm (or paradigms) - we need a better way to think about the way we solve problems in parallel. Progressing from C to C++ we were given better ways to provide encapsulation, abstraction, and object oriented programming in general. We learnt to think in terms of objects and their interactions with each other which aided solving problems and building large complex systems. We need the same for parallel systems.<br /><br />Is it a new (or <a href="http://www.haskell.org/">old</a>) language? New (or <a href="http://www.jpaulmorrison.com/fbp/index.shtml">old</a>) patterns? Regardless, when thinking about solving problems on multithreaded systems, we need to be able to think and solve in a fashion that suits that system. I don't know what that fashion is, but I do know that now that we are all forced to use multiprocessor systems, we'll soon discover this new paradigm, these new patterns and modes of thought. 
And once we have a few of these tools widely understood and used, subsequent patterns and tools will evolve and make our lives a little easier when programming the next generation of 128 core consoles.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-7932379388315406902008-04-23T09:50:00.003+09:302008-04-23T10:00:38.687+09:30Resisting the Next GenerationIt's been a while since my last blog entry - I've been far too busy to do much other than work and sleep (or not sleep and help look after the kids). So, in lieu of writing something myself, you can now read <a href="http://www.escapistmagazine.com/articles/view/issues/issue_146/4837-Resisting-the-Next-Generation">something written</a> by <a href="http://graffitigamer.com">someone else</a> about stuff that I've said.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-65767119688801013602008-03-11T22:50:00.004+10:302008-12-10T02:30:19.146+10:30The Breeding EdgeHas anyone else noticed the changing landscape of the game development studio over the last 10 years? A developer's monitor used to be solely reserved as a repository for plastic models of obscure Japanese anime and video game characters and now those freakishly geekish icons are being interspersed or even replaced with baby photos. 
Yes, you heard right, game developers are not only having sex, they are actually breeding.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE_mkce0bvugg5Jp8TPsL-3JOVu7ixCOBru3gbYQbFiXRiIUvM1xuLZpnOFBPzhCPzOv-WqQGIAitoSlcFUfRUYgMRg5RKbBGSASxroZYgJvXxpiZVUuGtMlWSeaaF9Rwc_-vB/s1600-h/NewBornBaby.JPG"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE_mkce0bvugg5Jp8TPsL-3JOVu7ixCOBru3gbYQbFiXRiIUvM1xuLZpnOFBPzhCPzOv-WqQGIAitoSlcFUfRUYgMRg5RKbBGSASxroZYgJvXxpiZVUuGtMlWSeaaF9Rwc_-vB/s320/NewBornBaby.JPG" alt="" id="BLOGGER_PHOTO_ID_5176470408642715154" border="0" /></a><br />This is a good thing even though the thought may repulse you.<br /><br />It indicates that developers are growing up and, more importantly, that game studios are becoming more secure and responsible. Game developer Breeders have either been in the industry for a number of years and feel secure enough to procreate or have migrated into the industry with rugrats already in tow. These Breeders have years of experience behind them and are often in senior positions - the studios value them and, to ensure that they will stay with the studio, will offer security, support and realistic working hours. Yes, <span style="font-style: italic;">realistic working hours</span>. Think about it - Breeders have a Significant Other that runs their life and if they are not home to help feed and delouse little v2.0 at an appropriate time then we have an unhappy SO. An unhappy SO means an unhappy game developer. An unhappy developer doesn't work well. So, to keep these experienced developers the studio has to offer sensible working hours (and pay) and to provide sensible working hours you need decent scheduling and planning.<br /><br />So, Breeders in the studio force the studio to grow up. 
There's a symmetry there that is appealing.<br /><br />But it's not all one way - hiring a Breeder has its perks for a studio too. If you can entice a Breeder into your studio, especially if it involves a change of residence, the Breeder is likely to stay for longer as the pain of moving an entire family is unpleasant to say the least. Trust me, I've done it twice in the last two years. Breeders are going to be as sure as they can that your studio is one that they will settle at for years to come. Breeders are generally older and are more likely to have more experience in their field. They are also good at dealing with very young children and so are ideal for dealing with designers.<br /><br />Breeders attract Breeders. Where there is one, more will come - they are a sign of a maturing studio. If there are no Breeders in your studio, ask yourself why.<br /><br />Is your studio breeding Breeders?Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-36734384552549427392008-02-28T09:18:00.004+10:302008-03-12T21:59:04.893+10:30Some great game development blogs<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.bruceongames.com/wp-content/uploads/2008/02/game-development-essentials.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px;" src="http://www.bruceongames.com/wp-content/uploads/2008/02/game-development-essentials.jpg" alt="" border="0" /></a><br /><br /><br />If you've just come from <a href="http://www.bruceongames.com/">Bruce </a>or <a href="http://www.jakeworld.org/JakeWorld/main.php?left=leftframeN/blogs.php&main=blog/BlogDisplay.php">Jake</a>'s site and are wondering why you're looking at the exact same blog entry - sorry. Just skip down and read some of my recent stuff. Otherwise, you really should check out some of the blogs mentioned below. 
As Bruce says:<br /><blockquote style="font-family: arial;">These are the guys who actually make the games that everyone plays, so they know what they are talking about. And when they analyse a game they do so with an authority no magazine could match. These guys are the complete opposite of the fanboy, they are intelligent, informed and incisive.</blockquote>I'll reproduce the links here and add a few more that I read regularly at the bottom.<br /><ul><li><a target="_blank" href="http://mainlyaboutgames.blogspot.com/">Mainly About Games</a>. Informative and well written, it has a nice personal feel to it.</li><li><a target="_blank" href="http://www.dopass.com/">Dopass.com.</a> Short entries not just about gaming. Funny at times.</li><li><a target="_blank" href="http://apaththroughpossibility.blogspot.com/">A path through possibility</a>. Irregular updating but well worth a read for some incisive commentary.</li><li><a target="_blank" href="http://japanmanship.blogspot.com/">Japanmanship</a>. An incredibly good read of a Western developer’s life in Japan.</li><li><a target="_blank" href="http://www.magicalwasteland.com/">Magical Wasteland</a>. Refreshingly irreverent.</li><li><a target="_blank" href="http://www.dreamdawn.com/sh/">Survival Horror.</a> Does what it says on the tin.</li><li><a target="_blank" href="http://www.gamedev.net/">Gamedev.net</a>. A big and serious site with a lot of good content.</li><li><a target="_blank" href="http://randomencounters.vox.com/">Random Encounters in Imaginary Realms</a>. Just cherry picks the good stuff.</li><li><a target="_blank" href="http://www.cheeky.gr/">Cheeky</a>. Sparse and interesting development diary.</li><li><a target="_blank" href="http://aaiiee.wordpress.com/">Peter Mackay’s projects and development diary</a>. 
Quake on Gamecube.</li><li><a target="_blank" href="http://www.lifeintherain.com/">Life In The Rain.</a> Often long, interesting personal articles.</li><li><a target="_blank" href="http://www.t-machine.org/">T=Machine</a>. Wide ranging blog with much that is happening at the sharp end online.</li><li><a target="_blank" href="http://blackcompanystudios.co.uk/blog/">Black Company Studios</a>. Semi diary semi event driven articles. Nice.</li><li><a target="_blank" href="http://msinilo.pl/blog/">.mischief.mayhem.soap.</a> A serious game developer’s blog.</li><li><a target="_blank" href="http://blog.jakeworld.org/JakeWorld/main.php?left=leftframeN/blogs.php&main=blog/BlogDisplay.php">JakeWorld Blog.</a> The life of a game developer.</li><li><a target="_blank" href="http://www.gamefeil.com/blog.html">Gamefeil.</a> Games, comics, diary.</li><li><a target="_blank" href="http://scientificninja.com/">Scientific Ninja</a>. Technical stuff here.</li><li><a target="_blank" href="http://www.devbump.com/">Devbump</a>. Aggregation of gaming articles.</li><li><a target="_blank" href="http://nimblebit.blogspot.com/">Nimblebit</a>. Game development diary. Lots of technical stuff.</li><li><a target="_blank" href="http://beznesstime.blogspot.com/">It’s Bezness time</a>. Bedroom developer diary.</li><li><a target="_blank" href="http://www.lai.as/">I love it, I feel like Sisyphus.</a> On start-ups, game development and programming.</li></ul><br />Others that I personally enjoy reading are:<br /><br /><ul><li><a href="http://passfieldgames.blogspot.com/">Game Musings</a>. An Aussie veteran game designer working on his own casual games as well as in the 'hardcore' industry.</li><li><a href="http://powerof2games.com/">Power of Two Games</a>. Noel Llopis and Charles Nicholson - two vets working on their own thing.</li><li><a href="http://steve-yegge.blogspot.com/">Stevey's Blog Rants</a>. 
Googler with a lot to say (and some of it is actually worth listening to)</li><li><a href="http://herbsutter.spaces.live.com/blog/">Sutter's Mill</a>. Lots of tech posts on parallel programming in particular.</li><li><a href="http://www.eelpi.gotdns.org/blog.wiki.html">TomF's Tech Blog</a>. Great stuff when he actually posts.</li></ul><br />Do you have any that you'd recommend? Leave a comment and let me know.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0tag:blogger.com,1999:blog-36348397.post-58920071420334639862008-02-27T23:03:00.006+10:302008-02-28T07:28:08.500+10:30/* Oops */I've been having problems with the comments on this blog ever since I changed templates. Changing back doesn't seem to help but I've managed to hack the HTML enough to allow comments again and am trying this new "<a href="http://www.intensedebate.com/">Intense Debate</a>" comment thingy. Unfortunately (or fortunately) I've lost half of my old comments but I'll stick with it and see how we go.<br /><br />Please leave a comment and we'll see what happens...<br /><br />Update: I also borked my counters, so if you visited in the last 12 hours, please come back and click where you clicked before. Thanks.Anonymoushttp://www.blogger.com/profile/02511842341497944754noreply@blogger.com0