Monday, May 23, 2011

Abusing texel blur to smooth out DXT5 textures used by user interfaces

If the game has the memory for it, I generally prefer to use uncompressed textures for 2D UI elements because a compressed texture format such as DXT5 will almost certainly degrade the image quality.

An important part of making a good looking UI is supporting pixel-accurate rendering - this is especially important on smaller screens such as mobile devices where space is limited and small fonts & controls are common. If the UI is drawn a half-texel off or anything other than 1:1 pixel:texel, there will be blur that will make the UI appear less crisp. In some situations it may not be noticeable, but in some situations it's completely unacceptable; for instance small 8 point fonts that would have been clear become a blurry mess when the one-pixel-wide parts of a character texel lie on a screen pixel boundary. Even worse is when you have small pixels moving at sub pixel distances; there is simply no avoiding terrible strobing artifacts in this case as characters alternate between sometimes clear and mostly blurry while they move (I was reminded of this recently when working on the credit roll for Star Ninja). Not having pixel accurate rendering for UI is pretty much the same as telling the artist that no matter what they make, they have to put a blur filter on it before they save to make sure it is usable in sub pixel positions which of course would be ridiculous - but that's how it will look much of the time if you don't have pixel accurate rendering. I suspect this is why many games have UI elements that are bigger than they really need to be - it's just easier to ignore pixel accuracy and compensate by making everything big. Sometimes however, when UI elements are small, there is no avoiding the need for pixel accurate rendering.

Of course, when pixel accurate rendering is used the UI shows exactly what the textures contain, pixel for pixel. Uncompressed textures rendered pixel accurately look great, and it seems to usually be the best solution if your game has the memory for it. What if your UI elements are so numerous that it's out of the question to save them all as uncompressed textures? Or perhaps other factors are putting pressure on your overall memory use and you have to squeeze everywhere possible? It can & does happen. So, just compress it and be done with it, why not? The problem is, pixel accurate drawing of compressed textures only serves to exaggerate the artifacts caused by the compression. Sometimes these artifacts can cause the resulting texture to fail to meet quality requirements.

The good news is that the final result actually tends to benefit if compressed textures are drawn a half pixel off because the filtering that occurs for free will smooth out the artifacts caused by compression. It still won't look as good as uncompressed textures, but it will often be an improvement over pixel accurate rendering. By maintaining a consistent half-pixel offset in rendering these elements, you continue to benefit from the filtering and also eliminate strobing if the UI moves.

Here's an example using the level picker button in Star Ninja. Click for a zoomed in view of the button. The uncompressed and DXT5 versions are the direct output of the content pipeline. I had to make the bilinear filter version in photoshop due to time constraints but it does reasonably match the results I was seeing in the game engine when I was originally doing these tests.

The uncompressed obviously looks the best. The DXT5 compressed version looks the worst because of how the gradients don't respond well to the compression, particularly on the top and bottom edge. The last version is the same DXT5 texture shifted 0.5 pixels which has the effect of blurring out the compression artifacts.

Another option for crisp UI with DXT5 compression is to make sure the artists carefully inspect the compressed results of their work and modify the texture until the artifacts are not a problem. This sounds reasonable, and can sometimes be depending on the artist and toolchain, but if the assets are going to be procedurally packed into a sprite sheet (like this tool does) then it may be difficult to guarantee the artifacts for one sprite sheet layout will be the same as another. This is because the DXT5 compression is based on a compressing 4x4 pixel blocks; if the sprite sheet is built where the elements can shift within the 4x4 block the artifacts will be different. This page has a good explanation of how the compression works.

While uncompressed textures are generally preferred for user interfaces, the use of compressed textures is sometimes necessary but will result in compression artifacts. By rendering these elements pixel accurately but offset one half pixel in x & y, a consistently smoothed version of the texture will be rendered which will mask some of the compression artifacts and can improve the final image quality.

Tuesday, May 17, 2011

The insidious nature of unmanaged resource leaks in XNA games.

While working on Star Ninja's screen transition system last week, I discovered a memory leak problem that was ultimately determined to be the result of a simple oversight - The instanced model system wasn't disposing the vertex buffers and index buffers it had created. These are created at runtime to prepare large arrays that are a series of duplicates of the master instance data with the bone indices and index indices set up to do the SkinnedEffect instancing technique as described in one of the official XNA samples. Finding that out was a lot more time consuming than I'd have liked. The only reason I noticed it was because of another bug that had sized the vertex buffer allocation incorrectly and when I fed a larger mesh into the system, memory problems started to show up.

While I do most development and testing on a PC, mainly for the time saving benefit of "edit and continue" which isn't available on WP7 or XBox, I do make a point of running and doing a few quick tests after every significant task to make sure everything is still working well. Recently, I added a conditional compilation symbol "STRESS_TEST" which allows me to just run the game and it will churn through all the levels doing pretty random stuff. The tester is a great way for me to monitor for peak memory usage; the code will periodically print out the peak memory retrieved from Microsoft.Phone.Info.DeviceExtendedProperties. Since certification requires the game to stay under 90MB, this is a pretty important thing to stay on top of.

After the last batch of changes where one small part included changing which assets were being fed into the instanced object renderer, I saw the memory usage spike unexpectedly and almost immediately over 100MB and more as time went on. So I ran the game using a CLR memory profiler (YourKit for .NET) which to my surprise wasn't telling me anything useful - object counts were pretty similar and managed heap was about the same. This suggested to me that the memory usage being reported by WP7 is process memory and not just the managed heap. While interesting to know this, it doesn't help in finding what was sucking all the extra memory.

Because the asset change was just one change of what was probably a few too many, I didn't immediately recognize that as being related to the problem. Because my usual reference-leak-finding technique weren't working, I found myself disabling large swaths of code to narrow down the source of the unmanaged memory leak. After a few hours of divide & conquer, I eventually I found the problem was due to these vertex & index buffers not being disposed. The code was simply not releasing these unmanaged resources and neither the CLR or the XNA API had any way to know that something needed to be cleaned up. With this failure now understood, I reviewed all the code for anything that might need a call to Dispose(), called it at the appropriate time, and the unmanaged memory leaks disappeared.

In the end, a lot of time was wasted due to not paying enough attention to objects implementing the IDispose interface. Sufficiently chastised by this, I made a point of reviewing the Dispose Pattern in case it would help avoid this in the future. I had glossed over it before but had avoided using it because C#'s limitation that classes can only derive from one other class made me leery of introducing that kind of limitation to the general codebase without a good reason. Most of the time, interfaces provide the desired results with only a little more work and without the single-derivation limitation and so using interfaces rather than derivation had been my typical approach. This memory leak situation provided the motivation required to start using the dispose pattern in the hopes that future errors could be avoided.

To my surprise, I soon found there isn't a standard Disposable base class in .NET. Why not? Perhaps because it's so simple that people just write them whenever needed? Who knows. Simpler classes seem to be provided on a regular basis, but that's how it is.

So here's one you may find slightly more useful than the basic Disposal pattern.

/// <summary>
 /// A generic implementation of the Dispose pattern, useful for classes that need IDisposable, don't need to 
 /// derive from something else, and are used as a base class for other classes.
 /// </summary>
 public class Disposable : IDisposable
  /// <summary>
  /// Set to true as soon as Dispose is called and before the
  /// call to Dispose(true) is made, which means this bool is 
  /// only useful to code outside the scope of the disposal process.
  /// </summary>
  public bool IsDisposed { getprivate set; }
  public void Dispose()
       throw new Exception();
   IsDisposed = true;
  protected virtual void Dispose(bool disposing)

This has one feature beyond the standard Dispose pattern - a bool that is set when it is disposed, which can be a useful debug build assertion in code that uses the object, particularly when an object is being bounced around among various systems and detecting a disposed object can prevent more mysterious exceptions at a lower level. Also, I couldn't think of a situation where it would be useful to call Dispose on an object twice, so the check for IsDisposed in Dispose() will help find situations where this somehow happens by accident.

The bool IsDisposed could perhaps be made visible to #DEBUG builds, and the check converted to a Debug.Assert(), but I prefer to have these failures detected at the earliest time in all configurations in order to prevent other, possibly more subtle, bugs from occurring. While it's true that an object that has had Dispose() called upon it can technically continue to be used, I prefer to consider Disposed objects to be "off limits" where I expect them to be inert and ready for garbage collection so I often check the IsDisposed property at entry points to large systems just to make sure an object is still valid.

After reviewing the code for all IDispose interfaces and converting all suitable classes to derive from the new Disposable base, I found the code in general was a little more organized (especially class hierarchies where multiple classes in the hierarchy implemented IDispose) and generally more robust in that I knew the class finalizers would be taking care of any disposals that, for whatever reason, weren't explicitly triggered.

While the Disposal class is all well and good, the main thing to remember is to always make sure to pay attention to the objects you create and if they implement IDisposable, make sure it's getting called at some point because unlike most of C# there isn't any magic code that will clean it up for you. The Disposable class doesn't eliminate the need to actually write the code to dispose the objects you are responsible for, but it can make your library code a little more robust when you provide a Disposable as a return value because this way the finalizer will at least be sure to eventually dispose the object in case the caller doesn't.

Thursday, May 12, 2011

Why XBox Live Gamer Tags should be accessible to all Windows Phone games

Let me start by first saying, I fully appreciate the need and reasons behind having the XBox Live branding on the Windows Phone be a marketplace tool to help identify AAA phone games to the customers. It does a lot for the platform to have some part of the catalog include titles that have (generally) higher production values, QA and all the rest that comes with the responsibilities of an XBL contract. Limiting access to leaderboards, friends, avatars and other XBox Live features is, for the most part, a reasonable thing that allows Microsoft to ensure the quality of games using these features meets their standards and gives a little more value to the developers who make the effort to be part of XBox Live.

There is one feature however that really, really, needs to be made available to all Windows Phone game developers: the gamer tag. Providing access to this will allow developers who who have any kind of online component to their games have some kind of reasonable expectation that we won't be exposing our gamers to profane or otherwise unacceptable names. It also makes the task of porting games between XBLIG and WP7 that much easier, because XBLIG games have access to the gamertag (despite not being XBox Live products).

Right now, smaller developers have no choice but to accept user generated names, feed them through banned word filters and implement some kind of reactionary cleanup system for when bad names do sneak through. Unless we make users do a registration process or access unique phone IDs (which has its own set of issues), we are unable to uniquely identify users. This is really not doing anyone any good, and it makes all non-XBox Live WP7 multiplayer games suffer as a result. The obvious problems are multiplayer experiences that have parental control issues. An even more serious problem for the developer is that we have to spend an inordinate amount of time dealing with the fact that there is an API that could give us a safe player name, but simply chooses not to so we wind up spending a lot of time on something that could otherwise be used to make a higher quality game.

I can only hope that the right people at Microsoft will change their mind about letting WP7 apps access gamertags because it would improve the quality of multiplayer WP7 games for the entire platform. This is, presumably, a simple switch that could be done in time for Mango. I'm not holding my breath for this change but it would be great if it happened.

Tuesday, May 10, 2011

Adding ImageMagick to Star Ninja's XNA content pipeline

This week, I've been working on Star Ninja's high score system. The underlying UI system is the same one created for Atomic Sound and Moonlander, which uses a custom content pipeline that does a lot of things to prepare data for the UI system. One of the features is to process fonts and incorporate them into a sprite sheet, saving various bits of metadata required to render these fonts later. Since our content pipeline does a lot of different things well beyond the scope of this post, I'm going to limit the example code to the parts related to integrating ImageMagick into the bitmap font generator tool found on the XNA App Hub.

Back to the problem at hand.

During development of the high score screen, I found myself looking at this (which is populated with random data for now):

Not bad, I was thinking to myself, fairly pleased to be able to make something I didn't consider terrible. Something wasn't great though, which is the actual high score table font. Plain white and boring, it really needed something to stand out a bit more because it seemed to blend with the background too much. I wanted a better looking font, one that had shadows and perhaps other features. So I thought about it for a bit, realizing what a hassle it would be to create color fonts in Photoshop and making the existing font rendering pipeline use that data instead of the existing font rendering technique. Doable, but not ideal. Time is short and that seemed like a terribly cumbersome process especially when I considered the inevitable "can you make the shadow a little bigger" or other change requests that might come in. Custom images per character is an entirely unacceptable way to solve the problem - error prone and difficult to track font metadata (spacing, mainly). I don't mind creating one-off assets, but if there's any real chance of having to iterate then a system to automate the process is often justified.

So I looked into a couple things, the first being Photoshop Scripting. Rejected this because Photoshop scripting is more of a UI automation and not a background process suitable for a content building script. The second one I looked at was ImageMagick, which turned out to not only be pretty cool, but well suited for this task. It's an image processor that is typically used as a console command in batch files, but it includes an OLE component which allows me to use it slightly more easily within the content pipeline. It wouldn't have been much trouble to start up a batch job and use the console command, but the OLE component makes it all a bit cleaner. I couldn't find any examples of using C# with ImageMagick's OLE component, so it seemed like a good idea to write up a little bit about how it can be used within the context of XNA content processing.

The font processor we use is similar to what is found in bitmap font generator found here. To gain access to ImageMagick, just add a reference to the ImageMagick OLE object to the bitmap font generator project (or your content pipeline).

Around line 190 of MainForm.cs in the XNA bitmap font generator, you will see the bitmap that is generated by rasterizing a character from a font.

                        // Rasterize each character in turn,
                        // and add it to the output list.
                        for (char ch = (char)minChar; ch < maxChar; ch++)
                            Bitmap bitmap = RasterizeCharacter(ch);

That's where we hook in. To do something useful with ImageMagick, you will need to set it up and pass it a command line. Because the OLE component takes the arguments as an array of objects, each object being a string with each argument, I wrote a helper function to split a standard string into this array:

object[] GetImageMagickArgs(string args)
   if (args == null || args.Length == 0)
    return null;
   string[] args1 = args.Split(' ');
   object[] result = new object[args1.Length+2];
   for (int i = 0; i < args1.Length; i++)
    result[i+1] = (object)args1[i];
   return result;

You may notice how the string[] is copied to the object[]. This is because the ImageMagic API requires the type of the array to be exactly an array of objects and string[] doesn't match so it will throw an exception if you don't do this.

I get the ImageMagicArgs using a custom parameter to our content pipeline, so you'll need to find a suitable way to get the arguments to your content pipeline or to the bitmap font generator if you are using that. Once the bitmap and arguments are ready, this function can be called to process the character:

private Bitmap ProcessCharacter(object[] args, char ch, Bitmap bitmap)
   if (args == null || args.Length == 0)
    return bitmap;
   int chInt = (int)ch;
   var src = "c:\\temp\\char-" + chInt.ToString() + ".png";
   var dest = "c:\\temp\\char-" + chInt.ToString() + "-output.png";
   bitmap.Save(src, ImageFormat.Png);
   var m = new ImageMagickObject.MagickImage();

   args[0] = src;
   args[args.Length - 1] = dest;
   var r = m.Convert(args);
   var bitmap2 = Bitmap.FromFile(dest);
   return (Bitmap)bitmap2;

Finally, add the processing of the character to the point mentioned above:

var imageMagicArgs = GetImageMagickArgs(ImageMagickString);
                        for (char ch = (char)minChar; ch < maxChar; ch++)
                            Bitmap bitmap = RasterizeCharacter(ch);
bitmap = ProcessCharacter(imageMagicArgs, ch, bitmap);

The end result as you probably can see is that each character is rasterized as normal, then fed to ImageMagick as a temporary file and then later re-read back into the bitmap.

This technique is not without shortcomings. Ideally, I would have used the ImageMagickObject.Stream method to feed the data in directly without the use of a temporary file. However, the docs for this are sorely lacking and it wasn't worth the time to figure out - the temporary files blast through so fast I really don't see the need to spend more time on that. For whatever reason, ImageMagick was keeping a write lock on each file until the component was finalized so I had to create a different file for each character (there is no Dispose method to control this, unfortunately). The biggest shortcoming of this however is that the processed bitmap is the same size as the input bitmap which means processing that spills over the edge will leave the rasterized/processed font with a visible edge. Ideally I would resize the bitmap to contain any effects that might be created for the font and then crop it and adjust font/spritemap metadata accordingly when it was done. I may do this at some later time, but for my specific needs today, this works. I just needed a small effect, something that fits within the existing bitmap.

By feeding in the ImageMagick string "-alpha on ( +clone -channel A -blur 0x1.5 -level 0,50% +channel +level-colors black ) compose Over +swap", feeding the fonts through the pipeline and then running the game, I was rewarded with this image:

Much better!

Here's a closeup of the letter 'A' with and without processing:

The text is much more clear against the background and we now have a system which can be used for any font in any of our games moving forward. As a huge bonus, it's pretty much automated and we can tweak settings and regenerate the textures without dealing with individual letter image files. With a little more work to support larger output bitmaps, more dramatic effects could be used but this is a good solid step in the right direction.

In other news, Star Ninja is going to have local & global high scores tracked across four different game modes! :)

Tuesday, May 3, 2011

Using PIX to help figure out graphics glitches in Star Ninja

As Star Ninja rapidly approaches completion, I've been working on a lot of game polish tasks. I really want this game to make a great first impression and to that goal I've been working on streamlining the UI and making transitions between screens look nice. 

Recently, there was a problem with the screen transition logic that rendered a cross fade between the level selection and the gameplay screen over the course of a second or so. For a while, I didn't really think too much about what was essentially a 1-frame screen flicker but once I noticed it I knew it had to be fixed. 

Single frame render glitches are always hard to deal with unless you have the right tools & process. The first goal is to identify what is really going on. Second, reproduce it reliably and quickly; without that, a lot of time can be wasted. At this point you can iterate with the debugger and tools to take a close look at what is usually a problem with a lot of moving parts. That's where PIX comes in. 

To give an idea what I was looking at, here is the level picker menu, the glitched screen and a frame not long after the transition was done *Note: the art and level is not final, this is a game in development after all!
The menu screen cross fades with the game, but at the last frame of the crossfade it was doing this:

Here's the frame right after it:

Pretty glitchy there in the middle, but since it's only for one frame it's almost impossible to detect it as more than just a flicker. PIX can record an XNA application's stream of graphics device calls, giving you the ability to analyze every last call made to DirectX. This is an enormously useful tool for diagnosing problems such as these.

Because this is happening for only a single frame, I chose to record the stream rather than fumble with breakpoints which is always a hassle when dealing with UI and timing related problems. To do this, the PIX experiment needs to be set up as such:
Note that you have to create and configure each trigger, there isn't a magic "setup stream recording" button. No big deal once you know what to do though. I find I need to check "Disable D3DX analysis" on the Target Program tab when using PIX with XNA apps, it doesn't work without that for me. Might be my system configuration, or maybe an XNA compatibility issue (who knows).

So, once set up, click Start Experiment to run the game. Press your key to start and stop the stream recording to capture the problem. Exit the application, wait a moment, and PIX will pop up a new window like this:

From here, you can "scrub" the video to any frame and drill down into any frame to inspect the sequence of DirectX events. After a bit of digging around in the data, I found that the cause of the problem could be seen by selecting the "Render" tab, find the correct frame of the stream that showed the problem and then select the Depth channel in the "Channel(s):" combo box. This is what I saw:

Clearly, the menu was writing the depth buffer and the game screen wasn't able to draw correctly because of this. Keep in mind this was happening during the transition, where the code is actually drawing both screens to make the fade effect work. This is a new situation for the game because prior to the transitions being added, all screens were rendered without concern for how they might interact with other screens. 

I didn't want to render into a render targets and then back to full screen quads because that would be too slow. The code is already doing a render target for the gameplay screen as it fades in; this problem happened on the very last frame as the gameplay screen was rendering directly to the back buffer like it does when it's not part of a transition. What was happening is the gameplay screen was affected by the previous frame depth buffer results, but only for the one frame. To save an extra bit of time here and there, the game doesn't do a full screen clear at the beginning of each frame unless it is known to be necessary. Some people have told me that the screen clears are so fast that I shouldn't bother, it's too early to optimize, and I should just clear it whenever its convenient, but the fact is the PIX logging shows the Clear operation takes enough time that it's worth it to me to avoid doing when possible or useful. During transitions, the phone is already using a lot of GPU because of the render target usage so this is a good time to avoid wasteful operations. Optimizing early is sometimes a bad thing, but if I know at the outset that between various options one is faster than another and not too much trouble to do I'll always go for the faster option because it tends to make the entire application more robust in the long run with fewer architectural performance problems to go back and wish I had done right in the first place.

To fix this, I simply added this line of code between the two screens to reset the depth buffer during the transition, causing the screen draw order to determine visibility:

GraphicsDevice.Clear(ClearOptions.DepthBuffer, Color.Black, 1, 0);

In the end, it was a simple bug that was easy to fix. Without PIX, I would have been left recording video and analyzing it frame by frame and guessing what was wrong. Fortunately I was able to use PIX to quickly identify and solve the problem.