Patrick McFarland wrote:
> Neil Bradley wrote:
>> Considering that the context of the original message was an in-memory
>> surface vs. a video memory surface, by comparison video memory is MUCH
>> slower to write to than main system memory.
> Then explain why hwsurfaces are faster than swsurfaces? Hmm?
Good grief. This is very simple:
Writing data to video memory is slower than writing data to system memory.
Copying data between video memory surfaces is much faster than writing data
to video memory from system memory.
There are four approaches to take:
1. Composite your scene in system memory, and upload the final image to
video memory.
2. Composite your scene by writing pixels directly to video memory.
3. Upload your images to video memory and composite your scene using the
video hardware to manipulate the images you uploaded.
4. A combination of the above.
SDL lets you do all four of those, but guess which one is the slowest?
That's right, number 2, writing individual pixels over the bus. Guess
which option most people try when they want their programs to go faster? That's
right, option number 2, because video memory is faster than system
memory, right? Right, but only if you don't have to read or write to it. :)
So, why is option 1 faster than option 2? Because block copies of data
to video memory are considerably faster than individual pixel accesses.
Not only can you take full advantage of the width of the bus data path,
but most hardware can queue up DMA block transfers that execute in parallel
with the main CPU.
Option 3 is the fastest, and the one that the next version of the SDL API
is going to be optimized for. Why is this fastest? Because you can get
all of your data on the video card before you need it, and compositing
your scene is a simple matter of sending a few commands to the video
hardware, waiting (or not) for the hardware to finish, then sending a
command to display the finished product. This is the codepath that 3D
hardware is designed to be most efficient at, and with a rich command
set it can produce some truly amazing visual effects.
So, unless you know exactly what you're doing and know exactly what
hardware platforms and driver configurations you are targeting, the
most efficient way to use SDL is to set a video mode at whatever video
depth the display makes available, with no hardware surfaces.
This will set up SDL to do all blitting in system memory to a single
back buffer and then copy the contents of this buffer with no conversion
to the screen when you call SDL_UpdateRects(). This means that you
need to be able to handle any of 8 bpp, 15 bpp, 16 bpp, and 32 bpp.
The easiest way to handle this is just to call SDL_DisplayFormat() on
your artwork to get it to the current display format so blits don't
need to do any conversion. Note that if the display hardware is at
8 bpp, you may want to dither to a specific palette yourself since
SDL's color conversion routines are designed for speed, and do not
do any dithering.
One of the problems with this approach is that you have no control over
when the scene displays, with respect to the refresh rate. I'll let you
in on a dirty little secret. There isn't anything you can do about this
unless you're running in fullscreen mode. However, if you know what you
are doing, and are running in fullscreen mode, you can request that SDL
give you a page flipped display surface in video memory, by passing the
(SDL_FULLSCREEN|SDL_DOUBLEBUF) flags to SDL when you set the video mode.
If you successfully set these flags, you get two video memory buffers
that are alternately displayed when you call SDL_Flip(). Where possible
this flip is synchronized with the vertical blank, to avoid tearing.
Now as soon as you do this, you're in video memory land and need to
create as many surfaces in video memory as possible. Conveniently,
SDL_DisplayFormat() will put your surfaces in video memory if the display
surface is also in video memory. This will make blits between the hardware
surface and the screen very fast. HOWEVER since no 2D blitters support
alpha blending in hardware, this means that alpha blending will be really
really slow: read a pixel from the source surface, read a pixel from the
destination surface, perform the blend in system memory, write the pixel
back out to video memory. Reads from video memory are even slower than
writes, so you'll get terrible performance if you do this.
So, to sum up: Stick to software surfaces unless you really know what
you're doing; they're supported on every platform and they're fairly fast.
If you really know what you're doing, you can get page-flipped video memory
on some hardware/driver combinations, but you'll need to be able to fall
back to a software back buffer in the cases where you can't get directly
to video memory. If you're not changing the entire screen every frame,
and you're using a software display surface, try using SDL_UpdateRects()
to only update the portions of the screen which have changed.
Use SDL_DisplayFormat() and SDL_DisplayFormatAlpha() whenever possible.
If you're doing alpha blending, either use a 3D API or stick to software
memory for your 2D work. If you are using software memory and have alpha channels or
colorkeys in your images, use SDL_RLEACCEL - it speeds up blits immensely
by encoding the operations needed to get your image on the screen without
having to do expensive pixel-by-pixel checks at blit time.
Finally, if you know you're only going to run on 3D hardware, and want to
do lots of fancy visual effects, consider using OpenGL instead of 2D blits.
SDL does provide an API for setting up an OpenGL context and swapping the
video buffers, and the input handling doesn't change at all. You can even
convert SDL surfaces to textures and display them using OpenGL commands.
Example code for this is provided in the testgl.c file in the SDL source
archive.
Whew, I should write this up and stick it on the website - it's really a FAQ.
Questions are welcome, and I'll be able to answer them next week.
-Sam Lantinga, Software Engineer, Blizzard Entertainment