Discussion:
Feasibility/correctness of calling GL in another thread
godlike
2014-01-14 20:05:10 UTC
Hi all,

In the game engine that I am working on, I am designing a rendering thread that executes all OpenGL calls (including SDL_GL_SwapWindow) instead of the main thread. The problem is that I am not quite sure whether the scenario I have in mind is safe or whether it will lead to undefined behavior.

The idea is that the main thread will:

Code:
SDL_Init(SDL_INIT_VIDEO | SDL_INIT_JOYSTICK | SDL_INIT_EVENTS | ...);
SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1); // Enable context sharing
window = SDL_CreateWindow(...);
context_A = SDL_GL_CreateContext(window); // Create context A
context_B = SDL_GL_CreateContext(window); // Create context B
SDL_GL_MakeCurrent(window, context_A); // Make context A current
start_rendering_thread();
// Game loop begins
while(true) {
    poll_input_and_joystick_events_using_SDL();
    do_other_things();
}



In the rendering thread:

Code:
SDL_GL_MakeCurrent(window, context_B); // Make context B current
while(true) {
    execute_GL_calls();
    SDL_GL_SwapWindow(window);
}



The thing that bugs me is that I am calling SDL_GL_SwapWindow(window) in another thread, and that I am doing some SDL work in the main thread (polling events) and other SDL work in the rendering thread.

What are your thoughts? Will this work or not?

------------------------
Panagiotis Christopoulos Charitos
AnKi 3D Engine (http://www.anki3d.org/)
Stefanos A.
2014-01-16 08:25:28 UTC
This should work, provided your GPU drivers can do context sharing without going belly up. (Known problem hardware includes first-gen Atoms with PowerVR IGPs and some Core / Core2 mobile IGPs with old drivers.)

MonoGame does the exact same thing and it appears to be working fine.

That said, why do you need two OpenGL contexts?
_______________________________________________
SDL mailing list
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org
Jonas Kulla
2014-01-16 10:08:32 UTC
I'm doing almost exactly the same thing as you described in my engine: polling/processing of SDL events and setting window state happen in the main thread, and the rendering happens in another dedicated thread. The only difference is that I create the window in the main thread, pass that pointer into the rendering thread, and create the GL context there (I also use only one thread).

Haven't had any problems with this setup on Mac/Linux (Windows untested, but should be fine).
Jonas Kulla
2014-01-16 10:09:47 UTC
Whoops, meant to say "I also only use one GL context".
slimshader
2014-01-16 14:29:31 UTC
I had no problems with 2 contexts on Windows and Mac, but I got crashes on iOS. I used the 2nd GL context to upload textures in the background while the main thread was doing the rendering. In the end I disabled background uploading (and the 2nd ctx) on iOS; I didn't have enough time to investigate.
Forest Hale
2014-01-16 19:22:15 UTC
Permalink
Short version:
Never use shared contexts in performance-conscious code; they cost far more than the failed (more on that later) overlap of the texture uploads.

Long version:
During early development of a major product (Steam Big Picture Mode) that used multiple contexts for background uploading of OpenGL textures, we were told by multiple desktop GPU vendors that the drivers flatly mutex every OpenGL call when you have shared contexts. This can cause a major (~20%) fps loss even if you never touch the other context at all, it gets worse if you do, and in particular the texture upload does NOT happen in parallel with rendering because of that mutexing.

So my advice is: never do this. We changed the product to not do this before launch because it was completely non-performant; we had been struggling to hold 60fps until we did, and afterwards it easily exceeded 200fps with that one change.

The hitching of texture uploads is pretty much unavoidable in OpenGL ES (iOS, Android, etc). On desktop OpenGL you can somewhat hide it with GL_ARB_pixel_buffer_object: you glMapBuffer on the main thread, write the pixels from another thread, glUnmapBuffer on the main thread when done, and then issue the glTexImage2D with the pixel buffer object bound, so that it sources its pixels from that object rather than blocking on a client memory copy. I'm sure this isn't free, and I have not tried it in practice; it also requires that you more or less queue your uploads for the main thread to prepare in stages, so that's some lovely ping-pong there.
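That PBO flow could be sketched roughly as below. This is only a sketch, not runnable on its own: it assumes a current desktop GL context exposing GL_ARB_pixel_buffer_object, and the cross-thread signalling is elided; the names pbo, tex, w, and h are illustrative.

```cpp
// Sketch only: assumes a current desktop GL context with
// GL_ARB_pixel_buffer_object; pbo/tex/w/h are illustrative names.
GLuint pbo, tex;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, w * h * 4, nullptr, GL_STREAM_DRAW);

// Main thread: map the buffer and hand the pointer to a worker thread.
void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
// ... worker thread memcpy()s decoded pixels into dst ...

// Main thread, once the worker signals completion:
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
// With a PBO bound to GL_PIXEL_UNPACK_BUFFER, the last argument is an
// offset into the buffer rather than a client pointer, so the call does
// not block on a client memory copy.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
```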

While I too would greatly appreciate the addition of some background object upload functionality in OpenGL, or even an entire deferred command buffer system (I proposed this in a hardware-agnostic way
but it didn't gain traction), the reality today is that OpenGL contexts and threading are completely non-viable.

I should note that Doom 3 BFG Edition seems to glMapBuffer each of 3 buffer objects (vertex, index, uniforms) at the beginning of the frame and queue jobs for all of the processing it wants to do, so that threads write into those mapped buffers; at end of frame it does the glUnmapBuffer calls and walks its own command list to issue all the real GL calls that depend on that data. This works very well, but is out of the scope of most OpenGL threading discussions.
--
LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier
slimshader
2014-01-21 09:31:30 UTC
Very interesting stuff, thanks a lot for sharing. Is there anything more you could provide on the topic (links, possibly)?

That said, I do not intend to use it for performance-critical stuff, but rather for the loading screen: the main thread renders a loading animation while a background thread uploads the whole level along with textures. In fact I did notice that this takes slightly longer than doing everything in the main thread, but the user experience is much better with the main thread still operational, showing anims and gameplay tips.
Forest Hale
2014-01-21 10:23:35 UTC
The problem is that as long as shared contexts exist, you incur the massive performance penalty, even if all calls are made from one thread.

Hence, don't use them, even if this means you have to queue texture uploads and vertex/index buffer creation for the main thread (showing the loading screen) to handle at its leisure. People won't care about microstutter/hitching on a loading screen; it will still be pretty smooth because you're still running all your file I/O and other heavy operations on the other thread.
slimshader
2014-01-21 11:35:28 UTC
What you are saying is really scary. You mean that even after I have loaded a level using the 2nd shared ctx and it is no longer used, the mere fact that it exists causes the main thread's context to go through some kind of locking mechanism? What if I then destroy the 2nd ctx? Does the lock go away too?

Is it specific to a driver or a platform? I deal with Win and iOS. Is it specific to the GL version used? I am still limiting myself to GL 1.0, as there are too many driver issues on Win with anything above, and I am doing 2D games.
Jared Maddox
2014-01-21 18:24:32 UTC
What you are saying is really scary. You mean that even after I loaded a
level using 2nd shared ctx and it is not used anymore, mere fact that it
exists causes main thread context to go through some kind of locking
mechanism?
There is simply no way for the driver to know that you won't be using
that context if it's still around, so how can it do otherwise?
Graphics card vendors don't normally sell programs intended to
optimize your NON-graphics code, and knowing that you won't be using
the context again basically falls into the same category of things as
that.
What if I then destroy 2nd ctx? Does the lock go too?
That will depend on the driver. Thus, you should assume "No".
Is it specific to a driver or a platform?
I believe that Forest (or was it someone else? it was a few days ago)
already said that he was told by someone who's involved in the
production of video cards that it happens with everything. Indeed, it
would surely be extremely difficult, and maybe impossible, for it to
be otherwise.
I deal with Win and iOS. Is it
specific to GL version used?
It's possible that it could happen in DirectX as well. I don't know if
they have any "lockless" APIs, but even if they do it doesn't mean
that everyone implements it without locking.
Forest Hale
2014-01-21 22:25:08 UTC
For Direct3D, the HAL always locks (like OpenGL's shared contexts), but the locks are on resources rather than API entry points, so there is a performance loss inherent in that API design choice compared to OpenGL (which goes "full throttle" in the single-threaded case). This gives some scalability with threading, but performance gains fall off sharply with additional threads, so one additional thread may be justified but not more, unless you like wasting electricity on spin locks; and that second thread just brings you up to OpenGL performance!

Multiple PC driver vendors directly told me that their OpenGL drivers lock on every call when shared contexts exist; they make no attempt at overlapping operations like this. It is considered exotic behavior in the context of OpenGL API usage, something that games and other consumer apps do not do. It might be accelerated somewhat in their CAD-specific drivers (such as the NVIDIA Quadro and AMD FirePro series), but I do not have data on those.

I would be quite wary of shared contexts on mobile operating systems such as iOS and Android, as the driver vendors have been known to have countless bugs throughout their APIs even in single-threaded usage. I don't know how they handle shared contexts, and it might vary by make and model. Or it could be the unicorn feature in their driver that always works despite everything else being randomly broken; I'm not placing bets.
slimshader
2014-01-22 10:17:34 UTC
Great stuff guys, thanks. I removed my 2nd ctx and now do all texture operations on the main thread (the 2nd thread pushes texture data to a queue and waits for the main thread to upload). It was actually (surprisingly) easy to do with std::promise/future.

The level loads a bit faster now, and I have the additional benefit of things working the same way on Win and iOS. Good to know that I should do the same for the D3D implementation.
Nathaniel J Fries
2014-01-22 17:52:32 UTC
You can do this with drivers that support context sharing, sure. But it would make for simpler code, and be more portable, to do the opposite: render in the main thread and process events in a secondary thread.

SDL_PumpEvents will still need to be called from the main thread on most OSes. But except for a couple of user-initiated loops on Windows, this should have no effect on framerate. (I've benchmarked the equivalent code of SDL_PumpEvents, and it usually takes about 5 microseconds to run on a Pentium 4; 60fps requires a loop time under 16ms, roughly 3000x that.)

------------------------
Nate Fries
Nathaniel J Fries
2014-01-23 05:29:27 UTC
You should also question whether you need a second thread at all. In the times when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread capable of simulating concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s: processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made the use of threads even less costly, they have done nothing to alleviate the design issues associated with threading or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread depending on your needs. But unless this engine you're developing is strictly in-house, needs are for the programmer using the engine to decide, and not the engine itself - the aim of the engine merely ought to be to provide an easier means to meeting such needs.

If you aren't doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils) but for concurrency, there are also better options. You might consider a task queue. It can easily be made multi-threaded if the programmer using the engine does find a need to leverage multicore execution, without at all necessitating it; it carries lower execution overhead than context switching, which multithreading requires on a uniprocessor or overburdened multiprocessor system; and it may even wind up costing less memory, once you account for the locks, thread-local variables, and the memory for each thread's context. It's also extremely simple to implement: in C or C++, it can be nothing more than a singly-linked list of function pointers (keeping a tail pointer may make things even simpler and faster).

------------------------
Nate Fries
Jonathan Greig
2014-01-23 06:33:25 UTC
Permalink
Having a tail pointer in a singly linked list is always a good idea when
optimizing for performance: it makes appending to the end of the list
constant time, O(1), instead of O(n). And if you are accessing the last
element frequently, that's icing on the cake :) (Removing the last element
is a different story: a singly linked list still needs an O(n) walk to find
the predecessor node.)
slimshader
2014-01-23 10:13:54 UTC
Permalink
I hear you, but I am in the niche of older machines; in fact I am having trouble even getting users' OpenGL 1.4 to work correctly for VBOs (I think it is about time I implemented a D3D renderer). Also, mobile devices are not what you'd consider 2010s "gaming rigs". That being said, the secondary thread is (as I said before) not used to speed up level loading but rather to keep the main (event-processing, rendering) thread responsive.

I do use task queues (double-buffered), but they are per-thread. Since cross-thread tasks are very rare, I don't want the queues taking locks all the time.

BTW, since we are on the threading / GL topic: do you guys render from the main thread? What are your update vs. render step strategies? If you do them on separate threads, how do you sync later (condition variables seem the obvious choice)?
slimshader
2014-01-23 10:18:57 UTC
Permalink
I might be missing something here, but how do you even implement a list without a tail pointer? You always keep at least one end, otherwise the list would be inaccessible. In any case, node-based lists suck :P
Stefanos A.
2014-01-23 11:02:36 UTC
Permalink
Post by slimshader
I hear you, but I am in the niche of older machines, in fact I am having
trouble even getting users to get OpenGL 1.4 work correctly (for VBOs (in
fact I think it is about time I implement D3D renderer)), also mobile
devices are not what you'd consider 2010s "gaming rigs". That being said,
secondary thread is (as I said before) not used to speed-up level loading
but rather to keep main (event processing, rendering) thread responsive.
I do use task queues (double-buffered) but they are per-thread. Since
cross-thread tasks are very seldom I don't want them having locks all the
time.
BTW. Since we are on threading / GL topic: do you guys render from main
thread? What are your update vs render step strategies? If you do them on
separate threads how do you sync later (condition vars seem obvious
choice)?.
After trying several threading strategies, my current preference is to keep
rendering and window management to the main thread, but handle input on a
secondary thread. So far, this has proven the best method to maintain
responsiveness without impacting compatibility.

Regarding D3D... I prefer to use ANGLE to get OpenGL ES 2.0 on systems
without proper OpenGL support. This way, I only need to maintain two
renderers: OpenGL everywhere and OpenGL ES for smartphones and (Windows &
~(Nvidia | AMD)).

This way, I can also use shaders across the board. ANGLE works all the way
down to GMA 950 (and probably GMA 500/Poulsbo, although I haven't tested
that), so there's very little reason to use the fixed-function pipeline.
Microsoft recently announced they will be working with Google to port ANGLE
on WinPhones and Metro, so D3D will be strictly unnecessary going forward -
as an indie developer, this suits me perfectly.
slimshader
2014-01-23 12:46:46 UTC
Permalink
Post by Stefanos A.
After trying several threading strategies, my current preference is to keep rendering and window management to the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.
But how does that help? If the main thread is blocked, then you don't refresh your screen to show the impact of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with second-thread event handling you avoid the "busy" system cursor / the window appearing to hang?
Post by Stefanos A.
Regarding D3D... I prefer to use ANGLE to get OpenGL ES 2.0 on systems without proper OpenGL support. This way, I only need to maintain two renderers: OpenGL everywhere and OpenGL ES for smartphones and (Windows & ~(Nvidia | AMD)).
This way, I can also use shaders across the board. ANGLE works all the way down to GMA 950 (and probably GMA 500/Poulsbo, although I haven't tested that), so there's very little reason to use the fixed-function pipeline. Microsoft recently announced they will be working with Google to port ANGLE on WinPhones and Metro, so D3D will be strictly unnecessary going forward - as an indie developer, this suits me perfectly.
 
I had ANGLE on my radar, but now you really have me interested in it. I only care about two platforms, Win and iOS, which means I would only need to maintain the GL ES renderer. That would be great. Definitely going to look into it.
Stefanos A.
2014-01-23 14:27:08 UTC
Permalink
Post by Stefanos A.
After trying several threading strategies, my current preference is to
keep rendering and window management to the main thread, but handle input
on a secondary thread. So far, this has proven the best method to maintain
responsiveness without impacting compatibility.
Post by slimshader
But how that does help? If main thread is blocked, then you don't refresh
your screen to show the impact of processed events. In my experience event
handling is tiny fraction of a frame. Do you mean that with 2nd thread
event handling you avoid "busy" system cursor / window appearing to hang?
The point is not to improve performance but to minimize latency between the
user pressing a button and the world reacting to that button press.

If you handle input in your rendering thread, then any dip in the framerate
will increase input latency, which can be jarring (esp. on slower systems
that cannot maintain a stable framerate.) By spawning a separate thread for
input, the OS scheduler will "smoothen out" input latency even when your
framerate dips below 10 fps.

Of course, this only helps if your world update rate is decoupled from your
framerate. In my case, I will skip up to 12 frames in order to guarantee a
pseudo-fixed update rate. In other words, I prioritize world updates (60
updates/sec no matter what) and only render frames as a best-effort.

This way, if the player presses the "fire" trigger then she will shoot the
enemy immediately even if she is running at 5 fps.

If the input was handled in the same thread, then the "fire" button would
take up to 200ms to register - or it would be skipped completely, if the
player lifted her finger before the 200ms mark. This would place the player
at a severe disadvantage (hi, Diablo 3!)
slimshader
2014-01-23 17:31:25 UTC
Permalink
Clean now :)

A question: I just tried to build a minimal GL ES2 app under Win, but I am getting unresolved externals for glClear, glClearColor, and two others in the sdl_main function. So it clearly wants to link against full GL. I assume you use SDL2 with ANGLE?
Stefanos A.
2014-01-23 17:46:51 UTC
Permalink
I am using ANGLE with and without SDL2, but I'm using C#/OpenTK which loads
both libraries dynamically - so I cannot really help you on these errors,
sorry. ("Dynamically" in this case means using LoadLibrary +
GetProcAddress("eglGetProcAddress") and then using eglGetProcAddress to
load the rest of the entry points.)

IIRC, I had to compile SDL2 from hg in order to get ANGLE working, but it
was otherwise straightforward.
Jonathan Greig
2014-01-23 19:32:02 UTC
Permalink
Post by slimshader
I might be missing something here but how do you even implement a list
without tail pointer? You always keep at least one end otherwise it would
be inaccessible. In any case, node-based lists suck :P
slimshader,
While I normally don't reference Wikipedia for programming matters, look at
the tradeoff section ( http://en.wikipedia.org/wiki/Linked_list#Tradeoffs )
and particularly the chart where the last element is known. I was mostly
clarifying Nathan's comment that keeping track of the last element _will be
faster_ rather than _may be faster_ in case others wish to take his
approach. When he mentioned tail pointer, he meant keeping track of the
last element. Ultimately, the best choice of container is suited to the
particular problem and as Nathan mentioned, a singly-linked-list is easy to
implement. In some cases you may not have the luxury of using a standard
library and rolling your own is the only choice.
Nathaniel J Fries
2014-01-24 00:25:41 UTC
Permalink
Post by slimshader
I hear you, but I am in the niche of older machines, in fact I am having trouble even getting users to get OpenGL 1.4 work correctly (for VBOs (in fact I think it is about time I implement D3D renderer)), also mobile devices are not what you'd consider 2010s "gaming rigs". That being said, secondary thread is (as I said before) not used to speed-up level loading but rather to keep main (event processing, rendering) thread responsive.
I do use task queues (double-buffered) but they are per-thread. Since cross-thread tasks are very seldom I don't want them having locks all the time.
BTW. Since we are on threading / GL topic: do you guys render from main thread? What are your update vs render step strategies? If you do them on separate threads how do you sync later (condition vars seem obvious choice)?
It may still make sense to avoid the second thread by making tasks shorter. Use non-blocking or even asynchronous I/O, or memory mapping (all three options require writing system-specific code), and you will find that resource-loading tasks no longer impede the render and event-processing tasks (since uploads to the GPU already block other API calls on most implementations anyway). Blocking file reads are probably the bottleneck pushing you in the direction you're going.
Post by Jonathan Greig
I was mostly clarifying Nathan's comment that keeping track of the last element _will be faster_ rather than _may be faster_ in case others wish to take his approach.
This is true 99% of the time, but not 100% of the time. Maintaining a tail pointer is occasionally not worth the added complexity, and I have seen situations in which removing it was actually an optimization. O(1) and O(n) tail insertion are equal when there is only one element, and for very short lists the O(n) code may actually perform better. This is why I disagree with professors and professionals alike treating big-O notation as the silver bullet of algorithms. It is a useful tool, but there is never a silver bullet.

------------------------
Nate Fries
