For Quake 3, where is the best place to begin threading? The answer starts with profiling, because we need a good understanding of the application’s behavior before we can decide which type of parallelism will work best for a given subsystem.
Sampling the application with the Intel VTune Performance Analyzer provides a detailed breakdown of the CPU time spent in each function and module. This data helps identify which portions of the engine would benefit most from running on multiple threads in parallel.
In addition, the VTune Performance Analyzer can generate a call graph of the application’s workload, giving you the self time and total time at each node of the call hierarchy. Together with the sampling data, this information helps you determine the best place along the call tree to do a data decomposition.
If you are interested in a functional decomposition, the most natural approach is to move the renderer onto a separate thread. This involves moving all of the Quake GL calls onto a single thread, because graphics drivers are generally not thread-safe and cannot reliably handle calls arriving from different threads. With this split, the front end prepares a frame while the back end renders it on a separate thread.
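The front end / back end split described above can be sketched roughly as follows. This is a minimal illustration in standard C++11, not the engine's actual code: `Frame`, `FrameSlot`, `issue_gl_calls`, and `run_frames` are hypothetical names, and the "GL calls" are simulated by recording frame numbers. The sketch assumes a single-slot hand-off, so the front end can prepare at most one frame ahead of the back end.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the draw commands the front end builds per frame.
struct Frame {
    int number;
};

// Hypothetical single-slot hand-off between front end and back end.
class FrameSlot {
public:
    // Front end: wait until the slot is free, then publish a prepared frame.
    void submit(const Frame& f) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !has_frame_; });
        frame_ = f;
        has_frame_ = true;
        changed_.notify_all();
    }

    // Front end: announce that no further frames will be submitted.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        changed_.notify_all();
    }

    // Back end: wait for a frame. Returns false once the slot is closed
    // and every submitted frame has been taken.
    bool take(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return has_frame_ || closed_; });
        if (!has_frame_) return false;
        out = frame_;
        has_frame_ = false;
        changed_.notify_all();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    Frame frame_{0};
    bool has_frame_ = false;
    bool closed_ = false;
};

// Simulated rendering: a real back end would issue its GL calls here.
// The assert documents the rule that only the render thread may do so.
void issue_gl_calls(const Frame& f, std::thread::id render_thread,
                    std::vector<int>& rendered) {
    assert(std::this_thread::get_id() == render_thread);
    rendered.push_back(f.number);
}

// Runs `count` frames through the pipeline and returns the frame numbers
// in the order the back end rendered them.
std::vector<int> run_frames(int count) {
    FrameSlot slot;
    std::vector<int> rendered;  // touched only by the back end until join()

    std::thread back_end([&slot, &rendered] {
        const std::thread::id self = std::this_thread::get_id();
        Frame f{0};
        while (slot.take(f)) {
            issue_gl_calls(f, self, rendered);
        }
    });

    // Front end: prepare each frame on the calling thread and hand it off.
    for (int i = 0; i < count; ++i) {
        slot.submit(Frame{i});
    }
    slot.close();
    back_end.join();
    return rendered;
}
```

Calling `run_frames(5)` pushes five frames through the pipeline in submission order. The single slot keeps the sketch short. A deeper queue would let the front end run further ahead, at the cost of extra latency and memory.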