Here is the summary of the text below: thread pools beat creating a new QThread for every work request. That is hardly surprising, but now I have numbers to back it up. You can download the benchmark code I wrote from my website.
I have spent the last couple of months working with heavily multithreaded Qt code. Fun, yes, but also challenging. I have always had three rules about multithreading: 1: Don’t do it! 2: Don’t do it! 3: Don’t do it yet. Threaded code is harder to build and debug, and with event-driven applications you rarely need it.
But in some cases multithreading is necessary. One example was my last customer. They had an application with threads talking to hardware. Talking to hardware is a perfectly good example of something that fits well in a separate thread.
When the hardware did something, the application generated a work request. Each request was an instance of a QThread subclass: they called start() on it, let it do its thing, and deleted the thread on completion. Not an unreasonable approach, but as they were creating around 10 threads per second, I did not like it. This is a textbook example of what QThreadPool is for.
I changed some of the work requests to use QThreadPool instead. The modification to the code is minimal: instead of subclassing QThread, the work request subclasses QRunnable, and QThreadPool runs the runnable and deletes it afterwards instead of the thread handling that itself.
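To make the idea of a pool concrete, here is a minimal sketch in standard C++ of what any thread pool does: a fixed set of worker threads is created once and then reused for every queued task. This is not how QThreadPool is implemented. SimplePool, runTasks and PoolRun are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool, for illustration only (not QThreadPool).
class SimplePool {
public:
    explicit SimplePool(std::size_t workerCount) {
        assert(workerCount > 0);
        // The threads are created once, here, and reused for every task.
        for (std::size_t i = 0; i < workerCount; ++i)
            mWorkers.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mStopping = true;
        }
        mWakeUp.notify_all();
        for (std::thread& worker : mWorkers)
            worker.join();
    }

    // Queue a task; an idle worker picks it up.
    void start(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mTasks.push(std::move(task));
        }
        mWakeUp.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mWakeUp.wait(lock, [this] { return mStopping || !mTasks.empty(); });
                if (mTasks.empty())
                    return; // stopping, and nothing left to do
                task = std::move(mTasks.front());
                mTasks.pop();
            }
            task(); // run the task outside the lock
        }
    }

    std::mutex mMutex;
    std::condition_variable mWakeUp;
    std::queue<std::function<void()>> mTasks;
    bool mStopping = false;
    std::vector<std::thread> mWorkers;
};

struct PoolRun {
    int tasksRun;
    std::size_t threadsUsed;
};

// Runs taskCount trivial tasks on poolSize workers and reports how many
// tasks ran and how many distinct threads executed them.
PoolRun runTasks(std::size_t poolSize, int taskCount) {
    std::mutex resultMutex;
    std::set<std::thread::id> ids;
    int tasksRun = 0;
    {
        SimplePool pool(poolSize);
        for (int i = 0; i < taskCount; ++i) {
            pool.start([&resultMutex, &ids, &tasksRun] {
                std::lock_guard<std::mutex> lock(resultMutex);
                ids.insert(std::this_thread::get_id());
                ++tasksRun;
            });
        }
    } // the pool destructor drains the queue and joins the workers
    return PoolRun{tasksRun, ids.size()};
}
```

Calling runTasks(4, 100) runs all 100 tasks on at most 4 threads, whereas the thread-per-request approach would have created and destroyed 100 threads for the same work.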
First results were surprising: the time per work request went up instead of down. I told them not to worry, because the time of an individual request is not what matters most; the effect on the whole system is more important. In some use cases this is not true, but the new per-request time was still well within their requirements.
So the question remains whether I was right. Does a thread pool implementation use less total CPU time than a QThread-based one? To answer this, I wrote a small benchmark. The code is very simple, but it produces a pretty heavy load when run. As mentioned above, you can download it here.
Each work request makes a list of 1000 integers and then sorts them. It is a simple piece of work, and not something the compiler can optimize away. My QThread and QRunnable subclasses simply call this function from their run() methods.
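#include <QList>
#include <QtAlgorithms>
static QList<int> doWork() {
    QList<int> numbers;
    // Fill the list with some numbers
    const int numberSize = 1000;
    int currentNumber = 0;
    for (int i = 0; i < numberSize; ++i) {
        // Step by 367 and wrap into the range 0..1023
        currentNumber = (currentNumber + 367) & 1023;
        numbers << currentNumber;
    }
    // qSort() is obsolete in newer Qt versions, where
    // std::sort(numbers.begin(), numbers.end()) replaces it
    qSort(numbers);
    return numbers;
}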
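If you want to check that this input really gives the sort something to do, here is a small sketch that ports the fill loop to standard C++ (std::vector instead of QList, so it builds without Qt). The helper names makeNumbers, distinctCount and alreadySorted are made up for this example. Because 367 is odd and 1024 is a power of two, the sequence cannot repeat within 1000 steps, so the list holds 1000 distinct, unsorted values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Standard C++ port of the fill loop in doWork(), without the sort.
std::vector<int> makeNumbers(int count) {
    assert(count >= 0);
    std::vector<int> numbers;
    numbers.reserve(static_cast<std::size_t>(count));
    int currentNumber = 0;
    for (int i = 0; i < count; ++i) {
        currentNumber = (currentNumber + 367) & 1023;
        numbers.push_back(currentNumber);
    }
    return numbers;
}

// Number of distinct values in the list.
std::size_t distinctCount(const std::vector<int>& numbers) {
    return std::set<int>(numbers.begin(), numbers.end()).size();
}

// True if the list is already in ascending order.
bool alreadySorted(const std::vector<int>& numbers) {
    return std::is_sorted(numbers.begin(), numbers.end());
}
```

The first three values are 367, 734 and 77, so the list is out of order from the third element on.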
Each test runs a number of iterations. On each iteration it creates a number of worker threads and then sleeps for a bit, giving the workers some time to do their work. All the numbers below come from runs that created 25 worker threads per iteration and slept for 10 ms between iterations. At the end, the QThread implementation has to delete all the thread instances itself, whereas QThreadPool deletes its runnables automatically.
This is the loop that creates the worker threads and executes them. After this code, it deletes the threads and waits for all of them to finish.
void ThreadStarter::run() {
    QList<QThread*> threads;
    threads.reserve(mIterations * mThreadsPerIteration);
    for (int i = 0; i < mIterations; ++i) {
        for (int t = 0; t < mThreadsPerIteration; ++t) {
            if (mType == Thread) {
                WorkThread* thread = new WorkThread;
                thread->start();
                threads << thread;
            } else {
                WorkRunnable* runnable = new WorkRunnable;
                QThreadPool::globalInstance()->start(runnable);
                // Mimic the same work as with threads
                threads << 0;
            }
        }
        msleep(mSleepTime);
    }
    ...
I ran this with a growing number of iterations. The table below shows the number of iterations, the run time of the QThread-based implementation, and the run time of the thread pool implementation, both in milliseconds:
iterations   thread (ms)   runnable (ms)
10           107           102
20           215           206
30           322           311
40           429           407
50           546           507
60           651           610
70           765           713
80           860           812
90           966           912
100          1084          1022
110          1181          1119
120          1289          1218
130          1404          1318
140          1505          1420
150          1624          1521
160          1722          1622
170          1842          1728
180          1944          1823
190          2043          1924
As you can see, the conclusion is clear: in every case, the thread pool implementation is faster than the QThread implementation, and the gap grows from 5 ms at 10 iterations to 119 ms at 190. So even though each individual work request could finish faster with the thread implementation, the total system time is higher. This matters when the machine gets heavily loaded (as it did for my customer), because then you need the threads to take up as little total time as possible.
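One way to read these numbers is to subtract the time the benchmark spends sleeping. The sketch below assumes that the reported times are wall-clock times that include the 10 ms sleep on every iteration, and that each sleep lasts exactly 10 ms; neither is confirmed by the listing above, and a real msleep() may oversleep a little. The function names overheadMs and overheadPerIterationMs are made up for this example.

```cpp
#include <cassert>

// Time spent beyond the sleeping itself, in milliseconds.
// sleepMs = 10 matches the 10 ms pause per iteration described in the text.
// Assumption: measuredMs is wall-clock time that includes the sleeps.
int overheadMs(int iterations, int measuredMs, int sleepMs = 10) {
    assert(iterations > 0);
    return measuredMs - iterations * sleepMs;
}

// The same overhead per iteration (each iteration starts 25 work requests).
double overheadPerIterationMs(int iterations, int measuredMs, int sleepMs = 10) {
    return static_cast<double>(overheadMs(iterations, measuredMs, sleepMs)) / iterations;
}
```

Under those assumptions, the 100-iteration row gives overheadMs(100, 1084) = 84 ms for the QThread version against overheadMs(100, 1022) = 22 ms for the pool, and the 190-iteration row gives 143 ms against 24 ms. Read this way, the difference between the two approaches is considerably larger than the raw totals suggest.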
I have run this with other thread counts and sleep times, and on different machines. The conclusion has always been the same: not a single time has the QThread approach been the fastest.
There is another benefit to the thread pool approach that has nothing to do with speed: deleting the workers. It is not trivial to figure out how to delete a thread object once it is done. Of course, connecting finished() to deleteLater() usually does the trick, unless you do something like thread->moveToThread(thread) and call exec() in the run() method. With the thread pool approach it is trivial, since the pool by default deletes the runnable when it is done.
So, my recommendation stands: Unless you have a really good reason not to, you should use a thread pool for worker threads.