Web Worker Tests

Moving my web worker performance test results into their own thread here, so I don’t clutter other discussion.

Update 10/21/2015: Don’t bother reading this comment, the test it describes is flawed and essentially deprecated. Instead, start here.


copied from Gitter:

All results are representative of ideal conditions.

Chrome worker creation: 30-50ms
Firefox worker creation: 10-20ms

Chrome postMessage() Main --> Worker (first message): ~0.275ms
Chrome postMessage() Main --> Worker (subsequent): ~0.08ms
Chrome postMessage() Worker --> Main (first message): ~0.24ms
Chrome postMessage() Worker --> Main (subsequent): ~0.08ms

Firefox postMessage() Main --> Worker (first message): ~0.06ms
Firefox postMessage() Main --> Worker (subsequent): ~0.04ms
Firefox postMessage() Worker --> Main (first message): ~0.1ms
Firefox postMessage() Worker --> Main (subsequent): ~0.07ms

It seems Chrome workers take longer to spin up and its first messages are roughly 3x slower than subsequent ones, but overall these are nice numbers. If we assume the average postMessage() takes 0.1ms, that should allow for a liberal amount of communication between the main thread and workers inside the time span of a single frame.

Obviously these are ideal conditions, and they don’t include heavy deserialization or cloning operations.

@oldschooljarvis awesome work… I’m going to start doing performance tests later tonight as well, and I’ll post in this thread!

Update 10/21/2015: Don’t bother reading this comment, the test it describes is flawed and essentially deprecated. Instead, start here.


@andrewreedy Thanks! Looking forward to it.

So, I’ve just made a new test, and the results of this one are quite odd, if not depressing.

Test source code:

Test itself (hosted)
http://project42.xyz/webworkerperftests/test04/

Basically, it fires off a message to the worker; upon receipt, the worker fires a message back, and once the main thread receives the reply, the cycle repeats indefinitely, as fast as it can. A RAF loop then measures how many messages are being sent per frame and dumps the result to the console, once per frame.
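For readers without access to the hosted source, here is a minimal sketch of the ping-pong counting idea described above. It is not the actual Test04 code: `createPingPongCounter`, the `null` payload, and the `echo-worker.js` file name are all assumptions made for illustration.

```javascript
// Sketch only: `port` is anything with postMessage() and an onmessage slot
// (a real Worker in the browser, or a fake object when testing).
function createPingPongCounter(port) {
    let sentThisFrame = 0;

    // Each reply immediately triggers the next send, so the loop runs flat out.
    port.onmessage = function () {
        sentThisFrame++;
        port.postMessage(null);
    };

    return {
        // Kick off the cycle with the first message.
        start: function () {
            sentThisFrame++;
            port.postMessage(null);
        },
        // Call once per frame to read the count and reset it.
        takeFrameCount: function () {
            const count = sentThisFrame;
            sentThisFrame = 0;
            return count;
        }
    };
}

// Browser usage, assuming a hypothetical worker that echoes every message:
// const counter = createPingPongCounter(new Worker('echo-worker.js'));
// counter.start();
// (function rafLoop() {
//     console.log(counter.takeFrameCount());
//     window.requestAnimationFrame(rafLoop);
// })();
```

Because every send waits for the previous reply, this measures round-trip latency rather than raw throughput.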

On my system, the results vary greatly depending on the browser.

  • Chrome [ Windows v45.0.2454.85 m (64-bit) ] will average anywhere from 900-1200 messages per frame, smoothly.
  • Firefox [ Windows v40.0.3 ] seems to bottleneck or choke. It’s easier to see it in action than to explain it, but basically every third frame is a proper number (i.e. 900-1200), the subsequent frame is half that, and then it completely chokes, reporting 2 messages sent (sometimes for a few frames), and then the cycle repeats.
  • Internet Explorer [ v11.0.22 ] doesn’t choke, but averages about 120 messages per frame, or roughly 10% of what Chrome does.

I’m not sure what’s going on. Any thoughts?

I guess there’s some latency in the communication (to send and wait for a reply before sending again); possibly the browser is also doing some buffering (just a guess). What would the results be like if we:

  1. Send max requests to the worker, consecutively
  2. Wait for max replies to arrive
  3. If replyCount < THRESHOLD, increase the size of max on next loop to find the sweet spot
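One possible reading of the three steps above, as a rough sketch. It treats THRESHOLD as a time budget in milliseconds, which is an assumption; `nextBatchSize` and the doubling step are made up for illustration.

```javascript
// Hypothetical helper: given how long a full round of `max` requests and
// replies took, decide how many requests to send in the next round.
function nextBatchSize(max, elapsedMs, thresholdMs) {
    if (elapsedMs < thresholdMs) {
        // Still under budget: try a bigger batch next time.
        return max * 2;
    }
    // Budget reached or exceeded: treat the current size as the sweet spot.
    return max;
}

// Usage idea: time each round with performance.now(), then call
//   max = nextBatchSize(max, roundEnd - roundStart, 10);
```

The growth step (doubling) is arbitrary; a smaller increment would find a tighter sweet spot at the cost of more rounds.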

Re THRESHOLD, don’t forget that after all results are in, the renderer layer still needs to update the DOM (with some ops being relatively expensive), repaint, and other misc admin. The common suggestion is 10ms :smile:

With that, going back to the first post: if we say 0.16ms per exchange (for two messages), that’s actually terrible, since 10ms / 0.16ms = just 62.5 exchanges in ideal conditions! Can you post the code from that check too? Are you just recording the time directly before and after the post message, or do you include the time to receive the event?
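The arithmetic above, spelled out as a tiny sketch. The function name is made up, and the 60fps frame length is an assumption added for comparison.

```javascript
// How many round trips fit into a time budget, given the cost of one round trip.
function exchangesPerBudget(budgetMs, roundTripMs) {
    return budgetMs / roundTripMs;
}

// 10ms budget, 0.16ms per round trip (two messages).
console.log(exchangesPerBudget(10, 0.16)); // about 62.5

// A whole 60fps frame (~16.7ms) with nothing else running on the UI thread.
console.log(Math.floor(exchangesPerBudget(1000 / 60, 0.16))); // 104
```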

Update 10/21/2015: Don’t bother reading this comment, the test it describes is flawed and essentially deprecated. Instead, start here.


@gadicc Not sure how we’ll end up handling it and/or designing the parallelism architecture (if at all).

[quote=“gadicc, post:5, topic:21”]Can you post the code from that check too? Are you just recording the time directly before and after the post message, or do you include the time to receive the event?
[/quote]

I made a new test that’s similar.

Source:

Live (hosted):
http://project42.xyz/webworkerperftests/test05/

This one just sends one message per frame to a worker, and then back, and logs the DOMHighResTimeStamp difference, courtesy of performance.now(), once per frame.

However, this isn’t pretty either, and actually revealed something new:

  • Chrome [ Windows v45.0.2454.85 m (64-bit) ] looks beautiful.
  • Firefox [ Windows v40.0.3 ] has terrible latency issues, with the UI thread stalling for as long as 10ms prior to receipt every few frames. The shocking part is that we’re only sending two messages per frame.
  • Internet Explorer [ v11.0.22 ] doesn’t support performance.now() inside a worker context. I suppose I should write it its own test or something.

All tests thus far have been on Windows 8.1, with Firefox (32-bit) and Chrome (64-bit).

After wondering whether some of these issues were on my end, or a result of 64-bit Chrome performance, I decided to install Chrome 32-bit on my Linux virtual machine and compare it against Firefox 32-bit on the VM. The behavior was the same as on desktop, just with scaled-down numbers due to less available CPU power.

Update:

I was able to fix the awful Firefox performance in Test05 via the following:

function rafLoop() {
    //Send a message, and receive a reply (once per-frame).
    setTimeout(function() {
        t = performance.now();
        worker1.postMessage(t);
    }, 0);

    window.requestAnimationFrame(rafLoop);
}

I just wrapped the UI thread message dispatch in a setTimeout() block. Apparently, sending a new message would continually bump the receipt of the previous one. The setTimeout() block lowers the priority of sending a new message, allowing the browser’s event queue to process the onmessage listener first.

That said, Firefox’s performance is still crap compared to Chrome. Worker → UI messages are ~0.5ms, whereas Chrome is uniformly below 0.01ms in either direction. However, I can’t claim to understand the full range of optimization techniques for FF yet, either. The fact that Chrome doesn’t require optimization speaks for itself, though.

Here’s a live version of the fix: http://project42.xyz/webworkerperftests/test05-optimized/

That is really discouraging @oldschooljarvis. Would you say we should put off web workers for a while then?

@talves I still have more tests to do. Up next is probably 2x UI threads + 3-5 workers. After that, seeing what performance on inter-thread copy operations looks like.

Despite these numbers being somewhat discouraging, there are many strategies we can use to mitigate them. Foremost is probably batching engine events before sending them across the pipe to another thread, then seamlessly decomposing the batched events from a single Message event on the receiving side.
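A minimal sketch of what such batching could look like. Nothing here is decided engine design: `createBatcher`, `unbatch`, and the `{ batch: [...] }` message shape are assumptions for illustration only.

```javascript
// Sketch only: `port` is anything with postMessage() (e.g. a Worker).
function createBatcher(port) {
    let queue = [];
    return {
        // Queue an event instead of posting it immediately.
        push: function (event) {
            queue.push(event);
        },
        // Send everything queued so far as ONE message (e.g. once per frame).
        flush: function () {
            if (queue.length === 0) {
                return;
            }
            port.postMessage({ batch: queue });
            queue = [];
        }
    };
}

// Receiving side: decompose one MessageEvent back into individual events.
function unbatch(messageEvent, handleEvent) {
    messageEvent.data.batch.forEach(function (event) {
        handleEvent(event);
    });
}
```

This trades a little latency (events wait for the next flush) for far fewer postMessage() calls per frame.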

I’ve slightly altered the test to lower the verbosity and measure the roundtrip instead. I’m getting the following numbers (Firefox outperforming Chrome!):

Chrome (Mac 45.0.2454.85): 0.21ms
Firefox (Mac 39.0.3): 0.17ms

Note: this is a roundtrip from UI --> Worker --> UI

index.js:

var t = performance.now(),
    samples = 60,
    i = 0,
    n = 0;

console.log('UI load timestamp: ' + t);

var worker1 = new Worker('worker.js');

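worker1.onmessage = function (e) {
    // Measure the full UI --> Worker --> UI roundtrip for this message.
    var roundtrip = performance.now() - e.data.ui;
    i++;
    n += roundtrip;
    // Every `samples` messages, log the latest roundtrip and the average
    // over the last `samples` messages, then reset the accumulator.
    if (i % samples === 0) {
        console.log('UI --> Worker --> UI: ', roundtrip, '~', n / samples);
        n = 0;
    }
};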

var rafLoop = function () {
    //Send a message, and receive a reply (once per-frame).
    setTimeout(function () {
        var t = performance.now();
        worker1.postMessage({ui: t});
    }, 0);

    window.requestAnimationFrame(rafLoop);
};

rafLoop();

worker.js:

onmessage = function (e) {
    // Stamp the message with the worker's receive time and echo it back.
    e.data.worker = performance.now();
    postMessage(e.data);
};

update:

Using a hacked-together polyfill worker class in the main UI thread gives me the following roundtrip data:
Note: this is without actual workers, just a fake polyfill one.

chrome: 0.013ms
firefox: 0.002ms

making the overhead of real workers roughly 0.2ms (Chrome) and 0.17ms (Firefox)

Firefox however will drop to 0.0005ms after a period of 15 seconds… which feels really weird to me. The other way around would make more sense probably.

Next test: doing some actual work inside the workers to see how that influences the behaviours.

update 2:

Adding heavy computation inside the workers just adds to the overall overhead, as expected.

One thing that I took out of this test is that it’s fairly easy to create a polyfill Worker, allowing the user to maybe determine whether to use real workers, or route everything through the polyfill worker for ‘normal’ behaviour.
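To make the idea concrete, here is a rough sketch of such a polyfill. It is not the hacked-together class used for the numbers above: `FakeWorker`, `createWorker`, the `workerBody` function, and the optional `schedule` parameter are all assumptions for illustration.

```javascript
// Sketch only: a same-thread stand-in exposing the small Worker surface used in
// this thread (postMessage + onmessage). `workerBody` is a hypothetical function
// playing the role of worker.js; `schedule` defaults to setTimeout(fn, 0).
function FakeWorker(workerBody, schedule) {
    const self = this;
    const defer = schedule || function (fn) { setTimeout(fn, 0); };
    this.onmessage = null;

    // The "worker side": it can set onmessage and call postMessage.
    const scope = {
        onmessage: null,
        postMessage: function (data) {
            defer(function () {
                if (self.onmessage) { self.onmessage({ data: data }); }
            });
        }
    };

    // The "UI side": deliver the message to the worker body asynchronously.
    this.postMessage = function (data) {
        defer(function () {
            if (scope.onmessage) { scope.onmessage({ data: data }); }
        });
    };

    workerBody(scope);
}

// Use a real Worker when the browser has one, otherwise the stand-in.
function createWorker(url, workerBody) {
    if (typeof Worker !== 'undefined') {
        return new Worker(url);
    }
    return new FakeWorker(workerBody);
}
```

Note that, unlike a real postMessage(), this sketch passes data by reference rather than cloning it, so the two sides can observe each other’s mutations.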

Using the polyfill in a browser without real workers would add more overhead to the UI thread than without the polyfill. We should use only real workers when supported, and the library should also work without workers (this can be configurable, and by default it tries to use workers if the browser supports them).

Do we even want to support a browser that doesn’t have workers (<IE10)?

In my opinion, we do not want to be supporting any browser that does not support it. Was it you @joe that said we are building for the future? I cannot remember who said it. :smile:

Someone can always fork or shim to support the old browsers if they need to.

Yes, yes it was! :smile:

Hey @dieserikveelo, sorry for the late reply - haven’t had much time the last couple days.

Nice test. While I had my suspicions that Firefox’s console.log() method was causing the terrible performance in my last test (Test05), your test basically confirmed this was the case.

In the name of purity (and science!), I wrote a new test to do effectively the same thing as Test05, namely to send and receive a message each frame, recording the results:

Source:

Live (hosted):
http://project42.xyz/webworkerperftests/test06/

The difference is that this test no longer uses console.log() at all while it’s running the test, and it logs the results to a pair of pre-initialized arrays. After the test is done, it computes the results and dumps to console. While the code is an utter mess, I don’t think it’s possible to get the test itself any leaner as far as performance is concerned.
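A minimal sketch of the "record now, report later" approach described above. It is not the Test06 source: it uses a single pre-allocated array of durations rather than a pair of arrays, and `createRecorder` is a made-up name.

```javascript
// Sketch only: store samples in memory allocated up front, and do all the
// arithmetic and logging after the measurement run has finished.
function createRecorder(capacity) {
    const samples = new Float64Array(capacity); // allocated before the test starts
    let count = 0;
    return {
        // Cheap enough to call inside onmessage; extra samples are dropped.
        record: function (ms) {
            if (count < capacity) {
                samples[count++] = ms;
            }
        },
        // Call once the test is over, then log the result.
        summarize: function () {
            let sum = 0;
            let min = Infinity;
            let max = -Infinity;
            for (let k = 0; k < count; k++) {
                sum += samples[k];
                if (samples[k] < min) { min = samples[k]; }
                if (samples[k] > max) { max = samples[k]; }
            }
            return { count: count, mean: count ? sum / count : NaN, min: min, max: max };
        }
    };
}

// Usage idea: recorder.record(performance.now() - e.data.ui) per message,
// then console.log(recorder.summarize()) once at the end.
```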

The setTimeout(fn, 0) hack is no longer needed, though I suspect it’ll still be useful in Firefox for times when the UI thread is busy.

Also, keep in mind this is still just two messages per frame. Next I’ll probably adapt Test04 (maximum message volume) to this format, and see how Firefox copes. Right now, Firefox edges out Chrome for me. I have a feeling Chrome will prevail in real-world scenarios, because in those the UI thread is almost always busy. We’ll see.

@trusktr

Using the polyfill in a browser without real workers would add more overhead to the UI thread than without the polyfill.

I’m not sure this would be the case. The engine is almost assuredly going to run on events anyways. Either way, I think we’ll probably arrive at a nice solution that handles both cases elegantly and in a manner that achieves the highest performance reasonably possible.

@trusktr re: IE10

As far as I’m concerned, IE10 can go see itself into the dumpster.

Since I’ve been slacking on these lately, I figured it was time for a new test.

This one tests inter-worker communication. Here’s what it does in order:

  1. Spin up a couple of workers.
  2. Create a MessageChannel.
  3. Send port1 of the MessageChannel to Worker1, and port2 to Worker2.
  4. Block the UI thread (~20 seconds on my hardware).
  5. Have Worker1 and Worker2 send each other messages one second apart, recording send and receive times.
  6. Unblock the UI thread.
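Steps 2 and 3 above could look roughly like the following sketch. It is not the Test08 source: `connectWorkers`, the `{ port: ... }` message shape, and the optional `ChannelCtor` parameter (there only so the wiring can be exercised without a browser) are assumptions.

```javascript
// Sketch only: `worker1`/`worker2` are anything with postMessage(message, transferList),
// such as real Worker instances.
function connectWorkers(worker1, worker2, ChannelCtor) {
    const channel = new (ChannelCtor || MessageChannel)();
    // A port has to be listed in the transfer list to hand it over to the worker.
    worker1.postMessage({ port: channel.port1 }, [channel.port1]);
    worker2.postMessage({ port: channel.port2 }, [channel.port2]);
}

// Inside each worker (hypothetical worker script):
// onmessage = function (e) {
//     var port = e.data.port;
//     port.onmessage = function (msg) {
//         console.log('Message received @ ' + performance.now() + ', contents are: ' + msg.data);
//     };
//     port.postMessage('Hello from this worker! @ ' + performance.now());
// };
```

Once both ports are handed over, the workers hold a direct channel to each other, and the UI thread no longer appears in the message path in script terms.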

Note that since we invoke console.log() from workers while the UI thread is blocked, we only expect console output once it unblocks. This is normal; console.log() is a UI thread function, workers just proxy it.

Source:

Live (hosted):
http://project42.xyz/webworkerperftests/test08/ - WARNING: This may freeze your browser, as it intentionally blocks the UI thread!

Here’s my Firefox (Windows) output:

UI thread start.
Worker2 start @ 88.71000000000001
Worker1 start @ 88.71000000000001
Begin blocking UI thread @ 5052.6900000000005
End blocking UI thread @ 22589.475000000002
Message received by Worker2 @ 6087.85, contents are: Hello from Worker1! @ 6087.695
Message received by Worker1 @ 7087.88, contents are: Hello from Worker2! @ 7087.735000000001

and Chrome 64-bit (Windows):

UI thread start.
Worker1 start @ 161.08
Worker2 start @ 164.67000000000002
Begin blocking UI thread @ 5084.385
End blocking UI thread @ 27932.81
Message received by Worker1 @ 27933.305000000004, contents are: Hello from Worker2! @ 7171.8550000000005
Message received by Worker2 @ 27933.305000000004, contents are: Hello from Worker1! @ 6168.145

What this shows is that in Chrome, workers can’t talk to each other if the UI thread is blocked, at least not via the MessageChannel API. :frowning:

Both Firefox and Chrome dispatch messages just fine while UI is blocked, but only Firefox workers receive messages during this time. Chrome is unable to receive messages in its workers until the UI thread is no longer blocked.

Update
Just ran a modified test for IE11 - Date.now() instead of performance.now() - and IE11 behaves the same way Firefox does, with worker message receipt being unaffected by a blocked UI thread. Why Chrome, why? Why?!

Update 2
Related: https://code.google.com/p/chromium/issues/detail?id=443374

Oh wow, that’s something super important to know! I didn’t even think to test that :frowning:

Thanks for the linked issue. It’s probably worth mentioning this finding there too, relating it back to what they refer to in the last comment as “main thread working on behalf of worker thread”.

Inconveniently, we can work around this in our own code by dividing up our UI loops and breaking/resuming via setImmediate… we’ll lose some time though and of course are beholden to any other user code on the UI thread :confused:
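A rough sketch of that break/resume idea. `processInChunks` and its parameters are made up for illustration; since setImmediate is not available in every browser, the sketch falls back to setTimeout(fn, 0) when it is missing.

```javascript
// Sketch only: split a long UI-thread loop into chunks and yield to the event
// loop between chunks, so the thread is never blocked for one long stretch.
function processInChunks(items, handleItem, chunkSize, schedule, done) {
    const yieldToLoop = schedule ||
        (typeof setImmediate === 'function'
            ? setImmediate
            : function (fn) { setTimeout(fn, 0); });
    let index = 0;

    function runChunk() {
        const end = Math.min(index + chunkSize, items.length);
        while (index < end) {
            handleItem(items[index++]);
        }
        if (index < items.length) {
            yieldToLoop(runChunk); // resume after other queued events have run
        } else if (done) {
            done();
        }
    }

    runChunk();
}
```

The chunk size is the knob: smaller chunks keep the UI thread more responsive but add more scheduling overhead, which is the lost time mentioned above.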

Thanks for all these tests!

Hey, no need to thank me. We’re all in the same boat/team effort/etc. :sunglasses:

Anyways, yeah–this unfortunately means if we have the engine loop running on a worker, we’ll see no improvement with Chrome.

The good news is that the implementation is effectively the same and requires no changes.

That’s really limiting!

This isn’t a test so much as a finding:

When using SharedWorker in Chrome, if you have only one tab open that owns a SharedWorker instance, and you refresh the page, the SharedWorker will fail to function in the subsequent refresh. If you then refresh yet again, it’ll work fine.

If you close the tab manually, then open a new tab, it will work fine. So, clearly some sort of issue with SharedWorker destruction in Chrome.

In Firefox under an identical scenario, what happens is the SharedWorker remains alive and thus maintains its state. This isn’t exactly sound either, but at least it’s better behavior than Chrome.

IE11 and Safari simply don’t support SharedWorker. :’(

New test, same as Test08, but using SharedWorker instances instead of Worker instances.

Results are the same unfortunately. I was mainly hoping to find a way around the aforementioned Chrome issue. Still more things to try, though.

Source:

Live (hosted):
http://project42.xyz/webworkerperftests/test08-SharedWorker/ - WARNING: This may freeze your browser, as it intentionally blocks the UI thread!

Update

Possibly related:
https://code.google.com/p/chromium/issues/detail?id=327256

Blink/Chromium Worker Implementation Notes

Update 2
https://code.google.com/p/chromium/issues/detail?id=344814

:expressionless:
