Web Worker Tests

Just a brief note:

https://code.google.com/p/chromium/issues/detail?id=405956

This means that in situations where you want multiple SharedWorkers created from the same file, you have to actually copy said file multiple times and load each accordingly. Very lame.

https://code.google.com/p/chromium/issues/detail?id=334408

:expressionless:

Update

So after a bit of testing, I've confirmed this only applies to explicit MessagePort objects, as opposed to the implicit kind you get via the onmessage handler with regular workers. What this means in practice is that, if you're on Chrome and want to send an ArrayBuffer from Worker A to Worker B, you're going to have to route it through the UI thread. Despite the fact you're sending two messages, it's still dramatically faster than a single direct message that falls back to structured cloning.
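
The two-hop pattern can be sketched roughly like this. `relayTransferables` and the stub ports are hypothetical names; in the browser the ports would be the actual `Worker` instances:

```javascript
// Sketch of routing a transferable from Worker A to Worker B via the UI
// thread, since Chrome (at the time) can't transfer directly between
// workers. The stubs below stand in for real Worker objects so the sketch
// is self-contained outside a browser.
function relayTransferables(portA, portB) {
  // Anything Worker A posts gets immediately re-posted to Worker B,
  // moving (not copying) the underlying ArrayBuffer on both hops.
  portA.onmessage = function (e) {
    portB.postMessage(e.data, [e.data]); // second arg: transfer list
  };
}

// --- tiny stubs in place of real Workers ---
function makeStubPort() {
  return {
    onmessage: null,
    received: [],
    postMessage(data, transfer) { this.received.push({ data, transfer }); }
  };
}

const a = makeStubPort();
const b = makeStubPort();
relayTransferables(a, b);

const buf = new ArrayBuffer(8);
a.onmessage({ data: buf }); // simulate Worker A posting a buffer
```

Even with the extra hop, moving the buffer twice beats a single structured clone for large payloads, which matches the measurement described above.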

Update 2

This also means that in Chrome, if you have two UI threads, they won't be able to communicate with each other (or each other's workers) using transferables.

That is quite a setback! The web is evolving, and these will be necessary for creating the performant apps that we're imagining.

https://code.google.com/p/chromium/issues/detail?id=376637

Adding that to the list of Worker-related things broken in Chrome. Not a deal breaker, just very annoying.

Yeah, it's interesting, all these issues. It shows that the web is transforming into a platform to compete with native, but that it's not entirely there yet, although it's getting a lot closer compared to a few years ago. One thing is certain: I really like the direction it's going in. I guess all of us do since that's the reason we're discussing things in this forum anyway. :smile:

Well, my take is the majority of *Worker functionality was implemented 3-4 years ago, but only partially, and things have just kind of rotted since. I keep getting the impression the people actually working on the browser implementations resent those APIs and/or view them as bad specs, and I tend to agree; it's just that spotty implementations end up making the situation far worse.

Unfortunately it's all we have right now though, and it's going to be quite some time before SharedArrayBuffer lands in at least two or more major browsers. It's ironic, really, that constructs intended to insulate developers from the pitfalls of traditional parallel programming end up making the situation far worse.

This might be relevant to the discussion at hand?

Multithreaded Toolkits, a failed dream? (2004) [HN]

It's about how every major interface toolkit that tried to go multithreaded, i.e. allowing the threads (or workers) to update the UI, ended up walking back from that decision and settling on an event loop.

I read that yesterday actually.

For what it's worth, this work isn't specific to UIs. That article was also written over a decade ago, and there have been success stories since.

Of course, a couple months ago I was probably the most vocal opponent of using Workers due to the complexity cost. :laughing:

The only thing that really concerns me about the web worker stuff is that it very much feels like premature optimisation. It feels like it should be something that you only attempt once you actually have something that works.

I just remember this old mantra:

  1. make it work
  2. make it right
  3. make it fast

Just thinking out loud (let me know if you have any thoughts about anything):

The command queue will likely be on the UI thread. Some commands might be synchronous (f.e. handling of a DOM element) and some might be asynchronous (f.e. sending a request to a worker to calculate world transformation matrices or to a worker to calculate physics), and all can be queued using something like async.queue.
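
A minimal sketch of that kind of serial command queue, hand-rolled here instead of pulling in async.queue just to show the shape:

```javascript
// Minimal serial command queue in the spirit of async.queue: each task is
// a function taking a `done` callback; tasks run one at a time, so
// synchronous commands (DOM pokes) and asynchronous ones (worker
// round-trips) can share the same queue.
function makeCommandQueue() {
  const tasks = [];
  let running = false;

  function next() {
    const task = tasks.shift();
    if (!task) { running = false; return; }
    task(next); // the task calls `next` when finished (sync or async)
  }

  return {
    push(task) {
      tasks.push(task);
      if (!running) { running = true; next(); }
    }
  };
}

// Example: one synchronous and one "async" command, executed in order.
const order = [];
const queue = makeCommandQueue();
queue.push(done => { order.push('sync command'); done(); });
queue.push(done => setTimeout(() => { order.push('async command'); done(); }, 0));
```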

Hmm, but if many of the functions are asynchronous, then they can run in parallel. Perhaps the fastest-updating things will simply call back sooner, and those things render sooner (f.e. at 60fps while some other things are slower). There could be a result listener that simply applies whatever results were sent back within the frame. So it's possible that some animations will be slower and some faster, depending on when results come back from workers. If such a design is done properly, rendering could always be at 60fps, with only certain animations appearing slow depending on when their results arrive.

Can each node have its own worker? Perhaps they're a bunch of workers all connected together in the same structure as the scene graph? Would it make sense for a component's onUpdate to run in its Node's worker? If it needs state from another node for its calculation (f.e. the size or rotation of another node), it can request it asynchronously and eventually get the result back in order to complete its calculation. I'm imagining this might work really nicely once MessageChannels aren't coupled to the UI thread like they are now (as described in that Chromium bug you linked us to above @oldschooljarvis).

I think each Node worker in this design I'm imagining can obviously hold things like its local state and world state (local transforms, local opacity, etc., and world/final transforms, final opacity, etc.).

I'm imagining that in order to calculate the world transform of some Node in a tree of linked Workers, the mechanism that will do these calculations (be it in some separate worker for this purpose, or perhaps in a sidekick worker of the Node whose world transform we're calculating) will asynchronously query all Nodes in the path leading to the Node in question to get the local transforms of each node. The results might arrive in any order, but when they all finally arrive, the calculation can be completed and the resulting world transform saved back onto the Node in question.
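
A rough sketch of that gather-then-compose step, assuming a query function that asks each node's worker for its local transform. For brevity the "transforms" here are just 2D translations; a real version would multiply 4x4 matrices:

```javascript
// Pure composition step: fold the locals (in path order) into a world
// transform. Kept separate so it can run wherever the sidekick worker is.
function composeTransforms(locals) {
  return locals.reduce(
    (world, local) => ({ x: world.x + local.x, y: world.y + local.y }),
    { x: 0, y: 0 }
  );
}

// One async query per node on the path; Promise.all reassembles the
// replies in path order no matter which worker answers first.
function worldTransform(path, queryLocalTransform) {
  return Promise.all(path.map(queryLocalTransform)).then(composeTransforms);
}

// Fake per-node "workers" replying after random delays.
const localTransforms = {
  root: { x: 1, y: 0 },
  group: { x: 2, y: 3 },
  leaf: { x: 0, y: 4 }
};
function queryLocalTransform(nodeId) {
  return new Promise(resolve =>
    setTimeout(() => resolve(localTransforms[nodeId]), Math.random() * 10)
  );
}

worldTransform(['root', 'group', 'leaf'], queryLocalTransform)
  .then(world => { /* world is { x: 3, y: 7 } regardless of reply order */ });
```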

The WebGL renderer, f.e., would need all the world transforms of each Node in order to render them in the 3D world, so that would happen separately in a similar fashion: querying all nodes for world transforms, and applying only transforms that have changed. There'd be some way to query for transforms of things we expect to have been modified.

Maybe a Node can be marked "dirty" on the UI side whenever a user has called a function to modify some of that Node's state; that way the renderer can query that worker once per frame until the Node says it's clean? I'm not entirely clear on this part yet, I have fog here.
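
The dirty-marking part could start as simply as a set on the UI side; the names here are made up:

```javascript
// Sketch of the dirty-flag idea: the UI side marks a node dirty whenever a
// user mutates it, and the renderer polls the dirty set once per frame,
// clearing entries as their workers report clean.
function makeDirtyTracker() {
  const dirty = new Set();
  return {
    markDirty(nodeId) { dirty.add(nodeId); },
    markClean(nodeId) { dirty.delete(nodeId); },
    // What the renderer would walk once per frame.
    dirtyNodes() { return Array.from(dirty); }
  };
}

const tracker = makeDirtyTracker();
tracker.markDirty('node1');
tracker.markDirty('node2');
tracker.markClean('node1');
// tracker.dirtyNodes() is now ['node2']
```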

But what I am starting to see clearly is that if we separate everything into workers in small pieces (f.e. one Worker per Node) then it seems clearer how things can perform well without blocking the UI thread.

At first I was imagining having a single "SceneWorker" that contains all of the worker-side nodes (worker-side because there is also some UI-side "Node" that the end user always has a reference to, and which is the user's interface to that worker-side node) and would run all updates itself.

But now that I'm thinking about it, I'm liking the idea of splitting things into one worker per Node, which I think might actually make the design easier to implement, because then the UI-side "Node" that a user interacts with can have methods that simply call methods on the worker-side Node in an RPC-like fashion.
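
A rough sketch of that RPC-like forwarding, with a stub port standing in for the real Worker so it runs anywhere (all names hypothetical):

```javascript
// The UI-side Node forwards method calls to its worker-side counterpart
// as messages, matching replies to callers with a request id.
function makeUiNode(port) {
  let nextId = 0;
  const pending = new Map();

  port.onmessage = e => {
    const { id, result } = e.data;
    const cb = pending.get(id);
    pending.delete(id);
    if (cb) cb(result);
  };

  function rpc(method, args, callback) {
    const id = nextId++;
    pending.set(id, callback);
    port.postMessage({ id, method, args });
  }

  return {
    // UI-side methods are thin wrappers around RPC calls.
    setRotation(r, done) { rpc('setRotation', [r], done); }
  };
}

// Stub "worker" that immediately echoes back a result for each call;
// a real Worker would reply asynchronously.
const port = {
  onmessage: null,
  postMessage(msg) {
    this.onmessage({ data: { id: msg.id, result: msg.args[0] } });
  }
};

const node = makeUiNode(port);
let applied;
node.setRotation(1.5, r => { applied = r; });
```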

And now, when I think about something like a "Transition" that performs calculations on any number of Nodes at once, I can imagine that the Transition could be its own worker that queries Nodes for the info it needs, then calculates results to give back to the Nodes, which they can then use to update their own local and world transforms.

So here's an example of a possible UI-side end-user API:

let node = new Node // `node` is interface for the end user in the UI-thread, but creates a worker behind the scenes?
let transition = new Transition(0) // Transition might make a new worker behind the scenes.

function someAsyncFunction(callback) { // some async function whose result we'll need in a calculation.
    // ...
    callback(result)
}

transition.to(2*Math.PI, {
    duration: 5000,
    curve: Curve.expoInOut,
    deps: {
        otherPosX: otherNode.position, // transition sees this is a NodeComponent, so gets that component for the following calculator function.
        someNumber: someAsyncFunction // transition automatically knows to call someAsyncFunction on the UI-side each tick?
    }, // list dependencies needed by the Transform, so that it knows what to query on the worker side?
    calculator: function(currentValue, deps) { // calculator runs on the worker-side.  ?
        return { // return a result object, the results of some (possibly intense) calculations.
            rotation: deps.otherPosX * deps.someNumber + currentValue + Math.random()*Math.PI/8
        }
    },
    applier: function(results) { // used to apply results. Receives the result object of the calculator on the UI-side.
        node.rotation.y = results.rotation // this tells the UI to send the update to the node's worker after caching it on the UI-side for DOM rendering? We know that at each frame we need to apply cached values, whatever may be.
    }
}).loop()

What I was thinking there is that there's some way to tell the API what it needs in order to perform some calculations worker-side (in this case, guided by a transforming number) and how to apply the result to other parts of the UI-side API.

But, maybe having that transition in its own worker is overkill? I initially thought that by making a new transition with new Transition(node) it might be a component of the Node it received in its constructor, and operate on that Node, but that seems to make it complicated to do calculations involving state from multiple nodes, so I came up with that deps idea just now. But the problem with the applier function is that the result has come back to the UI thread, and if it needs to be applied to another node, then the result is being sent back into a worker. It'd be nice for the result to go straight to where it needs to go without routing through the UI (save for messaging limitations, but at least not via end-user API calls). Any ideas?

The overall main point that I'm seeing is that if everything is async on a per-worker basis, then nothing can block the UI, except if the scene graph is just so big that it takes longer than a frame to apply results (i.e. pass transforms to renderers). But calculation-wise, nothing will happen on the UI side (if our API is used properly).

I'm imagining that subtrees in the scene graph could take longer than a frame to update if they involve large physics calculations, for example, while other portions of the scene graph tree might update quickly, which could produce the effect of some things having sub-60fps lag and other things moving smoothly. That'd be interesting to actually see. I also fear behavior like that would be funky in, for example, a first-person-shooter case.

So, that's what I'm gonna do. I'm gonna try to make absolutely everything asynchronous, and the easy way to make mathematical things asynchronous is obviously doing them in workers. :smiley:

Everything async makes my initial idea of having a single SceneWorker thread much less appealing because of how many synchronous calculations it'd have. I guess the thing is finding a balance between threads and calculations. For example, spawning a thread just to do 5*5 might not be worth it. Maybe functions (atomic calculations) can be put on a primary queue for each tick, and then, if the mechanism detects that the 16ms threshold will be exceeded before all of the tasks complete, it starts a new worker to stack tasks on, etc., so that, overall, all tasks in the entire app finish once per 16ms, except for those intense ones where a synchronous calculation simply takes longer than 16ms.
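
That grow-the-pool-when-over-budget idea could start as simply as this; purely illustrative, since a real scheduler would also measure per-task cost and worker start-up overhead:

```javascript
// Keep tasks on one queue until a tick overruns the ~16ms frame budget,
// then recommend spawning another worker to share the load.
const FRAME_BUDGET_MS = 16;

function recommendedWorkerCount(lastTickDurationMs, currentWorkers, maxWorkers) {
  if (lastTickDurationMs > FRAME_BUDGET_MS && currentWorkers < maxWorkers) {
    return currentWorkers + 1; // overran the frame: add a worker
  }
  return currentWorkers; // within budget (or at the cap): leave the pool alone
}
```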

Another question! If we're animating 500 things at once and having 500 Nodes, each represented by a Worker, that'd be 500 workers! Is that too much? Maybe there's some magic number of Nodes to have per worker?

We have three prototypes already that run/render, so there was little point hurrying to make another.

Nobody's really solved workers yet, so that's where I allocate the majority of my time currently.

The good news is that what I have so far is completely agnostic concerning nodes, event architecture, a scene graph, etc. It just handles the question of "How do I run this bit of code in multiple places, in a seamless, high-performance fashion?" That's it.

I wouldn't really call it premature optimization. More like building a solid low-level foundation.

Of course, the biggest argument against the entire thing would be the fact that SharedArrayBuffer will invalidate pretty much all of this work once it lands. Can't say I'm too thrilled about that, but at least the experience gained here will allow me to approach that better.

It used to be that Firefox had a hard limit of 20 workers, and Chrome was 66. That was 2012/2013-ish information though (StackOverflow). Running some tests just now, Firefox and Chrome seem to technically support 500. That said, Chrome's page rendering crashes and its inspector freaks out, detaching itself.

In practice, optimal worker count varies based on client hardware. If I had to guess, we'll probably rarely go north of 20. Optimal will probably be around 6-12. If you have too many workers, performance suffers considerably.
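
If it helps, a first guess at a worker count could key off the client's reported core count, clamped to that kind of band. The exact numbers here are just an assumption:

```javascript
// In the browser you'd pass `navigator.hardwareConcurrency`; the fallback
// covers browsers that don't report it.
function pickWorkerCount(hardwareConcurrency) {
  const cores = hardwareConcurrency || 4; // conservative fallback
  // Leave a core for the UI thread, then clamp to an assumed sane band.
  return Math.max(2, Math.min(cores - 1, 12));
}
```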

As for your other thoughts, my only suggestion is that your reasoning about workers is perhaps too coupled with other engine abstractions. I think you're on point with the async.queue thing though.

With my implementation, I'm aiming to just use callbacks everywhere (a la Node), and hopefully it can just be left up to the user if they want to use something like Promises to handle async, though I may be a bit optimistic.
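
Leaving Promises up to the user can work out fine, since a Node-style callback API is mechanically wrappable. The engine call below is hypothetical:

```javascript
// Generic wrapper: turns fn(...args, (err, result) => {}) into a
// Promise-returning function, so users who want Promises can opt in.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) =>
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)))
    );
}

// Hypothetical callback-style engine call.
function getWorldTransform(nodeId, callback) {
  callback(null, { x: 1, y: 2 });
}

const getWorldTransformAsync = promisify(getWorldTransform);
```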

Would it be better to just wait until MessageChannel is fixed between workers (UI-thread agnostic) so that passing a buffer guarantees no race conditions? Wouldn't that be the same thing? I mean, if we're going to carefully lock a shared array buffer, that's the same as passing an array buffer carefully.

EDIT: nevermind; a non-shared ArrayBuffer prevents parallel reading, which is something SharedArrayBuffer would allow.

Haha, that's funny.

Sounds good!

Can you share more details about how you're thinking of implementing your worker thing? Wanna put it in a repo on github.com/infamous?

Looks like the guy who made Parallel.js is all up on SharedArrayBuffer, prototyping with it in Firefox Nightly. Maybe it'll be useful to see how he updates Parallel.js when SharedArrayBuffer is ready.

@oldschooljarvis What are your thoughts on Parallel.js?

Oops, nevermind about what I said regarding the guy who made Parallel.js. He's not the same guy as the author of the article I was reading. I misread, but the words "Parallel JS" in the article coincidentally made me search for and stumble on Parallel.js.

Parallel.js is nice, and certainly elegant, but it probably isn't suitable for situations like ours where you're dealing with a tight frame budget. Libraries like this tend to transmit the code to the worker they're going to execute it on, and spin up workers on the fly. All of that eats up a ton of time.

To build off the previous question, the difference with my implementation is that there's a single bundle file that all threads run on (UI/DedicatedWorker/SharedWorker). On application start, each thread loads from that bundle.
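
The first thing such a bundle would have to do is figure out which kind of thread loaded it. A heuristic like the following (written over a global-like object so it is easy to test outside a browser) would be one way; the function name and the exact checks are assumptions:

```javascript
// In the bundle you'd call detectContext(self) (or window on the UI
// thread). The checks are heuristics: only windows have `document`, only
// SharedWorkerGlobalScope has `onconnect`, and both worker scopes expose
// `importScripts`.
function detectContext(g) {
  if (g.document !== undefined) return 'ui';
  if ('onconnect' in g) return 'shared-worker';
  if (typeof g.importScripts === 'function') return 'dedicated-worker';
  return 'unknown';
}
```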

There are several advantages to this approach:

  1. Radically simplified build system.
  2. No spin-up overhead.
  3. Minimal invocation overhead.

Disadvantages are:

  1. Significantly increased code complexity.
  2. Scope concerns when invoking functions.

To elaborate on the second disadvantage, if you're invoking code nested within other invokable code inside of a worker, there's no way to actually reach it without breaking the nested functions out into top-level scope. For that, my tentative plan is a build-time AST transformation.
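
To illustrate the problem and the intended output of such a transform: a nested function is unreachable by name from a message handler, so the transform would hoist nested invokables into a top-level registry. The key scheme here is hypothetical, and the registry is written by hand to show the shape of the generated code:

```javascript
// What a build-time AST transform might emit: nested invokables pulled up
// and registered under a unique path-like key.
const registry = {};

registry['outer'] = function outer(x) {
  // The original code called a nested `inner`; after hoisting it goes
  // through the registry instead of a closure.
  return registry['outer/inner'](x) + 1;
};
registry['outer/inner'] = function inner(x) {
  return x * 2;
};

// A worker's message handler can now invoke any function, nested or not,
// by key.
function invoke(key, ...args) {
  return registry[key](...args);
}
```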

It's nowhere near ready for prime time yet, but soon!