Web Worker Tests

Just a brief note:

https://code.google.com/p/chromium/issues/detail?id=405956

This means that in situations where you want multiple SharedWorkers created from the same file, you have to actually copy said file multiple times and load each accordingly. Very lame.

https://code.google.com/p/chromium/issues/detail?id=334408

:expressionless:

Update

So after a bit of testing, I've confirmed this only applies to explicit MessagePort objects, as opposed to the implicit kind you get via the onmessage handler with regular workers. What this means in practice is that, if you're on Chrome and want to send an ArrayBuffer from Worker A to Worker B, you're going to have to route it through the UI thread. Despite the fact you're sending two messages, it's still dramatically faster than a single direct message that falls back to structured cloning.
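
The two-hop pattern can be sketched roughly like this. `relayTransferables` and the stub ports are hypothetical names; in the browser the ports would be the actual `Worker` instances:

```javascript
// Sketch of routing a transferable from Worker A to Worker B via the UI
// thread, since Chrome (at the time) can't transfer directly between
// workers. The stubs below stand in for real Worker objects so the sketch
// is self-contained outside a browser.
function relayTransferables(portA, portB) {
  // Anything Worker A posts gets immediately re-posted to Worker B,
  // moving (not copying) the underlying ArrayBuffer on both hops.
  portA.onmessage = function (e) {
    portB.postMessage(e.data, [e.data]); // second arg: transfer list
  };
}

// --- tiny stubs in place of real Workers ---
function makeStubPort() {
  return {
    onmessage: null,
    received: [],
    postMessage(data, transfer) { this.received.push({ data, transfer }); }
  };
}

const a = makeStubPort();
const b = makeStubPort();
relayTransferables(a, b);

const buf = new ArrayBuffer(8);
a.onmessage({ data: buf }); // simulate Worker A posting a buffer
```

Even with the extra hop, moving the buffer twice beats a single structured clone for large payloads, which matches the measurement described above.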

Update 2

This also means that in Chrome, if you have two UI threads, they won't be able to communicate with each other (or each other's workers) using transferables.

That is quite a setback! The web is evolving, and these will be necessary for creating the performant apps that we're imagining.

https://code.google.com/p/chromium/issues/detail?id=376637

Adding that to the list of Worker-related things broken in Chrome. Not a deal breaker, just very annoying.

Yeah, it's interesting, all these issues. It shows that the web is transforming into a platform to compete with native, but that it's not entirely there yet, although it's getting a lot closer compared to a few years ago. One thing is certain: I really like the direction it's going in. I guess all of us do since that's the reason we're discussing things in this forum anyway. :smile:

Well, my take is the majority of *Worker functionality was implemented 3-4 years ago, but only partially, and things have just kind of rotted since. I keep getting the impression the people actually working on the browser implementations resent those APIs and/or view them as bad specs, and I tend to agree; it's just that spotty implementations end up making the situation far worse.

Unfortunately it's all we have right now though, and it's going to be quite some time before SharedArrayBuffer lands in at least two or more major browsers. It's ironic, really, that constructs intended to insulate developers from the pitfalls of traditional parallel programming end up making the situation far worse.

This might be relevant to the discussion at hand?

Multithreaded Toolkits, a failed dream? (2004) [HN]

It's about how every major interface toolkit that tried to go multithreaded, i.e. allowing the threads (or workers) to update the UI, ended up walking back from that decision and settling on an event loop.

I read that yesterday actually.

For what it's worth, this work isn't specific to UIs. That article was also written over a decade ago, and there have been success stories since.

Of course, a couple months ago I was probably the most vocal opponent of using Workers due to the complexity cost. :laughing:

The only thing that really concerns me about the web worker stuff is that it very much feels like premature optimisation. It feels like it should be something that you only attempt once you actually have something that works.

I just remember this old mantra:

  1. make it work
  2. make it right
  3. make it fast

Just thinking out loud (let me know if you have any thoughts about anything):

The command queue will likely be on the UI thread. Some commands might be synchronous (f.e. handling of a DOM element) and some might be asynchronous (f.e. sending a request to a worker to calculate world transformation matrices or to a worker to calculate physics), and all can be queued using something like async.queue.
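
A minimal sketch of that kind of serial command queue, hand-rolled here instead of pulling in async.queue just to show the shape:

```javascript
// Minimal serial command queue in the spirit of async.queue: each task is
// a function taking a `done` callback; tasks run one at a time, so
// synchronous commands (DOM pokes) and asynchronous ones (worker
// round-trips) can share the same queue.
function makeCommandQueue() {
  const tasks = [];
  let running = false;

  function next() {
    const task = tasks.shift();
    if (!task) { running = false; return; }
    task(next); // the task calls `next` when finished (sync or async)
  }

  return {
    push(task) {
      tasks.push(task);
      if (!running) { running = true; next(); }
    }
  };
}

// Example: one synchronous and one "async" command, executed in order.
const order = [];
const queue = makeCommandQueue();
queue.push(done => { order.push('sync command'); done(); });
queue.push(done => setTimeout(() => { order.push('async command'); done(); }, 0));
```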

Hmm, but if many of the functions are asynchronous, then they can run in parallel. Perhaps the fastest-updating things will simply call back sooner, and those things render sooner (f.e. at 60fps while some other things are slower). There could be a result listener that simply applies whatever results were sent back within the frame. So it's possible that some animations will be slower and some faster, depending on when results come back from workers. If such a design is done properly, rendering could always be at 60fps, with only certain animations appearing slow depending on when their results arrive.

Can each node have its own worker? Perhaps they're a bunch of workers all connected together in the same structure as the scene graph? Would it make sense for a component's onUpdate to run in its Node's worker? If it needs state from another node for its calculation (f.e. the size or rotation of another node), it can request it asynchronously and eventually get the result back in order to complete its calculation. I'm imagining this might work really nicely once MessageChannels aren't coupled to the UI thread like they are now (as described in that Chromium bug you linked us to above @oldschooljarvis).

I think each Node worker in this design I'm imagining can obviously hold things like its local state and world state (local transforms, local opacity, etc., and world/final transforms, final opacity, etc.).

I'm imagining that in order to calculate the world transform of some Node in a tree of linked Workers, the mechanism that will do these calculations (be it in some separate worker for this purpose, or perhaps in a sidekick worker of the Node whose world transform we're calculating) will asynchronously query all Nodes in the path leading to the Node in question to get the local transforms of each node. The results might arrive in any order, but when they all finally arrive, the calculation can be completed and the resulting world transform saved back onto the Node in question.
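
A rough sketch of that gather-then-compose step, assuming a query function that asks each node's worker for its local transform. For brevity the "transforms" here are just 2D translations; a real version would multiply 4x4 matrices:

```javascript
// Pure composition step: fold the locals (in path order) into a world
// transform. Kept separate so it can run wherever the sidekick worker is.
function composeTransforms(locals) {
  return locals.reduce(
    (world, local) => ({ x: world.x + local.x, y: world.y + local.y }),
    { x: 0, y: 0 }
  );
}

// One async query per node on the path; Promise.all reassembles the
// replies in path order no matter which worker answers first.
function worldTransform(path, queryLocalTransform) {
  return Promise.all(path.map(queryLocalTransform)).then(composeTransforms);
}

// Fake per-node "workers" replying after random delays.
const localTransforms = {
  root: { x: 1, y: 0 },
  group: { x: 2, y: 3 },
  leaf: { x: 0, y: 4 }
};
function queryLocalTransform(nodeId) {
  return new Promise(resolve =>
    setTimeout(() => resolve(localTransforms[nodeId]), Math.random() * 10)
  );
}

worldTransform(['root', 'group', 'leaf'], queryLocalTransform)
  .then(world => { /* world is { x: 3, y: 7 } regardless of reply order */ });
```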

The WebGL renderer, f.e., would need all the world transforms of each Node in order to render them in the 3D world, so that would happen separately in a similar fashion: querying all nodes for world transforms, and applying only transforms that have changed. There'd be some way to query for transforms of things we expect to have been modified.

Maybe a Node can be marked "dirty" on the UI side whenever a user has called a function to modify some of that Node's state; that way the renderer can query that worker once per frame until the Node says it's clean? I'm not entirely clear on this part yet, I have fog here.
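
The dirty-marking part could start as simply as a set on the UI side; the names here are made up:

```javascript
// Sketch of the dirty-flag idea: the UI side marks a node dirty whenever a
// user mutates it, and the renderer polls the dirty set once per frame,
// clearing entries as their workers report clean.
function makeDirtyTracker() {
  const dirty = new Set();
  return {
    markDirty(nodeId) { dirty.add(nodeId); },
    markClean(nodeId) { dirty.delete(nodeId); },
    // What the renderer would walk once per frame.
    dirtyNodes() { return Array.from(dirty); }
  };
}

const tracker = makeDirtyTracker();
tracker.markDirty('node1');
tracker.markDirty('node2');
tracker.markClean('node1');
// tracker.dirtyNodes() is now ['node2']
```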

But what I am starting to see clearly is that if we separate everything into workers in small pieces (f.e. one Worker per Node) then it seems clearer how things can perform well without blocking the UI thread.

At first I was imagining having a single "SceneWorker" that contains all of the worker-side nodes (worker-side because there is also some UI-side "Node" that the end user always has a reference to, and which is the user's interface to that worker-side node) and would run all updates itself.

But now that I'm thinking about it, I'm liking the idea of splitting things into one worker per Node, which I think might actually make the design easier to implement, because then the UI-side "Node" that a user interacts with can have methods that simply call methods on the worker-side Node in an RPC-like fashion.
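
A rough sketch of that RPC-like forwarding, with a stub port standing in for the real Worker so it runs anywhere (all names hypothetical):

```javascript
// The UI-side Node forwards method calls to its worker-side counterpart
// as messages, matching replies to callers with a request id.
function makeUiNode(port) {
  let nextId = 0;
  const pending = new Map();

  port.onmessage = e => {
    const { id, result } = e.data;
    const cb = pending.get(id);
    pending.delete(id);
    if (cb) cb(result);
  };

  function rpc(method, args, callback) {
    const id = nextId++;
    pending.set(id, callback);
    port.postMessage({ id, method, args });
  }

  return {
    // UI-side methods are thin wrappers around RPC calls.
    setRotation(r, done) { rpc('setRotation', [r], done); }
  };
}

// Stub "worker" that immediately echoes back a result for each call;
// a real Worker would reply asynchronously.
const port = {
  onmessage: null,
  postMessage(msg) {
    this.onmessage({ data: { id: msg.id, result: msg.args[0] } });
  }
};

const node = makeUiNode(port);
let applied;
node.setRotation(1.5, r => { applied = r; });
```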

And now, when I think about something like a "Transition" that performs calculations on any number of Nodes at once, I can imagine that the Transition could be its own worker that queries Nodes for the info it needs, then calculates results to give back to the Nodes, which they can then use to update their own local and world transforms.

So here's an example of a possible UI-side end-user API:

let node = new Node // `node` is interface for the end user in the UI-thread, but creates a worker behind the scenes?
let transition = new Transition(0) // Transition might make a new worker behind the scenes.

function someAsyncFunction(callback) { // some async function whose result we'll need in a calculation.
    // ...
    callback(result)
}

transition.to(2*Math.PI, {
    duration: 5000,
    curve: Curve.expoInOut,
    deps: {
        otherPosX: otherNode.position, // transition sees this is a NodeComponent, so gets that component for the following calculator function.
        someNumber: someAsyncFunction // transition automatically knows to call someAsyncFunction on the UI-side each tick?
    }, // list dependencies needed by the Transform, so that it knows what to query on the worker side?
    calculator: function(currentValue, deps) { // calculator runs on the worker-side.  ?
        return { // return a result object, the results of some (possibly intense) calculations.
            rotation: deps.otherPosX * deps.someNumber + currentValue + Math.random()*Math.PI/8
        }
    },
    applier: function(results) { // used to apply results. Receives the result object of the calculator on the UI-side.
        node.rotation.y = results.rotation // this tells the UI to send the update to the node's worker after caching it on the UI-side for DOM rendering? We know that at each frame we need to apply cached values, whatever may be.
    }
}).loop()

What I was thinking there is that there's some way to tell the API what it needs in order to perform some calculations worker-side (in this case, guided by a transforming number) and how to apply the result to other parts of the UI-side API.

But, maybe having that transition in its own worker is overkill? I initially thought that by making a new transition with new Transition(node) it might be a component of the Node it received in its constructor, and operate on that Node, but that seems to make it complicated to do calculations involving state from multiple nodes, so I came up with that deps idea just now. But the problem with the applier function is that the result has come back to the UI thread, and if it needs to be applied to another node, then the result is being sent back into a worker. It'd be nice for the result to go straight to where it needs to go without routing through the UI (save for messaging limitations, but at least not via end-user API calls). Any ideas?

The overall main point that I'm seeing is that if everything is async on a per-worker basis, then nothing can block the UI, except if the scene graph is just so big that it takes longer than a frame to apply results (i.e. pass transforms to renderers). But calculation-wise, nothing will happen on the UI side (if our API is used properly).

I'm imagining that subtrees in the scene graph could take longer than a frame to update if they involve large physics calculations, for example, while other portions of the scene graph tree might update quickly, which could produce the effect of some things having sub-60fps lag and other things moving smoothly. That'd be interesting to actually see. I also fear behavior like that would be funky in, for example, a first-person-shooter case.

So, that's what I'm gonna do. I'm gonna try to make absolutely everything asynchronous, and the easy way to make mathematical things asynchronous is obviously doing them in workers. :smiley:

Everything async makes my initial idea of having a single SceneWorker thread much less appealing because of how many synchronous calculations it'd have. I guess the thing is finding a balance between threads and calculations. For example, spawning a thread just to do 5*5 might not be worth it. Maybe functions (atomic calculations) can be put on a primary queue for each tick, and then, if the mechanism detects that the 16ms threshold will be exceeded before all of the tasks complete, it starts a new worker to stack tasks on, etc., so that, overall, all tasks in the entire app finish once per 16ms, except for those intense ones where a synchronous calculation simply takes longer than 16ms.
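
That grow-the-pool-when-over-budget idea could start as simply as this; purely illustrative, since a real scheduler would also measure per-task cost and worker start-up overhead:

```javascript
// Keep tasks on one queue until a tick overruns the ~16ms frame budget,
// then recommend spawning another worker to share the load.
const FRAME_BUDGET_MS = 16;

function recommendedWorkerCount(lastTickDurationMs, currentWorkers, maxWorkers) {
  if (lastTickDurationMs > FRAME_BUDGET_MS && currentWorkers < maxWorkers) {
    return currentWorkers + 1; // overran the frame: add a worker
  }
  return currentWorkers; // within budget (or at the cap): leave the pool alone
}
```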

Another question! If we're animating 500 things at once and having 500 Nodes, each represented by a Worker, that'd be 500 workers! Is that too much? Maybe there's some magic number of Nodes to have per worker?

We have three prototypes already that run/render, so there was little point hurrying to make another.

Nobody's really solved workers yet, so that's where I allocate the majority of my time currently.

The good news is that what I have so far is completely agnostic concerning nodes, event architecture, a scene graph, etc. It just handles the question of "How do I run this bit of code in multiple places, in a seamless, high-performance fashion?" That's it.

I wouldn't really call it premature optimization. More like building a solid low-level foundation.

Of course, the biggest argument against the entire thing would be the fact that SharedArrayBuffer will invalidate pretty much all of this work once it lands. Can't say I'm too thrilled about that, but at least the experience gained here will allow me to approach that better.

It used to be that Firefox had a hard limit of 20 workers, and Chrome was 66. That was 2012/2013-ish information though (StackOverflow). Running some tests just now, Firefox and Chrome seem to technically support 500. That said, Chrome's page rendering crashes and its inspector freaks out, detaching itself.

In practice, optimal worker count varies based on client hardware. If I had to guess, we'll probably rarely go north of 20. Optimal will probably be around 6-12. If you have too many workers, performance suffers considerably.
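
If it helps, a first guess at a worker count could key off the client's reported core count, clamped to that kind of band. The exact numbers here are just an assumption:

```javascript
// In the browser you'd pass `navigator.hardwareConcurrency`; the fallback
// covers browsers that don't report it.
function pickWorkerCount(hardwareConcurrency) {
  const cores = hardwareConcurrency || 4; // conservative fallback
  // Leave a core for the UI thread, then clamp to an assumed sane band.
  return Math.max(2, Math.min(cores - 1, 12));
}
```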

As for your other thoughts, my only suggestion is that your reasoning about workers is perhaps too coupled with other engine abstractions. I think you're on point with the async.queue thing though.

With my implementation, I'm aiming to just use callbacks everywhere (a la Node), and hopefully it can just be left up to the user if they want to use something like Promises to handle async, though I may be a bit optimistic.
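
Leaving Promises up to the user can work out fine, since a Node-style callback API is mechanically wrappable. The engine call below is hypothetical:

```javascript
// Generic wrapper: turns fn(...args, (err, result) => {}) into a
// Promise-returning function, so users who want Promises can opt in.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) =>
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)))
    );
}

// Hypothetical callback-style engine call.
function getWorldTransform(nodeId, callback) {
  callback(null, { x: 1, y: 2 });
}

const getWorldTransformAsync = promisify(getWorldTransform);
```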

Would it be better to just wait until MessageChannel is fixed between workers (UI-thread agnostic) so that passing a buffer guarantees no race conditions? Wouldn't that be the same thing? I mean, if we're going to carefully lock a shared array buffer, that's the same as passing an array buffer carefully.

EDIT: nevermind; a non-shared ArrayBuffer prevents parallel reading, which is something SharedArrayBuffer would allow.

Haha, that's funny.

Sounds good!

Can you share more details about how you're thinking of implementing your worker thing? Wanna put it in a repo on github.com/infamous?

Looks like the guy who made Parallel.js is all up on SharedArrayBuffer, prototyping with it in Firefox Nightly. Maybe it'll be useful to see how he updates Parallel.js when SharedArrayBuffer is ready.

@oldschooljarvis What are your thoughts on Parallel.js?

Oops, nevermind about what I said regarding the guy who made Parallel.js. He's not the same guy as the author of the article I was reading. I misread, but the words "Parallel JS" in the article coincidentally made me search for and stumble on Parallel.js.

Parallel.js is nice, and certainly elegant, but it probably isn't suitable for situations like ours where you're dealing with a tight frame budget. Libraries like this tend to transmit the code to the worker they're going to execute it on, and spin up workers on the fly. All of that eats up a ton of time.

To build off the previous question, the difference with my implementation is that there's a single bundle file that all threads run on (UI/DedicatedWorker/SharedWorker). On application start, each thread loads from that bundle.
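
The first thing such a bundle would have to do is figure out which kind of thread loaded it. A heuristic like the following (written over a global-like object so it is easy to test outside a browser) would be one way; the function name and the exact checks are assumptions:

```javascript
// In the bundle you'd call detectContext(self) (or window on the UI
// thread). The checks are heuristics: only windows have `document`, only
// SharedWorkerGlobalScope has `onconnect`, and both worker scopes expose
// `importScripts`.
function detectContext(g) {
  if (g.document !== undefined) return 'ui';
  if ('onconnect' in g) return 'shared-worker';
  if (typeof g.importScripts === 'function') return 'dedicated-worker';
  return 'unknown';
}
```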

There are several advantages to this approach:

  1. Radically simplified build system.
  2. No spin-up overhead.
  3. Minimal invocation overhead.

Disadvantages are:

  1. Significantly increased code complexity.
  2. Scope concerns when invoking functions.

To elaborate on the second disadvantage, if you're invoking code nested within other invokable code inside of a worker, there's no way to actually reach it without breaking the nested functions out into top-level scope. For that, my tentative plan is a build-time AST transformation.
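
To illustrate the problem and the intended output of such a transform: a nested function is unreachable by name from a message handler, so the transform would hoist nested invokables into a top-level registry. The key scheme here is hypothetical, and the registry is written by hand to show the shape of the generated code:

```javascript
// What a build-time AST transform might emit: nested invokables pulled up
// and registered under a unique path-like key.
const registry = {};

registry['outer'] = function outer(x) {
  // The original code called a nested `inner`; after hoisting it goes
  // through the registry instead of a closure.
  return registry['outer/inner'](x) + 1;
};
registry['outer/inner'] = function inner(x) {
  return x * 2;
};

// A worker's message handler can now invoke any function, nested or not,
// by key.
function invoke(key, ...args) {
  return registry[key](...args);
}
```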

It's nowhere near ready for prime time yet, but soon!