[performance/API] Web workers

gadicc · September 13, 2015, 8:20am

We’ve discussed this in the channel but a few things I wanted to bring up:

We said we should do workers by default, but depending on structure, this may well prevent users from using any framework (e.g. Meteor) that isn’t built for workers, and/or potentially mean that all user code needs to be in the worker, with users writing their own bridges for anything they need on the DOM (in the UI thread). That’s a very major barrier for adoption, but there are some ways around it (none of which are ideal).

In general for “top” performance, we want only the renderers in the UI thread. That means building the scene graph and working with it, etc, in a worker thread. This introduces the kinds of problems I mention above, but like Famous, if we avoid direct DOM access, it’s not the end of the world, it’s a potentially fair trade off.

We also spoke about sending calculations (component updates) to worker threads. On the one hand it’s quite nice since we can split this up over all available cores; on the other hand, we need to remember that everything happens over messaging, so the calculation + copy there + copy back of data + management overhead needs to take less time than doing the calculation on the existing thread. There is a newer web worker feature (transferable objects) that lets you do zero-copy transfers, but then the original data is no longer available on the initiating thread, so it would have to be copied beforehand too if that thread still needs it.
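To make the zero-copy trade-off concrete, here is a minimal sketch. It assumes the mechanism meant above is transferable objects, and it uses a `MessageChannel` as a stand-in for a real worker so it runs without one. The payload size and variable names are made up for illustration.

```javascript
// Sketch only: shows that a transferred ArrayBuffer is detached on the sender.
// MessageChannel stands in for a real Worker (available in browsers and
// recent Node.js versions).
const channel = new MessageChannel();

// Hypothetical payload: 1000 4x4 matrices packed into one typed array.
const matrices = new Float32Array(16 * 1000);
matrices[0] = 1;

// If the initiating thread still needs the data, it has to copy it first.
const localCopy = matrices.slice();

// The second argument is the transfer list: ownership moves to the receiver
// instead of the data being cloned.
channel.port1.postMessage(matrices.buffer, [matrices.buffer]);

// The sender's buffer is now detached and reports zero bytes.
const bytesLeftOnSender = matrices.buffer.byteLength;
const copyStillUsable = localCopy[0] === 1;

channel.port1.close();
channel.port2.close();
```

So the copy is only avoided when the sending side can genuinely give the data up until the worker transfers it back.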

Also as @oldschooljarvis just mentioned, we initiate workers from the UI thread, but they have to load their own scripts, so we introduce some additional latency.

Worker resources:

BridgedWorker builder, a concept of RPC-like communication in workers.
webworkify, workers with the ability to require/import NPM libraries in a Browserify project.
worker-loader, workers with the ability to require/import NPM libraries in a Webpack project.

oldschooljarvis · September 13, 2015, 9:15am

Moved this into its own thread.

gadicc · September 13, 2015, 9:21am

Awesome, @oldschooljarvis! I think possibly the best compromise might be to keep scene construction and modification (node.size.setPosition) on the UI thread, but optionally allow it on its own thread. And then always allow computations to be done in worker threads if available, but it might be worth figuring out which computations are more expensive to send there and back than to just do in place.

Last thing is also situations like this:

+Scene
  +Node id="a" size="50, 50"
    +Node id="b" size="100%, 100%"
      +Node id="c" size="100%, 100%"

Now, how would we handle these kinds of dependencies if we’re sending messages to worker threads? Does every component need to keep a local variable with all dependent values?
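The dependency in that example can be sketched as follows. This is only an illustration under assumptions: the node shape (`id`, `size`, `children`) and the helper names are hypothetical, and percentages are taken to be relative to the parent's resolved size.

```javascript
// Hypothetical node shape: { id, size: [w, h], children: [] }.
// A size entry is either a number (absolute) or a string like "100%".
function resolveAxis(value, parentValue) {
  if (typeof value === 'number') return value;
  return (parentValue * parseFloat(value)) / 100;
}

// Walks top-down so a parent's resolved size exists before its children need it.
function resolveSizes(node, parentSize, out) {
  const resolved = [
    resolveAxis(node.size[0], parentSize[0]),
    resolveAxis(node.size[1], parentSize[1]),
  ];
  out[node.id] = resolved;
  for (const child of node.children) resolveSizes(child, resolved, out);
  return out;
}

const tree = {
  id: 'a', size: [50, 50], children: [
    { id: 'b', size: ['100%', '100%'], children: [
      { id: 'c', size: ['100%', '100%'], children: [] },
    ] },
  ],
};

const sizes = resolveSizes(tree, [0, 0], {});
```

Node "c" cannot be resolved until "b" is, and "b" not until "a" is. A worker asked to compute "c" in isolation would therefore need either the already-resolved parent values in the message or its own copy of the ancestors.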

oldschooljarvis · September 13, 2015, 9:36am

My thoughts so far are:

Supporting workers in general adds an extremely high amount of complexity.
Despite this, workers are still quite sexy.
Agree that the UI thread should only be home to renderers, or anything else that doesn’t support workers, and as little core functionality as possible (how little remains to be seen).
I believe that it’s a good idea for node-level components to be created by what we’ve been referring to as plugins, or at least have the ability to communicate with plugins. The reason for this is so that we don’t conceptually bind the capability of a node-level component to a single thread. Moreover, it allows plugins to have full access to the engine, and thus in turn node-level components. That said, if people want to create traditional node-level components without a backing plugin, I don’t see why we shouldn’t let them.
I’ve been mulling over strategies to partition the scene graph itself so it can be processed concurrently by multiple workers (yes, I am insane). The only maybe viable solution I’ve come up with is to divide up the scene graph by root-level nodes, and do a quick hierarchical complexity analysis of each node. Then, batch and allocate the root-level nodes to each Worker based on their combined hierarchical complexity.
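The partitioning idea in the last point could look roughly like this. It is a sketch under assumptions: "complexity" is simplified to a subtree node count, the allocation is a plain greedy heuristic, and all names are hypothetical.

```javascript
// Complexity here is just "number of nodes in the subtree" (an assumption;
// a real measure might weight components differently).
function complexity(node) {
  return 1 + node.children.reduce((sum, child) => sum + complexity(child), 0);
}

// Greedy allocation: heaviest root first, always to the least-loaded worker.
function partitionRoots(roots, workerCount) {
  const buckets = Array.from({ length: workerCount }, () => ({ load: 0, roots: [] }));
  const weighted = roots
    .map((root) => ({ root, cost: complexity(root) }))
    .sort((a, b) => b.cost - a.cost);
  for (const { root, cost } of weighted) {
    const target = buckets.reduce((min, b) => (b.load < min.load ? b : min));
    target.roots.push(root.id);
    target.load += cost;
  }
  return buckets;
}

const leaf = (id) => ({ id, children: [] });
const roots = [
  { id: 'r1', children: [leaf('x'), leaf('y'), leaf('z')] }, // cost 4
  { id: 'r2', children: [leaf('p')] },                        // cost 2
  leaf('r3'),                                                 // cost 1
  leaf('r4'),                                                 // cost 1
];

// With two workers: one gets r1 (load 4), the other r2, r3, r4 (load 4).
const plan = partitionRoots(roots, 2);
```

Because each root-level subtree stays whole on one worker, ancestors are still processed before descendants within that worker.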

gadicc · September 13, 2015, 9:41am

Haha wow, sounds good! Keep us posted. Just remember that building the scene graph outside of the UI thread, although preferable, means that it prevents us from being easily addable to existing sites or frameworks. We need to weigh up easy adoption vs best performance.

oldschooljarvis · September 13, 2015, 9:58am

I agree with your previous post that since we intend to use performance analysis to assist with scheduling tasks on the core engine loop, we definitely should also factor in the overhead incurred by concurrency concerns.

As far as your example, the main issue I see is if calculations on a descendant node are performed prior to calculations on an ancestor node. That must never happen. That’s why my aforementioned scene graph partitioning method breaks apart the scene graph at the root node level, so that calculations can be performed in order down the hierarchy.

If what you’re getting at is how a node-level component could farm out computations (say matrix calculations) and not desynchronize parts of the scene graph, I think the answer is it can’t. We’ll need to lock the active node such that any further traversal of the scene graph, at least traversal to the next level of depth, awaits the return of all pending calculations on that node (or all sibling nodes) before proceeding. This brings up an interesting point though: would out-of-order calculations on sibling nodes screw anything up?

In terms of Worker usage hindering adoption, I think it’s safe to say anything that can run in a worker can run in the UI thread, so for those types of situation where you’d want to intermingle or sprinkle Famous within existing content, you could just configure it to all run on the UI thread. How this would actually look in practice is another question though. I can’t help but imagine various engine pieces communicating to each other via some robust event system, while all on the same thread.

trusktr · September 16, 2015, 11:18pm

We want only the renderers, and the class method abstractions, in the UI thread. What I mean by class method abstractions is this: suppose a user wants to build a scene graph. The user can do the following in the UI thread:

var node = new Node()
node.addComponents(DOMElement)
node.addChild(new Node())
// etc ...

and those APIs that are always available in the UI thread have interfaces that, behind the scenes, delegate to instances that are inside workers. The user shouldn’t have to worry about actually placing their code into a worker. Method calls in the UI thread would be like RPC (similar to Meteor methods where the user can call server-side methods from the client). Behind the scenes, messages from UI-side method calls are sent to corresponding instances that live in workers; instances created when the user does new Node in the UI thread, etc. This could, in theory, also work with network-based RPC with little modification (only the messaging layer needs to be changed).

The UI-side API would be “smart”: If you call node.setPosition on the UI-side, it delegates to the worker-side Node instance where (perhaps) the whole scene graph is located, which eventually updates the UI-side renderer via messages. If you call domElement.getElement on the UI-side, this gets you the element on the UI-side, and there’s no need for a message to be sent to a worker in that case (but the user doesn’t know this, the user only uses the public API and gets a DOM element as expected).
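A minimal sketch of such a "smart" UI-side stub follows. Everything here is hypothetical (the `createStub` helper, the message shape, the fake transport); it only illustrates the split between calls that become messages and calls that are answered locally.

```javascript
// `post` is whatever sends a message to the worker (worker.postMessage in a
// browser). `localMethods` are calls that can be answered on the UI thread.
function createStub(targetId, remoteMethods, localMethods, post) {
  const stub = {};
  for (const name of remoteMethods) {
    stub[name] = (...args) => {
      post({ target: targetId, method: name, args });
      return stub; // allows chaining; no return value crosses the boundary here
    };
  }
  return Object.assign(stub, localMethods);
}

// Usage with a fake transport that just records what would be sent.
const sent = [];
const fakeElement = { tagName: 'DIV' }; // stand-in for a real DOM element
const node = createStub(
  'node-1',
  ['setPosition', 'addChild'],
  { getElement: () => fakeElement },
  (message) => sent.push(message)
);

node.setPosition(10, 20, 0);       // becomes a message for the worker
const element = node.getElement(); // answered locally, nothing is posted
```

Swapping the `post` function for a network sender is what would make the same stub work for network-based RPC.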

This would all happen transparently, and the user wouldn’t know what actually executes in a worker or not. Integrating with Meteor Blaze (or React/Angular/etc) would be easy because our “smart” API would make it easy:

// some code in some Meteor app:
domElement.getElement(function(element) {
  Blaze.renderWithData(Template.myTemplate, data, element)
})

If the user calls sizeComponent.set(...) then that sends a message into a worker (probably the one where the scene graph lives). So, the API would basically be ‘smart’ (knowing what to send into workers), and the entire public API would always be available on the UI thread for simplicity and ease of use.

We don’t have to avoid direct DOM access. That part of the API (behind the scenes) wouldn’t send messages into a worker.

“how would we handle these kinds of dependencies if we’re sending messages to worker threads”

Hope I kind of answered that question. Those integrations would use the public API, and would not need to know about our engine’s workers.

Any thoughts on that?

Steveblue · September 17, 2015, 12:21am

@gadicc I don’t understand the concern. The integration would be easy. Anything that happens in the UI thread can easily integrate with other frameworks, i.e. ticks on the Engine, selectors for DOM or GL Components that allow you to style elements. Nodes however can operate in a Worker behind the scenes, independent of any framework. The only time you would see output from a Node in the UI thread is the update that happens after a calculation is made to its position, rotation, etc. WebWorkers are already part of infamous/boxer. Take a look at the bare bones implementation there.

trusktr · September 17, 2015, 1:07am

What I’d love to see come out of this is something that could be used standalone (without our engine) to create classes that have worker-backed methods. Maybe there could be some convention on how to write such a class, e.g., suffixing methods with _$worker (albeit ugly):

class Foo extends WorkerBackedClass {
  someMethod_$worker() {
    return 5+5
  }
}

Behind the scenes, the WorkerBackedClass constructor would introspect itself, create a public method called someMethod that, when called, shoots off a message to a worker and fires someMethod_$worker inside the worker, like magic.

How might this work? I don’t entirely know yet, but I know that all method bodies can be gotten as simply as this.someMethod_$worker.toString() so right there we already know we can send that function to a worker, and construct infrastructure for sending and receiving args and return values, respectively.
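A rough sketch of the introspection step, under stated assumptions: the `transport` object and its `call` hook are hypothetical, and only methods defined directly on the subclass prototype are scanned.

```javascript
const SUFFIX = '_$worker';

// `transport.call(className, methodName, source, args)` is a hypothetical hook;
// in a real build it would post a message to a worker.
class WorkerBackedClass {
  constructor(transport) {
    const proto = Object.getPrototypeOf(this);
    // Class methods are non-enumerable, so a for...in loop would miss them.
    for (const name of Object.getOwnPropertyNames(proto)) {
      if (!name.endsWith(SUFFIX)) continue;
      const publicName = name.slice(0, -SUFFIX.length);
      // Note: for a class method this is method-shorthand text, not a
      // standalone function expression, so the worker side would have to
      // wrap it (for example in an object literal) before evaluating it.
      const source = proto[name].toString();
      this[publicName] = (...args) =>
        transport.call(proto.constructor.name, publicName, source, args);
    }
  }
}

class Foo extends WorkerBackedClass {
  someMethod_$worker() {
    return 5 + 5;
  }
}

// Fake transport that records what would be sent to the worker.
const calls = [];
const foo = new Foo({ call: (...details) => calls.push(details) });
foo.someMethod(1, 2);
```

The return value still has to come back asynchronously, so the generated public method would realistically return a promise or accept a callback.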

There could be any amount of arbitrary conventions, and perhaps we don’t even need to use a naming convention but the upcoming annotation syntax (ES2016/ES2017?) to mark methods as worker methods.

There could be all sorts of conventions, like ways to name worker groups (so that certain methods fire in the same worker, and/or certain worker-side class delegates are all instantiated in the same worker).

Just some ideas.

trusktr · September 18, 2015, 6:57am

Or maybe we can have a more imperative API that lets us define how classes group together in workers. For the sake of painting murals, let’s call the API BlueCollar.

// define which classes have delegates in the same worker.
let sceneGraphWorkerClasses = new WorkerGroup(Node, Scene, Engine)

// specify which methods are delegated on a class. First arg specifies a class, second a UI method, third a corresponding worker method.
sceneGraphWorkerClasses.linkClassMethods(Node, "addChild", "addChildWorker")

// or perhaps link all methods of a class
sceneGraphWorkerClasses.linkAll(Node)

There would have to be certain rules (which I haven’t thought of yet), but perhaps things along the lines of:

Methods called on the UI side send their args, serialized (with certain serialization rules), and receive deserialized return values.
It would be possible to do things on the UI side like
```
let scene = Engine.createScene()
let node = scene.addChild(new Node())
```
and the BlueCollar API would make that work in the worker side.
There could be a way to link multiple workers together, and to define classes whose instances can expand into new workers after a certain number of instantiations, and worker sets whose instances get added round-robin, etc.

BlueCollar would basically be a way of defining a messaging network, and using the BlueCollar API would result in a set of messaging systems in both the UI thread and all worker threads, where messages could be sent from various workers to other workers, routed through the UI thread, transparently, by simply defining how classes and methods co-exist in workers (basically just grouping them together).

Alright, I think this idea is heading in a better direction than my previous one on method naming. Plus, having to extend a class just for worker functionality might be a burden on the inheritance design of a library.

trusktr · September 24, 2015, 3:53am

Check out this BridgedWorker concept (updated link).

gadicc · September 24, 2015, 11:40am

@trusktr, @Steveblue, 100%, this was the pattern I ultimately went with too (having “stub” methods in the UI that communicate with the worker). I don’t think this is ideal (since a complex app could have a lot of stuff happening in the UI thread and cause it to block), but I think it’s the best solution moving forward for “default use” (and easy adoption), so that users don’t need to worry about anything else unless they want to.

I’m thinking that get() type methods should by default return a promise but if the last argument is a function, call that as a callback when the result is ready. I think that keeps everyone happy. The alternative is to keep a copy of all values in the UI thread but I think that’s a double lose, because of the overhead of maintaining it and because of encouraging synchronous / blocking patterns.
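The promise-or-callback convention could be sketched like this. The `promiseOrCallback` wrapper and the `request` function are hypothetical, and error handling for the callback style is left out for brevity.

```javascript
// Wraps an async getter so it supports both calling styles.
// `request(args)` is a hypothetical function that returns a Promise,
// e.g. one that resolves when the worker replies.
function promiseOrCallback(request) {
  return function (...args) {
    const last = args[args.length - 1];
    if (typeof last === 'function') {
      const callback = args.pop();
      request(args).then(callback); // errors are not forwarded in this sketch
      return undefined;
    }
    return request(args);
  };
}

// Fake worker round trip for illustration.
const getSize = promiseOrCallback(() => Promise.resolve([50, 50]));

const received = [];
const asPromise = getSize();                              // returns a Promise
const asCallback = getSize((size) => received.push(size)); // returns undefined
```

Either way the result arrives asynchronously, which is the point: nothing on the UI thread blocks waiting for the worker.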

@trusktr, yeah, we can definitely use toString() / blob… I think our “production” bundle should be a single file that includes the entire worker build inside of it and launches it this way. Just remember that everything we send needs to be self contained; so an entire build is ok, and an isolated function (i.e. not a closure) is ok, but a function that requires/imports anything is a no go (unless we get very funky with our build system but that’s nothing I want to tackle yet).
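For reference, a minimal sketch of the toString() / blob approach. The helper names are made up, and `launchInlineWorker` only works where `Worker`, `Blob` and `URL.createObjectURL` exist (i.e. a browser).

```javascript
// Builds worker source text from a self-contained function (no closures,
// no require/import inside it).
function buildWorkerSource(fn) {
  return [
    'var handler = ' + fn.toString() + ';',
    'self.onmessage = function (event) {',
    '  self.postMessage(handler.apply(null, event.data));',
    '};',
  ].join('\n');
}

// Browser-only: turns the source into a Worker without a second script request.
function launchInlineWorker(fn) {
  const blob = new Blob([buildWorkerSource(fn)], { type: 'application/javascript' });
  const url = URL.createObjectURL(blob);
  return new Worker(url);
}

// Example payload: must not reference anything outside its own body.
function multiply(a, b) {
  return a * b;
}

const workerSource = buildWorkerSource(multiply);

// In a browser:
//   const worker = launchInlineWorker(multiply);
//   worker.onmessage = (e) => console.log(e.data);
//   worker.postMessage([6, 7]);
```

A function that closes over outer variables would serialize without them and fail inside the worker, which is why only isolated functions or a complete build can be sent this way.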

trusktr · September 24, 2015, 5:32pm

Would that limit us to a single worker? Maybe we can have a single worker to communicate with, but that worker can make sub workers for different things if needed? Hmmm…

I updated the original post to add resource links, including webworkify and worker-loader for Browserify and Webpack. I’m starting my prototype. Gotta catch up!

gadicc · September 24, 2015, 6:12pm

No, not at all. I mean the minified worker build will land up as a string in the main build, which then uses the blob / “hybrid” method to launch the actual worker (i.e. no need to wait for the worker to load and only then start requesting its own scripts). We can do this as many times as we want.

Note, no sub workers in Chrome (and no plans for it either). But their comment there about using MessagePorts is exactly what I ended up doing. Unfortunately Firefox doesn’t support MessagePorts, so would be better to use sub workers there :> (rather than proxying via the UI thread, which would impact rendering).

Re “funky build systems”, I meant allowing something like require() (or "import Blah from blah", with quotes, as text) in the functions, tracking dependencies and pushing them all through to the worker too… that’s nothing any current builder supports; we’d have to write our own that interacts with an existing one. I don’t really think it’s something we seriously want to do, but it’s certainly possible :>

trusktr · September 24, 2015, 9:01pm

Ah, dang. A possibility I was imagining was a scene graph in a root worker, and it sends matrix calculations to sub workers.

Can you give a little more detail on how it’d impact rendering? Do you mean because of the extra overhead of relaying messages between workers? That’s interesting to think about though. We can make an API that is agnostic of sub workers vs same-level workers with message ports.

So you mean something at runtime? What’s the difference between what you’re describing and webworkify/worker-loader?

oldschooljarvis · October 5, 2015, 7:37am

Just ran across this, it’s a rather fantastic summary of the future:

If only we had the luxury of say, waiting two years.

trusktr · November 16, 2015, 7:42pm

I thought this was awesome too, which also mentions SIMD: http://www.2ality.com/2015/11/tc39-process.html

Will we still need web workers? Or can a single ui-thread’s SIMD instruction sequence perfectly saturate all CPU cores during a requested animation frame?