Solving a single island across multiple cores

Please don't post Bullet support questions here, use the above forums instead.
raigan2
Posts: 197
Joined: Sat Aug 19, 2006 11:52 pm

Solving a single island across multiple cores

Post by raigan2 »

In this month's GD mag there's a paid portion from Intel that talks about this simulator: http://www.infernalengine.com/tech_physics.php

They specifically mention that it can solve an island of constraints in parallel; collision detection and Jacobian building are more easily split up, because each individual test/Jacobian happens in isolation, but how would you parallelize the actual relaxation/solving part?

The only obvious thing I can think of is: take an island and "split" it by ignoring some constraints and/or bodies, so that it becomes two separate islands B,C. Then, solve each island on a different core, and once both are done solve the constraints/bodies that were initially ignored. Or you could just use Jacobi iterations instead of Gauss-Seidel, but that seems like it would have terrible convergence.

Are there any other (better) approaches?
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Solving a single island across multiple cores

Post by Erwin Coumans »

You can do parallel PGS for a single island by re-organizing constraints into batches, where each batch has constraints that don't share dynamic rigid bodies with any other constraints in the same batch. This allows to parallelize simulation islands where all objects are connected (directly or indirectly).

See my posting and attached GDC 2009 slides by Takahiro Harada in this topic.

Hope this helps,
Erwin
raigan2
Posts: 197
Joined: Sat Aug 19, 2006 11:52 pm

Re: Solving a single island across multiple cores

Post by raigan2 »

Thanks, I _think_ I understand -- if you had a chain of bodies ABCDE connected by joints AB, BC, CD, DE you could then solve [AB,CD] in one batch and [BC,DE] in the next?

That's pretty smart!

You mentioned that you used 10 batches -- the number of batches would have to depend on the number of constraints involved with the most-constrained body, right? I.e if you had a body that was being acted upon by 11 different constraints, it wouldn't be possible to schedule all the constraints in 10 batches. This seems reasonable anyway, unless you have a bunch of spiders colliding with each other (with 8 joints to connect legs to thorax, the thorax can only collide with two things before you have problems). But then I suppose you could "solve" this problem by splitting the thorax into two bodies which are constrained to each other..
noctoz
Posts: 2
Joined: Thu May 21, 2009 11:14 am

Re: Solving a single island across multiple cores

Post by noctoz »

http://download.intel.com/technology/it ... -art08.pdf

This paper by Intel has a section about this.
It's on page 5-6.
raigan2
Posts: 197
Joined: Sat Aug 19, 2006 11:52 pm

Re: Solving a single island across multiple cores

Post by raigan2 »

Awesome, thanks!
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Solving a single island across multiple cores

Post by Erwin Coumans »

You mentioned that you used 10 batches -- the number of batches would have to depend on the number of constraints involved with the most-constrained body, right?
Actually, if you just check out the Bullet 2.75 beta1, compile the demos, and run Gpu2dDemo or Gpu3dDemo, you can check the batches on-screen. The tests show that typically less than 10 batches are required, we just use a configurable limit to 20, with an early-out once all constraints have been assigned a batch. Indeed, it is possible to create a worst case by attaching many constraints to a single dynamic rigid body.
Right. We are working with the authors of that Intel paper, Mikhail Smelyanskiy and others (they are just around the corner here in the bay area).

Splitting the constraints in batches just a starting point. The current Bullet CUDA parallel PGS implementation performs the batching on CPU. It is based on Takahiro Harada's work, and his implementation already performs batching and all other stages in parallel on GPU as well. We plan on providing a parallel GPU implementation of the batching in OpenCL in one of the upcoming Bullet releases.

Thanks,
Erwin
raigan2
Posts: 197
Joined: Sat Aug 19, 2006 11:52 pm

Re: Solving a single island across multiple cores

Post by raigan2 »

Erwin Coumans wrote: Actually, if you just check out the Bullet 2.75 beta1, compile the demos, and run Gpu2dDemo or Gpu3dDemo, you can check the batches on-screen.
I'm looking forward to checking them out once I get a new graphics card :)
Johan Gidlund
Posts: 26
Joined: Mon Jun 01, 2009 2:21 pm
Location: Sweden

Re: Solving a single island across multiple cores

Post by Johan Gidlund »

I'm hoping to try doing some implementations of this on Larabee as soon as we get some testing hardware.

Have you done any benchmarking to see how your bullet implementations scales with lots of cores?
If so, is this test data available somewhere?