So you suggest writing a second BulletMultiThreaded will be faster, easier and "better" than fixing the current one?
Both fixing the current BulletMultiThreaded and writing a new one are valid options. There is an upcoming contribution that could be a good alternative to a new multi-threaded version. We can't share more information about this contribution right now.
For the existing BulletMultiThreaded: we will need to review it and see which parts need to be optimized to take advantage of multi-core systems with shared cache.
The Cell SPU DMA has been implemented as an expensive memcpy; that and some other slow operations should be replaced, and some prefetch operations should be added.
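To illustrate the kind of change meant by replacing the memcpy and adding prefetches, here is a rough sketch. It is not taken from the Bullet source: `BodyData`, both function names, and the prefetch distance are hypothetical and only stand in for whatever a real task would process. The first function copies each element into a local buffer, the way an emulated DMA transfer does; the second reads the data in place, which is possible when all cores share memory, and hints the cache about upcoming elements.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for per-body data; not an actual Bullet type.
struct BodyData {
    float position[3];
    float velocity[3];
};

// SPU-style path: every element is copied into local storage first,
// which is what an emulated DMA "get" amounts to on a regular CPU.
inline float sumSpeedSquaredCopying(const BodyData* bodies, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        BodyData local;
        std::memcpy(&local, &bodies[i], sizeof(BodyData)); // emulated DMA transfer
        total += local.velocity[0] * local.velocity[0]
               + local.velocity[1] * local.velocity[1]
               + local.velocity[2] * local.velocity[2];
    }
    return total;
}

// Shared-memory path: read the data in place and prefetch a few elements ahead.
inline float sumSpeedSquaredInPlace(const BodyData* bodies, std::size_t count) {
    const std::size_t prefetchDistance = 8; // assumed value; would need tuning
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
#if defined(__GNUC__)
        // GCC/Clang builtin; other compilers simply skip the hint.
        if (i + prefetchDistance < count) {
            __builtin_prefetch(&bodies[i + prefetchDistance]);
        }
#endif
        const BodyData& body = bodies[i]; // no copy: shared memory is directly addressable
        total += body.velocity[0] * body.velocity[0]
               + body.velocity[1] * body.velocity[1]
               + body.velocity[2] * body.velocity[2];
    }
    return total;
}
```

Both functions return the same result; only the memory access pattern differs. Whether the in-place version is actually faster depends on the data size and the hardware, so it would have to be measured.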
We received a fast multi-core system (thanks Intel!), so once we have time we can optimize things. Also, the constraint solver in BulletMultiThreaded lacks a lot of optimizations that are in the regular solver.
Hope this helps,
Erwin