http://code.google.com/p/bullet/downloads/list
- *** NEW *** SP1 release fixes several issues related to contact/compounds and adds a SIMD/SSE constraint solver
- Improved CUDA broadphase support. See Bullet/Extras/CUDA and CDTestFramework for a benchmark. It has an excellent worst case, and for large numbers of objects it outperforms SAP and btDbvt hands down.
- Improved ray test and convex sweep test performance using broadphase acceleration structure
- Added btGhostObject. This helps with character controllers, explosions, triggers and other local collision queries and short ray tests. It is now used by the btKinematicCharacterController.
- Improved CMake support with 'install', VERSION info and OSX 'framework' support, thanks to ejtttje
- IBM Cell SDK 3.1 build support, thanks to emgruett, Joczhen and danieltracy
- Improved btHeightfieldTerrainShape support and new Demos/TerrainDemo
- Added compound shape export to BulletColladaConverter, thanks to JamesH for the report.
- Several fixes thanks to Ole K, related to inertia tensor computation, avoiding non-determinism and more.
- Added Extras/IFF binary chunk serialization library as preparation for in-game native platform serialization (with planned COLLADA physics conversion)
- Added SCE Physics Effects box-box collision detection for SPU/BulletMultiThreaded version, thanks to Sony Computer Entertainment Japan (SCEI)
- Moved BulletMultiThreaded and GIMPACT from Extras to /src/BulletMultiThreaded and /src/BulletCollision/Gimpact for better integration
- Removed btPoint3. It was only a typedef for btVector3, so please find and replace all btPoint3 with btVector3 in your code.
- Added CProfileManager::dumpAll() to dump detailed performance statistics to the console using printf. Add this call after stepSimulation().
- Reduced default memory pool allocation (from 40MB to 3MB), helpful for iPhone developers.
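
For code bases that cannot do the btPoint3 find-and-replace in one pass, a temporary alias in your own headers may ease the migration. The snippet below is only a sketch: the btVector3 struct in it is a simplified stand-in so the example compiles without Bullet, and sumComponents is a hypothetical legacy function, not part of the library. In a real project you would include LinearMath/btVector3.h instead of the stand-in, and remove the alias once the rename is done.

```cpp
#include <cassert>
#include <type_traits>

// Simplified stand-in for Bullet's btVector3 (assumption: illustration only).
// In a real project, include <LinearMath/btVector3.h> and delete this struct.
struct btVector3 {
    float m_x, m_y, m_z;
    btVector3(float x, float y, float z) : m_x(x), m_y(y), m_z(z) {}
    float x() const { return m_x; }
    float y() const { return m_y; }
    float z() const { return m_z; }
};

// Temporary compatibility alias kept in user code. It restores the old name
// exactly as the removed typedef defined it, so both names are the same type.
typedef btVector3 btPoint3;

// Hypothetical legacy function still written against the old name.
inline float sumComponents(const btPoint3& p) {
    return p.x() + p.y() + p.z();
}

// Both names refer to one type, so old and new call sites interoperate.
static_assert(std::is_same<btPoint3, btVector3>::value,
              "btPoint3 must alias btVector3");
```

Because the alias names the same type, a function taking btPoint3 accepts a btVector3 unchanged, which lets you rename call sites gradually rather than all at once.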

btCudaBroadphase benchmark: even for 8192 objects with little motion coherence, the timings are CUDA (btCudaBroadphase) 6ms, OPCODE Array SAP 37ms, and Bullet dynamic AABB tree (btDbvtBroadphase) 12ms. The best case for 8192 objects with the CUDA broadphase is 4ms, whereas SAP and btDbvt are practically 0ms.
Thanks to everyone for feedback, bug fixes and improvements.
See also ChangeLog
Feedback is welcome,
Thanks,
Erwin