glibc Compatibility Problem Solved
April 30, 2008 8:30 AM
Filed Under: Adaptive Server Enterprise (ASE), Linux Kernel
I am happy to say that we have fixed a nuisance compatibility issue.
As many of you know, ASE has had trouble getting along with glibc 2.4 or newer. This has led to the oft-referenced LD_POINTER_GUARD environment variable workaround for getting ASE to boot on SLES 10 and RHEL 5. It has also left ASE simply unable to run on many of the bleeding-edge distros.
This is now solved. The work was done under CR 479363 and is available in 15.0.2 ESD #4 and 15.0.1 Cluster Edition ESD #2. The LD_POINTER_GUARD workaround is no longer necessary, and Chris Brown has posted that he has finally been able to upgrade to Ubuntu Hardy.
What was the cause of this? ASE implements its own internal threading system. To switch from thread to thread (on Linux), we used the standard setjmp/longjmp library calls. setjmp essentially takes a snapshot of your context. This snapshot (the jump buffer) is later passed to longjmp to switch back to that saved context. The trick is that our threads have their stacks in shared memory, which allows them to be scheduled across different engines (OS processes). This is part of the Virtual Server Architecture we had in place well before it was hip to be virtual. When we spawn a new thread, there is no existing jump buffer to longjmp to, so we have to craft one by hand, filling in a stack pointer that points at the thread's stack in shared memory. That is the problem.
In glibc 2.4, a change was made to swizzle (mangle) the stack pointer stored in the jump buffer: setjmp takes the current stack pointer and swizzles it before saving it, and longjmp unswizzles it on the other end. Because we hand-crafted the initial jump buffer, it contained a raw, unswizzled stack pointer. When longjmp unswizzled it, garbage came out, and that led to bus errors or segmentation violations (core dumps).
This is an interesting example of what happens when you poke around in somebody else’s data structure. The jump buffer (and the stack pointer contained within) is the property of the C runtime library. ASE really had no business poking its own values in there. It worked for many years, but because the C runtime owns the data, it was free to change it around, which broke ASE’s assumption.
A while ago I wrote a prototype to fix this using the officially supported context APIs from the C library, but then I got swallowed up by the cluster project. Fortunately my colleague Chaitanya has picked up the pieces and capably done the implementation. The end result is that this issue has hopefully been put to bed forever.
Cheers,
Dave
Posted by David Wein on April 30, 2008 8:30 AM
Comments
Michael Peppler (www.peppler.org)
Hi
I can confirm this - I've installed 15.0.2 ESD 4 under Fedora Core 8 with no problems whatsoever.
Michael