![]() |
Data Center Works Inc |
| Home | Services | About Us | Jobs | Resources | Stories | Tools | Contact us |
On a project some time ago, we happened to be working with an engineer from HotSOS. We were fixing the customer's code, he was fixing the customer's Oracle performance with his (somewhat brilliant) profiler, http://www.hotsos.com/profiler.html
At the time, we thought this was really cool, and wondered what else should be profiled. This year we found out: shared libraries.
An Application that Did Nothing, Slowly
One of our customers has an application that spends all its time doing I/O. Slowly.
So we set out to profile it, and found that it was spending all its time in a low-level read function that did little except call system libraries. So we used the the profiler in truss, and found that it was spending a negligible amount of time in read and write.
With profiles from the program and from the system both saying “not my fault”, we had the computer version of that hardware and software salesmen pointing fingers at each other and saying “blame him”.
Pretending It was Perfmeter
We don't want to say anything more about the real customer program, but while doing a different experiment we found that the gnome-perfmeter program has almost exactly the same behavior. For a minute, let's pretend we are concerned about the performance of perfmeter.
Running truss -c -p <pid>, waiting several minutes and then typing ^C gave us the system-call usage, subject to a reasonably small margin of error in the timing:
$ truss -c -p 950 ^C syscall seconds calls errors read .003 147 15 write .007 121 fstat .002 88 ioctl .004 189 getmsg .001 44 putmsg .006 44 readv .000 20 sysinfo .000 3 pollsys .016 176 -------- ------ ---- sys totals: .043 832 15 usr time: .067 elapsed: 219.270
This means that the most expensive system calls are read and write, but together they only add up to 0.010 seconds system time, out of the 0.043 seconds for all the system calls. And that's out of a total of 0.110 seconds for both user and system time. Not much!
The program had been running for 219.27 seconds, so it had spent 219.17 of them doing nothing whatsoever.
We're not sure what it wasn't doing, but it was doing a lot of it.
Profiling the C library
The next place to look was the c library. However, it is a shared library, not a .a, so we couldn't just compile and link statically and run gprof. We needed a shared library profiler.
In a previous life, we'd developed much of apptrace(1), so shared libraries are old friends. We easily wrote a profiler library which we called libprof, and created the interposers for libc.
For our gnome-perfmeter example, libprof reports:
$ libprof /bin/gnome-perfmeter ^C Resource usage profile for process 3757, gnome-perfmeter Name Duration Pct. Calls Sec/Call ---- ------------ ----- ----- ---------- Entire program 35.90309000 100.0 1 35.90309000 Non-library time 3.45257372 9.62 1 3.45257372 ---- poll 29.03500100 80.87 186 0.15610216 fopen 0.69696600 1.94 110 0.00633605 select 0.56633600 1.58 20 0.02831680 strchr 0.44610100 1.24 392223 0.00000114 access 0.28468800 0.79 1337 0.00021293 ...
This indicates that the poll library call is the bottleneck among library calls, with 186 calls, with an average response time of 0.156 seconds. This is far larger than anything reported by strace/truss, so poll is a likely candidate for the bottleneck function in gnome-perfmeter.
Diagnosing the Problem
It was exactly the same in the customer program, except it called flock(2) right after the poll returned. Then it waited for all the other copies of itself to release their locks before doing any work. When its work was done, it unlocked the file and returned to the poll.
This turned out to be unnecessary: after some discussion and code-reading we discovered that only in one case did it need to take a lock, when the poll returned ready to read from a particular pipe. Once we moved the flock down past the read and into the switch with the pipe read, the program flew. The one case reading the pipe was still slow, but now all the other cases weren't blocked by it.
The libprof script and libraries are available on our tools page..