Without taking any measurements, reading up a bit on Graphite's architecture can help shed some light on what is going on under the hood. At the very minimum, we know the following about the application:
- Receives metrics over a network interface
- Stores metrics to disk
- Generates graphs in response to requests over a network interface
- I/O-bound from disk I/O
- I/O bound from network I/O
- CPU-bound from I/O co-ordination
I'll begin with examining CPU utilization of the process.
pidstat -u -p 25430 1Here, I've used the `pidstat` utility while specifying the PID of the carbon-relay instance, setting the CPU statistics flag and requesting updates every 1 second.
Linux 3.13.0-43-generic (stats.tunnelbear.com) 16-02-10 _x86_64_ (4 CPU)
03:24:05 AM UID PID %usr %system %guest %CPU CPU Command
03:24:06 AM 999 25430 57.00 2.00 0.00 59.00 1 carbon-relay.py
03:24:07 AM 999 25430 49.00 2.00 0.00 51.00 2 carbon-relay.py
03:24:08 AM 999 25430 71.00 2.00 0.00 73.00 2 carbon-relay.py
03:24:09 AM 999 25430 50.00 2.00 0.00 52.00 0 carbon-relay.py
03:24:10 AM 999 25430 26.00 2.00 0.00 28.00 2 carbon-relay.py
03:24:11 AM 999 25430 38.00 1.00 0.00 39.00 2 carbon-relay.py
03:24:12 AM 999 25430 38.00 1.00 0.00 39.00 2 carbon-relay.py
03:24:13 AM 999 25430 44.00 2.00 0.00 46.00 2 carbon-relay.py
03:24:14 AM 999 25430 30.00 2.00 0.00 32.00 2 carbon-relay.py
03:24:15 AM 999 25430 35.00 0.00 0.00 35.00 2 carbon-relay.py
03:24:16 AM 999 25430 68.00 2.00 0.00 70.00 2 carbon-relay.py
03:24:17 AM 999 25430 76.00 1.00 0.00 77.00 2 carbon-relay.py
03:24:18 AM 999 25430 70.00 3.00 0.00 73.00 2 carbon-relay.py
03:24:19 AM 999 25430 41.00 4.00 0.00 45.00 2 carbon-relay.py
03:24:20 AM 999 25430 36.00 2.00 0.00 38.00 2 carbon-relay.py
03:24:21 AM 999 25430 32.00 2.00 0.00 34.00 2 carbon-relay.py
03:24:22 AM 999 25430 28.00 1.00 0.00 29.00 2 carbon-relay.py
03:24:23 AM 999 25430 38.00 2.00 0.00 40.00 2 carbon-relay.py
03:24:24 AM 999 25430 53.00 2.00 0.00 55.00 0 carbon-relay.py
03:24:25 AM 999 25430 71.00 1.00 0.00 72.00 2 carbon-relay.py
03:24:26 AM 999 25430 78.00 3.00 0.00 81.00 2 carbon-relay.py
^C
Average: 999 25430 49.00 1.86 0.00 50.86 - carbon-relay.py
Some interesting observations are:
- The `carbon-relay` instance is typically serviced by CPU 2 but it is not strictly pinned to it.
- The average total percentage of CPU time used by the `carbon-relay` instance is around 50%.
- Around 96% of the time the `carbon-relay` instance is executing at the user level leaving a small percentage of associated work within kernel space.
pidstat -u -p 24569 1Using a similar `pidstat` call while targeting the PID of a carbon-cache instance yields the following comparative observations:
Linux 3.13.0-43-generic (stats.tunnelbear.com) 16-02-10 _x86_64_ (4 CPU)
03:30:35 AM UID PID %usr %system %guest %CPU CPU Command
03:30:36 AM 999 24569 16.00 8.00 0.00 24.00 1 carbon-cache.py
03:30:37 AM 999 24569 16.00 6.00 0.00 22.00 2 carbon-cache.py
03:30:38 AM 999 24569 10.00 4.00 0.00 14.00 2 carbon-cache.py
03:30:39 AM 999 24569 7.00 2.00 0.00 9.00 2 carbon-cache.py
03:30:40 AM 999 24569 6.00 3.00 0.00 9.00 2 carbon-cache.py
03:30:41 AM 999 24569 8.00 4.00 0.00 12.00 2 carbon-cache.py
03:30:42 AM 999 24569 9.00 2.00 0.00 11.00 3 carbon-cache.py
03:30:43 AM 999 24569 18.00 10.00 0.00 28.00 0 carbon-cache.py
03:30:44 AM 999 24569 12.00 5.00 0.00 17.00 3 carbon-cache.py
03:30:45 AM 999 24569 8.00 3.00 0.00 11.00 2 carbon-cache.py
03:30:46 AM 999 24569 17.00 6.00 0.00 23.00 1 carbon-cache.py
03:30:47 AM 999 24569 4.00 3.00 0.00 7.00 3 carbon-cache.py
03:30:48 AM 999 24569 5.00 0.00 0.00 5.00 1 carbon-cache.py
03:30:49 AM 999 24569 11.00 4.00 0.00 15.00 1 carbon-cache.py
03:30:50 AM 999 24569 17.00 8.00 0.00 25.00 1 carbon-cache.py
03:30:51 AM 999 24569 10.00 8.00 0.00 18.00 1 carbon-cache.py
03:30:52 AM 999 24569 8.00 3.00 0.00 11.00 1 carbon-cache.py
03:30:53 AM 999 24569 5.00 3.00 0.00 8.00 0 carbon-cache.py
03:30:54 AM 999 24569 15.00 7.00 0.00 22.00 1 carbon-cache.py
03:30:55 AM 999 24569 11.00 6.00 0.00 17.00 1 carbon-cache.py
03:30:56 AM 999 24569 4.00 1.00 0.00 5.00 0 carbon-cache.py
03:30:57 AM 999 24569 12.00 8.00 0.00 20.00 0 carbon-cache.py
03:30:58 AM 999 24569 15.00 5.00 0.00 20.00 0 carbon-cache.py
03:30:59 AM 999 24569 19.00 10.00 0.00 29.00 0 carbon-cache.py
03:31:00 AM 999 24569 9.90 5.94 0.00 15.84 0 carbon-cache.py
03:31:01 AM 999 24569 12.00 6.00 0.00 18.00 1 carbon-cache.py
03:31:02 AM 999 24569 9.00 7.00 0.00 16.00 0 carbon-cache.py
03:31:03 AM 999 24569 3.00 1.00 0.00 4.00 2 carbon-cache.py
03:31:04 AM 999 24569 14.00 4.00 0.00 18.00 1 carbon-cache.py
03:31:05 AM 999 24569 10.00 3.00 0.00 13.00 1 carbon-cache.py
03:31:06 AM 999 24569 18.00 9.00 0.00 27.00 1 carbon-cache.py
^C
Average: 999 24569 10.93 5.00 0.00 15.93 - carbon-cache.py
- The `carbon-relay` instance taxes processing resources more than the previously examined `carbon-cache` instance.
- The `carbon-relay` instance executes mainly in user space while it's counterpart operates at a 2:1 ratio between user and kernel space (most likely due to disk I/O).
01:22:44 AM UID PID cswch/s nvcswch/s Command
Average: 999 12028 66.94 2.47 carbon-cache.py
Average: 999 12043 71.11 164.32 carbon-relay.pyThe average number of voluntary context switches per second are similar while average non-voluntary context switches are much higher for the `carbon-relay` instance.
Voluntary context switches occur when a task blocks because it requires a resource that is unavailable. Comparing these two processes in terms of voluntary context switches, it seems as if the values are quite close. Other unrelated processes on the system have values closer to zero while also having CPU time quite low. Although these processes are functionally related due to them being apart of the same application, they have completely different CPU workloads. Their similarity in voluntary context switch rate could use some more investigation.
Non-voluntary context switches occur when a task's time slice runs to completion and it is forced away from processing in favour of the next task in line. Looking at the averages above, the non-voluntary context switches per second that the `carbon-relay` instance is experiencing sticks out and could use some further investigation.
Digging deeper past this initial triage can be done with other tools such as SystemTap or perf to uncover the code paths contributing to this performance.