Using perf on Linux =================== Introduction ------------ ``perf`` is a tool to analyze the performance of applications and of the kernel, on Linux-based systems. It relies on syscall ``perf_event_open`` (http://man7.org/linux/man-pages/man2/perf_event_open.2.html) to access performance monitoring facilities provided by the kernel. These facilities consist in: * tracepoints (probes) in the kernel, the C library (glibc), some interpreters, etc. * processor counters from the PMU (Performance Instrumentation Unit), like Intel PMC (Performance Monitoring Counter), replaced by Intel PCM (Processor Counter Monitor), or PIC (Performance Instrumentation Counter) * hardware-assisted tracing, like Intel PT (Processor Tracing) The access of the performance events system by unprivileged users is configured through sysctl ``kernel.perf_event_paranoid`` (file ``/proc/sys/kernel/perf_event_paranoid``). The value of this setting is documented on https://www.kernel.org/doc/Documentation/sysctl/kernel.txt: * ``-1``: Allow use of (almost) all events by all users. Ignore ``mlock`` limit after ``perf_event_mlock_kb`` without ``CAP_IPC_LOCK`` * ``>= 0``: Disallow ``ftrace`` function tracepoint by users without ``CAP_SYS_ADMIN``. Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN`` * ``>= 1``: Disallow CPU event access by users without ``CAP_SYS_ADMIN`` * ``>= 2``: Disallow kernel profiling by users without ``CAP_SYS_ADMIN`` Usage ----- The tool named ``perf`` works with subcommands (``stat``, ``record``, ``report``...). .. code-block:: sh # Enumerate all symbolic event types perf list # Look for events related to KVM hypervisor perf list 'kvm:*' In order to collect several statistics about a command: .. code-block:: sh perf stat $COMMAND Example with ``uname``: .. code-block:: text # perf stat uname Linux Performance counter stats for 'uname': 0.50 msec task-clock # 0.551 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 67 page-faults # 0.133 M/sec 1,837,945 cycles # 3.656 GHz 1,266,497 instructions # 0.69 insn per cycle 284,608 branches # 566.071 M/sec 8,956 branch-misses # 3.15% of all branches 0.000911814 seconds time elapsed 0.001001000 seconds user 0.000000000 seconds sys In order to record a trace of a command: .. code-block:: sh perf record $COMMAND # --branch-any: enable taken branch stack sampling # --call-graph=dwarf: enable call-graph (stack chain/backtrace) recording with DWARF information perf record --branch-any --call-graph=dwarf $COMMAND # Record a running process during 30 seconds # -a = --all-cpus: system-wide collection from all CPUs # -g (like --call-graph=fp): enable call-graph (stack chain/backtrace) recording # -p = --pid: record events on existing process ID (comma separated list) timeout 30s perf record -a -g -p $(pidof $MYPROCESS) This creates a file named ``perf.data``, that can be analyzed with other subcommands. .. code-block:: sh # Show perf.data in an ncurses browser (TUI) if possible perf report # Dump the raw trace in ASCII perf report -D perf report --dump-raw-trace # Display the trace output perf script # Show perf.data as: # * a text report # * with a column for sample count # * with call stacks # * with data coalesced and percentages perf report --stdio -n -g folded # List fields of header if the record was done with option -a perf script --header -F comm,pid,tid,cpu,time,event,ip,sym,dso The trace can also be analyzed with a GUI such as https://github.com/KDAB/hotspot. When Intel PT (Processor Tracing) is available on the CPU, the following commands can be used to trace a program (from https://lkml.org/lkml/2019/11/27/160): .. code-block:: sh perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' $COMMAND perf script -F +brstackinsn --xed --itrace=i1usl100 More recent versions of ``perf`` introduced an equivalent of ``strace`` without using the ``ptrace`` syscall: .. code-block:: sh perf trace --call-graph=dwarf $COMMAND # Or, with perf record: perf record -e 'raw_syscalls:*' $COMMAND # Trace with "augmented syscalls" (in order to see string parameters, for example) perf trace -e /usr/lib/perf/examples/bpf/augmented_raw_syscalls.c $COMMAND Flame Graphs ------------ Using https://github.com/brendangregg/FlameGraph, it is very simple to produce a flamegraph out of a trace. This can be useful for example to find in a program what functions take much time and need to be better optimized. .. code-block:: sh # Record stack samples at 99 Hertz during 60 seconds # (both userspace and kernel-space stacks, all processes) perf record -F 99 -a -g -- sleep 60 # Fold the stacks into a text file perf script | ./stackcollapse-perf.pl --all > out.folded # Filter on names of processes, functions... and create a flamegraph grep my_application < out.folded | ./flamegraph.pl --color=java > graph.svg Another project enables producing flamegraphs for Rust projects: https://github.com/ferrous-systems/flamegraph Documentation ------------- * https://perf.wiki.kernel.org/index.php/Tutorial perf Wiki - Tutorial * http://www.brendangregg.com/perf.html Linux perf Examples, documentation, links, and more! * http://www.brendangregg.com/flamegraphs.html Flame Graphs * https://github.com/brendangregg/perf-tools perf-tools GitHub project * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-record.txt perf-record man page * https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/ Transparent Hugepages: measuring the performance impact * https://twitter.com/b0rk/status/945900285460926464 perf cheat sheet by ulia Evans