Using perf on Linux¶

Introduction¶

perf is a tool to analyze the performance of applications and of the kernel, on Linux-based systems. It relies on syscall perf_event_open (http://man7.org/linux/man-pages/man2/perf_event_open.2.html) to access performance monitoring facilities provided by the kernel. These facilities consist in:

tracepoints (probes) in the kernel, the C library (glibc), some interpreters, etc.
processor counters from the PMU (Performance Instrumentation Unit), like Intel PMC (Performance Monitoring Counter), replaced by Intel PCM (Processor Counter Monitor), or PIC (Performance Instrumentation Counter)
hardware-assisted tracing, like Intel PT (Processor Tracing)

The access of the performance events system by unprivileged users is configured through sysctl kernel.perf_event_paranoid (file /proc/sys/kernel/perf_event_paranoid). The value of this setting is documented on https://www.kernel.org/doc/Documentation/sysctl/kernel.txt:

-1: Allow use of (almost) all events by all users. Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN. Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

Usage¶

The tool named perf works with subcommands (stat, record, report…).

# Enumerate all symbolic event types
perf list

# Look for events related to KVM hypervisor
perf list 'kvm:*'

In order to collect several statistics about a command:

perf stat $COMMAND

Example with uname:

# perf stat uname
Linux

 Performance counter stats for 'uname':

              0.50 msec task-clock                #    0.551 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                67      page-faults               #    0.133 M/sec
         1,837,945      cycles                    #    3.656 GHz
         1,266,497      instructions              #    0.69  insn per cycle
           284,608      branches                  #  566.071 M/sec
             8,956      branch-misses             #    3.15% of all branches

       0.000911814 seconds time elapsed

       0.001001000 seconds user
       0.000000000 seconds sys

In order to record a trace of a command:

perf record $COMMAND

# --branch-any: enable taken branch stack sampling
# --call-graph=dwarf: enable call-graph (stack chain/backtrace) recording with DWARF information
perf record --branch-any --call-graph=dwarf $COMMAND

# Record a running process during 30 seconds
# -a = --all-cpus: system-wide collection from all CPUs
# -g (like --call-graph=fp): enable call-graph (stack chain/backtrace) recording
# -p = --pid: record events on existing process ID (comma separated list)
timeout 30s perf record -a -g -p $(pidof $MYPROCESS)

This creates a file named perf.data, that can be analyzed with other subcommands.

# Show perf.data in an ncurses browser (TUI) if possible
perf report

# Dump the raw trace in ASCII
perf report -D
perf report --dump-raw-trace

# Display the trace output
perf script

# Show perf.data as:
# * a text report
# * with a column for sample count
# * with call stacks
# * with data coalesced and percentages
perf report --stdio -n -g folded

# List fields of header if the record was done with option -a
perf script --header -F comm,pid,tid,cpu,time,event,ip,sym,dso

The trace can also be analyzed with a GUI such as https://github.com/KDAB/hotspot.

When Intel PT (Processor Tracing) is available on the CPU, the following commands can be used to trace a program (from https://lkml.org/lkml/2019/11/27/160):

perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' $COMMAND
perf script -F +brstackinsn --xed --itrace=i1usl100

More recent versions of perf introduced an equivalent of strace without using the ptrace syscall:

perf trace --call-graph=dwarf $COMMAND

# Or, with perf record:
perf record -e 'raw_syscalls:*' $COMMAND

# Trace with "augmented syscalls" (in order to see string parameters, for example)
perf trace -e /usr/lib/perf/examples/bpf/augmented_raw_syscalls.c $COMMAND

Flame Graphs¶

Using https://github.com/brendangregg/FlameGraph, it is very simple to produce a flamegraph out of a trace. This can be useful for example to find in a program what functions take much time and need to be better optimized.

# Record stack samples at 99 Hertz during 60 seconds
# (both userspace and kernel-space stacks, all processes)
perf record -F 99 -a -g -- sleep 60

# Fold the stacks into a text file
perf script | ./stackcollapse-perf.pl --all > out.folded

# Filter on names of processes, functions... and create a flamegraph
grep my_application < out.folded | ./flamegraph.pl --color=java > graph.svg

Another project enables producing flamegraphs for Rust projects: https://github.com/ferrous-systems/flamegraph

Documentation¶

https://perf.wiki.kernel.org/index.php/Tutorial perf Wiki - Tutorial
http://www.brendangregg.com/perf.html Linux perf Examples, documentation, links, and more!
http://www.brendangregg.com/flamegraphs.html Flame Graphs
https://github.com/brendangregg/perf-tools perf-tools GitHub project
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-record.txt perf-record man page
https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/ Transparent Hugepages: measuring the performance impact
https://twitter.com/b0rk/status/945900285460926464 perf cheat sheet by ulia Evans

Using perf on Linux¶

Introduction¶

Usage¶

Flame Graphs¶

Documentation¶

Table of Contents

Previous topic

Next topic

This Page