Skip to content

Training Modules

I'll keep notes for longer modules in a dedicated place:

Notes on shorter modules can stay here, for now...


January 2022

The strace module is brief and will be useful as some situations do benefit from the usage of strace.

Stage 2

Spying on your programs with strace

Notes on Spying on your programs with strace by the amazing Julia Evans:

  • 🐧: Use strace.
  • 🍎: Use dtrace/dtruss.

Consider systems calls as the API or interface for the OS.

  • Use open and then read and then write.
  • Use connect to open a connection and then send and recv.
# strace ls  2>&1 | head -n1
execve("/usr/bin/ls", ["ls"], 0x7ffc58113fd0 /* 25 vars */) = 0

The system call used here is execve. The only argument for the execve system call is ls and the exit code is 0. Cool!

# strace -f ls cat.txt  2>&1  | grep cat
execve("/usr/bin/ls", ["ls", "cat.txt"], 0x7ffc69385350 /* 25 vars */) = 0
stat("cat.txt", {st_mode=S_IFREG|0644, st_size=29, ...}) = 0
lstat("cat.txt", {st_mode=S_IFREG|0644, st_size=29, ...}) = 0

Oooh, this is a nice trick: use ls -l /proc/N/fd where N is a process id. That will let you know all of the file descriptors for that process and what they point to! Here's an sample of the output for an idle PostgreSQL process on a GitLab instance:

total 0
0 dr-x------ 2 gitlab-psql gitlab-psql  0 Dec 30 18:36 .
0 dr-xr-xr-x 9 gitlab-psql gitlab-psql  0 Dec 30 18:32 ..
0 lr-x------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:36 0 -> /dev/null
0 l-wx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:36 1 -> pipe:[109584288]
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:36 10 -> /var/opt/gitlab/postgresql/data/base/16386/1247
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:37 100 -> /var/opt/gitlab/postgresql/data/base/16386/31400
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:37 101 -> /var/opt/gitlab/postgresql/data/base/16386/31401
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:37 102 -> /var/opt/gitlab/postgresql/data/base/16386/31408

There are a lot of syscalls: don't worry about learning them all! Knowing about a few important ones will take you a long way and the man pages will take you even further.

You should keep these in mind, though:

  • open
  • read
  • execve

A few interesting use cases:

  • Which config file is this thing using?
    • strace -f -e open mplayer Rick_astley.mp3
  • Where are the logs being written?
    • strace -e write
  • Who are you talking to?
    • strace -e connect

A big takeaway here is: read about strace -e, it's awesome!

Watch where sidekiq writes logs
# strace -e write -p 3010999,2236653,2236616,3010836
strace: Process 3010999 attached
strace: Process 2236653 attached
strace: Process 2236616 attached
strace: Process 3010836 attached

[pid 2236653] write(6, "{\"severity\":\"INFO\",\"time\":\"2022-"..., 1024) = 1024
[pid 2236653] write(6, "{\"severity\":\"WARN\",\"time\":\"2022-"..., 346) = 346
[pid 2236653] write(6, "{\"severity\":\"INFO\",\"time\":\"2022-"..., 1024) = 1024
[pid 2236653] write(6, "\":2.662537,\"redis_calls\":7,\"redi"..., 1024) = 1024
[pid 2236653] write(6, "712Z\",\"load_balancing_strategy\":"..., 63) = 63

You can watch when a process tries to write to access a specific path:

# strace -P /var/log/gitlab/sidekiq/current -p 2236653
strace command line flags that jvns ❤️
  • -e it can be used to reduce the amount of output. This is especially useful when you have an idea of what you are looking for.
  • -f just always add it on, this permits you to follow subprocesses. Remove it when you don't need it to avoid "omg, why isn't anything showing up?".
  • -o to save the output to a file. Alternately, redirect stderr so you can pipe the output to less or grep or whatever as I've done above with 2>&1.
  • -p to follow a running process
  • -y in newer versions of strace to see filenames instead of file descriptor numbers! Because you are a human and that's OK!

strace examples

HowtoForge has the Linux strace command tutorial for beginners (8 examples) article.

The -r flag will include relative timestamps for the different system calls. Use -t instead for the system's clock time. You can pass -tt and -ttt for more granularity.

The -T flag will show how much time was spent in a system call.

Ooh, that's neat! The -c flag will print a summary! 🎉

# strace -c ls cat.txt
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         9           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0        25           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0        24           fstat
  0.00    0.000000           0         1           lstat
  0.00    0.000000           0        40           mmap
  0.00    0.000000           0         8           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         2         2 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        23           openat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   160         5 total

This could be useful, depending on what it is you are using strace to do.

About strace-parser

What a great segue! In the README for strace-parser, we see:

Similar to output provided by the -c flag, but with more detail and capable of handling multiple PIDs.

The jvns zine mentions this a bit but keep performance in mind when using strace:

Because strace may slow down the target system by up to 10x, it is not recommended for use in a production environment except as a last resort

WATCH Using strace to understand GitLab - Part 1

Watch the Using Strace to Understand GitLab - Part 1: Intro video with Will!

  • Don't run in a healthy production instance. Another plug for the Wow Much Syscall blog linked below.
  • Ensure via ps that strace definitely exited!

A good default set of strace flags:

strace -f -tt -T -y -yy -s 1024

In the video, we do:

strace -f -tt -T -y -yy -s 1024 -o /tmp/ls_trace.txt ls -la

There are ~219 lines in the ls_trace.txt, covering 8ms.

This is the first part in the Using Strace to Understand GitLab playlist on GitLab Unfiltered.

Ah, this explains why I saw openat and was looking for open.

Ah, so stat and fstat are the same except that fstat retrieves info given a filedescriptor while stat accepts a pathname.

We arrive at mmap: read info into memory so you don't have to read from disk. Cool and efficient but you have to know about this when using strace.

Ah, non-ASCII printable characters will appear as \177ELF\2\1\1\0...

A value of -1 where the file descriptor goes means: no file, just memory!

For each shared library observed in ldd, we open, read, mmap, mprotect, close...basically.

If no bytes are read: we've arrived at the end of the file. The locale files are read to see what language to use. You might observe that many of them don't exist.

This pattern repeats: - openat - fstat - mmap - close

After all that, we finally get to the ls part!

  • getdents - Get directory entitities.

Stage 3: Challenge

Run strace on ls.

# strace -f -tt -T -y -yy -s 1024 -o /tmp/strace-ls-cat.txt ls -la cat.txt
-rw-r--r-- 1 root root 29 Dec 30 18:32 cat.txt
# wc -l /tmp/strace-ls-cat.txt
215 /tmp/strace-ls-cat.txt

We need to locate a few items:

The execution command of ls

We see that here:

3099438 19:49:19.646278 execve("/usr/bin/ls", ["ls", "-la", "cat.txt"], 0x7ffd11f6af08 /* 25 vars */) = 0 <0.000187>

Opening a lib file

For this and the next few tasks, I'll use the library. It's opened here:

3099438 19:49:19.647969 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/", O_RDONLY|O_CLOEXEC) = 3</usr/lib/x86_64-linux-gnu/> <0.000012>

That same lib file being loaded into memory

It gets loaded in here:

3099438 19:49:19.648088 mmap(NULL, 590632, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/>, 0) = 0x7f5bbf219000 <0.000011>
3099438 19:49:19.648123 mmap(0x7f5bbf21b000, 413696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/>, 0x2000) = 0x7f5bbf21b000 <0.000013>
3099438 19:49:19.648160 mmap(0x7f5bbf280000, 163840, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/>, 0x67000) = 0x7f5bbf280000 <0.000010>
3099438 19:49:19.648195 mmap(0x7f5bbf2a8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/>, 0x8e000) = 0x7f5bbf2a8000 <0.000010>

Finally, that lib file being closed

That happens here:

3099438 19:49:19.648242 close(3</usr/lib/x86_64-linux-gnu/>) = 0 <0.000008>

Stage 4: strace git

Watch Using Strace to Understand GitLab - Part 2: Git.

Let's strace a few git commands!

Recall that we like to use something like this:

strace -fttTyyy -s 1024 -o /tmp/git_status git status

strace git status

The output of ldd $(which git) can guide you to libraries to look for. Hmm, /dev/null is opened and immediately closed. We look around for git configuration files in all the different places that file can be. Many don't exist but that is ok, fine, normal, expected. There's a quite a bit of duplicated work (checking the same file multiple times) but the overall elapsed time isn't very long so it's fine.

O_EXCL Ensure that this call creates the file.

> If O_CREAT and O_EXCL are set, _open_() shall fail if the file exists.

From man open.

The index file is unlinked. Fun: see the impact this can have. There's a related error we've seen on emergencies. Add what you are thinking of below, Brie:

strace git commit

The lseek syscall will look for a specified location within the current file., relative to the beginning. From the man page:

reposition read/write file offset

That's odd: /usr/local/libexec/git-core/git is the same file as what's at /usr/local/bin/git. Not symlinks just two copies of the same file.

The argument for wait4 is the PID that it's waiting on...

Stage 5: Challenge

For a new repository, run strace git commit -a -m ‘initial message’.

After doing mkdir cat && cd cat && git init . && date > cat.txt && git add cat.txt on a system where and are set properly, I did:

strace -f -tt -T -y -yy -s 1024 -o /tmp/strace-git-commit-am.txt git commit -a -m"Initial message."

The resulting file is 713 lines long. Nearly 1 in 7 entries is ENOENT (looking for a file or directory that does not exist).

The change from temp file to hash file.

See this line for that change:

955751 19:46:43.322665 link(".git/objects/e8/tmp_obj_OGTNqv", ".git/objects/e8/f6bdf68f3ecb8293a41f9d76b37790a5d2a89a") = 0 <0.000016>

The temp file in question is .git/objects/e8/tmp_obj_OGTNqv and the hashed object path is .git/objects/e8/f6bdf68f3ecb8293a41f9d76b37790a5d2a89a. The link syscall makes a new name for an existing file.

The index file being replaced to include the new hash.

If I understand what this module is asking: the line below shows where the index file (.git/index) is replaced with the new contents from index.lock:

955751 19:46:43.330415 rename("/home/bcarranza/cat/.git/index.lock", "/home/bcarranza/cat/.git/index") = 0 <0.000110>

Show the exclusive lock of the master file.

In this line, we see the corresponding item in .git/refs/heads opened with an exclusive lock (O_EXCL):

955751 19:46:43.326613 openat(AT_FDCWD, "/home/bcarranza/cat/.git/refs/heads/master.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3</home/bcarranza/cat/.git/refs/heads/master.lock> <0.000042>

The child process executing a command.

Observe the different process IDs. The parent process ID is 955751 and the child PID is 955752. We see the child process executing a command with the execve syscall in the line shown below:

955752 19:46:43.335458 execve("/usr/lib/git-core/git", ["/usr/lib/git-core/git", "gc", "--auto"], 0x556f3ef2a060 /* 21 vars */ <unfinished ...>


A few links I found helpful along the way.