Training Modules¶
I'll keep notes for longer modules in a dedicated place:
Notes on shorter modules can stay here, for now...
Strace¶
January 2022
The strace
module is brief and will be useful as some situations do benefit from the usage of strace
.
Stage 2¶
Spying on your programs with strace
¶
Notes on Spying on your programs with strace
by the amazing Julia Evans:
- 🐧: Use
strace
. - 🍎: Use
dtrace
/dtruss
.
Consider systems calls as the API or interface for the OS.
- Use
open
and thenread
and thenwrite
. - Use
connect
to open a connection and thensend
andrecv
.
# strace ls 2>&1 | head -n1
execve("/usr/bin/ls", ["ls"], 0x7ffc58113fd0 /* 25 vars */) = 0
The system call used here is execve
. The only argument for the execve
system call is ls
and the exit code is 0
. Cool!
# strace -f ls cat.txt 2>&1 | grep cat
execve("/usr/bin/ls", ["ls", "cat.txt"], 0x7ffc69385350 /* 25 vars */) = 0
stat("cat.txt", {st_mode=S_IFREG|0644, st_size=29, ...}) = 0
lstat("cat.txt", {st_mode=S_IFREG|0644, st_size=29, ...}) = 0
Oooh, this is a nice trick: use ls -l /proc/N/fd
where N
is a process id. That will let you know all of the file descriptors for that process and what they point to! Here's an sample of the output for an idle PostgreSQL process on a GitLab instance:
total 0
0 dr-x------ 2 gitlab-psql gitlab-psql 0 Dec 30 18:36 .
0 dr-xr-xr-x 9 gitlab-psql gitlab-psql 0 Dec 30 18:32 ..
0 lr-x------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:36 0 -> /dev/null
0 l-wx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:36 1 -> pipe:[109584288]
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:36 10 -> /var/opt/gitlab/postgresql/data/base/16386/1247
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:37 100 -> /var/opt/gitlab/postgresql/data/base/16386/31400
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:37 101 -> /var/opt/gitlab/postgresql/data/base/16386/31401
0 lrwx------ 1 gitlab-psql gitlab-psql 64 Dec 30 18:37 102 -> /var/opt/gitlab/postgresql/data/base/16386/31408
There are a lot of syscalls: don't worry about learning them all! Knowing about a few important ones will take you a long way and the man
pages will take you even further.
You should keep these in mind, though:
open
read
execve
A few interesting use cases:
- Which config file is this thing using?
strace -f -e open mplayer Rick_astley.mp3
- Where are the logs being written?
strace -e write
- Who are you talking to?
strace -e connect
A big takeaway here is: read about strace -e
, it's awesome!
Watch where sidekiq
writes logs¶
# strace -e write -p 3010999,2236653,2236616,3010836
strace: Process 3010999 attached
strace: Process 2236653 attached
strace: Process 2236616 attached
strace: Process 3010836 attached
[pid 2236653] write(6, "{\"severity\":\"INFO\",\"time\":\"2022-"..., 1024) = 1024
[pid 2236653] write(6, "{\"severity\":\"WARN\",\"time\":\"2022-"..., 346) = 346
[pid 2236653] write(6, "{\"severity\":\"INFO\",\"time\":\"2022-"..., 1024) = 1024
[pid 2236653] write(6, "\":2.662537,\"redis_calls\":7,\"redi"..., 1024) = 1024
[pid 2236653] write(6, "712Z\",\"load_balancing_strategy\":"..., 63) = 63
You can watch when a process tries to write to access a specific path:
# strace -P /var/log/gitlab/sidekiq/current -p 2236653
strace
command line flags that jvns ❤️¶
-e
it can be used to reduce the amount of output. This is especially useful when you have an idea of what you are looking for.-f
just always add it on, this permits you to follow subprocesses. Remove it when you don't need it to avoid "omg, why isn't anything showing up?".-o
to save the output to a file. Alternately, redirectstderr
so you can pipe the output toless
orgrep
or whatever as I've done above with2>&1
.-p
to follow a running process-y
in newer versions ofstrace
to see filenames instead of file descriptor numbers! Because you are a human and that's OK!
strace
examples¶
HowtoForge has the Linux strace
command tutorial for beginners (8 examples) article.
The -r
flag will include relative timestamps for the different system calls. Use -t
instead for the system's clock time. You can pass -tt
and -ttt
for more granularity.
The -T
flag will show how much time was spent in a system call.
Ooh, that's neat! The -c
flag will print a summary! 🎉
# strace -c ls cat.txt
cat.txt
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 9 read
0.00 0.000000 0 1 write
0.00 0.000000 0 25 close
0.00 0.000000 0 1 stat
0.00 0.000000 0 24 fstat
0.00 0.000000 0 1 lstat
0.00 0.000000 0 40 mmap
0.00 0.000000 0 8 mprotect
0.00 0.000000 0 1 munmap
0.00 0.000000 0 3 brk
0.00 0.000000 0 2 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 2 ioctl
0.00 0.000000 0 8 pread64
0.00 0.000000 0 2 2 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 2 2 statfs
0.00 0.000000 0 2 1 arch_prctl
0.00 0.000000 0 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 23 openat
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 1 prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 160 5 total
This could be useful, depending on what it is you are using strace
to do.
About strace-parser
¶
What a great segue! In the README for strace-parser
, we see:
Similar to output provided by the
-c
flag, but with more detail and capable of handling multiple PIDs.
The jvns
zine mentions this a bit but keep performance in mind when using strace
:
Because
strace
may slow down the target system by up to 10x, it is not recommended for use in a production environment except as a last resort
WATCH Using strace
to understand GitLab - Part 1¶
Watch the Using Strace to Understand GitLab - Part 1: Intro video with Will!
- Don't run in a healthy production instance. Another plug for the
Wow Much Syscall
blog linked below. - Ensure via
ps
thatstrace
definitely exited!
A good default set of strace
flags:
strace -f -tt -T -y -yy -s 1024
In the video, we do:
strace -f -tt -T -y -yy -s 1024 -o /tmp/ls_trace.txt ls -la
There are ~219 lines in the ls_trace.txt
, covering 8ms.
This is the first part in the Using Strace to Understand GitLab playlist on GitLab Unfiltered.
Ah, this explains why I saw openat
and was looking for open
.
Ah, so stat
and fstat
are the same except that fstat
retrieves info given a filedescriptor while stat
accepts a pathname.
We arrive at mmap
: read info into memory so you don't have to read from disk. Cool and efficient but you have to know about this when using strace
.
Ah, non-ASCII printable characters will appear as \177ELF\2\1\1\0
...
A value of -1
where the file descriptor goes means: no file, just memory!
For each shared library observed in ldd
, we open, read, mmap
, mprotect
, close...basically.
If no bytes are read: we've arrived at the end of the file. The locale files are read to see what language to use. You might observe that many of them don't exist.
This pattern repeats:
- openat
- fstat
- mmap
- close
After all that, we finally get to the ls
part!
getdents
- Getd
irectoryent
itities.
Stage 3: Challenge¶
Run
strace
onls
.
# strace -f -tt -T -y -yy -s 1024 -o /tmp/strace-ls-cat.txt ls -la cat.txt
-rw-r--r-- 1 root root 29 Dec 30 18:32 cat.txt
# wc -l /tmp/strace-ls-cat.txt
215 /tmp/strace-ls-cat.txt
We need to locate a few items:
The execution command of ls
We see that here:
3099438 19:49:19.646278 execve("/usr/bin/ls", ["ls", "-la", "cat.txt"], 0x7ffd11f6af08 /* 25 vars */) = 0 <0.000187>
Opening a lib file
For this and the next few tasks, I'll use the libpcre2-8.so.0
library. It's opened here:
3099438 19:49:19.647969 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3</usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.9.0> <0.000012>
That same lib file being loaded into memory
It gets loaded in here:
3099438 19:49:19.648088 mmap(NULL, 590632, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.9.0>, 0) = 0x7f5bbf219000 <0.000011>
3099438 19:49:19.648123 mmap(0x7f5bbf21b000, 413696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.9.0>, 0x2000) = 0x7f5bbf21b000 <0.000013>
3099438 19:49:19.648160 mmap(0x7f5bbf280000, 163840, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.9.0>, 0x67000) = 0x7f5bbf280000 <0.000010>
3099438 19:49:19.648195 mmap(0x7f5bbf2a8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.9.0>, 0x8e000) = 0x7f5bbf2a8000 <0.000010>
Finally, that lib file being closed
That happens here:
3099438 19:49:19.648242 close(3</usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.9.0>) = 0 <0.000008>
Stage 4: strace git
¶
Watch Using Strace to Understand GitLab - Part 2: Git.
Let's strace
a few git
commands!
Recall that we like to use something like this:
strace -fttTyyy -s 1024 -o /tmp/git_status git status
strace git status
¶
The output of ldd $(which git)
can guide you to libraries to look for.
Hmm, /dev/null
is opened and immediately closed.
We look around for git
configuration files in all the different places that file can be. Many don't exist but that is ok, fine, normal, expected.
There's a quite a bit of duplicated work (checking the same file multiple times) but the overall elapsed time isn't very long so it's fine.
O_EXCL
Ensure that this call creates the file.
> If O_CREAT and O_EXCL are set, _open_() shall fail if the file exists.
From man open.
The index file is unlinked. Fun: see the impact this can have. There's a related error we've seen on emergencies. Add what you are thinking of below, Brie:
strace git commit
¶
The lseek
syscall will look for a specified location within the current file., relative to the beginning. From the man
page:
reposition read/write file offset
That's odd: /usr/local/libexec/git-core/git
is the same file as what's at /usr/local/bin/git
. Not symlinks just two copies of the same file.
The argument for wait4
is the PID that it's waiting on...
Stage 5: Challenge¶
For a new repository, run strace git commit -a -m ‘initial message’
.
After doing mkdir cat && cd cat && git init . && date > cat.txt && git add cat.txt
on a system where user.email
and user.name
are set properly, I did:
strace -f -tt -T -y -yy -s 1024 -o /tmp/strace-git-commit-am.txt git commit -a -m"Initial message."
The resulting file is 713 lines long. Nearly 1 in 7 entries is ENOENT
(looking for a file or directory that does not exist).
The change from temp file to hash file.
See this line for that change:
955751 19:46:43.322665 link(".git/objects/e8/tmp_obj_OGTNqv", ".git/objects/e8/f6bdf68f3ecb8293a41f9d76b37790a5d2a89a") = 0 <0.000016>
The temp file in question is .git/objects/e8/tmp_obj_OGTNqv
and the hashed object path is .git/objects/e8/f6bdf68f3ecb8293a41f9d76b37790a5d2a89a
. The link syscall makes a new name for an existing file.
The index file being replaced to include the new hash.
If I understand what this module is asking: the line below shows where the index file (.git/index
) is replaced with the new contents from index.lock
:
955751 19:46:43.330415 rename("/home/bcarranza/cat/.git/index.lock", "/home/bcarranza/cat/.git/index") = 0 <0.000110>
Show the exclusive lock of the master file.
In this line, we see the corresponding item in .git/refs/heads
opened with an exclusive lock (O_EXCL
):
955751 19:46:43.326613 openat(AT_FDCWD, "/home/bcarranza/cat/.git/refs/heads/master.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3</home/bcarranza/cat/.git/refs/heads/master.lock> <0.000042>
The child process executing a command.
Observe the different process IDs. The parent process ID is 955751
and the child PID is 955752
. We see the child process executing a command with the execve
syscall in the line shown below:
955752 19:46:43.335458 execve("/usr/lib/git-core/git", ["/usr/lib/git-core/git", "gc", "--auto"], 0x556f3ef2a060 /* 21 vars */ <unfinished ...>
READmore¶
A few links I found helpful along the way.
strace
cheat sheet from the packagecloud blog- The posts that
jvns
has written aboutstrace
- A good introduction is What problems do people solve with
strace
?
- A good introduction is What problems do people solve with
- instruction pointer
- strace Wow Much Syscall - on the impact of
strace
on performance.