jq For Forensics

jq is a tremendously useful tool for dealing with JSON data. But the documentation that exists seems to be targeted at developers parsing deeply nested JSON structures to transform them into other JSON structures. In my DFIR role, I typically deal with streams of fairly simple JSON records, usually some sort of log, that I need to transform into structured text, such as comma-separated (CSV) or tab-separated (TSV) output. I’ve spent a lot of time working through reference manuals and endless Stack Overflow postings to get to a reasonable level with jq. I wanted to share some of the things I’ve learned along the way.

Start With The Basics

At its simplest, jq is an excellent JSON pretty printer:

$ jq . journal.json
{
"_MACHINE_ID": "0f2f13b9dce0451591ae0dc418f6c96f",
"_RUNTIME_SCOPE": "system",
"_HOSTNAME": "vbox",
"_SOURCE_BOOTTIME_TIMESTAMP": "0",
"MESSAGE": "Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)",
"__MONOTONIC_TIMESTAMP": "6400064",
"_SOURCE_MONOTONIC_TIMESTAMP": "0",
"_BOOT_ID": "2a5a598d4f6142c7b7719eed38c1a2b9",
"SYSLOG_IDENTIFIER": "kernel",
"_TRANSPORT": "kernel",
"PRIORITY": "5",
"SYSLOG_FACILITY": "0",
"__CURSOR": "s=0a047604dca842218e0807bc796d4cb7;i=1;b=2a5a598d4f6142c7b7719eed38c1a2b9;m=61a840;t=64dc728142e95;x=852824913ddff90e",
"__REALTIME_TIMESTAMP": "1774367626505877"
}
{
"_MACHINE_ID": "0f2f13b9dce0451591ae0dc418f6c96f",
"MESSAGE": "Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet",

...

The basic syntax here is “jq <script> <jsonfile> ...“, where <script> is some sort of translation script in jq‘s own particular scripting language. The script “.” is essentially a null transformation that simply tells jq to output whatever it sees in its input <jsonfile>. The default output style for jq is the pretty-printed style you see above.
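If you don’t have a file handy, jq reads from standard input when no &lt;jsonfile&gt; is given, which makes it easy to experiment with one-liners. (The record here is a made-up fragment, not real journal data.)

```shell
# With no file argument, jq reads JSON from stdin -- handy for quick tests.
# The output is the same two-space-indented pretty-printed style:
echo '{"PRIORITY": "5", "_TRANSPORT": "kernel"}' | jq .
```

This stdin behavior is also why jq fits naturally at the end of a pipeline after commands like curl or journalctl.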

Some of you will recognize the data above as Systemd journal entries. Normally we would work with the Systemd journal via the journalctl command. But exported journal data from one of my lab systems is a good example set for showing you some useful jq tips and tricks that you can apply to any sort of exported logging stream.

Other Output Modes

Suppose we just wanted to output the “MESSAGE” field from each record. Just specify the field you want to output with a leading “.“:

$ jq .MESSAGE journal.json
"Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

Because the value of the MESSAGE field is a string, jq outputs each message surrounded by double quotes. If you don’t want the quoting, use the “-r” option for raw mode output:

$ jq -r .MESSAGE journal.json
Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...

Suppose we wanted to output multiple fields as columns of structured text. jq includes support for both “@csv” and “@tsv” output modes:

$ jq -r '[.__REALTIME_TIMESTAMP, ._HOSTNAME, .MESSAGE] | @csv' journal.json
"1774367626505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"1774367626505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

jq transformation scripts use a pipelining syntax. Here we’re sending the fields we want to output into the “@csv” formatting tool. “@csv” wants its inputs as a JSON array, so we create an array on the fly simply by enclosing the fields we want to output with square brackets (“[..., ..., ...]“). The “@csv” output method automatically quotes each field and handles escaping any double quotes that might be included.
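To see the quoting behavior in action, here’s a contrived record (not from the journal data) with an embedded double quote. Per the usual CSV convention, “@csv” doubles the quote inside the field:

```shell
# "@csv" wraps each string in double quotes and doubles any embedded
# double quotes so the field parses correctly downstream:
echo '{"PRIORITY": "5", "MESSAGE": "user said \"hi\""}' |
  jq -r '[.PRIORITY, .MESSAGE] | @csv'
# "5","user said ""hi"""
```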

If you want other delimiters besides the traditional commas or tabs, jq can also output arbitrary text:

$ jq -r '"\(.__REALTIME_TIMESTAMP)|\(._HOSTNAME)|\(.MESSAGE)"' journal.json
1774367626505877|vbox|Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
1774367626505925|vbox|Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...

Use double quotes ("...") to enclose your output template. Use “\(.fieldname)” to output the value of specific fields. Anything else in your template is output as literal text. Here I’m outputting pipe-delimited text with the same three fields as in our CSV example above.

Note that our output template can use the typical escape sequences like “\t” for tabs. So another way to produce tab-delimited text would be:

$ jq -r '"\(.__REALTIME_TIMESTAMP)\t\(._HOSTNAME)\t\(.MESSAGE)"' journal.json
1774367626505877 vbox Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
1774367626505925 vbox Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...

However, it’s almost certainly easier to use '[..., ..., ...] | @tsv' for this.
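For reference, “@tsv” does not quote fields; instead it escapes any literal tabs, newlines, and backslashes inside a field value so the column structure survives. A quick demonstration with made-up data:

```shell
# "@tsv" joins array elements with real tab characters, and escapes any
# tab inside a field value back to a literal backslash-t:
echo '{"A": "one", "B": "two\twords"}' | jq -r '[.A, .B] | @tsv'
```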

Transforming Data With Builtin Operators

jq includes a wide variety of builtin operators for data transformation and math. For example, suppose we wanted to format those __REALTIME_TIMESTAMP fields in the Systemd journal into human-readable strings:

$ head -1 journal.json | jq -r '(.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T")'
2026-03-24 15:53:46

There’s a lot going on here, so let’s break it down a bit at a time. __REALTIME_TIMESTAMP is a string: if you look at the pretty-printed output above, the values are displayed in double quotes, meaning they are string type values. Ultimately we want to feed the __REALTIME_TIMESTAMP value into strftime() to produce formatted text, but strftime() wants numeric input. The first thing to do, then, is to convert the string into a number with “tonumber“. The jq piping syntax is how we express this transformation.

Our next problem is that __REALTIME_TIMESTAMP is in microseconds, but strftime() wants good old Unix epoch seconds. So we do some math with the traditional “/” operator for division. This actually converts our value into a decimal number (“1774367626.505877“), but that’s good enough for strftime(). Finally we pipeline the number we calculated into the strftime() function. We give strftime() an appropriate format string to get the output we want.
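You can sanity-check each stage of a pipeline like this by feeding jq a single hand-typed record. Stopping after the division shows the fractional epoch seconds, and adding strftime() back produces the formatted string (note that jq’s strftime() renders in UTC; strflocaltime() exists if you want local time):

```shell
rec='{"__REALTIME_TIMESTAMP": "1774367626505877"}'

# Stop after the division to inspect the intermediate decimal value:
echo "$rec" | jq '.__REALTIME_TIMESTAMP | tonumber / 1000000'

# Then add strftime() to finish the job:
echo "$rec" |
  jq -r '.__REALTIME_TIMESTAMP | tonumber / 1000000 | strftime("%F %T")'
# 2026-03-24 15:53:46
```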

This works great, but we’re throwing away the microseconds information. What if we wanted to display that as part of the timestamp? Time to introduce some more useful string operations:

$ head -1 journal.json | jq -r '((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) + 
(.__REALTIME_TIMESTAMP | .[-6:])'
2026-03-24 15:53:46.505877

Looking at the back part of our expression on the second line above, we are using jq‘s slicing operation “.[start:end]“. Because the start value is negative, the slice begins six characters from the end of the string. With no end value specified, the slice runs all the way to the end of the string.

Like many other scripting languages, jq supports string concatenation with the addition operator (“+“). Here we are adding the formatted string output from strftime() and the microseconds value we sliced out of the string. Note that the strftime() format has been updated to output a literal “.” between the formatted text and the microseconds.
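Both operations are easy to explore in isolation. Feeding jq a bare JSON string (note the quoting) lets you test slices and concatenation directly:

```shell
# Slicing with a negative start offset counts backwards from the end:
echo '"1774367626505877"' | jq -r '.[-6:]'
# 505877

# "+" concatenates strings:
echo '"1774367626505877"' | jq -r '.[0:4] + "-" + .[-6:]'
# 1774-505877
```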

Suppose we wanted to include the human-readable timestamp we just created instead of the raw epoch microseconds for our “@csv” output. The trick is to take our jq code for producing human readable timestamps and drop it into our “[...] | @csv” pipeline in place of the __REALTIME_TIMESTAMP field:

$ jq -r '[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) + (.__REALTIME_TIMESTAMP | .[-6:]), ._HOSTNAME, .MESSAGE] | @csv' journal.json
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

Scripting With jq

Obviously that jq expression is pretty horrible to type on the command line. You can always take any jq script and put it into a text file and then run that script on your data with the “-f” option:

$ jq -r -f csv-journal.jq journal.json
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

In this instance, our csv-journal.jq file is the jq recipe from our command line example, but without the single quotes. Since jq doesn’t care about whitespace in scripts, we can format our recipe with newlines and indentation to make it more readable:

$ cat csv-journal.jq 
[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:]),
._HOSTNAME, .MESSAGE] | @csv

On Linux systems you can even use jq in a “bang path” at the top of the script so it automatically gets invoked as the interpreter:

$ cat csv-journal.jq
#!/usr/bin/jq -rf

[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:]),
._HOSTNAME, .MESSAGE] | @csv

Note that the new interpreter path at the top of the script includes the “-rf” options for raw output (“-r“) and interpreting the rest of the file as a script (“-f“).
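One portability note: the path to jq varies between systems (/usr/bin/jq, /usr/local/bin/jq, and so on). If your system’s env supports the “-S” option (GNU coreutils 8.30 and later, plus recent BSD and macOS versions), you can search $PATH instead of hard-coding the interpreter location. A minimal sketch, using a throwaway script that just extracts MESSAGE:

```shell
# Write a tiny jq script with an env-based interpreter line and run it.
# "env -S" locates jq on $PATH and still splits "-rf" into options.
dir=$(mktemp -d)
cat > "$dir/msg.jq" <<'EOF'
#!/usr/bin/env -S jq -rf
.MESSAGE
EOF
chmod +x "$dir/msg.jq"
echo '{"MESSAGE": "hello"}' | "$dir/msg.jq"
# hello
rm -rf "$dir"
```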

Once we have the interpreter path at the top of the script, we can just cat our JSON data into the script without invoking jq directly:

$ chmod +x csv-journal.jq 
$ cat journal.json | ./csv-journal.jq
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

This might make things easier for less-technical users.

Selecting Records

When working with streams of records, it’s typical to want to only operate on certain records. For example, suppose we only wanted to see log messages from the “sudo” command. In the Systemd journal, these messages have the “SYSLOG_IDENTIFIER” field set to “sudo“:

$ jq -r 'select(.SYSLOG_IDENTIFIER == "sudo") | .MESSAGE' journal.json
worker : user NOT in sudoers ; TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/2 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
pam_unix(sudo:session): session closed for user root
worker : TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
pam_unix(sudo:session): session closed for user root
worker : TTY=pts/1 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
worker : TTY=pts/3 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
...

The new magic is jq‘s select() operator up at the front of that pipeline. If the conditional you give to select() evaluates to true, then the record you have matched gets passed down for processing by the rest of the pipeline. If not, then that record is skipped.
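select() works with any expression that produces a boolean, so you can compute inside the condition. Here’s a standalone example with made-up records, keeping only messages at syslog priority “err” (3) or more severe:

```shell
# select() passes a record through only when its condition is true.
# Priorities are strings in journal data, so convert before comparing:
printf '%s\n' '{"PRIORITY":"6","MESSAGE":"routine startup"}' \
              '{"PRIORITY":"3","MESSAGE":"disk failure"}' |
  jq -r 'select((.PRIORITY | tonumber) <= 3) | .MESSAGE'
# disk failure
```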

Logical operators (“and“, “or“, “not“) and parentheses are allowed. And you can do pattern matching with PCRE-like expressions. For example, the really interesting lines in Sudo logs are the ones that show the command being invoked (“COMMAND=“):

$ jq -r 'select(.SYSLOG_IDENTIFIER == "sudo" and (.MESSAGE | test("COMMAND="))) | .MESSAGE' journal.json
worker : user NOT in sudoers ; TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/2 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/1 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/3 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
...

For pattern matching, just pipeline the field you want to match against into the test() operator. Here I’m matching the literal string “COMMAND=” against the MESSAGE field. The pattern match is joined with our original selector for “sudo” in the SYSLOG_IDENTIFIER field using a logical “and“.
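test() also accepts a second argument containing regex flags; “i” makes the match case-insensitive, which is handy when log sources are inconsistent about capitalization. A standalone example with a made-up message:

```shell
# The second argument to test() holds regex flags ("i" = ignore case):
echo '{"MESSAGE": "Session Opened for user root"}' |
  jq -r 'select(.MESSAGE | test("session opened"; "i")) | .MESSAGE'
# Session Opened for user root
```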

Here’s another example showing a useful regex when dealing with SSH logs, just to give you a flavor of things you can do with regular expression matching:

$ jq -r 'select(._COMM == "sshd" and 
(.MESSAGE | test("^((Accepted|Failed) .* for|Invalid user) "))) | .MESSAGE' journal.json
Invalid user mary from 192.168.10.31 port 55746
Failed password for invalid user mary from 192.168.10.31 port 55746 ssh2
Accepted password for hal from 192.168.4.22 port 42310 ssh2
...

Enough For Now

Hopefully this is enough to get you started writing your own basic jq scripts. As with many things, the rest you pick up as you practice and get frustrated. The jq reference manual is useful for checking the syntax of different built-in operators, but I often find the examples more frustrating than helpful. Searching Stack Overflow can often yield more useful results.

Feel free to drop your questions into the comments, or reach out to me via social media or email. Maybe your questions will turn this single blog article into a series!

Linux Forensic Scenario

Let’s try something a little different for today’s blog post!

I’ve been working on ideas for a major update on my Linux forensics class, including new lab scenarios. I recently threw together a rough draft of one of my scenario ideas: built a machine, planted some malware on it, and then used UAC to capture forensic data from the system. I was pleased with the results, and thought I would share them with the larger community.

And then I thought, why not turn it into a bit of a contest? For the moment I haven’t decided on any prizes other than bragging rights, but you never know. I have decided that the deadline for submissions for judging will be April 15th, which is tax day here in the USA.

The Scenario

You received an escalation from your SOC. They received an alert from their NMS about suspicious traffic to one of the Linux workers in the development group’s CI/CD pipeline. The alert was for unencrypted traffic on port 22/tcp, specifically the string “python3 -c 'import pty; pty.spawn("/bin/bash")'” which triggered the alert for “reverse shell promotion” in the NMS. They note that the system is showing signs of heavy CPU usage but that they don’t see any process(es) that account for this. Following their SOP, they acquired data from the system using UAC and have escalated to you as on-call for the internal IR/Threat team.

Other information about the system:

  • There is a single shared account on the system called “worker“. It has full Sudo privileges with the NOPASSWD option set.
  • All network access to the box is through a jump host at IP 192.168.4.35.
  • The UAC collection is uac-vbox-linux-20260324234043.tar.gz

Additional Comments

I threw this scenario together in a matter of hours, so when you look at the timeline of the system you will see that it got built and then compromised very quickly. For the final scenario I will doubtless do a more complete job running fake workloads for some time before the “attack” actually happens.

Similarly, you’ll probably discover that there is no significant network infrastructure around the compromised system. The “jump host” is really just another host in my lab environment that I was operating from.

But I still think there are plenty of interesting artifacts to find in this scenario. I’m leaving things deliberately open-ended because I want to see what people come up with. But the goal would be to at least account for the issues raised by the SOC: why is there unencrypted traffic on 22/tcp, why is the system burning CPU, and why can’t the SOC see what is going on? Is the system compromised? When and how did that happen?

Submissions

Submissions for judging must be received no later than 23:59 UTC on 2026-04-15. I will accept submissions in .docx, PDF, or text. You may email your submissions to hrpomeranz@gmail.com. Please try to put something like “Linux Forensic Scenario Submission” in the Subject: line to make my life easier.

Depending on the number of submissions I get, I may need more folks to help with the judging. If you’re not planning to compete but would like to help judge, please drop me a line at the email address above. I’ll let you know if I need the help once I count the number (and length) of the submissions.

Happy forensicating! Have fun!

Linux Notes: ls and Timestamps

There’s an old riddle in Unix circles: “Name a letter that is not an option for the ls command”. The advent of the GNU version of ls has only made this more difficult to answer. Even if you’re a Unix/Linux power user, you’ve probably only memorized a small handful of the available options.

For example, I have “ls -lArt” burned into my brain from my Sys Admin days. “-l” for detailed listing, “-A” to show hidden files and directories (but not the “.” and “..” links like “-a“), sort by last modified time with “-t“, and “-r” to reverse the sort so the newest files appear right above your next shell prompt.

$ ls -lArt
total 1288
-rw-r--r-- 1 root root 9 Aug 7 2006 host.conf
-rw-r--r-- 1 root root 433 Aug 23 2020 apg.conf
-rw-r--r-- 1 root root 26 Dec 20 2020 libao.conf
-rw-r--r-- 1 root root 12813 Mar 27 2021 services
-rw-r--r-- 1 root root 769 Apr 10 2021 profile
-rw-r--r-- 1 root root 449 Nov 29 2021 mailcap.order
-rw-r--r-- 1 root root 119 Jan 10 2022 catdocrc
...
-rw-r--r-- 1 root root 52536 Feb 23 11:44 mailcap
-rw-r--r-- 1 root root 108979 Mar 2 09:24 ld.so.cache
-rw-r--r-- 1 root root 75 Mar 3 18:08 resolv.conf
drwxr-xr-x 5 root lp 4096 Mar 5 19:52 cups

You’ll note that the timestamps are displayed in two different formats. The oldest files show “month day year”, while the newer files show “month day hh:mm”. The default for ls is that files more than six months old display year information.

Personally I prefer consistent ISO-style timestamps with “--time-style=long-iso“:

$ ls -lArt --time-style=long-iso
total 1288
-rw-r--r-- 1 root root 9 2006-08-07 13:14 host.conf
-rw-r--r-- 1 root root 433 2020-08-23 10:52 apg.conf
-rw-r--r-- 1 root root 26 2020-12-20 11:21 libao.conf
-rw-r--r-- 1 root root 12813 2021-03-27 18:32 services
-rw-r--r-- 1 root root 769 2021-04-10 16:00 profile
-rw-r--r-- 1 root root 449 2021-11-29 08:07 mailcap.order
-rw-r--r-- 1 root root 119 2022-01-10 19:08 catdocrc
...
-rw-r--r-- 1 root root 52536 2026-02-23 11:44 mailcap
-rw-r--r-- 1 root root 108979 2026-03-02 09:24 ld.so.cache
-rw-r--r-- 1 root root 75 2026-03-03 18:08 resolv.conf
drwxr-xr-x 5 root lp 4096 2026-03-05 19:52 cups

While “-t” sorts on last modified time by default, other options allow you to sort and display other timestamps. For example, “-u” sorts on and displays last access time. “-u” is hardly memorable as last access time, but remember “-a” is used for something else.

It’s a pain trying to remember the one-letter options for the other timestamps, and note there isn’t even a short option for sorting/displaying on file creation time. So I just use “--time=” to pick the timestamp I want:

$ ls -lArt --time=birth --time-style=long-iso
total 1288
-rw-r--r-- 1 root root 1013 2025-04-10 10:27 fstab
drwxr-xr-x 2 root root 4096 2025-04-10 10:27 ImageMagick-6
drwxr-xr-x 2 root root 4096 2025-04-10 10:27 GNUstep
...
-rw-r--r-- 1 root root 142 2026-02-23 11:41 shells
-rw-r--r-- 1 root root 52536 2026-02-23 11:44 mailcap
-rw-r--r-- 1 root root 108979 2026-03-02 09:24 ld.so.cache
-rw-r--r-- 1 root root 75 2026-03-03 18:08 resolv.conf

Here we’re sorting on and displaying file creation times (“--time=birth“). You can use “--time=atime” or “--time=ctime” for the other timestamps.

If this command line seems long and unwieldy, remember that you can create aliases for commands in your .bashrc or other startup files:

alias ls='ls --color=auto --time-style=long-iso'
alias lb='ls -lArt --time=birth'

With normal ls commands, I’ll always get colored output, and “long-iso” dates whenever I use “-l“. I can use lb whenever I want file creation times. Note that alias definitions “stack”: the “lb” alias will get the color and time-style options from my basic “ls” alias, so I don’t need to include the “--time-style” option in the “lb” alias.

A Little More on LKM Persistence

In my previous blog post I demonstrated a method for persisting a Linux LKM rootkit across reboots by leveraging systemd-modules-load. For this method to work, we needed to add the evil module into the /usr/lib/modules/$(uname -r) directory and then run depmod. As I pointed out in the article, while the LKM could hide the module object itself, the modprobe command invoked by systemd-modules-load requires the module name to be listed in the modules.dep and modules.dep.bin files created by depmod.

But a few days later it occurred to me that the module name actually only has to appear in the modules.dep.bin file in order to be loaded. modules.dep is an intermediate file that modules.dep.bin is built from. The modprobe command invoked by systemd-modules-load only looks at the (trie structured) modules.dep.bin file. So once modules.dep.bin is created, the attacker could go back and remove their evil module name from modules.dep.

I tested this on my lab system, installing the LKM per my previous blog post and then editing the evil module name out of modules.dep. When I rebooted my lab system, I verified that the evil module was loaded by looking for the files that are hidden by the rootkit:

# ls /usr/lib/modules-load.d/
fwupd-msr.conf open-vm-tools-desktop.conf
# ls /usr/lib/modules/$(uname -r)/kernel/drivers/block
aoe drbd loop.ko nbd.ko pktcdvd.ko rsxx sx8.ko virtio_blk.ko xen-blkfront.ko
brd.ko floppy.ko mtip32xx null_blk.ko rbd.ko skd.ko umem.ko xen-blkback zram

If the rootkit were not operating, we’d see the zaq123edcx* file in each of these directories.

I thought about writing some code to unpack the format of modules.dep.bin. This format is well documented in the comments of the source code for depmod.c. But then I realized that there was a much easier way to find the evil module name hiding in modules.dep.bin.

depmod works by walking the directory structure under /usr/lib/modules/$(uname -r) and creating modules.dep based on what it finds there. If we run depmod while the LKM is active, then depmod will not see the evil kernel object and will build a new modules.dep and modules.dep.bin file without the LKM object listed:

# cd /usr/lib/modules/$(uname -r)
# cp modules.dep modules.dep.orig
# cp modules.dep.bin modules.dep.bin.orig
# depmod
# diff modules.dep modules.dep.orig
# diff modules.dep.bin modules.dep.bin.orig
Binary files modules.dep.bin and modules.dep.bin.orig differ

The old and new modules.dep files are the same, since I had previously removed the evil module name by hand. But the *.bin* files differ because the evil module name is still lurking in modules.dep.bin.orig.

And I don’t need to write code to dump the contents of modules.dep.bin.orig. I’ll just use strings and diff:

# diff <(strings -a modules.dep.bin.orig) <(strings -a modules.dep.bin)
1c1
< ?=4_cs
---
> 4_cs
5610,5611d5609
< 123edcx_diamorphine
< <kernel/drivers/block/zaq123edcx-diamorphine.ko:
5616c5614
< 4ndemod
---
> demod
5619c5617
< enhua
---
> 5jenhua
5622a5621
> 7@53
5627d5625
< 8Z2c
5630c5628
< a2326
---
> 9Ja2326
5635c5633
< alloc
---
> <valloc

The output would be prettier with some custom tooling, but you can clearly see the name of the hidden object in the diff output.

From an investigative perspective, I really wish depmod had an option to write modules.dep.bin to an alternate directory. That would make it easier to perform these steps without modifying the state of the system under investigation. I suppose we could use overlayfs hacks to make this happen.

But honestly using modprobe to load your LKM rootkit is probably not the best approach. insmod allows you to specify the path to your evil module. Create a script that uses insmod to load the rootkit, and then drop the script into /etc/cron.hourly with a file name that will be hidden once the rootkit is loaded. Easy!

Linux LKM Persistence

Back in August, Ruben Groenewoud posted two detailed articles on Linux persistence mechanisms and then followed that up with a testing/simulation tool called PANIX that implements many of these persistence mechanisms. Ruben’s work was, in turn, influenced by a series of articles by Pepe Berba and work by Eder Ignacio. Eder’s February article on persistence with udev rules seems particularly prescient after Stroz/AON reported in August on a long-running campaign using udev rules for persistence. I highly recommend all of this work, and frankly I’m including these links so I personally have an easy place to go find them whenever I need them.

In general, all of this work focuses on using persistence mechanisms for running programs in user space. For example, PANIX sets up a simple reverse shell by default (though the actual payload can be customized) and the “sedexp” campaign described by Stroz/AON used udev rules to trigger a custom malware executable.

Reading all of this material got my evil mind working, and got me thinking about how I might handle persistence if I was working with a Linux loadable kernel module (LKM) type rootkit. Certainly I could use any of the user space persistence mechanisms in PANIX that run with root privilege (or at least have CAP_SYS_MODULE capability) to call modprobe or insmod to load my evil kernel module. But what about other Linux mechanisms for specifically loading kernel modules at boot time?

Hiks Gerganov has written a useful article summarizing how to load Linux modules at boot time. If you want to be traditional, you can always put the name of the module you want to load into /etc/modules. But that seems a little too obvious, so instead we are going to use the more flexible systemd-modules-load service to get our evil kernel module installed.

systemd-modules-load looks in multiple directories for configuration files specifying modules to load, including /etc/modules-load.d, /usr/lib/modules-load.d, and /usr/local/lib/modules-load.d. systemd-modules-load also looks in /run/modules-load.d, but /run is typically a tmpfs style file system that does not persist across reboots. Configuration file names must end with “.conf” and simply contain the names of the modules to load, one name per line.

For my examples, I’m going to use the Diamorphine LKM rootkit. Diamorphine started out as a proof of concept rootkit, but a Diamorphine variant has recently been found in the wild. Diamorphine allows you to choose a “magic string” at compile time: any file or directory name that starts with the magic string will automatically be hidden by the rootkit once the rootkit is loaded into the kernel. In my examples I am using the magic string “zaq123edcx“.

First we need to copy the Diamorphine kernel module, typically compiled as diamorphine.ko, into a directory under /usr/lib/modules where it can be found by the modprobe command invoked by systemd-modules-load:

# cp diamorphine.ko /usr/lib/modules/$(uname -r)/kernel/drivers/block/zaq123edcx-diamorphine.ko
# depmod

Note that the directory under /usr/lib/modules is kernel version specific. You can put your evil module anywhere under /usr/lib/modules/*/kernel that you like. Notice that by using the magic string in the file name, we are relying on the rootkit itself to hide the module. Of course, if the victim machine receives a kernel update then your Diamorphine module in the older kernel directory will no longer be loaded and your evil plots could end up being exposed.

The depmod step is necessary to update the /usr/lib/modules/*/modules.dep and /usr/lib/modules/*/modules.dep.bin files. Until these files are updated, modprobe will be unable to locate your kernel module. Unfortunately, depmod puts the path name of your evil module into both of the modules.dep* files. So you will probably want to choose a less obvious name (and magic string) than the one I am using here.

The only other step needed is to create a configuration file for systemd-modules-load:

# echo zaq123edcx-diamorphine >/usr/lib/modules-load.d/zaq123edcx-evil.conf

The configuration file is just a single line: whatever name you copied the evil module to under /usr/lib/modules, but without the “.ko” extension. Here again we name the configuration file with the Diamorphine magic string so the file will be hidden once the rootkit is loaded.

That’s all the configuration you need to do. Load the rootkit manually by running “modprobe zaq123edcx-diamorphine” and rest easy in the knowledge that the rootkit will load automatically whenever the system reboots.

Finding the Evil

What artifacts are created by these changes? The mtime on the /usr/lib/modules-load.d directory and the directory where you installed the rootkit module will be updated. Aside from putting the name of your evil module into the modules.dep* files, the depmod command updates the mtime on several other files under /usr/lib/modules/*:

/usr/lib/modules/.../modules.alias
/usr/lib/modules/.../modules.alias.bin
/usr/lib/modules/.../modules.builtin.alias.bin
/usr/lib/modules/.../modules.builtin.bin
/usr/lib/modules/.../modules.dep
/usr/lib/modules/.../modules.dep.bin
/usr/lib/modules/.../modules.devname
/usr/lib/modules/.../modules.softdep
/usr/lib/modules/.../modules.symbols
/usr/lib/modules/.../modules.symbols.bin

Timestomping these files and directories could make things more difficult for hunters.

But loading the rootkit is also likely to “taint” the kernel. You can try looking at the dmesg output for taint warnings:

# dmesg | grep taint
[ 8.390098] diamorphine: loading out-of-tree module taints kernel.
[ 8.390112] diamorphine: module verification failed: signature and/or required key missing - tainting kernel

However, these log messages can be removed by the attacker or simply disappear due to the system’s normal log rotation (if the machine has been running long enough). So you should also look at /proc/sys/kernel/tainted:

# cat /proc/sys/kernel/tainted
12288

Any non-zero value means that the kernel is tainted. To interpret the value, here is a trick based on an idea in the kernel.org document I referenced above:

# taintval=$(cat /proc/sys/kernel/tainted)
# for i in {0..18}; do [[ $(($taintval>>$i & 1)) -eq 1 ]] && echo $i; done
12
13

Referring to the kernel.org document, bit 12 being set means an “out of tree” (externally built) module was loaded. Bit 13 means the module was unsigned. Notice that these flags correspond to the log messages found in the dmesg output above.

While this is a useful bit of command-line kung fu, I thought it would be handy to have in a more portable format and with more verbose output. So I present to you chktaint.sh:

$ chktaint.sh
externally-built (“out-of-tree”) module was loaded
unsigned module was loaded

By default chktaint.sh reads the value from /proc/sys/kernel/tainted on the live system. But in many cases you may be looking at captured evidence offline. So chktaint.sh also allows you to specify an alternate file path (“chktaint.sh /path/to/evidence/file“) or simply a raw numeric value from /proc/sys/kernel/tainted (“chktaint.sh 12288“).
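If you can't drop a script onto the system, a similar decoder is easy to inline in bash. Here is a minimal sketch along the same lines– the bit descriptions are abbreviated and cover only a few common flags, so it is not a substitute for the full chktaint.sh:

```shell
# Minimal taint decoder sketch -- only a few common flags are
# described here; see the kernel.org taint flag documentation
# for the complete list.
decode_taint() {
    local val=$1 i
    local -a desc
    desc[0]='proprietary module was loaded'
    desc[9]='kernel issued a warning (WARN_ON)'
    desc[12]='externally-built ("out-of-tree") module was loaded'
    desc[13]='unsigned module was loaded'
    for i in {0..18}; do
        if (( (val >> i) & 1 )); then
            echo "bit $i: ${desc[$i]:-(see kernel.org taint flag list)}"
        fi
    done
}

decode_taint 12288
```

Feeding it the 12288 value from above reports the same "out-of-tree" and "unsigned module" flags that appeared in the dmesg output.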

The persistence mechanism(s) deployed by the attacker are often the best way to detect whether or not a system is compromised. If the attacker is using an LKM rootkit, checking /proc/sys/kernel/tainted is often a good first step in determining if you have a problem. This can be combined with tools like chkproc (find hidden processes) and chkdirs (find hidden directories) from the chkrootkit project.

The Emperor’s New Clothes

My good friend Matt and I graduated college the same year. I went off into the work world and he headed for a graduate degree program in nuclear engineering. Much of the research effort in nuclear engineering is centered around developing sustainable fusion technology. Matt quickly realized that something was off.

So he went to his faculty advisor, who had been pursuing fusion research for several decades, and asked him, “I’m in my early 20’s, do you think that we will achieve viable fusion technology in my lifetime?”

The advisor’s answer was an involved discussion of “No.” Sustainable fusion technology involves an entire collection of problems that we are not close to solving. The materials science alone required to construct a vessel to hold the fusion reaction and extract power from it safely is well beyond our current capabilities even decades after my friend had the conversation with his advisor.

Happily for my friend, he had this conversation before he had sunk too much time into his research. Matt bailed out of nuclear engineering, changed his research focus, and has had a highly successful career in engineering education.

Meanwhile, I had been lucky enough to land a job doing Unix support at AT&T Bell Labs. One of the projects we supported was a research group that was working to develop a bespoke system that implemented Karmarkar’s algorithm for linear programming. This was an enormous project that employed hundreds of developers and consumed huge amounts of resources. The customers were the major airlines– scheduling aircraft and the flight crews that staff them is a classic problem in linear programming that directly impacts the bottom line of these companies.

You likely have never heard of Karmarkar’s algorithm, except perhaps for the controversy around it. Initially hailed as a major step forward that would revolutionize linear programming, its detractors claimed that, upon closer scrutiny, this so-called “revolutionary” algorithm was just a combination of known heuristics and speedups. It was not a substantial improvement over existing algorithms of the time.

I never studied the algorithm enough to determine which side’s claims were correct. What I do know is that the airlines pulled their funding and AT&T’s project was scuttled. The IT support team came in on Monday and everybody who was working on that project was literally gone. We moved through their empty office space for the next week collecting computer equipment to be repurposed for other projects. Some of the developers got shifted to other projects as well, but I imagine many people suddenly found themselves looking for work.

The airlines poured millions of dollars into a project that produced exactly nothing of value. Governments around the world continue to pour billions into fusion research with little to show for it and very little hope of fusion power in our lifetimes. Why is so much time, effort, and money being wasted?

These projects have several factors in common. Their goal is highly desirable: a “revolution” that would reshape the world as we know it, or at least an entire industry. The path involves highly complex technology that is impenetrable to a non-specialist: a complex algorithm or deep scientific research necessary to invent things that have never been done before. And they require massive amounts of funding.

This is a perfect recipe for bad decision making or outright fraud. People will sacrifice a great deal to achieve a significant goal. Because the path to that goal is difficult to comprehend, people will fool themselves into thinking the solution is “just around the corner”. Critical thinking skills fly out the window as people focus on the goal and can’t or won’t focus on the process to get there.

And when the project attracts unscrupulous operators who realize that there is money to be made in prolonging the effort, you have the makings of a bezzle. The unscrupulous promise a wonderful new world but use any excuse to keep extracting money from the situation. When challenged about their lack of results they just say, “Technology is complex and unpredictable, but I swear we are almost there!” Technology is a perfect breeding ground for bezzles because we have socialized the idea that computers and technology are inscrutable to mere mortals who must defer to a high priesthood to interpret the signs and omens.

“Generative AI” and “large language models” are the latest techno bezzle. But “AI” is a constant and recurring bezzle that I have seen numerous times in my decades in technology. Remember “machine learning”? Remember “neural networks”? I have lived through too many of these hype cycles and seen too many people lose their jobs and/or retirement funds due to companies that bet the farm on the latest bezzle.

The AI hype is too strong right now for me to convince people caught up in it that they are being conned. But for the rest of you I want you to recognize the patterns at play here and apply your critical thinking skills to any new “revolutionary” technologies that follow a similar path. And try to educate others so that we don’t as a society keep making the same sorts of mistakes over and over again. The resources we are wasting on the current AI hype cycle are killing the planet and could be put to so much better use.

More on EXT4 Timestamps and Timestomping

Many years ago I did a breakdown of the EXT4 file system. I devoted an entire blog article to the new timestamp system used by EXT4. The trick is that EXT4 added 32-bit nanosecond resolution fractional seconds fields in its extended inode. But you only need 30 bits to represent nanoseconds, so EXT4 uses the lower two bits of the fractional seconds fields to extend the standard Unix “epoch time” timestamps. This allows EXT4 to get past the Y2K-like problem that normal 32-bit epoch timestamps face in the year 2038.

At the time I wrote, “With the extra two bits, the largest value that can be represented is 0x03FFFFFFFF, which is 17179869183 decimal. This yields a GMT date of 2514-05-30 01:53:03…” But it turns out that I misunderstood something critical about the way EXT4 handles timestamps. The actual largest date that can be represented in an EXT4 file system is 2446-05-10 22:38:55. Curious about why? Read on for a breakdown of how EXT4 timestamps are encoded, or skip ahead to “Practical Applications” to understand why this knowledge is useful.

Traditional Unix File System Timestamps

Traditionally, file system times in Unix/Linux are represented as “the number of seconds since 00:00:00 Jan 1, 1970 UTC”– typically referred to as Unix epoch time. But what if you wanted to represent times before 1970? You just use negative seconds to go backwards.

So Unix file system times are represented as signed 32-bit integers. This gives you a time range from 1901-12-13 20:45:52 (-2**31 or 0x80000000 or -2147483648 seconds) to 2038-01-19 03:14:07 (2**31 - 1 or 0x7fffffff or 2147483647 seconds). When January 19th, 2038 rolls around, unpatched 32-bit Unix and Linux systems are going to be having a very bad day. Don’t try to tell me there won’t be critical applications running on these systems in 2038– I’m pretty much basing my retirement planning on consulting in this area.
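You can sanity-check these limits from the shell (assuming GNU date, which accepts the @epoch input syntax):

```shell
# Signed 32-bit time_t limits
echo $(( -(1 << 31) ))                       # -2147483648
echo $(( (1 << 31) - 1 ))                    # 2147483647

# The dreaded 2038 rollover moment, in UTC
date -u -d @$(( (1 << 31) - 1 )) '+%F %T'    # 2038-01-19 03:14:07
```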

What Did EXT4 Do?

EXT4 had those two extra bits from the fractional seconds fields to play around with, so the developers used them to extend the seconds portion of the timestamp. When I wrote my infamous “timestamps good into year 2514” comment in the original article, I was thinking of the timestamp as an unsigned 34-bit integer 0x03ffffffff. But that’s not right.

EXT4 still has to support the original timestamp range from 1901 – 2038 and the epoch still has to be based on January 1, 1970 or else chaos will ensue. So the meaning of the original epoch time values hasn’t changed. This field still counts seconds from -2147483648 to 2147483647.

So what about the extra two bits? With two bits you can enumerate values from 0-3. EXT4 treats these as multiples of 2**32 or 4294967296 seconds. So, for example, if the “extra” bits value was 2 you would start with 2 * 4294967296 = 8589934592 seconds and then add whatever value is in the standard epoch seconds field. And if that epoch seconds value was negative, you end up adding a negative number, which is how mathematicians think of subtraction.

This insanity allows EXT4 to cover a range from (0 * 4294967296 - 2147483648) aka -2147483648 seconds (the traditional 1901 time value) all the way up to (3 * 4294967296 + 2147483647) = 15032385535 seconds. That timestamp is 2446-05-10 22:38:55, the maximum EXT4 timestamp. If you’re still around in the year 2446 (and people are still using money) then maybe you can pick up some extra consulting dollars fixing legacy systems.
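The math above is easy to verify with shell arithmetic and GNU date:

```shell
# Maximum EXT4 time: extra bits = 3, seconds field at its signed max
max_seconds=$(( 3 * 4294967296 + 2147483647 ))
echo $max_seconds                    # 15032385535

# Convert to a human-readable UTC date
date -u -d @$max_seconds '+%F %T'    # 2446-05-10 22:38:55
```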

At this point you may be wondering why the developers chose this encoding. Why not just use the extra bits to make a 34-bit signed integer? A 34-bit signed integer would have a range from -2**33 = -8589934592 seconds to 2**33 - 1 = 8589934591 seconds. That would give you a range of timestamps from 1697-10-17 11:03:28 to 2242-03-16 12:56:31. Being able to set file timestamps back to 1697 is useful to pretty much nobody, whereas the encoding the EXT4 developers chose buys another 200 years of future dates over the basic signed 34-bit scheme.

Practical Applications

Why am I looking so closely at EXT4 timestamps? This whole business started because I was frustrated after reading yet another person claiming (incorrectly) that you cannot set ctime and btime in Linux file systems. Yes, the touch command only lets you set atime and mtime, but touch is not the only game in town.

For EXT file systems, the debugfs command allows writing to inode fields directly with the set_inode_field command (abbreviated sif). This works even on actively mounted file systems:

root@LAB:~# touch MYFILE
root@LAB:~# ls -i MYFILE
654442 MYFILE
root@LAB:~# df -h .
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/LabVM-root   28G   17G  9.7G  63% /
root@LAB:~# debugfs -w -R 'stat <654442>' /dev/mapper/LabVM-root | grep time:
 ctime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
 atime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
 mtime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
crtime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
root@LAB:~# debugfs -w -R 'sif <654442> crtime @-86400' /dev/mapper/LabVM-root
root@LAB:~# debugfs -w -R 'stat <654442>' /dev/mapper/LabVM-root | grep time:
 ctime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
 atime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
 mtime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
crtime: 0xfffeae80:8eb89330 -- Tue Dec 30 19:00:00 1969

The set_inode_field command needs the inode number, the name of the field you want to set (you can get a list of field names with set_inode_field -l), and the value you want to set the field to. In the example above, I’m setting the crtime field (which is how debugfs refers to btime). debugfs wants you to provide the value as an epoch time value– either in hex starting with “0x” or in decimal preceded by “@“.

What often trips people up when they try this is caching. Watch what happens when I use the standard Linux stat command to dump the file timestamps:

root@LAB:~# stat MYFILE
  File: MYFILE
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fe00h/65024d    Inode: 654442      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-09-02 09:58:13.598615244 -0400
Modify: 2024-09-02 09:58:13.598615244 -0400
Change: 2024-09-02 09:58:13.598615244 -0400
 Birth: 2024-09-02 09:58:13.598615244 -0400

The btime appears to be unchanged! The on-disk inode has changed, but the operating system is still showing cached information. Once I force Linux to drop its out-of-date cached info, everything looks as it should:

root@LAB:~# echo 3 > /proc/sys/vm/drop_caches
root@LAB:~# stat MYFILE
  File: MYFILE
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fe00h/65024d    Inode: 654442      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-09-02 09:58:13.598615244 -0400
Modify: 2024-09-02 09:58:13.598615244 -0400
Change: 2024-09-02 09:58:13.598615244 -0400
 Birth: 1969-12-30 19:00:00.598615244 -0500

If I wanted to set the fractional seconds field, that would be crtime_extra. Remember, however, that the low bits of this field are used to set dates far into the future:

root@LAB:~# debugfs -w -R 'sif <654442> crtime_extra 2' /dev/mapper/LabVM-root
root@LAB:~# debugfs -w -R 'stat <654442>' /dev/mapper/LabVM-root | grep time:
debugfs 1.46.2 (28-Feb-2021)
 ctime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
 atime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
 mtime: 0x66d5c475:8eb89330 -- Mon Sep  2 09:58:13 2024
crtime: 0xfffeae80:00000002 -- Tue Mar 15 08:56:32 2242

For the *_extra fields, debugfs just wants a raw number either in hex or decimal (hex values should still start with “0x“).

Making This Easier

Human beings would like to use readable timestamps rather than epoch time values. The good news is that GNU date can convert a variety of different timestamp formats into epoch time values:

root@LAB:~# date -d '2345-01-01 12:34:56' '+%s'
11833925696

Specify whatever time string you want to convert after the -d option. The %s output format requests epoch time.

Now for the bad news. The value that date outputs must be converted into the peculiar encoding that EXT4 uses. And that’s why I spent so much time fully understanding the EXT4 timestamp format. That understanding leads to some crazy shell math:

# Calculate a random nanoseconds value
# Mask it down to only 30 bits, shift right two bits
nanosec="0x$(head /dev/urandom | tr -d -c 0-9a-f | cut -c1-8)"
nanosec=$(( ($nanosec & 0x3fffffff) << 2 ))

# Get an epoch time value from the date command
# Adjust the time value to a range of all positive values
# Calculate the number for the standard seconds field
# Calculate the bits needed in the *_extra field
epoch_time=$(date -d '2345-01-01 12:34:56' '+%s')
adjusted_time=$(( $epoch_time + 2147483648 ))
time_lowbits=$(( ($adjusted_time % 4294967296) - 2147483648 ))
time_highbits=$(( $adjusted_time / 4294967296 ))

# The *_extra field value combines extra bits with nanoseconds
extra_field=$(( $nanosec + $time_highbits ))
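Going the other direction, decoding raw inode fields back into an epoch value, is a bit simpler. Here is a sketch, assuming the seconds field has already been read out as a signed 32-bit value:

```shell
# Decode EXT4 timestamp fields back to "seconds.nanoseconds".
# $1 is the seconds field, already interpreted as a signed 32-bit
# value; $2 is the raw *_extra field.
decode_ext4_time() {
    local seconds=$1 extra=$2
    local epoch_bits=$(( extra & 0x3 ))              # low 2 bits: epoch multiplier
    local nanosec=$(( (extra >> 2) & 0x3fffffff ))   # upper 30 bits: nanoseconds
    printf '%d.%09d\n' $(( seconds + epoch_bits * 4294967296 )) $nanosec
}

decode_ext4_time 2147483647 3    # 15032385535.000000000 (the max EXT4 time)
```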

Clearly nobody wants to do this manually every time. You just want to do some timestomping, right? Don’t worry, I’ve written a script to set timestamps in EXT:

root@LAB:~# extstomp -v -macb -T '2123-04-05 1:23:45' MYFILE
===== MYFILE
 ctime: 0x2044c7e1:ec154f7d -- Mon Apr  5 01:23:45 2123
 atime: 0x2044c7e1:ec154f7d -- Mon Apr  5 01:23:45 2123
 mtime: 0x2044c7e1:ec154f7d -- Mon Apr  5 01:23:45 2123
crtime: 0x2044c7e1:ec154f7d -- Mon Apr  5 01:23:45 2123
root@LAB:~# extstomp -v -cb -T '2345-04-05 2:34:56' MYFILE
===== MYFILE
 ctime: 0xc1d6b290:61966bab -- Thu Apr  5 02:34:56 2345
 atime: 0x2044c7e1:ec154f7d -- Mon Apr  5 01:23:45 2123
 mtime: 0x2044c7e1:ec154f7d -- Mon Apr  5 01:23:45 2123
crtime: 0xc1d6b290:61966bab -- Thu Apr  5 02:34:56 2345

Use the -macb options to specify the timestamps you want to set and -T to specify your time string. You can use -e to specify nanoseconds if you want, otherwise the script just generates a random nanoseconds value. The script is usually silent but -v causes the script to output the file timestamps when it’s done. The script even drops the file system caches automatically for you (unless you use -C to keep the old cached info).

And because you often want to blend in with other files in the operating system, I’ve included an option to copy the timestamps from another file:

root@LAB:~# extstomp -v -macb -S /etc/passwd MYFILE
===== MYFILE
 ctime: 0x66b37fc9:9d8e0ba8 -- Wed Aug  7 10:08:09 2024
 atime: 0x66d5c3cb:33dd3b30 -- Mon Sep  2 09:55:23 2024
 mtime: 0x66b37fc9:9d8e0ba8 -- Wed Aug  7 10:08:09 2024
crtime: 0x66b37fc9:9d8e0ba8 -- Wed Aug  7 10:08:09 2024
root@LAB:~# stat /etc/passwd
  File: /etc/passwd
  Size: 2356            Blocks: 8          IO Block: 4096   regular file
Device: fe00h/65024d    Inode: 1368805     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-09-02 09:55:23.217534156 -0400
Modify: 2024-08-07 10:08:09.660833002 -0400
Change: 2024-08-07 10:08:09.660833002 -0400
 Birth: 2024-08-07 10:08:09.660833002 -0400

You’re welcome!

What About Other File Systems?

It’s particularly easy to do this timestomping on EXT because debugfs allows us to operate on live file systems. In “expert mode” (xfs_db -x), the xfs_db tool has a write command that allows you to set inode fields. Unfortunately, by default xfs_db does not allow writing to mounted file systems. Of course, an enterprising individual could modify the xfs_db source code and bypass these safety checks.

And that’s really the bottom line. Use the debugging tool for whatever file system you are dealing with to set the timestamp values appropriately for that file system. It may be necessary to modify the code to allow operation on live file systems, but tweaking timestamp fields in an inode while the file system is running is generally not too dangerous.

Systemd Journal and journalctl

While I haven’t been happy about Systemd’s continued encroachment into the Linux operating system, I will say that the Systemd journal is generally an upgrade over traditional Syslog. We’ve reached the point where some newer distributions are starting to forgo Syslog and traditional Syslog-style logs altogether. The challenge for DFIR professionals is that the Systemd journals are in a binary format and require a command-line tool, journalctl, for searching and text output.

The main advantage that Systemd journals have over traditional Syslog-style logs is that Systemd journals carry considerably more metadata related to log messages, and this metadata is broken down into multiple searchable fields. A traditional Syslog log message might look like:

Jul 21 11:22:02 LAB sshd[1304]: Accepted password for lab from 192.168.10.1 port 56280 ssh2

The Systemd journal entry for the same message is:

{
        "_EXE" : "/usr/sbin/sshd",
        "_SYSTEMD_CGROUP" : "/system.slice/ssh.service",
        "_SELINUX_CONTEXT" : "unconfined\n",
        "SYSLOG_FACILITY" : "4",
        "_SYSTEMD_UNIT" : "ssh.service",
        "_UID" : "0",
        "SYSLOG_TIMESTAMP" : "Jul 21 07:22:02 ",
        "_CAP_EFFECTIVE" : "1ffffffffff",
        "_TRANSPORT" : "syslog",
        "_SYSTEMD_SLICE" : "system.slice",
        "PRIORITY" : "6",
        "SYSLOG_IDENTIFIER" : "sshd",
        "_PID" : "1304",
        "_HOSTNAME" : "LAB",
        "__REALTIME_TIMESTAMP" : "1721560922218814",
        "_SYSTEMD_INVOCATION_ID" : "70a0b99512864d22a8f8b10752ad6537",
        "SYSLOG_PID" : "1304",
        "__MONOTONIC_TIMESTAMP" : "265429588",
        "_GID" : "0",
        "__CURSOR" : "s=743db8433dcc46ca9b9cecd7a4272061;i=1d6f;b=5c57e83c3abd457c95d0695807667c9e;m=fd22254;t=61dc0233a613e;x=31ff9c313be9c36f",
        "_CMDLINE" : "sshd: lab [priv]",
        "_MACHINE_ID" : "47b59f088dc74eb0b8544be4c3276463",
        "_COMM" : "sshd",
        "_BOOT_ID" : "5c57e83c3abd457c95d0695807667c9e",
        "MESSAGE" : "Accepted password for lab from 192.168.10.1 port 56280 ssh2",
        "_SOURCE_REALTIME_TIMESTAMP" : "1721560922218786"
}

Any of these fields is individually searchable. The journalctl command provides multiple pre-defined output formats, and custom output of specific fields is also supported.

Systemd uses a simple serialized text protocol over HTTP or HTTPS for sending journal entries to remote log collectors. This protocol uses port 19532 by default. The URL of the remote server is normally found in the /etc/systemd/journal-upload.conf file. On the receiver, configuration for handling the incoming messages is defined in /etc/systemd/journal-remote.conf. A “pull” mode for requesting journal entries from remote systems is also supported, using port 19531 by default.

Journal Time Formats

As you can see in the JSON output above, the Systemd journal supports multiple time formats. The primary format is a Unix epoch style UTC time with an extra six digits for microsecond precision. This is the format for the _SOURCE_REALTIME_TIMESTAMP and __REALTIME_TIMESTAMP fields. Archived journal file names (see below) use a hexadecimal form of this UTC time value.

Note that _SOURCE_REALTIME_TIMESTAMP is the time when systemd-journald first received the message on the system where the message was originally generated. If the message was later relayed to another system using the systemd-journal-remote service, __REALTIME_TIMESTAMP will reflect the time the message was received by the remote system. In the journal on the originating system, _SOURCE_REALTIME_TIMESTAMP and __REALTIME_TIMESTAMP are usually the same value.

I have created shell functions for converting both the decimal and hexadecimal representations of this time format into human-readable time strings:

function jtime { usec=$(echo $1 | cut -c11-); date -d @$(echo $1 | cut -c 1-10) "+%F %T.$usec %z"; }
function jhextime { usec=$(echo $((0x$1)) | cut -c11-); date -d @$(echo $((0x$1)) | cut -c 1-10) "+%F %T.$usec %z"; }
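For example, here is jtime applied to the __REALTIME_TIMESTAMP value from the sshd journal entry shown earlier (the function is repeated so the snippet stands alone, and TZ is forced to UTC so the output is reproducible):

```shell
# jtime as defined above, repeated so this example is self-contained
function jtime { usec=$(echo $1 | cut -c11-); date -d @$(echo $1 | cut -c 1-10) "+%F %T.$usec %z"; }

TZ=UTC jtime 1721560922218814    # 2024-07-21 11:22:02.218814 +0000
```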

Journal entries also contain a __MONOTONIC_TIMESTAMP field. This field represents the number of microseconds since the system booted. This is the same timestamp typically seen in dmesg output.

Journal entries will usually contain a SYSLOG_TIMESTAMP field. This text field is the traditional Syslog-style timestamp format. This time is in the default local time zone for the originating machine.

Journal Files Location and Naming

Systemd journal files are typically found under /var/log/journal/MACHINE_ID. The MACHINE_ID is a random 128-bit value assigned to each system during its first boot. You can find it in the /etc/machine-id file.

Under the /var/log/journal/MACHINE_ID directory, you will typically find multiple files:

root@LAB:~# ls /var/log/journal/47b59f088dc74eb0b8544be4c3276463/
system@00061db4ac78a3a2-03652b1534b78cc1.journal~
system@7166038d7a284f0f9f3c1aa7fab3f251-0000000000000001-0005f3e6144d8b0c.journal
system@7166038d7a284f0f9f3c1aa7fab3f251-0000000000001d59-0005f848158ef146.journal
system@7166038d7a284f0f9f3c1aa7fab3f251-00000000000061be-0005fbf9b35341f9.journal
system.journal
user-1000@00061db4b9047993-8eabd101686c1832.journal~
user-1000@a9d71fa481e0447a88f62416b6815868-0000000000001d7c-0005f8481f464fe7.journal
user-1000@c946abc41d224a5692053aa4e03ae012-00000000000007fe-0005f3e61585d7c2.journal
user-1000@e933964a572642cdb863a8803485cf10-0000000000006411-0005fbf9b50e0a5a.journal
user-1000.journal

The system.journal file is where logs from operating system services are currently being written. The other system@*.journal files are older, archived journal files. The systemd-journald process takes care of rotating the current journal and purging older files based on parameters configured in /etc/systemd/journald.conf.

The naming convention for these archived files is system@fileid-seqnum-time.journal. fileid is a random 128-bit file ID number. seqnum is the sequence number of the first message in the journal file. Sequence numbers are started at one and simply increase monotonically with each new message. time is the hexadecimal form of the standard journal UTC Unix epoch timestamp (see above). This time matches the __REALTIME_TIMESTAMP value of the first message in the journal– the time that message was received on the local system.

File names that end in a tilde– like the system@00061db4ac78a3a2-03652b1534b78cc1.journal~ file– are files that systemd-journald either detected as corrupted or which were ended by an unclean shutdown of the operating system. The first field after the “@” is a hex timestamp value corresponding to when the file was renamed as an archive. This is often when the system reboots, if the operating system crashed. I have been unable to determine how the second hex string is calculated.

In addition to the system*.journal files, the journal directory may also contain one or more user-UID*.journal files. These are user-specific logs where the UID corresponds to each user’s UID value in the third field of /etc/passwd. The naming convention on the user-UID*.journal files is the same as for the system*.journal files.
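The naming convention lends itself to quick shell parsing. Here is a sketch that pulls apart one of the archived file names shown above, decoding the sequence number and the hex timestamp (which is microseconds since the Unix epoch):

```shell
# Split system@fileid-seqnum-hextime.journal into its parts
fname="system@7166038d7a284f0f9f3c1aa7fab3f251-0000000000000001-0005f3e6144d8b0c.journal"
IFS=- read -r fileid seqnum hextime <<< "${fname#*@}"
hextime=${hextime%.journal}

# The hex timestamp is microseconds since the epoch
usec=$(( 16#$hextime ))
echo "fileid=$fileid seqnum=$(( 16#$seqnum ))"
date -u -d @$(( usec / 1000000 )) '+%F %T'    # 2023-02-04 20:59:52
```

The decoded time is the __REALTIME_TIMESTAMP of the first message in that journal file, shown here in UTC.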

The journalctl Command

Because the journalctl command has a large number of options for searching, output formats, and more, I have created a quick one-page cheat sheet for the journalctl command. You may want to refer to the cheat sheet as you read through this section.

My preference is to “export SYSTEMD_PAGER=” before running the journalctl command. Setting this value to null means that long lines in the journalctl output will wrap onto the next line of your terminal rather than forcing you to scroll right to see the full message. If you want to look at the output one screenful at a time, you can simply pipe the output into less or more.

SELECTING FILES

By default the journalctl command operates on the local journal files in /var/log/journal/MACHINE_ID. If you wish to use a different set of files, you can specify an alternate directory with “-D“, e.g. “journalctl -D /path/to/evidence ...“. You can specify an individual file with “--file=” or use multiple “--file” arguments on a single command line. The “--file” option also accepts normal shell wildcards, so you could use “journalctl --file=system@\*” to operate just on archived system journal files in the current working directory. Note the extra backslash (“\“) to prevent the wildcard from being interpreted by the shell.

“journalctl --header” provides information about the contents of one or more journal files:

# journalctl --header --file=system@00061db4ac78a3a2-03652b1534b78cc1.journal~
File path: system@00061db4ac78a3a2-03652b1534b78cc1.journal~
File ID: a98f5eb8aff543a8abdee01518dd91f0
Machine ID: 47b59f088dc74eb0b8544be4c3276463
Boot ID: 9ac272cac6c040a7b9ad021ba32c2574
Sequential number ID: 7166038d7a284f0f9f3c1aa7fab3f251
State: OFFLINE
Compatible flags:
Incompatible flags: COMPRESSED-LZ4
Header size: 256
Arena size: 33554176
Data hash table size: 211313
Field hash table size: 333
Rotate suggested: no
Head sequential number: 27899 (6cfb)
Tail sequential number: 54724 (d5c4)
Head realtime timestamp: Sun 2024-07-14 16:18:40 EDT (61d3ad1816828)
Tail realtime timestamp: Sat 2024-07-20 08:55:01 EDT (61dad51f51a90)
Tail monotonic timestamp: 2d 1min 22.016s (284092380c)
Objects: 116042
Entry objects: 26326
Data objects: 64813
Data hash table fill: 30.7%
Field objects: 114
Field hash table fill: 34.2%
Tag objects: 0
Entry array objects: 24787
Deepest field hash chain: 2
Deepest data hash chain: 4
Disk usage: 32.0M

The most useful information here is the first (“Head“) and last (“Tail“) timestamps in the file along with the object counts.

OUTPUT MODES

The default output mode for journalctl is very similar to a typical Syslog-style log:

# journalctl -t sudo -r
-- Journal begins at Sat 2023-02-04 15:59:52 EST, ends at Sun 2024-07-21 13:00:01 EDT. --
Jul 21 07:34:01 LAB sudo[1491]: pam_unix(sudo:session): session opened for user root(uid=0) by lab(uid=1000)
Jul 21 07:34:01 LAB sudo[1491]:      lab : TTY=pts/1 ; PWD=/home/lab ; USER=root ; COMMAND=/bin/bash
Jul 21 07:22:09 LAB sudo[1432]: pam_unix(sudo:session): session opened for user root(uid=0) by lab(uid=1000)
Jul 21 07:22:09 LAB sudo[1432]:      lab : TTY=pts/0 ; PWD=/home/lab ; USER=root ; COMMAND=/bin/bash
-- Boot 93616c3bb5794e0099520b2bf974d1bc --
Jul 21 07:17:11 LAB sudo[1571]: pam_unix(sudo:session): session closed for user root
Jul 21 07:17:11 LAB sudo[1512]: pam_unix(sudo:session): session closed for user root
[...]

Note that here I am using the “-r” flag so that the most recent entries are shown first rather than the normal ordering of oldest to newest as you would normally read them in a log file.

The main differences between the default journalctl output and default Syslog-style output are the “Journal begins at...” header line and the markers that show which boot session the log messages were generated in. Like normal Syslog logs, the timestamps are shown in the default time zone for the machine where you are running the journalctl command.

If you want to hide the initial header, specify “-q” (“quiet“). If you want to force UTC timestamps, the option is “--utc“. You can hide the boot session information by choosing any one of several output modes with “-o“. Here is a single log message formatted with some of the different output choices:

-o short
Jul 21 07:33:36 LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2

-o short-full
Sun 2024-07-21 07:33:36 EDT LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2

-o short-iso
2024-07-21T07:33:36-0400 LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2

-o short-iso-precise
2024-07-21T07:33:36.610329-0400 LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2

My personal preference is “-q --utc -o short-iso“. If you have a particular preferred output style, you might consider making it an alias so you’re not constantly having to retype the options. In my case the command would be “alias journalctl='journalctl -q --utc -o short-iso'“.

The “-o” option also supports several different JSON output formats. If you are looking to consume journalctl output with a script, you probably want “-o json” which formats all fields in each journal entry as a single long line of minified JSON. “-o json-pretty” is a multi-line output mode that I find useful when I’m trying to figure out which fields to construct my queries with. The JSON output at the top of this article was created with “-o json-pretty“.

In JSON output modes, you can output a custom list of fields with the “--output-fields=” option:

# journalctl -o json-pretty --output-fields=_EXE,_PID,MESSAGE
{
        "_EXE" : "/usr/sbin/sshd",
        "__REALTIME_TIMESTAMP" : "1721561616611216",
        "MESSAGE" : "Accepted password for lab from 192.168.10.1 port 56282 ssh2",
        "_BOOT_ID" : "5c57e83c3abd457c95d0695807667c9e",
        "__CURSOR" : "s=743db8433dcc46ca9b9cecd7a4272061;i=1e19;b=5c57e83c3abd457c95d0695807667c9e;m=3935b8a5;t=61dc04c9df790;x=d031b64e57796135",
        "_PID" : "1478",
        "__MONOTONIC_TIMESTAMP" : "959821989"
}
[...]

Notice that the __CURSOR, __REALTIME_TIMESTAMP, __MONOTONIC_TIMESTAMP, and _BOOT_ID fields are always printed even though we did not specifically select them.

“-o verbose --output-fields=...” gives only the requested fields plus __CURSOR but does so without the JSON formatting. “-o cat --output-fields=...” gives just the field values with no field names and no extra fields.

MATCHING MESSAGES

In general you can select messages you want to see by matching with “FIELD=value“, e.g. “_UID=1000“. You can specify multiple selectors on the same command line and the journalctl command assumes you want to logically “AND” the selections together (intersection). If you want logical “OR”, use a “+” between field selections, e.g. “_UID=0 + _UID=1000“.

Earlier I mentioned using “-o json-pretty” to help view fields that you might want to match on. “journalctl -N” lists the names of all fields found in the journal file(s), while “journalctl -F FIELD” lists all values found for a particular field:

# journalctl -N | sort
[...]
_SYSTEMD_UNIT
_SYSTEMD_USER_SLICE
_SYSTEMD_USER_UNIT
THREAD_ID
TID
TIMESTAMP_BOOTTIME
TIMESTAMP_MONOTONIC
_TRANSPORT
_UDEV_DEVNODE
_UDEV_SYSNAME
_UID
UNIT
UNIT_RESULT
USER_ID
USER_INVOCATION_ID
USERSPACE_USEC
USER_UNIT
# journalctl -F _UID | sort -n
0
101
104
105
107
110
114
117
1000
62803

Piping the output of “-F” and “-N” into sort is highly recommended.
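
For numeric fields like _UID, remember sort’s “-n” flag: a plain lexical sort interleaves the values. A quick self-contained illustration using UID values from the output above:

```shell
# A lexical sort would order "1000" before "104"; sort -n compares numerically
printf '%s\n' 0 101 1000 104 62803 | sort -n
```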

Commonly matched fields have shortcut options:

--facility=   Matches on Syslog facility name or number
    journalctl -q -o short --facility=authpriv
    (Gives output just like typical /var/log/auth.log files)

-t            Matches SYSLOG_IDENTIFIER field
    journalctl -q -o short -t sudo
    (When you just want to see messages from Sudo)

-u            Matches _SYSTEMD_UNIT field
    journalctl -q -o short -u ssh.service
    (Messages from sshd, the ".service" is optional)

You can also do pattern matching against the log message text using the “-g” (“grep“) option. This option uses PCRE2 regular expression syntax. You might find this regular expression tester useful.
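
Since “-g” takes PCRE2 patterns, GNU grep’s “-P” mode makes a reasonable test bench for a pattern before handing it to journalctl. This is a sketch against sample message text; it assumes a GNU grep built with PCRE support:

```shell
# Try the pattern against sample messages; only the matching line is printed
printf '%s\n' \
    'Accepted password for lab from 192.168.10.1 port 56282 ssh2' \
    'Failed password for root from 10.0.0.5 port 41022 ssh2' \
    'Server listening on 0.0.0.0 port 22.' |
    grep -P 'Accepted password for \S+ from \d{1,3}(\.\d{1,3}){3}'
```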

Options can be combined in any order:

# journalctl -q -r --utc -o short-iso -u ssh -g Accepted
2024-07-21T11:33:36+0000 LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2
2024-07-21T11:22:02+0000 LAB sshd[1304]: Accepted password for lab from 192.168.10.1 port 56280 ssh2
2024-07-20T21:55:45+0000 LAB sshd[1559]: Accepted password for lab from 192.168.10.1 port 56278 ssh2
2024-07-20T21:44:55+0000 LAB sshd[1386]: Accepted password for lab from 192.168.10.1 port 56376 ssh2
[...]

TIME-BASED SELECTIONS

Specify time ranges with the “-S” (or “--since“) and “-U” (“--until“) options. The syntax for specifying dates and times is ridiculously flexible and is defined in the systemd.time(7) manual page. Here are some examples:

-S "2024-08-07 09:30:00"
-S 2024-07-24
-U yesterday
-U "15 minutes ago"
-S -1hr
-S 2024-07-24 -U yesterday

The Systemd journal also keeps track of when the system reboots and allows you to select messages that happened during a particular operating session of the machine. “--list-boots” gives a list of all of the reboots found in the current journal files and “-b” allows you to select one or more sessions:

# journalctl --list-boots
 -3 f366a96b2f0a402a94e02eb57e10d431 Sun 2024-07-14 16:18:40 EDT—Thu 2024-07-18 08:53:12 EDT
 -2 9ac272cac6c040a7b9ad021ba32c2574 Thu 2024-07-18 08:53:45 EDT—Sat 2024-07-20 08:55:50 EDT
 -1 93616c3bb5794e0099520b2bf974d1bc Sat 2024-07-20 17:41:24 EDT—Sun 2024-07-21 07:17:12 EDT
  0 5c57e83c3abd457c95d0695807667c9e Sun 2024-07-21 07:17:40 EDT—Sun 2024-07-21 14:17:52 EDT
# journalctl -q -r --utc -o short-iso -u ssh -g Accepted -b -1
2024-07-20T21:55:45+0000 LAB sshd[1559]: Accepted password for lab from 192.168.10.1 port 56278 ssh2
2024-07-20T21:44:55+0000 LAB sshd[1386]: Accepted password for lab from 192.168.10.1 port 56376 ssh2

TAIL-LIKE BEHAVIORS

When using journalctl on a live system, “journalctl -f” allows you to watch messages coming into the logs in real time. This is similar to using “tail -f” on a traditional Syslog-style log. You may still use all of the normal selectors to filter messages you want to watch for, as well as specify the usual output formats.

“journalctl -n” displays the last ten entries in the journal, similar to piping the output into tail. You may optionally specify a numeric argument after “-n” if you want to see more or fewer than ten lines.

However, the “-n” and “-g” (pattern matching) options have a strange interaction. The pattern match is only applied to the lines selected by “-n” along with your other selectors. For example, we can extract the last ten lines associated with the SSH service:

# journalctl -q --utc -o short-iso -u ssh -n
2024-07-21T11:17:41+0000 LAB systemd[1]: Starting OpenBSD Secure Shell server...
2024-07-21T11:17:42+0000 LAB sshd[723]: Server listening on 0.0.0.0 port 22.
2024-07-21T11:17:42+0000 LAB sshd[723]: Server listening on :: port 22.
2024-07-21T11:17:42+0000 LAB systemd[1]: Started OpenBSD Secure Shell server.
2024-07-21T11:22:02+0000 LAB sshd[1304]: Accepted password for lab from 192.168.10.1 port 56280 ssh2
2024-07-21T11:22:02+0000 LAB sshd[1304]: pam_unix(sshd:session): session opened for user lab(uid=1000) by (uid=0)
2024-07-21T11:33:36+0000 LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2
2024-07-21T11:33:36+0000 LAB sshd[1478]: pam_unix(sshd:session): session opened for user lab(uid=1000) by (uid=0)
2024-07-21T19:56:09+0000 LAB sshd[4013]: Accepted password for lab from 192.168.10.1 port 56284 ssh2
2024-07-21T19:56:09+0000 LAB sshd[4013]: pam_unix(sshd:session): session opened for user lab(uid=1000) by (uid=0)

But matching the lines containing the “Accepted” keyword only matches against the ten lines shown above:

# journalctl -q --utc -o short-iso -u ssh -g Accepted -n
2024-07-21T11:22:02+0000 LAB sshd[1304]: Accepted password for lab from 192.168.10.1 port 56280 ssh2
2024-07-21T11:33:36+0000 LAB sshd[1478]: Accepted password for lab from 192.168.10.1 port 56282 ssh2
2024-07-21T19:56:09+0000 LAB sshd[4013]: Accepted password for lab from 192.168.10.1 port 56284 ssh2

From an efficiency perspective I understand this choice. It’s costly to seek backwards through the journal doing pattern matches until you find ten lines that match your regular expression. But it’s certainly surprising behavior, especially when your pattern match returns zero matching lines because it doesn’t happen to get a hit in the last ten lines you selected.

Frankly, I forget about the “-n” option entirely and just pipe my journalctl output into tail.

Further Reading

I’ve attempted to summarize the information most important to DFIR professionals, but there is always more to know. For further information, start with the journalctl(1) manual page. Keep your journalctl cheat sheet handy and good luck out there!

Hiding Linux Processes with Bind Mounts

Lately I’ve been thinking about Stephan Berger’s recent blog post on hiding Linux processes with bind mounts. Bottom line here is that if you have an evil process you want to hide, use a bind mount to mount a different directory on top of the /proc/PID directory for the evil process.

In the original article, Stephan uses a nearly empty directory to overlay the original /proc/PID directory for the process he is hiding. I started thinking about how I could write a tool that would populate a more realistic looking spoofed directory. But after doing some prototypes and running into annoying complexities I realized there is a much easier approach.

Why try and make my own spoofed directory when I can simply use an existing /proc/PID directory from some other process? If you look at typical Linux ps output, there are lots of process entries that would hide our evil process quite well:

root@LAB:~# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Jul23 ?        00:00:12 /sbin/init
root           2       0  0 Jul23 ?        00:00:00 [kthreadd]
root           3       2  0 Jul23 ?        00:00:00 [rcu_gp]
root           4       2  0 Jul23 ?        00:00:00 [rcu_par_gp]
[...]
root          73       2  0 Jul23 ?        00:00:00 [irq/24-pciehp]
root          74       2  0 Jul23 ?        00:00:00 [irq/25-pciehp]
root          75       2  0 Jul23 ?        00:00:00 [irq/26-pciehp]
root          76       2  0 Jul23 ?        00:00:00 [irq/27-pciehp]
root          77       2  0 Jul23 ?        00:00:00 [irq/28-pciehp]
root          78       2  0 Jul23 ?        00:00:00 [irq/29-pciehp]
root          79       2  0 Jul23 ?        00:00:00 [irq/30-pciehp]

These process entries with low PIDs and process names in square brackets (“[somename]“) are spontaneous processes. They aren’t running executables in the traditional sense– you won’t find a binary in your operating system called kthreadd for example. Instead, these are essentially kernel code dressed up to look like a process so administrators can monitor various subsystems using familiar tools like ps.

From our perspective, however, they’re a bunch of processes that administrators generally ignore and which have names that vary only slightly from one another. They’re perfect for hiding our evil processes:

root@LAB:~# ps -ef | grep myevilprocess
root        4867       1  0 Jul23 pts/0    00:00:16 myevilprocess
root@LAB:~# mount -B /proc/78 /proc/4867
root@LAB:~# ps -ef | grep 4867

Our evil process is now completely hidden. If somebody were to look closely at the ps output, they would discover there are now two entries for PID 78:

root@LAB:~# ps -ef | awk '$2 == 78'
root          78       2  0 Jul23 ?        00:00:00 [irq/29-pciehp]
root          78       2  0 Jul23 ?        00:00:00 [irq/29-pciehp]

My guess is that nobody is going to notice this unless they are specifically looking for this technique. And if they are aware of this technique, there’s a much simpler way of detecting it which Stephan notes in his original article:

root@LAB:~# cat /proc/mounts | grep /proc
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=424 0 0
proc /proc/4867 proc rw,nosuid,nodev,noexec,relatime 0 0

The last line above is a dead giveaway that something hinky is going on.

We can refine Stephan’s approach:

root@LAB:~# cat /proc/*/mounts | awk '$2 ~ /^\/proc\/[0-9]*($|\/)/ { print $2 }' | sort -ur
/proc/4867

Just to be thorough, I’m dumping the content of all /proc/*/mounts entries (/proc/mounts is a link to /proc/self/mounts) and looking for ones where the mount point is a /proc/PID directory or one of its subdirectories. The “sort -ur” at the end gives us one instance of each unique mount point.

But why the “-r” option? I want to use my output to programmatically unmount the bind mounted directories. I was worried about somebody doing a bind mount on top of a bind mount:

root@LAB:~# mount -B /proc/79/fd /proc/4867/fd
root@LAB:~# cat /proc/*/mounts | awk '$2 ~ /^\/proc\/[0-9]*($|\/)/ { print $2 }' | sort -ur
/proc/78/fd
/proc/4867/fd
/proc/4867
root@LAB:~# cat /proc/*/mounts | awk '$2 ~ /^\/proc\/[0-9]*($|\/)/ { print $2 }' | sort -ur | 
    while read dir; do umount $dir; done
umount: /proc/4867/fd: not mounted.
root@LAB:~# ps -ef | grep myevilprocess
root        4867       1  0 Jul23 pts/0    00:00:16 myevilprocess

While I think this scenario is extremely unlikely, using “sort -ur” means that the mount points are returned in the proper order to be unmounted. And once the bind mounts are unmounted, we can see the evil process again.

Note that we do get an error here. /proc/78 is mounted on top of /proc/4867. So when we unmount /proc/78/fd we are also taking care of the spoofed path /proc/4867/fd. When our while loop gets to the entry for /proc/4867/fd, the umount command errors out.
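
The reversed sort gives the right unmount order because a path always sorts lexicographically before any of its own subpaths, so reversing the order puts everything mounted beneath a directory ahead of the directory itself. A small self-contained illustration with the mount points from the example above:

```shell
# "/proc/4867/fd" sorts after its parent "/proc/4867", so "sort -ur"
# emits child mount points first, which is the safe order for unmounting
printf '%s\n' /proc/4867 /proc/78/fd /proc/4867/fd | sort -ur
```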

Possible weird corner cases aside, let’s try and provide our analyst with some additional information:

root@LAB:~# function procbindmounts {
  cat /proc/*/mounts | awk '$2 ~ /^\/proc\/[0-9]*($|\/)/ { print $2 }' | sort -ur | 
    while read dir; do 
        echo ===== POSSIBLE PROCESS HIDING $dir
        echo -ne Overlay:\\t
        cut -d' ' -f1-7 $dir/stat
        umount $dir
        echo -ne Hidden:\\t\\t
        cut -d' ' -f1-7 $dir/stat
    done
}
root@LAB:~# mount -B /proc/78 /proc/4867
root@LAB:~# procbindmounts
===== POSSIBLE PROCESS HIDING /proc/4867
Overlay:        78 (irq/29-pciehp) S 2 0 0 0
Hidden:         4867 (myevilprocess) S 1 4867 4759 34816

Thanks Stephan for getting my creative juices flowing. This is a fun technique for all you red teamers out there, and a good trick for all you blue team analysts.

Recovering Deleted Files in XFS

In my earlier write-ups on XFS, I noted that when a file is deleted:

  • The inode address is often still visible in the deleted directory entry
  • The extent structures in the inode are not zeroed

This combination of factors should make it straightforward to recover deleted files. Let’s see if we can document this recovery process, shall we?

For this example, I created a directory containing 100 JPEG images and then deleted 10 images from the directory:

We will be attempting to recover the 0010.jpg file. I have included the file checksum and output of the file command in the screenshot above for future reference.

Examining the Directory

I will use xfs_db to dump the directory file. But first I need to know the device that contains the file system and the inode number of our test directory:

LAB# mount /images
mount: /images: /dev/sdb1 already mounted on /images.
LAB# ls -id /images/testdir/
171 /images/testdir/
LAB# xfs_db -r /dev/sdb1
xfs_db> inode 171
xfs_db> print
core.magic = 0x494e
[... snip ...]
v3.crtime.sec = Sun Jun 23 12:43:20 2024
v3.crtime.nsec = 240281066
v3.inumber = 171
v3.uuid = 82396b5c-3a48-46e9-b3fa-fbed705313b0
v3.reflink = 0
v3.cowextsz = 0
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,2315557,1,0]

The directory file occupies a single block at address 2315557. We can use xfs_db to dump the contents of that block. Viewing the block as a directory isn’t all that helpful, though we can see the area of deleted directory entries in the output:

xfs_db> fsblock 2315557
xfs_db> type dir3
xfs_db> print
bhdr.hdr.magic = 0x58444233
[... snip ... ]
bu[10].namelen = 8
bu[10].name = "0009.jpg"
bu[10].filetype = 1
bu[10].tag = 0x120
bu[11].freetag = 0xffff
bu[11].length = 0xf0
bu[11].filetype = 1
bu[11].tag = 0x138
bu[12].inumber = 20049168
bu[12].namelen = 8
bu[12].name = "0020.jpg"
bu[12].filetype = 1
bu[12].tag = 0x228
bu[13].inumber = 20049169
bu[13].namelen = 8
bu[13].name = "0021.jpg"
[... snip ...]

Array entry 11 shows the 0xffff marker that denotes the beginning of one or more deleted directory entries, followed by the two-byte value (0x00f0, or 240 bytes) giving the length of that section.

But to see the actual contents of that region, we will need to get a hex dump view:

At offset 0x138 you can see the “ff ff” marking the start of the deleted entries and the “00 f0” length value. These four bytes overwrite the upper four bytes of the inode address of the 0010.jpg file, but the lower four bytes are still visible: “01 31 ed 06“.
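
As a quick sanity check, we can do the arithmetic in the shell (this check is mine, not part of the original analysis): the surviving low 32 bits decode to inode 20,049,158, exactly ten below the 20,049,168 recorded for 0020.jpg in the live entries, which is just what we would expect if the sequentially created files received sequential inode numbers:

```shell
# Low 32 bits of the inode number surviving in the freed directory entry
residue=$((0x0131ed06))

# Live inode number recorded for 0020.jpg in the same directory block
neighbor=20049168

echo "residue=$residue delta=$((neighbor - residue))"
```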

Recall from my previous XFS write-ups that while XFS uses 64-bit addresses, the block and inode addresses are variable length and rarely occupy the entire 64-bit address space. The inode address length is based on the number of blocks in each allocation group, and the number of bits necessary to represent that many blocks. This is the agblklog value in the superblock:

xfs_db> sb 0
xfs_db> print agblklog
agblklog = 24

24 bits are required for the relative block offset in the AG. We need three additional bits to index the inode within the block– 27 bits in total. Everything above these 27 bits is the AG number, but assuming the default of four AGs per file system, the AG number only occupies two more bits. The inode address should fit in 29 bits, and so the inode residue we are seeing in the directory entry should be the entire original inode address. You can confirm this by looking at the deleted directory entries that follow the deleted 0010.jpg file– their inode numbers are untouched, and their upper 32 bits are all zeroes.
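
We can sanity check the split with shell arithmetic, assuming the agblklog of 24 from the superblock and three bits for the inode-within-block offset (512-byte inodes in 4K blocks):

```shell
ino=$((0x0131ed06))   # inode residue recovered from the directory entry
agblklog=24           # bits for the relative block offset (from the superblock)
inopblog=3            # bits for the inode's position within its block

# Split the inode number into AG number, relative block, and in-block slot
agno=$(( ino >> (agblklog + inopblog) ))
agbno=$(( (ino >> inopblog) & ((1 << agblklog) - 1) ))
slot=$(( ino & ((1 << inopblog) - 1) ))
echo "AG=$agno block=$agbno inode-in-block=$slot"
```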

Examining the Inode

We have some confidence that the inode of the deleted 0010.jpg file is 0x0131ed06. We can use xfs_db to examine this inode. The normal output from xfs_db shows us that the file is empty and there are no extents:

xfs_db> inode 0x0131ed06
xfs_db> print
core.magic = 0x494e
[... snip ...]
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
[... snip ...]

However, viewing a hexdump of the inode shows the original extent structures:

The extents start at offset 0x0b0, immediately following the “inode core” region. Extent structures are 128 bits in length, so each line in the standard hexdump output format represents a single extent.
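
For reference, the on-disk extent record packs four bit fields into those 128 bits: a 1-bit unwritten flag, a 54-bit logical file offset, a 52-bit filesystem block address, and a 21-bit block count. Here is a minimal decoder sketch; the two hex words in the example are reconstructed from the values recovered below (the actual hexdump was only shown as a screenshot), so treat them as illustrative:

```shell
# Decode a 128-bit XFS extent record supplied as two big-endian 64-bit hex words.
# Bit layout: flag(1) | startoff(54) | startblock(52) | blockcount(21)
decode_extent() {
    local hi=$((16#$1)) lo=$((16#$2))
    local flag=$(( (hi >> 63) & 1 ))
    local startoff=$(( (hi >> 9) & ((1 << 54) - 1) ))
    # startblock spans both words: low 9 bits of hi, high 43 bits of lo
    local startblock=$(( ((hi & 0x1FF) << 43) | ((lo >> 21) & ((1 << 43) - 1)) ))
    local blockcount=$(( lo & ((1 << 21) - 1) ))
    echo "flag=$flag startoff=$startoff startblock=$startblock blockcount=$blockcount"
}

# Illustrative extent: file offset 0, filesystem block 2507998, 8 blocks
decode_extent 0000000000000000 000004C89BC00008
```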

Recognizing that the standard, non-byte-aligned XFS extent structures are difficult to decode, I developed a small script called xfs-extents.sh that reads the extent structures from an inode and outputs dd commands that should dump the blocks specified in the extent. Simply provide the device name and the inode number:

LAB# xfs-extents.sh /dev/sdb1 0x0131ed06
(offset 0) -- dd if=/dev/sdb1 bs=4096 skip=$((0 * 12206976 + 2507998)) count=8
LAB# dd if=/dev/sdb1 bs=4096 skip=$((0 * 12206976 + 2507998)) count=8 >/tmp/recovered-0010.jpg
8+0 records in
8+0 records out
32768 bytes (33 kB, 32 KiB) copied, 0.00100727 s, 32.5 MB/s
LAB# file /tmp/recovered-0010.jpg
/tmp/recovered-0010.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 99x98, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=0], comment: "Created with GIMP", baseline, precision 8, 312x406, components 3
LAB# md5sum /tmp/recovered-0010.jpg
637ad57a1e494b2c521b959de6a1995e  /tmp/recovered-0010.jpg

The careful reader will note that the MD5 checksum on the recovered file does not match the checksum of the original file. This is due to the fact that the recovered file includes the null-filled slack space at the end of the final block that was ignored in the original checksum calculations. Unfortunately the original file size in the inode is zeroed when the file is deleted, so we have no idea of the exact length of the original file. All we can do is recover the entire block run of the file, including the slack space. We should still be able to view the original image in this case, even with the extra nulls tacked on to the end of the file.
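
One mitigation for the slack, since JPEG data ends with a two-byte ff d9 end-of-image marker: truncate the recovered file just past the last occurrence of that marker. This helper is a hypothetical sketch (not part of the recovery tooling above) and assumes GNU grep:

```shell
# Hypothetical helper: drop the null-filled slack after a recovered JPEG by
# truncating just past the last ff d9 (EOI) marker. Run grep under LC_ALL=C
# so the raw bytes are not interpreted as UTF-8.
trim_jpeg_slack() {
    local infile=$1 outfile=$2 offset
    # grep -b prefixes each match with its byte offset; keep the last one
    offset=$(LC_ALL=C grep -obaP '\xff\xd9' "$infile" | tail -n 1 | cut -d: -f1)
    [ -n "$offset" ] || { echo "no JPEG EOI marker found" >&2; return 1; }
    head -c "$((offset + 2))" "$infile" > "$outfile"
}

# e.g.: trim_jpeg_slack /tmp/recovered-0010.jpg /tmp/trimmed-0010.jpg
```

Even after trimming, the checksum may still differ from the original if the file carried trailing metadata past the EOI marker, but for a typical JPEG this recovers the exact original length.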

Extra Credit Math Problem

With the help of xfs_db and my little shell script, we were able to recover the deleted file. However, retrieving the inode from the deleted directory entry was facilitated by the fact that the inode address was less than 32 bits long. So even though the upper 32 bits of the 64-bit address were overwritten, we could still see the original inode number.

Since the length of the inode address is based on the number of blocks per AG, the question becomes how large the file system has to grow before the inode address, including the AG number in the upper bits, becomes longer than 32 bits. Once this happens, recovering the original inode address from deleted directory entries becomes problematic– at least for the first entry in a region of deleted directory entries. Remember from our example above that the full 64-bit inode numbers of the second and later deleted entries in the chunk remain fully visible.

We need two bits to represent the AG number in a typical XFS file system, and three bits to represent the inode offset in the block. That leaves 27 of 32 bits for the relative block offset in the AG. So the maximum AG size is 2**27 or 134,217,728 blocks. Assuming the standard 4K block size, that’s 512 gigabytes per AG, or a 2TB file system.

What if we were willing to sacrifice the upper two bits of AG number? After all, even if the AG number were overwritten, we could still try to find our deleted file simply by checking the relative inode address in each of the AGs until we find the file we’re looking for. With an extra two bits of room for the relative block offset, each AG could now be four times larger, allowing us to have up to an 8TB file system before the relative inode address was larger than 32 bits.