jq For Forensics

jq is a tremendously useful tool for dealing with JSON data. But the documentation that exists seems to be targeted at developers parsing deeply nested JSON structures to transform them into other JSON structures. In my DFIR role, I typically deal with streams of fairly simple JSON records, usually some sort of log, that I need to transform into structured text, such as comma-separated (CSV) or tab-separated (TSV) output. I’ve spent a lot of time running through reference manuals and endless Stack Overflow postings to get to a reasonable level with jq. I wanted to share some of the things I’ve learned along the way.

Start With The Basics

At its simplest, jq is an excellent JSON pretty printer:

$ jq . journal.json
{
  "_MACHINE_ID": "0f2f13b9dce0451591ae0dc418f6c96f",
  "_RUNTIME_SCOPE": "system",
  "_HOSTNAME": "vbox",
  "_SOURCE_BOOTTIME_TIMESTAMP": "0",
  "MESSAGE": "Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)",
  "__MONOTONIC_TIMESTAMP": "6400064",
  "_SOURCE_MONOTONIC_TIMESTAMP": "0",
  "_BOOT_ID": "2a5a598d4f6142c7b7719eed38c1a2b9",
  "SYSLOG_IDENTIFIER": "kernel",
  "_TRANSPORT": "kernel",
  "PRIORITY": "5",
  "SYSLOG_FACILITY": "0",
  "__CURSOR": "s=0a047604dca842218e0807bc796d4cb7;i=1;b=2a5a598d4f6142c7b7719eed38c1a2b9;m=61a840;t=64dc728142e95;x=852824913ddff90e",
  "__REALTIME_TIMESTAMP": "1774367626505877"
}
{
  "_MACHINE_ID": "0f2f13b9dce0451591ae0dc418f6c96f",
  "MESSAGE": "Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet",

...

The basic syntax here is “jq <script> <jsonfile> ...“, where <script> is some sort of translation script in jq‘s own particular scripting language. The script “.” is essentially a null transformation that simply tells jq to output whatever it sees in its input <jsonfile>. The default output style for jq is the pretty-printed style you see above.
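If you don’t have journal data handy, note that jq also reads from standard input when no file is given, so you can experiment with inline records. A quick sketch (the record here is hypothetical):

```shell
# jq reads standard input when no file argument is given -- handy for
# quick experiments. The "." filter pretty-prints each input record.
echo '{"MESSAGE": "hello", "PRIORITY": "5"}' | jq .
```

This is also why jq drops so naturally into shell pipelines after commands like curl or journalctl -o json.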

Some of you will recognize the data above as Systemd journal entries. Normally we would work with the Systemd journal via the journalctl command. But exported journal data from one of my lab systems is a good example set for showing you some useful jq tips and tricks that you can apply to any sort of exported logging stream.

Other Output Modes

Suppose we just wanted to output the “MESSAGE” field from each record. Just specify the field you want to output with a leading “.“:

$ jq .MESSAGE journal.json
"Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

Because the value of the MESSAGE field is a string, jq outputs each message surrounded by double quotes. If you don’t want the quoting, use the “-r” option for raw mode output:

$ jq -r .MESSAGE journal.json
Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...

Suppose we wanted to output multiple fields as columns of structured text. jq includes support for both “@csv” and “@tsv” output modes:

$ jq -r '[.__REALTIME_TIMESTAMP, ._HOSTNAME, .MESSAGE] | @csv' journal.json
"1774367626505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"1774367626505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

jq transformation scripts use a pipelining syntax. Here we’re sending the fields we want to output into the “@csv” formatting tool. “@csv” wants its inputs as a JSON array, so we create an array on the fly simply by enclosing the fields we want to output with square brackets (“[..., ..., ...]“). The “@csv” output method automatically quotes each field and handles escaping any double quotes that might be included.
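To see that quoting behavior in isolation, here’s a minimal sketch using an inline record (hypothetical field names) whose message contains embedded double quotes:

```shell
# @csv quotes every field and doubles any embedded double quotes, per
# the usual CSV convention. (printf avoids echo's inconsistent
# backslash handling across shells.)
printf '%s\n' '{"HOST": "vbox", "MESSAGE": "said \"hi\" loudly"}' |
    jq -r '[.HOST, .MESSAGE] | @csv'
# Output: "vbox","said ""hi"" loudly"
```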

If you want other delimiters besides the traditional commas or tabs, jq can also output arbitrary text:

$ jq -r '"\(.__REALTIME_TIMESTAMP)|\(._HOSTNAME)|\(.MESSAGE)"' journal.json
1774367626505877|vbox|Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
1774367626505925|vbox|Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...

Use double quotes ("...") to enclose your output template. Use “\(.fieldname)” to output the value of specific fields. Anything else in your template is output as literal text. Here I’m outputting pipe-delimited text with the same three fields as in our CSV example above.

Note that our output template can use the typical escape sequences like “\t” for tabs. So another way to produce tab-delimited text would be:

$ jq -r '"\(.__REALTIME_TIMESTAMP)\t\(._HOSTNAME)\t\(.MESSAGE)"' journal.json
1774367626505877 vbox Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
1774367626505925 vbox Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...

However, it’s almost certainly easier to use '[..., ..., ...] | @tsv' for this.
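For completeness, here’s a small inline sketch of “@tsv” (hypothetical field names). Unlike a naive template, “@tsv” escapes tabs and newlines inside a field rather than letting them break your column layout:

```shell
# @tsv joins fields with literal tabs, and escapes any tabs or newlines
# *inside* a field as \t and \n so the columns stay intact.
printf '%s\n' '{"A": "one", "B": "two\twords"}' |
    jq -r '[.A, .B] | @tsv'
```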

Transforming Data With Builtin Operators

jq includes a wide variety of builtin operators for data transformation and math. For example, suppose we wanted to format those __REALTIME_TIMESTAMP fields in the Systemd journal into human-readable strings:

$ head -1 journal.json | jq -r '(.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T")'
2026-03-24 15:53:46

There’s a lot going on here, so let’s break it down a bit at a time. __REALTIME_TIMESTAMP is a string; if you look at the pretty-printed output above, the values are displayed in double quotes, meaning they are string-type values. Ultimately we want to feed the __REALTIME_TIMESTAMP value into strftime() to produce formatted text, but strftime() wants numeric input. So the first step is to convert the string into a number with “tonumber”. The jq piping syntax is how we express this transformation.

Our next problem is that __REALTIME_TIMESTAMP is in microseconds, but strftime() wants good old Unix epoch seconds. So we do some math with the traditional “/” operator for division. This actually converts our value into a decimal number (“1774367626.505877“), but that’s good enough for strftime(). Finally we pipeline the number we calculated into the strftime() function. We give strftime() an appropriate format string to get the output we want.

This works great, but we’re throwing away the microseconds information. What if we wanted to display that as part of the timestamp? Time to introduce some more useful string operations:

$ head -1 journal.json | jq -r '((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) + 
(.__REALTIME_TIMESTAMP | .[-6:])'

2026-03-24 15:53:46.505877

Looking at the back part of our expression on the second line above, we are using jq‘s slicing operation “.[start:end]”. Because we use a negative start offset, we count backwards six characters from the end of the string. With no end value specified, the slice runs from that point to the end of the string.

Like many other scripting languages, jq supports string concatenation with the addition operator (“+”). Here we are adding the formatted string output from strftime() and the microseconds value we sliced out of the string. Note that the strftime() format has been updated to output a literal “.” between the formatted text and the microseconds.
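If you want to see the slicing and concatenation behavior on its own, here’s a minimal sketch with an inline record (the field name is hypothetical):

```shell
# .[:-6] takes everything up to the last six characters, .[-6:] takes
# the last six, and "+" concatenates the pieces.
echo '{"TS": "1774367626505877"}' |
    jq -r '(.TS | .[:-6]) + "." + (.TS | .[-6:])'
# Output: 1774367626.505877
```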

Suppose we wanted to include the human-readable timestamp we just created instead of the raw epoch microseconds for our “@csv” output. The trick is to take our jq code for producing human readable timestamps and drop it into our “[...] | @csv” pipeline in place of the __REALTIME_TIMESTAMP field:

$ jq -r '[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) + (.__REALTIME_TIMESTAMP | .[-6:]), ._HOSTNAME, .MESSAGE] | @csv' journal.json
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

Scripting With jq

Obviously that jq expression is pretty horrible to type on the command line. You can always take any jq script and put it into a text file and then run that script on your data with the “-f” option:

$ jq -r -f csv-journal.jq journal.json
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

In this instance, our csv-journal.jq file is the jq recipe from our command line example, but without the single quotes. Since jq doesn’t care about whitespace in scripts, we can format our recipe with newlines and indentation to make it more readable:

$ cat csv-journal.jq 
[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:]),
._HOSTNAME, .MESSAGE] | @csv

On Linux systems you can even use jq in a “bang path” (shebang) line at the top of the script so that jq automatically gets invoked as the interpreter:

$ cat csv-journal.jq
#!/usr/bin/jq -rf

[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:]),
._HOSTNAME, .MESSAGE] | @csv

Note that the new interpreter path at the top of the script includes the “-rf” options for raw output (“-r“) and interpreting the rest of the file as a script (“-f“).

Once we have the interpreter path at the top of the script, we can just cat our JSON data into the script without invoking jq directly:

$ chmod +x csv-journal.jq 
$ cat journal.json | ./csv-journal.jq
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...

This might make things easier for less-technical users.

Selecting Records

When working with streams of records, it’s typical to want to only operate on certain records. For example, suppose we only wanted to see log messages from the “sudo” command. In the Systemd journal, these messages have the “SYSLOG_IDENTIFIER” field set to “sudo“:

$ jq -r 'select(.SYSLOG_IDENTIFIER == "sudo") | .MESSAGE' journal.json
worker : user NOT in sudoers ; TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/2 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
pam_unix(sudo:session): session closed for user root
worker : TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
pam_unix(sudo:session): session closed for user root
worker : TTY=pts/1 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
worker : TTY=pts/3 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
...

The new magic is jq‘s select() operator up at the front of that pipeline. If the conditional you give to select() evaluates to true, then the record you have matched gets passed down for processing by the rest of the pipeline. If not, then that record is skipped.

Logical operators (“and“, “or“, “not“) and parentheses are allowed. And you can do pattern matching with PCRE-like expressions. For example, the really interesting lines in Sudo logs are the ones that show the command being invoked (“COMMAND=“):

$ jq -r 'select(.SYSLOG_IDENTIFIER == "sudo" and (.MESSAGE | test("COMMAND="))) | .MESSAGE' journal.json
worker : user NOT in sudoers ; TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/2 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/1 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/3 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
...

For pattern matching, just pipeline the field you want to match against into the test() operator. Here I’m matching the literal string “COMMAND=” against the MESSAGE field. The pattern match is joined with our original selector for “sudo” in the SYSLOG_IDENTIFIER field using a logical “and“.
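One more handy detail: test() accepts an optional second argument containing regex flags, such as “i” for case-insensitive matching. A quick inline sketch (hypothetical record):

```shell
# test() takes an optional second argument of regex flags; "i" makes
# the match case-insensitive.
echo '{"MESSAGE": "Failed Password for root"}' |
    jq -r 'select(.MESSAGE | test("failed password"; "i")) | .MESSAGE'
# Output: Failed Password for root
```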

Here’s another example showing a useful regex when dealing with SSH logs, just to give you a flavor of things you can do with regular expression matching:

$ jq -r 'select(._COMM == "sshd" and 
(.MESSAGE | test("^((Accepted|Failed) .* for|Invalid user) "))) | .MESSAGE' journal.json

Invalid user mary from 192.168.10.31 port 55746
Failed password for invalid user mary from 192.168.10.31 port 55746 ssh2
Accepted password for hal from 192.168.4.22 port 42310 ssh2
...

Enough For Now

Hopefully this is enough to get you started writing your own basic jq scripts. As with many things, the rest you pick up as you practice and get frustrated. The jq reference manual is useful for checking the syntax of different built-in operators, but I often find the examples more frustrating than helpful. Searching Stack Overflow can often yield more useful results.

Feel free to drop your questions into the comments, or reach out to me via social media or email. Maybe your questions will turn this single blog article into a series!

Linux Forensic Scenario

Let’s try something a little different for today’s blog post!

I’ve been working on ideas for a major update on my Linux forensics class, including new lab scenarios. I recently threw together a rough draft of one of my scenario ideas: built a machine, planted some malware on it, and then used UAC to capture forensic data from the system. I was pleased with the results, and thought I would share them with the larger community.

And then I thought, why not turn it into a bit of a contest? For the moment I haven’t decided on any prizes other than bragging rights, but you never know. I have decided that the deadline for submissions for judging will be April 15th, tax day here in the USA.

The Scenario

You received an escalation from your SOC. They received an alert from their NMS about suspicious traffic to one of the Linux workers in the development group’s CI/CD pipeline. The alert was for unencrypted traffic on port 22/tcp, specifically the string “python3 -c 'import pty; pty.spawn("/bin/bash")'” which triggered the alert for “reverse shell promotion” in the NMS. They note that the system is showing signs of heavy CPU usage but that they don’t see any process(es) that account for this. Following their SOP, they acquired data from the system using UAC and have escalated to you as on-call for the internal IR/Threat team.

Other information about the system:

  • There is a single shared account on the system called “worker“. It has full Sudo privileges with the NOPASSWD option set.
  • All network access to the box is through a jump host at IP 192.168.4.35.
  • The UAC collection is uac-vbox-linux-20260324234043.tar.gz

Additional Comments

I threw this scenario together in a matter of hours, so when you look at the timeline of the system you will see that it got built and then compromised very quickly. For the final scenario I will doubtless do a more complete job running fake workloads for some time before the “attack” actually happens.

Similarly, you’ll probably discover that there is no significant network infrastructure around the compromised system. The “jump host” is really just another host in my lab environment that I was operating from.

But I still think there’s plenty of interesting artifacts to find in this scenario. I’m leaving things deliberately open-ended because I want to see what people come up with. But the goal would be to at least account for the issues raised by the SOC: why is there unencrypted traffic on 22/tcp, why is the system burning CPU, and why can’t the SOC see what is going on? Is the system compromised? When and how did that happen?

Submissions

Submissions for judging must be received no later than 23:59 UTC on 2026-04-15. I will accept submissions in .docx, PDF, or text. You may email your submissions to hrpomeranz@gmail.com. Please try to put something like “Linux Forensic Scenario Submission” in the Subject: line to make my life easier.

Depending on the number of submissions I get, I may need more folks to help with the judging. If you’re not planning to compete but would like to help judge, please drop me a line at the email address above. I’ll let you know if I need the help once I count the number (and length) of the submissions.

Happy forensicating! Have fun!

Linux Notes: ls and Timestamps

There’s an old riddle in Unix circles: “Name a letter that is not an option for the ls command”. The advent of the GNU version of ls has only made this more difficult to answer. Even if you’re a Unix/Linux power user, you’ve probably only memorized a small handful of the available options.

For example, I have “ls -lArt” burned into my brain from my Sys Admin days. “-l” for detailed listing, “-A” to show hidden files and directories (unlike “-a”, it omits the “.” and “..” links), “-t” to sort by last modified time, and “-r” to reverse the sort so the newest files appear right above your next shell prompt.

$ ls -lArt
total 1288
-rw-r--r-- 1 root root      9 Aug  7  2006 host.conf
-rw-r--r-- 1 root root    433 Aug 23  2020 apg.conf
-rw-r--r-- 1 root root     26 Dec 20  2020 libao.conf
-rw-r--r-- 1 root root  12813 Mar 27  2021 services
-rw-r--r-- 1 root root    769 Apr 10  2021 profile
-rw-r--r-- 1 root root    449 Nov 29  2021 mailcap.order
-rw-r--r-- 1 root root    119 Jan 10  2022 catdocrc
...
-rw-r--r-- 1 root root  52536 Feb 23 11:44 mailcap
-rw-r--r-- 1 root root 108979 Mar  2 09:24 ld.so.cache
-rw-r--r-- 1 root root     75 Mar  3 18:08 resolv.conf
drwxr-xr-x 5 root lp     4096 Mar  5 19:52 cups

You’ll note that the timestamps are displayed in two different formats. The oldest files show “month day year”, while the newer files show “month day hh:mm”. By default, ls displays the year only for files more than six months old.

Personally I prefer consistent ISO-style timestamps with “--time-style=long-iso“:

$ ls -lArt --time-style=long-iso
total 1288
-rw-r--r-- 1 root root      9 2006-08-07 13:14 host.conf
-rw-r--r-- 1 root root    433 2020-08-23 10:52 apg.conf
-rw-r--r-- 1 root root     26 2020-12-20 11:21 libao.conf
-rw-r--r-- 1 root root  12813 2021-03-27 18:32 services
-rw-r--r-- 1 root root    769 2021-04-10 16:00 profile
-rw-r--r-- 1 root root    449 2021-11-29 08:07 mailcap.order
-rw-r--r-- 1 root root    119 2022-01-10 19:08 catdocrc
...
-rw-r--r-- 1 root root  52536 2026-02-23 11:44 mailcap
-rw-r--r-- 1 root root 108979 2026-03-02 09:24 ld.so.cache
-rw-r--r-- 1 root root     75 2026-03-03 18:08 resolv.conf
drwxr-xr-x 5 root lp     4096 2026-03-05 19:52 cups

While “-t” sorts on last modified time by default, other options let you sort on and display other timestamps. For example, “-u” sorts on and displays last access time. “-u” is hardly a memorable mnemonic for last access time, but remember that “-a” was already taken for something else.
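Here’s a small sketch of “-u” in action on a scratch directory. Note that whether a read actually updates the atime depends on your mount options (relatime, noatime, and friends), so treat atime evidence with appropriate suspicion:

```shell
# Make a scratch directory, create two files, then read one of them so
# its atime is the most recent. "ls -lut" shows and sorts on atime.
dir=$(mktemp -d)
touch "$dir/alpha" "$dir/beta"
cat "$dir/beta" > /dev/null    # reading bumps atime (mount options permitting)
ls -lut --time-style=long-iso "$dir"
```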

It’s a pain trying to remember the one-letter options for the other timestamps, and note there isn’t even a short option for sorting/displaying on file creation time. So I just use “--time=” to pick the timestamp I want:

$ ls -lArt --time=birth --time-style=long-iso
total 1288
-rw-r--r-- 1 root root   1013 2025-04-10 10:27 fstab
drwxr-xr-x 2 root root   4096 2025-04-10 10:27 ImageMagick-6
drwxr-xr-x 2 root root   4096 2025-04-10 10:27 GNUstep
...
-rw-r--r-- 1 root root    142 2026-02-23 11:41 shells
-rw-r--r-- 1 root root  52536 2026-02-23 11:44 mailcap
-rw-r--r-- 1 root root 108979 2026-03-02 09:24 ld.so.cache
-rw-r--r-- 1 root root     75 2026-03-03 18:08 resolv.conf

Here we’re sorting on and displaying file creation times (“--time=birth“). You can use “--time=atime” or “--time=ctime” for the other timestamps.

If this command line seems long and unwieldy, remember that you can create aliases for commands in your .bashrc or other startup files:

alias ls='ls --color=auto --time-style=long-iso'
alias lb='ls -lArt --time=birth'

With normal ls commands, I’ll get colored output always, and “long-iso” dates whenever I use “-l”. I can use lb whenever I want file creation times. Note that alias definitions “stack”: the “lb” alias will get the color and time-style options from my basic “ls” alias, so I don’t need to include the “--time-style” option in the “lb” alias.