jq is a tremendously useful tool for dealing with JSON data. But the documentation that exists seems to be targeted at developers parsing deeply nested JSON structures to transform them into other JSON structures. In my DFIR role, I typically deal with streams of fairly simple JSON records – usually some sort of log – that I need to transform into structured text, such as comma-separated (CSV) or tab-separated (TSV) output. I’ve spent a lot of time running through reference manuals and endless Stack Overflow postings to get to a reasonable level with jq. I wanted to share some of the things I’ve learned along the way.
Start With The Basics
At its simplest, jq is an excellent JSON pretty printer:
$ jq . journal.json
{
"_MACHINE_ID": "0f2f13b9dce0451591ae0dc418f6c96f",
"_RUNTIME_SCOPE": "system",
"_HOSTNAME": "vbox",
"_SOURCE_BOOTTIME_TIMESTAMP": "0",
"MESSAGE": "Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)",
"__MONOTONIC_TIMESTAMP": "6400064",
"_SOURCE_MONOTONIC_TIMESTAMP": "0",
"_BOOT_ID": "2a5a598d4f6142c7b7719eed38c1a2b9",
"SYSLOG_IDENTIFIER": "kernel",
"_TRANSPORT": "kernel",
"PRIORITY": "5",
"SYSLOG_FACILITY": "0",
"__CURSOR": "s=0a047604dca842218e0807bc796d4cb7;i=1;b=2a5a598d4f6142c7b7719eed38c1a2b9;m=61a840;t=64dc728142e95;x=852824913ddff90e",
"__REALTIME_TIMESTAMP": "1774367626505877"
}
{
"_MACHINE_ID": "0f2f13b9dce0451591ae0dc418f6c96f",
"MESSAGE": "Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet",
...
The basic syntax here is “jq <script> <jsonfile> ...”, where <script> is a transformation script written in jq’s own scripting language. The script “.” is essentially a null transformation that simply tells jq to output whatever it sees in its input <jsonfile>. The default output style for jq is the pretty-printed style you see above.
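jq also reads from standard input when no input file is given, which makes it easy to experiment with small hand-made records. A minimal sketch using a made-up record:

```shell
# jq reads standard input when no file argument is given;
# "." passes each input record through unchanged (pretty-printed)
echo '{"PRIORITY": "5", "SYSLOG_IDENTIFIER": "kernel"}' | jq .
```

Piping small test records into jq like this is a quick way to try out a script fragment before running it over a large log file.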
Some of you will recognize the data above as Systemd journal entries. Normally we would work with the Systemd journal via the journalctl command. But exported journal data from one of my lab systems is a good example set for showing you some useful jq tips and tricks that you can apply to any sort of exported logging stream.
Other Output Modes
Suppose we just wanted to output the “MESSAGE” field from each record. Just specify the field you want to output with a leading “.“:
$ jq .MESSAGE journal.json
"Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...
Because the value of the MESSAGE field is a string, jq outputs each message surrounded by double quotes. If you don’t want the quoting, use the “-r” option for raw mode output:
$ jq -r .MESSAGE journal.json
Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...
Suppose we wanted to output multiple fields as columns of structured text. jq includes support for both “@csv” and “@tsv” output modes:
$ jq -r '[.__REALTIME_TIMESTAMP, ._HOSTNAME, .MESSAGE] | @csv' journal.json
"1774367626505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"1774367626505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...
jq transformation scripts use a pipelining syntax. Here we’re sending the fields we want to output into the “@csv” formatting tool. “@csv” wants its inputs as a JSON array, so we create an array on the fly simply by enclosing the fields we want to output with square brackets (“[..., ..., ...]“). The “@csv” output method automatically quotes each field and handles escaping any double quotes that might be included.
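You can see the quoting and escaping behavior directly with a small made-up record (the field names here are just for illustration):

```shell
# @csv leaves numbers bare, quotes strings, and doubles any embedded quotes
echo '{"count": 42, "msg": "say \"hi\""}' | jq -r '[.count, .msg] | @csv'
# 42,"say ""hi"""
```

Doubling embedded quotes is the standard CSV escaping convention, so the output loads cleanly into spreadsheets and other CSV consumers.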
If you want other delimiters besides the traditional commas or tabs, jq can also output arbitrary text:
$ jq -r '"\(.__REALTIME_TIMESTAMP)|\(._HOSTNAME)|\(.MESSAGE)"' journal.json
1774367626505877|vbox|Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
1774367626505925|vbox|Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...
Use double quotes ("...") to enclose your output template. Use “\(.fieldname)” to output the value of specific fields. Anything else in your template is output as literal text. Here I’m outputting pipe-delimited text with the same three fields as in our CSV example above.
Note that our output template can use the typical escape sequences like “\t” for tabs. So another way to produce tab-delimited text would be:
$ jq -r '"\(.__REALTIME_TIMESTAMP)\t\(._HOSTNAME)\t\(.MESSAGE)"' journal.json
1774367626505877 vbox Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)
1774367626505925 vbox Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet
...
However, it’s almost certainly easier to use '[..., ..., ...] | @tsv' for this.
Transforming Data With Builtin Operators
jq includes a wide variety of builtin operators for data transformation and math. For example, suppose we wanted to format those __REALTIME_TIMESTAMP fields in the Systemd journal into human-readable strings:
$ head -1 journal.json | jq -r '(.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T")'
2026-03-24 15:53:46
There’s a lot going on here, so let’s break it down a bit at a time. __REALTIME_TIMESTAMP is a string – if you look at the pretty-printed output above, the values are displayed in double quotes, meaning they are string values. Ultimately we want to feed the __REALTIME_TIMESTAMP value into strftime() to produce formatted text, but strftime() wants numeric input. The first thing to do, then, is to convert the string into a number with “tonumber”. The jq piping syntax is how we express this transformation.
Our next problem is that __REALTIME_TIMESTAMP is in microseconds, but strftime() wants good old Unix epoch seconds. So we do some math with the traditional “/” operator for division. This actually converts our value into a decimal number (“1774367626.505877“), but that’s good enough for strftime(). Finally we pipeline the number we calculated into the strftime() function. We give strftime() an appropriate format string to get the output we want.
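The whole conversion, stripped down to a single hand-made record (the timestamp value is the one from the example above; note that jq’s strftime() formats in UTC):

```shell
# string microseconds -> number -> epoch seconds -> formatted UTC timestamp
echo '{"__REALTIME_TIMESTAMP": "1774367626505877"}' |
  jq -r '(.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T")'
# 2026-03-24 15:53:46
```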
This works great, but we’re throwing away the microseconds information. What if we wanted to display that as part of the timestamp? Time to introduce some more useful string operations:
$ head -1 journal.json | jq -r '((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:])'
2026-03-24 15:53:46.505877
Looking at the back part of our expression on the second line above, we are using jq’s slicing operation “.[start:end]”. Because the start value is a negative offset, we count back six characters from the end of the string. With no end value specified, the slice runs to the end of the string.
Like many other scripting languages, jq supports string concatenation with the addition operator (“+”). Here we are adding the formatted string output from strftime() and the microseconds value we sliced out of the string. Note that the strftime() format has been updated to output a literal “.” between the formatted text and the microseconds.
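Both operations on their own, with a made-up record:

```shell
# .[:-6] keeps everything but the last six characters, .[-6:] keeps the
# last six, and "+" concatenates the pieces back together with a "."
echo '{"ts": "1774367626505877"}' |
  jq -r '(.ts | .[:-6]) + "." + (.ts | .[-6:])'
# 1774367626.505877
```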
Suppose we wanted to include the human-readable timestamp we just created instead of the raw epoch microseconds for our “@csv” output. The trick is to take our jq code for producing human readable timestamps and drop it into our “[...] | @csv” pipeline in place of the __REALTIME_TIMESTAMP field:
$ jq -r '[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) + (.__REALTIME_TIMESTAMP | .[-6:]), ._HOSTNAME, .MESSAGE] | @csv' journal.json
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...
Scripting With jq
Obviously that jq expression is pretty horrible to type on the command line. You can always take any jq script and put it into a text file and then run that script on your data with the “-f” option:
$ jq -r -f csv-journal.jq journal.json
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...
In this instance, our csv-journal.jq file is the jq recipe from our command line example, but without the single quotes. Since jq doesn’t care about whitespace in scripts, we can format our recipe with newlines and indentation to make it more readable:
$ cat csv-journal.jq
[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:]),
._HOSTNAME, .MESSAGE] | @csv
On Linux systems you can even put jq in a “bang path” (shebang line) at the top of the script so that it is automatically invoked as the interpreter:
$ cat csv-journal.jq
#!/usr/bin/jq -rf
[((.__REALTIME_TIMESTAMP | tonumber) / 1000000 | strftime("%F %T.")) +
(.__REALTIME_TIMESTAMP | .[-6:]),
._HOSTNAME, .MESSAGE] | @csv
Note that the new interpreter path at the top of the script includes the “-rf” options for raw output (“-r“) and interpreting the rest of the file as a script (“-f“).
Once we have the interpreter path at the top of the script, we can just cat our JSON data into the script without invoking jq directly:
$ chmod +x csv-journal.jq
$ cat journal.json | ./csv-journal.jq
"2026-03-24 15:53:46.505877","vbox","Linux version 6.12.74+deb13+1-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.74-2 (2026-03-08)"
"2026-03-24 15:53:46.505925","vbox","Command line: BOOT_IMAGE=/boot/vmlinuz-6.12.74+deb13+1-amd64 root=UUID=d6cf7c18-1df5-4f29-a6f8-d5c4947c1df7 ro quiet"
...
This might make things easier for less-technical users.
Selecting Records
When working with streams of records, it’s typical to want to only operate on certain records. For example, suppose we only wanted to see log messages from the “sudo” command. In the Systemd journal, these messages have the “SYSLOG_IDENTIFIER” field set to “sudo“:
$ jq -r 'select(.SYSLOG_IDENTIFIER == "sudo") | .MESSAGE' journal.json
worker : user NOT in sudoers ; TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/2 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
pam_unix(sudo:session): session closed for user root
worker : TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
pam_unix(sudo:session): session closed for user root
worker : TTY=pts/1 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
pam_unix(sudo:session): session opened for user root(uid=0) by worker(uid=1000)
worker : TTY=pts/3 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
...
The new magic is jq’s select() operator up at the front of that pipeline. If the conditional you give to select() evaluates to true, then the record you have matched gets passed down for processing by the rest of the pipeline. If not, then that record is skipped.
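select() isn’t limited to string equality. Here’s a sketch with made-up records that filters on the numeric value of the PRIORITY field (remember that journal values are strings, so we convert first):

```shell
# keep only records at priority 4 (warning) or more severe (lower numbers)
printf '%s\n' '{"PRIORITY":"3","MESSAGE":"disk error"}' \
              '{"PRIORITY":"6","MESSAGE":"routine info"}' |
  jq -r 'select((.PRIORITY | tonumber) <= 4) | .MESSAGE'
# disk error
```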
Logical operators (“and”, “or”, “not”) and parentheses are allowed. You can also do pattern matching with Perl-style regular expressions (jq uses the Oniguruma regex library under the hood). For example, the really interesting lines in Sudo logs are the ones that show the command being invoked (“COMMAND=”):
$ jq -r 'select(.SYSLOG_IDENTIFIER == "sudo" and (.MESSAGE | test("COMMAND="))) | .MESSAGE' journal.json
worker : user NOT in sudoers ; TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/2 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/0 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/1 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
worker : TTY=pts/3 ; PWD=/home/worker ; USER=root ; COMMAND=/bin/bash
...
For pattern matching, just pipeline the field you want to match against into the test() operator. Here I’m matching the literal string “COMMAND=” against the MESSAGE field. The pattern match is joined with our original selector for “sudo” in the SYSLOG_IDENTIFIER field using a logical “and“.
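One detail worth knowing: test() takes an optional second argument holding regex flags, such as “i” for case-insensitive matching. A quick sketch with a made-up record:

```shell
# the second argument to test() holds regex flags; "i" ignores case
echo '{"MESSAGE": "COMMAND=/bin/bash"}' |
  jq '.MESSAGE | test("command="; "i")'
# true
```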
Here’s another example showing a useful regex when dealing with SSH logs, just to give you a flavor of things you can do with regular expression matching:
$ jq -r 'select(._COMM == "sshd" and
(.MESSAGE | test("^((Accepted|Failed) .* for|Invalid user) "))) | .MESSAGE' journal.json
Invalid user mary from 192.168.10.31 port 55746
Failed password for invalid user mary from 192.168.10.31 port 55746 ssh2
Accepted password for hal from 192.168.4.22 port 42310 ssh2
...
Enough For Now
Hopefully this is enough to get you started writing your own basic jq scripts. As with many things, the rest you pick up as you practice and get frustrated. The jq reference manual is useful for checking the syntax of different built-in operators, but I often find the examples more frustrating than helpful. Searching Stack Overflow can often yield more useful results.
Feel free to drop your questions into the comments, or reach out to me via social media or email. Maybe your questions will turn this single blog article into a series!