
Published on 2024-10-31

Tip of the day #3: Convert a CSV to a markdown or HTML table

🏷️ Markdown, Csv, Awk, Tip of the day

The other day at work, I found myself having to produce a human-readable table of all the direct dependencies in the project, for auditing purposes.

There is a tool for Rust projects that outputs a TSV (meaning: a CSV where the separator is the tab character) of this data. That's great, but not really fit for consumption by a non-technical person.

I just need to convert that to a human-readable table in markdown or HTML, and voilà!

Here's the output of this tool in my open-source Rust project:

$ cargo license --all-features --avoid-build-deps --avoid-dev-deps --direct-deps-only --tsv
name	version	authors	repository	license	license_file	description
clap	2.33.0	Kevin K. <kbknapp@gmail.com>	https://github.com/clap-rs/clap	MIT		A simple to use, efficient, and full-featured Command Line Argument Parser
heck	0.3.1	Without Boats <woboats@gmail.com>	https://github.com/withoutboats/heck	Apache-2.0 OR MIT		heck is a case conversion library.
kotlin	0.1.0	Philippe Gaultier <philigaultier@gmail.com>				
log	0.4.8	The Rust Project Developers	https://github.com/rust-lang/log	Apache-2.0 OR MIT		A lightweight logging facade for Rust
pretty_env_logger	0.3.1	Sean McArthur <sean@seanmonstar>	https://github.com/seanmonstar/pretty-env-logger	Apache-2.0 OR MIT		a visually pretty env_logger
termcolor	1.1.0	Andrew Gallant <jamslam@gmail.com>	https://github.com/BurntSushi/termcolor	MIT OR Unlicense		A simple cross platform library for writing colored text to a terminal.

Not really readable. We need to transform this data into a markdown table, something like this:

| First Header  | Second Header |
| ------------- | ------------- |
| Content Cell  | Content Cell  |
| Content Cell  | Content Cell  |

Technically, markdown tables are an extension to standard markdown (if there is such a thing), but they are very common and supported by most major platforms, e.g. GitHub, Azure, etc. So how do we do that?

Once again, I turn to the trusty AWK. It's always been there for me. And it's present on every UNIX system out of the box.

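AWK neatly handles splitting each line into fields for us; we just need to print those fields with the right separators around them. Note that it only splits on the delimiter, so quoted fields, as found in some CSV files, are not handled. That's fine for this TSV.

Here's the full implementation, saved as md-table.awk (don't forget to mark the file executable). The shebang line instructs AWK to use the tab character \t as the delimiter between fields: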

#!/usr/bin/env -S awk -F '\t' -f

{
    printf("|");
    for (i = 1; i <= NF; i++) {
        # Note: if a field contains the character `|`, it will mess up the table. 
        # In this case, we should replace this character by something else e.g. `,`:
        gsub(/\|/, ",", $i);
        printf(" %s |", $i);
    } 
    printf("\n");
} 

NR==1 { # Output the delimiting line
    printf("|");
    for(i = 1; i <= NF; i++) {
        printf(" --- | ");
    }
    printf("\n");
}

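The first clause has no condition, so it executes for each line of the input. Its for loop iterates over the fields of the line (NF is the number of fields) and prints each one followed by a `|`.

The second clause executes only for the first line (NR is the number of the current record, which here is the line number). It prints the `---` delimiter row that separates the header from the body.

The same line can trigger multiple clauses. Here, the first line of the input triggers both clauses, in the order they appear in the file, whilst the remaining lines only trigger the first clause.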
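One caveat about the shebang: not every `env` supports the `-S` flag. As a minimal sketch of a workaround, assuming a POSIX shell and any standard `awk`, the same two clauses can live in a shell function, with the separator set in a `BEGIN` block instead of on the command line. The name `md_table` is hypothetical, and I tidied the delimiter row so that it has no stray spaces:

```shell
# Hypothetical helper: reads TSV on stdin, writes a markdown table on stdout.
md_table() {
    awk '
    BEGIN { FS = "\t" }  # same effect as -F "\t" in the shebang

    {
        printf("|")
        for (i = 1; i <= NF; i++) {
            # A literal | inside a field would break the table, so replace it.
            gsub(/\|/, ",", $i)
            printf(" %s |", $i)
        }
        printf("\n")
    }

    NR == 1 {  # delimiter row, printed right after the header row
        printf("|")
        for (i = 1; i <= NF; i++) {
            printf(" --- |")
        }
        printf("\n")
    }'
}

# Small demo with made-up input: two columns, one data row.
printf 'name\tversion\nclap\t2.33.0\n' | md_table
```

The demo prints a header row, a delimiter row and one data row. The real command would be piped in exactly as before, with `md_table` in place of `./md-table.awk`.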

So let's run it!

$ cargo license --all-features --avoid-build-deps --avoid-dev-deps --direct-deps-only --tsv | ./md-table.awk 
| name | version | authors | repository | license | license_file | description |
| --- |  --- |  --- |  --- |  --- |  --- |  --- | 
| clap | 2.33.0 | Kevin K. <kbknapp@gmail.com> | https://github.com/clap-rs/clap | MIT |  | A simple to use, efficient, and full-featured Command Line Argument Parser |
| heck | 0.3.1 | Without Boats <woboats@gmail.com> | https://github.com/withoutboats/heck | Apache-2.0 OR MIT |  | heck is a case conversion library. |
| kotlin | 0.1.0 | Philippe Gaultier <philigaultier@gmail.com> |  |  |  |  |
| log | 0.4.8 | The Rust Project Developers | https://github.com/rust-lang/log | Apache-2.0 OR MIT |  | A lightweight logging facade for Rust |
| pretty_env_logger | 0.3.1 | Sean McArthur <sean@seanmonstar> | https://github.com/seanmonstar/pretty-env-logger | Apache-2.0 OR MIT |  | a visually pretty env_logger |
| termcolor | 1.1.0 | Andrew Gallant <jamslam@gmail.com> | https://github.com/BurntSushi/termcolor | MIT OR Unlicense |  | A simple cross platform library for writing colored text to a terminal. |

OK, it's hard to tell by eye whether that's correct. Let's pipe it into cmark-gfm to render this markdown table as HTML:

$ cargo license --all-features --avoid-build-deps --avoid-dev-deps --direct-deps-only --tsv | ./md-table.awk | cmark-gfm -e table

And voilà:

| name | version | authors | repository | license | license_file | description |
| --- | --- | --- | --- | --- | --- | --- |
| clap | 2.33.0 | Kevin K. kbknapp@gmail.com | https://github.com/clap-rs/clap | MIT | | A simple to use, efficient, and full-featured Command Line Argument Parser |
| heck | 0.3.1 | Without Boats woboats@gmail.com | https://github.com/withoutboats/heck | Apache-2.0 OR MIT | | heck is a case conversion library. |
| kotlin | 0.1.0 | Philippe Gaultier philigaultier@gmail.com | | | | |
| log | 0.4.8 | The Rust Project Developers | https://github.com/rust-lang/log | Apache-2.0 OR MIT | | A lightweight logging facade for Rust |
| pretty_env_logger | 0.3.1 | Sean McArthur sean@seanmonstar | https://github.com/seanmonstar/pretty-env-logger | Apache-2.0 OR MIT | | a visually pretty env_logger |
| termcolor | 1.1.0 | Andrew Gallant jamslam@gmail.com | https://github.com/BurntSushi/termcolor | MIT OR Unlicense | | A simple cross platform library for writing colored text to a terminal. |
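If cmark-gfm is not at hand, AWK can also emit the HTML directly. What follows is only a sketch under a few assumptions: the function name `html_table` is made up, the markup is deliberately bare (no `thead`/`tbody`, unlike what cmark-gfm produces), and only `&`, `<` and `>` are escaped. The one subtle point is that in a `gsub()` replacement a bare `&` stands for the matched text, so a literal ampersand has to be written `\\&`:

```shell
# Hypothetical helper: reads TSV on stdin, writes a bare HTML table on stdout.
html_table() {
    awk '
    BEGIN { FS = "\t"; print "<table>" }

    # Escape the characters that are special in HTML text.
    # The ampersand must be handled first, otherwise the & of
    # the entities produced below would be escaped a second time.
    function escape(s) {
        gsub(/&/, "\\&amp;", s)
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }

    {
        # Header cells for the first record, data cells for the rest.
        tag = (NR == 1) ? "th" : "td"
        printf("<tr>")
        for (i = 1; i <= NF; i++) {
            printf("<%s>%s</%s>", tag, escape($i), tag)
        }
        printf("</tr>\n")
    }

    END { print "</table>" }'
}
```

It is used like the markdown version, with the TSV piped into `html_table`. Escaping matters here because the authors column contains email addresses wrapped in `<` and `>`, which a browser would otherwise treat as tags.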

All in all, very little code. I have a feeling that I will use this approach a lot in the future for reporting or even inspecting data easily, for example from a database dump.


This blog is open-source! If you find a problem, please open a GitHub issue. The content of this blog as well as the code snippets are under the BSD-3 License, which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.