Ognjen Regoje

Generating a PDF from a Jekyll post using `pandoc`

#jekyll #pandoc #technical

I discovered pandoc.

If you need to convert files from one markup format into another, pandoc is your Swiss Army knife.

It can convert from markdown to PDF using LaTeX, so I used it to generate the PDF version of TheMarketplace.Guide.

It ended up being very straightforward. Because my use case was very specific, what follows are generalized instructions rather than a detailed tutorial.

Input is a markdown file

Because any Jekyll post is valid input, you can get started just by running pandoc on it directly.

In fact, here is this post in PDF with no customizations whatsoever.

The command is:

pandoc generating-pdf-from-jekyll-using-pandoc.md \
  --output=generating-pdf-from-jekyll-using-pandoc.pdf

It already looks better than the output of wkhtmltopdf.

I added a pdf field to the frontmatter

Any posts with that field set to true will have their PDF automatically generated.

The code for that looks like this:

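# Generate a PDF for each post whose frontmatter sets pdf: true.
site.collections['posts'].docs.select { |post| post.data['pdf'] }.each do |post|
  pdf = "#{post.data['slug']}.pdf"
  # Skip posts whose PDF already exists (File.exists? was removed in Ruby 3.2).
  next if File.exist?(pdf)
  `pandoc --from=markdown --output=#{pdf} #{post.path}`
end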

This runs in the post_write hook, and it only generates documents that don’t already exist. That is a tradeoff: the build runs quicker, but you have to delete a PDF to update it.
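One way to soften that tradeoff is to regenerate a PDF only when its source file is newer than it. The following is a minimal sketch, not the code this site uses: the pdf_stale? and pandoc_command helpers and the file names are hypothetical, and the demo uses throwaway files so it runs without pandoc installed.

```ruby
require 'tmpdir'

# Hypothetical helper: true when the PDF is missing or older than its source.
def pdf_stale?(source, pdf)
  !File.exist?(pdf) || File.mtime(source) > File.mtime(pdf)
end

# Hypothetical helper: pandoc arguments as an array, so that system(*args)
# bypasses the shell and paths containing spaces need no quoting.
def pandoc_command(source, pdf)
  ['pandoc', '--from=markdown', "--output=#{pdf}", source]
end

# Demo with temporary files; pandoc itself is never invoked here.
Dir.mktmpdir do |dir|
  source = File.join(dir, 'post.md')
  pdf = File.join(dir, 'post.pdf')
  File.write(source, "# Hello\n")
  puts pdf_stale?(source, pdf) # the PDF does not exist yet, so this prints true

  File.write(pdf, 'placeholder')
  File.utime(Time.now, Time.now + 60, pdf) # pretend the PDF is newer
  puts pdf_stale?(source, pdf) # prints false
end
```

Inside the hook, the call would then look something like system(*pandoc_command(post.path, pdf)) if pdf_stale?(post.path, pdf).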

You can also include additional LaTeX functionality

You can use the frontmatter to customize the LaTeX generated.

The fields I found most useful are title, author and date, which allow you to generate a title page. In the example PDF above, the title from the frontmatter is the title in the PDF.

You can also specify the documentclass. The default is article, but there are many more including book and report.

There is also the header-includes field that adds commands into the preamble. It must look something like this:

---
...
header-includes: |
  \usepackage{hyperref}
---

I found it somewhat finicky. I had to put it last in the frontmatter, or it would not get picked up, and it had to follow exactly that indentation.

\usepackage[left=1in,top=1in,right=1in,bottom=1in]{geometry} is a useful snippet to add in the header-includes. It sets all the margins to one inch.
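Because the header-includes block is sensitive to indentation, it can help to check what a YAML parser actually sees. Here is a small sketch using Ruby's standard yaml library. The frontmatter values are made up for illustration, and it only shows how Ruby's parser (the one Jekyll uses) reads the block; pandoc parses the metadata itself and may behave differently.

```ruby
require 'yaml'

# Made-up frontmatter for illustration; only the header-includes layout matters.
front_matter = <<~'YAML'
  title: Example post
  pdf: true
  header-includes: |
    \usepackage{hyperref}
    \usepackage[left=1in,top=1in,right=1in,bottom=1in]{geometry}
YAML

data = YAML.safe_load(front_matter)

# With the "|" block scalar, each indented line arrives verbatim.
includes = data['header-includes'].lines.map(&:chomp)
puts includes
```

If the indentation under header-includes is off, the parser either raises an error or returns something other than the two expected lines, which is a quick way to spot the problem before running pandoc.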

And you can sprinkle LaTeX in

Your input file can also include valid LaTeX commands.

I used \chapter and \section commands to properly structure the document.

I put \newpage{} between sections.

I also manually generated some content that is not in the original markdown, and for its list items I used \item[$\blacksquare$] to get filled-square bullets.

The final command

For TheMarketplace.Guide PDF the full command I use is:

pandoc --from=markdown --output='The Marketplace Guide.pdf' \
  combined.md --shift-heading-level-by=1

I used --shift-heading-level-by to shift every markdown heading down one level. That leaves the top level free for \chapter and gives the table of contents an additional level.

I added this command into the post_write hook in Jekyll. The command takes about 15 seconds to run because it combines some hundred pages. Therefore, the hook runs the command only if the site isn’t serving (!site.config['serving']), that is, only when running jekyll b.
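That guard might be sketched roughly as follows. It relies on Jekyll's Jekyll::Hooks.register API and the serving config key mentioned above; the build_pdf? helper and the PDF_COMMAND constant are hypothetical names, and the hook is only registered when Jekyll is loaded so the file can also be run on its own.

```ruby
# Hypothetical helper: skip the slow PDF build while the site is being served.
def build_pdf?(config)
  !config['serving']
end

# The pandoc invocation as an argument array, so no shell quoting is needed
# for the space in the output file name.
PDF_COMMAND = [
  'pandoc', '--from=markdown', '--output=The Marketplace Guide.pdf',
  'combined.md', '--shift-heading-level-by=1'
].freeze

# Register the hook only inside a Jekyll process.
if defined?(Jekyll)
  Jekyll::Hooks.register :site, :post_write do |site|
    system(*PDF_COMMAND) if build_pdf?(site.config)
  end
end
```

Keeping the decision in a small method makes it easy to check without starting a build.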

The result is a nicely typeset document for which I’m comfortable charging.