Ognjen Regoje

Combining an external folder with _posts in Jekyll v2

#jekyll

I take notes in Markdown, and this blog uses Markdown for its content too. Naturally, I wanted to keep my blog posts in my notes without having to keep two copies in sync.

I had several objectives:

  1. The file in the notes would be used for the blog. I wouldn’t need to copy files between the two.
  2. I could selectively include files to be published.
  3. I could have additional content in the files that would not be published.
  4. Images and links would work.
  5. I could schedule posts.

To accomplish that I:

  1. Created a symbolic link to the blog directory within my notes
  2. Added a Jekyll hook that takes the posts from that directory and combines them into the posts collection
  3. Fixed links and embedded images
  4. Added the ability to exclude parts of the markdown file from the generated content
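
Step 1 can be done with `ln -s`, or from Ruby. Here is a minimal sketch, assuming the link sits in the Jekyll site root and is called `_linked`, as in the plugin. The helper name and both paths are hypothetical.

```ruby
# Hypothetical helper: the method name and the example paths are illustrative.
# It creates <site_root>/_linked pointing at the blog directory inside the notes.
def link_notes_into_site(notes_blog_dir, site_root, link_name = '_linked')
  link_path = File.join(site_root, link_name)

  # Leave an existing link (or directory) alone so the script is safe to re-run.
  return link_path if File.symlink?(link_path) || File.exist?(link_path)

  File.symlink(File.expand_path(notes_blog_dir), link_path)
  link_path
end

# Example call (paths are assumptions):
# link_notes_into_site('~/notes/blog', '~/sites/my-blog')
```

Depending on your setup, the `linked` collection may also need to be declared under `collections` in `_config.yml`.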

This is the end result:

# _plugins/filter.rb
module Filter
  def self.process(site, payload)
    # The link to the notes is named _linked, which creates a `linked` collection in Jekyll.
    # This takes all the files from there and keeps only the ones that have the `publish_on`
    # attribute set in the front matter.
    # This allows me to keep track of drafts or schedule publishing.
    site.collections['linked'].docs.select!{|x| x.data['publish_on']}
    now = Time.now

    # Next, I loop through the collection in order to inject each file into the `posts`
    # collection. By default, Jekyll expects the filename to contain the date.
    # Because my files do not have that, their dates default to the file
    # creation date. Since that isn't the order that I want to publish them
    # in, I've added a `publish_on` attribute to the front matter
    # that is parsed, and posts are then ordered by that instead.
    site.collections['linked'].docs.sort_by{|x| Time.parse(x.data['publish_on'])}.each do |x|
      t = Time.parse(x.data['publish_on'])

      # For normal builds, I want to include only the posts that should have
      # already been published. But, if the --future flag is set when running
      # Jekyll, then all scheduled posts will be rendered.
      if t <= now || site.config['future']

        # Here, a new Jekyll::Document is created that is a copy of the linked
        # document. It's then tied to the posts collection.
        new_doc = Jekyll::Document.new(
          x.path,
          {site: site, collection: site.collections['posts']}
        )

        # Jekyll triggers this under the hood, although I haven't found exactly
        # where. Instead, I trigger it manually. This reads the contents of the
        # file and sets the front matter into the data attribute of the document.
        new_doc.read

        # The date is set to the parsed `publish_on` attribute.
        new_doc.data['date'] = t

        # I do not use the draft attribute explicitly. I use only the `publish_on`.
        # This line can be excluded if you intend to have a separate attribute in
        # the front matter to explicitly manage the draft status.
        new_doc.data['draft'] = false

        # Copy the categories and the description from the front matter.
        # I'm not sure why this is not picked up immediately.
        new_doc.data['categories'] = x.data['categories']

        # I use `desc` in the notes but Jekyll uses `description`.
        new_doc.data['description'] = x.data['desc']

        # Remove the duplicate description
        x.data.delete 'desc'

        # Set the layout.
        new_doc.data['layout'] = 'post'

        # Keep only the part of the slug after the last dot. Note that this isn't
        # removing the extension. It's needed because some of my earlier posts
        # used to have files with the format blog.category.slug
        new_doc.data['slug'] = x.data['slug'].split('.').last

        # Set the __coll attribute.
        # While looking into this I noticed this attribute set, although I'm not sure
        # if it's strictly necessary or where it's used.
        new_doc.data['__coll'] = "posts"

        # This line removes any content in between < !-- exclude --> and < !-- include -->
        # This lets me have additional content in blog posts that I do not want
        # to be included in the generated site. The match is non-greedy and spans
        # newlines, so each exclude block ends at the nearest include marker and
        # the text after that marker is kept.
        new_doc.content = new_doc.content.gsub(/< !-- exclude -->.*?< !-- include -->/m, "")
        # Note that here there is a space between < and !. The actual file does
        # not have the space. It's needed here because the syntax highlighter will
        # render each term within a span which will cause the delimiters to render
        # as comments since they're no longer direct descendants of the code block.

        # Secondly, this has a limitation in that an exclude marker without a
        # matching include marker is left untouched, so every block must be closed.

        # Fix the links to images because my blog notes are in notes/blog/<category>
        # and the images are in notes/assets/images while in Jekyll images are /images
        new_doc.content.gsub!("", "")

        # I do maintain separate copies of images for the blog. This is mainly
        # so that I can keep higher resolution images in my notes and serve
        # smaller, more efficient versions in the blog.

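        # Fix links between posts because in the blog notes it'd be
        # [Link](../<category>/slug.md). The ../<category>/ prefix and the .md
        # need to be removed so that only the slug remains. The character classes
        # keep each match inside one link, so several links on a line all get fixed.
        new_doc.content.gsub!(/\[([^\]]+)\]\(\.\.\/[^\/)]+\/([^\/)]+)\.md\)/, '[\1](/\2)')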

        # Finally, include the new document into the posts collection
        site.collections['posts'].docs << new_doc
      end
    end
  end
end

# Register the hook to run whenever files are read.
Jekyll::Hooks.register :site, :post_read do |site, payload|
  Filter.process(site, payload)
end
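
The text clean-up steps at the end of the loop can be exercised on plain strings, which makes them easy to check before a full build. The sketch below is an illustration under assumptions, not the plugin's exact code. The method names are hypothetical, the markers are written without the space (as in the actual files), and the `../../assets/images/` prefix is only a guess at a notes layout like the one described in the comments.

```ruby
# All names below are hypothetical; adjust markers and paths to your own notes.
EXCLUDE_MARKER = '<!-- exclude -->'
INCLUDE_MARKER = '<!-- include -->'

# Drop everything from an exclude marker up to the nearest include marker.
# The /m flag lets `.` match newlines; `.*?` stops at the first include marker.
def strip_excluded(content)
  pattern = /#{Regexp.escape(EXCLUDE_MARKER)}.*?#{Regexp.escape(INCLUDE_MARKER)}/m
  content.gsub(pattern, '')
end

# Rewrite image paths from the notes layout to the site layout.
# Both prefixes are assumptions, not values taken from the plugin.
def fix_image_paths(content, notes_prefix: '../../assets/images/', site_prefix: '/images/')
  content.gsub(notes_prefix, site_prefix)
end

# Turn [Text](../<category>/slug.md) into [Text](/slug).
def fix_post_links(content)
  content.gsub(%r{\[([^\]]+)\]\(\.\./[^/)]+/([^/)]+)\.md\)}, '[\1](/\2)')
end

# Example:
# fix_post_links(fix_image_paths(strip_excluded(File.read('post.md'))))
```

Keeping each step as a small function that takes and returns a string means a change to one pattern can be checked without running Jekyll at all.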

This way I don’t have to keep the notes in sync and any notes that I add will be automatically added to the site.
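
Strictly speaking, a note only shows up once its front matter has a `publish_on` value and that date has passed, or when the build runs with `--future`. That selection rule can be tried outside Jekyll. The following is a minimal sketch rather than the plugin itself: `Note` and `publishable` are hypothetical names, and a plain Struct stands in for a Jekyll document.

```ruby
require 'time'

# Stand-in for a Jekyll document: only the path and front matter are modelled.
Note = Struct.new(:path, :data)

# Returns the notes that should be published, oldest `publish_on` first.
# `future: true` mimics building with --future and keeps scheduled notes too.
def publishable(notes, now: Time.now, future: false)
  notes
    .select { |note| note.data['publish_on'] }
    # to_s is a defensive assumption in case YAML already parsed the value
    # into a Date or Time instead of leaving it as a String.
    .map { |note| [note, Time.parse(note.data['publish_on'].to_s)] }
    .select { |_, time| future || time <= now }
    .sort_by { |_, time| time }
    .map(&:first)
end
```

A note without `publish_on` behaves like a draft, and a note with a future date behaves like a scheduled post.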

If you run a site on Jekyll, take a look at the #jekyll tag for other useful Jekyll tweaks.