build-a-blog

Collin Lefeber 2024-06-17 19:52:21 -04:00
parent b8056a1a84
commit aeff37dfa6
11 changed files with 2598 additions and 8 deletions


@@ -1,13 +1,17 @@
 ---
 image: alpine/edge
 oauth: pages.sr.ht/PAGES:RW
 packages:
 - hut
+- python3
+- py3-markdown
 environment:
   site: cfebs.srht.site
   dest: cfebs.com
 tasks:
 - package: |
     cd $site
+    python3 ./main.py
     tar -cvz . > ../site.tar.gz
 - upload: |
     hut pages publish -d $dest site.tar.gz

2
.gitignore vendored Normal file

@@ -0,0 +1,2 @@
index.html
posts/*.html

7
chalk.min.css vendored Normal file

@@ -0,0 +1,7 @@
/*!
Theme: Chalk
Author: Chris Kempson (http://chriskempson.com)
License: ~ MIT (or more permissive) [via base16-schemes-source]
Maintainer: @highlightjs/core-team
Version: 2021.09.0
*/pre code.hljs{display:block;overflow-x:auto;padding:1em}code.hljs{padding:3px 5px}.hljs{color:#d0d0d0;background:#151515}.hljs ::selection,.hljs::selection{background-color:#303030;color:#d0d0d0}.hljs-comment{color:#505050}.hljs-tag{color:#b0b0b0}.hljs-operator,.hljs-punctuation,.hljs-subst{color:#d0d0d0}.hljs-operator{opacity:.7}.hljs-bullet,.hljs-deletion,.hljs-name,.hljs-selector-tag,.hljs-template-variable,.hljs-variable{color:#fb9fb1}.hljs-attr,.hljs-link,.hljs-literal,.hljs-number,.hljs-symbol,.hljs-variable.constant_{color:#eda987}.hljs-class .hljs-title,.hljs-title,.hljs-title.class_{color:#ddb26f}.hljs-strong{font-weight:700;color:#ddb26f}.hljs-addition,.hljs-code,.hljs-string,.hljs-title.class_.inherited__{color:#acc267}.hljs-built_in,.hljs-doctag,.hljs-keyword.hljs-atrule,.hljs-quote,.hljs-regexp{color:#12cfc0}.hljs-attribute,.hljs-function .hljs-title,.hljs-section,.hljs-title.function_,.ruby .hljs-property{color:#6fc2ef}.diff .hljs-meta,.hljs-keyword,.hljs-template-tag,.hljs-type{color:#e1a3ee}.hljs-emphasis{color:#e1a3ee;font-style:italic}.hljs-meta,.hljs-meta .hljs-keyword,.hljs-meta .hljs-string{color:#deaf8f}.hljs-meta .hljs-keyword,.hljs-meta-keyword{font-weight:700}

9
highlight.js.min.css vendored Normal file

@@ -0,0 +1,9 @@
/*!
Theme: Default
Description: Original highlight.js style
Author: (c) Ivan Sagalaev <maniac@softwaremaniacs.org>
Maintainer: @highlightjs/core-team
Website: https://highlightjs.org/
License: see project LICENSE
Touched: 2021
*/pre code.hljs{display:block;overflow-x:auto;padding:1em}code.hljs{padding:3px 5px}.hljs{background:#f3f3f3;color:#444}.hljs-comment{color:#697070}.hljs-punctuation,.hljs-tag{color:#444a}.hljs-tag .hljs-attr,.hljs-tag .hljs-name{color:#444}.hljs-attribute,.hljs-doctag,.hljs-keyword,.hljs-meta .hljs-keyword,.hljs-name,.hljs-selector-tag{font-weight:700}.hljs-deletion,.hljs-number,.hljs-quote,.hljs-selector-class,.hljs-selector-id,.hljs-string,.hljs-template-tag,.hljs-type{color:#800}.hljs-section,.hljs-title{color:#800;font-weight:700}.hljs-link,.hljs-operator,.hljs-regexp,.hljs-selector-attr,.hljs-selector-pseudo,.hljs-symbol,.hljs-template-variable,.hljs-variable{color:#ab5656}.hljs-literal{color:#695}.hljs-addition,.hljs-built_in,.hljs-bullet,.hljs-code{color:#397300}.hljs-meta{color:#1f7199}.hljs-meta .hljs-string{color:#38a}.hljs-emphasis{font-style:italic}.hljs-strong{font-weight:700}

1213
highlight.min.js vendored Normal file

File diff suppressed because one or more lines are too long


@@ -3,9 +3,10 @@
 <head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
-<title>cfebs.com</title>
+<title>cfebs.com${more_title}</title>
 <link rel="icon" type="image/png" href="/avatar.png">
 <link rel="stylesheet" href="/style.css">
+<link rel="stylesheet" href="/chalk.min.css">
 </head>
 <body>
 <nav class="container navbar navbar-light">
@@ -14,7 +15,7 @@
 <section class="container">
 <div class="row">
 <div class="col-md-8 col-sm-12">
-<p>Welcome. Something will go here eventually.</p>
+${content}
 </div>
 <div class="col-md-4 col-sm-12">
 <img style="height: 5rem" class="mb-2" alt="A yellow monster thing and iconic avatar used by cfebs" src="/avatar.png" />
@@ -29,5 +30,7 @@
 </div>
 </div>
 </section>
+<script src="/highlight.min.js"></script>
+<script>hljs.highlightAll();</script>
 </body>
 </html>

522
index.xml Normal file

@@ -0,0 +1,522 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>cfebs.com</title>
<link>https://cfebs.com</link>
<description>Recent content from cfebs.com</description>
<language>en</language>
<lastBuildDate>Mon, 17 Jun 2024 19:49:00 -0000</lastBuildDate>
<atom:link href="https://cfebs.com/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Build-a-blog</title>
<link>https://cfebs.com/posts/build_a_blog.html</link>
<pubDate>Mon, 17 Jun 2024 14:46:36 -0400</pubDate>
<guid>https://cfebs.com/posts/build_a_blog.html</guid>
<description>&lt;p&gt;I want to share my thought process for how to go about building a static blog generator from scratch.&lt;/p&gt;
&lt;p&gt;The goal is to take 1 afternoon + caffeine + some DIY spirit → &lt;em&gt;something&lt;/em&gt; resembling a static site/blog generator.&lt;/p&gt;
&lt;p&gt;Let&#x27;s see how hard this will be. Here are the requirements for a blog:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generate an index with recent list of posts.&lt;/li&gt;
&lt;li&gt;Generate each individual post written in markdown -&amp;gt; html&lt;ul&gt;
&lt;li&gt;Support some metadata in each post&lt;/li&gt;
&lt;li&gt;A post title should have a slug&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Generate RSS&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That boils down to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read some files&lt;/li&gt;
&lt;li&gt;Parse markdown, maybe parse a header with some key/values.&lt;/li&gt;
&lt;li&gt;Template strings&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So there is one &quot;exotic&quot; feature: parsing Markdown and rendering it as HTML.&lt;/p&gt;
&lt;p&gt;The rest is just file and string manipulation.&lt;/p&gt;
&lt;p&gt;Most scripting languages would be fine tools for this task. But how to handle Markdown?&lt;/p&gt;
&lt;h2 id=&quot;picking-the-tool-for-the-job&quot;&gt;Picking the tool for the job&lt;/h2&gt;
&lt;p&gt;I&#x27;ve had &lt;a href=&quot;https://crystal-lang.org/&quot;&gt;Crystal&lt;/a&gt; in the back of my mind for this task. It is a nice general purpose language that included Markdown in the stdlib! But unfortunately Markdown was removed in &lt;a href=&quot;https://github.com/crystal-lang/crystal/releases/tag/0.31.0&quot;&gt;0.31.0&lt;/a&gt;. Other than that, I&#x27;m not sure any other languages include a well rounded Markdown implementation out of the box.&lt;/p&gt;
&lt;p&gt;I&#x27;ll likely be building the site in Docker with an Alpine image, so here is a quick search of Alpine&#x27;s repos to see what could be useful:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; docker run --rm -it alpine
/ # apk update
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz
v3.18.6-263-g77db018514d [https://dl-cdn.alpinelinux.org/alpine/v3.18/main]
v3.18.6-263-g77db018514d [https://dl-cdn.alpinelinux.org/alpine/v3.18/community]
OK: 20079 distinct packages available
/ # apk search markdown
discount-2.2.7c-r1
discount-dev-2.2.7c-r1
discount-libs-2.2.7c-r1
kdepim-addons-23.04.3-r0
markdown-1.0.1-r3
markdown-doc-1.0.1-r3
py3-docstring-to-markdown-0.12-r1
py3-docstring-to-markdown-pyc-0.12-r1
py3-html2markdown-0.1.7-r3
py3-html2markdown-pyc-0.1.7-r3
py3-markdown-3.4.3-r1
py3-markdown-it-py-2.2.0-r1
py3-markdown-it-py-pyc-2.2.0-r1
py3-markdown-pyc-3.4.3-r1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&quot;https://pkgs.alpinelinux.org/package/edge/main/x86_64/py3-markdown&quot;&gt;&lt;code&gt;py3-markdown&lt;/code&gt; in alpine&lt;/a&gt; is the popular &lt;a href=&quot;https://python-markdown.github.io/&quot;&gt;&lt;code&gt;python-markdown&lt;/code&gt;&lt;/a&gt;. It&#x27;s mature and available as a package in my &lt;a href=&quot;https://archlinux.org/packages/extra/any/python-markdown/&quot;&gt;home distro&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With that, we should have the exotic Markdown dependency figured out.&lt;/p&gt;
&lt;h2 id=&quot;lets-build&quot;&gt;Let&#x27;s build&lt;/h2&gt;
&lt;p&gt;First, let&#x27;s read one post file and render some HTML.&lt;/p&gt;
&lt;p&gt;We&#x27;ll store posts in &lt;code&gt;posts/&lt;/code&gt; like &lt;code&gt;posts/build_a_blog.md&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;And we&#x27;ll store the HTML output in the same directory: &lt;code&gt;posts/build_a_blog.html&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re
import logging
import markdown

destpath_re = re.compile(r&#x27;\.md$&#x27;)
logging.basicConfig(encoding=&#x27;utf-8&#x27;, level=logging.INFO)

def render_post(fpath):
    destpath = destpath_re.sub(&#x27;.html&#x27;, fpath)
    logging.info(&amp;quot;opening %s for parsing, dest %s&amp;quot;, fpath, destpath)
    # from: https://python-markdown.github.io/reference/
    with open(fpath, &amp;quot;r&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;) as input_file:
        logging.info(&amp;quot;reading %s&amp;quot;, fpath)
        text = input_file.read()
    logging.info(&amp;quot;parsing %s&amp;quot;, fpath)
    out = markdown.markdown(text)
    with open(destpath, &amp;quot;w&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;, errors=&amp;quot;xmlcharrefreplace&amp;quot;) as output_file:
        logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
        output_file.write(out)

if __name__ == &#x27;__main__&#x27;:
    render_post(&#x27;posts/build_a_blog.md&#x27;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And if we run it.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; python3 ./main.py
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking pretty good.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; head posts/build_a_blog.html
&amp;lt;h1&amp;gt;Build-a-blog&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;I want to share my thought process for how one would go about building a static blog generator from scratch.&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Generate an index with recent list of posts.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Generate each individual post written in markdown -&amp;amp;gt; html&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Support some metadata in each post&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A post title should have a slug&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Generate RSS&amp;lt;/li&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
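&lt;p&gt;As a quick sanity check, the &lt;code&gt;.md&lt;/code&gt; to &lt;code&gt;.html&lt;/code&gt; path mapping can be exercised on its own. This is only a minimal sketch; the helper name &lt;code&gt;dest_path&lt;/code&gt; is hypothetical and not part of &lt;code&gt;main.py&lt;/code&gt;.&lt;/p&gt;

```python
import re

# Same pattern as destpath_re in the post: only a trailing ".md" is replaced.
DESTPATH_RE = re.compile(r'\.md$')


def dest_path(fpath):
    """Hypothetical helper: map a Markdown source path to its HTML output path."""
    return DESTPATH_RE.sub('.html', fpath)


if __name__ == '__main__':
    print(dest_path('posts/build_a_blog.md'))
    # A path that does not end in ".md" is returned unchanged.
    print(dest_path('posts/notes.md.bak'))
```

&lt;p&gt;Anchoring the pattern with &lt;code&gt;$&lt;/code&gt; means a file that merely contains &lt;code&gt;.md&lt;/code&gt; somewhere in its name is left alone.&lt;/p&gt;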
&lt;p&gt;Now let&#x27;s do this for all &lt;code&gt;.md&lt;/code&gt; files in &lt;code&gt;posts/&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import glob
...
def render_posts():
    files = glob.glob(&#x27;posts/*.md&#x27;)
    logging.info(&#x27;found post files %s&#x27;, files)
    for fname in files:
        render_post(fname)

if __name__ == &#x27;__main__&#x27;:
    render_posts()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And add another simple test post&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; echo &#x27;# A new post&#x27; &amp;gt; ./posts/a_new_post.md
python3 ./main.py
INFO:root:found post files [&#x27;posts/a_new_post.md&#x27;, &#x27;posts/build_a_blog.md&#x27;]
INFO:root:opening posts/a_new_post.md for parsing, dest posts/a_new_post.html
INFO:root:reading posts/a_new_post.md
INFO:root:parsing posts/a_new_post.md
INFO:root:writing to posts/a_new_post.html
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
head ./posts/a_new_post.html
&amp;lt;h1&amp;gt;A new post&amp;lt;/h1&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Basically at this point, it&#x27;s a blog generator!&lt;/p&gt;
&lt;p&gt;But I want a few more features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Want the posts listed in the index sorted by date.&lt;/li&gt;
&lt;li&gt;Want each post to be templated in some html wrapper.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;post-ordering-and-templating&quot;&gt;Post ordering and templating&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;python-markdown&lt;/code&gt; supports metadata embedded in posts: &lt;a href=&quot;https://python-markdown.github.io/extensions/meta_data/&quot;&gt;https://python-markdown.github.io/extensions/meta_data/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I thought I&#x27;d need to build something here, but turns out it&#x27;s exactly what I need to assign a few extra attributes to a post.&lt;/p&gt;
&lt;p&gt;We&#x27;ll adjust our &quot;spec&quot; for posts such that each post must include the following metadata at the top of the file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And I&#x27;d like to insert the &lt;code&gt;Title&lt;/code&gt; automatically as a &lt;code&gt;&amp;lt;h1&amp;gt;&lt;/code&gt; tag in each post so I don&#x27;t have to write it again in the markdown.&lt;/p&gt;
&lt;p&gt;So first, let&#x27;s test the metadata and adjust the test blog post.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; head -n4 ./posts/build_a_blog.md
Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And pop open a python repl to see how this works.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&amp;gt;&amp;gt;&amp;gt; md = markdown.Markdown(extensions = [&#x27;meta&#x27;]); f = open(&#x27;posts/build_a_blog.md&#x27;, &#x27;r&#x27;); txt = f.read(); out = md.convert(txt); md.Meta
{&#x27;title&#x27;: [&#x27;Build-a-blog&#x27;], &#x27;date&#x27;: [&#x27;2024-06-17T14:46:36-04:00&#x27;]}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looks pretty nice!&lt;/p&gt;
&lt;p&gt;So first I will adjust the rendering function to prepend the following line, just after we read the file and extract the metadata:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-markdown&quot;&gt;# {title}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    md = markdown.Markdown(extensions = [&#x27;meta&#x27;])
    logging.info(&amp;quot;parsing %s&amp;quot;, fpath)
    out = md.convert(text)
    title = md.Meta.get(&#x27;title&#x27;)[0]
    date = md.Meta.get(&#x27;date&#x27;)[0]
    out = markdown.markdown(&#x27;# &#x27; + title) + out
&lt;/code&gt;&lt;/pre&gt;
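&lt;p&gt;Note that the &lt;code&gt;meta&lt;/code&gt; extension stores every value as a list, and &lt;code&gt;md.Meta.get(&#x27;title&#x27;)[0]&lt;/code&gt; fails with a &lt;code&gt;TypeError&lt;/code&gt; when a key is missing. Below is a minimal sketch of a stricter lookup, assuming a plain dict shaped like the REPL output above; the helper name &lt;code&gt;first_values&lt;/code&gt; is hypothetical and not part of &lt;code&gt;main.py&lt;/code&gt;.&lt;/p&gt;

```python
def first_values(meta, required=('title', 'date')):
    """Hypothetical helper: flatten a Meta-style dict of lists to single values.

    Raises ValueError naming the missing keys instead of failing on None[0].
    """
    missing = [key for key in required if not meta.get(key)]
    if missing:
        raise ValueError('post is missing metadata: ' + ', '.join(missing))
    return {key: meta[key][0] for key in required}


if __name__ == '__main__':
    # Same shape as md.Meta in the REPL session.
    meta = {'title': ['Build-a-blog'], 'date': ['2024-06-17T14:46:36-04:00']}
    print(first_values(meta))
```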
&lt;p&gt;Finally, let&#x27;s return a structure that makes other parts of the program aware of the filename that was rendered and its metadata (title, date).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    out = markdown.markdown(&#x27;# &#x27; + title) + out
    with open(destpath, &amp;quot;w&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;, errors=&amp;quot;xmlcharrefreplace&amp;quot;) as output_file:
        logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
        output_file.write(out)
    return {
        &#x27;title&#x27;: title,
        &#x27;date&#x27;: date,
        &#x27;fpath&#x27;: fpath,
        &#x27;destpath&#x27;: destpath,
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we have what we need to generate a complete index.&lt;/p&gt;
&lt;h3 id=&quot;index-templating&quot;&gt;Index templating&lt;/h3&gt;
&lt;p&gt;Let&#x27;s start by defining what our index template file will be.&lt;/p&gt;
&lt;p&gt;I&#x27;ll choose &lt;code&gt;index.html.tmpl&lt;/code&gt; and after rendering we will write to &lt;code&gt;index.html&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So let&#x27;s make a function that will take a list of our post structure above and render it in a &lt;code&gt;&amp;lt;ul&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import datetime
from string import Template
...
def posts_list_html(posts):
    post_tpl = &amp;quot;&amp;quot;&amp;quot;&amp;lt;li&amp;gt;
&amp;lt;a href=&amp;quot;{href}&amp;quot;&amp;gt;{title}&amp;lt;/a&amp;gt;
&amp;lt;time datetime=&amp;quot;{date}&amp;quot;&amp;gt;{disp_date}&amp;lt;/time&amp;gt;
&amp;lt;/li&amp;gt;&amp;quot;&amp;quot;&amp;quot;
    out = &#x27;&amp;lt;ul class=&amp;quot;blog-posts-list&amp;quot;&amp;gt;&#x27;
    for post in posts:
        disp_date = datetime.datetime.fromisoformat(post.get(&#x27;date&#x27;)).strftime(&#x27;%Y-%m-%d&#x27;)
        out += post_tpl.format(href=post.get(&#x27;destpath&#x27;),
                               title=post.get(&#x27;title&#x27;),
                               date=post.get(&#x27;date&#x27;),
                               disp_date=disp_date)
    return out + &#x27;&amp;lt;/ul&amp;gt;&#x27;

def render_index(posts):
    fname = &#x27;index.html.tmpl&#x27;
    outname = &#x27;index.html&#x27;
    with open(fname, &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        tmpl = Template(inf.read())
    # Call the list renderer defined above (the name was previously mistyped).
    posts_html = posts_list_html(posts)
    html = tmpl.substitute(posts=posts_html)
    with open(outname, &#x27;w&#x27;, encoding=&#x27;utf-8&#x27;) as outf:
        outf.write(html)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Make sure that &lt;code&gt;index.html.tmpl&lt;/code&gt; contains a template variable for &lt;code&gt;${posts}&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; grep -C2 &#x27;\${posts}&#x27; ./index.html.tmpl
&amp;lt;div class=&amp;quot;col-md-8 col-sm-12&amp;quot;&amp;gt;
&amp;lt;p&amp;gt;Welcome. Something will go here eventually.&amp;lt;/p&amp;gt;
${posts}
&amp;lt;/div&amp;gt;
&amp;lt;div class=&amp;quot;col-md-4 col-sm-12&amp;quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now need to connect &lt;code&gt;render_posts()&lt;/code&gt;, which returns each post it processed, to &lt;code&gt;render_index()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_posts():
    files = glob.glob(&#x27;posts/*.md&#x27;)
    logging.info(&#x27;found post files %s&#x27;, files)
    posts = []
    for fname in files:
        p = render_post(fname)
        posts.append(p)
        logging.info(&#x27;rendered post: %s&#x27;, p)
    return posts

if __name__ == &#x27;__main__&#x27;:
    posts = render_posts()
    logging.info(&#x27;rendered posts: %s&#x27;, posts)
    render_index(posts)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And let&#x27;s run it!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; python3 ./main.py
INFO:root:found post files [&#x27;posts/a_new_post.md&#x27;, &#x27;posts/build_a_blog.md&#x27;]
INFO:root:opening posts/a_new_post.md for parsing, dest posts/a_new_post.html
INFO:root:reading posts/a_new_post.md
INFO:root:parsing posts/a_new_post.md
INFO:root:writing to posts/a_new_post.html
INFO:root:rendered post: {&#x27;title&#x27;: &#x27;A new post&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T15:09:26-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/a_new_post.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/a_new_post.html&#x27;}
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
INFO:root:rendered post: {&#x27;title&#x27;: &#x27;Build-a-blog&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T14:46:36-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/build_a_blog.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/build_a_blog.html&#x27;}
INFO:root:rendered posts: [{&#x27;title&#x27;: &#x27;A new post&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T15:09:26-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/a_new_post.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/a_new_post.html&#x27;}, {&#x27;title&#x27;: &#x27;Build-a-blog&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T14:46:36-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/build_a_blog.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/build_a_blog.html&#x27;}]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And check how the output looks:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; grep -C4 &#x27;blog-posts-list&#x27; ./index.html
&amp;lt;/nav&amp;gt;
&amp;lt;section class=&amp;quot;container&amp;quot;&amp;gt;
&amp;lt;div class=&amp;quot;row&amp;quot;&amp;gt;
&amp;lt;div class=&amp;quot;col-md-8 col-sm-12&amp;quot;&amp;gt;
&amp;lt;ul class=&amp;quot;blog-posts-list&amp;quot;&amp;gt;&amp;lt;li&amp;gt;
&amp;lt;a href=&amp;quot;posts/a_new_post.html&amp;quot;&amp;gt;A new post&amp;lt;/a&amp;gt;
&amp;lt;time datetime=&amp;quot;2024-06-17T19:48:17-04:00&amp;quot;&amp;gt;2024-06-17&amp;lt;/time&amp;gt;
&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;
&amp;lt;a href=&amp;quot;posts/build_a_blog.html&amp;quot;&amp;gt;Build-a-blog&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not bad!&lt;/p&gt;
&lt;h3 id=&quot;post-templating&quot;&gt;Post templating&lt;/h3&gt;
&lt;p&gt;I think I want my blog to just maintain the overall layout from the index page and just render the post body where the main post list is.&lt;/p&gt;
&lt;p&gt;So let&#x27;s make that template rendering a bit more general.&lt;/p&gt;
&lt;p&gt;We&#x27;ll also rename the content area template variable to &lt;code&gt;${content}&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_template(tpl_fname, out_fname, content_html):
    with open(tpl_fname, &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        tmpl = Template(inf.read())
    html = tmpl.substitute(content=content_html)
    with open(out_fname, &#x27;w&#x27;, encoding=&#x27;utf-8&#x27;) as outf:
        outf.write(html)

def render_index(posts):
    content_html = posts_list_html(posts)
    render_template(&#x27;index.html.tmpl&#x27;, &#x27;index.html&#x27;, content_html)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now adjust where posts are written out.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    out = markdown.markdown(&#x27;# &#x27; + title) + out
    logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
    # Pass the rendered post HTML held in `out` to the shared template.
    render_template(&#x27;index.html.tmpl&#x27;, destpath, out)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After running, each &lt;code&gt;posts/*.html&lt;/code&gt; file should use the full index template, with the generated post HTML in the content area.&lt;/p&gt;
&lt;h3 id=&quot;post-sorting&quot;&gt;Post sorting&lt;/h3&gt;
&lt;p&gt;With everything wired up, we just need to sort the post list by the date metadata.&lt;/p&gt;
&lt;p&gt;Let&#x27;s do a bit of Python REPL sort testing because I never remember &lt;code&gt;datetime&lt;/code&gt; usage.&lt;/p&gt;
&lt;p&gt;Let&#x27;s generate a few nicely formatted ISO date strings for testing.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; date -d&#x27;2023-01-01&#x27; -Is
2023-01-01T00:00:00-05:00
date -Is
2024-06-17T16:30:35-04:00
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And make a test array&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&amp;gt;&amp;gt;&amp;gt; posts = [{&#x27;date&#x27;: &#x27;2023-01-01T00:00:00-05:00&#x27;}, {&#x27;date&#x27;: &#x27;2024-06-17T16:30:35-04:00&#x27;}]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With our current script, the older post would be listed first. So let&#x27;s try a sort.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Double checking datetime parsing
&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; newer = datetime.datetime.fromisoformat(&#x27;2024-06-17T16:30:35-04:00&#x27;)
datetime.datetime(2024, 6, 17, 16, 30, 35, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
&amp;gt;&amp;gt;&amp;gt; older = datetime.datetime.fromisoformat(&#x27;2023-01-01T00:00:00-05:00&#x27;)
datetime.datetime(2023, 1, 1, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400)))
# Checking python sorting methods work as expected
&amp;gt;&amp;gt;&amp;gt; newer.__gt__(older)
True
&amp;gt;&amp;gt;&amp;gt; newer.__lt__(older)
False
&amp;gt;&amp;gt;&amp;gt; older.__gt__(newer)
False
&amp;gt;&amp;gt;&amp;gt; older.__lt__(newer)
True
# Doing the sort
&amp;gt;&amp;gt;&amp;gt; sorted(posts, key=lambda x: datetime.datetime.fromisoformat(x[&#x27;date&#x27;]), reverse=True)
[{&#x27;date&#x27;: &#x27;2024-06-17T16:30:35-04:00&#x27;}, {&#x27;date&#x27;: &#x27;2023-01-01T00:00:00-05:00&#x27;}]
&lt;/code&gt;&lt;/pre&gt;
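&lt;p&gt;Parsing the dates matters here: comparing the raw ISO strings can give the wrong order when posts carry different UTC offsets. Below is a minimal sketch with made-up post dicts; the &lt;code&gt;title&lt;/code&gt; values and the &lt;code&gt;newest_first&lt;/code&gt; helper are illustrative assumptions, not code from &lt;code&gt;main.py&lt;/code&gt;.&lt;/p&gt;

```python
import datetime

# Hypothetical post dicts; only the 'date' key matters for sorting.
posts = [
    {'title': 'older', 'date': '2023-01-01T00:00:00-05:00'},
    {'title': 'newer', 'date': '2024-06-17T16:30:35-04:00'},  # 20:30:35 UTC
    {'title': 'utc', 'date': '2024-06-17T18:00:00+00:00'},    # 18:00:00 UTC
]


def newest_first(items):
    """Sort by the parsed, timezone-aware datetime rather than the raw string."""
    return sorted(
        items,
        key=lambda p: datetime.datetime.fromisoformat(p['date']),
        reverse=True,
    )


def newest_first_by_string(items):
    """Naive alternative: sort on the raw ISO string, ignoring the offset."""
    return sorted(items, key=lambda p: p['date'], reverse=True)


if __name__ == '__main__':
    print([p['title'] for p in newest_first(posts)])
    print([p['title'] for p in newest_first_by_string(posts)])
```

&lt;p&gt;With these sample dates the two orderings disagree on the two posts from the same day, which is the case the &lt;code&gt;fromisoformat&lt;/code&gt; key guards against.&lt;/p&gt;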
&lt;p&gt;Now let&#x27;s apply this to our posts.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;if __name__ == &#x27;__main__&#x27;:
    posts = render_posts()
    logging.info(&#x27;rendered posts: %s&#x27;, posts)
    sorted_posts = sorted(posts,
        key=lambda p: datetime.datetime.fromisoformat(p[&#x27;date&#x27;]), reverse=True)
    render_index(sorted_posts)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;title-templating&quot;&gt;&lt;code&gt;&amp;lt;title /&amp;gt;&lt;/code&gt; Templating&lt;/h3&gt;
&lt;p&gt;The last bit of templating is to make each post &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; different.&lt;/p&gt;
&lt;p&gt;I&#x27;ll try something like &lt;code&gt;&amp;lt;title&amp;gt;cfebs.com - ${title}&amp;lt;/title&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So &lt;code&gt;index.html.tmpl&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;title&amp;gt;cfebs.com${more_title}&amp;lt;/title&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the index page, &lt;code&gt;more_title&lt;/code&gt; is passed as an empty string.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_index(posts):
    content_html = posts_list_html(posts)
    render_template(&#x27;index.html.tmpl&#x27;, &#x27;index.html&#x27;, {&#x27;content&#x27;: content_html, &#x27;more_title&#x27;: &#x27;&#x27;})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But for a post:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    title = md.Meta.get(&#x27;title&#x27;)[0]
    date = md.Meta.get(&#x27;date&#x27;)[0]
    out = markdown.markdown(&#x27;# &#x27; + title) + out
    logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
    render_template(&#x27;index.html.tmpl&#x27;, destpath, {&#x27;content&#x27;: out, &#x27;more_title&#x27;: &#x27; - &#x27; + title})
&lt;/code&gt;&lt;/pre&gt;
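&lt;p&gt;The index has to pass an empty &lt;code&gt;more_title&lt;/code&gt; rather than leave it out, because &lt;code&gt;Template.substitute()&lt;/code&gt; raises &lt;code&gt;KeyError&lt;/code&gt; for any placeholder without a value. Below is a small sketch using a one-line stand-in for &lt;code&gt;index.html.tmpl&lt;/code&gt;; the helper &lt;code&gt;missing_key&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from string import Template

# Stand-in for index.html.tmpl; the real template also contains HTML markup.
tmpl = Template('cfebs.com${more_title} | ${content}')

# Every placeholder needs a value, so the index passes an empty more_title.
index_page = tmpl.substitute({'content': 'post list', 'more_title': ''})
post_page = tmpl.substitute({'content': 'post body', 'more_title': ' - Build-a-blog'})


def missing_key(template, subs):
    """Return the placeholder name that substitute() rejects, or None."""
    try:
        template.substitute(subs)
    except KeyError as err:
        return err.args[0]
    return None


if __name__ == '__main__':
    print(index_page)
    print(post_page)
    print(missing_key(tmpl, {'content': 'post list'}))
```

&lt;p&gt;&lt;code&gt;Template.safe_substitute()&lt;/code&gt; would leave an unknown placeholder in the output instead of raising, but failing loudly is arguably the better default for a site build.&lt;/p&gt;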
&lt;p&gt;At this point we have functioning blog post generation with templating.&lt;/p&gt;
&lt;h2 id=&quot;rss&quot;&gt;RSS&lt;/h2&gt;
&lt;p&gt;This should be pretty easy as RSS is just reformatting our blog index list into different XML.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;render_template&lt;/code&gt; function will be useful here with a few more tweaks. So I&#x27;ll make another template file (based off a reference &lt;a href=&quot;https://drewdevault.com/blog/index.xml&quot;&gt;https://drewdevault.com/blog/index.xml&lt;/a&gt;)&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;# Grab the reference
curl -sL &#x27;https://drewdevault.com/blog/index.xml&#x27; &amp;gt; index.xml.example
# After a bit of editing
cat ./index.xml.tmpl
&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot; standalone=&amp;quot;yes&amp;quot;?&amp;gt;
&amp;lt;rss version=&amp;quot;2.0&amp;quot; xmlns:atom=&amp;quot;http://www.w3.org/2005/Atom&amp;quot;&amp;gt;
&amp;lt;channel&amp;gt;
&amp;lt;title&amp;gt;${site_title}&amp;lt;/title&amp;gt;
&amp;lt;link&amp;gt;${site_link}&amp;lt;/link&amp;gt;
&amp;lt;description&amp;gt;${description}&amp;lt;/description&amp;gt;
&amp;lt;language&amp;gt;en&amp;lt;/language&amp;gt;
&amp;lt;lastBuildDate&amp;gt;${last_build_date}&amp;lt;/lastBuildDate&amp;gt;
&amp;lt;atom:link href=&amp;quot;${self_full_link}&amp;quot; rel=&amp;quot;self&amp;quot; type=&amp;quot;application/rss+xml&amp;quot; /&amp;gt;
${items}
&amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;render_template&lt;/code&gt; now gets even more generic and passes a &lt;code&gt;dict&lt;/code&gt; to &lt;code&gt;Template.substitute()&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_template(tpl_fname, out_fname, subs):
    with open(tpl_fname, &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        tmpl = Template(inf.read())
    out = tmpl.substitute(subs)
    with open(out_fname, &#x27;w&#x27;, encoding=&#x27;utf-8&#x27;) as outf:
        outf.write(out)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And make sure to adjust any usages of &lt;code&gt;render_template&lt;/code&gt; that exist.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_index(posts):
    content_html = posts_list_html(posts)
    # more_title must be supplied: substitute() raises KeyError for a missing placeholder.
    render_template(&#x27;index.html.tmpl&#x27;, &#x27;index.html&#x27;, {&#x27;content&#x27;: content_html, &#x27;more_title&#x27;: &#x27;&#x27;})

def render_post(fpath):
    ...
    render_template(&#x27;index.html.tmpl&#x27;, destpath, {&#x27;content&#x27;: out, &#x27;more_title&#x27;: &#x27; - &#x27; + title})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now we can hack away at RSS generation:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_rss_index(posts):
    subs = {
        &#x27;site_title&#x27;: &#x27;cfebs.com&#x27;,
        &#x27;site_link&#x27;: &#x27;https://cfebs.com&#x27;,
        &#x27;self_full_link&#x27;: &#x27;https://cfebs.com/index.xml&#x27;,
        &#x27;description&#x27;: &#x27;Recent content from cfebs.com&#x27;,
        &#x27;last_build_date&#x27;: &#x27;TODO&#x27;,
        &#x27;items&#x27;: &#x27;TODO&#x27;,
    }
    render_template(&#x27;index.xml.tmpl&#x27;, &#x27;index.xml&#x27;, subs)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this initial test and a &lt;code&gt;python3 ./main.py&lt;/code&gt; run, we should see xml filled out.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; cat ./index.xml
&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot; standalone=&amp;quot;yes&amp;quot;?&amp;gt;
&amp;lt;rss version=&amp;quot;2.0&amp;quot; xmlns:atom=&amp;quot;http://www.w3.org/2005/Atom&amp;quot;&amp;gt;
&amp;lt;channel&amp;gt;
&amp;lt;title&amp;gt;cfebs.com&amp;lt;/title&amp;gt;
&amp;lt;link&amp;gt;https://cfebs.com&amp;lt;/link&amp;gt;
&amp;lt;description&amp;gt;Recent content from cfebs.com&amp;lt;/description&amp;gt;
&amp;lt;language&amp;gt;en&amp;lt;/language&amp;gt;
&amp;lt;lastBuildDate&amp;gt;TODO&amp;lt;/lastBuildDate&amp;gt;
&amp;lt;atom:link href=&amp;quot;https://cfebs.com/index.xml&amp;quot; rel=&amp;quot;self&amp;quot; type=&amp;quot;application/rss+xml&amp;quot; /&amp;gt;
TODO
&amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&#x27;s finish up by generating each item entry and collecting them to be substituted into the template.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import datetime
import email.utils
import html

def rss_post_xml(post):
    tpl = &amp;quot;&amp;quot;&amp;quot;
&amp;lt;item&amp;gt;
&amp;lt;title&amp;gt;{title}&amp;lt;/title&amp;gt;
&amp;lt;link&amp;gt;{link}&amp;lt;/link&amp;gt;
&amp;lt;pubDate&amp;gt;{pubdate}&amp;lt;/pubDate&amp;gt;
&amp;lt;guid&amp;gt;{link}&amp;lt;/guid&amp;gt;
&amp;lt;description&amp;gt;{description}&amp;lt;/description&amp;gt;
&amp;lt;/item&amp;gt;
&amp;quot;&amp;quot;&amp;quot;
    with open(post[&#x27;fpath&#x27;], &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        text = inf.read()
    md = markdown.Markdown(extensions=[&#x27;extra&#x27;, &#x27;meta&#x27;])
    converted = md.convert(text)
    link = &amp;quot;https://cfebs.com/&amp;quot; + post[&#x27;destpath&#x27;]
    pubdate = email.utils.format_datetime(datetime.datetime.fromisoformat(post[&#x27;date&#x27;]))
    subs = dict(title=post[&#x27;title&#x27;], link=link,
                pubdate=pubdate,
                description=converted)
    # Escape every value so quotes and tags are safe inside the XML.
    for k,v in subs.items():
        subs[k] = html.escape(v)
    return tpl.format(**subs)

def render_rss_index(posts):
    items = &#x27;&#x27;
    for post in posts[:5]:
        items += rss_post_xml(post)
    subs = {
        &#x27;site_title&#x27;: &#x27;cfebs.com&#x27;,
        &#x27;site_link&#x27;: &#x27;https://cfebs.com&#x27;,
        &#x27;self_full_link&#x27;: &#x27;https://cfebs.com/index.xml&#x27;,
        &#x27;description&#x27;: &#x27;Recent content from cfebs.com&#x27;,
        &#x27;last_build_date&#x27;: email.utils.format_datetime(datetime.datetime.now()),
    }
    for k,v in subs.items():
        subs[k] = html.escape(v)
    # items is already escaped per post, so it is added after the loop.
    subs[&#x27;items&#x27;] = items
    render_template(&#x27;index.xml.tmpl&#x27;, &#x27;index.xml&#x27;, subs)
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Need to use &lt;code&gt;html.escape&lt;/code&gt; anywhere we could have quotes or HTML tags in output.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;posts[:5]&lt;/code&gt; should always take the most recent 5 posts to add to the RSS feed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;Reached the end of the afternoon, so this is where I&#x27;ll leave it.&lt;/p&gt;
&lt;p&gt;It&#x27;s not great software.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No tests, no docs&lt;/li&gt;
&lt;li&gt;Hard coding values like the domain&lt;/li&gt;
&lt;li&gt;Using adhoc dicts for generic structures&lt;/li&gt;
&lt;li&gt;Relies on system python version and packages.&lt;/li&gt;
&lt;li&gt;Does not offer anything a tool like &lt;a href=&quot;https://gohugo.io/&quot;&gt;hugo&lt;/a&gt; does not already offer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But, it&#x27;s ~150 lines of python with 1 external dependency.&lt;/p&gt;
&lt;p&gt;If python or &lt;code&gt;python-markdown&lt;/code&gt; drastically changes, it&#x27;ll probably take 10 minutes to debug.&lt;/p&gt;
&lt;p&gt;And - it was fun to write and write about.&lt;/p&gt;
&lt;p&gt;View the complete source for generating this blog:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/main.py&quot;&gt;main.py&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/index.html.tmpl&quot;&gt;index.html.tmpl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/index.xml.tmpl&quot;&gt;index.xml.tmpl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Or the full repo tree: &lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree&quot;&gt;https://git.sr.ht/~cfebs/cfebs.srht.site/tree&lt;/a&gt;&lt;/p&gt;</description>
</item>
</channel>
</rss>

12
index.xml.tmpl Normal file

@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>${site_title}</title>
<link>${site_link}</link>
<description>${description}</description>
<language>en</language>
<lastBuildDate>${last_build_date}</lastBuildDate>
<atom:link href="${self_full_link}" rel="self" type="application/rss+xml" />
${items}
</channel>
</rss>

140
main.py Normal file

@ -0,0 +1,140 @@
import re
import glob
import html
import email.utils
import logging
import datetime
from string import Template
import markdown
from markdown.extensions.toc import TocExtension
destpath_re = re.compile(r'\.md$')
logging.basicConfig(encoding='utf-8', level=logging.INFO)
def render_post(fpath):
    destpath = destpath_re.sub('.html', fpath)
    logging.info("opening %s for parsing, dest %s", fpath, destpath)
    # from: https://python-markdown.github.io/reference/
    with open(fpath, "r", encoding="utf-8") as input_file:
        logging.info("reading %s", fpath)
        text = input_file.read()
    md = markdown.Markdown(extensions = ['extra', 'meta', TocExtension(anchorlink=True)])
    logging.info("parsing %s", fpath)
    out = md.convert(text)
    title = md.Meta.get('title')[0]
    date = md.Meta.get('date')[0]
    out = markdown.markdown('# ' + title) + out
    logging.info("writing to %s", destpath)
    render_template('index.html.tmpl', destpath, {'content': out, 'more_title': ' - ' + title})
    return {
        'title': title,
        'date': date,
        'fpath': fpath,
        'destpath': destpath,
    }


def render_posts():
    files = glob.glob('posts/*.md')
    logging.info('found post files %s', files)
    posts = []
    for fname in files:
        p = render_post(fname)
        posts.append(p)
        logging.info('rendered post: %s', p)
    return posts


def posts_list_html(posts):
    post_tpl = """<li>
    <a href="{href}">{title}</a>
    <time datetime="{date}">{disp_date}</time>
</li>"""
    out = '<ul class="blog-posts-list">'
    for post in posts:
        disp_date = datetime.datetime.fromisoformat(post.get('date')).strftime('%Y-%m-%d')
        out += post_tpl.format(href=post.get('destpath'),
                               title=post.get('title'),
                               date=post.get('date'),
                               disp_date=disp_date)
    return out + '</ul>'


def render_template(tpl_fname, out_fname, subs):
    with open(tpl_fname, 'r', encoding='utf-8') as inf:
        tmpl = Template(inf.read())
    out = tmpl.substitute(subs)
    with open(out_fname, 'w', encoding='utf-8') as outf:
        outf.write(out)


def render_index(posts):
    content_html = posts_list_html(posts)
    render_template('index.html.tmpl', 'index.html', {'content': content_html, 'more_title': ''})


def rss_post_xml(post):
    tpl = """
    <item>
        <title>{title}</title>
        <link>{link}</link>
        <pubDate>{pubdate}</pubDate>
        <guid>{link}</guid>
        <description>{description}</description>
    </item>
    """
    link = "https://cfebs.com/" + post['destpath']
    with open(post['fpath'], 'r', encoding='utf-8') as inf:
        text = inf.read()
    md = markdown.Markdown(extensions=['extra', 'meta', 'toc'])
    converted = md.convert(text)
    pubdate = email.utils.format_datetime(datetime.datetime.fromisoformat(post['date']))
    subs = {
        'title': post['title'],
        'link': link,
        'pubdate': pubdate,
        'description': converted
    }
    for k,v in subs.items():
        subs[k] = html.escape(v)
    return tpl.format(**subs)


def render_rss_index(posts):
    items = ''
    for post in posts[:5]:
        items += rss_post_xml(post)
    subs = {
        'site_title': 'cfebs.com',
        'site_link': 'https://cfebs.com',
        'self_full_link': 'https://cfebs.com/index.xml',
        'description': 'Recent content from cfebs.com',
        'last_build_date': email.utils.format_datetime(datetime.datetime.now()),
    }
    for k,v in subs.items():
        subs[k] = html.escape(v)
    subs['items'] = items
    render_template('index.xml.tmpl', 'index.xml', subs)


def main():
    posts = render_posts()
    logging.info('rendered posts: %s', posts)
    sorted_posts = sorted(posts,
        key=lambda p: datetime.datetime.fromisoformat(p['date']), reverse=True)
    render_index(sorted_posts)
    render_rss_index(sorted_posts)


if __name__ == '__main__':
    main()

647
posts/build_a_blog.md Normal file

@ -0,0 +1,647 @@
Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
I want to share my thought process for how to go about building a static blog generator from scratch.
The goal is to take 1 afternoon + caffeine + some DIY spirit → _something_ resembling a static site/blog generator.
Let's see how hard this will be. Here are the requirements for a blog:
* Generate an index with recent list of posts.
* Generate each individual post written in markdown -> html
    * Support some metadata in each post
    * A post title should have a slug
* Generate RSS
That boils down to:
1. Read some files
2. Parse markdown, maybe parse a header with some key/values.
3. Template strings
So there is 1 "exotic" feature in parsing/rendering Markdown as HTML.
The rest is just file and string manipulation.
Most scripting languages would be fine tools for this task. But how to handle Markdown?
## Picking the tool for the job
I've had [Crystal][1] in the back of my mind for this task. It is a nice general-purpose language that used to include Markdown in the stdlib! Unfortunately, Markdown was removed in [0.31.0][2]. Beyond that, I'm not aware of any other language that ships a well-rounded Markdown implementation out of the box.
I'll likely be building the site in Docker with an Alpine image, so here is a quick search of Alpine's repos to see what could be useful:
```shell
docker run --rm -it alpine
/ # apk update
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz
v3.18.6-263-g77db018514d [https://dl-cdn.alpinelinux.org/alpine/v3.18/main]
v3.18.6-263-g77db018514d [https://dl-cdn.alpinelinux.org/alpine/v3.18/community]
OK: 20079 distinct packages available
/ # apk search markdown
discount-2.2.7c-r1
discount-dev-2.2.7c-r1
discount-libs-2.2.7c-r1
kdepim-addons-23.04.3-r0
markdown-1.0.1-r3
markdown-doc-1.0.1-r3
py3-docstring-to-markdown-0.12-r1
py3-docstring-to-markdown-pyc-0.12-r1
py3-html2markdown-0.1.7-r3
py3-html2markdown-pyc-0.1.7-r3
py3-markdown-3.4.3-r1
py3-markdown-it-py-2.2.0-r1
py3-markdown-it-py-pyc-2.2.0-r1
py3-markdown-pyc-3.4.3-r1
```
[`py3-markdown` in alpine][3] is the popular [`python-markdown`][4]. It's mature and available as a package in my [home distro][5].
With that, we should have the exotic Markdown dependency figured out.
## Let's build
First, let's read 1 post file and render some HTML.
We'll store posts in `posts/` like `posts/build_a_blog.md`.
And we'll store the HTML output in the same directory: `posts/build_a_blog.html`.
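The path mapping is small enough to try in isolation. A minimal sketch, assuming a hypothetical helper name `dest_path` (the script itself does the substitution inline):

```python
import re

# Match ".md" only at the very end of the path.
destpath_re = re.compile(r'\.md$')


def dest_path(fpath):
    """Hypothetical helper: map a Markdown source path to its HTML output path."""
    return destpath_re.sub('.html', fpath)


print(dest_path('posts/build_a_blog.md'))
print(dest_path('posts/notes.md.bak'))  # no trailing ".md", so it is left alone
```

Anchoring the pattern with `$` keeps a stray `.md` in the middle of a filename from being rewritten.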
```python
import re
import logging
import markdown

destpath_re = re.compile(r'\.md$')
logging.basicConfig(encoding='utf-8', level=logging.INFO)


def render_post(fpath):
    destpath = destpath_re.sub('.html', fpath)
    logging.info("opening %s for parsing, dest %s", fpath, destpath)
    # from: https://python-markdown.github.io/reference/
    with open(fpath, "r", encoding="utf-8") as input_file:
        logging.info("reading %s", fpath)
        text = input_file.read()
    logging.info("parsing %s", fpath)
    out = markdown.markdown(text)
    with open(destpath, "w", encoding="utf-8", errors="xmlcharrefreplace") as output_file:
        logging.info("writing to %s", destpath)
        output_file.write(out)


if __name__ == '__main__':
    render_post('posts/build_a_blog.md')
```
And if we run it.
```shell
python3 ./main.py
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
```
Looking pretty good.
```shell
head posts/build_a_blog.html
<h1>Build-a-blog</h1>
<p>I want to share my thought process for how one would go about building a static blog generator from scratch.</p>
<ul>
<li>Generate an index with recent list of posts.</li>
<li>Generate each individual post written in markdown -&gt; html<ul>
<li>Support some metadata in each post</li>
<li>A post title should have a slug</li>
</ul>
</li>
<li>Generate RSS</li>
```
Now let's do this for all `.md` files in `posts/`:
```python
import glob
...
def render_posts():
    files = glob.glob('posts/*.md')
    logging.info('found post files %s', files)
    for fname in files:
        render_post(fname)


if __name__ == '__main__':
    render_posts()
```
And add another simple test post
```shell
echo '# A new post' > ./posts/a_new_post.md
python3 ./main.py
INFO:root:found post files ['posts/a_new_post.md', 'posts/build_a_blog.md']
INFO:root:opening posts/a_new_post.md for parsing, dest posts/a_new_post.html
INFO:root:reading posts/a_new_post.md
INFO:root:parsing posts/a_new_post.md
INFO:root:writing to posts/a_new_post.html
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
head ./posts/a_new_post.html
<h1>A new post</h1>
```
Basically at this point, it's a blog generator!
But I want a few more features:
* Want the posts listed in the index sorted by date.
* Want each post to be templated in some html wrapper.
## Post ordering and templating
`python-markdown` supports metadata embedded in posts: <https://python-markdown.github.io/extensions/meta_data/>
I thought I'd need to build something here, but turns out it's exactly what I need to assign a few extra attributes to a post.
We'll adjust our "spec" for posts such that each post must include the following metadata at the top of the file:
```txt
Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
```
And I'd like to insert the `Title` automatically as an `<h1>` tag in each post so I don't have to write it again in the markdown.
So first, let's test the metadata and adjust the test blog post.
```shell
head -n4 ./posts/build_a_blog.md
Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
```
And pop open a python repl to see how this works.
```python
>>> md = markdown.Markdown(extensions = ['meta']); f = open('posts/build_a_blog.md', 'r'); txt = f.read(); out = md.convert(txt); md.Meta
{'title': ['Build-a-blog'], 'date': ['2024-06-17T14:46:36-04:00']}
```
Looks pretty nice!
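For intuition only, the header format is simple enough to mimic with the standard library. The sketch below is not how `python-markdown` implements the extension. `parse_header` is a hypothetical name, and it only handles single-line `Key: value` pairs ended by `---` or a blank line.

```python
def parse_header(text):
    """Hypothetical, simplified stand-in for the meta extension's parsing.

    Returns (meta, body): meta maps lowercased keys to lists of values,
    mirroring the shape of md.Meta shown above.
    """
    meta = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() in ('', '---'):
            # A blank line or '---' ends the header; the rest is the body.
            return meta, '\n'.join(lines[i + 1:])
        key, sep, value = line.partition(':')
        if not sep:
            # Not a "Key: value" line, so treat everything from here as body.
            return meta, '\n'.join(lines[i:])
        meta.setdefault(key.strip().lower(), []).append(value.strip())
    return meta, ''


sample = "Title: Build-a-blog\nDate: 2024-06-17T14:46:36-04:00\n---\nHello"
meta, body = parse_header(sample)
print(meta)
print(body)
```

Using `partition(':')` splits only on the first colon, which matters here because the ISO date value contains colons of its own.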
So first I will adjust the rendering function to prepend a line like
```markdown
# {title}
```
to the rendered output, just after we read the file and extract the metadata.
```python
def render_post(fpath):
    ...
    md = markdown.Markdown(extensions = ['meta'])
    logging.info("parsing %s", fpath)
    out = md.convert(text)
    title = md.Meta.get('title')[0]
    date = md.Meta.get('date')[0]
    out = markdown.markdown('# ' + title) + out
```
Finally, let's return a structure that tells other parts of the program which file was rendered, along with its metadata (title, date).
```python
def render_post(fpath):
    ...
    out = markdown.markdown('# ' + title) + out
    with open(destpath, "w", encoding="utf-8", errors="xmlcharrefreplace") as output_file:
        logging.info("writing to %s", destpath)
        output_file.write(out)
    return {
        'title': title,
        'date': date,
        'fpath': fpath,
        'destpath': destpath,
    }
```
Now we have what we need to generate a complete index.
### Index templating
Let's start by defining what our index template file will be.
I'll choose `index.html.tmpl` and after rendering we will write to `index.html`.
So let's make a function that takes a list of the post structures above and renders it as a `<ul>`.
```python
import datetime
from string import Template
...
def posts_list_html(posts):
    post_tpl = """<li>
    <a href="{href}">{title}</a>
    <time datetime="{date}">{disp_date}</time>
</li>"""
    out = '<ul class="blog-posts-list">'
    for post in posts:
        disp_date = datetime.datetime.fromisoformat(post.get('date')).strftime('%Y-%m-%d')
        out += post_tpl.format(href=post.get('destpath'),
                               title=post.get('title'),
                               date=post.get('date'),
                               disp_date=disp_date)
    return out + '</ul>'


def render_index(posts):
    fname = 'index.html.tmpl'
    outname = 'index.html'
    with open(fname, 'r', encoding='utf-8') as inf:
        tmpl = Template(inf.read())
    # Build the <ul> of posts and drop it into the ${posts} placeholder.
    posts_html = posts_list_html(posts)
    html = tmpl.substitute(posts=posts_html)
    with open(outname, 'w', encoding='utf-8') as outf:
        outf.write(html)
```
Make sure that `index.html.tmpl` contains a template variable for `${posts}`
```shell
grep -C2 '\${posts}' ./index.html.tmpl
<div class="col-md-8 col-sm-12">
<p>Welcome. Something will go here eventually.</p>
${posts}
</div>
<div class="col-md-4 col-sm-12">
```
And we now need to connect `render_posts()` which returns each post that was processed to `render_index()`
```python
def render_posts():
    files = glob.glob('posts/*.md')
    logging.info('found post files %s', files)
    posts = []
    for fname in files:
        p = render_post(fname)
        posts.append(p)
        logging.info('rendered post: %s', p)
    return posts


if __name__ == '__main__':
    posts = render_posts()
    logging.info('rendered posts: %s', posts)
    render_index(posts)
```
And let's run it!
```shell
python3 ./main.py
INFO:root:found post files ['posts/a_new_post.md', 'posts/build_a_blog.md']
INFO:root:opening posts/a_new_post.md for parsing, dest posts/a_new_post.html
INFO:root:reading posts/a_new_post.md
INFO:root:parsing posts/a_new_post.md
INFO:root:writing to posts/a_new_post.html
INFO:root:rendered post: {'title': 'A new post', 'date': '2024-06-17T15:09:26-04:00', 'fpath': 'posts/a_new_post.md', 'destpath': 'posts/a_new_post.html'}
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
INFO:root:rendered post: {'title': 'Build-a-blog', 'date': '2024-06-17T14:46:36-04:00', 'fpath': 'posts/build_a_blog.md', 'destpath': 'posts/build_a_blog.html'}
INFO:root:rendered posts: [{'title': 'A new post', 'date': '2024-06-17T15:09:26-04:00', 'fpath': 'posts/a_new_post.md', 'destpath': 'posts/a_new_post.html'}, {'title': 'Build-a-blog', 'date': '2024-06-17T14:46:36-04:00', 'fpath': 'posts/build_a_blog.md', 'destpath': 'posts/build_a_blog.html'}]
```
And check how the output looks:
```shell
grep -C4 'blog-posts-list' ./index.html
</nav>
<section class="container">
<div class="row">
<div class="col-md-8 col-sm-12">
<ul class="blog-posts-list"><li>
<a href="posts/a_new_post.html">A new post</a>
<time datetime="2024-06-17T19:48:17-04:00">2024-06-17</time>
</li><li>
<a href="posts/build_a_blog.html">Build-a-blog</a>
```
Not bad!
### Post templating
I think I want my blog to just maintain the overall layout from the index page and just render the post body where the main post list is.
So let's make that template rendering a bit more general.
We'll also rename the content area template variable to `${content}`.
```python
def render_template(tpl_fname, out_fname, content_html):
    with open(tpl_fname, 'r', encoding='utf-8') as inf:
        tmpl = Template(inf.read())
    html = tmpl.substitute(content=content_html)
    with open(out_fname, 'w', encoding='utf-8') as outf:
        outf.write(html)


def render_index(posts):
    content_html = posts_list_html(posts)
    render_template('index.html.tmpl', 'index.html', content_html)
```
And now adjust where posts are written out.
```python
def render_post(fpath):
    ...
    out = markdown.markdown('# ' + title) + out
    logging.info("writing to %s", destpath)
    # Pass the rendered post HTML (out) as the template content.
    render_template('index.html.tmpl', destpath, out)
```
After running, each `posts/*.html` file should use the full index template, with the generated post HTML in place of the post list.
### Post sorting
With everything wired up, we just need to sort the post list by the date metadata.
Let's do a bit of python repl sort testing because I never remember `datetime` usage.
Let's generate a few nicely formatted ISO date strings for testing.
```shell
date -d'2023-01-01' -Is
2023-01-01T00:00:00-05:00
date -Is
2024-06-17T16:30:35-04:00
```
And make a test array
```python
>>> posts = [{'date': '2023-01-01T00:00:00-05:00'}, {'date': '2024-06-17T16:30:35-04:00'}]
```
With our current script, the older post would be listed first. So let's try a sort.
```python
# Double checking datetime parsing
>>> import datetime
>>> newer = datetime.datetime.fromisoformat('2024-06-17T16:30:35-04:00')
>>> newer
datetime.datetime(2024, 6, 17, 16, 30, 35, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
>>> older = datetime.datetime.fromisoformat('2023-01-01T00:00:00-05:00')
>>> older
datetime.datetime(2023, 1, 1, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400)))
# Checking python sorting methods work as expected
>>> newer.__gt__(older)
True
>>> newer.__lt__(older)
False
>>> older.__gt__(newer)
False
>>> older.__lt__(newer)
True
# Doing the sort
>>> sorted(posts, key=lambda x: datetime.datetime.fromisoformat(x['date']), reverse=True)
[{'date': '2024-06-17T16:30:35-04:00'}, {'date': '2023-01-01T00:00:00-05:00'}]
```
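Parsing matters here because the raw strings can carry different UTC offsets, so comparing them as text is not the same as comparing instants. A small sketch with made-up dates (both values are illustrative, not from the blog):

```python
import datetime

# Two made-up posts: 'a' is published 30 minutes after 'b' in absolute time,
# even though its date string sorts lower as plain text.
posts = [
    {'name': 'a', 'date': '2024-06-17T23:30:00-04:00'},  # 03:30 UTC on the 18th
    {'name': 'b', 'date': '2024-06-18T03:00:00+00:00'},  # 03:00 UTC on the 18th
]


def by_instant(post):
    # Timezone-aware datetimes compare by the instant they represent.
    return datetime.datetime.fromisoformat(post['date'])


newest_first = sorted(posts, key=by_instant, reverse=True)
text_order = sorted(posts, key=lambda p: p['date'], reverse=True)

print([p['name'] for p in newest_first])
print([p['name'] for p in text_order])
```

If every post is written in the same timezone, the two orderings agree, but parsing first avoids relying on that.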
Now let's apply this to our posts.
```python
if __name__ == '__main__':
    posts = render_posts()
    logging.info('rendered posts: %s', posts)
    sorted_posts = sorted(posts,
        key=lambda p: datetime.datetime.fromisoformat(p['date']), reverse=True)
    render_index(sorted_posts)
```
### `<title />` Templating
The last bit of templating is to make each post `<title>` different.
I'll try something like `<title>cfebs.com - ${title}</title>`
So in `index.html.tmpl`:
```html
<title>cfebs.com${more_title}</title>
```
For the index page, `more_title` is just an empty string:
```python
def render_index(posts):
    content_html = posts_list_html(posts)
    render_template('index.html.tmpl', 'index.html', {'content': content_html, 'more_title': ''})
```
But for a post:
```python
def render_post(fpath):
    ...
    title = md.Meta.get('title')[0]
    date = md.Meta.get('date')[0]
    out = markdown.markdown('# ' + title) + out
    logging.info("writing to %s", destpath)
    render_template('index.html.tmpl', destpath, {'content': out, 'more_title': ' - ' + title})
```
At this point we have functioning blog post generation with templating.
## RSS
This should be pretty easy as RSS is just reformatting our blog index list into different XML.
The `render_template` function will be useful here with a few more tweaks. So I'll make another template file (based off a reference <https://drewdevault.com/blog/index.xml>)
```shell
# Grab the reference
curl -sL 'https://drewdevault.com/blog/index.xml' > index.xml.example
# After a bit of editing
cat ./index.xml.tmpl
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>${site_title}</title>
<link>${site_link}</link>
<description>${description}</description>
<language>en</language>
<lastBuildDate>${last_build_date}</lastBuildDate>
<atom:link href="${self_full_link}" rel="self" type="application/rss+xml" />
${items}
</channel>
</rss>
```
`render_template` now gets even more generic and passes a `dict` to `Template.substitute()`
```python
def render_template(tpl_fname, out_fname, subs):
    with open(tpl_fname, 'r', encoding='utf-8') as inf:
        tmpl = Template(inf.read())
    out = tmpl.substitute(subs)
    with open(out_fname, 'w', encoding='utf-8') as outf:
        outf.write(out)
```
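One thing to keep in mind with `Template.substitute()`: every `${name}` in the template needs a key in the dict, otherwise it raises `KeyError`. A minimal in-memory sketch (the template string is a made-up stand-in for `index.html.tmpl`):

```python
from string import Template

# Made-up stand-in for the real template file.
tmpl = Template('<title>cfebs.com${more_title}</title>\n<main>${content}</main>')

# All placeholders supplied: substitution succeeds.
page = tmpl.substitute({'content': '<p>hi</p>', 'more_title': ' - Build-a-blog'})
print(page)

# A missing key is an error rather than a silent blank.
try:
    tmpl.substitute({'content': '<p>hi</p>'})
    missing = None
except KeyError as err:
    missing = err.args[0]
print('missing key:', missing)

# safe_substitute() leaves unknown placeholders in place instead of raising.
partial = tmpl.safe_substitute({'content': '<p>hi</p>'})
print(partial)
```

Sticking with `substitute()` seems like the better fit for a build script, since a forgotten key fails the build instead of publishing a literal `${more_title}`.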
And make sure to adjust any usages of `render_template` that exist.
```python
def render_index(posts):
    content_html = posts_list_html(posts)
    # The template contains ${more_title}, so the index must supply it too.
    render_template('index.html.tmpl', 'index.html', {'content': content_html, 'more_title': ''})


def render_post(fpath):
    ...
    render_template('index.html.tmpl', destpath, {'content': out, 'more_title': ' - ' + title})
```
And now we can hack away at RSS generation:
```python
def render_rss_index(posts):
    subs = {
        'site_title': 'cfebs.com',
        'site_link': 'https://cfebs.com',
        'self_full_link': 'https://cfebs.com/index.xml',
        'description': 'Recent content from cfebs.com',
        'last_build_date': 'TODO',
        'items': 'TODO',
    }
    render_template('index.xml.tmpl', 'index.xml', subs)
```
After this initial test and a `python3 ./main.py` run, we should see xml filled out.
```shell
cat ./index.xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>cfebs.com</title>
<link>https://cfebs.com</link>
<description>Recent content from cfebs.com</description>
<language>en</language>
<lastBuildDate>TODO</lastBuildDate>
<atom:link href="https://cfebs.com/index.xml" rel="self" type="application/rss+xml" />
TODO
</channel>
</rss>
```
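For the date `TODO`, RSS 2.0 expects RFC 822 style dates, and the standard library's `email.utils.format_datetime` produces that format from a `datetime`. A short sketch using the post date from earlier:

```python
import datetime
import email.utils

# Aware datetime parsed from the post's ISO 8601 metadata.
published = datetime.datetime.fromisoformat('2024-06-17T14:46:36-04:00')
pubdate = email.utils.format_datetime(published)
print(pubdate)

# For "now", an aware UTC datetime keeps a real offset in the output;
# a naive datetime.now() is formatted with a "-0000" offset instead.
now_utc = datetime.datetime.now(datetime.timezone.utc)
build_date = email.utils.format_datetime(now_utc)
print(build_date)
```

Whether the `-0000` offset bothers any feed reader is something I have not checked, so treat the aware variant as an option rather than a requirement.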
Now let's finish up by generating each item entry and collecting them to be replaced in the template.
```python
import html
import email.utils
import datetime
...
def rss_post_xml(post):
    tpl = """
    <item>
        <title>{title}</title>
        <link>{link}</link>
        <pubDate>{pubdate}</pubDate>
        <guid>{link}</guid>
        <description>{description}</description>
    </item>
    """
    with open(post['fpath'], 'r') as inf:
        text = inf.read()
    md = markdown.Markdown(extensions=['extra', 'meta'])
    converted = md.convert(text)
    link = "https://cfebs.com/" + post['destpath']
    pubdate = email.utils.format_datetime(datetime.datetime.fromisoformat(post['date']))
    subs = dict(title=post['title'], link=link,
                pubdate=pubdate,
                description=converted)
    # Escape every value so tags and quotes are safe inside the XML.
    for k,v in subs.items():
        subs[k] = html.escape(v)
    return tpl.format(**subs)


def render_rss_index(posts):
    items = ''
    for post in posts[:5]:
        items += rss_post_xml(post)
    subs = {
        'site_title': 'cfebs.com',
        'site_link': 'https://cfebs.com',
        'self_full_link': 'https://cfebs.com/index.xml',
        'description': 'Recent content from cfebs.com',
        'last_build_date': email.utils.format_datetime(datetime.datetime.now()),
    }
    for k,v in subs.items():
        subs[k] = html.escape(v)
    # items is already escaped per field, so it is added after the loop.
    subs['items'] = items
    render_template('index.xml.tmpl', 'index.xml', subs)
```
* Need to use `html.escape` anywhere we could have quotes or HTML tags in output.
* `posts[:5]` takes the 5 most recent posts for the RSS feed, as long as the function is given the list already sorted newest-first.
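To make both points concrete, here is a small sketch of what `html.escape` does to a fragment of rendered HTML before it goes inside `<description>`, plus the slice behavior (the sample strings are made up):

```python
import html

# Made-up fragment of rendered post HTML.
rendered = '<p>Say "hi" & wave</p>'

escaped = html.escape(rendered)
print(escaped)

# Unescaping recovers the original markup, which is what a feed reader
# does with the description text.
restored = html.unescape(escaped)
print(restored)

# Slicing a newest-first list keeps the most recent entries, and is safe
# even when there are fewer than 5 posts.
titles = ['newest', 'older', 'oldest']
recent = titles[:5]
print(recent)
```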
## Wrapping up
Reached the end of the afternoon, so this is where I'll leave it.
It's not great software.
* No tests, no docs
* No input validation
* Hard coding values like the domain
* Using adhoc dicts for generic structures
* Relies on system python version and packages.
* Does not offer anything a tool like [hugo][hugo] does not already offer.
But, it's ~150 lines of python with 1 external dependency.
If python or `python-markdown` drastically changes, it'll probably take <10 minutes to debug.
And - it was fun to write and write about.
View the complete source for generating this blog:
* [main.py](https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/main.py)
* [index.html.tmpl](https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/index.html.tmpl)
* [index.xml.tmpl](https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/index.xml.tmpl)
Or the full repo tree: <https://git.sr.ht/~cfebs/cfebs.srht.site/tree>
[1]: https://crystal-lang.org/
[2]: https://github.com/crystal-lang/crystal/releases/tag/0.31.0
[3]: https://pkgs.alpinelinux.org/package/edge/main/x86_64/py3-markdown
[4]: https://python-markdown.github.io/
[5]: https://archlinux.org/packages/extra/any/python-markdown/
[hugo]: https://gohugo.io/


@ -10061,3 +10061,34 @@ dl dd {
padding-left: 0;
padding-right: 0
}
/* custom */
h1,
.h1 {
font-size: 1.5rem;
}
h2,
.h2 {
font-size: 1.25rem;
}
h3,
.h3 {
font-size: 1.2rem;
}
h4,
.h4 {
font-size: 1.1rem;
}
h5,
.h5 {
font-size: 1.15rem;
}
h6,
.h6 {
font-size: 1rem
}