build_a_blog: rm index.xml, intro

Collin Lefeber 2024-06-17 23:02:33 -04:00
parent 3519112157
commit 7cdfee36fc
3 changed files with 15 additions and 525 deletions

1
.gitignore vendored

@ -1,2 +1,3 @@
index.html
posts/*.html
index.xml

522
index.xml

@ -1,522 +0,0 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>cfebs.com</title>
<link>https://cfebs.com</link>
<description>Recent content from cfebs.com</description>
<language>en</language>
<lastBuildDate>Mon, 17 Jun 2024 19:49:00 -0000</lastBuildDate>
<atom:link href="https://cfebs.com/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Build-a-blog</title>
<link>https://cfebs.com/posts/build_a_blog.html</link>
<pubDate>Mon, 17 Jun 2024 14:46:36 -0400</pubDate>
<guid>https://cfebs.com/posts/build_a_blog.html</guid>
<description>&lt;p&gt;I want to share my thought process for how to go about building a static blog generator from scratch.&lt;/p&gt;
&lt;p&gt;The goal is to take 1 afternoon + caffeine + some DIY spirit → &lt;em&gt;something&lt;/em&gt; resembling a static site/blog generator.&lt;/p&gt;
&lt;p&gt;Let&#x27;s see how hard this will be. Here are the requirements for a blog:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generate an index with a list of recent posts.&lt;/li&gt;
&lt;li&gt;Generate each individual post written in markdown -&amp;gt; html&lt;ul&gt;
&lt;li&gt;Support some metadata in each post&lt;/li&gt;
&lt;li&gt;A post title should have a slug&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Generate RSS&lt;/li&gt;
&lt;/ul&gt;
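&lt;p&gt;The walkthrough below derives output paths from file names rather than from titles, so the slug requirement is never shown in code. As a minimal sketch only, a title could be turned into a slug like this. The &lt;code&gt;slugify&lt;/code&gt; helper and its hyphen separator are assumptions for illustration, not part of the actual script.&lt;/p&gt;

```python
import re

def slugify(title):
    """Hypothetical helper: lowercase the title and join alphanumeric runs with hyphens."""
    # Replace every run of characters outside a-z and 0-9 with a single hyphen.
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    # Drop hyphens left at either end by leading or trailing punctuation.
    return slug.strip('-')

print(slugify('Build-a-blog'))
print(slugify('A new post!'))
```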
&lt;p&gt;That boils down to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read some files&lt;/li&gt;
&lt;li&gt;Parse markdown, maybe parse a header with some key/values.&lt;/li&gt;
&lt;li&gt;Template strings&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So there is 1 &quot;exotic&quot; feature in parsing/rendering Markdown as HTML.&lt;/p&gt;
&lt;p&gt;The rest is just file and string manipulation.&lt;/p&gt;
&lt;p&gt;Most scripting languages would be fine tools for this task. But how to handle Markdown?&lt;/p&gt;
&lt;h2 id=&quot;picking-the-tool-for-the-job&quot;&gt;Picking the tool for the job&lt;/h2&gt;
&lt;p&gt;I&#x27;ve had &lt;a href=&quot;https://crystal-lang.org/&quot;&gt;Crystal&lt;/a&gt; in the back of my mind for this task. It is a nice general purpose language that included Markdown in the stdlib! But unfortunately Markdown was removed in &lt;a href=&quot;https://github.com/crystal-lang/crystal/releases/tag/0.31.0&quot;&gt;0.31.0&lt;/a&gt;. Other than that, I&#x27;m not sure any other languages include a well rounded Markdown implementation out of the box.&lt;/p&gt;
&lt;p&gt;I&#x27;ll likely be building the site in Docker with an Alpine image, so here is a quick search of Alpine&#x27;s repos to see what could be useful:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; docker run --rm -it alpine
/ # apk update
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz
v3.18.6-263-g77db018514d [https://dl-cdn.alpinelinux.org/alpine/v3.18/main]
v3.18.6-263-g77db018514d [https://dl-cdn.alpinelinux.org/alpine/v3.18/community]
OK: 20079 distinct packages available
/ # apk search markdown
discount-2.2.7c-r1
discount-dev-2.2.7c-r1
discount-libs-2.2.7c-r1
kdepim-addons-23.04.3-r0
markdown-1.0.1-r3
markdown-doc-1.0.1-r3
py3-docstring-to-markdown-0.12-r1
py3-docstring-to-markdown-pyc-0.12-r1
py3-html2markdown-0.1.7-r3
py3-html2markdown-pyc-0.1.7-r3
py3-markdown-3.4.3-r1
py3-markdown-it-py-2.2.0-r1
py3-markdown-it-py-pyc-2.2.0-r1
py3-markdown-pyc-3.4.3-r1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&quot;https://pkgs.alpinelinux.org/package/edge/main/x86_64/py3-markdown&quot;&gt;&lt;code&gt;py3-markdown&lt;/code&gt; in alpine&lt;/a&gt; is the popular &lt;a href=&quot;https://python-markdown.github.io/&quot;&gt;&lt;code&gt;python-markdown&lt;/code&gt;&lt;/a&gt;. It&#x27;s mature and available as a package in my &lt;a href=&quot;https://archlinux.org/packages/extra/any/python-markdown/&quot;&gt;home distro&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With that, we should have the exotic Markdown dependency figured out.&lt;/p&gt;
&lt;h2 id=&quot;lets-build&quot;&gt;Let&#x27;s build&lt;/h2&gt;
&lt;p&gt;First, let&#x27;s read one post file and render some HTML.&lt;/p&gt;
&lt;p&gt;We&#x27;ll store posts in &lt;code&gt;posts/&lt;/code&gt; like &lt;code&gt;posts/build_a_blog.md&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;And we&#x27;ll store the HTML output in the same directory: &lt;code&gt;posts/build_a_blog.html&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re
import logging
import markdown

destpath_re = re.compile(r&#x27;\.md$&#x27;)
logging.basicConfig(encoding=&#x27;utf-8&#x27;, level=logging.INFO)

def render_post(fpath):
    destpath = destpath_re.sub(&#x27;.html&#x27;, fpath)
    logging.info(&amp;quot;opening %s for parsing, dest %s&amp;quot;, fpath, destpath)
    # from: https://python-markdown.github.io/reference/
    with open(fpath, &amp;quot;r&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;) as input_file:
        logging.info(&amp;quot;reading %s&amp;quot;, fpath)
        text = input_file.read()
    logging.info(&amp;quot;parsing %s&amp;quot;, fpath)
    out = markdown.markdown(text)
    with open(destpath, &amp;quot;w&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;, errors=&amp;quot;xmlcharrefreplace&amp;quot;) as output_file:
        logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
        output_file.write(out)

if __name__ == &#x27;__main__&#x27;:
    render_post(&#x27;posts/build_a_blog.md&#x27;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And if we run it.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; python3 ./main.py
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking pretty good.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; head posts/build_a_blog.html
&amp;lt;h1&amp;gt;Build-a-blog&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;I want to share my thought process for how one would go about building a static blog generator from scratch.&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Generate an index with recent list of posts.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Generate each individual post written in markdown -&amp;amp;gt; html&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Support some metadata in each post&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A post title should have a slug&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Generate RSS&amp;lt;/li&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&#x27;s do this for all &lt;code&gt;.md&lt;/code&gt; files in &lt;code&gt;posts/&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import glob
...
def render_posts():
    files = glob.glob(&#x27;posts/*.md&#x27;)
    logging.info(&#x27;found post files %s&#x27;, files)
    for fname in files:
        render_post(fname)

if __name__ == &#x27;__main__&#x27;:
    render_posts()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And add another simple test post&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; echo &#x27;# A new post&#x27; &amp;gt; ./posts/a_new_post.md
python3 ./main.py
INFO:root:found post files [&#x27;posts/a_new_post.md&#x27;, &#x27;posts/build_a_blog.md&#x27;]
INFO:root:opening posts/a_new_post.md for parsing, dest posts/a_new_post.html
INFO:root:reading posts/a_new_post.md
INFO:root:parsing posts/a_new_post.md
INFO:root:writing to posts/a_new_post.html
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
head ./posts/a_new_post.html
&amp;lt;h1&amp;gt;A new post&amp;lt;/h1&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Basically at this point, it&#x27;s a blog generator!&lt;/p&gt;
&lt;p&gt;But I want a few more features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Want the posts listed in the index sorted by date.&lt;/li&gt;
&lt;li&gt;Want each post to be templated in some html wrapper.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;post-ordering-and-templating&quot;&gt;Post ordering and templating&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;python-markdown&lt;/code&gt; supports metadata embedded in posts: &lt;a href=&quot;https://python-markdown.github.io/extensions/meta_data/&quot;&gt;https://python-markdown.github.io/extensions/meta_data/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I thought I&#x27;d need to build something here, but turns out it&#x27;s exactly what I need to assign a few extra attributes to a post.&lt;/p&gt;
&lt;p&gt;We&#x27;ll adjust our &quot;spec&quot; for posts such that each post must include the following metadata at the top of the file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And I&#x27;d like to insert the &lt;code&gt;Title&lt;/code&gt; automatically as a &lt;code&gt;&amp;lt;h1&amp;gt;&lt;/code&gt; tag in each post so I don&#x27;t have to write it again in the markdown.&lt;/p&gt;
&lt;p&gt;So first, let&#x27;s test the metadata and adjust the test blog post.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; head -n4 ./posts/build_a_blog.md
Title: Build-a-blog
Date: 2024-06-17T14:46:36-04:00
---
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And pop open a python repl to see how this works.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&amp;gt;&amp;gt;&amp;gt; md = markdown.Markdown(extensions = [&#x27;meta&#x27;]); f = open(&#x27;posts/build_a_blog.md&#x27;, &#x27;r&#x27;); txt = f.read(); out = md.convert(txt); md.Meta
{&#x27;title&#x27;: [&#x27;Build-a-blog&#x27;], &#x27;date&#x27;: [&#x27;2024-06-17T14:46:36-04:00&#x27;]}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looks pretty nice!&lt;/p&gt;
&lt;p&gt;So first I will adjust the rendering function to prepend this line:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-markdown&quot;&gt;# {title}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It gets rendered right after we read the file and extract the metadata.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    md = markdown.Markdown(extensions = [&#x27;meta&#x27;])
    logging.info(&amp;quot;parsing %s&amp;quot;, fpath)
    out = md.convert(text)
    title = md.Meta.get(&#x27;title&#x27;)[0]
    date = md.Meta.get(&#x27;date&#x27;)[0]
    out = markdown.markdown(&#x27;# &#x27; + title) + out
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, let&#x27;s return a structure that makes other parts of the program aware of the file that was rendered and its metadata (title, date):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    out = markdown.markdown(&#x27;# &#x27; + title) + out
    with open(destpath, &amp;quot;w&amp;quot;, encoding=&amp;quot;utf-8&amp;quot;, errors=&amp;quot;xmlcharrefreplace&amp;quot;) as output_file:
        logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
        output_file.write(out)
    return {
        &#x27;title&#x27;: title,
        &#x27;date&#x27;: date,
        &#x27;fpath&#x27;: fpath,
        &#x27;destpath&#x27;: destpath,
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we have what we need to generate a complete index.&lt;/p&gt;
&lt;h3 id=&quot;index-templating&quot;&gt;Index templating&lt;/h3&gt;
&lt;p&gt;Let&#x27;s start by defining what our index template file will be.&lt;/p&gt;
&lt;p&gt;I&#x27;ll choose &lt;code&gt;index.html.tmpl&lt;/code&gt; and after rendering we will write to &lt;code&gt;index.html&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So let&#x27;s make a function that takes a list of the post structures above and renders it as a &lt;code&gt;&amp;lt;ul&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import datetime
from string import Template
...
def posts_list_html(posts):
    post_tpl = &amp;quot;&amp;quot;&amp;quot;&amp;lt;li&amp;gt;
&amp;lt;a href=&amp;quot;{href}&amp;quot;&amp;gt;{title}&amp;lt;/a&amp;gt;
&amp;lt;time datetime=&amp;quot;{date}&amp;quot;&amp;gt;{disp_date}&amp;lt;/time&amp;gt;
&amp;lt;/li&amp;gt;&amp;quot;&amp;quot;&amp;quot;
    out = &#x27;&amp;lt;ul class=&amp;quot;blog-posts-list&amp;quot;&amp;gt;&#x27;
    for post in posts:
        disp_date = datetime.datetime.fromisoformat(post.get(&#x27;date&#x27;)).strftime(&#x27;%Y-%m-%d&#x27;)
        out += post_tpl.format(href=post.get(&#x27;destpath&#x27;),
                               title=post.get(&#x27;title&#x27;),
                               date=post.get(&#x27;date&#x27;),
                               disp_date=disp_date)
    return out + &#x27;&amp;lt;/ul&amp;gt;&#x27;

def render_index(posts):
    fname = &#x27;index.html.tmpl&#x27;
    outname = &#x27;index.html&#x27;
    with open(fname, &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        tmpl = Template(inf.read())
    # Build the list markup with the helper defined above.
    posts_html = posts_list_html(posts)
    html = tmpl.substitute(posts=posts_html)
    with open(outname, &#x27;w&#x27;, encoding=&#x27;utf-8&#x27;) as outf:
        outf.write(html)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Make sure that &lt;code&gt;index.html.tmpl&lt;/code&gt; contains a template variable for &lt;code&gt;${posts}&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; grep -C2 &#x27;\${posts}&#x27; ./index.html.tmpl
&amp;lt;div class=&amp;quot;col-md-8 col-sm-12&amp;quot;&amp;gt;
&amp;lt;p&amp;gt;Welcome. Something will go here eventually.&amp;lt;/p&amp;gt;
${posts}
&amp;lt;/div&amp;gt;
&amp;lt;div class=&amp;quot;col-md-4 col-sm-12&amp;quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now need to connect &lt;code&gt;render_posts()&lt;/code&gt;, which returns each post that was processed, to &lt;code&gt;render_index()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_posts():
    files = glob.glob(&#x27;posts/*.md&#x27;)
    logging.info(&#x27;found post files %s&#x27;, files)
    posts = []
    for fname in files:
        p = render_post(fname)
        posts.append(p)
        logging.info(&#x27;rendered post: %s&#x27;, p)
    return posts

if __name__ == &#x27;__main__&#x27;:
    posts = render_posts()
    logging.info(&#x27;rendered posts: %s&#x27;, posts)
    render_index(posts)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And let&#x27;s run it!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; python3 ./main.py
INFO:root:found post files [&#x27;posts/a_new_post.md&#x27;, &#x27;posts/build_a_blog.md&#x27;]
INFO:root:opening posts/a_new_post.md for parsing, dest posts/a_new_post.html
INFO:root:reading posts/a_new_post.md
INFO:root:parsing posts/a_new_post.md
INFO:root:writing to posts/a_new_post.html
INFO:root:rendered post: {&#x27;title&#x27;: &#x27;A new post&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T15:09:26-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/a_new_post.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/a_new_post.html&#x27;}
INFO:root:opening posts/build_a_blog.md for parsing, dest posts/build_a_blog.html
INFO:root:reading posts/build_a_blog.md
INFO:root:parsing posts/build_a_blog.md
INFO:root:writing to posts/build_a_blog.html
INFO:root:rendered post: {&#x27;title&#x27;: &#x27;Build-a-blog&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T14:46:36-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/build_a_blog.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/build_a_blog.html&#x27;}
INFO:root:rendered posts: [{&#x27;title&#x27;: &#x27;A new post&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T15:09:26-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/a_new_post.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/a_new_post.html&#x27;}, {&#x27;title&#x27;: &#x27;Build-a-blog&#x27;, &#x27;date&#x27;: &#x27;2024-06-17T14:46:36-04:00&#x27;, &#x27;fpath&#x27;: &#x27;posts/build_a_blog.md&#x27;, &#x27;destpath&#x27;: &#x27;posts/build_a_blog.html&#x27;}]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And check how the output looks:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; grep -C4 &#x27;blog-posts-list&#x27; ./index.html
&amp;lt;/nav&amp;gt;
&amp;lt;section class=&amp;quot;container&amp;quot;&amp;gt;
&amp;lt;div class=&amp;quot;row&amp;quot;&amp;gt;
&amp;lt;div class=&amp;quot;col-md-8 col-sm-12&amp;quot;&amp;gt;
&amp;lt;ul class=&amp;quot;blog-posts-list&amp;quot;&amp;gt;&amp;lt;li&amp;gt;
&amp;lt;a href=&amp;quot;posts/a_new_post.html&amp;quot;&amp;gt;A new post&amp;lt;/a&amp;gt;
&amp;lt;time datetime=&amp;quot;2024-06-17T19:48:17-04:00&amp;quot;&amp;gt;2024-06-17&amp;lt;/time&amp;gt;
&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;
&amp;lt;a href=&amp;quot;posts/build_a_blog.html&amp;quot;&amp;gt;Build-a-blog&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not bad!&lt;/p&gt;
&lt;h3 id=&quot;post-templating&quot;&gt;Post templating&lt;/h3&gt;
&lt;p&gt;I think I want my blog to just maintain the overall layout from the index page and just render the post body where the main post list is.&lt;/p&gt;
&lt;p&gt;So let&#x27;s make that template rendering a bit more general.&lt;/p&gt;
&lt;p&gt;We&#x27;ll also rename the content-area template variable to &lt;code&gt;${content}&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_template(tpl_fname, out_fname, content_html):
    with open(tpl_fname, &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        tmpl = Template(inf.read())
    html = tmpl.substitute(content=content_html)
    with open(out_fname, &#x27;w&#x27;, encoding=&#x27;utf-8&#x27;) as outf:
        outf.write(html)

def render_index(posts):
    content_html = posts_list_html(posts)
    render_template(&#x27;index.html.tmpl&#x27;, &#x27;index.html&#x27;, content_html)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now adjust where posts are written out.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    out = markdown.markdown(&#x27;# &#x27; + title) + out
    logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
    # Pass the rendered post body (out) as the template content.
    render_template(&#x27;index.html.tmpl&#x27;, destpath, out)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After running, each &lt;code&gt;posts/*.html&lt;/code&gt; file should use the full index template, with that post&#x27;s generated HTML in the content area.&lt;/p&gt;
&lt;h3 id=&quot;post-sorting&quot;&gt;Post sorting&lt;/h3&gt;
&lt;p&gt;With everything wired up, we just need to sort the post list by the date metadata.&lt;/p&gt;
&lt;p&gt;Let&#x27;s do a bit of Python REPL sort testing because I never remember &lt;code&gt;datetime&lt;/code&gt; usage.&lt;/p&gt;
&lt;p&gt;Let&#x27;s generate a few nicely formatted ISO date strings for testing.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt; date -d&#x27;2023-01-01&#x27; -Is
2023-01-01T00:00:00-05:00
date -Is
2024-06-17T16:30:35-04:00
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And make a test array&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&amp;gt;&amp;gt;&amp;gt; posts = [{&#x27;date&#x27;: &#x27;2023-01-01T00:00:00-05:00&#x27;}, {&#x27;date&#x27;: &#x27;2024-06-17T16:30:35-04:00&#x27;}]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With our current script, the older post would be listed first. So let&#x27;s try a sort.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Double checking datetime parsing
&amp;gt;&amp;gt;&amp;gt; import datetime
&amp;gt;&amp;gt;&amp;gt; newer = datetime.datetime.fromisoformat(&#x27;2024-06-17T16:30:35-04:00&#x27;); newer
datetime.datetime(2024, 6, 17, 16, 30, 35, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000)))
&amp;gt;&amp;gt;&amp;gt; older = datetime.datetime.fromisoformat(&#x27;2023-01-01T00:00:00-05:00&#x27;); older
datetime.datetime(2023, 1, 1, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400)))
# Checking python sorting methods work as expected
&amp;gt;&amp;gt;&amp;gt; newer.__gt__(older)
True
&amp;gt;&amp;gt;&amp;gt; newer.__lt__(older)
False
&amp;gt;&amp;gt;&amp;gt; older.__gt__(newer)
False
&amp;gt;&amp;gt;&amp;gt; older.__lt__(newer)
True
# Doing the sort
&amp;gt;&amp;gt;&amp;gt; sorted(posts, key=lambda x: datetime.datetime.fromisoformat(x[&#x27;date&#x27;]), reverse=True)
[{&#x27;date&#x27;: &#x27;2024-06-17T16:30:35-04:00&#x27;}, {&#x27;date&#x27;: &#x27;2023-01-01T00:00:00-05:00&#x27;}]
&lt;/code&gt;&lt;/pre&gt;
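&lt;p&gt;One reason to parse the dates instead of sorting the raw strings: ISO strings with different UTC offsets do not necessarily sort chronologically as text. Here is a small sketch with made-up dates. The &lt;code&gt;newest_first&lt;/code&gt; helper is a hypothetical name, not something from the script.&lt;/p&gt;

```python
import datetime

# Two made-up posts: the first one is actually later once offsets are applied
# (23:00 at -04:00 is 03:00 UTC on the 18th, which is after 01:00 UTC on the 18th).
posts = [
    {'date': '2024-06-17T23:00:00-04:00'},
    {'date': '2024-06-18T01:00:00+00:00'},
]

def newest_first(posts):
    """Sort posts by their parsed, timezone-aware date, most recent first."""
    return sorted(posts,
                  key=lambda p: datetime.datetime.fromisoformat(p['date']),
                  reverse=True)

by_string = sorted(posts, key=lambda p: p['date'], reverse=True)
by_datetime = newest_first(posts)

print(by_string[0]['date'])    # plain string comparison picks the 06-18 entry
print(by_datetime[0]['date'])  # datetime comparison picks the 23:00-04:00 entry
```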
&lt;p&gt;Now let&#x27;s apply this to our posts.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;if __name__ == &#x27;__main__&#x27;:
    posts = render_posts()
    logging.info(&#x27;rendered posts: %s&#x27;, posts)
    sorted_posts = sorted(posts,
                          key=lambda p: datetime.datetime.fromisoformat(p[&#x27;date&#x27;]), reverse=True)
    render_index(sorted_posts)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;title-templating&quot;&gt;&lt;code&gt;&amp;lt;title /&amp;gt;&lt;/code&gt; Templating&lt;/h3&gt;
&lt;p&gt;The last bit of templating is to make each post &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; different.&lt;/p&gt;
&lt;p&gt;I&#x27;ll try something like &lt;code&gt;&amp;lt;title&amp;gt;cfebs.com - ${title}&amp;lt;/title&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So in &lt;code&gt;index.html.tmpl&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;title&amp;gt;cfebs.com${more_title}&amp;lt;/title&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the index, &lt;code&gt;more_title&lt;/code&gt; is just an empty string.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_index(posts):
    content_html = posts_list_html(posts)
    render_template(&#x27;index.html.tmpl&#x27;, &#x27;index.html&#x27;, {&#x27;content&#x27;: content_html, &#x27;more_title&#x27;: &#x27;&#x27;})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But for a post:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_post(fpath):
    ...
    title = md.Meta.get(&#x27;title&#x27;)[0]
    date = md.Meta.get(&#x27;date&#x27;)[0]
    out = markdown.markdown(&#x27;# &#x27; + title) + out
    logging.info(&amp;quot;writing to %s&amp;quot;, destpath)
    render_template(&#x27;index.html.tmpl&#x27;, destpath, {&#x27;content&#x27;: out, &#x27;more_title&#x27;: &#x27; - &#x27; + title})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point we have functioning blog post generation with templating.&lt;/p&gt;
&lt;h2 id=&quot;rss&quot;&gt;RSS&lt;/h2&gt;
&lt;p&gt;This should be pretty easy as RSS is just reformatting our blog index list into different XML.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;render_template&lt;/code&gt; function will be useful here with a few more tweaks. So I&#x27;ll make another template file (based off a reference &lt;a href=&quot;https://drewdevault.com/blog/index.xml&quot;&gt;https://drewdevault.com/blog/index.xml&lt;/a&gt;)&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;# Grab the reference
curl -sL &#x27;https://drewdevault.com/blog/index.xml&#x27; &amp;gt; index.xml.example
# After a bit of editing
cat ./index.xml.tmpl
&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot; standalone=&amp;quot;yes&amp;quot;?&amp;gt;
&amp;lt;rss version=&amp;quot;2.0&amp;quot; xmlns:atom=&amp;quot;http://www.w3.org/2005/Atom&amp;quot;&amp;gt;
&amp;lt;channel&amp;gt;
&amp;lt;title&amp;gt;${site_title}&amp;lt;/title&amp;gt;
&amp;lt;link&amp;gt;${site_link}&amp;lt;/link&amp;gt;
&amp;lt;description&amp;gt;${description}&amp;lt;/description&amp;gt;
&amp;lt;language&amp;gt;en&amp;lt;/language&amp;gt;
&amp;lt;lastBuildDate&amp;gt;${last_build_date}&amp;lt;/lastBuildDate&amp;gt;
&amp;lt;atom:link href=&amp;quot;${self_full_link}&amp;quot; rel=&amp;quot;self&amp;quot; type=&amp;quot;application/rss+xml&amp;quot; /&amp;gt;
${items}
&amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;render_template&lt;/code&gt; now gets even more generic and passes a &lt;code&gt;dict&lt;/code&gt; to &lt;code&gt;Template.substitute()&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_template(tpl_fname, out_fname, subs):
    with open(tpl_fname, &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        tmpl = Template(inf.read())
    out = tmpl.substitute(subs)
    with open(out_fname, &#x27;w&#x27;, encoding=&#x27;utf-8&#x27;) as outf:
        outf.write(out)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And make sure to adjust any usages of &lt;code&gt;render_template&lt;/code&gt; that exist.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def render_index(posts):
    content_html = posts_list_html(posts)
    # more_title must be supplied too, or substitute() raises KeyError.
    render_template(&#x27;index.html.tmpl&#x27;, &#x27;index.html&#x27;, {&#x27;content&#x27;: content_html, &#x27;more_title&#x27;: &#x27;&#x27;})

def render_post(fpath):
    ...
    render_template(&#x27;index.html.tmpl&#x27;, destpath, {&#x27;content&#x27;: out, &#x27;more_title&#x27;: &#x27; - &#x27; + title})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And now we can hack away at RSS generation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def render_rss_index(posts):
    subs = {
        &#x27;site_title&#x27;: &#x27;cfebs.com&#x27;,
        &#x27;site_link&#x27;: &#x27;https://cfebs.com&#x27;,
        &#x27;self_full_link&#x27;: &#x27;https://cfebs.com/index.xml&#x27;,
        &#x27;description&#x27;: &#x27;Recent content from cfebs.com&#x27;,
        &#x27;last_build_date&#x27;: &#x27;TODO&#x27;,
        &#x27;items&#x27;: &#x27;TODO&#x27;,
    }
    render_template(&#x27;index.xml.tmpl&#x27;, &#x27;index.xml&#x27;, subs)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this initial test and a &lt;code&gt;python3 ./main.py&lt;/code&gt; run, we should see xml filled out.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; cat ./index.xml
&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot; standalone=&amp;quot;yes&amp;quot;?&amp;gt;
&amp;lt;rss version=&amp;quot;2.0&amp;quot; xmlns:atom=&amp;quot;http://www.w3.org/2005/Atom&amp;quot;&amp;gt;
&amp;lt;channel&amp;gt;
&amp;lt;title&amp;gt;cfebs.com&amp;lt;/title&amp;gt;
&amp;lt;link&amp;gt;https://cfebs.com&amp;lt;/link&amp;gt;
&amp;lt;description&amp;gt;Recent content from cfebs.com&amp;lt;/description&amp;gt;
&amp;lt;language&amp;gt;en&amp;lt;/language&amp;gt;
&amp;lt;lastBuildDate&amp;gt;TODO&amp;lt;/lastBuildDate&amp;gt;
&amp;lt;atom:link href=&amp;quot;https://cfebs.com/index.xml&amp;quot; rel=&amp;quot;self&amp;quot; type=&amp;quot;application/rss+xml&amp;quot; /&amp;gt;
TODO
&amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&#x27;s finish up by generating each item entry and collecting them to be substituted into the template.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import datetime
import email.utils
import html
...
def rss_post_xml(post):
    tpl = &amp;quot;&amp;quot;&amp;quot;
&amp;lt;item&amp;gt;
&amp;lt;title&amp;gt;{title}&amp;lt;/title&amp;gt;
&amp;lt;link&amp;gt;{link}&amp;lt;/link&amp;gt;
&amp;lt;pubDate&amp;gt;{pubdate}&amp;lt;/pubDate&amp;gt;
&amp;lt;guid&amp;gt;{link}&amp;lt;/guid&amp;gt;
&amp;lt;description&amp;gt;{description}&amp;lt;/description&amp;gt;
&amp;lt;/item&amp;gt;
&amp;quot;&amp;quot;&amp;quot;
    with open(post[&#x27;fpath&#x27;], &#x27;r&#x27;, encoding=&#x27;utf-8&#x27;) as inf:
        text = inf.read()
    md = markdown.Markdown(extensions=[&#x27;extra&#x27;, &#x27;meta&#x27;])
    converted = md.convert(text)
    link = &amp;quot;https://cfebs.com/&amp;quot; + post[&#x27;destpath&#x27;]
    pubdate = email.utils.format_datetime(datetime.datetime.fromisoformat(post[&#x27;date&#x27;]))
    subs = dict(title=post[&#x27;title&#x27;], link=link,
                pubdate=pubdate,
                description=converted)
    # Escape every value so quotes and HTML tags are safe inside the XML.
    for k, v in subs.items():
        subs[k] = html.escape(v)
    return tpl.format(**subs)

def render_rss_index(posts):
    items = &#x27;&#x27;
    for post in posts[:5]:
        items += rss_post_xml(post)
    subs = {
        &#x27;site_title&#x27;: &#x27;cfebs.com&#x27;,
        &#x27;site_link&#x27;: &#x27;https://cfebs.com&#x27;,
        &#x27;self_full_link&#x27;: &#x27;https://cfebs.com/index.xml&#x27;,
        &#x27;description&#x27;: &#x27;Recent content from cfebs.com&#x27;,
        &#x27;last_build_date&#x27;: email.utils.format_datetime(datetime.datetime.now()),
    }
    for k, v in subs.items():
        subs[k] = html.escape(v)
    # The items were already escaped field by field, so add them after the loop.
    subs[&#x27;items&#x27;] = items
    render_template(&#x27;index.xml.tmpl&#x27;, &#x27;index.xml&#x27;, subs)
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;We need &lt;code&gt;html.escape&lt;/code&gt; anywhere the output could contain quotes or HTML tags.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;posts[:5]&lt;/code&gt; takes the 5 most recent posts for the RSS feed, provided the list passed in is already sorted newest first.&lt;/li&gt;
&lt;/ul&gt;
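&lt;p&gt;A note on the date strings: &lt;code&gt;email.utils.format_datetime&lt;/code&gt; produces the RFC 2822 style dates used in the feed. With a naive &lt;code&gt;datetime.now()&lt;/code&gt; it emits a &lt;code&gt;-0000&lt;/code&gt; offset, while a timezone-aware value carries its real offset. Below is a sketch of the difference using only the standard library. The &lt;code&gt;rss_date&lt;/code&gt; name is hypothetical, and switching to an aware timestamp is a suggestion, not what the script does.&lt;/p&gt;

```python
import datetime
import email.utils

def rss_date(iso_string):
    """Hypothetical helper: ISO 8601 string to an RFC 2822 date string."""
    return email.utils.format_datetime(datetime.datetime.fromisoformat(iso_string))

# Aware input keeps its offset.
print(rss_date('2024-06-17T14:46:36-04:00'))   # Mon, 17 Jun 2024 14:46:36 -0400

# Naive input is formatted with the "-0000" (unknown offset) marker.
print(rss_date('2024-06-17T19:49:00'))         # Mon, 17 Jun 2024 19:49:00 -0000

# An aware "now" in UTC gives an explicit +0000 offset instead.
now_utc = datetime.datetime.now(datetime.timezone.utc)
print(email.utils.format_datetime(now_utc))
```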
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;/h2&gt;
&lt;p&gt;Reached the end of the afternoon, so this is where I&#x27;ll leave it.&lt;/p&gt;
&lt;p&gt;It&#x27;s not great software.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No tests, no docs&lt;/li&gt;
&lt;li&gt;Hard coding values like the domain&lt;/li&gt;
&lt;li&gt;Using adhoc dicts for generic structures&lt;/li&gt;
&lt;li&gt;Relies on system python version and packages.&lt;/li&gt;
&lt;li&gt;Does not offer anything a tool like &lt;a href=&quot;https://gohugo.io/&quot;&gt;hugo&lt;/a&gt; does not already offer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But, it&#x27;s ~150 lines of python with 1 external dependency.&lt;/p&gt;
&lt;p&gt;If python or &lt;code&gt;python-markdown&lt;/code&gt; drastically changes, it&#x27;ll probably take 10 minutes to debug.&lt;/p&gt;
&lt;p&gt;And - it was fun to write and write about.&lt;/p&gt;
&lt;p&gt;View the complete source for generating this blog:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/main.py&quot;&gt;main.py&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/index.html.tmpl&quot;&gt;index.html.tmpl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree/main/item/index.xml.tmpl&quot;&gt;index.xml.tmpl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Or the full repo tree: &lt;a href=&quot;https://git.sr.ht/~cfebs/cfebs.srht.site/tree&quot;&gt;https://git.sr.ht/~cfebs/cfebs.srht.site/tree&lt;/a&gt;&lt;/p&gt;</description>
</item>
</channel>
</rss>


@ -3,9 +3,17 @@ Date: 2024-06-17T14:46:36-04:00
---
I want to share my thought process for how to go about building a static blog generator from scratch.
The goal is to take 1 afternoon + caffeine + some DIY spirit → _something_ resembling a static site/blog generator.
There will be nothing groundbreaking here - in fact this software will not be good. So turn back now if you're expecting the new [Hugo][hugo].
Let's see how hard this will be. Here's what a blog is/requirements:
Actually you should probably stop reading and just use [Hugo][Hugo].
In case you are still interested, the goal is to take 1 afternoon + caffeine + some DIY spirit → _something_ resembling a static site/blog generator.
And I hope by the end of this post you might be inspired to build your own generation scripts, maybe in a new language you always wanted to try.
Let's see how hard this will be.
Here are the requirements for this blog:
* Generate an index with recent list of posts.
* Generate each individual post written in markdown -> html
@ -23,10 +31,12 @@ So there is 1 "exotic" feature in parsing/rendering Markdown as HTML that will n
The rest is just file and string manipulation.
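To give a sense of how small the string-manipulation part can be, here is a minimal sketch using `string.Template` from the Python standard library. The inline template text and the `page` name are made up for illustration; a real template would more likely be read from a file.

```python
from string import Template

# A made-up, inline stand-in for a template file.
page = Template('<title>${title}</title>\n<main>${content}</main>')

html_out = page.substitute({
    'title': 'Build-a-blog',
    'content': '<h1>Build-a-blog</h1>',
})
print(html_out)

# substitute() raises KeyError when a placeholder has no value,
# which surfaces a forgotten key early.
try:
    page.substitute({'content': 'x'})
except KeyError as missing:
    print('missing placeholder:', missing)
```

`Template.safe_substitute()` would leave unknown placeholders in place instead of raising, which is more forgiving but hides mistakes.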
Most scripting languages would be fine tools for this task. But how to handle Markdown?
Let's get it on.
## Picking the tool for the job
Most scripting languages would be fine tools for this task. But how to handle Markdown?
I've had [Crystal][1] in the back of my mind for this task. It is a nice general purpose language that included Markdown in the stdlib! But unfortunately Markdown was removed in [0.31.0][2]. Other than that, I'm not sure any other languages include a well rounded Markdown implementation out of the box.
I'll likely end up building the site in Docker with an Alpine image down the road, so here is a quick search of Alpine's repos to see what could be useful:
@ -645,3 +655,4 @@ Or the full repo tree: <https://git.sr.ht/~cfebs/cfebs.srht.site/tree>
[4]: https://python-markdown.github.io/
[5]: https://archlinux.org/packages/extra/any/python-markdown/
[hugo]: https://gohugo.io/
[jekyll]: https://jekyllrb.com/