Blog Migration

TLDR, I moved from WordPress to Jekyll, then to Hexo, with WordPress still working as a CMS.

My experience in the past 2 years

TLDR, I started with WordPress and wanted to try other platforms.

I started blogging in Mar. 2018. Many things have happened since, and the blog has become part of my life. Two years ago, when I started, I knew nothing about web development, so I chose the easiest way available back then: an LNMP package and WordPress, starting my blog from scratch within 2 hours.

WordPress is perfect as a content management system. Based on PHP, it's fast, mature and stable. Because it is popular, there are a bunch of plugins that let you accomplish tasks like a post archive or a table of contents with little coding knowledge. The problem is that my WordPress grew larger and larger with all the plugins, which usually come with abundant advertisements, albeit only on the admin end.

The true challenge came when I needed Markdown support.

Used in GitHub documents, my homework, my class notes and nearly all my daily documents, Markdown is an essential part of my daily workflow. However, WordPress does not natively support Markdown, nor does it show any intention of supporting it officially in the future. In fact, WordPress is heading for its own block editor, Gutenberg, which facilitates content creation for business websites, where the money comes from.

During these years, 2 Markdown plugins showed up for WordPress: wp-editormd and wp-githuber-md. Both are open-source plugins written by the community. However, both are still, if I may say, at the beta stage, as problems show up pretty often, from small styling issues to backend processing bugs. I actually contributed to both of them through GitHub, fixing bugs and adding features, which is fun and counts as my contribution to the open-source community.

Meanwhile, static blog platforms have been on the rise, featuring extensive Markdown support and a light footprint. They are hard for beginners to get started with, but with 2 years of web development experience, that's now a piece of cake for me.

Languages

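Language | Community | Web-Specialized | Threads vs. Processes | Non-blocking I/O | Ease of Use
---------|-----------|-----------------|-----------------------|------------------|--------------------
PHP      |           |                 | Processes             | No               |
Python   | Large     | No              | Processes             | No               |
Java     |           | No              | Threads               | Available        | Requires Callbacks
Node.js  | Large     |                 | Threads               | Yes              | Requires Callbacks
Go       | Small     |                 | Threads (Goroutines)  | Yes              | No Callbacks Needed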

Reference

Which backend language to learn in 2020? - MXX - Cloud architecture news

Server-side I/O: Node vs. PHP vs. Java vs. Go | Toptal

Platforms

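Platform   | Community | Language | Static | Speed | Easy | Extensibility | Markdown
-----------|-----------|----------|--------|-------|------|---------------|---------
WordPress  | Large     | PHP      | No     | /     | Yes  | Good          | good
Django     |           | Python   | No     | /     | No   | Great         |
Django CMS |           | Python   | No     | /     | no   | Good          |
Wagtail    |           | Python   | No     | /     | yes  | Good          |
Jekyll     | Large     | Ruby     | Yes    | slow  | Yes  | Good          |
Hexo       |           | Node.js  | Yes    |       | Yes  | Good          |
Hugo       |           | Go       | Yes    |       | Yes  | Good          |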

For reference only; the grading is subject to personal preference and my limited knowledge.

Migration, or co-existence

TLDR, a combination works better: WordPress + Hexo

I like the light weight (you always have your Markdown files at hand) and good Markdown support of static platforms, as well as the easy content management of WordPress.

Thus my decision is to use WordPress as the back end and a static platform as the main server, connected by a program that exports my posts from WordPress as labeled Markdown.

Right now I am using Jekyll, but its drawbacks have shown: the build time grows exponentially as features are added. A standard build takes about 1s for 100 posts; jekyll-minify makes it 15 times longer and kramdown-math-katex 30 times longer. With both enabled, the build takes an unbearable 450s.
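
The figures above behave as if each plugin multiplies the baseline build time rather than adding to it. A quick back-of-the-envelope check in Python, using only the numbers quoted in this post (the function and variable names are mine, for illustration):

```python
# Rough model: every enabled plugin multiplies the baseline build time.
# The numbers are the ones quoted in this post; the names are illustrative.
baseline_seconds = 1  # about 1 s for 100 posts with no extra plugins
slowdown_factors = {
    "jekyll-minify": 15,
    "kramdown-math-katex": 30,
}

def estimated_build_time(baseline, factors):
    """Multiply the baseline by the slowdown factor of each enabled plugin."""
    total = baseline
    for factor in factors.values():
        total *= factor
    return total

both_enabled = estimated_build_time(baseline_seconds, slowdown_factors)
print(both_enabled)  # 1 * 15 * 30 = 450
```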

The reason behind this might be Ruby's single-threaded processing, which is very clear under Linux: 1 core at 100% and 7 cores at 0%, literally sitting around watching that poor core. Given the nature of Ruby, Node.js and Go, Hexo and Hugo promise to be much faster, up to 8 times faster on my 8-thread CPU. Note that the build time would still be exponential in the number of features, which seems inevitable to me, though the algorithm can be tweaked.
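
To put a number on that upper bound: if the build could be split perfectly across all threads, which real builds rarely manage, the 450s build would still take close to a minute. A small sketch, where the function name and the perfect-scaling assumption are mine:

```python
def ideal_parallel_time(serial_seconds, workers):
    """Best-case build time if the work splits evenly across all workers."""
    return serial_seconds / workers

# 450 s single-threaded build spread over 8 hardware threads
print(ideal_parallel_time(450, 8))  # 56.25
```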

The reason I chose Jekyll was its integration with GitHub Pages and its popularity. But I later found out that enabling unsupported features means I have to render the HTML locally, cutting the advantage in half. And now the disadvantage of Ruby has shown up.

Thus my next step would be moving towards Hexo for better performance. This step should be easy since all the connecting work between WordPress and Jekyll has been done and can be reused. The only headache will be the new templates for blog archive, which on the other hand is a good opportunity to learn some Node.js.

An optional move would be to change the WordPress back end to Wagtail, which is more troublesome than I thought. In fact, I am already halfway through the automatic import from WordPress to Wagtail, the bottleneck being the implementation of tags and nested categories.

The advantage of Django lies in the use of Python. Until I need features that rely heavily on some Python library, I guess I will stick with WordPress. After all, Python is good but PHP is still strong. With Roundcube, Nextcloud, phpMyAdmin, etc. still using PHP, I guess I won't leave it anytime soon.

And it's done while I am still writing this blog

I spent 1 day migrating to Hexo, making a blog exactly the same as it was in WordPress and Jekyll, and the build time is now under 6s with everything minified and rendered, including KaTeX and now even Prism code blocks. Well, that's what Node.js is best at.

Appendix - the connecting script

Since the connecting script needs specific configuration to work, I'll save myself the trouble of making a command-line interface.

Optimized Configuration:

  • WordPress
    • plugin 'WP Githuber MD' or 'WP Editor.md'
  • MySQL
  • Python
    • module mysql.connector
    • module wpconfigr

Fields exported to the YAML front matter:

  • title
  • date
  • author
  • comments
  • categories
  • tags
  • published
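
For illustration, this is roughly what the exported front matter looks like for one post. The sample post below is made up, and build_front_matter is a hypothetical helper that loosely mirrors what the script further down writes; it is not part of that script:

```python
from datetime import datetime

def build_front_matter(post):
    """Render the exported fields as a YAML front matter block."""
    # escape backslashes and double quotes so the quoted title stays valid YAML
    title = post["title"].replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "---",
        "layout: post",
        f'title: "{title}"',
        "date: " + post["date"].strftime("%Y-%m-%d %H:%M"),
        "author: " + post["author"],
        "comments: " + ("true" if post["comments"] else "false"),
        "categories: " + str(post["categories"]),
        "tags: " + str(post["tags"]),
        "published: " + ("true" if post["published"] else "false"),
        "---",
    ]
    return "\n".join(lines) + "\n"

# hypothetical sample post, for illustration only
sample_post = {
    "title": "Blog Migration",
    "date": datetime(2020, 3, 1, 12, 30),
    "author": "admin",
    "comments": True,
    "categories": ["Web"],
    "tags": ["WordPress", "Hexo"],
    "published": True,
}
print(build_front_matter(sample_post))
```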

If a post was saved without the plugin 'WP Githuber MD' or 'WP Editor.md', its HTML version is exported as the post content.

Start by editing the variables

#!/usr/bin/env python3

# variables
# table prefix default to 'wp_'
table_prefix = 'wp_'
# absolute path to wp-config.php
wp_config_path = r'/home/www/default/wp-config.php'
# export markdown file path, make sure it exists
export_path = r'/home/www/'

import mysql.connector
import re
import html
from wpconfigr import WpConfigFile

# connect database
wp_config = WpConfigFile(wp_config_path)
con = mysql.connector.connect(
  host=wp_config.get('DB_HOST'),
  user=wp_config.get('DB_USER'),
  password=wp_config.get('DB_PASSWORD'),
  database=wp_config.get('DB_NAME')
)

wp_posts = table_prefix + 'posts'
wp_terms = table_prefix + 'terms'
wp_term_taxonomy = table_prefix + 'term_taxonomy'
wp_term_relationships = table_prefix + 'term_relationships'
wp_users = table_prefix + 'users'

# posts info: every real post, auto drafts excluded
cur = con.cursor()
cur.execute(f"""
  SELECT ID, post_author, post_date_gmt, post_content, post_title, post_content_filtered, post_type, post_password, post_status, comment_status
  FROM {wp_posts}
  WHERE post_type = 'post'
    AND post_title <> 'Auto Draft'
""")

# map each post ID to a dict of its columns
postsd = {}
columns = tuple(d[0] for d in cur.description)
for row in cur:
  postsd[row[0]] = dict(zip(columns, row))

# terms info: one (name, taxonomy) row per category or tag attached to a post
terms_query = f"""
  SELECT {wp_terms}.name, {wp_term_taxonomy}.taxonomy
  FROM {wp_posts}
    LEFT OUTER JOIN {wp_term_relationships}
      ON {wp_posts}.ID = {wp_term_relationships}.object_id
    LEFT OUTER JOIN {wp_term_taxonomy}
      ON {wp_term_relationships}.term_taxonomy_id = {wp_term_taxonomy}.term_taxonomy_id
    LEFT OUTER JOIN {wp_terms}
      ON {wp_term_taxonomy}.term_id = {wp_terms}.term_id
  WHERE {wp_posts}.ID = %s
"""

# author info: display name of the post's author
author_query = f"""
  SELECT display_name
  FROM {wp_users}
  WHERE ID = %s
"""

for ID in postsd:
  # table names are interpolated above; values are passed as %s parameters
  cur.execute(terms_query, (ID,))
  postsd[ID]['categories'] = []
  postsd[ID]['tags'] = []
  for row in cur:
    if row[1] == 'category':
      postsd[ID]['categories'].append(row[0])
    elif row[1] == 'post_tag':
      postsd[ID]['tags'].append(row[0])

  cur.execute(author_query, (postsd[ID]['post_author'],))
  postsd[ID]['author'] = ''  # fallback when the author row no longer exists
  for row in cur:
    postsd[ID]['author'] = row[0]

def make_title_path_valid(_str):
  # replace characters that are invalid in file names, plus whitespace, with '-'
  _str = re.sub(r'[\\/":*?<>|\s]', '-', _str)
  return _str.lower()

def make_title_md_valid(_str):
  # wrap the title in double quotes for YAML, escaping backslashes and quotes
  _str = _str.replace('\\', '\\\\').replace('"', '\\"')
  return '"' + _str + '"'

for ID in postsd:
  post = postsd[ID]
  file_name = post['post_date_gmt'].strftime("%Y-%m-%d-") + make_title_path_valid(post['post_title']) + '.md'
  file_path = export_path + file_name
  # a post counts as published only if it is public and not password protected
  published = post['post_status'] == 'publish' and post['post_password'] == ''
  # prefer the Markdown source stored by the plugin, fall back to the HTML content
  content = post['post_content_filtered'] or post['post_content']
  front_matter = [
    '---',
    'layout: post',
    'title: ' + make_title_md_valid(post['post_title']),
    'date: ' + post['post_date_gmt'].strftime("%Y-%m-%d %H:%M"),
    'author: ' + post['author'],
    'comments: ' + ('true' if post['comment_status'] == 'open' else 'false'),
    'categories: ' + str(post['categories']),
    'tags: ' + str(post['tags']),
    'published: ' + ('true' if published else 'false'),
    '---',
  ]
  with open(file_path, 'w', encoding='utf-8', errors='ignore') as md_file:
    md_file.write('\n'.join(front_matter) + '\n' + html.unescape(content) + '\n')

con.close()
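
As a quick sanity check of the file naming, the snippet below reproduces the title-to-file-name step on its own so it can be run without a database. The sample title and date are made up, export_file_name is a hypothetical helper, and the single regular expression is meant to be equivalent to the substitutions in make_title_path_valid:

```python
import re
from datetime import datetime

def make_title_path_valid(title):
    """Replace characters that are unsafe in file names, and whitespace, with '-'."""
    return re.sub(r'[\\/":*?<>|\s]', '-', title).lower()

def export_file_name(post_date, title):
    """Build the '<date>-<title>.md' name used for each exported post."""
    return post_date.strftime("%Y-%m-%d-") + make_title_path_valid(title) + ".md"

# hypothetical post, for illustration only
print(export_file_name(datetime(2020, 3, 1), "Node.js vs. Go: which one?"))
# 2020-03-01-node.js-vs.-go--which-one-.md
```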