In this post, the goal is to read through the Markdown syntax definition and start thinking about how we will translate it into code.
Let us digest this content together.
Note: This is the first post with code and code blocks. I kept it very simple for now, but I plan on adding syntax highlighting and some other tweaks to the code blocks.
Edit: April 15th, 2020 – Well, I just added basic syntax highlighting and line numbers, apparently as a side effect of moving to Hugo. Yay!
Element types
From the get-go, you will notice that Markdown supports two major types of elements:
- Block elements: elements that can (but do not have to) span multiple lines, and that own their lines of the document, meaning that two block elements cannot share the same line.
- Span elements: elements that you can add anywhere inside a block element, and that render inline with the rest of the content.
Block elements
Paragraphs
As per John Gruber’s specification:
A paragraph is simply one or more consecutive lines of text, separated by one or more blank lines. (A blank line is any line that looks like a blank line — a line containing nothing but spaces or tabs is considered blank.) Normal paragraphs should not be indented with spaces or tabs.
So right there, we have the definitions of two things that will become symbols, or nodes, once our parser has identified their syntax:
- Blank lines: empty lines, or lines containing only spaces or tabs.
- Paragraphs: unindented, consecutive lines of text.
With that information, we can already make two decisions:
- We will treat blank lines as separators. They will split the document content into chunks that the parser will parse separately.
- By default, the parser will assume that a block is a paragraph, unless detected otherwise.
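The second decision suggests a simple shape for block parsing. Here is a minimal sketch, assuming hypothetical names (`blockDetectors`, `parseBlock`) and a chunk represented as an array of lines: each detector receives a chunk and either returns a node or `null`, and a chunk that no detector claims falls back to a paragraph.

```javascript
// "Paragraph unless detected otherwise" (a sketch with hypothetical names).
// Each detector takes the lines of one chunk and returns a node,
// or null when the chunk is not its kind of block.
const blockDetectors = [
  // Detectors for headers, lists, and so on would be added here.
];

function parseBlock(lines) {
  for (const detect of blockDetectors) {
    const node = detect(lines);
    if (node !== null) return node;
  }
  // No detector claimed the chunk: treat it as a paragraph.
  return { type: "paragraph", lines };
}

console.log(JSON.stringify(parseBlock(["Some content"])));
// {"type":"paragraph","lines":["Some content"]}
```

Since the list of detectors is empty for now, every chunk becomes a paragraph; supporting a new block type would mean adding a detector without touching the fallback.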
If we had to represent a blank line as a regex, we would probably use something like this:
/^[\t ]*$/;
To make sure that our regex is correct, we will put the following basic HTML code on a page and open it in our browser. If you are lazy (and as a good programmer, you should be), you can open this page.
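If you would rather check the regex without a browser, here is a minimal Node.js sketch of the same idea. The `content` string is an assumption, chosen so that its six lines (some of them containing only spaces or tabs) match the report below:

```javascript
// Classify each line of a hypothetical `content` string with the blank-line regex.
const BLANK_LINE = /^[\t ]*$/;

// Assumed test content: two blank lines (one holds only a space and a tab),
// a line of text, two more blank lines, then another line of text.
const content = "\n \t\nSome content\n\n   \nSome other content";

// Split on "\n", also accepting a preceding "\r" (Windows line endings).
const lines = content.split(/\r?\n/);

const report = [`Found ${lines.length} lines!`];
lines.forEach((line, index) => {
  const kind = BLANK_LINE.test(line) ? "is blank" : "has some content";
  report.push(`${index}: This line ${kind}!`);
});

console.log(report.join("\n"));
```

Two details are worth keeping in mind. The regex has no `g` flag: with `g`, `RegExp.prototype.test()` remembers `lastIndex` between calls, so reusing the same regex on several lines can give wrong answers. And splitting on `/\r?\n/` rather than `"\n"` keeps a stray `\r` from making a blank line look like content.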
If everything goes well with our regex, we should get the following on our page:
Found 6 lines!
0: This line is blank!
1: This line is blank!
2: This line has some content!
3: This line is blank!
4: This line is blank!
5: This line has some content!
Yay! We can now detect blank lines, which mark the boundaries of each paragraph. Now onto our second task: extracting paragraphs.
We will update some of the JavaScript code like so:
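One straightforward way to write this first version, sketched here for Node.js, is to split the content on newlines and keep every line that is not blank. The function name `extractParagraphs` and the test string are assumptions for illustration:

```javascript
// First version (a sketch): every non-blank line becomes a paragraph.
const BLANK_LINE = /^[\t ]*$/;

function extractParagraphs(content) {
  return content
    .split(/\r?\n/) // one entry per line
    .filter((line) => !BLANK_LINE.test(line)); // drop blank lines
}

// Hypothetical test content: blank lines around two lines of text.
const content = "\n \t\nSome content\n\n   \nSome other content";

extractParagraphs(content).forEach((paragraph, index) => {
  console.log(`Paragraph ${index + 1}: ${paragraph}`);
});
```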
Let us run the code again (you can follow this link to do that) and make sure that everything works as we expect. Our code should output the following lines:
Paragraph 1: Some content
Paragraph 2: Some other content
Perfect! Well, before we get ahead of ourselves, let us update our test content and make sure that paragraph detection works correctly. We will add a few lines to the content variable with the following:
Once again, if everything goes well, we will extract four paragraphs from our test content. But you might have already spotted a potential problem with our code, and you would be right. Let us run the code. This is what we get:
Paragraph 1: Some content
Paragraph 2: Some other content
Paragraph 3: A multiline paragraph
Paragraph 4: with a bit of content
Paragraph 5: A paragraph with a
Paragraph 6: line break
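To reproduce this outside the browser, here is a sketch that splits a hypothetical four-paragraph `content` on newlines, the way the current code does. The string is an assumption built to match the output above, with two trailing spaces after "A paragraph with a":

```javascript
// Reproducing the problem (a sketch): newlines used as paragraph separators.
const BLANK_LINE = /^[\t ]*$/;

function extractParagraphs(content) {
  return content.split(/\r?\n/).filter((line) => !BLANK_LINE.test(line));
}

// Hypothetical content: the third paragraph spans two lines, and the
// fourth ends its first line with two spaces to request a line break.
const content = [
  "Some content",
  "",
  "Some other content",
  "",
  "A multiline paragraph",
  "with a bit of content",
  "",
  "A paragraph with a  ",
  "line break",
].join("\n");

const paragraphs = extractParagraphs(content);
paragraphs.forEach((paragraph, index) => {
  console.log(`Paragraph ${index + 1}: ${paragraph}`);
});
// Every non-blank line became its own paragraph: 6 instead of 4.
```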
We said earlier that, according to the syntax, blank lines separate the paragraphs. But right now, we use the newline character (\n) as a separator.
Let us change that.
We will first need to store the lines of the current paragraph in a variable (a buffer), and process them when we encounter the next blank line. We will also implement the “hard-wrapping” feature:
When you do want to insert a <br/> break tag using Markdown, you end a line with two or more spaces, then type return.
To do that, we can change the JavaScript code to the following (or follow this link):
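Here is one way to implement the buffer and the hard break, as a minimal Node.js sketch. The names and the representation are assumptions: a paragraph is stored as a string in which a hard break is kept as "\n", while ordinary line endings inside a paragraph are joined with a single space:

```javascript
// Buffered paragraph extraction with hard line breaks (a sketch).
const BLANK_LINE = /^[\t ]*$/;
const HARD_BREAK = / {2,}$/; // two or more trailing spaces

// Join the lines of one paragraph: a line ending with two or more spaces
// keeps a hard break ("\n"); other lines are joined with a single space.
function joinLines(lines) {
  let text = "";
  lines.forEach((line, i) => {
    text += line.trimEnd();
    if (i < lines.length - 1) {
      text += HARD_BREAK.test(line) ? "\n" : " ";
    }
  });
  return text;
}

function extractParagraphs(content) {
  const paragraphs = [];
  let buffer = []; // lines of the paragraph being read

  const flush = () => {
    if (buffer.length > 0) paragraphs.push(joinLines(buffer));
    buffer = [];
  };

  for (const line of content.split(/\r?\n/)) {
    if (BLANK_LINE.test(line)) {
      flush(); // a blank line closes the current paragraph
    } else {
      buffer.push(line);
    }
  }
  flush(); // the last paragraph may not be followed by a blank line
  return paragraphs;
}

// Hypothetical test content with four paragraphs.
const content = [
  "Some content",
  "",
  "Some other content",
  "",
  "A multiline paragraph",
  "with a bit of content",
  "",
  "A paragraph with a  ",
  "line break",
].join("\n");

extractParagraphs(content).forEach((paragraph, index) => {
  console.log(`Paragraph ${index + 1}: ${paragraph}`);
});
```

The final `flush()` after the loop matters: the last paragraph of a document is usually not followed by a blank line, so without it the last buffer would be silently dropped.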
Now we get a promising result:
Paragraph 1: Some content
Paragraph 2: Some other content
Paragraph 3: A multiline paragraph with a bit of content
Paragraph 4: A paragraph with a
line break
This has been a long post, but here is the reward: we can now render paragraphs! You have just made your first (very incomplete) Markdown parser and renderer.
To render your paragraphs, add the following code to your page (lazy link):
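Assuming paragraphs are strings in which a hard break is stored as "\n" (an assumption of this sketch, not necessarily how the page's own code stores them), rendering can be as small as wrapping each one in `<p>` tags and turning each "\n" into `<br/>`. The function name `renderParagraphs` is hypothetical:

```javascript
// Rendering paragraphs to HTML (a sketch).
// Assumes each paragraph is a string where a hard break is stored as "\n".
function renderParagraphs(paragraphs) {
  return paragraphs
    .map((paragraph) => `<p>${paragraph.replace(/\n/g, "<br/>\n")}</p>`)
    .join("\n");
}

const html = renderParagraphs([
  "A multiline paragraph with a bit of content",
  "A paragraph with a\nline break",
]);
console.log(html);

// In a browser page, the result could then be inserted into the document:
// document.body.innerHTML = html;
```

This sketch does not escape characters such as `<` or `&`, so any HTML in the text is passed through as-is.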
And we are done! Open the last “lazy link” and download the page (with the “Save as” menu of your browser) to get all the code we have written in this post. Play with it, tweak it, or break it.
If you do, be sure to ping me on Twitter and show me your work.
In the next article of this series, we will tackle Headers.