Take raw JSON output from the Notion API and transform it into properties that can be used by your website’s pages and components.
This is part of a four-part series on showing the potential of Notion as a CMS for complex websites.
In the previous post, we retrieved page and block content from a Notion database and wrote the raw result to a local JSON file. We want to be able to map this content to components in a website.
Ideally, that content would be in a format where the website and its components don't have to do much work. They can receive properties and use them directly. That's what we want to build here, so that you could apply the process to whatever framework you choose to use.
To be able to use this content cleanly in the website code, we transform the raw output from Notion's API into something more workable. This means:
Let's start with a bit of setup.
I'm assuming that you have retrieved your pages from Notion, including their children and properties, and have written the results to a JSON file. (If not, go through the previous post.)
We're going to install highlight.js to handle syntax highlighting, as an example of an advanced content transformation.
npm install highlight.js
We'll be able to handle all other transformations with basic JavaScript code.
This script gets fairly complex pretty quickly, so we're going to step through it one section at a time. View the full script in this example project.
First, let's add our dependencies and make sure that the file you're looking for exists. If it doesn't, exit the script and print an appropriate message.
We'll also load the export file into a pages constant.
const fs = require("fs");
const path = require("path");
const hljs = require("highlight.js/lib/common");

const outputFile = path.join(__dirname, "notion-export.json");

if (!fs.existsSync(outputFile)) {
  console.error(`File not found: ${outputFile}`);
  process.exit(1);
}

const pages = JSON.parse(fs.readFileSync(outputFile));
Be sure to set outputFile to the appropriate path pointing to the Notion export file.
We're going to use a blockMap object, which will handle transforming raw output from the Notion export into a component-ready object.
This works by matching each key in the blockMap object against the type property from the exported block data. The corresponding value is a function that accepts the raw block data and transforms it.
To begin, we'll just set up the structure on which we can expand.
// ...
const pages = JSON.parse(fs.readFileSync(outputFile));

const blockMap = {};

function transformBlocks(blocks) {
  return blocks
    .map((block) => {
      if (blockMap[block.type]) {
        return blockMap[block.type](block);
      }
      console.log("NOT SUPPORTED:", block.type);
    })
    .filter(Boolean);
}

let output = pages.map((page) => {
  const { properties, children, id } = page;
  return {
    id,
    title: properties.Name.title[0].text.content,
    urlPath: properties.Slug.rich_text[0].plain_text,
    blocks: transformBlocks(children),
  };
});

const transformedOutput = path.join(__dirname, "content.json");
fs.writeFileSync(transformedOutput, JSON.stringify(output, null, 2));
console.log(`Transformed ${output.length} pages to ${transformedOutput}`);
Notice that this did the following:
- Maps over pages (the raw Notion export) to set a few properties for each page: id, title, and urlPath. These are essentially pass-through values from the raw export of the page.
- Sets blocks on the page to the result of transformBlocks, passing in the raw child block content.
- transformBlocks uses the blockMap object to map block.type to a function expected to transform the block. If it doesn't find the function, it notifies the console and returns undefined. Thus, the block is skipped on the page.
- Writes the result to content.json in the same directory as the script, and prints how many pages were transformed.
Running this now would result in a number of "NOT SUPPORTED" notes, like this:
NOT SUPPORTED: paragraph
NOT SUPPORTED: paragraph
NOT SUPPORTED: paragraph
NOT SUPPORTED: callout
NOT SUPPORTED: paragraph
NOT SUPPORTED: callout
But the content.json file should still be created. It just won't have any blocks data. It should look something like this:
[
  {
    "id": "...",
    "title": "Plain Text Page",
    "urlPath": "plain-text-page",
    "blocks": []
  },
  {
    "id": "...",
    "title": "Page with Components",
    "urlPath": "page-with-components",
    "blocks": []
  }
]
Let's add our first transformer. We'll start simple and transform paragraph blocks.
const blockMap = {
  paragraph: (block) => {
    if (block.paragraph.rich_text.length === 0) return null;
    return {
      component: "Paragraph",
      text: block.paragraph.rich_text
        .map(({ plain_text }) => plain_text)
        .join(""),
    };
  },
};
Here we're returning an object with two properties:
- component: Set to Paragraph
- text: Set to the plain text from the content in Notion
This omits any style added within the paragraph. If you were taking this script to production, you'd want to introspect the raw export and support things like bold, code, and emojis within a paragraph. You'd also probably want to consider children, which we're not doing in any of these functions.
If you have paragraph content, you should see the paragraph warnings go away and paragraph block content should appear in the transformed output file after running the script again.
[
  {
    "id": "...",
    "title": "Plain Text Page",
    "urlPath": "plain-text-page",
    "blocks": [
      {
        "component": "Paragraph",
        "text": "This page doesn’t have any components. It’s just a bunch of text."
      },
      {
        "component": "Paragraph",
        "text": "Nulla commodo dolore culpa aute Lorem nostrud dolore ullamco consequat aliquip consequat. Nostrud veniam ex proident magna ipsum. Mollit deserunt fugiat ad sunt ipsum occaecat dolor cupidatat nisi. Eiusmod exercitation laborum nisi exercitation est eiusmod nisi esse. Qui incididunt exercitation incididunt sint nisi mollit ullamco pariatur laboris enim Lorem. Excepteur eu eu quis ea qui amet ea consectetur occaecat eiusmod est incididunt."
      }
      // ...
    ]
  }
  // ...
]
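If you do want to keep inline styles rather than flattening everything to plain text, one possible direction is to emit a list of styled segments per paragraph. This is only a sketch: toSegments is a hypothetical helper that isn't part of the original script, the sample data is made up for illustration, and it assumes each rich text item carries an annotations object with bold, italic, and code flags, plus an optional href.

```javascript
// Sketch: keep inline styles instead of flattening to plain text.
// `toSegments` is a hypothetical helper, not part of the original script.
function toSegments(richText) {
  return richText.map(({ plain_text, annotations = {}, href = null }) => ({
    text: plain_text,
    bold: Boolean(annotations.bold),
    italic: Boolean(annotations.italic),
    code: Boolean(annotations.code),
    href,
  }));
}

// Trimmed-down sample of a rich_text array (illustrative data only).
const sampleRichText = [
  { plain_text: "Run ", annotations: { bold: false, italic: false, code: false }, href: null },
  { plain_text: "npm install", annotations: { bold: false, italic: false, code: true }, href: null },
  { plain_text: " first.", annotations: { bold: true, italic: false, code: false }, href: null },
];

const segments = toSegments(sampleRichText);
console.log(JSON.stringify(segments, null, 2));
```

A Paragraph component could then loop over the segments and wrap each one in the appropriate markup, instead of receiving a single text string.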
At this point, we've covered the basic foundation. You may have enough to take and run with for your particular use case.
Now we're going quite a bit deeper. We're going to show how we can create an advanced mapping system using Notion callouts.
First, add a callout property to the blockMap which calls a resolveCalloutComponent(block) function to resolve the callout block. (Callouts are going to support multiple components, as we'll see shortly.)
const blockMap = {
  paragraph: (block) => {
    if (block.paragraph.rich_text.length === 0) return null;
    return {
      component: "Paragraph",
      text: block.paragraph.rich_text
        .map(({ plain_text }) => plain_text)
        .join(""),
    };
  },
  callout: (block) => {
    const result = resolveCalloutComponent(block);
    if (!result) return null;
    return result;
  },
};
Next, let's add the resolveCalloutComponent function, which is going to use another map, calloutMap, to further transform callouts based on the icon used in the callout. For this example, and to keep things simple, we're going to use Notion's provided SVG icons and not the universal emojis.
But at first, like we did above, we'll start with placeholders and show that no callout icons are supported.
const calloutMap = {};

function resolveCalloutComponent(block) {
  if (block.callout?.icon?.type !== "external") {
    // Optional chaining here too, so a missing callout or icon can't throw.
    console.log("ICON NOT SUPPORTED:", block.callout?.icon);
    return null;
  }
  // Reduce the icon URL to its base name, e.g. ".../code_gray.svg" becomes "code".
  const iconName = block.callout.icon.external.url
    .split("/")
    .pop()
    .split(".")[0]
    .split("_")[0];
  if (!calloutMap[iconName]) {
    console.log("EXTERNAL ICON NOT SUPPORTED:", iconName);
    return null;
  }
  return calloutMap[iconName](block);
}
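To see what that chain of split calls does, here is the same extraction pulled out on its own. Treat it as a sketch: iconNameFromUrl is a hypothetical name used only for this illustration, and the URLs assume that Notion's built-in icons are served with a name_color.svg filename.

```javascript
// Same chain of splits as resolveCalloutComponent, pulled into a helper.
// `iconNameFromUrl` is a hypothetical name used only for this illustration.
function iconNameFromUrl(url) {
  return url
    .split("/") // ["https:", "", "www.notion.so", "icons", "code_gray.svg"]
    .pop() // "code_gray.svg"
    .split(".")[0] // "code_gray"
    .split("_")[0]; // "code"
}

// Assumed URL shape for Notion's built-in icons.
console.log(iconNameFromUrl("https://www.notion.so/icons/code_gray.svg")); // → code
console.log(iconNameFromUrl("https://www.notion.so/icons/tabs_blue.svg")); // → tabs
```

Because the name is cut at the first underscore, the color of the icon doesn't matter. The flip side is that an icon whose own name contained an underscore would be truncated.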
Before we go further, let's look at the source content that we're going to account for: tabs and code blocks.
The tabs callout uses a tabs icon, where the content in the primary callout doesn't do anything. It then contains child blocks which must be callouts, but their icon doesn't matter.
The code block callout uses the code icon. The main content is the filename of the example, and it should contain a child code block with the code to display, and the appropriate language selected.
With these two examples in place in Notion, you can run the script again. At this point, the output won't be any different, but you'll see updated messages printed to the console.
EXTERNAL ICON NOT SUPPORTED: tabs
EXTERNAL ICON NOT SUPPORTED: code
Let's start with the tabs. First, we'll add a new function called resolveTabs that is responsible for mapping over child blocks within the tabs callout and creating a tab component for each child that it finds. The individual tabs will have label and text properties.
And then we can add the tabs property to the calloutMap.
function resolveTabs(tabsBlocks) {
  return tabsBlocks
    .map((tab) => {
      if (tab.type !== "callout") return null;
      return {
        label: tab.callout.rich_text
          .map(({ plain_text }) => plain_text)
          .join(""),
        text: tab.children[0].paragraph.rich_text
          .map(({ plain_text }) => plain_text)
          .join(""),
      };
    })
    .filter(Boolean);
}

const calloutMap = {
  tabs: (block) => {
    return {
      component: "Tabs",
      tabs: resolveTabs(block.children),
    };
  },
};
Run the script again and we should see transformed content for the tab component.
[
{
"id": "...",
"title": "Page with Components",
"urlPath": "page-with-components",
"blocks": [
{
"component": "Tabs",
"tabs": [
{
"label": "Tab #1",
"text": "This is content within the tabs."
},
{
"label": "Tab #2",
"text": "Qui ex tempor quis aliquip do veniam ea reprehenderit irure dolore duis. Ut cillum sint enim dolor eiusmod nulla reprehenderit proident tempor voluptate in. Pariatur officia anim et ea. Esse amet cupidatat cillum dolor laborum voluptate pariatur Lorem consequat amet."
}
]
}
// ...
]
}
]
Last, let's handle the code block example. Here we'll do all the work inside the code property in the calloutMap. The code block component gets component, filename, code, and language properties.
const calloutMap = {
  tabs: (block) => {
    return {
      component: "Tabs",
      tabs: resolveTabs(block.children),
    };
  },
  code: (block) => {
    // Optional chaining guards against a callout that has no children.
    const codeBlock = block.children?.[0]?.code;
    if (!codeBlock) {
      console.log("Code component must be first child of code callout");
      return null;
    }
    const code = codeBlock.rich_text
      .map(({ plain_text }) => plain_text)
      .join("");
    // YOU MAY NEED A MAPPER FOR THIS
    const language = codeBlock.language;
    const highlightedCode = hljs.highlight(code, { language }).value;
    return {
      component: "CodeBlock",
      filename: block.callout.rich_text
        .map(({ plain_text }) => plain_text)
        .join(""),
      code: highlightedCode,
      language,
    };
  },
};
Notice that we make use of highlight.js to process syntax highlighting on the code block. Notion doesn't give us this in the API response. It's wise to do this work during this step so that we ship less JavaScript to the client. This way, all we need on the front end is highlight.js's CSS file, and none of its JavaScript.
As the comment notes above, the language values coming from Notion do not have one-to-one parity with highlight.js's expected language values. So you may need to build an additional mapper, which I did not do.
The output will look something like this:
[
  {
    "id": "...",
    "title": "Page with Components",
    "urlPath": "page-with-components",
    "blocks": [
      {
        "component": "CodeBlock",
        "filename": "hello-world.js",
        "code": "<span class=\"hljs-keyword\">function</span> <span class=\"hljs-title function_\">helloWorld</span>(<span class=\"hljs-params\"></span>) {\n <span class=\"hljs-variable language_\">console</span>.<span class=\"hljs-title function_\">log</span>(<span class=\"hljs-string\">&quot;Hello World&quot;</span>);\n}",
        "language": "javascript"
      }
    ]
  }
]
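For the language mismatch mentioned above, a small lookup with a fallback is one way to approach the mapper. This is a sketch rather than something from the example project: languageAliases and resolveLanguage are hypothetical names, the three alias entries are examples that you'd want to check against your own export, and the support check is passed in as a function so that the snippet runs without highlight.js installed.

```javascript
// Hypothetical lookup from Notion language labels to highlight.js names.
// The entries are examples; extend the table for the languages you use.
const languageAliases = {
  "plain text": "plaintext",
  "c++": "cpp",
  "c#": "csharp",
};

// `isSupported` is injected so the sketch runs without highlight.js.
// In the real script you could pass (name) => Boolean(hljs.getLanguage(name)).
function resolveLanguage(notionLanguage, isSupported) {
  const candidate = languageAliases[notionLanguage] ?? notionLanguage;
  return isSupported(candidate) ? candidate : "plaintext";
}

// Stand-in for highlight.js's language registry, for demonstration only.
const known = new Set(["javascript", "plaintext", "cpp", "csharp"]);
const isKnown = (name) => known.has(name);

console.log(resolveLanguage("javascript", isKnown)); // → javascript
console.log(resolveLanguage("c++", isKnown)); // → cpp
console.log(resolveLanguage("mermaid", isKnown)); // → plaintext
```

Falling back to plaintext means an unrecognized language still renders as an unstyled code block, rather than stopping the whole transform when the highlighter rejects the name.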
Here is an example project with the full script.
As you can probably see, this is a fairly big undertaking if you want to take something like this to a production application. And there are parts of it that are certainly brittle.
However, it's extremely powerful, and almost endlessly customizable to your needs.
We'll stop here, as we've already gone very deep, and it's enough to show the foundation that you would need to apply this to your site.
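To give a feel for the consuming side, here's a rough, framework-free sketch of how a site might turn the blocks in content.json into markup. Everything in it is an assumption for illustration: the components registry, renderBlocks, escapeHtml, and the HTML structure are all hypothetical, and a real site would more likely map each component value to a component in its framework of choice.

```javascript
// Minimal HTML escaping for text that came from Notion as plain text.
const escapeHtml = (value) =>
  String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

// Hypothetical registry: each key matches a `component` value in content.json.
const components = {
  Paragraph: ({ text }) => `<p>${escapeHtml(text)}</p>`,
  // `code` is already highlighted HTML from the transform step,
  // so it is inserted as-is instead of being escaped again.
  CodeBlock: ({ filename, code, language }) =>
    `<figure><figcaption>${escapeHtml(filename)}</figcaption>` +
    `<pre><code class="hljs language-${escapeHtml(language)}">${code}</code></pre></figure>`,
};

// Render every block that has a registered component; skip the rest.
function renderBlocks(blocks) {
  return blocks
    .map((block) => components[block.component]?.(block) ?? "")
    .filter(Boolean)
    .join("\n");
}

// Illustrative page data in the same shape as the transformed output.
const page = {
  blocks: [
    { component: "Paragraph", text: "Fish & chips" },
    { component: "Unknown" },
  ],
};

const html = renderBlocks(page.blocks);
console.log(html); // → <p>Fish &amp; chips</p>
```

The registry mirrors blockMap on the transform side: supporting a new component means adding one entry, and unknown components are skipped instead of breaking the page.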
From here, you can take this approach and apply your opinions to build your own system. I'd love to hear from you. What did you build? How does this process hold up when scaling beyond a proof-of-concept to a real-world website?