Context
This section of my website is where I jot down notes that will be useful to my future self and hopefully to others.
For years my notes have been organized into a set of folders that serve as categories that I thought were relevant.
After 5 years and >120 notes, I realized that the categories were somewhat contrived (e.g. cli, linux), and that it was hard to locate a post and the assets were now managed consistently.
Hence, I decided to organize my notes into folders named after the date of the post.
More specifically, I opted for a 3 levels architecture:
- one folder per year;
- for each year one sub-folder per month (when a note has been written);
- one folder per post named after the date of the post plus the 3 first words of the title.
The third level includes index.md
, the the content of the note and optionally an assets
folder for all asset files.
Basically, for a given post, here is the desired organization:
1
2
3
4
5
6
7
8
| notes
└── year
└── month
└── year-month-day_3_first_words
└── assets
└── asset1.ext
└── asset2.ext
└── index.md
|
For a tangible example, let’s take the post ‘My first workflow with GitHub Actions’.
Previously, that the tree file was as follows:
1
2
3
4
5
6
| notes
└── ci
└── githubactions
├── index.md
├── ghactions00.png
└── ghactions01.png
|
Below is the desired tree file:
1
2
3
4
5
6
7
8
| notes
└── 2019
└── 11
└── 2019-11-20_my_first_workflow
├── index.md
└── assets
├── ghactions00.png
└── ghactions01.png
|
Doing this by hand for a couple of notes is OK.
Doing that for 124 notes by hand is pretty boring and error prone.
Thus, I decided to do that with R and here I explain how to proceed.
Generating the new file tree
The first step is to find all note files.
Those file will all end up being renamed index.md
and stored in a different location.
1
2
3
4
5
6
7
8
9
10
| # I create this function knowing I re-use the code later
get_file_list <- function(dir) {
# my notes are all written in markdown, the extension is .md
list.files(dir, pattern = "^[A-Za-z].+\\.md$", recursive = TRUE, full.names = TRUE)
}
lsf <- get_file_list("content/notes")
# [1] "content/notes/apt/lackofbootspace.md"
# [2] "content/notes/atom/avoidopeninglargefile.md"
# ...
# [128] "content/notes/webhosting/caneghpages/index.md"
|
⚠️ That’s more than the 124 notes listed… :thinking_face: Old drafts?
Let’s check, anyway I need to read metadata to create the new file tree.
To do so I call the function yaml_front_matter()
from the rmarkdown
package.
1
2
3
4
5
6
7
8
9
10
| library(rmarkdown)
yfm <- lapply(lsf, yaml_front_matter)
tbl <- yfm |>
lapply(\(x) data.frame(
title = x$title,
date = x$date, # I need this to create the new file tree
draft = ifelse(is.null(x$draft), FALSE, x$draft) # if TRUE this is a draft
)) |>
do.call(what = rbind)
tbl$path <- lsf
|
That gives the answer to my question: currently 4 unfinished notes are located in the section notes
of my website.
1
2
3
4
5
6
| R> tbl[tbl$draft, ]
title date draft path
10 Set up your own webradio 2021-05-02 TRUE content/notes/audio/setupawebradio.md
46 How to get all web fonts from one format [SOLVED] 2019-11-25 TRUE content/notes/font/webfonts.md
53 DNS, IP and Router 2021-06-30 TRUE content/notes/hardware/aboutIp.md
124 Paradox of enrichment with Sage 2019-07-20 TRUE content/notes/sage/poe_sage.md
|
I’ll see what I do with those later.
For now, I move them somewhere else.
Let’s re-execute the code.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
| lsf <- get_file_list("content/notes")
yfm <- lapply(lsf, \(x) c(yaml_front_matter(x), old_path = x))
yfm
# [[1]]
# [[1]]$title
# [1] "Lack of boot space"
#
# [[1]]$date
# [1] "2023-03-23"
#
# [[1]]$tags
# [1] "Ubuntu" "apt" "boot"
#
# [[1]]$old_path
# [1] "content/notes/apt/lackofbootspace.md"
...
# [[124]]
# [[124]]$title
# [1] "GitHub pages & custom domain name? Don't forget CNAME! `[SOLVED]`"
#
# [[124]]$date
# [1] "2020-02-04"
#
# [[124]]$tags
# [1] "GitHub pages" "CNAME"
#
# [[124]]$old_path
# [1] "content/notes/webhosting/caneghpages/index.md"
|
I have all metadata in a list, including dates and titles.
Note that with this, I can even analyze the tags I use!
What is missing is a tiny function to keep the 3 first words of the title.
1
2
3
4
5
6
7
8
9
| keep_3_words <- function(x) {
tmp <- strsplit(x, "[[:punct:] ]")[[1]] # split words/ remove punctuation
out <- tmp[grepl("^[[:alnum:]]+$", tmp)] # keep only alphanumeric symbols
out[seq(1, min(length(out), 3))] |> # max 3 words
tolower() |>
paste(collapse = "_")
}
# this add the new folder name for the previous lins
yfm2 <- lapply(yfm, \(x) c(x, new_folder = paste(x$date, keep_3_words(x$title), sep = "_")))
|
Next, I create the new directories and move files to their new location.
For simplicity, I use lubridate
to extract years and months.
Also, I use a safety net by generating a second folder notes, notes2
, that will replace notes
when everything is done.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| library(lubridate)
new_dir <- "content/notes2"
dir.create(new_dir, showWarnings = FALSE)
res <- lapply(
yfm2,
\(x) {
new_folder <- file.path(
new_dir,
year(x$date),
sprintf("%02d", month(x$date)),
x$new_folder
) # that's the new path
dir.create(new_folder, recursive = TRUE, showWarnings = FALSE)
new_path <- file.path(
new_folder,
"index.md"
)
file.copy(x$old_path, new_path) # add index.md
invisible(TRUE)
}
)
|
Migrating asset files
This is the trickiest part.
To migrate the asset files, I need to locate them and determine the posts they were used in.
As I did not use assets in a consistent fashion, I need to go through all note contents and look for the assets name.
I need to record this information to replace the links and then migrate the files.
First, let’s locate the file(s), note that for some of the file manipulation I use fs
.
1
2
3
4
5
6
| lsa <- list.files("content/notes", recursive = TRUE, full.names = TRUE)
lss <- lsa[!grepl(pattern = "\\.md$", lsa)]
library(fs)
lss |> fs::path_ext() |> table()
# bib png R webm webp yaml
# 3 44 1 8 1 1
|
58 asset files, noted!
Let’s read all files in notes2
and then match possible links using the filenames above.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
| lsb <- lss |> fs::path_file()
ls_res <- list()
for (j in seq(lsb)) {
cat(j, "/", length(lsb), " \r")
# pattern to match (the entire path)
pat <- paste0("\\.?[[:alnum:]]*/[/[:alnum:]]*", lsb[j]) # potential matches; does not match filename only, minimum ./filename; does not work for with ../
fls <- get_file_list("content/notes2")
res <- fls |>
lapply(\(x) {
tmp <- stringr::str_extract(readLines(x), pat)
tmp[!is.na(tmp)]
}) # extract matches
idx <- lapply(res, \(x) length(x) > 0) |> unlist() |> which()
ls_res[[j]] <- data.frame(
file = fls[idx],
asset_path = lss[j],
old_link = res |> unlist(),
new_link = file.path("./assets", lsb[j])
)
}
tb_assets <- ls_res |>
do.call(what = rbind) |>
dplyr::distinct() # I only need one instance given the search and replace strategy that I use below
tb_assets
# file asset_path old_link new_link
# 1 content/notes2/2018/09/2018-09-29_prompt_before_opening/index.md content/notes/atom/assets/limitfilesize.webm /notes/atom/assets/limitfilesize.webm ./assets/limitfilesize.webm
# 2 content/notes2/2018/09/2018-09-11_more_than_one/index.md content/notes/atom/assets/locales.png /notes/atom/assets/locales.png ./assets/locales.png
# 3 content/notes2/2018/09/2018-09-19_i_love_the/index.md content/notes/atom/assets/multicursors.webm /notes/atom/assets/multicursors.webm ./assets/multicursors.webm
...
# 62 content/notes2/2021/01/2021-01-29_time_to_talk/index.md content/notes/webhosting/aboutthiswebsite/look_012021.png ./look_012021.png ./assets/look_012021.png
# 63 content/notes2/2021/01/2021-01-29_time_to_talk/index.md content/notes/webhosting/aboutthiswebsite/shared_on_slack.png ./shared_on_slack.png ./assets/shared_on_slack.png
# 64 content/notes2/2020/02/2020-02-04_github_pages_custom/index.md content/notes/webhosting/caneghpages/cname.png ./cname.png ./assets/cname.png
|
Below is the code that I use to perform the search and replace
1
2
3
4
5
6
7
8
9
10
11
| for (i in seq_len(nrow(tb_assets))) {
cat(i, "/", nrow(tb_assets), " \r")
# create folder and add asset file
new_folder <- file.path(tb_assets$file[i] |> fs::path_dir(), "assets")
if (!dir.exists(new_folder)) dir.create(new_folder)
file.copy(tb_assets$asset_path[i], new_folder)
# replace pattern with gsub
readLines(tb_assets$file[i]) |>
gsub(pattern = tb_assets$old_link[i], replacement = tb_assets$new_link[i]) |>
writeLines(con = tb_assets$file[i])
}
|
That worked pretty well.
The only final tweak was to modify cross references among notes.
Once completed, notes
was deleted and notes2
became notes
.
Even though I deleted and created files, git
was clever enough to figure out which files where renamed.
This took some time but I am happy with the new organization.
Plus I now have code to analyze the content of my notes.
All the work was completed in commit dae779a
.