Pagefind: Searching in Static Sites
The original post is at eklausmeier.goip.de/blog/2023/10-23-pagefind-searching-in-static-sites.
Pagefind is a JavaScript library that you add to your static site to get full search functionality. Pagefind has the following advantages over other JavaScript libraries:
- Easy to install, no JavaScript dependency hell.
- Easy to add: only the CSS and two lines with <script> tags.
- Creating the index is easy and reasonably quick.
Pagefind was mainly written by Liam Bigelow from New Zealand and is promoted by CloudCannon. It is open source. It is written in Rust and JavaScript.
Language | kLOC | #files |
---|---|---|
Rust | 36 | 63 |
JavaScript | 2 | 20 |
1. One-time installation. Installing Pagefind is just a matter of downloading a single binary from GitHub: select the proper binary for Apple, Linux, or Windows. In my case I used pagefind-v1.0.3-x86_64-unknown-linux-musl.tar.gz for Arch Linux. Unpack with
tar zxf pagefind-v1.0.3-x86_64-unknown-linux-musl.tar.gz
Unpacking the 10 MB archive creates a 22 MB executable, which is statically linked and therefore has no dependencies. That's it.
2. Add CSS and JavaScript to template. Add the CSS and JavaScript references below to your template file, outside of <body>:
<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<script>
window.addEventListener('DOMContentLoaded', (event) => {
new PagefindUI({ element: "#search", showSubResults: true });
});
</script>
Then add the actual search dialog to your template inside <body>, in my case to top-layout.php:
<div id="search"></div>
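If some pages come from a template without the search container, the initialisation above can be guarded. The following is a minimal sketch, not part of Pagefind itself: mountSearch is a hypothetical helper name, and the document and the constructor are passed in as parameters only so that the function can be tried outside a browser. The options handed to PagefindUI are the same two as in the snippet above.

```javascript
// Hypothetical helper: create the Pagefind UI only when the page is ready for it.
// doc            - the page's document object
// PagefindUICtor - the PagefindUI constructor provided by pagefind-ui.js
function mountSearch(doc, PagefindUICtor, selector = "#search") {
  // Skip pages whose template has no search container.
  if (!doc.querySelector(selector)) return null;
  // Skip if pagefind-ui.js did not load, e.g. the index is not deployed yet.
  if (typeof PagefindUICtor !== "function") return null;
  return new PagefindUICtor({ element: selector, showSubResults: true });
}
```

In the template this would be called from the DOMContentLoaded handler in place of the direct new PagefindUI(...) call, for example as mountSearch(document, typeof PagefindUI === "function" ? PagefindUI : undefined).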
3. Creating index files. This step must be repeated whenever you add new content or rename files. It does not need to be repeated whenever you regenerate your static HTML files, although you can do just that if you want to play it safe. Index creation uses the pagefind executable mentioned above. Running this command shows all the options:
$ pagefind -h
Implement search on any static website.
Usage: pagefind [OPTIONS]
Options:
-s, --site <SITE>
The location of your built static website
--output-subdir <OUTPUT_SUBDIR>
Where to output the search bundle, relative to the processed site
--output-path <OUTPUT_PATH>
Where to output the search bundle, relative to the working directory of the command
--root-selector <ROOT_SELECTOR>
The element Pagefind should treat as the root of the document. Usually you will want to use the data-pagefind-body attribute instead.
--exclude-selectors <EXCLUDE_SELECTORS>
Custom selectors that Pagefind should ignore when indexing. Usually you will want to use the data-pagefind-ignore attribute instead.
--glob <GLOB>
The file glob Pagefind uses to find HTML files. Defaults to "**/*.{html}"
--force-language <FORCE_LANGUAGE>
Ignore any detected languages and index the whole site as a single language. Expects an ISO 639-1 code.
--serve
Serve the source directory after creating the search index
-v, --verbose
Print verbose logging while indexing the site. Does not impact the web-facing search.
-l, --logfile <LOGFILE>
Path to a logfile to write to. Will replace the file on each run
-k, --keep-index-url
Keep "index.html" at the end of search result paths. Defaults to false, stripping "index.html".
-h, --help
Print help
-V, --version
Print version
This blog uses Simplified Saaze. In the case of Simplified Saaze I generate static files like this:
php saaze -mortb /tmp/build
This builds all static files in /tmp/build, which happens to be on a RAM disk on Arch Linux. Then change to this directory and issue
$ time pagefind -s . --exclude-selectors aside --exclude-selectors footer --force-language=en
Running Pagefind v1.0.3
Running from: "/tmp/build"
Source: ""
Output: "pagefind"
[Walking source directory]
Found 555 files matching **/*.{html}
[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.
[Reading languages]
Discovered 1 language: en
[Building search indexes]
Total:
Indexed 1 language
Indexed 555 pages
Indexed 33129 words
Indexed 0 filters
Indexed 0 sorts
Finished in 1.618 seconds
real 1.65s
user 1.49s
sys 0
swapped 0
total space 0
The command pagefind -s . --force-language=en would have been enough in many cases. In my special case I want to exclude the content between <aside> and </aside>, and similarly between <footer> and </footer>.
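When the indexing step is driven from a build script, the argument list can be assembled from a small options object. The sketch below is only an illustration: buildPagefindArgs is a hypothetical helper name, and it uses just the flags that appear in the help output and in the command above.

```javascript
// Hypothetical helper: assemble the argument list for the pagefind executable.
// Only flags shown in the `pagefind -h` output above are used.
function buildPagefindArgs({ site = ".", excludeSelectors = [], forceLanguage = null } = {}) {
  const args = ["-s", site];
  // --exclude-selectors is repeated once per selector, as in the command above.
  for (const selector of excludeSelectors) {
    args.push("--exclude-selectors", selector);
  }
  // Only force a language when one is given.
  if (forceLanguage) {
    args.push(`--force-language=${forceLanguage}`);
  }
  return args;
}

const cliArgs = buildPagefindArgs({
  site: ".",
  excludeSelectors: ["aside", "footer"],
  forceLanguage: "en",
});
console.log(["pagefind", ...cliArgs].join(" "));
// pagefind -s . --exclude-selectors aside --exclude-selectors footer --force-language=en
```

In a Node build script the resulting array could then be handed to child_process.execFileSync("pagefind", cliArgs, { cwd: "/tmp/build" }), assuming pagefind is on the PATH.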
The option --force-language=en is required in my case, as I have both English and German posts.
Without this option Pagefind would create two distinct indexes, and a search would then cover only one language, not the other.
By forcing the language, Pagefind puts everything into a single index.
See Multilingual search.
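The effect of forcing the language can be pictured with a toy model. This is a simplified sketch, not Pagefind's actual code: it assumes each page carries a language taken from its <html lang="..."> attribute, the page list is hypothetical, and planIndexes is an illustrative name.

```javascript
// Toy model: count how many pages would end up in each language index.
// pages: array of { url, lang }, where lang mimics the <html lang="..."> value.
// forceLanguage: mimics --force-language; when set, page languages are ignored.
function planIndexes(pages, forceLanguage = null) {
  const counts = {};
  for (const page of pages) {
    // In this model, pages without a language are grouped under "unknown".
    const lang = forceLanguage || page.lang || "unknown";
    counts[lang] = (counts[lang] || 0) + 1;
  }
  return counts;
}

// Hypothetical pages, for illustration only.
const samplePages = [
  { url: "/blog/a/", lang: "en" },
  { url: "/blog/b/", lang: "de" },
  { url: "/blog/c/", lang: "en" },
];

console.log(planIndexes(samplePages));       // { en: 2, de: 1 }
console.log(planIndexes(samplePages, "en")); // { en: 3 }
```

Without forcing, the German page sits in its own index and is invisible to a search running against the English one; with the language forced, all three pages share one index.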
Indexing creates a directory called pagefind. Just copy this directory to your web server during deployment. This directory looks something like this:
pagefind
├── fragment
│ ├── en_0933ef4.pf_fragment
│ ├── en_100be25.pf_fragment
│ ├── en_10b07a1.pf_fragment
│ ├── . . .
│ └── en_fef8cdb.pf_fragment
├── index
│ ├── en_22c87b9.pf_index
│ ├── en_26afa46.pf_index
│ ├── en_2a80efb.pf_index
│ ├── . . .
│ └── en_fde0a3b.pf_index
├── pagefind.en_d6828bd6ef.pf_meta
├── pagefind-entry.json
├── pagefind.js
├── pagefind-modular-ui.css
├── pagefind-modular-ui.js
├── pagefind-ui.css
├── pagefind-ui.js
├── wasm.en.pagefind
└── wasm.unknown.pagefind
3 directories, 596 files
The files in index are usually around 40 KB each; those in fragment are usually around 1-10 KB each. The JavaScript totals 100 KB, and the CSS is less than 20 KB.
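As a rough cross-check, the numbers above can be combined into an estimate of the on-disk size. The sketch assumes one fragment file per indexed page, which the listing does not state explicitly, and that the listing and the indexing run belong to the same build; the results are therefore only approximate.

```javascript
// Figures taken from the indexing run and the directory listing above.
const totalFiles = 596;    // "3 directories, 596 files"
const topLevelFiles = 9;   // files listed directly under pagefind/
const indexedPages = 555;  // "Indexed 555 pages"

// Assumption: one fragment file per indexed page.
const fragmentFiles = indexedPages;
const indexFiles = totalFiles - topLevelFiles - fragmentFiles;

// Rough sizes in KB, using the per-file sizes quoted above.
const indexKB = indexFiles * 40;
const fragmentKBMin = fragmentFiles * 1;
const fragmentKBMax = fragmentFiles * 10;

console.log(indexFiles);                   // 32
console.log(indexKB);                      // 1280
console.log(fragmentKBMin, fragmentKBMax); // 555 5550
```

Under these assumptions the index chunks add up to a bit more than 1 MB on disk, and the fragments to somewhere between 0.5 MB and 5.5 MB.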
4. Network traffic. Pagefind was designed specifically to load only small amounts of data over the network. This can be seen from the diagram below.
This makes Pagefind particularly attractive in terms of performance.
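The same idea is visible in the JavaScript API that pagefind.js exposes: search() returns a list of result handles, and the content of a result is fetched only when its data() method is called. The sketch below assumes that API shape; firstResults is a hypothetical helper, and the pagefind object is passed in as a parameter so that the function can be tried with a stub.

```javascript
// Hypothetical helper: run a search and load only the first `limit` results.
// pagefind: the module object, obtained in the browser via
//           await import("/pagefind/pagefind.js")
async function firstResults(pagefind, term, limit = 5) {
  const search = await pagefind.search(term);
  // Each entry in search.results is a lightweight handle; calling data()
  // is what triggers loading that page's fragment.
  const handles = search.results.slice(0, limit);
  const loaded = await Promise.all(handles.map((handle) => handle.data()));
  // Keep only the fields needed for a simple result list.
  return loaded.map((page) => ({ url: page.url, excerpt: page.excerpt }));
}
```

Because only the first few handles are resolved, the number of fragments fetched is bounded by limit, no matter how many pages match.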
5. Using Pagefind as a user. Using Pagefind as a user is intuitive and needs no further explanation. This blog now has Pagefind integrated into every page. Just type a word you want to search for, and results pop up almost instantly. This instant reaction is no surprise, as the actual searching is done in the browser.
There is one slight limitation of Pagefind: currently you cannot search for word groups. For example, consider Shakespeare's Hamlet:
To be, or not to be, that is the question
Searching for to or be would likely give you lots of results, but probably not the ones you are looking for. Clearly not a problem for this blog, as I do not have lyrics here.
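Why word-level matching struggles with such a phrase can be shown with a toy inverted index. This is a deliberately simplified model, not Pagefind's real index format; the documents are made up, and buildIndex and searchAllWords are illustrative names.

```javascript
// Toy word-level index: maps each lowercase word to the set of documents containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
    }
  }
  return index;
}

// Return ids of documents that contain every query word, in any order and position.
function searchAllWords(index, query) {
  const words = query.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return [];
  let hits = null;
  for (const word of words) {
    const ids = index.get(word) || new Set();
    hits = hits === null ? new Set(ids) : new Set([...hits].filter((id) => ids.has(id)));
  }
  return [...hits].sort();
}

// Hypothetical documents, for illustration only.
const docs = {
  hamlet: "To be, or not to be, that is the question",
  recipe: "The cake needs to cool before it can be served",
  note: "Static sites are fast",
};

console.log(searchAllWords(buildIndex(docs), "to be")); // [ 'hamlet', 'recipe' ]
```

Since the index records only which words occur in a document, not their positions, the recipe matches the query just as well as the quotation does.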