NeetCode Newsletter
Posts
How Notion uses SQLite for Caching

How Notion uses SQLite for Caching

Notion uses SQLite for client-side caching. We'll talk about why they implemented this and hurdles they needed to solve

September 07, 2024

How Notion uses SQLite for Caching

Notion is a popular productivity app. It’s used it for taking notes, building spreadsheets, managing calendars, tracking timelines and more. Think Google Docs, but much more versatile.

Notion Branding

Engineers at Notion must ensure that every user has a great experience regardless of whether they’re using Notion through the website, windows application, iOS app, macOS app etc.

In 2021, Notion started using a SQLite database for caching data in the windows and macOS apps. This change made initial page loads 50% faster and also sped up navigating between pages by 50%.

Objective: In 2021 Notion wanted to use caching to bring performance gains for users who use Notion through their browser. For example, caching API responses and page edits to the local file system. This means navigating between pages wouldn’t always result in an API request to Notion’s backend.

Notion wrote a terrific blog post delving into how this change was implemented and why it was challenging. We will summarize the main technical details below.

Intro to SQLite

SQLite is an open source, embedded relational database. It supports ACID transactions, full-text search, geospatial queries and much more.

The key feature of SQLite is how it’s lightweight and “serverless”, but in a different sense. It doesn’t require a separate process for the server. Instead, a single library contains the entire database system and it integrates directly into your application code.

Comparing SQLite vs. traditional client-server databases.

If you’re building a Python app then you can install and import the sqlite3 library and use it to create tables and make SQL.

The limitation is that all the data must be stored locally in the same environment that the application code is running.

SQLite is a great option when you’re writing an app where the user doesn’t have a stable internet connection. It lets you easily store the user’s data on their device and read/write the data with SQL. It also supports the most common programming languages.

It’s used extensively in areas like:

Embedded Systems - Devices like industrial controllers, automotive infotainment systems, home automation systems, etc. can often have limited internet connectivity while requiring high reliability and low latency. SQLite provides an easy way to store data on the device without taking up too much RAM/CPU.
Desktop & Mobile Apps - If you’re building apps for macOS, windows, iOS, Android, etc. then you can import SQLite into your application code and use it for writing data locally to your user’s disk. It’s useful if your users need to use the app without a network connection or if you want to cache data locally.
Web Development - With SQLite compiled to WebAssembly, you can also use SQLite with your web applications (you just import it in your JavaScript code). This is useful for offline functionality, caching data locally and more. In the rest of the article, we’ll talk about how Notion used WASM SQLite for caching data in the browser.

How Notion uses SQLite in the Browser

Notion uses the WebAssembly implementation of SQLite to implement caching in the browser.

With WebAssembly, you can take programming languages other than JavaScript and compile them to a low-level assembly-like language (called WebAssembly or WASM) and run it in the browser. Modern browsers can all run WASM with near-native performance.

Under the hood, the WASM SQLite library uses the browser’s File System API to write the state of the SQLite database to the user’s hard drive. More specifically, the library uses the Origin Private File System (OPFS) endpoint in the File System API.

Here’s how the caching works at a high level

Web Workers - A web worker is a feature browsers provide that let you run JavaScript in a background thread without affecting the browser’s UI thread. This lets you run computationally intensive tasks on your website locally without affecting the user experience. Each Notion tab that you open in your browser has its own dedicated web worker for writing to SQLite.

For example code of web workers please see below:

// Create a new Worker
const worker = new Worker('worker.js');

// Send a message to the Worker
worker.postMessage('Hello from main script');

// Receive messages from the Worker
worker.onmessage = function(event) {
  console.log('Received from worker:', event.data);
};

// Receive messages from the main script
self.onmessage = function(event) {
  console.log('Received in worker:', event.data);
  
  // Send a message back to the main script
  self.postMessage('Hello from worker');
};

Web Locks to prevent Multiple Writers - The “active tab” (whichever the user opened last) is the only tab that is permitted to use its web worker for writing to SQLite. Notion manages this by using the Web Locks API and a separate web worker that interacts with all the open Notion browser tabs, referred to as SharedWorker.

SharedWorker Passes Queries to the Active Tab - SharedWorker will keep track of which tab is the “active tab”. Whenever SharedWorker gets a database write from a non-active Notion tab, SharedWorker will redirect the query to the active tab’s web worker.

Diagram from Notion’s Engineering Blog

Issues Notion Faced

Database Corruption Issues

Prior to using the SharedWorker-powered approach (with locks), Notion faced issues around data corruption with the SQLite database. Some users were seeing the wrong data on a page with comments attributed to the wrong co-worker or incorrect page previews.

Why? Concurrency issues. In theory, SQLite is ACID compliant. That means concurrent transactions should be isolated. But it turns out, that the browser’s OPFS API, which was used by the SQLite processes, did not have the same support.

The OPFS API (for reading/writing to disk) didn’t have good concurrency handling so that was causing corruption issues. This issue was resolved by switching to the SharedWorker-powered architecture we described above (only the active Notion tab can write to disk).

Notion Blog - The architecture of WASM Sqlite at the point they observed corruption issues.

Slow Disk Reads

The Notion team started to see performance regressions in the slowest devices (the 95% percentile times for page Navigation were getting worse). After investigating, they found it was because older Android phones were reading from disk extremely slowly.

Surprisingly, in some cases it was faster to read from Notion’s backend rather than from a user’s disk. Can you think of a quick fix? Simply have the browser “race” the two async requests (disk cache with SQLite vs. network request with Notion’s API). This meant that the website would automatically use network requests if reading from disk was too slow.

Notion sent requests to both their backend and the user device’s local disk cache.

Not Loading SQLite Asynchronously

Initially, the Notion team wasn’t loading WASM SQLite asynchronously. This meant the download/processing was blocking the page load process and slowing down the initial page load. After loading WASM SQLite asynchronously, they saw faster page load times but it meant that the initial page data wouldn’t be loaded from SQLite (it would have to be a network request). However, this trade-off still resulted in a lower page load time for users.