Written by Mordecai
on May 24, 2026

A sketch of PersonalDB

PersonalDB¹ is a little concept I’ve been throwing around in my head for supporting some things around the endless text files people (or at least, me) have lying around for notes, cheat sheets, references, plans, etc. It’s a thing I have kinda started building, but I need to come to grips with what exactly it should be, hence this article.

In essence, the origin of the problem is that “a file” does not really lend itself to a useful taxonomy for personal writing. You can make it so, by using it as a vague categorization system, but you will inevitably run into the ruin of all rigid categorizations, the edge cases that are two kinds of something. For an illustrative example, I have two files, cooking.org, which houses my cooking recipes, and baking.org, which houses my baking recipes. Smashed cucumber salad? Clearly cooking. Cake? Clearly baking. Soda bread? Savory, but baked, so clearly baking. Now, naan? It’s bread, and all of the other bread is with the baking, but it isn’t baked but instead griddled, does that make it cooking? It’s a question I’ve answered by just adding the recipes to both files to avoid the frustration of opening a file and then having to backtrack after searching, but that now means drift if it were to change.²

The solution would clearly be some sort of tagging system rather than a taxonomy. Naan gets both tags, both searches find it, grand. This would be solvable with org-mode, which supports tagging. But then I’m forced into re-doing all the files and the structure, and eh… Tagging systems outside of things like org-mode are just kind of… bad. They’re usually just kind of barely functioning, really old, or bring way more machinery to the table than I want to for purposes of organizing some text files.

As I was thinking about this, I remembered columnar databases, and how they made additional columns almost free, as well as entity component systems (ECS) that kinda do similar things. Then I was thinking about how this is personal, one-human scale infrastructure, and how that lets me do some things that would never fly in many-humans infrastructure.

The Idea Behind PersonalDB

The core idea is abstracting over “a text file”. There is an entity, or a row, with an ID. An entity can have any number of columns, which are key/value pairs, but the default value is null. This makes any entity, if we squint a bit, a sparse JSON object, when trimmed for null values.

Suppose now that we take all of our recipes of some variety, and turn them into entities that looks like this:

{
    "text": "Delicious, Buttery Naan\n // the text key contains the entirety of the original text file.",
    "cooking_recipe": true,
    "baking recipe": true
}

Given a big list of all entities, finding recipes is now very easy. Attaching metadata is now very easy. I could include a column of contains_gluten at no cost and filter against it when reviewing recipes for what to make if I have a guest with celiac disease.

So far, so “why not just include these in the file and use grep?”. To me, the ‘big trick’ of this is this: This same DB could house all my documents, all my text files, all ad-hoc structured entities I come up with for various reasons, with no loss of functionality, and the same underlying model, and the same tools. Blog posts in this blog could be entities in the same DB, with the recipe column values set to null and a blog_post column being set to true, and it would not impact anything at all, but now I no longer really have to deal with the file system taxonomy for something that to me is fundamentally the same type of thing, a piece of personal or internal writing³.

Because this is also a very simple data structure, it would also allow me to use it for ad-hoc storage of personal tooling, having access to the same abstractions and the same data. Magic: The Gathering deck lists could be a column tag, bibliography citations could be a column tag. It also enables notes on just about everything that is stored here, just by adding a column. Accessing this DB programmatically from a script or other application is trivial, and I probably will write a few libraries just for that.

Inconsistency (what if something is tagged as a M:TG decklist but also as a cooking recipe and it now has two ‘types’?) does not really matter, because there are no by default no not-me consumers of the DB. There’s just me, and I’ll see it unexpectedly, and fix it. This gets a bit more dicey with automated processing via scripts, but honestly, that doesn’t really matter all that much to me.

The Sketch Of A User Interface

The idea is that this is personal-scale software, so I can cut some corners I would never get away with in other places. I’ll use a CLI as illustration, although the exact same interface can be implemented in many ways, since it is quite simple at its core.

There are fundamentally four actions: list, get, write, and delete.

list is the search functionality. It accepts any number of predicates that run against the collection of entities, for example: list baking_recipe=true text=~naan. These predicates are fundamentally AND conjunctions, but OR is possible as well: list either(baking_recipe=true, cooking_recipe=true). The predicate DSL will have some leeway for being pleasant to use, like any or testing for non-null if no value is specified. list also returns the entity ID.

get takes an entity ID, and assembles the full entity JSON object from there. It returns a JSON object, with all non-null columns. Column order is disregarded entirely. There is also an option to pretty-print the keys, and to transform the raw JSON blob into readable text that doesn’t have escaped whitespacing and newlines.

write takes a entity ID and a JSON object, and replaces what the entity with the given entity ID with the object.⁴ If it contains a new column, a new column is created on write.

delete deletes (or alternatively, tombstones) an entity. Columns are checked for whether or not this was the last entity with such a column, and then deleted if so.

The Storage Engine Underneath

With this being a small columnar DB, the easy answer is to throw this to Parquet, and move on with life, and chances aren’t bad that I might just do that.

An alternative I’ve been thinking about that preserves the plain-text origins of this, is rooted in entity component systems, and that is to just turn each column into a separate file, with the entity ID as the key in a big object:

/T/t/personaldb tree .
├── baking_recipe.json
├── cooking_recipe.json
└── text.json

Where then each file contains a sparse ‘list’ of column values:

/T/t/personaldb jq < text.json
{
  "21": "Delicious Buttery Naan\n...",
  "121": "Smashed Cucumber Salad\n..."
}

This makes the management and search of this outside of specialized tooling possible, whereas Parquet would just be a pile of binary blobs that I can’t do anything with outside of the tooling.

A Note On Maintenance

File systems for notes get messy quick, nobody wants to do the house keeping when it’s about doing something, or makings something. Part of the reason why I like this idea so much is that it makes housekeeping very easy, because there are now no longer any folders that are just forgotten in some dusty corner. The same structure houses everything, and reading any note forces you to see all of its set columns. If a column seems old or antiquated, it can be very easily overhauled programmatically, or the column file outright deleted, and that is that.

I think I’ll spend some time in the next few days and weeks building something like this, it seems fun and rewarding for just me.

Or OmniDB, as I’ve been kicking it around in my head, but it sounds too grandiose for what it is, honestly. ↩
Obviously, this is barely a real issue. It’s however the little paper cut that made me think about this topic, and then I clearly thought a little too much about it, and here we are. ↩
See this post for what I call ‘internal writing’. ↩
Chances are I will instead create a new entity and shift the entity ID, for “undo” support, given that writes are destructive. ↩

← → Top