Beyond Kilo: Some Improvements for My Text Editor

I recently worked through the Build Your Own Text Editor tutorial, which walks through all of antirez's kilo editor, as a way to learn more about how terminals really work under the hood. In the process I ended up having so much fun that I decided to re-implement everything in Rust and expand what I had into a more capable editor.

There are obvious features that I think anyone would want to add like an undo stack and support for multiple files, but I wanted to start with improving some of the ways the editor interacted with the terminal. This post documents some improvements I've made so far, but I'm still learning about this stuff. Expect some stream of consciousness, unanswered questions, and potentially incorrect (or more likely incomplete) information.

Alternate Screen Buffer

One of the first things that I found was that kilo leaves traces of itself behind in the terminal's scrollback. More complete editors like Vim, Emacs, and Nano all return the terminal to the state it was in before they were launched.

The answer is a feature called alternate screen buffer. This StackOverflow answer describes how to switch into and out of the alternate screen buffer in a backwards compatible way by writing the escape sequences "\033[?1049h\033[2J\033[H" and "\033[2J\033[H\033[?1049l" respectively.

That same answer also provides a function for writing to stdin that still confuses me. It's unclear to me from the answer why I would want to write to stdin instead of stdout. Writing these strings to stdout seems to work fine, but I need to investigate that further.
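Here is a minimal sketch of how those two sequences could be wrapped so the exit sequence is always written, even on an early return. The AltScreen type and its enter function are names made up for this illustration, not something from kilo or the StackOverflow answer, and the sketch writes to whatever writer it is given (stdout in the editor).

```rust
use std::io::{self, Write};

// The sequences quoted above, with \033 spelled as \x1b.
const ENTER_ALT_SCREEN: &[u8] = b"\x1b[?1049h\x1b[2J\x1b[H";
const LEAVE_ALT_SCREEN: &[u8] = b"\x1b[2J\x1b[H\x1b[?1049l";

/// Hypothetical guard: enters the alternate screen when created
/// and leaves it again when dropped.
struct AltScreen<W: Write> {
    out: W,
}

impl<W: Write> AltScreen<W> {
    fn enter(mut out: W) -> io::Result<Self> {
        out.write_all(ENTER_ALT_SCREEN)?;
        out.flush()?;
        Ok(AltScreen { out })
    }
}

impl<W: Write> Drop for AltScreen<W> {
    fn drop(&mut self) {
        // drop cannot return an error, so failures are ignored here.
        let _ = self.out.write_all(LEAVE_ALT_SCREEN);
        let _ = self.out.flush();
    }
}
```

In the editor this would be used as `let _screen = AltScreen::enter(std::io::stdout())?;` near the top of main, keeping the guard alive until the editor exits.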

Resizing the Terminal

The next thing I wanted to fix was handling the terminal being resized. There is a naive solution, which is to re-check the terminal size between every key press. Kilo's input loop runs every tenth of a second, so this works relatively well, but it seems inefficient and could be more responsive.

Unfortunately the only way to be notified of a terminal resize, rather than polling for it, is handling the SIGWINCH signal. Signal handlers can be dangerous and are generally discouraged if not strictly necessary. They're essentially another thread that isn't allowed to do any locking, so ultimately the only way to safely communicate back to the main application thread is writing to a named pipe. The main thread can then read data from that pipe and trigger a resize.
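A rough sketch of that handler-to-pipe pattern follows. It makes several assumptions: it uses an anonymous pipe from pipe(2) rather than a named one to stay short, it declares the C functions by hand so it has no dependencies (the libc crate exposes the same ones), it hard-codes SIGWINCH as 28, and the function names and the 'W' tag byte are invented for the example.

```rust
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicI32, Ordering};

// Hand-written declarations of standard C functions.
// (`unsafe extern` needs Rust 1.82 or newer.)
unsafe extern "C" {
    fn pipe(fds: *mut c_int) -> c_int;
    fn signal(signum: c_int, handler: usize) -> usize;
    fn write(fd: c_int, buf: *const c_void, count: usize) -> isize;
    fn read(fd: c_int, buf: *mut c_void, count: usize) -> isize;
    fn raise(signum: c_int) -> c_int;
}

// Assumption: 28 is SIGWINCH on Linux and macOS; prefer libc::SIGWINCH.
const SIGWINCH: c_int = 28;

// Write end of the pipe, readable from the handler without a lock.
static WRITE_FD: AtomicI32 = AtomicI32::new(-1);

extern "C" fn on_sigwinch(_signum: c_int) {
    // Only async-signal-safe work: an atomic load and write(2).
    let fd = WRITE_FD.load(Ordering::Relaxed);
    let tag = b'W';
    let _ = unsafe { write(fd, &tag as *const u8 as *const c_void, 1) };
}

/// Hypothetical helper: creates the pipe, installs the handler,
/// and returns the read end for the main thread.
fn install_resize_pipe() -> std::io::Result<c_int> {
    let mut fds = [0 as c_int; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(std::io::Error::last_os_error());
    }
    WRITE_FD.store(fds[1], Ordering::Relaxed);
    unsafe { signal(SIGWINCH, on_sigwinch as usize) };
    Ok(fds[0])
}

/// Blocks until the handler has written a tag byte, then returns it.
fn wait_for_signal_byte(read_fd: c_int) -> std::io::Result<u8> {
    let mut tag = 0u8;
    let n = unsafe { read(read_fd, &mut tag as *mut u8 as *mut c_void, 1) };
    if n == 1 {
        Ok(tag)
    } else {
        Err(std::io::Error::last_os_error())
    }
}
```

Calling `raise(SIGWINCH)` is an easy way to exercise the handler without actually resizing a window. The raise declaration is only there for that kind of check.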

If this were the only signal I needed to handle, I would probably forgo signal handling altogether and opt for the naive solution. Unfortunately there is one place I actually need one.

ctrl+z Support

In a typical terminal application, hitting ctrl+z will cause the process to be suspended and sent to the background. This utilizes a feature the shell provides called job control and is typically handled automatically. Unfortunately there are two problems that prevent this from working as expected.

The first problem is raw mode's handling of key presses. In raw mode, ctrl+z is sent along to the application instead of being handled by the terminal. Normally the terminal sends the SIGTSTP signal to the application when ctrl+z is pressed. I couldn't find any information on how to make a process background itself, so I guessed and tried making it send SIGTSTP to itself using libc's kill function.

unsafe {
    // Send SIGTSTP to our own process, as the terminal would on ctrl+z.
    libc::kill(std::process::id() as libc::pid_t, libc::SIGTSTP);
}

Thankfully that worked. The process was sent to the background, and raw mode and the alternate screen buffer were exited as well.

The second problem is returning to the foreground. Running fg does resume the process, but it doesn't handle entering raw mode and the alternate screen buffer. To make matters worse, there are now multiple race conditions to contend with.

You may assume you can just make the next line of code after sending SIGTSTP enter raw mode and the alternate screen buffer, but that doesn't work. Signal handling isn't synchronous so that code can run before the process is backgrounded. The only way to ensure that code runs after returning to the foreground is to handle the SIGCONT signal.

Using the same method as the resize handler, the signal handler can write data to a named pipe, and the main thread can read that to trigger entering raw mode and the alternate screen buffer. There is another race condition though. The application thread can write to stdout before this handler is finished, printing data outside the alternate screen buffer and not in raw mode. This means some sort of locking needs to be added to prevent writing in the time between SIGTSTP being sent and raw mode and the alternate screen buffer being re-entered.
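One possible shape for that locking is a gate that all terminal output passes through. This is only a sketch under my own assumptions: GatedOutput and its methods are hypothetical names, frames drawn while suspended are simply skipped, and re-entering raw mode itself (the termios call) is left out. The lock is only ever taken from normal threads, never from inside a signal handler.

```rust
use std::io::{self, Write};
use std::sync::Mutex;

struct Inner<W> {
    out: W,
    suspended: bool,
}

/// Hypothetical wrapper: every terminal write goes through one mutex.
struct GatedOutput<W: Write> {
    inner: Mutex<Inner<W>>,
}

impl<W: Write> GatedOutput<W> {
    fn new(out: W) -> Self {
        GatedOutput { inner: Mutex::new(Inner { out, suspended: false }) }
    }

    /// Close the gate. Meant to be called just before sending SIGTSTP.
    fn suspend(&self) {
        self.inner.lock().unwrap().suspended = true;
    }

    /// Called once the SIGCONT event arrives: write the re-entry
    /// sequences first, then reopen the gate.
    fn resume(&self, reenter: &[u8]) -> io::Result<()> {
        let mut inner = self.inner.lock().unwrap();
        inner.out.write_all(reenter)?;
        inner.out.flush()?;
        inner.suspended = false;
        Ok(())
    }

    /// Writes a frame unless suspended. Returns whether it was written.
    fn draw(&self, frame: &[u8]) -> io::Result<bool> {
        let mut inner = self.inner.lock().unwrap();
        if inner.suspended {
            return Ok(false);
        }
        inner.out.write_all(frame)?;
        inner.out.flush()?;
        Ok(true)
    }
}
```

Because resume holds the lock while it writes, a draw from another thread either happens before the gate closes or after the alternate screen has been re-entered, never in between.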

Making it Feel More Responsive

The main application thread is now reading data from two signal handlers to handle resizing and returning to the foreground. With the input loop structured like Kilo's, reading that data can only happen between key presses. In the worst case, that can mean waiting the full 1/10th of a second timeout before a signal takes effect. That's not a terrible delay but it is enough to be perceptible. I decided I wanted better.

I ended up adding two threads to the application. The first thread reads from stdin then sends structured events back to the main thread via a Rust channel. The second thread reads the named pipe the signal handlers are using and sends structured events back to the main thread through the same channel. Instead of taking turns reading stdin and the signal handler pipe, the main thread now just reads the single event stream and decides what to write to stdout based on that.
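The layout described above might look roughly like this. The Event enum, the function names, and the 'W' and 'C' tag bytes written by the signal handlers are assumptions made for the sketch. Both readers are generic over Read so they can be driven by stdin and the pipe in the editor, or by plain byte slices when experimenting.

```rust
use std::io::Read;
use std::sync::mpsc::Sender;
use std::thread::{self, JoinHandle};

/// Hypothetical structured events delivered to the main thread.
#[derive(Debug, PartialEq)]
enum Event {
    Key(u8),
    Resize,
    Resume,
}

/// First thread: forwards every input byte as a key event.
fn spawn_input_reader<R>(input: R, tx: Sender<Event>) -> JoinHandle<()>
where
    R: Read + Send + 'static,
{
    thread::spawn(move || {
        for byte in input.bytes() {
            match byte {
                // Stop when the main thread has dropped its receiver.
                Ok(b) => {
                    if tx.send(Event::Key(b)).is_err() {
                        break;
                    }
                }
                Err(_) => break,
            }
        }
    })
}

/// Second thread: turns tag bytes from the signal pipe into events.
fn spawn_signal_reader<R>(pipe: R, tx: Sender<Event>) -> JoinHandle<()>
where
    R: Read + Send + 'static,
{
    thread::spawn(move || {
        for byte in pipe.bytes() {
            let event = match byte {
                Ok(b'W') => Event::Resize, // assumed tag for SIGWINCH
                Ok(b'C') => Event::Resume, // assumed tag for SIGCONT
                Ok(_) => continue,         // ignore unknown tags
                Err(_) => break,
            };
            if tx.send(event).is_err() {
                break;
            }
        }
    })
}
```

The main thread creates the channel with `std::sync::mpsc::channel()`, hands a clone of the sender to each reader, and then loops over the receiver, reacting to one event at a time.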

Input Timeout and the Escape Key

At this point my editor is starting to feel much more polished. Resizing and ctrl+z work, and when it exits nothing is left behind in the terminal's scrollback.

With the new threaded architecture I decided to revisit the way stdin was being read. I thought the timeout might not be needed anymore. Reading stdin could now block forever without affecting the main thread, but this proved to be a mistake.

The 1/10th of a second timeout was useful for allowing the editor to periodically handle other tasks while waiting on the user to press a key. It served a second function, however: detecting when the escape key was pressed.

It turns out that the only way to differentiate the user pressing the escape key from the terminal sending an escape sequence is to wait a little while to see if an escape sequence follows the escape character. If that 1/10th of a second passes with no further data we assume it was the user pressing escape.
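As an illustration, here is a small sketch of that wait-and-see logic on top of a channel of raw input bytes. The Key enum and read_key are hypothetical names, only the ESC [ F sequence (End on many terminals) is recognized, and bytes from unrecognized sequences are dropped, which a real parser would not do.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Key {
    Escape,
    End,
    Char(u8),
}

// The same 1/10th of a second window discussed above.
const ESCAPE_TIMEOUT: Duration = Duration::from_millis(100);

/// Reads one key from a channel of raw input bytes.
/// Returns None once the sending side has hung up.
fn read_key(bytes: &Receiver<u8>) -> Option<Key> {
    let first = bytes.recv().ok()?;
    if first != 0x1b {
        return Some(Key::Char(first));
    }
    // After ESC, wait briefly to see whether a sequence follows.
    match bytes.recv_timeout(ESCAPE_TIMEOUT) {
        Ok(b'[') => match bytes.recv_timeout(ESCAPE_TIMEOUT) {
            Ok(b'F') => Some(Key::End),
            // Simplification: anything else is reported as a bare
            // escape and the bytes read so far are dropped.
            _ => Some(Key::Escape),
        },
        // Nothing arrived in time: assume the user pressed escape.
        _ => Some(Key::Escape),
    }
}
```

Note that a bare escape press is only reported after the full timeout has elapsed, which is exactly the delay the rest of this section is about.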

But what about network delay? What if an escape sequence gets spread apart in time? What if user input gets bunched up? What if the user just types something in really fast? How can we tell the difference? The answer is you can't.

You can test it if you want. Open a text file with some content in Nano and as quickly as you can type <esc>[F. That's the escape sequence for End. If you do it fast enough the cursor will move to the end of the line. Nano thinks you're the terminal sending an escape sequence.

So ultimately I'm stuck with the timeout. It's the only way to distinguish the user from the machine. I'm never going to be happy with the ambiguity involved with handling the escape key, but that's just how it is when dealing with legacy behavior from the 1960s.