DOS from Scratch: Hello World

Previous: Initial Bootloader

Now that we have a machine that can boot and execute some code, there is enough of a foundation to do a “Hello World” tutorial. There isn't too much we can do inside the 512 bytes of the boot sector, but it's enough space to play around with some fundamentals before moving on to loading an actual kernel.

So the boot loader from the previous post contained this block of code to print a dollar sign to the screen:

mov al, "$"
mov ah, 0x0e
int 0x10

You can probably infer what most of this does. int 0x10 triggers an interrupt handler that does something with the contents of ax. We printed a dollar sign, so al is probably where we put the character we want to print, and the 0x0e in ah must refer to some sort of “print character” function. But how does that actually work? What if we want to print a string and not a single character? What if we want to use colors? Can we change the font? What if we want to deal with graphics instead of text?

INT 10H

INT 10H (or int 0x10, as I prefer to write it in code) is the interrupt vector for the BIOS video services. It can do several useful things, but what we care about right now is controlling text output.

When triggering INT 10H, ah is used as the function code. The function code we used (0x0e) is for teletype output, which behaves more or less like the TTY you're probably used to from modern operating systems. In text mode, it prints a character at the current cursor position and then advances the cursor. It also wraps lines and scrolls the screen when necessary, and it responds to a few control codes such as bell (0x07), backspace (0x08), line feed (0x0a), and carriage return (0x0d).

So that's enough information to output a character, and based on the described behavior you can probably figure out how to print full strings using a simple loop. This information also raises some unanswered questions, though. Are there other types of output besides teletype? What modes besides text are there? What other function codes can INT 10H handle?

Video Modes

Let's start from the top. The very first function code INT 10H provides (function code 0) allows you to set the video mode. The value of al determines which of the many available modes the video card will use, but all of these fall into one of two categories: text and graphics.

In text mode, the video card interprets video memory as ASCII characters and renders those to the screen.

In graphics mode, the video card interprets video memory as pixel data. If you want to print text in this mode, you'll have to write your own font rendering code, which is far beyond the scope of this post.

I'm going to use text mode 3, which is an 80x25 character text mode with 16 colors. On any modern computer or emulator this is likely the mode the machine booted into, but we'll set it to be safe. As an added bonus, setting the video mode also clears the screen.

mov ah, 0
mov al, 0x03
int 0x10

NOTE: BIOS also supports multiple “pages” of text that you can toggle between, but for the time being we'll just do everything on page 0.

Cursor Position and Shape

Before we begin outputting text, it's important to understand how to manipulate the cursor on screen. Some output functions don't advance the cursor, so manually advancing the cursor is necessary.

Function 0x01

This sets the cursor shape. The cursor is always a full character wide, but you can specify the starting row (in ch) and ending row (in cl) to control the height and vertical position of the box that's drawn. In text mode 3, characters are 16 pixels tall. The default underline cursor starts at row 13 and ends at row 14 (I assume the last pixel row is left blank for line spacing). If you wanted the cursor to be a full block instead, you could fill from row 0 to row 14.

; Make the cursor a full block (rows 0 through 14)
mov ah, 0x01
mov ch, 0
mov cl, 0x0e
int 0x10
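To go back to the default underline afterwards, set the start and end rows back to 13 and 14. This is a minimal sketch assuming the 16-pixel-tall characters of text mode 3 described above; other modes use different character heights, so the right values may differ there.

```nasm
; Restore the default underline cursor
; (assumes text mode 3, where each character cell is 16 pixels tall)
mov ah, 0x01
mov ch, 13      ; first row of the cursor
mov cl, 14      ; last row of the cursor
int 0x10
```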

Function 0x02

This sets the cursor position for the page in bh, using a row (dh) and column (dl) index, with 0, 0 being the top left of the screen and 24, 79 being the bottom right corner in an 80x25 mode.

; Put cursor in bottom right corner
mov ah, 0x02
mov bh, 0
mov dh, 24
mov dl, 79
int 0x10
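Since some output functions leave the cursor where it is, it helps to have a routine that moves it forward for you. Here is a minimal sketch of one. It uses function 0x03, which isn't otherwise covered in this post: it reads the cursor position of the page in bh, returning the row in dh and the column in dl (and the cursor shape in ch and cl). The name advance_cursor is hypothetical, and the sketch assumes page 0 of an 80x25 text mode.

```nasm
; advance_cursor (hypothetical helper): move the cursor on page 0 one
; column to the right, wrapping to the start of the next row at the
; right edge. Assumes an 80x25 text mode. It does not scroll: past the
; bottom right corner it simply returns to column 0 of the last row.
advance_cursor:
    pusha

    ; Function 0x03 reads the cursor position of page bh:
    ; dh = row, dl = column (ch/cl return the cursor shape, unused here)
    mov ah, 0x03
    mov bh, 0
    int 0x10

    inc dl              ; move one column right
    cmp dl, 80
    jb .set             ; still inside the current row
    mov dl, 0           ; wrap to the first column
    cmp dh, 24
    jae .set            ; already on the last row, so stay there
    inc dh              ; otherwise move down one row

.set:
    ; Function 0x02 writes the new position back
    mov ah, 0x02
    mov bh, 0
    int 0x10

    popa
    ret
```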

Text Output Functions

INT 10H provides four functions for text output: 0x09, 0x0a, 0x0e, and 0x13 (string output).

Note: The first three should work on any IBM-compatible PC, but the string output function won't work on some older machines. I haven’t been able to find a definitive answer as to where the exact cutoff is, but any machine with a VGA card is almost certainly new enough.

Function 0x09

This writes a character with a given attribute (in bl) at the current cursor position. It does not advance the cursor or respond to control codes. It can, however, write the same character several times in a row; the repeat count goes in cx.

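The attribute byte sets the colors. In 16 color text modes, the low four bits select the foreground (text) color, 0x00 through 0x0f, and the high four bits select the background color. For example, 0x04 is red on black and 0x1f is white on blue. On most hardware the top bit makes the character blink by default rather than selecting a bright background.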

; Set the page to use (used by function 0x02 and 0x09)
mov bh, 0

; Set the color to red (used by function 0x09)
mov bl, 0x04

; Move cursor to 10, 0
mov ah, 0x02
mov dh, 10
mov dl, 0
int 0x10

; Output "r" at the cursor position
mov ah, 0x09
mov al, 'r'
mov cx, 1
int 0x10

; Move cursor to 10, 1
mov ah, 0x02
mov dh, 10
mov dl, 1
int 0x10

; Output "e" at the cursor position
mov ah, 0x09
mov al, 'e'
mov cx, 1
int 0x10

; Move cursor to 10, 2
mov ah, 0x02
mov dh, 10
mov dl, 2
int 0x10

; Output "d" at the cursor position
mov ah, 0x09
mov al, 'd'
mov cx, 1
int 0x10

; Move cursor to 10, 3
mov ah, 0x02
mov dh, 10
mov dl, 3
int 0x10
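The repeat count in cx can save a lot of calls. As a sketch (the row and colors are arbitrary choices), this paints the whole top row with 80 spaces using attribute 0x1f, which on a standard color text mode should produce a solid blue bar: the high four bits of the attribute pick the background color (1, blue) and the low four bits pick the foreground (0xf, white).

```nasm
; Move the cursor to 0, 0 first
mov ah, 0x02
mov bh, 0
mov dh, 0
mov dl, 0
int 0x10

; Paint a blue bar across the top row in one call
mov ah, 0x09
mov al, ' '         ; character to write
mov bh, 0           ; page
mov bl, 0x1f        ; attribute: white on blue
mov cx, 80          ; write it 80 times
int 0x10            ; the cursor is still at 0, 0 afterwards
```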

Function 0x0a

This is the same as 0x09, but you don't provide an attribute value. The text will be printed using whatever the last attribute value at that position was.
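For example, assuming the red "r" from the 0x09 example is still at row 10, column 0, this sketch replaces it with a capital "R". Because function 0x0a leaves the attribute alone, the new character should still be red.

```nasm
; Move cursor to 10, 0
mov ah, 0x02
mov bh, 0
mov dh, 10
mov dl, 0
int 0x10

; Overwrite the character there, keeping its existing color
mov ah, 0x0a
mov al, 'R'
mov bh, 0           ; page
mov cx, 1           ; write it once
int 0x10
```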

Function 0x0e

The behavior of the teletype output function was mostly covered above, but it's worth adding that in text mode this function also keeps the attribute already at each position it writes to, like 0x0a.
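Because teletype output responds to control codes, moving to the start of the next line is just a matter of sending a carriage return (0x0d) followed by a line feed (0x0a). A minimal sketch of a hypothetical print_newline helper, assuming page 0:

```nasm
; print_newline (hypothetical helper): move the cursor to the start of
; the next line, scrolling the screen if it is already on the last row
print_newline:
    pusha
    mov bx, 0           ; bh = page 0
    mov ax, 0x0e0d      ; ah = teletype output, al = carriage return
    int 0x10            ; back to column 0
    mov ax, 0x0e0a      ; ah = teletype output, al = line feed
    int 0x10            ; down one row
    popa
    ret
```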

Other Useful Functions

We've discussed all the text output functions BIOS provides, but there are several other useful functions you can use in conjunction with those.

Function 0x05

This function selects which display page to use.

; Set the currently displayed page to page 0
mov ah, 0x05
mov al, 0
int 0x10
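As a sketch of why pages can be useful: text can be written to a page that isn't being displayed and then shown all at once. This assumes text mode 3, which has several pages, and relies on functions 0x02 and 0x09 taking the page number in bh. Each page keeps its own cursor position, so the cursor is set on page 1 before writing.

```nasm
; Move page 1's cursor to its top left corner
mov ah, 0x02
mov bh, 1           ; page 1
mov dh, 0
mov dl, 0
int 0x10

; Write a green "!" there while page 0 is still displayed
mov ah, 0x09
mov al, '!'
mov bh, 1           ; page 1
mov bl, 0x02        ; attribute: green on black
mov cx, 1
int 0x10

; Now display page 1
mov ah, 0x05
mov al, 1
int 0x10
```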

Function 0x06

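This function scrolls a rectangular area of the active page up by the number of rows in al (al = 0 clears the whole area). The blank rows that appear at the bottom are filled with the attribute in bh, and the corners of the area go in ch, cl (top left row and column) and dh, dl (bottom right row and column). Text that scrolls outside that area isn't retained, so scrolling up one row followed by scrolling down one row leaves the top line of the area blank.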

; Clear the screen (assuming an 80x25 character video mode)
mov ah, 0x06

; al = 0 blanks the whole area; bh = 0x07 (light gray on black)
; is the attribute given to the blanked cells
mov al, 0
mov bh, 0x07

; Top left corner of scroll area
mov ch, 0
mov cl, 0

; Bottom right corner of scroll area
mov dh, 24
mov dl, 79

int 0x10

Function 0x07

This scrolls the active page down. It behaves the same as 0x06 aside from the direction.
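Because both functions work on a rectangle, they can also open up space in the middle of the screen. A minimal sketch, assuming an 80x25 text mode: scrolling rows 5 through 24 down by one inserts a blank line at row 5 and discards whatever was on row 24.

```nasm
; Insert a blank line at row 5
mov ah, 0x07
mov al, 1           ; scroll down by one row
mov bh, 0x07        ; attribute for the new blank line (light gray on black)

; Top left corner of scroll area
mov ch, 5
mov cl, 0

; Bottom right corner of scroll area
mov dh, 24
mov dl, 79

int 0x10
```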

Printing Strings

Now that we've gone over all the text-related functions of INT 10H, we can put them together to actually print strings. What's that, you ask? Isn't that just function 0x13?

It could be, depending on what you're building, but this series is specifically about learning how DOS works, and function 0x13 isn't actually all that useful for displaying DOS's string format.
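For comparison, this is roughly what calling function 0x13 looks like. It takes the string's length in cx and its address in es:bp, so for a dollar terminated string we would first have to count the characters ourselves. A minimal sketch; the hello label and its contents are illustrative, and es is assumed to already hold the string's segment (0 in a boot sector set up like the one in this post).

```nasm
; Print "Hello" at row 5, column 0 in yellow using function 0x13
; (assumes es already holds the string's segment)
mov ah, 0x13
mov al, 0x01        ; write mode: bit 0 set = move the cursor after writing
mov bh, 0           ; page
mov bl, 0x0e        ; attribute for every character: yellow on black
mov cx, hello_len   ; number of characters to write
mov dh, 5           ; row
mov dl, 0           ; column
mov bp, hello       ; es:bp = address of the string
int 0x10

; ...keep the data somewhere execution won't run into it
hello:      db "Hello"
hello_len   equ $ - hello
```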

DOS’s print string function (INT 21H function 0x09) operates on dollar terminated strings. Rather than taking the length of the string as a parameter, it just keeps printing characters until it comes across a dollar sign. We could loop over our string to find its length, but if we’re looping anyway it’s easier to just print as we iterate.

As best I can tell from experimentation, the DOS print string function interprets characters the same way as BIOS teletype output, with one exception: DOS does not display the escape character (0x1b, or 27 in decimal), and it also does not display the character that follows it. I don't fully understand why, but I'll update this section once I do.

For now, I'm going to implement the print string function by using the BIOS teletype output function and call that close enough.

; Prints a $ terminated string
; bx: address of the string to print
print:
    pusha
  
    ; SI will point to the current character
    mov si, bx

    ; Set the arguments for BIOS teletype output
    mov ah, 0x0e
    mov bx, 0

    print_loop:
        ; load the next character from memory
        mov al, [si]

        ; check if we're at the end of the string
        cmp al, '$'
        je print_exit

        ; print the character
        int 0x10

        ; move on to the next character
        add si, 1
        jmp print_loop

    print_exit:
        popa
        ret
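The same routine can be written a little more compactly with the lodsb instruction, which loads the byte at ds:si into al and then increments si (as long as the direction flag is clear, which the cld in the bootloader's setup guarantees). This is a sketch of an equivalent variant; the name print_lodsb is hypothetical and chosen so it doesn't clash with print.

```nasm
; Prints a $ terminated string (lodsb variant)
; bx: address of the string to print
print_lodsb:
    pusha
    mov si, bx          ; lodsb reads from ds:si
    mov ah, 0x0e        ; BIOS teletype output
    mov bx, 0           ; page 0
.loop:
    lodsb               ; al = [ds:si], then si += 1 (direction flag clear)
    cmp al, '$'         ; end of the string?
    je .exit
    int 0x10            ; print the character
    jmp .loop
.exit:
    popa
    ret
```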

What if I don't want to go through the BIOS?

In some cases, you may find it more effective to write text directly to video memory instead of using the character and string printing functions provided by the BIOS. In text modes, text is stored directly in video memory and the video adapter handles converting that text data into pixels. All that's necessary to modify this data directly is knowing where it's located in memory and how it's structured, but that will be discussed in a future post about VGA.
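As a small preview, here is a minimal sketch assuming the standard layout for color text modes such as mode 3: the screen buffer starts at segment 0xb800, and each character cell takes two bytes, the character followed by its attribute. It writes a white-on-blue "A" to the top left corner without calling the BIOS.

```nasm
; Write a white-on-blue "A" to row 0, column 0 without the BIOS.
; Assumes a color text mode such as mode 3, whose buffer starts at
; segment 0xb800. The cell at row r, column c is at offset (r * 80 + c) * 2.
push es
mov ax, 0xb800
mov es, ax
mov byte [es:0], 'A'    ; first byte of the cell: the character
mov byte [es:1], 0x1f   ; second byte: the attribute (white on blue)
pop es
```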

Conclusion

Now that we have a print function we can update our bootloader to print an actual “hello world” message.

; boot.asm
[bits 16]
[org 0x7c00]


main:
    ; disable interrupts
    cli

    ; make sure the CPU is in a sane state
    jmp 0x0000:clear_segment_registers
    clear_segment_registers:
        xor ax, ax
        mov ds, ax
        mov es, ax
        mov ss, ax
        mov sp, main
        cld

    ; re-enable interrupts
    sti

    ; print a message
    mov bx, msg
    call print

    ; "It's now safe to turn off your computer."
    hlt
    jmp $-1


; Prints a $ terminated string
; bx: address of the string to print
print:
    pusha
    
    ; SI will point to the current character
    mov si, bx

    ; Set the arguments for BIOS teletype output
    mov ah, 0x0e
    mov bx, 0

    print_loop:
        ; load the next character from memory
        mov al, [si]

        ; check if we're at the end of the string
        cmp al, '$'
        je print_exit

        ; print the character
        int 0x10

        ; move on to the next character
        add si, 1
        jmp print_loop

    print_exit:
        popa
        ret


msg:
    db "Hello World!"
    crlf db 0x0d, 0x0a
    endstr db '$'


; padding
times 510-($-$$) db 0

; literally magic
dw 0xaa55

Next: A Simple Toolchain