Link’s Awakening disassembly progress report – part 8

This article is part of an ongoing “Disassembling Link’s Awakening” series, where I attempt to gain some understanding on how special effects were implemented in this game.

In the past weeks, a lot of work related to entities got made. The entities placement data was parsed, and the entities handlers were finally all figured out. Let’s dive in!

Entities placement data

First, what’s an entity? The “entity” term is vague, and may have several meanings. In this context, an entity represents a dynamic element in a room – such as an NPC, an enemy, an interactive device, and so on (as opposed to static room building blocks: walls, floor, pits, etc.).

A screenshot of Marin singing
Here’s how a room look like with only the static objects (but without entities).

A screenshot of Marin singing
And here’s how the room entities look like.

(Sidenote: you may ask, “So entities are simply sprites, right?” Well, not exactly. A sprite is a simple 8x16 pixels image displayed over the background. An entity will usually display itself by managing several independent sprites.

For instance, Bow-Wow is a single entity composed of several sprites: two for the head, one for the shadow, and four for the chain.)

Now, how are entities in a room defined? Very simply, each room has an associated list of entities, indexed by the room id. When loading a new room, the game only has to:

  1. Use the room index to find the address of the list for that room;
  2. Walk the entities list, and for each value create an entity at the given position.

But until now, these lists weren’t parsed: although strictly speaking the entities placement data had been dumped, only raw unstructured bytes were present.

; data/entities/entities.asm

; Entities placement data.
; Each entry places entities at a specific location in a room.
;
; TODO: write a proper Python script to generate the pointers tables
; and entities objects properly.

    db   $FF, $FF, $24, $39, $FF, $05, $42, $32, $2C, $55, $2C, $FF, $14, $17, $03, $42
    db   $FF, $13, $17, $66, $16, $15, $1C, $33, $1C, $FF, $23, $59, $FF, $31, $27, $45
    db   $19, $FF, $FF, $27, $20, $FF, $22, $90, $27, $90, $34, $90, $05, $42, $FF, $11
    db   $27, $18, $27, $61, $27, $68, $27, $FF, $FF, $24, $29, $FF, $35, $29, $14, $17
    db   $67, $16, $FF, $44, $1E, $26, $19, $35, $19, $FF, $67, $17, $55, $16, $23, $1E
    db   $FF, $34, $61, $38, $81, $36, $82, $FF, $34, $19, $35, $19, $44, $19, $45, $19
    db   $FF, $14, $20, $52, $1C, $57, $1C, $FF, $43, $1E, $46, $1E, $54, $19, $55, $19
    ; continued for 300 lines…

The raw bytes for entities lists are not very readable. Although if you squint, you can notice the list separator.

Writing a Python script to parse the entities was easier than parsing the room static objects: the entities lists data format is quite straightforward to parse, and some of the static objects code structure was reused.

Also, the static objects data format had a lot of quirks (unused labels, duplicated rooms, invalid pointers…). But for entities lists, the only intricacy was some lists being referenced by multiple rooms – which wasn’t hard to detect and handle properly.

In the end, it only took a couple of hours to generate a parsed version of the entities lists:

; data/entities/indoors_a.asm
; File generated automatically by `tools/generate_entities_data.py`

IndoorsA00Entities::
  entities_end

IndoorsA01Entities::
  entities_end

IndoorsA02Entities::
  entity $2, $4, ENTITY_INSTRUMENT_OF_THE_SIRENS
  entities_end

IndoorsA03Entities::
  entity $0, $5, ENTITY_OWL_STATUE
  entity $3, $2, ENTITY_SPIKED_BEETLE
  entity $5, $5, ENTITY_SPIKED_BEETLE
  entities_end

IndoorsA04Entities::
  entity $1, $4, ENTITY_SPARK_CLOCKWISE
  entity $0, $3, ENTITY_OWL_STATUE
  entities_end

IndoorsA05Entities::
  entity $1, $3, ENTITY_SPARK_CLOCKWISE
  entity $6, $6, ENTITY_SPARK_COUNTER_CLOCKWISE
  entity $1, $5, ENTITY_MINI_GEL
  entity $3, $3, ENTITY_MINI_GEL
  entities_end

; continued…

Easy enough: each list is associated to a room, and an entity is defined by its vertical position, horizontal position, and type. A nice, well-structured data format, without surprising behaviors. Thanks, original game developers.

Of course this nice readable list owes much to a couple of macros, that help to transform the readable values into a sequence of bytes:

; code/macros.asm

; Define an entity in an entities list
; Usage: entity <vertical-position>, <horizontal-position>, <type>
entity: macro
    db   \1 * $10 + \2, \3
endm

; Mark the end of an entities list
entities_end: macro
    db   $FF
endm

Entities handlers

Once entities are loaded in a room, they need to actually do something. For this, each entity has an associated entity handler. This piece of code is responsible for defining how the entity looks, how it is animated, and how the player can interact with it.

Entities handler are executed on every frame, for each entity. This is implemented by having a large table of function pointers to all entities handlers.

On each frame, the game will enumerate all entities, and:

  1. Load the current index of the entity,
  2. Lookup the address of the handler for this entity in the handlers table,
  3. Execute the entity handler.

Until know, although the code banks containing the entity handlers lookup table and code has been identified, the code fragments weren’t properly sorted out. All of this looked like a mess of instructions, dozens of thousands of them.

; Table of entities handlers
; First 2 bytes: memory address; third byte: bank id
EntityPointersTable::
._00 db $34, $6A, $03
._01 db $61, $44, $19
._02 db $96, $66, $03
._03 db $E3, $7B, $18
._04 db $B2, $69, $03
._05 db $28, $53, $03
._06 db $49, $52, $03
._07 db $DD, $7B, $07
._08 db $66, $79, $18
._09 db $E9, $57, $03
._0A db $26, $6A, $03
._0B db $27, $58, $03
; continued…

Cross-referencing all these function pointers to their respective location in the source code was tedious, and took several weeks.

But now, every handler has been labeled and cross-referenced in the handlers table. And again, a small entity_pointer macro even makes it easy to read:

; First 2 bytes: memory address; third byte: bank id
entity_pointer: macro
    db LOW(\1), HIGH(\1), BANK(\1)
endm

; Table of entities handlers
EntityPointersTable::
._00 entity_pointer ArrowEntityHandler
._01 entity_pointer BoomerangEntityHandler
._02 entity_pointer BombEntityHandler
._03 entity_pointer HookshotChainEntityHandler
._04 entity_pointer HookshotHitEntityHandler
._05 entity_pointer LiftableRockEntityHandler
._06 entity_pointer PushedBlockEntityHandler
._07 entity_pointer ChestWithItemEntityHandler
._08 entity_pointer MagicPowderSprinkleEntityHandler
._09 entity_pointer OctorockEntityHandler
._0A entity_pointer OctorockRockEntityHandler
._0B entity_pointer MoblinEntityHandler
; continued…

How useful is that?

Well, now it’s easy to know which section of code is responsible for the behavior of a specific entity. Are you interested by the exact behavior of arrows? Do you want to explore all the easter-egg messages Marin can tell to Link when following him? How are the several forms of the end-game boss implemented? You can just follow the handlers table, and start reading the code for these entities.

What’s next?

Of course, there is still a lot of work to be done regarding entities.

First, it would be nice to move all entities handler to their own files. Instead of having large files like entities/bank15.asm, they would be split into entities/arrow.asm, entities/marin.asm, and so on.

Also, the code for these entities is not documented yet: many helpers used to work with entities are not understood yet, which makes the code difficult to read. Documenting all these helpers would definitely help.

The way the game loads entities is not completely figured out yet. Notably, the behavior where entities cleared in a room do not immediately respawn when moving to another room is still mysterious to me: the game must have a way to keep track of which entities have been destroyed recently (in order not to respawn them), but how?

Last, we can see in the handlers table that several entities have been blanked out during the development of the game – but also that several entities have associated code, but never appear in the game. Analyzing the code to see what these entities were supposed to do could sure be interesting.

Want to read more? Discover more of the code, or join the discussion on Discord.

Link’s Awakening disassembly progress report – part 7

This article is part of an ongoing “Disassembling Link’s Awakening” series, where I attempt to gain some understanding on how special effects were implemented in this game.

The big news of this progress report is that the disassembly is finally standalone: it doesn’t require the original ROM anymore to fill out uncompleted sections. The project can now be built entirely only from its source code.

Why wasn’t that done before? After all mgbdis, the most-commonly used disassembler, can generate a valid disassembly of any Game Boy ROM in ten seconds.

So what took so long?

In the beginning

First, at the time this project begun, mgbdis didn’t exist. So the project started with a custom-made disassembler, written in C.

As a work-in-progress, support for less common special cases was added progressively to this disassembler. Which means it was tested first on a few sections of code – hoping that progressive improvements would allow it to generate the entire disassembly.

But instead, this custom disassembler slowly bitrot.

mgbdis

Enter mgbdis: a pretty nice generic disassembler, that guarantees it will always generates valid code that can be compiled back to the original ROM. Awesome! We can now apply it to our game, have it entirely disassembled, and commit the result to the project, right?

The problem is that some sections were already decompiled using the custom disassembler, which uses slightly different conventions. How to make these existing sections match the mgbdis output format? We won’t want to re-generate these sections from scratch: there are already heavily documented, with labels and annotations in comments.

In the end, the most sensible way to produce a good disassembly was to improve mgbdis itself, so that it would be useful even when working with an existing partial disassembly. This meant:

In the end, it was possible to generate the entire disassembly using mgbdis, while respecting the existing code, symbols and formatting conventions. Then it was just a matter of copying the code into our partial disassembly, section by section – having just to fix up a small number of remaining errors on the way.

Very good, but not entirely satisfying.

Code and data

You see, one of the initial tasks when working on a vintage game disassembly is separating code from data.

Most cartridge-based games of the early days don’t have a file system. Everything is compiled into a single binary file (usually called a ROM)1.

So when trying to figure out how the game work, there are no separate files in the ROM – and usually no metadata to tell us which parts of the binary are compiled code, 2D-textures, 3D-models, animations, maps, music… All we have is this large binary file, and we must try to sort out code from data.

A basic solution to this problem is to interpret everything as code. When generating a disassembly, let’s just consider every byte to be a valid code instruction. This is exactly the approach of mgbdis.

Every time a new section of code was copied from the mgbdis-generated disassembly, we knew that a large part of this code was probably actually data. And it felt inelegant to just insert these wrong segments of code into the project. Shouldn’t we sort out code from data before inserting them into the project; at least roughly?

This is why the integration of the mgbdis-disassembled sections was slow: before generating each section of code, it was manually scanned to guess which sections would actually be data. Many sections were integrated to the projects using this process, but it was tedious and slow.

Fortunately, some people tried to find solutions to automatically sort out code from data.

A tracing disassembler

A partial solution for sorting out code from data is to use a profile-guided disassembler. The idea is to play the game in a special emulator, which records every instruction executed to a game profile. Then a disassembler can use this profile to mark every executed instruction as code – and all the other as data.

The issue with this, of course, is that any missed code path will be interpreted as data. If during the profile-recording the player misses a gameplay branch, the code for this branch will never be marked as executed. And so this code will be turned into data by the disassembler.

If only we could know which code paths are executable, without having to play through every branch of the whole game…

That’s exactly the idea behind tracing disassemblers. Instead of being guided by the game being playing dynamically, a tracing disassembler will read the ROM statically, without actually executing it. But it will try to follow the flow of the instructions, one by one, and mark all the reachable bytes as code. In the end, all the possible branches should be traced – which means that all the remaining bytes are data.

Of course, this means a tracing disassembler must be good at following code paths. If some code path are missed, they will incorrectly appear as data instead. And following code paths can prove quite challenging.

Instructions

The easiest case, of course, is easy: start with the entry point of the ROM, and follow instructions from there. For instance, with simple instructions:

EntryPoint::
    ld   a, $30        ; $0100
    ld   [rP1], a      ; $0102
    ld   a, $01        ; $0104
    ld   [rKEY1], a    ; $0106

Great, so we know that bytes $0100 to $0108 of the ROM are actually code (and not data).

Jumps

When an unconditional jump occurs, we can simply mark the target address as being executable – and continue the tracing from here. For instance, continuing from the previous example:

    ld   [rKEY1], a    ; $0106
    jp   $0120         ; $0108

So we know that byte $0108 is really code – but also that the jump target address, $0120, also holds some executable code. And the tracing can continue from there.

And when the code reaches a conditional jump, then both the target address and the next instruction are marked as executable. The disassembler will then trace one of the code paths until the end, then remember to follow the other one later.

    ldh  a, [$FFF1]     ; $0120
    jp   z, $0200       ; $0121
    xor  a              ; $0124
    ldh  [$FFF1], a     ; $0125

In this case, when reaching the instruction jp z, $0200 (i.e. “jump to $0200 if the result of the previous operation is zero”), two addresses are marked as executable: the jump target address, $0200, and the next instruction taken if the branch is not taken, $0124.

Banking

Here start the issues.

Older consoles like the NES, Super NES and Game Boy have this concept of banks. Banks solve a problem that presents itself very quickly when programming a game on these consoles: the addressing space is too small. As pointers are only two-bytes long, the maximum length of ROM that can be addressed in 65 KB. And games can grow bigger than that pretty quickly. Although Super Mario Bros. 1, as a marvel of engineering, takes just under 32 KB, many games will need much more resources to display rich graphics, sounds and behavior. Link’s Awakening DX, for instance, is a 1 MB game.

Game Boy memory map – without bank switching

So how was this addressing problem solved? By allowing some of the address space to be swapped dynamically during gameplay. For instance, on the Game Boy, there are two code slots (a.k.a “banks”) of 16 KB: the first one, usually referred as bank 0, is always loaded into memory. But the second one is dynamic: the game can request for bank 3, 4 or 42 to be loaded into memory, and then copy data or execute code from this bank.

Game Boy memory map – using bank switching

Problem solved!2

So banks can be swapped in and out. When reading a game’s source code, it can for instance look like this:

    ; Load bank 2 into memory
    ld   a, $02                             ; $0200
    ld   [MBC3SelectBank], a                ; $0202
    ; Jump to the address $4020 in bank 2
    jp   $4020                              ; $0205

So, taken all by itself, the “jump at address $4020” is ambiguous: it could mean “the address $4020 in bank 1”, or “the address $4020 in bank 5”, or in bank 37 – it all depends on the bank currently loaded. Although here we know by reading the code that the loaded bank will be bank 2.

Which means that, in order to know what an address actually refers to, a tracing disassembler has to know which bank is loaded at the point an instruction is executed.

Fortunately, in the example above, the bank switch is relatively easy to figure out: the disassembler can track when a new value is written to the MBC3SelectBank memory address, and update it’s internal representation accordingly. Although, as you can see, the bank write always goes through an intermediate register (in the example above, a). So to know which value is written to MBC3SelectBank, the disassembler must also track the content of each register. Err, not so easy, but ok.

So now we can trace code jumping through different banks. Good.

Dynamic bank switching

Enter dynamic bank switching. Often, the bank number isn’t hardcoded in the ROM: it is figured out at runtime. For instance, some code could be written like this:

    ; If the player in currently indoors…
    ld   a, [hIsIndoor]
    and  a
    jp   z, .outdoor
    ; … use the bank $20
    ld   a, $20
    jp   .endIf
.outdoor
    ; … else use the bank $21.
    ld   a, $21
.endIf

    ; Load bank $20 or $21 into memory
    ld   [MBC3SelectBank], a
    ; Jump to address $4020 in bank $20 or $21
    jp   $4020

So the disassembler has to understand that, at the point the final jump is executed, the target bank can actually have several different values.

And this is the easiest case: the bank number can be further manipulated by a function–or even read at runtime from some sort of table. Great.

In this case, a tracing disassembler may try a combination of being smart (like storing the different possible values for the active bank), or starting to require human assistance. For instance, a human can read the code, and then tell the disassembler that at this specific location, a can only be $20 or $21 (instead of just any possible value).

Dynamic jumps

We’ve seen how the bank number can sometime by dynamically generated at runtime – but the jump address also can.

Often games will read the target address of a jump from an external array (which is sometime called a “jump table”). And sometime the address will even be computed entirely from code.

JumpTargetsArray:
    dw   $4000
    dw   $4100
    dw   $4200

JumpToTarget::
    ; Make `hl` the address of the jump targets array
    ld   hl, JumpTargetsTable
    ; Add the array index contained in `bc` to `hl`
    add  hl, bc
    ; Read the jump target address from the table
    ld   a, [hl]
    ; Jump to the address read from the table.
    ; (Can be either `$4000`, `$4100`, or `$4200`.)
    jp   hl

This is often the nail in the coffin for tracing disassemblers. It becomes very hard to figure out the jump target without executing the entire program. And if the tracing disassembler can’t figure out the target, then all the code accessible from this jump will be flagged as a blob of data, instead of code.

This is generally the moment when some per-game configuration has to be done by hand. Dynamic jumps are not that frequent in a game – and they usually follow a predictable pattern. So, by writing custom recognizers that can identify these formats, the disassembler can read the jump tables directly – and mark all the targets addresses as executable code.

Of course the recognizers to identify the jump table patterns are different for every game. And even in the same game, some dynamic jumps don’t obey to a specific pattern: they must be identified using a single-use recognizer, that will be applied only to a single code location.

So, while this is the end of our dream for a completely automated tracing disassembler, all of this is still useful. Even if some manual work is required, this is way better than manually going through lengthy sections of disassembled output, figuring out where the code stops and the data starts.

Making the disassembly standalone

Tracing disassemblers for the Game Boy are still experimental. But Discord user @featherless provided the output of a custom tracing disassembly which they are working on, applied to Link’s Awakening ROM.

And the results are stunning: in banks containing a lot of mingled code and data, the tracing disassembler can sort out executable code from other arrays, tables, and data with great accuracy.

So three months ago, when the final rush to finally add the remaining missing banks to the disassembly started, having this traced disassembly made adding the last banks much easier. Of course some manual work had to be performed to match the style of the existing disassembly – but it was way better than identifying data sections by hand.

Conclusion

So the disassembly is now standalone: it can compile back to the target ROM without needing external resources.

More important, the general purpose of each code section is roughly figured out. Instead of a long series of files named “bank35.asm”, “bank36.asm”, “bank37.asm”, most of the files are split and named according to the function of the role they contain (“entities.asm”, “super_gameboy.asm”, “photos_cutscenes.asm”, and so on).

Of cours most of this code is still generated by an automatic disassembler: there are no human-readable labels or documentation about the exact purpose of each function yet. All this work still has to be done. But it will be much easier to figure this out once the large pieces of the puzzle are identified.

For instance, good progress has been made recently on understanding the how entities system works. Hopefully this will be further expanded in the next progress report.

New contributors

Since the last report, the following people made first-time contributions to the project:

Want to read more? Discover more of the code, or join the discussion on Discord.

  1. This lasted more or less until the advent of CD-based games, which started to use file systems to package the various game assets. 

  2. Of course, for game developers, this “solution” of using banks quickly becomes a programming nightmare. Swappable banks mean, for instance, that code can’t jump from bank 2 to bank 3, as only a single non-0 bank can be loaded into memory at once. Writing a routine in bank 4 that copies data in bank 2? Also not possible: these two banks can’t be loaded at the same time.

    Every time an operation like this is needed, the code has to use a trampoline: a small piece of code in the bank 0 (which is always loaded) that will swap the banks, execute the target code, and then swap back to the original bank. So code has to be architected around this limitation, which this is usually one of the major hurdle to making ambitious games on these platforms. 

Link’s Awakening disassembly progress report – week 6

This article is part of an ongoing “Disassembling Link’s Awakening” series, where I attempt to gain some understanding on how special effects were implemented in this game.

These “weekly” reports are more like monthly now, and will probably be renamed at some point. But work continues! Let’s see what interesting happened in the past weeks.

Making the disassembly stand-alone

An ongoing effort is to make the disassembly standalone. For now, the disassembled files covers only around 80% of the ROM. The remaining blanks are filled at compile-time using the original ROM.

This is not optimal. It means that a copy of the original game is still needed to compile the disassembly. Also some parts of the code jumps to functions that are not referenced in the disassembly yet.

Making the disassembly standalone requires adding those missing and less figured out banks. And this is what is being done now, starting with the audio code.

Audio code and data

The audio in the game is split in several banks; each one covering a specific kind of audio effect:

The code and data for playing these audio effects was missing from the ROM–until now. This month, a basic disassembly was added for banks containing the audio code ; that is banks 0x1B, Ox1E and 0x1F. The code now lies in the src/code/audio/ directory.

This is a good progress towards understanding more of the audio formats. However the disassembly is still very rough: it has only very few labels and comments–and the music data format has not been figured out yet. This allows to fill blanks in the project for now; but much more work is waiting.

Credits

Continuing the “making the disassembly standalone” project, another bank was added this month: the code and data responsible for the ending sequence and credits. And there are a few tidbits of interest in it.

For instance, it seems that the staff roll code was written way before the other parts of the credits sequence. The staff roll code and data is laid out at the beginning of the bank–even the code that switches to the different parts of the ending sequence comes after.

This is unusual: most of the time the entry point of the bank is near the beginning of the bank, and the dispatch table for the different states comes quickly after.

Link's Awakening – layout of bank 17

Was the credits code written before the ending sequence was designed? Or is it a relic from “For the frogs the bell tolls”, the game whose code was used as the basis of Link’s Awakening engine?


Another interesting find is the way the staff names are displayed during the credits roll.

Link's Awakening Staff Roll

First, how is this nice transparency effect done? The Game Boy doesn’t have dedicated graphics functions to display full transparency, blending the foreground with the background.

So a hardware trick is used throughout the game: it relies on the LCD screen latency. When a game object needs to have some transparency (like shadows or ghosts), it is only displayed every other frame. As the individual pixels on the screen have some latency, the final color that the screen displays to the player is a mix between the foreground and the background color–thus providing the transparency effect.

But why was transparency used here? Although it showcases a visual effect uncommon and difficult to achieve on this console, it also makes the end result is not very readable.

The thing is, the developers had to.

Remember, the Game Boy has three ways to display graphics on screen: a tiled background, a window overlapping the background, and sprites. Here the tiled background is already used to display the clouds and the see. And the window is always opaque–it can’t be configured to show the background image underneath. So the letters need to be displayed using sprites. Sprites can have one color of transparency, and show the background underneath, so this is all fine.

Except for one thing. At most 10 sprites can be displayed on a single line. Due to a hardware limitation, aligning more than 10 sprites on the same horizontal line will result in a nasty uncontrollable flicker. And some names will clearly use more than 10 letters, and thus more than 10 sprites. The letters would flicker like crazy.

So what do developers usually did back then? A common technique is to avoid displaying all the sprites on a same line at once. For instance by, you guessed it, displaying some of the sprites only on even frames–and the others only on odd frames.

And this is exactly what Link’s Awakening developers did. The staff roll actually looks like this:

Link's Awakening Staff Roll - Even frames

Link's Awakening Staff Roll - Odd frames

Clever? Yes.
Readable? Not so much.
Unavoidable? Definitely.

Contribution activity

Since last month there’s been a lot more discussions on the Discord server of the project. People much more knowledgeable than me about Zelda DX and disassembling things chimed in–coming from the pret discussion group, the Zelda speedrun community, and the Zelda 4 randomiser chat.

I’m really happy about having discussions with other people about this project, other disassembling efforts, and more. For instance, @featherless used an experimental tracing disassembler of them – to produce a relatively high-level disassembly of the ROM that automatically sorts out code from data.

Also this month, to give an easier time to newcomers, the README file has been streamlined, new wiki guides were written, and others were completed.

Automated checks

And, of course, say hello to our new continuous integration bot! This bot will tell you, for each PR, wether the code still compiles, and check that it still produces a 1:1 reproduction of the original ROM.

Setting up a compilation toolchain in the cloud wasn’t that easy (I’m looking at you, Docker). But it was worth it: since it was set up, the automated checks already caught a mistake of mine.

Github automated version check passing
Isn’t that green checkmark lovely?

What’s next?

My short-term goal is still to make the disassembly stand-alone in a relatively short time.

But there is more to do. For instance, the entities data could be better parsed. And more exciting, all the entities IA code is just sitting here, waiting to be indexed. Ever wanted to know how was implemented the behavior of your favorite Zelda enemy? Now is the chance to figure it out.

Want to read more? Discover more of the code, or join the discussion on Discord.

Link’s Awakening disassembly progress report – week 5

This article is part of an ongoing “Disassembling Link’s Awakening” series, where I attempt to gain some understanding on how special effects were implemented in this game.

It’s been a while! The last disassembly progress report was published more than one year ago. Meanwhile, what happened?

Well, kind of a slowdown, actually. But a few weeks ago, with the release of Link’s Awakening remake on the Switch drawing nearer, the disassembling efforts gained some steam again.

Which means many things appeared! Let’s see what’s new.

New repository

In 2015, on the 11th of August, mojobojo started a disassembly project. Four years later, this project has grown tremendously. With the participation of several external contributors, it made little sense to still have the repository under a single user account.

Thankfully, mojobojo agreed to transfer the admin rights to an organization. Which means the repository has a new home! It is now reachable at github.com/zladx/LADX-Disassembly.

The new organization makes also possible to add many admins and many contributors, which could make the contributions smoother.

More maps

Many of Link’s Awakening ROM hacks feature new maps. Throughout the years, many level editors came and went, trying to figure out how to read the game maps, and how to insert modified maps back without breaking the ROM.

Zelda Link’s Awakening entire Overworld Isn’t that beautiful?

Unfortunately there is no central documentation about the map data format. Informations are scattered here and there, deep in the code of the level editors.

So with the help of Xkeeper0, a new parser for the maps, rooms, objects and warp points was written in Python. This parser emits nicely formatted files containing the rooms table and objects.

In the end, the format is easy enough:


The neat thing is the objects format.

To save space on the original cartridge, rooms are not stored as a sequential array of all the rooms’ blocks (which would always use 10 × 8 bytes per room).

Instead, rooms are painted.

First, a configurable ground tile is repeated on the whole room.

room-1

Then, in dungeons, as the walls of most rooms look the same, a room template is applied.

room-2

And afterwards, objects are laid out individually on the room, in single-blocks – or in strips spanning several blocks.

room-3

room-4

room-5

room-6

In the end, here is how the final room looks like, after it has been entirely painted with objects.

room-final

The final size for this: only 30 bytes (instead of 80 bytes if all objects had to be stored individually). Neat. The game even defines a few macros which can paint directly a whole house or tree, for instance.

This also allows for nice animations of the rooms being constructed. You can even see the designers changing their minds mid-development, and overlapping bushes with flowers.

An animated version of an overworld room being constructed
Credits: XKeeper0 Link’s Awakening Depot

More code

Last year, around 3 banks of code had been disassembled. Today, we are at 9 banks of code disassembled–and counting.

Disassembling new banks used to be difficult. The disassembler spew lot of invalid code, that had to be fixed by hand, and the existing variables and functions were not carried to the newly disassembled code.

This changed with the use of mattcurie’s mgbdis new disassembler, which emits very good quality code.

But disassembling a bank is useless if we don’t do anything with it. Once we have the raw code, it is only useful after cutting some noise: separate data from the actual code, and cross-referencing it with the existing banks. All of this takes time–so new banks are added progressively.

Recently, banks 14 and 20 were added. They contain some inventory code, the code responsible for applying room templates when loading a dungeon’s room, overworld audio tasks, palette effects…

… and something interesting: the code for enemies, NPC, dungeon bosses and entities behavior. Reverse-engineering the bosses IA will be fun.

More audio

The game uses the Game Boy audio capabilities for good effect: it may simultaneously play a long music track, a short jingle, and very short sound-effect.

A screenshot of Marin singing
I know you can hear the music in your head.

For a long time, the disassembly labelled two variables to control the sound effect:

; Play an audio effect immediately
hSFX::      ds 1 ; FFF3

; Play an audio effect next
hNextSFX::  ds 1 ; FFF4

Put the identifier of a sound effect in hSFX, and it will immediately. Put it in hNextSFX, and it will play after the current sound effect is finished.

Easy, right?

Well, while I was disassembling the code for the overworld audio tasks performed on each frame, I noticed the identifiers between these two variables didn’t match. For instance, writing $04 in hSFX would play a beeping SFX_LOW_HEARTS sound effect; but writing the same value in hSFX would play a “The doors of the room are now open” sound effect. Hmm.

I dug into the code to find where hNextSFX was eventually written into hSFX. I couldn’t find such code. Something was wrong.

Now, audio is not my speciality. So I turned to the Game Boy hardware documentation. Turns out that the Game Boy can generate a wide variety of audio (in stereo, for good mesure): it has two square-wave outputs, one waveform output, and one configurable noise generator.

Oh. Of course.

The reason two sound effects can sound different is that there are actually two kind of sound-effects: wave-based, and noise-based. It all comes to the hardware capabilities.

Everything makes much more sense:

Now I could fix the code:

; Play a waveform-based audio effect immediately
hWaveSFX::   ds 1 ; FFF3

; Play a noise-based audio effect immediately
hNoiseSFX::  ds 1 ; FFF4

In hindsight this is quite obvious, and probably well-known by ROM hackers. But well, this was my epiphany. At least we now have a growing but already decent list of the different sound effects.

And now?

Two days ago, while trying to understand the entities data format, it occurred to me: this project is now closer to its completion than from its beginning. Maps, tiles and dialogs are dumped, most banks are disassembled, around half of the code is at least partially documented. And with all the data already labelled, the second half should be much easier.

Of course there are tons of things still to be done. Tilesets, warp data and chest data are missing. The room formats could be much improved. The enemy IA is in view, but still will require efforts. Audio has only been lightly touched.

But it has never been easier to make progress. Gone are the days of erring in a pile of assembly instructions, having no clue of what the code is doing: there is always a labelled variable to give a hint.

Want to read more? Discover more of the code, or join the discussion on Discord.

An in-depth look at Link’s Awakening render loop

This article is part of an ongoing “Disassembling Link’s Awakening” series, where I attempt to gain some understanding on how special effects were implemented in this game.

We’ve seen previously how Link’s Awakening renders the opening cutscene. This time, let’s take a step back, generalize, and have a look at the main render loop.

The game’s core

Games are usually based on a main loop. Conceptually, it looks like this:

while (true) { // loop forever
  processInput();
  updateGameState();
  renderFrame();
  waitForNextFrame();
}

In plain text, the main loop repeats the same operations, once per frame, over and over:

Of course this is a simplification, and a few key elements have been omitted (for instance there is no audio there). For more information, you can read this much more extensive article about the render loops generic structure.

Link’s Awakening makes no exception, and has its own render loop, right after the initialization code. Let’s see how it is handled.

Read more…