YAY, Another YAML!

YAML is a popular file format that's easy for people to read and write, but writing a parser for it by hand is difficult. YAY is a system for creating simple-to-parse file formats that look and feel similar to YAML.

Here's an example:
 - At the top level, this example is an object.  An object associates names with
   values.  Each name ends in the : symbol and is followed by its associated
   value.  In this example, there's just one name in the object: "Here's an
   example".
 - Associated with the "Here's an example" name is this array.  Each entry
   starts with a - symbol.  Note that the entries in the array can continue onto
   the next line as long as the indentation matches.
 - Objects can appear in arrays as well.  Here are some foods and their colors:
 - Carrot: Orange
   Banana: Yellow
   Eggplant: Purple

YAY doesn't try to distinguish between objects, arrays, and other types of data. Instead, it provides procedures that break an array into a sequence of multi-line string values, and an object into a sequence of string values with associated names. Each of these string values can in turn be broken down into further objects and arrays. Because a YAY document doesn't say which of its values are arrays, objects, or plain text, there's no way to write a generic YAY parser: YAY is a system for creating your own file formats, not a file format on its own.

arrays

Each value in a YAY array begins with a hyphen. If the hyphen is followed by a space, the value begins on the same line, and any further lines of the value are indented to line up with the column just after the space. If the hyphen is followed by a newline, the value begins on the next line, and its indentation is set by the number of spaces at the start of that line. Either way, each value must be indented more than the hyphens themselves.

- value on the same line
-
 value on a new line
- indent same-line values
  like this
-
 indent new-line values
 like this
-
      or
      like
      this

objects

The rules for objects are similar to those for arrays, except that each value begins with a name instead of a hyphen. Names must not contain newlines or colons, and each name must end with a colon. Just like hyphens in arrays, colons in objects must be followed by either a space or a newline. The indentation rules are the same as for array values.

here's an example name: value on the same line
another:
 value on a new line
name: indent same-line values
      like this
more names:
 indent new-line values
 like this
last name:
      or
      like
      this

parsing YAY formats

The remainder of this document describes how to parse YAY arrays and objects. To begin, let's define exactly what we're parsing.

A YAY string (or just a string, for short) is an array of bytes together with two integers: an indentation and a current line length. The indentation is how many spaces to ignore after each newline. The current line length is the number of bytes between the previous newline and the start of the string; it's used to determine the indentation of nested values.

To create a YAY string from a plain array of bytes (like you might get from a file), set both its indentation and its current line length to zero.
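As a minimal Python sketch, a YAY string can be modelled as an immutable `bytes` object plus the two integers. The names `YayString`, `from_bytes` and `to_text` are hypothetical and not part of YAY; `to_text` is an assumed helper that reads "how many spaces to ignore after each newline" as "drop up to that many leading spaces from each following line".

```python
from dataclasses import dataclass

NEWLINE, SPACE = 0x0A, 0x20


@dataclass
class YayString:
    data: bytes            # the string's array of bytes
    indentation: int = 0   # spaces to ignore after each newline
    line_length: int = 0   # bytes between the previous newline and data[0]


def from_bytes(raw: bytes) -> YayString:
    # A fresh YAY string: indentation and current line length both start at zero.
    return YayString(raw, indentation=0, line_length=0)


def to_text(s: YayString) -> bytes:
    # Hypothetical helper (an assumption, not part of YAY): copy the bytes,
    # skipping up to `indentation` spaces at the start of each line after a newline.
    out = bytearray()
    skip = 0
    for byte in s.data:
        if skip and byte == SPACE:
            skip -= 1
            continue
        skip = 0
        out.append(byte)
        if byte == NEWLINE:
            skip = s.indentation
    return bytes(out)
```

For example, `to_text(YayString(b"a\n  b", indentation=2))` gives `b"a\nb"`.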

To parse a string as an array, repeatedly run the procedure to parse the next array value. Each run either returns a string value or signals that there are no more values in the array. Objects work the same way: repeatedly run the procedure to parse the next object value, which returns each value together with its associated name.

Each procedure operates on a string and modifies it. For example, parsing the next array value removes the bytes representing the parsed value from the beginning of the string. Running it again removes the next value in turn, and so on until every value has been parsed.

Because each returned value is itself a YAY string, the same procedures can be run on it to parse nested structures.

To consume a byte from a string, perform these steps:

  1. If the string's array of bytes is empty, return EOF (a value different from any byte value) and skip the rest of these steps.
  2. Let byte be the first byte in the string's array of bytes.
  3. If byte is 0xA (an ASCII newline), set the string's current line length to zero. Otherwise, add one to the string's current line length.
  4. Remove the first byte from the array of bytes, decreasing its length by one.
  5. Return byte as the consumed byte.
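A direct Python transcription of these steps might look like the following. It is a sketch that assumes `None` stands in for EOF and uses a hypothetical `YayString` class; slicing `bytes` copies the remaining data on every call, so a real parser would more likely track an offset instead.

```python
from dataclasses import dataclass
from typing import Optional

NEWLINE = 0x0A


@dataclass
class YayString:
    data: bytes            # the string's array of bytes
    indentation: int = 0
    line_length: int = 0   # the string's current line length


def consume_byte(s: YayString) -> Optional[int]:
    # Step 1: an empty string yields EOF, represented here by None.
    if not s.data:
        return None
    # Step 2: look at the first byte.
    byte = s.data[0]
    # Step 3: a newline resets the line length; any other byte extends it.
    if byte == NEWLINE:
        s.line_length = 0
    else:
        s.line_length += 1
    # Step 4: drop the byte from the front of the string.
    s.data = s.data[1:]
    # Step 5: hand the consumed byte back.
    return byte
```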

To parse the next inner value in a string, perform these steps:

  1. If the string's array of bytes is empty, return an empty string as the next value and skip the rest of these steps.
  2. Determine where the value begins:
    1. If the first byte in the string is equal to 0xA (an ASCII newline), the value begins on a new line. Consume a byte from the string, then, as long as the first byte in the string is equal to 0x20 (an ASCII space), keep consuming a byte from the string. If the string's current line length is now less than or equal to its indentation, return an empty string as the next value and skip the rest of these steps.
    2. Otherwise, if the first byte in the string is equal to 0x20 (an ASCII space), the value begins on the same line. Consume a byte from the string.
    3. Otherwise, this is a "the '-' or ':' must be separated from the following value by a space or a newline" error -- signal that there are no further values and skip the rest of these steps.
  3. Set value to a copy of the string. Set value's indentation to the string's current line length.
  4. Repeat the following steps in a loop:
    1. If the string's array of bytes is empty, return value as the next value and skip the rest of these steps.
    2. Consume a byte from the string. If the byte is not equal to 0xA (an ASCII newline), go back to the beginning of the loop.
    3. As long as the first byte in the string is equal to 0x20 (an ASCII space), keep consuming a byte from the string.
    4. If the first byte in the string is equal to 0xA (an ASCII newline), the line is blank: go back to the beginning of the loop without consuming it, so that the next pass consumes the newline and checks the indentation of the following line.
    5. If the string's current line length is less than or equal to the string's indentation, or if the string's array of bytes is empty, perform the following steps:
      1. Let remaining length be the length of the string's array of bytes, plus the string's current line length, plus one.
      2. Remove remaining length bytes from the end of value's array of bytes.
      3. Return value as the next value and skip the rest of these steps.
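The steps above can be sketched in Python as follows. This is not a reference implementation: it assumes `None` represents "no further values", and the `error` field and `peek` helper are hypothetical. Note that the sketch ends a value at the first non-blank line whose indentation is less than or equal to the enclosing string's indentation, and when it meets a blank line it leaves that line's newline for the next pass of the loop, so the line after a blank line still has its indentation checked.

```python
from dataclasses import dataclass
from typing import Optional

NEWLINE, SPACE = 0x0A, 0x20


@dataclass
class YayString:
    data: bytes
    indentation: int = 0
    line_length: int = 0
    error: Optional[str] = None  # hypothetical: why parsing stopped, if it failed


def peek(s: YayString) -> Optional[int]:
    # First byte of the string, or None when it is empty.
    return s.data[0] if s.data else None


def consume_byte(s: YayString) -> Optional[int]:
    if not s.data:
        return None  # EOF
    byte = s.data[0]
    s.line_length = 0 if byte == NEWLINE else s.line_length + 1
    s.data = s.data[1:]
    return byte


def parse_next_inner_value(s: YayString) -> Optional[YayString]:
    """Parse the value after a '-' or ':'; None means no further values."""
    if not s.data:
        return YayString(b"")
    if peek(s) == NEWLINE:
        # The value begins on a new line: skip the newline and its indentation.
        consume_byte(s)
        while peek(s) == SPACE:
            consume_byte(s)
        if s.line_length <= s.indentation:
            return YayString(b"")  # next line isn't indented: empty value
    elif peek(s) == SPACE:
        consume_byte(s)  # the value begins on the same line
    else:
        s.error = "the '-' or ':' must be separated from the following value by a space or a newline"
        return None
    # The value starts here; its indentation is the current column.
    value = YayString(s.data, indentation=s.line_length, line_length=s.line_length)
    while True:
        if not s.data:
            return value
        if consume_byte(s) != NEWLINE:
            continue
        while peek(s) == SPACE:
            consume_byte(s)
        if peek(s) == NEWLINE:
            continue  # blank line: its newline is consumed on the next pass
        if s.line_length <= s.indentation or not s.data:
            # Cut the last consumed newline, and everything after it, off the value.
            remaining = len(s.data) + s.line_length + 1
            value.data = value.data[: len(value.data) - remaining]
            return value
```

The returned value keeps the spaces at the start of its continuation lines; only its `indentation` field records how many of them to ignore.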

To parse the next array value in a string, perform these steps:

  1. If the string's array of bytes is empty, signal that there are no further values in the array and skip the rest of these steps.
  2. If the string's current line length is not equal to its indentation, this is an "incorrect indentation" error -- signal that there are no further values in the array and skip the rest of these steps.
  3. If the first byte in the string is equal to 0x2D (an ASCII hyphen), consume a byte from the string. If it isn't, this is a "missing '-' to start array element" error -- signal that there are no further values in the array and skip the rest of these steps.
  4. Parse the next inner value in the string and return it as the next value in the array.
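Putting the pieces together, a Python sketch of array parsing might look like this. The helpers repeat the earlier sketches so the block runs on its own, with the same assumptions: `None` means "no further values", `error` is a hypothetical diagnostic field, and a value ends at a non-blank line indented no deeper than the enclosing string. `parse_array` is a hypothetical convenience loop, not part of YAY.

```python
from dataclasses import dataclass
from typing import List, Optional

NEWLINE, SPACE, HYPHEN = 0x0A, 0x20, 0x2D


@dataclass
class YayString:
    data: bytes
    indentation: int = 0
    line_length: int = 0
    error: Optional[str] = None  # hypothetical: why parsing stopped, if it failed


def peek(s: YayString) -> Optional[int]:
    return s.data[0] if s.data else None


def consume_byte(s: YayString) -> Optional[int]:
    if not s.data:
        return None  # EOF
    byte = s.data[0]
    s.line_length = 0 if byte == NEWLINE else s.line_length + 1
    s.data = s.data[1:]
    return byte


def parse_next_inner_value(s: YayString) -> Optional[YayString]:
    if not s.data:
        return YayString(b"")
    if peek(s) == NEWLINE:  # value begins on a new line
        consume_byte(s)
        while peek(s) == SPACE:
            consume_byte(s)
        if s.line_length <= s.indentation:
            return YayString(b"")
    elif peek(s) == SPACE:  # value begins on the same line
        consume_byte(s)
    else:
        s.error = "the '-' or ':' must be separated from the following value by a space or a newline"
        return None
    value = YayString(s.data, indentation=s.line_length, line_length=s.line_length)
    while True:
        if not s.data:
            return value
        if consume_byte(s) != NEWLINE:
            continue
        while peek(s) == SPACE:
            consume_byte(s)
        if peek(s) == NEWLINE:
            continue  # blank line
        if s.line_length <= s.indentation or not s.data:
            remaining = len(s.data) + s.line_length + 1
            value.data = value.data[: len(value.data) - remaining]
            return value


def parse_next_array_value(s: YayString) -> Optional[YayString]:
    if not s.data:
        return None  # no further values in the array
    if s.line_length != s.indentation:
        s.error = "incorrect indentation"
        return None
    if peek(s) != HYPHEN:
        s.error = "missing '-' to start array element"
        return None
    consume_byte(s)
    return parse_next_inner_value(s)


def parse_array(s: YayString) -> List[YayString]:
    # Hypothetical convenience loop: run the procedure until it signals the end.
    values = []
    while True:
        value = parse_next_array_value(s)
        if value is None:
            return values
        values.append(value)
```

Each returned value is itself a `YayString`, so calling `parse_array` on one of them parses a nested array.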

To parse the next object value in a string, perform these steps:

  1. If the string's array of bytes is empty, signal that there are no further values in the object and skip the rest of these steps.
  2. If the string's current line length is not equal to its indentation, this is an "incorrect indentation" error -- signal that there are no further values in the object and skip the rest of these steps.
  3. Set name to a copy of the string. Set name length to zero.
  4. Repeat the following steps in a loop:
    1. Consume a byte from the string. If the byte is equal to 0x3A (an ASCII colon), perform the following steps:
      1. Truncate name's array of bytes to its first name length bytes.
      2. Parse the next inner value in the string. Return it as the next value in the object, with name as its associated name, and skip the rest of these steps.
    2. If the string's current line length is less than or equal to the string's indentation, this is a "newlines not allowed in object name" error -- signal that there are no further values in the object and skip the rest of these steps.
    3. Add one to name length.
    4. If the string's array of bytes is empty, this is a "missing ':' to end field name" error -- signal that there are no further values in the object and skip the rest of these steps.
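A matching Python sketch for objects follows, again repeating the helpers so the block is self-contained and making the same assumptions (`None` for "no further values", a hypothetical `error` field, values ending at lines indented no deeper than the enclosing string). Names are returned as raw `bytes`; `parse_object` is a hypothetical convenience loop.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NEWLINE, SPACE, COLON = 0x0A, 0x20, 0x3A


@dataclass
class YayString:
    data: bytes
    indentation: int = 0
    line_length: int = 0
    error: Optional[str] = None  # hypothetical: why parsing stopped, if it failed


def peek(s: YayString) -> Optional[int]:
    return s.data[0] if s.data else None


def consume_byte(s: YayString) -> Optional[int]:
    if not s.data:
        return None  # EOF
    byte = s.data[0]
    s.line_length = 0 if byte == NEWLINE else s.line_length + 1
    s.data = s.data[1:]
    return byte


def parse_next_inner_value(s: YayString) -> Optional[YayString]:
    if not s.data:
        return YayString(b"")
    if peek(s) == NEWLINE:  # value begins on a new line
        consume_byte(s)
        while peek(s) == SPACE:
            consume_byte(s)
        if s.line_length <= s.indentation:
            return YayString(b"")
    elif peek(s) == SPACE:  # value begins on the same line
        consume_byte(s)
    else:
        s.error = "the '-' or ':' must be separated from the following value by a space or a newline"
        return None
    value = YayString(s.data, indentation=s.line_length, line_length=s.line_length)
    while True:
        if not s.data:
            return value
        if consume_byte(s) != NEWLINE:
            continue
        while peek(s) == SPACE:
            consume_byte(s)
        if peek(s) == NEWLINE:
            continue  # blank line
        if s.line_length <= s.indentation or not s.data:
            remaining = len(s.data) + s.line_length + 1
            value.data = value.data[: len(value.data) - remaining]
            return value


def parse_next_object_value(s: YayString) -> Optional[Tuple[bytes, YayString]]:
    if not s.data:
        return None  # no further values in the object
    if s.line_length != s.indentation:
        s.error = "incorrect indentation"
        return None
    name = s.data  # a copy of the string's bytes, truncated once ':' is found
    name_length = 0
    while True:
        if consume_byte(s) == COLON:
            value = parse_next_inner_value(s)
            return None if value is None else (name[:name_length], value)
        if s.line_length <= s.indentation:  # we just consumed a newline
            s.error = "newlines not allowed in object name"
            return None
        name_length += 1
        if not s.data:
            s.error = "missing ':' to end field name"
            return None


def parse_object(s: YayString) -> List[Tuple[bytes, YayString]]:
    # Hypothetical convenience loop: collect (name, value) pairs until the end.
    pairs = []
    while True:
        pair = parse_next_object_value(s)
        if pair is None:
            return pairs
        pairs.append(pair)
```

Nesting works the same way as for arrays: `parse_object(YayString(b"outer:\n a: 1\n b: 2\n"))` returns one pair named `b"outer"`, and running `parse_object` on its value yields the pairs `a` and `b`.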