Self-describing format

What it is

It is a format that describes how to tokenize a file.

It needs a file header that starts with an element separator.

The part separator, comment opener and closer, and context opener and closer are then defined in that order, each followed by an element separator.

The header ends with the trimmable characters, terminated by a final element separator.

As an example:

ABACADAEAFAGGGA
DATA HERE

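Here, A is the element separator, B is the part separator, C and D are the comment opener and closer, E and F are the context opener and closer, and GGG is the run of trimmable characters. DATA HERE stands for the content to tokenize.

See more examples below.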
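
The header layout described above can be sketched in Python. This is only an illustration, not a reference implementation: the names Header and parse_header are made up for this sketch, and it assumes that no header field contains the element separator and that fields are taken literally (escapes such as \n are not decoded).

```python
from typing import NamedTuple, Tuple


class Header(NamedTuple):
    element_sep: str
    part_sep: str
    comment_open: str
    comment_close: str
    context_open: str
    context_close: str
    trimmable: str


def parse_header(text: str) -> Tuple[Header, str]:
    """Split `text` into its header and the remaining body."""
    if not text:
        raise ValueError("empty input: no element separator found")
    sep = text[0]  # the very first character is the element separator
    # Six fields follow, each terminated by the element separator.
    # Assumption: no field contains the element separator itself.
    pieces = text[1:].split(sep, 6)
    if len(pieces) != 7:
        raise ValueError("incomplete header")
    *fields, body = pieces
    return Header(sep, *fields), body


header, body = parse_header("ABACADAEAFAGGGA\nDATA HERE")
print(header.part_sep, header.trimmable)  # B GGG
print(repr(body))  # '\nDATA HERE'
```

Limiting the split to six cuts keeps any later occurrence of the element separator (such as the A characters in DATA HERE) inside the body.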

What's weird about it

  1. The element separator must be a single character
  2. Space as the element separator makes the header look weird
  3. The header kind of looks weird on its own :/

Why

The goal of this format is to avoid restricting how data is presented to the user.

It also strives to be easy to make compatible with existing formats.

The semantics of the data would still be up to the program to decide.

Why not only one abstraction depth

Why have parts and elements inside a context? Why not only elements and contexts?

The idea behind it is that most formats out there already have two levels of depth.

The CSV and Smalltalk examples below kinda show this. CSV has , for parts and ; or \n for elements. Smalltalk has a space as the delimiter between objects and methods, and has . as the separator between statements.

To access deeper levels, I propose using context openers and closers, kind of how Smalltalk has [...] blocks or C-like languages have {...} blocks.

If I defined only an element separator and contexts, we would basically have s-expressions, which force a lot of context nesting, also known as parens-hell.
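
To make the three levels concrete, here is a rough Python sketch of how elements, parts, and contexts could nest. The function name parse and the shape of the result are my own assumptions for illustration: each element becomes a list of parts, and a context becomes a nested list of elements inside the part list. Comments are left out, and all delimiters are assumed to be non-empty.

```python
def parse(text, elem_sep, part_sep, ctx_open, ctx_close, trim=" \n\t"):
    """Return a list of elements; each element is a list of parts.

    A part is either a trimmed string or, for a context, a nested list
    of elements parsed the same way. Assumes non-empty delimiters.
    """

    def parse_elements(pos, inside_context):
        elements, parts, buf = [], [], []

        def end_part():
            word = "".join(buf).strip(trim)
            buf.clear()
            if word:  # empty parts are dropped in this sketch
                parts.append(word)

        def end_element():
            end_part()
            if parts:  # empty elements are dropped as well
                elements.append(parts.copy())
                parts.clear()

        while pos < len(text):
            if text.startswith(ctx_open, pos):
                end_part()
                nested, pos = parse_elements(pos + len(ctx_open), True)
                parts.append(nested)
            elif text.startswith(ctx_close, pos):
                if not inside_context:
                    raise ValueError("unmatched context closer")
                end_element()
                return elements, pos + len(ctx_close)
            elif text.startswith(elem_sep, pos):
                end_element()
                pos += len(elem_sep)
            elif text.startswith(part_sep, pos):
                end_part()
                pos += len(part_sep)
            else:
                buf.append(text[pos])
                pos += 1
        if inside_context:
            raise ValueError("unclosed context")
        end_element()
        return elements, pos

    return parse_elements(0, False)[0]


tree = parse("x do [a b. c d]. y z.", ".", " ", "[", "]")
print(tree)
# [['x', 'do', [['a', 'b'], ['c', 'd']]], ['y', 'z']]
```

With only two flat levels plus contexts, the statement x do [...] stays a single flat element, and nesting appears only where a block is actually opened.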

Examples

CSV

As an example, CSV could be parsed with this format.

;,;{{{;}}};{{{{;}}}}; \n\t;
header 1, header 2, header 3;
value 1, value 2  , value 3 ;

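Here we want neither comments nor contexts, so the header uses {{{ ... }}} and {{{{ ... }}}} as delimiters that are not expected to show up in the data.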
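
As a rough sketch of what a program could do with the CSV body above, the following Python splits it into elements and trimmed parts. The function name tokenize is hypothetical, and the sketch assumes that the \n and \t in the header stand for a real newline and tab. Comments and contexts are ignored, since this example does not use them.

```python
def tokenize(body, elem_sep, part_sep, trim):
    """Split `body` into elements, and each element into trimmed parts."""
    rows = []
    for element in body.split(elem_sep):
        parts = [part.strip(trim) for part in element.split(part_sep)]
        if any(parts):  # skip elements that are empty after trimming
            rows.append(parts)
    return rows


body = "header 1, header 2, header 3;\nvalue 1, value 2  , value 3 ;\n"
rows = tokenize(body, elem_sep=";", part_sep=",", trim=" \n\t")
print(rows)
# [['header 1', 'header 2', 'header 3'], ['value 1', 'value 2', 'value 3']]
```

Trimming happens per part, which is why the extra spaces around value 2 and value 3 disappear while the space inside each value is kept.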

A Smalltalk-like language

. ./*.*/.[.]. \n\t.
SMALLTALK HERE

Lisp

 ___ (* *) ( )  \n\t 
LISP HERE