We will recover arbitrary graph structure by interpreting strings as foreign keys. With that in mind, we now impose some limitations on the JSON we will read that make it more suitable for transmission between programs.
We take some inspiration from the LangSec movement.
# Structure
JSON consists of nested Arrays, Objects and Scalars. We abbreviate these A, O and S, where we might think of S as String but know it could also be Number, True, False or Null. Using regular expression conventions, we would write an expected Array of Scalars as AS*.
Array of Scalars.
AS*
Object of Scalars.
OS*
Array of Objects of Scalars.
A(OS*)*
Object of Scalars and Arrays of Scalars.
O(S*|AS*)*
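One way to make the notation executable is to encode each description as a small Python value and match parsed JSON against it. This is a minimal sketch under our own hypothetical encoding. The string "S" stands for a scalar. A pair of "A" or "O" with a list of alternatives stands for an array or object whose elements or field values must each match one alternative. Nothing here is a standard API.

```python
from typing import Any, Union

# Hypothetical encoding of the notation:
#   "S"           a scalar (string, number, true, false, null)
#   ("A", alts)   an array whose every element matches one of alts, A(...)*
#   ("O", alts)   an object whose every field value matches one of alts, O(...)*
Pattern = Union[str, tuple]

def is_scalar(v: Any) -> bool:
    return v is None or isinstance(v, (str, int, float, bool))

def matches(value: Any, pattern: Pattern) -> bool:
    if pattern == "S":
        return is_scalar(value)
    kind, alts = pattern
    if kind == "A" and isinstance(value, list):
        children = value
    elif kind == "O" and isinstance(value, dict):
        children = value.values()
    else:
        return False
    # Every child must match at least one alternative.
    return all(any(matches(c, alt) for alt in alts) for c in children)

AS = ("A", ["S"])          # AS*
OS = ("O", ["S"])          # OS*
AOS = ("A", [OS])          # A(OS*)*
OSAS = ("O", ["S", AS])    # O(S*|AS*)*
```

For example, `matches({"a": 1, "b": [2, 3]}, OSAS)` accepts the object. `matches({"a": [[1]]}, OSAS)` rejects it, because an array of arrays is not among the alternatives.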
Alternation in our regular descriptions implies that interpreting the JSON will require case analysis, either on object field names or on the types of the elements found.
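The two kinds of case analysis can be sketched for O(S*|AS*)*. `by_reflection` inspects the type of each value it finds. `by_field_name` instead trusts a hypothetical schema, `ARRAY_FIELDS`, that says which names hold arrays. Both normalize every field to a list, so downstream code handles a single case.

```python
from typing import Any

# Interpret a value of shape O(S*|AS*)*. Each field takes one branch of the
# alternation, so the reader must decide which branch it is looking at.

def by_reflection(obj: dict[str, Any]) -> dict[str, list[Any]]:
    """Decide the branch from the type of each value found."""
    result = {}
    for name, value in obj.items():
        if isinstance(value, list):   # AS* branch: keep the array as-is
            result[name] = value
        else:                         # S branch: wrap the scalar in a list
            result[name] = [value]
    return result

# Hypothetical schema: these field names are known to hold AS*.
ARRAY_FIELDS = {"tags", "authors"}

def by_field_name(obj: dict[str, Any]) -> dict[str, list[Any]]:
    """Decide the branch from the field name alone, per the schema above."""
    return {name: (value if name in ARRAY_FIELDS else [value])
            for name, value in obj.items()}
```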
Alternation also permits variable depth in the JSON tree, but not unlimited nesting.
Array of (Scalars or Array of (Scalars or Array of Scalars))
A(S*|A(S*|AS*)*)*
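We consider this limitation on nesting a feature of this approach. It is what makes this notation regular rather than merely context-free. A finite automaton can check a bounded depth, whereas unbounded nesting, like balanced brackets, needs a stack.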
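The following minimal sketch makes the regularity concrete. It flattens a parsed value into a string over the alphabet S, [, ], {, } and then checks A(S*|A(S*|AS*)*)* with Python's ordinary `re` module. The `shape` function and this encoding are our own illustration. Because the depth is bounded, the pattern can be written out in full without any recursion.

```python
import re
from typing import Any

def shape(value: Any) -> str:
    """Flatten a parsed JSON value into a string over the alphabet S [ ] { }."""
    if isinstance(value, list):
        return "[" + "".join(shape(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + "".join(shape(v) for v in value.values()) + "}"
    return "S"

# A(S*|A(S*|AS*)*)* as an ordinary regular expression over that alphabet.
# The nesting is spelled out three levels deep; the pattern is not recursive.
DEPTH3 = re.compile(r"\[(S|\[(S|\[S*\])*\])*\]")

def conforms(value: Any) -> bool:
    return DEPTH3.fullmatch(shape(value)) is not None
```

The check accepts `[1, [2, [3]]]`, whose shape is `[S[S[S]]]`. It rejects `[[[[1]]]]`, which is one level too deep.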
# Limits
Imagine running this simple JSON parsing script overnight.
while true do echo '[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[' done | jq .
A structure validator that knows the maximum depth the grammar permits could reject this input after a few brackets, before the second echo got to run.
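Here is a minimal sketch of such a check, assuming the grammar fixes a maximum depth (3 for A(S*|A(S*|AS*)*)*). It scans the text chunk by chunk, tracks bracket depth outside string literals, and stops the moment the limit is exceeded. The function name `within_depth` is hypothetical, and the function checks only depth, not full JSON syntax.

```python
from typing import Iterable

def within_depth(chunks: Iterable[str], max_depth: int) -> bool:
    """Return False as soon as nesting exceeds max_depth, without reading
    the rest of the input. Brackets inside string literals are ignored.
    Returns True if the input ends without exceeding the limit."""
    depth = 0
    in_string = False
    escaped = False
    for chunk in chunks:
        for ch in chunk:
            if in_string:
                if escaped:
                    escaped = False      # character after a backslash
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "[{":
                depth += 1
                if depth > max_depth:
                    return False         # reject early; stop reading
            elif ch in "]}":
                depth -= 1
    return True
```

Fed an endless supply of `[[[[`, it returns False after the first chunk rather than waiting for the input to end.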
We similarly should limit the number of elements in an array, the number of fields in an object, the number of characters in a string, and the number of digits in a number.
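These four limits could be gathered into one record and checked over a parsed value, as in this sketch. The `Limits` class and its default numbers are hypothetical placeholders, not recommendations. The check runs after parsing, so it protects downstream code but not the parser itself. Guarding the parser requires checks made while the text is still streaming.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Limits:
    # Hypothetical defaults, for illustration only.
    max_items: int = 1000     # elements in an array
    max_fields: int = 100     # fields in an object
    max_string: int = 4096    # characters in a string (values and field names)
    max_digits: int = 20      # digits in a number

def within_limits(value: Any, lim: Limits = Limits()) -> bool:
    if isinstance(value, list):
        return len(value) <= lim.max_items and all(within_limits(v, lim) for v in value)
    if isinstance(value, dict):
        return (len(value) <= lim.max_fields
                and all(len(k) <= lim.max_string for k in value)
                and all(within_limits(v, lim) for v in value.values()))
    if isinstance(value, str):
        return len(value) <= lim.max_string
    if isinstance(value, bool) or value is None:   # bool before int: bool is an int
        return True
    if isinstance(value, int):
        return abs(value) < 10 ** lim.max_digits   # at most max_digits digits
    if isinstance(value, float):
        return sum(c.isdigit() for c in repr(value)) <= lim.max_digits
    return False
```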
Imagine parsing this unlimited string.
```sh
( printf '["'; while true; do printf 'xxxxxxxxxxxxxxxxxx'; done ) | jq .
```
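The same streaming approach bounds string length. This sketch uses the hypothetical name `strings_within`. It counts characters inside each string literal as they arrive and gives up once the count passes the limit, so the endless string is rejected after reading roughly `max_chars` characters. For simplicity it counts escape sequences by their raw characters rather than by the characters they decode to.

```python
from typing import Iterable

def strings_within(chunks: Iterable[str], max_chars: int) -> bool:
    """Return False as soon as any string literal grows past max_chars.
    Escape sequences count as their raw source characters (a simplification)."""
    in_string = False
    escaped = False
    length = 0
    for chunk in chunks:
        for ch in chunk:
            if not in_string:
                if ch == '"':
                    in_string, length = True, 0   # a new string begins
                continue
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False                 # closing quote is not counted
                continue
            length += 1
            if length > max_chars:
                return False                      # reject early; stop reading
    return True
```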
Experiments show that browsers handle one or two megabytes at a time. Wiki caps individual posts at five megabytes, but this is easily exceeded by assembling multiple images on a single page.
Choosing round numbers, I might suggest 64 Unicode characters for field names and 128k characters for strings. A more complete specification might enumerate the permissible field names and define its own string limit for each context.
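A specification that enumerates field names could be sketched as a table mapping each permitted name to its own string limit. The field names and their limits below are hypothetical illustrations. Only the 64-character cap on names comes from the suggestion above, and we read "128k" as 128 * 1024 characters. Python's `len` counts Unicode code points, which matches counting characters.

```python
from typing import Any

# Hypothetical field table: the only permitted field names, each with its own
# string-length limit. These names are illustrations, not part of any spec.
FIELD_LIMITS = {
    "id": 64,
    "title": 1024,
    "text": 128 * 1024,
}
MAX_FIELD_NAME = 64  # characters, as suggested for field names

def check_object(obj: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the object is acceptable."""
    problems = []
    for name, value in obj.items():
        if len(name) > MAX_FIELD_NAME:
            problems.append(f"field name too long: {name[:16]}...")
        elif name not in FIELD_LIMITS:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, str):
            problems.append(f"{name}: expected a string")
        elif len(value) > FIELD_LIMITS[name]:
            problems.append(f"{name}: {len(value)} chars exceeds {FIELD_LIMITS[name]}")
    return problems
```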