First-Order, Second-Order Expressions, and Delimiters in Languages

This article non-rigorously examines a simple observation about languages (loosely defined as a series of expressions that are evaluated): a fundamental duality between first-order and second-order expressions. All opinions herein are solely our own and do not express the views or opinions of our employer.

There seems to be a pattern in languages (in the broadest sense in the context of this article: a series of symbols that are evaluated or interpreted): a duality between first-order expressions and second-order expressions that have different purposes but paradoxically use the same symbols.

This observation is probably self-evident, but we have not seen it explicitly formulated:

  • A language primarily uses first-order expressions, which are intended to be evaluated. A human reader evaluates a human language to produce a representation of the world. An example of a first-order expression in English is: It is raining outside.

  • A language also needs to represent information used in a capacity different from first-order expressions. This information is represented with second-order expressions. Example: “And what is the use of a book,” thought Alice, “without pictures or conversations?” In this context, the quoted expression has a different purpose: it represents what someone other than the narrator is thinking or saying.

  • First- and second-order expressions use the same symbols (for instance, the English alphabet).

  • First- and second-order expressions are separated with delimiters. Only the delimiters identify whether an expression is of the first-order kind or of the second-order kind. In other words, they signal a transition between purposes.

  • A language evaluator (e.g., someone reading a sentence) needs the delimiters to switch between first-order and second-order purposes.
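The five points above can be sketched as a toy evaluator. This is only an illustration under stated assumptions: the function name split_by_order and the labels "first" and "second" are hypothetical, and the sketch assumes a single pair of curly quotation marks as delimiters with no nesting. It shows that the reader holds no information about a character's purpose other than the delimiters it has already seen.

```python
def split_by_order(text, open_delim="“", close_delim="”"):
    """Split text into (order, segment) pairs using the delimiters alone.

    Hypothetical helper: every character comes from the same symbol set,
    so only the delimiters tell the reader which purpose is active.
    """
    segments = []
    order = "first"   # the reader starts in first-order (evaluating) mode
    buffer = []
    for ch in text:
        if order == "first" and ch == open_delim:
            # Opening delimiter: switch from first- to second-order.
            if buffer:
                segments.append(("first", "".join(buffer)))
            buffer = []
            order = "second"
        elif order == "second" and ch == close_delim:
            # Closing delimiter: switch back to first-order.
            segments.append(("second", "".join(buffer)))
            buffer = []
            order = "first"
        else:
            buffer.append(ch)
    if buffer:
        # Trailing text keeps whatever mode was active when input ended.
        segments.append((order, "".join(buffer)))
    return segments


for order, segment in split_by_order("She said “It is raining outside” and left."):
    print(order, repr(segment))
```

Removing the two delimiters from the input leaves the same letters but yields a single first-order segment, which is the sense in which the delimiters alone carry the distinction.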

This observation can be applied to many forms of languages. We suggest three of them: natural language, artificial language, and genetic code.

  • Since antiquity (for example, ḏd-mdw in Ancient Egypt), human language has used this mechanism to report what someone other than the narrator says. In English, the delimiters are the quotation marks.

  • Many programming languages use this approach to distinguish between evaluated and represented statements. Examples in Python: print(1 + 1) → 2 versus print('1 + 1') → 1 + 1.

    LISP uses an apostrophe (or QUOTE). Quine programs are characteristic of this approach: they usually duplicate and combine first-order symbols (the code evaluated by the computer) and second-order symbols (the code represented by the computer) to blur the line between them as they output the exact representation of the evaluated code. The delimiters are usually quotation marks, too.

  • When viewing DNA as a language, a similar mechanism appears to be in place. Palindrome-like sequences act as delimiters to identify segments with a different purpose than the rest of the DNA. For instance, when a bacterium is infected by a virus, it can store part of the viral DNA, in segments named “spacers” (second-order expression), in its own DNA (first-order expression). Precisely because they are second-order expressions, these spacers are not evaluated but ultimately (after their transcription) used as signatures by Cas9 to detect known invading viruses. They act as a genetic memory (i.e., they represent information). More generally, CRISPR embodies this duality.

    In the scientific literature, analogies with human language focus mainly on the palindromic features (for example, this parallel with the Sator square). While that analogy holds, we believe a more meaningful one would be to compare these sequences with quotation marks. Because genetic code has a limited set of “symbols,” it cannot express the delimiters with a special base: the palindromic arrangement of the bases constitutes the delimiters. It is as if the English language did not use special characters but palindrome-like sequences instead as delimiters: She told him xyab It is raining outside aayx and therefore . . . (In this example, xyab would represent the switch from first- to second-order expression [opening delimiter] and aayx, the transition from second- to first-order expression [closing delimiter].)
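The programming-language and palindrome-like cases in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a model of LISP or of CRISPR: the names QUINE_SOURCE, run_and_capture, and split_on_letter_delimiters are hypothetical, the quine is one well-known Python form among many, and the delimiters xyab and aayx are taken from the English example above.

```python
import contextlib
import io

# Programming-language case: the quotation marks decide whether the same
# characters are evaluated (first-order) or merely represented (second-order).
expression = "1 + 1"
evaluated = eval(expression)  # the characters are interpreted as code
represented = expression      # the characters are kept as data

# A classic Python quine, stored here as text. Inside it, the quoted string s
# is second-order text that the first-order code formats with itself to
# reproduce the whole program.
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)"


def run_and_capture(source):
    """Execute source code and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# Palindrome-like case: the delimiters are built from ordinary letters
# because no special character is available.
def split_on_letter_delimiters(text, opening="xyab", closing="aayx"):
    """Return (before, quoted, after); quoted is None without delimiters."""
    before, found_open, rest = text.partition(opening)
    if not found_open:
        return text.strip(), None, ""
    quoted, found_close, after = rest.partition(closing)
    if not found_close:
        raise ValueError("opening delimiter without a closing delimiter")
    return before.strip(), quoted.strip(), after.strip()


print(evaluated, "versus", represented)
print(run_and_capture(QUINE_SOURCE) == QUINE_SOURCE + "\n")
print(split_on_letter_delimiters(
    "She told him xyab It is raining outside aayx and therefore"))
```

One consequence the sketch makes visible: because letter-based delimiters share the alphabet of the content, a quoted segment that happened to contain xyab or aayx would be split in the wrong place, whereas a dedicated quotation-mark character is less likely to collide with ordinary text.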

The following table summarizes this idea.

There are certainly more examples, and we notably wonder whether there is a possible connection between artificial intelligence and this duality:

  • Can models infer by themselves the existence of first- and second-order expressions and the role of delimiters in switching between them?

  • Could this duality be leveraged to reinforce models’ security by making them inherently distinguish pure knowledge representation from information evaluation during the training stage?