I was reading something and boom, this lands in my feed. A Polish theoretical physicist just proved you can recreate all of math's elementary functions from JUST one operation.
Every single operation you'd find on a scientific calculator: +, −, ×, ÷, √, trig, log, hyperbolics, inverses, even constants like e, π and i, as you can see below. Extremely mathematically elegant.
My first reaction was, ok cool, another reduction paper, those show up every few years, let me skim it. Then I hit the abstract properly and sat up. The claim wasn't "we reduced 36 operations to 6 primitives", which would be a nice optimisation. The claim was that a single two-input function plus the number 1 is enough to reconstruct every constant, every arithmetic operation, every trig function, every hyperbolic function, every inverse, every radical. Everything. The entire scientific calculator collapses into one button and a seed value.
That sounds like one of those "technically true but useless" results until you realize that continuous mathematics has been looking for exactly this for about a hundred years and nobody had found it.
## the comparison that makes this land
In 1913, Henry Sheffer published a paper showing that a single two-input logical operator, NAND, is enough to build every Boolean function. AND, OR, NOT, XOR, everything. Dump a pile of NAND gates on a table and you can literally construct an entire CPU out of them. This is not a trick, it's how digital hardware actually works. Modern chips are, at the lowest level, seas of identical gates wired into patterns that compute. That one fact is responsible for about half of what makes silicon cheap.
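To feel how far one gate goes, here's a minimal Python sketch (my illustration, not from the paper) building the standard Boolean connectives out of nothing but two-input NAND:

```python
# NAND is functionally complete: every connective below is built
# from the single two-input NAND gate.
def nand(a, b):
    return not (a and b)

def not_(a):    return nand(a, a)
def and_(a, b): return nand(nand(a, b), nand(a, b))
def or_(a, b):  return nand(nand(a, a), nand(b, b))
def xor_(a, b): return nand(nand(a, nand(a, b)), nand(b, nand(a, b)))

# Exhaustive truth-table check against Python's own operators.
for a in (False, True):
    for b in (False, True):
        assert not_(a) == (not a)
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
        assert xor_(a, b) == (a != b)
print("all NAND constructions check out")
```

Scale that up through a few layers of wiring and you have adders, multiplexers, and eventually a CPU.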
Continuous math never had this. A scientific calculator has thirty-plus keys, and the reason is that every new function the designers wanted to support needed its own button and its own internal routine. Logarithms reduced multiplication to addition back when Napier published in 1614. Euler's formula folded trig into complex exponentials in the 1700s. The exp-log representation formalised that further. And then it stopped. For the last couple of centuries the minimal basis for elementary functions has been "a few primitives", never one. It felt like the bottom of the well.
This paper pulls the bucket out of the well. A single binary operator, plus the constant 1, is sufficient. The gap between this result and everything that came before is roughly the gap between "we use the Sheffer stroke because it's convenient" and "wait, it actually reduces to one thing."
## the operator
Here it is in full. No tricks hidden in the corners:

```
eml(x, y) = exp(x) − ln(y)
```

EML stands for Exp-Minus-Log. That's the whole definition.
The reason you need a constant paired with it is that ln(1) = 0, so anywhere you want to drop the log side of the operator, you plug in 1 on the right. That single trick lets you peel the operator apart into its components whenever you need them. For example:

- eml(x, 1) = exp(x) − ln(1) = e^x
- eml(1, 1) = e − 0 = e
- ln(z) = eml(1, eml(eml(1, z), 1))

Read that last one carefully, because it's the first time you see what a proper EML expression looks like. The natural logarithm of z takes three nested applications of the same operator and three instances of the constant 1, four leaves in total. It's simple in the Kolmogorov sense but very nested in practice. Most things you'd expect to be short turn out to be long.
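These identities are easy to check numerically. A minimal Python sketch, using the paper's definition eml(x, y) = exp(x) − ln(y) on the principal branch:

```python
import cmath
import math

def eml(x, y):
    """eml(x, y) = exp(x) - ln(y), principal branch over the complexes."""
    return cmath.exp(x) - cmath.log(y)

# Plugging 1 into the log side kills it, since ln(1) = 0:
x = 2.0
assert abs(eml(x, 1) - math.exp(x)) < 1e-12   # eml(x, 1) = e^x
assert abs(eml(1, 1) - math.e) < 1e-12        # eml(1, 1) = e

# ln(z) = eml(1, eml(eml(1, z), 1)): three nested emls, three 1s.
z = 2.0
assert abs(eml(1, eml(eml(1, z), 1)) - math.log(z)) < 1e-12
print("identities hold")
```

Working through the nested case by hand: eml(1, z) = e − ln z, exponentiating that via eml(·, 1) gives exp(e − ln z), and the outer eml(1, ·) subtracts its log from e, leaving ln z.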
Before we get into how anyone found this in the first place, play with it.
### EML analyzer — one tree, every operation
Below is the full x + y tree. Every simpler operation is a literal subgraph of it. Pick an operation from the dropdown and watch the active path light up. The numeric value at every node is live, so you can see subtree results bubble up.
- For ln x: three nested emls. Inner = eml(1, x) = e − ln(x). Middle = eml(inner, 1) = exp(inner).
- Outer = eml(1, middle) = e − ln(exp(inner)) = e − (e − ln(x)) = ln(x).
- The extra e term that appears in the middle cancels itself at the outer layer.
Try ln z with z = 2.718. The EML tree should spit back 1.0, because ln(e) = 1, and you can watch each sub-node compute its partial value. Then try e^x at x = 2 and confirm you get e² ≈ 7.389. The point of the interactive is that you can feel how an elementary function, which you normally think of as a named primitive, is actually just a specific arrangement of a single repeating element. The "function" stops being a label and starts being a tree shape.
## how on earth do you find an operator like this
This is the part that broke my brain a little. You can't just stare at exp and ln and derive their difference by inspection. The paper is very honest about this. It was found by systematic ablation, starting from a list of 36 primitives that together form a standard scientific calculator, which is Table 1 in the paper: 8 constants, 20 unary functions, and 8 binary operations. Elements get iteratively removed from that list, and each remaining set is checked to see if it can still reconstruct the original 36.
The verification itself is where the clever bit lives. Direct symbolic verification is computationally intractable. Kolmogorov-style complexity for these formulas tends to sit around 7 to 9 in Reverse Polish Notation length, which sounds small until you realise the search space at depth 9 is already brutal. So instead of symbolic manipulation, the paper uses a hybrid numeric bootstrap. Free variables x and y get substituted with algebraically independent transcendental constants, specifically the Euler-Mascheroni constant γ ≈ 0.577216 and the Glaisher-Kinkelin constant A ≈ 1.28243, the target expression is evaluated numerically, and then compared against a massive sieve of candidate formulas built from the current operator set.
The reason this works is Schanuel's conjecture. Under Schanuel, a coincidental equality between two algebraically independent transcendentals evaluated at the same point is vanishingly unlikely. So if a candidate formula gives you a numeric match to double-precision on γ and A, it's almost certainly an actual equivalence and not a fluke. This turns formula discovery into a high-precision lookup problem. The double-precision result goes through inverse symbolic calculator software, which returns a candidate closed form, and then you verify it symbolically.
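The bootstrap is easy to sketch. This is my toy version of the idea, not the paper's actual pipeline: evaluate candidate and target at γ and A, and treat a double-precision match at both points as evidence of a genuine identity.

```python
import cmath

GAMMA = 0.5772156649015329   # Euler-Mascheroni constant
A     = 1.2824271291006226   # Glaisher-Kinkelin constant

def eml(x, y):
    return cmath.exp(x) - cmath.log(y)

def numerically_equal(f, g, tol=1e-12):
    """Compare two one-variable expressions at algebraically independent
    transcendental points.  Under Schanuel's conjecture, agreement at both
    points to double precision is overwhelming evidence of a genuine
    identity rather than a numeric coincidence."""
    return all(abs(f(t) - g(t)) < tol for t in (GAMMA, A))

# Candidate formula vs target: eml(1, eml(eml(1, x), 1)) vs ln(x)
candidate = lambda x: eml(1, eml(eml(1, x), 1))
target    = lambda x: cmath.log(x)
print(numerically_equal(candidate, target))   # → True
```

In the real pipeline the match then goes through inverse symbolic lookup and a final symbolic verification; this sketch only shows the filtering step that makes the search tractable.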
That entire pipeline was originally implemented in Mathematica as a package called VerifyBaseSet. Then GPT Codex 5.3 was used to translate it into Rust, which made it three orders of magnitude faster. Mathematica took hours, Rust takes seconds. That's what made it feasible to run the ablation exhaustively on every element of the 36-primitive list and track how the calculator shrinks.
## the reduction chain
Here is the actual shrink, in order. I'm keeping the paper's naming.
| Name | Constants | Unary | Binary | Total |
|---|---|---|---|---|
| Base 36 | 8 | 20 | 8 | 36 |
| Wolfram | π, e, i | ln | +, ×, ^ | 7 |
| Calc 3 | none | exp, ln, −x, 1/x | + | 6 |
| Calc 2 | none | exp, ln | − | 4 |
| Calc 1 | e or π | none | x^y, log_x y | 4 |
| Calc 0 | none | exp | log_x y | 3 |
| EML | 1 | none | eml(x, y) | 3 |
Each row is a complete scientific calculator on its own. Calc 3 sits near the Wolfram Mathematica primitive set, which has been the optimised minimalist basis for about forty years. Calc 2 is stricter: negation and reciprocal fall out, but you need subtraction as the non-commutative backbone, because without a non-commutative operation you lose the ability to invert. Calc 1 is a radically different attack, a top-down approach built from binary exponentiation and binary logarithm, which needs either e or π as its seed constant. Calc 0 absorbs e into the exp function itself, leaving three primitives total.
Calc 0 was the configuration that strongly suggested a single binary operator could exist. Once you have a system with three primitives and one of them is a unary function, it's natural to ask whether you can absorb the unary into the binary. A month of searching later, eml fell out.
EML is also not unique. It has at least two close cousins.
EDL swaps subtraction for division and trades the constant 1 for e. The minus-eml variant flips operand order and requires the terminal symbol to be −∞, which is weird but valid in extended reals.
The paper then speculates, with empty table rows, that a two-primitive or five-primitive system might exist waiting to be discovered. A ternary operator T(x, y, z) = exp(x) · ln(z) / (ln(x) · exp(y)) already has the property that T(x, x, x) = 1, which would let it act as its own constant. That's a separate paper in preparation for Acta Physica Polonica B.
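The diagonal property of that ternary candidate is trivial to check. A quick sketch (my illustration, avoiding x = 1 where ln(x) = 0 makes T undefined):

```python
import cmath

def T(x, y, z):
    """Speculative ternary operator from the paper's outlook:
    T(x, y, z) = exp(x) * ln(z) / (ln(x) * exp(y))."""
    return cmath.exp(x) * cmath.log(z) / (cmath.log(x) * cmath.exp(y))

# On the diagonal, numerator and denominator are the same product,
# so T can serve as its own constant 1.
for v in (0.3, 2.0, 5.0, 1.5 + 0.5j):
    assert abs(T(v, v, v) - 1) < 1e-12
print("T(x, x, x) = 1 on all test points")
```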
## the complexity tax
This is the part you can't ignore. Reducing to one operator is elegant, but the cost is that simple things become giant. The paper includes a beautiful table showing how many RPN instructions each primitive needs when compiled to pure EML form.
| Symbol | EML compiler | Direct search |
|---|---|---|
| e | 3 | 3 |
| 1 | 1 | 1 |
| 0 | 7 | 7 |
| −1 | 17 | 15 |
| 2 | 27 | 19 |
| √2 | 165 | >47 |
| i | 131 | >55 |
| π | 193 | >53 |
| e^x | 3 | 3 |
| ln x | 7 | 7 |
| x × y | 41 | 17 |
| x^y | 49 | 25 |
| √x | 139 | 43 |
Computing π in pure EML needs 193 instructions. Not 193 nested evaluations, 193 nodes in the expression tree. That is absurd by any reasonable standard. But it's also deeply interesting, because it tells you something about how much of the "complexity" of elementary functions we have been hiding inside named primitives. We built our scientific calculator to look simple by giving individual names to the expensive things. Strip the names away and the real cost becomes visible. Digital hardware is fine with this, incidentally: a modern CPU has billions of transistors, so 193 of anything is a rounding error.
The uniform structure is what the paper is really selling. Because every EML expression is a binary tree of identical nodes, you get a trivial context-free grammar.
```
S → 1 | eml(S, S)
```
That's the whole language. And that grammar is isomorphic to full binary trees, which means EML expressions are in one-to-one correspondence with Catalan structures. You get a clean combinatorial object to search over. You can enumerate EML trees by leaf count and exhaustively check every small tree. You can compile any elementary function into one. You can even build an analog circuit where each identical component is a physical eml gate, with the same status as a NAND gate in digital hardware or an op-amp in analog electronics. The paper actually draws the schematic symbol.
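The enumeration is a few lines of code. A sketch (my illustration) that generates every EML tree with a given leaf count and confirms the counts are Catalan numbers:

```python
from math import comb

def eml_trees(leaves):
    """Enumerate every EML expression with the given number of leaves,
    following the grammar S -> 1 | eml(S, S).  Each tree is rendered
    as a string with '1' at every leaf."""
    if leaves == 1:
        return ["1"]
    out = []
    for k in range(1, leaves):          # split leaves between subtrees
        for left in eml_trees(k):
            for right in eml_trees(leaves - k):
                out.append(f"eml({left}, {right})")
    return out

# Full binary trees with n leaves are counted by the Catalan number C(n-1).
for n in range(1, 8):
    catalan = comb(2 * (n - 1), n - 1) // n
    assert len(eml_trees(n)) == catalan

print([len(eml_trees(n)) for n in range(1, 8)])   # → [1, 1, 2, 5, 14, 42, 132]
```

Swap the `"1"` leaf for a choice between `1` and a variable and you get the search space the ablation pipeline actually sieves.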
## the part that surprised me
Then the paper does something I wasn't expecting. It treats the EML tree as a differentiable neural network and tries to recover closed-form formulas from numerical data using gradient descent.
The idea is simple once you see it. Build a full binary EML tree of depth n. At every leaf, instead of hardcoding a 1 or a variable, put a soft mixture of the three possible leaf values, the constant 1, the variable x, and a previously recovered sub-expression f, weighted by trainable parameters.

leafᵢ = αᵢ · 1 + βᵢ · x + γᵢ · f
The α, β, γ at each leaf are logits pushed through a softmax, which means the tree is parameterised by roughly 5·2ⁿ − 6 continuous weights. A level-2 tree has 14 parameters; a level-3 tree has 34, which simplex reparameterisation reduces to 20. Standard Adam optimiser, PyTorch, complex128 to handle the internal complex arithmetic, clamping to deal with exp overflow, and then a hardening phase that snaps each softmax output to a vertex of the simplex, meaning each leaf commits to being either a 1, an x, or the previous sub-expression.
The result is that for simple target functions, the network recovers the exact symbolic formula. Fit ln(x) using a depth-3 tree, reduce 34 free parameters to 20 via simplex reparameterisation, run plain NMinimize on a black-box basis, and at the end the snapped weights give exactly ln(x). Not approximately. The mean squared error at convergence is on the order of 10^−32, which is machine epsilon squared. When you round the softmax weights to the nearest vertex, you get the actual closed-form formula, and it generalises perfectly outside the training range.
The blind-recovery numbers across 1000+ runs are also honest:
- Depth 2: 100% blind recovery from random initialisation
- Depth 3 to 4: about 25% blind recovery
- Depth 5: below 1% blind recovery
- Depth 6: 0 out of 448 attempts
So it's not magic. Finding the correct EML tree from scratch gets harder fast. But there's an important follow-up experiment. When you take a correct EML tree, perturb its weights with Gaussian noise, and re-optimise, the optimiser converges back to the correct solution in 100% of runs, even at depth 5 and 6. Which means the basins of attraction around correct EML formulas are wide. The hard part is getting close enough to one of them in the first place.
This is the piece that feels different from the rest of the paper. It's saying that because EML expressions are a uniform, differentiable, complete family, you can do symbolic regression by gradient descent on a fixed architecture. No heterogeneous grammars, no handcrafted operator sets, no tree-search bookkeeping. Just an EML tree, a softmax over leaves, and an optimiser. When the generating law is elementary, the weights snap to exact symbolic values. The paper calls this a form of interpretability that conventional neural architectures cannot provide: if your training succeeds, the network is literally a closed-form formula.
## the deep technical dive
If you're the kind of person who wants receipts, here they are.
Three lines of Mathematica reproduce the discovery:
```mathematica
Import["SymbolicRegression.m"]
EML[x_, y_] := Exp[x] - Log[y]
VerifyBaseSet[{1}, {}, {EML}]
```
That's it. Run this and the package re-generates all 36 elementary operations from Table 1 of the paper within about an hour, depending on your machine. The Rust re-implementation, rust_verify, does it in seconds and also handles arbitrary-precision checks across multiple real transcendentals instead of just two.
The level-2 master formula:
```
F(x) = eml(
    α₁ + β₁·x + γ₁·eml(α₃ + β₃·x, α₄ + β₄·x),
    α₂ + β₂·x + γ₂·eml(α₅ + β₅·x, α₆ + β₆·x)
)
```
Fourteen parameters in total: the two outer leaves carry (α, β, γ) each, the four inner leaves carry only (α, β). Set α₁ = 0, β₁ = 1, γ₁ = 0 on the left child and α₂ = 1, β₂ = γ₂ = 0 on the right, and you recover exp(x). Set α₁ = α₂ = 1 with everything else zero, and you recover e. Set γ₁ = 1, α₂ = 1, β₃ = 1, α₄ = 1 with everything else zero, and you recover exp(exp(x)). The point is that every reachable function at depth ≤ 2 is a specific vertex of a 14-dimensional simplex. Training the EML tree is equivalent to searching that simplex with gradient descent.
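Those vertex assignments are mechanical to verify. A Python sketch (mine), with unused inner emls skipped so the ln side never sees a zero argument:

```python
import cmath
import math

def eml(x, y):
    return cmath.exp(x) - cmath.log(y)

def F(x, p):
    """Level-2 master formula, 14 parameters:
    p = (a1,b1,g1, a2,b2,g2, a3,b3, a4,b4, a5,b5, a6,b6).
    Inner emls are evaluated only when their gamma is nonzero,
    so an unused subtree never triggers ln(0)."""
    a1, b1, g1, a2, b2, g2, a3, b3, a4, b4, a5, b5, a6, b6 = p
    left  = a1 + b1 * x + (g1 * eml(a3 + b3 * x, a4 + b4 * x) if g1 else 0)
    right = a2 + b2 * x + (g2 * eml(a5 + b5 * x, a6 + b6 * x) if g2 else 0)
    return eml(left, right)

def vertex(**kw):
    """All 14 parameters zero except the ones named."""
    names = "a1 b1 g1 a2 b2 g2 a3 b3 a4 b4 a5 b5 a6 b6".split()
    return [kw.get(n, 0.0) for n in names]

x = 0.7
# left leaf = x, right leaf = 1  ->  eml(x, 1) = exp(x)
assert abs(F(x, vertex(b1=1, a2=1)) - math.exp(x)) < 1e-12
# both leaves = 1  ->  eml(1, 1) = e
assert abs(F(x, vertex(a1=1, a2=1)) - math.e) < 1e-12
# left leaf = eml(x, 1) = exp(x), right leaf = 1  ->  exp(exp(x))
assert abs(F(x, vertex(g1=1, b3=1, a4=1, a2=1)) - math.exp(math.exp(x))) < 1e-12
print("all three vertices recover the claimed functions")
```

Replace `vertex` with softmax-weighted mixtures and you have the trainable version; the hardening phase is exactly the step that walks a trained point back to one of these vertices.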
Internals are complex. EML operates over ℂ using the principal branch. You need complex arithmetic internally to generate things like i and π, because π shows up as −1 × i × ln(−1), which means you have to evaluate ln(−1) = iπ. So every EML evaluator, even for real outputs, runs on complex128 under the hood. Real-valued code using <math.h> mostly works but has a subtle i-sign flip on ln(z) for z < 0 that you have to fix with a principal-branch redefinition. The paper also points out that Python and Julia trap on signed zeros, NumPy and PyTorch handle it with overflow signals, and Lean 4's total function convention makes ln(0) return a junk value that breaks formalisation. These are edge cases, not showstoppers, but they exist.
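The complex-branch behaviour is easy to see from Python's cmath, which uses the principal branch:

```python
import cmath
import math

# pi emerges from complex intermediates: ln(-1) = i*pi on the principal
# branch, so  -1 * i * ln(-1) = -1 * i * (i*pi) = pi.
assert abs(cmath.log(-1) - cmath.pi * 1j) < 1e-15
pi_candidate = -1 * 1j * cmath.log(-1)
assert abs(pi_candidate.real - math.pi) < 1e-15
assert abs(pi_candidate.imag) < 1e-15

# A real-only evaluator hits a wall at the very same step:
try:
    math.log(-1)
except ValueError:
    print("math.log(-1) raises; you need the complex plane")
```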
Reproducibility. Code lives at github.com/VA00/SymbolicRegressionPackage, archival snapshot at Zenodo DOI 10.5281/zenodo.19183008.
Caveats the paper is careful about. The existence of an EML-type operator working purely on the real domain seems impossible. Attempts using pairs of trigonometric or hyperbolic functions found nothing. Complex intermediates are not a bug, they appear to be necessary. Also, the identity function x takes leaf count 9 in its shortest non-trivial EML form (though the compiler can return it directly as leaf count 1). The shortest non-trivial EML expression of anything is, somewhat hilariously, longer than you'd guess.
Still open. Whether a truly univariate Sheffer-like function exists, one where even the constant falls out of the operator itself, is open. The ternary operator T(x, y, z) = exp(x) · ln(z) / (ln(x) · exp(y)) with T(x, x, x) = 1 is the current best candidate and is the subject of a follow-up paper. Proving general impossibility is non-trivial because of weird examples like B(x, y) = x − y/2 where B(x, x) = x/2 but B(B(x, x), x) = 0, which the paper uses to motivate why you can't hand-wave the proof.
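The B example is worth checking by hand, or in a few lines of Python:

```python
def B(x, y):
    # Pathological example from the paper: B(x, x) looks like it might
    # pin down a usable constant, but iterating collapses everything to 0.
    return x - y / 2

for x in (3.0, -1.5, 10.0):
    assert B(x, x) == x / 2        # one application halves x
    assert B(B(x, x), x) == 0      # two applications annihilate it
print("no fixed nonzero constant falls out of B")
```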
## what this actually changes
You could argue that EML is just a curiosity. It doesn't let you compute anything new, it just re-shuffles what we already compute. Fair. But I think the framing matters more than people might initially notice.
First, elementary functions now have a canonical normal form. Every closed-form expression you can write with sin, cos, exp, ln, and friends has a unique compiled EML tree, which is a unique element of a context-free grammar. That gives you a clean object to hash, search, and deduplicate. Symbolic regression systems have wanted this kind of uniform structure for years.
Second, this is a genuine interpretability handle for machine learning. The paper quietly makes the point that every standard neural network is, in principle, a special case of an EML tree architecture, because ReLU and similar activations are themselves elementary functions. That means an EML-based symbolic-regression network isn't a toy replacement for a transformer, it's a version of a transformer where the weights, when training succeeds, are exact formulas. The circuit is legible as a closed-form expression. That is not something you can currently claim about any large language model, and it's a form of interpretability that no conventional deep learning architecture provides.
Third, the analog-computing angle is intriguing. If you can fabricate a physical eml gate the way we fabricate NAND gates, you get a completely uniform substrate for evaluating elementary functions. No lookup tables, no special circuits for different functions, just a sea of identical elements wired differently. Whether anyone actually builds this is a different question, but the paper provides the theoretical ground floor.
## my honest take
I didn't expect a 2026 paper to close a gap that has been open since Sheffer. I especially didn't expect the proof to come out of a systematic ablation search rather than a deep structural argument. The vibe of the paper is "nobody actually tried this exhaustively and the search space was tractable with a Rust rewrite", and I really like that. It's a reminder that foundational discoveries sometimes happen because someone with the right tools decides to actually run the exhaustive search.
The complexity tax is real. No one is going to compute π as a 193-node EML tree in production. But production was never the point. The point is that elementary functions, for a hundred years treated as an irreducible handful of things, are actually members of a single connected family under one generator. That's a structural fact about mathematics, not a performance optimisation. The paper is clean, the code is reproducible, and the harder variants are left on the table for the next person to chase.
The conclusion ends with, in my paraphrase, "the EML operator may be the tip of an iceberg". I believe it. If you want to read the original and check my numbers, it's on arXiv at 2603.21852v2.
~ Ashish Kumar Verma 🫡