Note that the division-by-zero example used in this article is not the best example to demonstrate "Parse, Don't Validate," because it relies on encapsulation. The principle of "Parse, Don't Validate" is best embodied by functions that transform untrusted data into some data type which is correct by construction.
Alexis King, the author of the original "Parse, Don't Validate" article, also published a follow-up, "Names are not type safety" [0], clarifying that the "newtype" pattern (such as hiding a nonzero integer in a wrapper type) provides weaker guarantees than correctness by construction. Her original "Parse, Don't Validate" article also includes the following caveat:
> Use abstract datatypes to make validators “look like” parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to “fake” a parser from a validator.
So, an abstract data type that protects its inner data is really a "validator" that tries to resemble a "parser" in cases where the type system itself cannot encode the invariant.
The article's second example, the non-empty Vec, is a better one because it encodes in the type system the invariant that at least one element must exist. The crux of Alexis King's article is that programs should be structured so that functions return data types designed to be correct by construction, akin to a parser transforming less-structured data into more-structured data.
[0] https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
Even the newtype-based "parse, don't validate" is tremendously useful in practice, though. The big thing is that if you have a bare string, you don't know "where it's been". It doesn't carry with it information whether it's already been validated. Even if a newtype can't provide you full correctness by construction, it's vastly easier to be convinced of the validity of an encapsulated value compared to a naked one.
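As a rough Rust sketch of that point (the Username type and its rules here are made up, not from the article): once a value has been through the smart constructor, its type records that the check already happened.

```
pub struct Username(String);

impl Username {
    /// The only way to obtain a Username: the check happens here, once.
    pub fn parse(raw: &str) -> Result<Username, String> {
        let trimmed = raw.trim();
        if trimmed.is_empty() || trimmed.len() > 32 {
            return Err(format!("invalid username: {raw:?}"));
        }
        Ok(Username(trimmed.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// Downstream code asks for &Username rather than &str, so an unchecked
// string can never reach it by accident.
fn greet(user: &Username) -> String {
    format!("hello, {}", user.as_str())
}
```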
For full-on parse-don't-validate, you essentially need a dependent type system. As a more light-weight partial solution, Rust has been prototyping pattern types, which are types constrained by patterns. For instance a range-restricted integer type could be simply spelled `i8 is 0..100`, or a nonempty slice as `[T] is [_, ..]`. Such a feature would certainly make correctness-by-construction easier in many cases.
The non-empty list implemented as a (T, Vec<T>) is, btw, a nice example of the clash between practicality and theoretical purity. It can't offer you a slice (consecutive view) of its elements without storing the first element twice (which requires immutability and that T: Clone, unlike normal Vec<T>), which makes it fairly useless as a vector. It's okay if you consider it just an abstract list with a more restricted interface.
It's also useful to wrap/tag IDs in structured types. That makes it easier to avoid errors when there are multiple type parameters such as in the Microsoft graph API.
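For instance, a minimal sketch (the ID names are invented for illustration): swapping the two arguments below fails to compile instead of silently querying the wrong resource.

```
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct UserId(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct GroupId(String);

// Passing a UserId where a GroupId is expected is now a type error.
fn add_member(group: &GroupId, user: &UserId) {
    println!("adding {:?} to {:?}", user, group);
}
```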
Coming from Haskell, I loved Agda 2 as a dependent type language. Is there any newer or more mainstream language that has added dependent types?
TypeScript has something that can be used as dependent types, but it wasn't intended as a language feature, so the syntax is not as ergonomic as Agda's: https://www.hacklewayne.com/dependent-types-in-typescript-se...
That whole blog is one big and fascinating rabbit hole into type theory! Thanks for the link!
Idris is slightly more mainstream I would say, but not wildly so. If you like the Haskell interop then I'd recommend staying with Agda.
Scala 3 is much more mainstream and has path-dependent types. I've only used Scala 2, and there the boilerplate for dependent types was frustrating imo, but I've heard it's better in 3.
Ah yes Idris rings a bell. I’ll try that one again.
Scala 3 is indeed more mainstream, but it also seems to be on the way out. At least here in the corporate world it is being replaced by Kotlin and Java 21+ for a large part.
I agree, 'correct by construction' is the ultimate goal here. Using types like NonZeroU32 is a great simple example, but the real power comes when you design your entire domain logic so that the compiler acts as your gatekeeper. It shifts the mental load from run-time debugging to design-time thinking.
You can also search for "make invalid states impossible/unrepresentable" [0] to find more info on related practices. See "Domain Modeling Made Functional" as a nice example.
[0] https://geeklaunch.io/blog/make-invalid-states-unrepresentab...
The phrasing that I hear more often is "make illegal states unrepresentable"; both the submitted article and Alexis King's original article use this phrase. At least according to https://fsharpforfunandprofit.com/posts/designing-with-types..., it originates from Yaron Minsky (a programmer at Jane Street who is prominent in the OCaml community).
EDIT: Parent comment was edited to amend the "impossible/unrepresentable" wording
Yes, sorry. I wanted to add some resources so the comment wouldn't be too vague, and found the better phrasing along the way.
Yes division is a poor example. It's a poor separation of concerns to try to wrap at this level without usage context. To see the point try to wrap overflows on arithmetic functions.
> To see the point try to wrap overflows on arithmetic functions.
The invariant is less obvious, but you could still do this if you really wanted. The observation is that addition of two fixed-size nonnegative integers cannot overflow unless at least one is greater than half the range. So your `non_overflowing_addition` function would need to take two inputs of type `NonNegativeLessThanHalfMaxValue`, where the constructor for the latter type enforces the invariant. Multiplication is similar, but with the square root of the range (and I suppose `NonNegativeLessThanSqrtMaxValue` could be a subtype of `NonNegativeLessThanHalfMaxValue` if you want to be fancy).
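A minimal Rust sketch of that idea, with a hypothetical type name (nothing like this exists in std):

```
#[derive(Clone, Copy)]
struct NonNegativeLessThanHalfMax(u32);

impl NonNegativeLessThanHalfMax {
    fn new(value: u32) -> Option<Self> {
        // Accept only values up to half the range, so any two of them fit in a u32.
        if value <= u32::MAX / 2 { Some(Self(value)) } else { None }
    }
}

// Both inputs were checked at construction time, so this sum can never overflow.
fn non_overflowing_addition(
    a: NonNegativeLessThanHalfMax,
    b: NonNegativeLessThanHalfMax,
) -> u32 {
    a.0 + b.0
}
```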
Note that addition also won't overflow if one addend is greater than half the range, but the other addend is still small enough (e.g. for the range -128 to 127, adding 65 + 12 will not overflow, even though 65 is greater than half of 127).
Your intention in having the restricted domain "NonNegativeLessThanHalfMaxValue" is probably so that both addends have the same domain. If you go down that route, perhaps you'll also want the closure property, meaning that the sum should also belong to the same set. However, then you have to deal with the overflow problem all over again...
The real point is that when adding two N-bit integers, the result needs N+1 bits, because the "carry bit" is also part of the output. I think this is a scenario where "Parse, Don't Validate" can't easily help, because the validity of the addition is intrinsically a function of both inputs together.
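In Rust that widening version is trivial to sketch, since the sum of two 32-bit integers always fits in 64 bits:

```
// No runtime check needed: i32 + i32 always fits in i64.
fn add_exact(a: i32, b: i32) -> i64 {
    a as i64 + b as i64
}
```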
Recent and related: Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=46960392 - Feb 2026 (172 comments)
also:
Parse, Don’t Validate – Some C Safety Tips - https://news.ycombinator.com/item?id=44507405 - July 2025 (73 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=41031585 - July 2024 (102 comments)
Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)
Parsix: Parse Don't Validate - https://news.ycombinator.com/item?id=27166162 - May 2021 (107 comments)
Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)
Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)
(p.s. these links are just to satisfy extra-curious readers - no criticism is intended! I add this because people sometimes assume otherwise)
Parsing over validation, and newtypes for everything, fall over when you don’t know the full range of possibilities that can occur in the wild.
It is a handy way to prevent divide by zero as in the article, or to have fun with lambda calculus by asking the type system if 3 + 4 == 8. You can reason about the full range of inputs. Same for file format parsing - making as many failure modes as possible fail as early as possible!
But be VERY wary of using them to represent business logic or state machines that allow only the transitions you believe can exist at this point in time. You just don’t know what wild things people will want to do in business logic, and if your software can’t handle those scenarios, people will just work around it and your stuff no longer matches reality.
You can go even further with this in other languages, with things like dependent typing - which can assert (among other interesting properties) that, for example, something like
get_elem_at_index(array, index)
cannot ever have an index outside the bounds of the array, but checked statically at compile time - and this is the key - without knowing a priori what the length of the array is. "In Idris, a length-indexed vector is Vect n a (the length n is in the type), and a valid index into a vector of length n is Fin n ('a natural number strictly less than n')."
Similar tricks work with division that might result in inf/-inf, preventing it from typechecking, and there are more subtle implications in e.g. higher-order types and functions.
Rust has some libraries that can do dependent typing too, based on macros. For example: https://youtube.com/watch?v=JtYyhXs4t6w
Which refers to https://docs.rs/anodized/latest/anodized/
Very cool and practical, but specs aren't dependent typing. (I actually think specs are probably more useful than dependent types for most people)
Dependent typing requires:
- Generic types that can take runtime values as parameters, e.g. [u8; user_input]
- Functions where the type of one parameter depends on the runtime value of another parameter, e.g. fn f(len: usize, data: [u8; len])
- Structs/tuples where the type of one field depends on the runtime value of another field, e.g. struct Vec { len: usize, data: [u8; self.len] }
Sadly I'm not sure Rust will ever get those sorts of features.
They've gone the Haskell route of adding a billion features to a non-dependent type system instead.
Not that I blame them, nobody has figured out practical dependent typing yet. (Idris is making good progress in the field though)
How does that work? If the length of the array is read from stdin for example, it would be impossible to know it at compile time. Presumably this is limited somehow?
If you check that the value is inside the range, and execute some different code if it's not, then congratulations: within the branch where the check succeeded, you now know at compile time that the number you read from stdin is in the right range.
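In Rust terms, a minimal sketch (the function is hypothetical, but NonZeroU32 is from std): the runtime check happens once, and the refined type carries the result of that check into the branch.

```
use std::num::NonZeroU32;

fn read_divisor(line: &str) -> Option<NonZeroU32> {
    let raw: u32 = line.trim().parse().ok()?;
    // The zero check happens exactly once; every later user of the
    // NonZeroU32 can rely on it without re-checking.
    NonZeroU32::new(raw)
}
```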
One option is dependent pairs, where one value of the pair (in this example) would be the length of the array and the other value is a type which depends on that same value (such as Vector n T instead of List T).
Type-Driven Development with Idris[1] is a great introduction for dependently typed languages and covers methods such as these if you're interested (and Edwin Brady is a great teacher).
[1] https://www.manning.com/books/type-driven-development-with-i...
Not sure about Idris, but in Lean `Fin n` is a struct that contains a value `i` and a proof that `i < n`. You can read in the value `n` from stdin and then you can do `if h : i < n` to have a compile-time proof `h` that you can use to construct a `Fin n` instance.
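A minimal Lean 4 sketch of that shape (the function name is made up):

```
def toFin (n i : Nat) : Option (Fin n) :=
  if h : i < n then
    some ⟨i, h⟩  -- h : i < n is the proof that the Fin n value carries
  else
    none
```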
I've heard this can be a bit of a pain in practice, is that true? I can imagine it could slow me down to construct a proof of an invariant every time I call a function (if I understand correctly).
I haven't worked significantly with Lean, but I'm toying with my own dependently typed language. Generally you only have to construct a proof once, much like a type or function, and then you reuse it. Also, depending on the language, there are rewriting semantics ("elaboration") that let you do mathematical transformations to make two statements equivalent and then reuse the standardized proof.
It doesn’t have to be a compile time constant. An alternative is to prove that when you are calling the function the index is always less than the size of the vector (a dynamic constraint). You may be able to assert this by having a separate function on the vector that returns a constrained value (eg. n < v.len()).
If the length is read from outside the program it's an IO operation, not a static variable, but there are generally runtime checks in addition to the type system. Usually you solve this as in the article, with a constructor that checks it - so you'd have something like "Invalid option: length = 5 must be within 0-4" when you tried to create the Fin n from the passed in value
Imagine you read a value from stdin and parse it as:
Maybe Int
So your program splits into two branches:
1. Nothing branch: you failed to obtain an Int.
There is no integer to use as an index, so you can’t even attempt a safe lookup into something like Vect n a.
2. Just i branch: you do have an Int called i.
But an Int is not automatically a valid index for Vect n a, because vectors are indexed by Fin n (a proof carrying “bounded natural”).
So inside the Just i branch, you refine further:
3. Try to turn the runtime integer i into a value of type Fin n.
There are two typical shapes of this step:
* Checked conversion returning Maybe (Fin n)
If the integer is in range, you get Just (fin : Fin n). Otherwise Nothing.
* Checked conversion returning evidence (proof) that it's in range
For example: produce k : Nat plus a proof like k < n (or LTE (S k) n), and then you can construct Fin n from that evidence.
(But it's basically the same: you end up with a Maybe (LTE …).)
Now if you also have a vector: xs : Vect n a
… the n in Fin n and the n in Vect n a are the same n (that’s what “unifies” means here: the types line up), so you can do: index fin xs : a
And crucially:
there is no branch in which you can call index without having constructed the Fin n first, so out-of-bounds access is unrepresentable (it’s not “checked later”, it’s “cannot be expressed”).
And within _that_ branch of the program, you have a proof of Fin n.
Said differently: you don’t get “compile-time knowledge of i”; you get a compile-time guarantee that whatever value you ended up with satisfies a predicate.
Concretely: you run a runtime check i < n. _ONCE_
If it fails, you’re in a branch where you do not have Fin n.
If it succeeds, you construct fin : Fin n at runtime (it’s a value, you can’t get around that), but its type encodes the invariant “in bounds”/check was done somewhere in this branch.
I wish dependent types were more common :(
The alternative is one type, with many functions that can operate on that type.
Like how clojure basically uses maps everywhere and the whole standard library allows you to manipulate them in various ways.
The main problem with the many-type approach is that you end up with several nearly identical, subtly different types, all incompatible with each other.
Yeah, there's something of a tension between the Perlis quote "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures" and Parse, don't validate.
The way I've thought about it, though, is that it's possible to design a program well either by encoding your important invariants in your types or in your functions (especially simple functions). In dynamically typed languages like Clojure, my experience is that there's a set of design practices that have a lot of the same effects as "Parse, Don't Validate" without statically enforced types. And, ultimately, it's a question of mindset which style you prefer.
There's probably a case for both. Core logic might benefit from hard types deep in the bowels of an unchanging engine.
The real world often changes though, and more often than not the code has to adapt, regardless of how elegantly our systems are designed.
Coalton ( https://coalton-lang.github.io ) is the sort of thing I like: a Haskell-style language hosted inside a very dynamic one with good interop.
Yes it's quite the blend!
I don't really get why this is getting flagged. I've found this to be true, but it's more of a trade-off than a pure benefit. It also is sort of beside the point: you always need to parse inputs from external, usually untrusted, sources.
Agree with this. Mismatched types are generally an indicator of an underlying issue with the code, not the language itself. These are areas where AI can be helpful in flagging potential problems.
This sounds like the "stringly typed language" mockery of some languages. How is it actually different?
It's not an alternative.
Start with a more dynamic type, do stuff that doesn't care about the shape, parse into a more precise type, do stuff that relies on the additional invariants, drop back into the more dynamic type again.
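A rough illustration of that flow using serde_json (assuming the serde and serde_json crates; the Config type is invented):

```
use serde::{Deserialize, Serialize};
use serde_json::Value;

#[derive(Serialize, Deserialize)]
struct Config {
    name: String,
    retries: u32,
}

fn tweak(raw: Value) -> serde_json::Result<Value> {
    // Parse the dynamic value into the precise type...
    let mut config: Config = serde_json::from_value(raw)?;
    // ...do the work that relies on its invariants...
    config.retries += 1;
    // ...and drop back into the dynamic representation.
    serde_json::to_value(config)
}
```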
I find a balance is important. You can do nominal typing in a structural type system with branding, and you can kinda do structural typing in a nominal type system, but it's not as ergonomic. But you should probably end up doing a mix of both.
There are more than two alternatives, since functions can operate on more than one type.
This reminds me a bit of a recent publication by Stroustrup about using concepts... in C++ to validate integer conversions automatically where necessary.
https://www.stroustrup.com/Concept-based-GP.pdf
{
Number<unsigned int> ii = 0;
Number<char> cc = '0';
ii = 2; // OK
ii = -2; // throws
cc = i; // OK if i is within cc’s range
cc = -17; // OK if char is signed; otherwise throws
cc = 1234; // throws if a char is 8 bits
}

The `try_roots` example here is actually a _counterexample_ to the author's main argument. They explicitly ignore the "negative discriminant" case. What happens if we consider it?
If we take their "parse" approach, then the types of the arguments a, b, and c have to somehow encode the constraint `b^2 - 4ac >= 0`. This would be a total mess--I can't think of any clean way to do this in Rust. It makes _much_ more sense to simply return an Option and do the validation within the function.
In general, I think validation is often the best way to solve the problem. The only counterexample, which the author fixates on in the post, is when one particular value is constrained in a clean, statically verifiable way. Most of the time, validation is used to check (possibly complex) interactions between multiple values, and "parsing" isn't at all convenient.
I was thinking a similar thing when reading the article. Often, the validity of the inputs depends on the interaction between some of them.
Sure, we can follow the advice of creating types that represent only valid states, but then we end up with `fn(a: A, b: B, c: C)` transformed into `fn(abc: ValidABC)`.
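A hedged sketch of what that combined type could look like for the quadratic case (RealRootedQuadratic is invented, not from the article): the cross-field invariant is checked in one place, and everything downstream relies on it.

```
struct RealRootedQuadratic {
    a: f64,
    b: f64,
    c: f64,
}

impl RealRootedQuadratic {
    /// The one place where the interaction between a, b and c is validated.
    fn new(a: f64, b: f64, c: f64) -> Option<Self> {
        if a != 0.0 && b * b - 4.0 * a * c >= 0.0 {
            Some(Self { a, b, c })
        } else {
            None
        }
    }

    /// Safe by construction: the discriminant is non-negative and a is non-zero.
    fn roots(&self) -> (f64, f64) {
        let d = (self.b * self.b - 4.0 * self.a * self.c).sqrt();
        ((-self.b + d) / (2.0 * self.a), (-self.b - d) / (2.0 * self.a))
    }
}
```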
This pattern maps beautifully to API design too. Instead of validating a raw JSON request body and hoping you checked everything, you parse it into a well-typed struct upfront. Every downstream function then gets guaranteed-valid data without redundant checks. The Rust ecosystem makes this almost painless with serde + custom deserializers. I've seen codebases cut their error-handling surface area by 60% just by moving from validate-then-use to parse-at-the-boundary.
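A minimal sketch of that boundary with serde (the Email rule and the SignupRequest shape are invented; assumes serde with the derive feature):

```
use serde::Deserialize;

#[derive(Deserialize)]
#[serde(try_from = "String")]
struct Email(String);

impl TryFrom<String> for Email {
    type Error = String;
    fn try_from(s: String) -> Result<Self, Self::Error> {
        // A deliberately naive check, just to show where validation lives.
        if s.contains('@') {
            Ok(Email(s))
        } else {
            Err(format!("not an email address: {s}"))
        }
    }
}

#[derive(Deserialize)]
struct SignupRequest {
    // By the time deserialization succeeds, this is already a valid Email.
    email: Email,
}
```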
I try to do this in Go as well, but the amount of pointers used to represent optional fields (more generally the lack of algebraic types) is making it less effective/ more verbose than I'd like
This exact philosophy is why I started treating UI design systems like compilers.
Instead of validating visual outputs after the fact (like linting CSS or manual design reviews), you parse the constraints upfront. If a layout component is strictly typed to only accept discrete grid multiples, an arbitrary 13px margin becomes a compile error, not a subjective design debate. It forces determinism.
curious: what kind of tooling would you use here?
C# gets close to this with records + pattern matching, F# discriminated unions are even better for this with algebraic data types built right in. A Result<'T,'Error> makes invalid states unrepresentable without any ceremony. C# records/matching works for now, but native DUs will make it even nicer.
The article quickly mentions implementing addition:
```
impl Add for NonZeroF32 { ... }
impl Add<f32> for NonZeroF32 { ... }
impl Add<NonZeroF32> for f32 { ... }
```
What type would it return though?
Would have to be F32, no? I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
> Would have to be F32, no?
Generally yes. `NonZeroU32::saturating_add(self, other: u32)` is able to return `NonZeroU32` though! ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... )
> I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
`NonZeroU32::checked_add(self, other: u32)` basically does this, although I'll note it returns an `Option` instead of a `Result` ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... ), leaving you to `.map_err(...)` or otherwise handle the edge case to your heart's content. Niche, but occasionally what you want.
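A small usage sketch of those two std methods:

```
use std::num::NonZeroU32;

fn main() {
    let a = NonZeroU32::new(7).unwrap();
    // checked_add: None on overflow, Some(NonZeroU32) otherwise.
    assert_eq!(a.checked_add(3).map(NonZeroU32::get), Some(10));
    // saturating_add can stay NonZeroU32: an unsigned add can never reach zero.
    assert_eq!(a.saturating_add(u32::MAX).get(), u32::MAX);
}
```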
> `NonZeroU32::saturating_add(self, other: u32)` is able to return `NonZeroU32` though!
I was confused at first how that could work, but then I realized that of course, with _unsigned_ integers this works fine because you cannot add a negative number...
You'd still have to check for overflow, I imagine.
And there are other gotchas, for instance it seems natural to assume that NonZeroF32 * NonZeroF32 can return a NonZeroF32, but 1e-25 * 1e-25 = 0 because of underflow.
I imagine it would be something like Option<NonZeroF32>, since -2.0 + 2.0 would violate the constraints at runtime. This gets us the Option handling problem back.
I think the article would have been better with NonZeroPositiveF32 as the example type, since then addition would be safe.
Every time you introduce a type for a "value invariant" you lose compatibility and force others to make cumbersome type conversions.
To me, invalid values are best expressed with optional error returns alongside the value, as part of the function signature. Types are best used to only encode information about the hierarchy of structures composed of primitive types. They help define and navigate the representation of composite things as opposed to just having dynamic nested maps of arbitrary strings.
> They help define and navigate the representation of composite things as opposed to just having dynamic nested maps of arbitrary strings.
What would you say to someone who thinks that nested maps of arbitrary strings have maximum compatibility, and using types forces others to make cumbersome type conversions?
If the fields of a structure or the string keys of an untyped map don't match then you don't have compatibility either way. The same is not true for restricting the set of valid values.
edit: To put it differently: To possibly be compatible with the nested "Circle" map, you need to know it is supposed to have a "Radius" key that is supposed to be a float. Type definitions just make this explicit. But just because your "Radius" can't be 0, you shouldn't make it incompatible with everything else operating on floats in general.
The examples in question propagate complexity throughout related code. I think this is a case I see frequently in Rust of using too many abstractions, and its associated complexities.
I would just (as a default; the situation varies)... validate prior to the division and handle as appropriate.
The analogous situation I encounter frequently is indexing, e.g. checking whether the index is out of bounds. Similar idea: check; print or display an error, then fail that computation without crashing the program. Usually it's an indication of some bug, which can be tracked down. Or, if it's an array that is frequently indexed, use a `get` method (canonical in Rust's core library) on whatever struct owns the array. It returns an Option.
I do think either the article's approach, or validating is better than runtime crashes! There are many patterns in programming. Using Types in this way is something I see a lot of in OSS rust, but it is not my cup of tea. Not heinous in this case, but I think not worth it.
This is the key to this article's philosophy, near the bottom:
> I love creating more types. Five million types for everyone please.
Dividing a float by zero is usually perfectly valid. It has predictable outputs, and for some algorithms like collision detection this property is used to remove branches.
The problem is that "division" in the context of computation refers to multiple different operations, so that we all end up talking past each other. Some people want a `division_where_zero_is_an_illegal_denominator`, some people want `division_but_return_zero_if_denominator_is_zero`, and I'm sure other people want weirder things. Honestly, I'm coming around to the idea that for any precise and reliable system, the fact that we only have a single "operator" (+ - * / etc) that we arbitrarily assign to one of the many possible operations that we want to do does more harm than good. Consider how many different ways there are just to add a number (panic on overflow, error value on overflow, saturate on overflow, wrap on overflow, return tuple with optional carry flag on overflow...).
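For instance, Rust's std already spells out four of those additions explicitly (plain `+` being the panic-in-debug-builds one):

```
fn main() {
    let (a, b): (u8, u8) = (200, 100);
    assert_eq!(a.checked_add(b), None);           // error value on overflow
    assert_eq!(a.saturating_add(b), u8::MAX);     // saturate on overflow
    assert_eq!(a.wrapping_add(b), 44);            // wrap on overflow
    assert_eq!(a.overflowing_add(b), (44, true)); // wrapped value plus carry flag
}
```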
I think “has predictable outputs” is less valuable than “has expected outputs” for most workloads. Dividing by zero almost always reflects an unintended state, so proceeding with the operation means compounding the error state.
(This isn’t to say it’s always wrong, but that having it be an error state by default seems very reasonable to me.)
btw the “quoth” crate makes it really, really easy to implement scannerless parsing in Rust for arbitrary syntax; I use it in many of my projects
Interesting looking crate. You don't seem to have any examples at all though so I wouldn't say it makes it easy!
>"Parse, Don't Validate,"
Ah, that pretentious little blog post that was taken up by many as gospel, for some mysterious reason...
This tells me that any idea, even a stupid (or, in this case, obvious) one, can go mainstream if you rhyme it right.
That blog post is like a "modern art" painting by some famous artist that totally looks like a child has flung some paint at a wall, but every person who looks at it tries so hard to find some meaning that everyone ends up finding "something".
In fact, the top comment right now expresses an interpretation of the blog post that differs from the author's...
Amusing!
I only half understand this stuff, but all this encapsulation of values so that they are guaranteed to remain valid across manipulations... isn't it called Object Oriented Programming?
Was the original blog post wrong?
were you validating during parsing before?