Wow, quite surprising results. I have been working on a personal project with the Astral stack (uv, ruff, ty) using extremely strict lint/type-checking settings; you could call it an experiment in setting up a Python codebase to work well with AI. I was not aware that ty's gaps were significant. I just tried with Zuban and pyright. Both catch a half dozen issues that ty is ignoring. Zuban has one false positive and one false negative; pyright is 100% correct.
Looks like I will be converting to pyright. No disrespect to the astral team, I think they have been pretty careful to note that ty is still in early days. I'm sure I will return to it at some point - uv and ruff are excellent.
This is the way. For now it's 100% pyright for me too. I can recommend turning on reportMatchNotExhaustive if you're into Python's match statements but would love the exhaustiveness checking you get in Rust. Eric Traut has done a marvellous job working on pyright, what a legend!
But don't get me wrong, I made an entry in my calendar to remind me of checking out ty in half a year. I'm quite optimistic they will get there.
For big codebases pyright can be pretty slow and memory hungry. Even though ty is still a WIP, I'm adopting it at work because of how fast it is and some other goodies (e.g. https://docs.astral.sh/ty/features/type-system/#intersection...)
Say what you will about Microsoft, but their programming language people consistently seem to make very solid decisions.
Microsoft started as a programming language company (MS-BASIC) and they never stopped delivering serious quality software there. VB (classic), for all its flaws, was an amazing RAD dev product. .NET, especially since the move to open-source, is a great platform to work with. C# and TS are very well-designed languages.
Though they still haven't managed to produce a UI toolkit that is both reliable, fast, and easy to use.
I assume this is pretty rare, but ty sometimes finds real issues that are actually allowed by the spec, like:
    def foo(a: float) -> str:
        return a.hex()

    foo(False)
is correct according to PEP 484 (when an argument is annotated as having type float, an argument of type int is acceptable) but this will lead to a runtime error.
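You can confirm the runtime failure directly (bool is a subclass of int, and int has no .hex method):

```python
def foo(a: float) -> str:
    return a.hex()

try:
    foo(False)  # accepted under PEP 484's int-to-float promotion
except AttributeError:
    # AttributeError: bool has no 'hex' attribute
    print("runtime error, as predicted")
```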
mypy sees no type error here, but ty does.
The article is a nice write-up of https://htmlpreview.github.io/?https://github.com/python/typ...
(glad they include ty now)
I've been using ty on some previously untyped codebases at work. It does a good job of being fast and easy to use while catching many issues without being overly draconian.
My teammates who were writing untyped Python previously don't seem to mind it. It's a good addition to the ecosystem!
And it makes it infinitely easier for them to get with the times and start typing their code!
I am worried about the false negatives/positive rate however. Hope it improves.
My understanding is Astral's focus for ty has been on making a good experience for common issues; they plan for very high compliance eventually, but difficult or rare edge cases aren't prioritized yet.
Compliance suite numbers are biased towards edge cases and not the common path because that's where a lot of the tests need to be added.
My advice is to see how each type checker runs against your own codebase and whether the output/performance is something you are happy with.
> My understanding is Astral's focus for ty has been on making a good experience for common issues; they plan for very high compliance eventually, but difficult or rare edge cases aren't prioritized yet.
I would say that's true in terms of prioritization (there's a lot to do!), but not in terms of the final user experience that we are aiming for. We're not planning on punting on anything in the conformance suite, for instance.
My favorite typing-related stuff is typeguard (https://pypi.org/project/typeguard/), with its pytest plugin
Just an FYI, for people looking at the low pass rates for mypy and ty and concluding they must not be very useful. These test suites are checking many odd corners of the typing spec.
For "normal" Python code, I find mypy does pretty good. Certainly I find it helpful, especially on a large code base and when working with other developers of various experience levels.
The reason I prefer pyrefly over mypy is mostly speed. Better accuracy is nice, but speed is the killer feature. Given the quality of uv and ruff and the experience of the team working on ty, I'm quite confident it's going to be great in that respect as well.
> people looking at the low pass rates for mypy and ty and concluding they must not be very useful
Yeah, that would be the wrong takeaway from this blog. The point of the blog was to add context to what the conformance results mean and clarify their limitations, since I saw quite a few people sharing links to the tracker online w/o context.
I don't know, I've seen some truly simple things check out as OK in mypy.
Example
    def foo(bar: bool) -> bool:
        if bar:
            m = True
        return m

No error that m is defined conditionally? What's going on?
ty and zuban also don't give an error. pyright and pyrefly do.
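Running it confirms the issue is real: the False path reaches an unbound local.

```python
def foo(bar: bool) -> bool:
    if bar:
        m = True
    return m  # m is never assigned when bar is False

foo(True)  # fine
try:
    foo(False)
except UnboundLocalError:
    # m is local (it's assigned in the function) but unbound on this path
    print("m referenced before assignment")
```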
I've used mypy forever and never even tried these others. Looking at them though it looks like it's worth trying out Zuban or Pyright? Is there a noticeable benefit when switching between different checkers?
If you care about correctness, unless you pick pyright, don't bother at the moment. If you're creating a new project and looking for a promise for better faster typing, then pick one of Zuban, Pyrefly, or ty.
Speed, especially in larger codebases.
Mypy still best for Django
As a long time Django user that wants to start using typing, can you elaborate on why mypy is still the way to go?
There's a really nice typing plug-in for mypy that's been around a long time: https://github.com/typeddjango/django-stubs
It is very disappointing that these new type checkers don't support plug-ins, so things like django-stubs aren't possible: you're stuck with whatever ships with the checker. Some of the newer type checkers promise support for Django, but you're limited to what they (will) have on offer, and you'll likely want typing for other libs you use as well.
Pyrefly's Django support is documented here: https://pyrefly.org/en/docs/django/
I believe Zuban also has some form of Django support, but I'm unable to locate the docs
Are there any good static (i.e. not runtime) type checkers for arrays and tensors? E.g. "16x64x256 fp16" in numpy, pytorch, jax, cupy, or whatever framework. Would be pretty useful for ML work.
We're working on statically checking Jaxtyping annotations in Pyrefly, but it's incomplete and not ready to use yet :)
This would be an insta-switch feature for me! Jaxtyping is a great idea, but the runtime-only aspect kills it for me - I just resort to shape assertions + comments, but it's a pretty poor solution.
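A minimal sketch of that shape-assertion fallback (the helper and the tiny stand-in tensor class are hypothetical, just to keep the example dependency-free; real code would pass numpy/torch/jax arrays, which all expose .shape):

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    """Stand-in for anything exposing a .shape tuple."""
    shape: tuple

def assert_shape(arr, expected: tuple) -> None:
    # Hypothetical helper: fail loudly when a shape drifts from the comment.
    if tuple(arr.shape) != tuple(expected):
        raise ValueError(f"expected shape {tuple(expected)}, got {tuple(arr.shape)}")

x = FakeTensor(shape=(16, 64, 256))
assert_shape(x, (16, 64, 256))  # passes silently
```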
A follow-up question: Google's old `tensor_annotations` library (RIP) could statically analyse operations - eg. `reduce_sum(Tensor[Time, Batch], axis=0) -> Tensor[Batch]`. I guess that wouldn't come with static analysis for jaxtyping?
Check out optype (specifically the optype.numpy namespace). If you use scipy, scipy-stubs is compatible and the developer of both is very active and responsive. There's also a new standalone stubs library for numpy called numtype, but it's still in alpha.
Jaxtyping is the best option currently - despite the name it also works for Torch and other libs. That said, I think it still leaves a lot to be desired. It's runtime-only, so unless you wire it into a typechecker it's only a hint. And, for me, the hints aren't parsed by Intellisense, so you don't see shape hints when calling a function - only when directly reading the function definition.
Personally, I also think the syntax is a little verbose: for a generic shape hint you need something like `Shaped[Array, "m n"]`. But 95% of the time I only really care about the shape "m n". It doesn't sound like much, but I recently tried hinting a codebase with jaxtyping and gave up because it was adding so much visual clutter, without clear benefits.
There have been some early proposals to add something like that, but none of them have made it very far yet. As you might imagine, it's a hard problem!
- /?hnlog pycontract icontract https://westurner.github.io/hnlog/ :
From https://news.ycombinator.com/item?id=14246095 (2017) :
> PyContracts supports runtime type-checking and value constraints/assertions (as @contract decorators, annotations, and docstrings).
> Unfortunately, there's yet no unifying syntax between PyContracts and the newer python type annotations which MyPy checks at compile-time.
Or beartype.
Pycontracts has: https://andreacensi.github.io/contracts/ :
    @contract
    def my_function(a: 'int,>0', b: 'list[N],N>0') -> 'list[N]':
        ...

    @contract(image='array[HxWx3](uint8),H>10,W>10')
    def recolor(image):
        ...
For icontract, there's icontract-hypothesis. Parquery/icontract: https://github.com/Parquery/icontract :
> There exist a couple of contract libraries. However, at the time of this writing (September 2018), they all required the programmer either to learn a new syntax (PyContracts) or to write redundant condition descriptions ( e.g., contracts, covenant, deal, dpcontracts, pyadbc and pcd).
    @icontract.require(lambda x: x > 3, "x must not be small")
    def some_func(x: int, y: int = 5) -> None:
        ...
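For intuition, here's a toy re-implementation of the precondition idea, dependency-free (this is not icontract's actual internals, just a sketch of the mechanism):

```python
import functools
import inspect

def require(predicate, description="precondition violated"):
    """Toy precondition decorator in the spirit of icontract.require."""
    def decorator(func):
        func_sig = inspect.signature(func)
        pred_params = set(inspect.signature(predicate).parameters)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = func_sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Pass the predicate only the arguments it names.
            relevant = {k: v for k, v in bound.arguments.items() if k in pred_params}
            if not predicate(**relevant):
                raise ValueError(description)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@require(lambda x: x > 3, "x must not be small")
def some_func(x: int, y: int = 5) -> None:
    pass
```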
icontract with numpy array types:

    @icontract.require(lambda arr: isinstance(arr, np.ndarray))
    @icontract.require(lambda arr: arr.shape == (3, 3))
    @icontract.require(lambda arr: np.all(arr >= 0), "All elements must be non-negative")
    def process_matrix(arr: np.ndarray):
        return np.sum(arr)

    invalid_matrix = np.array([[1, -2, 3], [4, 5, 6], [7, 8, 9]])
    process_matrix(invalid_matrix)
    # Raises icontract.ViolationError

Parquery/icontract: https://github.com/Parquery/icontract
mristin/icontract-hypothesis: https://github.com/mristin/icontract-hypothesis :
> The result is a powerful combination that allows you to automatically test your code. Instead of writing manually the Hypothesis search strategies for a function, icontract-hypothesis infers them based on the function's precondition. This makes automatic testing as effortless as it goes.
pschanely/CrossHair: An analysis tool for Python that blurs the line between testing and type systems https://github.com/pschanely/CrossHair :
> If you have a function with type annotations and add a contract in a supported syntax, CrossHair will attempt to find counterexamples for you: [gif]
> CrossHair works by repeatedly calling your functions with symbolic inputs. It uses an SMT solver (a kind of theorem prover) to explore viable execution paths and find counterexamples for you
How does Zuban manage to be developed by what appears to be a single person without megacorp backing, yet be mere inches behind pyright at this stage?
Using VSCodium I was having issues with Python type checkers for quite a while. I did the basedpyright thing for a while but that was painful. It's a bit too based for me, and I'm not sure I'd call it based. Right now I have uv, ruff, and ty and I'm happy with it. It's super easy to update and super fast. I didn't realize the coverage wasn't as good as some others, but I still like it. I may have to try pyrefly. Never heard of it until this post, so thank you.
This is great and I'll try out pyright ASAP on my current codebase. The people who wrote it evidently didn't have any type checking running (despite I think 3+ linters??) so it's a nightmare of
> "well the checker accurately reports it will be type X in an error case not Y"
> "but we never get type X"
> "Then we don't have good enough coverage"
It's so easy in vscode, but it isn't on by default like the c/c++ one I guess because too much legacy code would cause infinite errors. And the age old problem of .pyi files lying about types.
Interesting. This is the first I've heard of Zuban.
The fact that mypy fails so badly matches my experience. It would be interesting to see exactly where pyright "fails". It's been so reliable for me that I wouldn't be 100% surprised if these are deliberate deviations from the spec, in places where the spec is dumb.
I still can't get over the utter idiocy in Python's type hints being decorative. In what world does x: int = "thing" not give someone in the standardisation process pause?
Can you elaborate what you mean by decorative?
If you run a type checker like ty or pyright they're not decorative: you'll get clear diagnostics for that particular example [1], and for any other type errors you might have. You can set up CI so that, e.g., type errors block PRs from being merged, just like any other test failure.
If you mean types not being checked at runtime, the consensus is that most users don't want to pay the cost of the checks every time the program is run. It's more cost-effective to do those checks at development/test/CI time using a type checker, as described above. But if you _do_ want that, you can opt in to that using something like beartype [2].
[1] https://play.ty.dev/905db656-e271-4a3a-b27d-18a4dd45f5da
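To illustrate the runtime side of this: the interpreter records annotations but never enforces them itself.

```python
x: int = "thing"  # runs without complaint: annotations aren't enforced at runtime

# The annotation is merely recorded for tools to inspect.
print(__annotations__["x"])  # <class 'int'>
print(type(x))               # <class 'str'>
```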
Exactly my point. If I need to run 300 external tools for a language feature to be worth a damn, why is it a language feature?
‘uvx ty check’ or ‘uvx pyrefly check’. That’s hardly 300 external tools.
In C-ish languages the statement
int x = "thing"
is perfectly valid. It means: reserve a spot for a 32-bit int and then shove the pointer to the string "thing" into the address of x. It will do the wrong thing and also overflow memory, but you could generate code for it. The type checker is what stops you. It's the same in Python: if you make type checking a build breaker, then the annotations mean something. Types aren't checked at runtime, but C doesn't check them either.
In C, int may be as small as 16 bits. You may get 32 bits (or more) but it's not guaranteed. I don't see how you get a memory overflow though?
I'd be surprised if a compiler with -Wall -Werror accepts to compile this.
Trying to cast back the int to a char* might work if the pointers are the same size as int on the target platform, but it's actually Undefined Behaviour IIRC.
I guess an overflow would be possible if the sizes of a pointer and an int differ.
It's valid in C, due to semantics around pointers. Try that in Java and you'll quickly find that it's not valid in "C-ish languages". C absolutely checks types, it's just weakly typed. Python doesn't check types at all, which I wouldn't have a problem with, if the language didn't have type annotations that sure look like they'll do something.
It won't "overflow memory".
This says there will be an immutable array of six bytes, with the ASCII letters for "thing" in the first five and then the sixth is zero, this array can be coerced to the pointer type char* (a pointer to bytes) and then (though a modern C compiler will tell you this is a terrible idea) coerced to the signed integer type int.
The six byte array will end up in the "read only data" section of the executable, it doesn't "overflow memory" and isn't stored in the x. Even if you gave x a more sensible type "char*" that word "thing" isn't somehow stored in your variable, it's a pointer.
So, this isn't the same at all and you don't understand C as well as you thought you did.
Edited: fix escaping bold markers
I'm fairly certain that the C standard doesn't specify that string literals should be placed into .rodata, just that mutating them is UB.
That's true, the systems where C was created do not have the relevant features, and I would expect they can't even "protect" that text so that although it's UB it would have worked fifty years ago to attempt the mutation whereas today that will segfault on a Unix.
I was talking about the int being 32 bits and the pointer being 64 bits but go off. If you did a naive codegen of this without type checking where the compiler just said "yes ma'am blindly copying the value to &x" then you would clobber adjacent memory. That's the point I'm making, you rely on the type checker to make the types actually mean things and give you safety guarantees.
It feels stronger in languages where you can't even produce a running program if type checking fails, but it's conceptually the same.
Python does have strong types, it's just that it's dynamically typed - the variables don't have assigned types in Python itself (hence type annotations and third party type checking). C claims to have strong types but it is weakly checked and full of unwise coercions - however it is statically typed and so variables have types.
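The runtime strength is easy to demonstrate: incompatible operations fail immediately instead of being silently coerced.

```python
# Python refuses the implicit coercions C would happily perform.
try:
    "3" + 4
except TypeError:
    # TypeError: can only concatenate str (not "int") to str
    print("no implicit str/int coercion")

# An explicit conversion is required instead.
print(int("3") + 4)  # 7
```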
If you want to see a language which does not have types you want the predecessor of C, B.
Imagining into existence a variant of C where assignment causes arbitrary memory overwrites isn't about type checking, that's not a "naive codegen" it's nonsense. If that was your point then you didn't do a good job of communicating it and it's still wrong.
It's a community that delayed progress for a decade while they waited for everyone to put parentheses on the print statement. Give 'em enough time and they'll figure out best practices.
It's the complete opposite. The objective of type hints is that they're optional precisely because type hints narrow the functionality of the language. And evidenced by the fact that different type checks have different heuristics for determining what is a valid typed program and what isn't, it seems that the decision is correct.
No type system will allow for the dynamism that Python supports. It's not a question of how you annotate types, it's about how you resolve types.
Optional on paper, sure. Once you publish shared libs or keep a nontrivial repo usable across teams, type hints stop feeling optional fast, because the minute mypy, pyright, and Pyre disagree on metaprogramming or runtime patching you get three incompatible stories about the same program and a pile of contradictions instead of signal. Python can stay dynamic, yet this setup mostly buys busywork for CI and false confidence for humans.
Nobody is saying they are mandatory, and I'm actually a big fan of gradual typing. My point is that they do nothing.
However, type hints reducing the functionality of the language isn't true either.