A good programming language doesn't make it easy to write correct programs. It makes it hard to write incorrect ones. (Jón Fairbairn)
The same holds for many related things: Libraries, frameworks, type systems, etc.
Resaon: SW is written only once, but read, debugged, extended, and refactored dozens of times.
openFile ∷ FilePath → IOMode → IO Handle
int open(const char *, int, ...);
# 2 long pages of prose:
# blah blah blah blah open blah blah blah file blah blah
# blah mode blah blah blah example blah blah blah blah
# blah note blah blah example blah blah buffering blah
Comments are a sign of our inability to express our thoughts in code. (Robert C. Martin a.k.a. Uncle Bob)
So let's express more things in code! Use types instead of naming conventions or comments, which are incorrect and/or outdated most of the time, anyway.
When people say “but most business logic bugs aren’t type errors,” I just want to show them how to make bugs into type errors. (Matt Parsons)
In other words, avoid code like:
def foo(id: int, what: str, why_not: Any, flag: bool):
# ...
Code smell: Primitive obsession
While most programmers are concerned with getting more of their code to compile, we type-level programmers are trying our best to prevent code from compiling. (Sandy Maguire)
Always remember: Types are there to help you, not to annoy you. When you are “fighting the type checker”, you are actually fighting your very own code.
Simple, but mostly sufficient view:
Much more complicated stuff is possible, e.g.:
Some Python examples:
_HOSTS: dict[str, Any] = {}
def connect(user: UserType | None, # old: Optional[UserType]
force_authuser: UserId | None) -> None:
# ...
Some Haskell examples:
pi ∷ Double
pi = 3.1415926535
answer = (41 ∷ Integer) + 1
magnitude ∷ RealFloat a ⇒ Complex a → a
magnitude c = ...
Some Rust examples:
enum IpAddrKind { V4, V6 }
struct IpAddr { kind: IpAddrKind, address: String }
fn valid(addr: &IpAddr) -> bool { ... }
valid(IpAddr{ kind: IpAddrKind::V4,
address: "127.0.0.1".to_string() });
Some TypeScript examples:
type IpAddrKind = 'V4' | 'V6';
type IpAddr = { kind: IpAddrKind; address: string };
function valid (addr: IpAddr): boolean { ... }
valid({ kind: 'V4', address: '127.0.0.1' })
Slightly frustrating theoretical result:
Rice's Theorem: All non-trivial, semantic properties of programs are undecidable.
But approximations are good enough in practice.
Nevertheless: There are always programs which don't go wrong, but are rejected by a type checker.
The alternatives are worse: Non-terminating checker, lying checker, ...
In Python:
class Bool(Enum):
FALSE = 0; TRUE = 1
Color = Enum('Color', 'RED GREEN BLUE')
BoolOrColor = Bool | Color # not really a new type
AgeOrShoeSize = int | int # not really tagged
# Before Python 3.10, use Union[X, Y] instead of X | Y
In Haskell:
data Bool = False | True
data Color = Red | Green | Blue
data BoolOrColor = MkBool Bool | MkColor Color
data AgeOrShoeSize = Age Int | ShoeSize Int
data Maybe a = Nothing | Just a -- type constructor with args
data Either a b = Left a | Right b -- *the* sum type
foo = bar False (MkBool True) 4711 (Just 42) -- example
In Rust:
enum Bool { Yay, Nay }
enum Color { Red, Green, Blue }
enum BoolOrColor { Bool(Bool), Color(Color) }
enum AgeOrShoeSize { Age(u16), ShoeSize(u16) }
let boc = BoolOrColor::Bool(Bool::Yay);
let aos = AgeOrShoeSize::Age(32);
In TypeScript:
type Bool = 'yay' | 'nay'; // not really a new type
enum Color { Red, Green, Blue }
type BoolOrColor = Bool | Color; // not really a new type
type AgeOrShoeSize = number | number;// not really tagged
let boc: BoolOrColor = 'yay'; // string w/o annotation
boc = Color.Green
const aos: AgeOrShoeSize = 32
A more type-safe variant of AgeOrShoeSize:
class AgeOrShoeSize:
pass
class Age(AgeOrShoeSize):
def __init__(self, a: int):
self.a = a # We can even add validation here...
class ShoeSize(AgeOrShoeSize):
def __init__(self, s: int):
self.s = s # ... and here!
More verbose than int|int, but now we can distinguish 42, Age(42), and ShoeSize(42)!
Scary words for a simple concept:
In Python:
# Not really a new type, just our old (,,) friend
Coord3D = tuple[float, float, float]
# At least the fields have a name now...
NameAndAge = namedtuple('NameAndAge', 'name age')
# Look, mom: Fields have a name *and* a type!
class BoolAndColor(NamedTuple):
b: bool; c: Color
# A full-blown class, could even validate arguments
class Pair: # not the full story, more on this later...
def __init__(self, a, b):
self.a = a; self.b = b
Note: Types created by namedtuple and NamedTuple are subclasses of tuple, leading to some obscurities:
bac = BoolAndColor(True, Color.RED)
b = bac[0] # This one looks easy, but...
x = bac[funny(42)] # ... what type does x have?
for y in bac:
... # Even worse: y's type will change during looping!
Data classes: Mutable NamedTuples with defaults
@dataclass # lots of configuration options possible here
class Person: # not related to tuple, not indexable
age: int; name: str = "Baldrick"
p = Person(age=75, name="Tony") # with kwargs
p.age = 42 # mutable by default
p = Person(11) # positional + default
In Haskell:
data Coord3D = Coord3D Float Float Float
data BoolAndColor = BoolAndColor Bool Color
data Pair a b = Pair a b -- already built-in as (a, b)
-- usage:
f = g (Coord3D 1 2 3) (BoolAndColor True Red) (Pair 1 'a')
In Rust:
struct RGB { r: u8, g: u8, b: u8 }
let color = RGB { r: 11, g: 22, b: 33 };
struct Coord3D(f32, f32, f32); // tuple struct, new type
let point = Coord3D(1.0, 2.0, 3.0); // e.g. point.0 is OK
struct WTF { what: String, why_not: Vec<f64>, flag: bool }
Excursion: Why does Rust have enum and struct?
enum Foo { Bar { x: i32, y: f32 } } // just 1 variant
let b = Foo::Bar { x: 42, y: 3.1 }; // effectively a struct
It's just syntactic sugar for a common case:
struct Bar { x: i32, y: f32 }
let c = Bar { x: 42, y: 3.1 };
Strictly speaking, struct is not necessary, but it's handy and expresses the intention more clearly.
In TypeScript:
type RGB = { r: number; g: number; b: number };
const rgb: RGB = { r: 1.0, g: 2.0, b: 3.0 }
type NameAngAge = [string, number] // tuple type
const lk: NameAngAge = ['Lemmy', 70] // e.g. lk[0] is OK
But something is different in TypeScript...
type RG = { r: number; g: number };
const rg: RG = rgb
More about this later.
If a product type \(A\) consists of types \(B_1, B_2, \ldots\), its cardinality is \(|A| = \prod_{i=1}^n |B_i|\), hence “product” type!
For our previous example: \(|BoolAndColor| = |Bool| \times |Color| = 2 \times 3 = 6\)
We have already seen sum types and product types, what might be missing? sum ⇒ product ⇒ ...?
Repeating addition gives you a product, and repeating multiplication gives you exponentiation.
But what is an exponential type? A very scary name, but you have seen them in action before: Functions!
But what is the connection with exponentiation?
Consider total functions of the type:
def f(b: bool) -> Color:
# ...
How many of those can you write?
You can map False to any of the 3 colors, the same holds for True, so there are \(3^2 = 9\) possible functions.
In general: \(|a \to b| = |b|^{|a|}\), hence “exponential” type!
In general, an algebraic data type is a sum of products.
In Haskell:
data Shape = Circle { center ∷ Coord2D, radius ∷ Float }
| Rectangle Coord2D Coord2D -- without field names
| Polygon { points ∷ [Coord2D] }
| ParametricCurve { f ∷ Float → Coord2D }
In Rust, enum is your swiss army knife:
enum Shape {
Circle { center: Coord2D, radius: f32 },
Rectangle(Coord2D, Coord2D), // without field names
Polygon { points: Vec<Coord2D> },
ParametricCurve { f: fn(f32) -> Coord2D },
}
In TypeScript you use a shared tag field and a union:
type Loading = { state: 'loading' };
type Failure = { state: 'failure'; code: number };
type Success = { state: 'success'; response: string };
type State = Loading | Failure | Success;
In OO languages without union types, you have to revert to subclassing:
class Shape:
pass
class Circle(Shape):
def __init__(self, center: Coord2D, radius: float) -> None:
# ...
How can you implement ADTs in Python?
class Circle: ...
class Rectangle: ...
Shape = Circle | Rectangle # 👍
What = str | None
WhyNot = str | What | list[str] | list[What] # 🙀
Which alternative is "x"? What is ["y"]? Or [] 🤷
Constructing values for algebraic data types is easy: Call one of their constructors. 😲 But how can we get back their parts?
An elegant way is pattern matching, which mirrors the construction of values.
area ∷ Shape → Float
area (Circle _ r) = pi * r * r
area (Rectangle (Coord2D x1 y1) (Coord2D x2 y2)) =
abs (x1 - x2) * abs (y1 - y2)
...
Can be checked for exhaustiveness and redundancies.
In OO languages you can use abstract methods:
class Shape(abc.ABC):
@abc.abstractmethod
def area(self) -> float: ...
class Circle(Shape):
def area(self) -> float:
return math.pi * self.radius * self.radius
class Rectangle(Shape):
def area(self) -> float:
return (abs(self.point1.x - self.point2.x) *
abs(self.point1.y - self.point2.y))
Pro: Keeps e.g. Circle stuff together.
Con: Tears area stuff apart.
Another OO option: Type dispatch by hand
def area(s: Shape) -> float:
# s has type Shape here
if isinstance(s, Circle): # Must be visible to mypy!
# narrowing: s has type Circle in this branch
return math.pi * s.radius * s.radius
if isinstance(s, Rectangle):
# narrowing: s has type Rectangle in this branch
return (abs(s.point1.x - s.point2.x) *
abs(s.point1.y - s.point2.y))
# ... aaaand s has type Shape again! (for unions: rest)
raise TypeError(f"D'oh! 💩 {s}")
Pros and cons are switched: Now the area stuff is kept together, but e.g. the Circle stuff is torn apart.
Pattern matching is possible in Python ≥ 3.10, too:
def area(s: Shape) -> float:
match s:
case Circle(radius=r):
return math.pi * r * r
case Rectangle(point1=Coord2D(x=x1, y=y1),
point2=Coord2D(x=x2, y=y2)):
return abs(x1 - x2) * abs(y1 - y2)
case _:
raise TypeError(f"D'oh! 💩 {s}")
This is much easier to read and less error-prone than if cascades. A lot more fancy stuff is possible (guards, OR patterns, sequence patterns, ...) see PEP 634.
Pattern matching in Rust:
fn area(s: &Shape) -> f32 {
match s {
// struct matching of an alternative
Shape::Circle { radius, .. } => PI * radius * radius,
// tuple struct with nested struct, giving names
Shape::Rectangle(Coord2D { x: x1, y: y1 },
Coord2D { x: x2, y: y2 }) => {
(x1 - x2).abs() * (y1 - y2).abs()
}
... // rest of the alternatives
}
}
Handy if/let combination for a single case:
if let Shape::Circle { center, radius } = s { ... }
No direct support for pattern matching in TypeScript, so you have to do the dispatch by hand via the tag:
function describe (s: State): string {
switch (s.state) {
case 'loading':
return 'still loading' // s: Loading
case 'failure':
return `failed with ${s.code}` // s: Failure
case 'success':
return `responded with ${s.response}` // s: Success
}
}
Note the type narrowing going on here. Déjà-vu? 😉
For a variation of the first technique (via abstract methods), you can use the visitor pattern: Define the dispatching once and for all, so you can easily hook in functionality later.
From Scott Meyer's “My Most Important C++ Aha! Moments...Ever”:
Visitor lets you define a new operation without changing the classes of the elements on which it operates.
Type dispatch by hand in Python has some problems:
Solution since Python ≥ 3.10: Type guards, which are predicates which return typing information instead of a plain bool.
Same idea available in TypeScript under the name “type predicates”.
Python example, no cast() or Any needed:
def is_str_list(x: object) -> TypeGuard[list[str]]:
return (isinstance(x, list) and
all(isinstance(e, str) for e in x))
def hurzify(x: object) -> None:
if is_str_list(x):
x.append("hurz") # x has type list[str] here...
# ... and just object again after the if.
TypeScript example:
function isFish(pet: Fish | Bird): pet is Fish {
return (pet as Fish).swim !== undefined;
}
XX XX XX say: Unary functions are all we need!
Idea: Instead of a n-ary function, use a unary one returning yet another (n-1)-ary function.
In Python:
f = lambda x, y: x + y # f(1, 2) ⇒ 3
g = lambda x: lambda y: x + y # g(1)(2) ⇒ 3
In Haskell:
f ∷ (Int, Int) → Int
f = λ(x, y) → x + y -- f (1, 2) ⇒ 3
g ∷ Int → Int → Int -- same as: Int → (Int → Int)
g = λx → λy → x + y -- g 1 2 ⇒ 3
Can we automatically convert f into g and vice versa?
Yes, we can! In Python:
def curry(f):
return lambda x: lambda y: f(x, y)
def uncurry(g):
return lambda x, y: g(x)(y)
# curry(f)(1)(2) ⇒ 3
# uncurry(g)(1,2) ⇒ 3
i.e. the “uncurried” function f and the “curried” function g are isomorphic (“of equal shape”), they differ only superficially. Notation: f ≅ g
All functions are curried in Haskell, so you can e.g. shift the equals sign around:
f1 = λ(x, y) → x + y
f2 (x, y) = x + y
g1 = λx → λy → x + y
g2 x = λy → x + y
g3 x y = x + y
There is no need to add trailing parameters just for passing them on (η-conversion):
inc1 y = g 1 y
inc2 = g 1
inc2 is an example of partial application, i.e. giving a function only some of its arguments.
Very useful technique:
lookup ∷ [(String, Int)] -> String → Maybe Int
lookup pairs key = ...
socialIds ∷ [(String, Int)]
socialIds = [("John Doe", 1234), ("Jane Doe", 5678)]
getSocialId ∷ String → Maybe Int
getSocialId = lookup socialIds
Because Python doesn't offer currying out of the box, you have to write the glue code for yourself or use the functools module (much better!).
from functools import partial
sub = lambda x, y: x - y
foo = partial(sub, 100) # foo(3) ⇒ 97
dec = partial(sub, y=1) # dec(3) ⇒ 2
Similar helper for methods (partialmethod) plus much more: Caching, dispatching, ... Have a look!
C++ has a general way to bind parameters via placeholders, partial application is just a special case.
#include <iostream>
#include <functional>
using namespace std::placeholders;
bool gt(int x, int y) { return x > y; }
int main() {
auto gt42 = std::bind(gt, _1, 42);
auto lt42 = std::bind(gt, 42, _1);
auto lt = std::bind(gt, _2, _1);
std::cout << gt42(3) << lt42(9) << lt(3, 4) << std::endl;
return 0;
}
Same alternative as everywhere: Good ol' lambdas...
We have already seen that curried and uncurried functions are effectively the same thing, with curry and uncurry converting between them.
Can we generalize this to other types s and t?
Yes, we can! We need to find 2 functions with:
to ∷ s → t
from ∷ t → s
id_s = from ∘ to -- must be identity on s
id_t = to ∘ from -- must be identity on t
If this holds, s and t are isomorphic (s ≅ t).
Show that the following types are isomorphic:
data AgeOrShoeSize = Age Int | ShoeSize Int
data Foo = Foo Bool Int -- effectively (Bool, Int)
to ∷ AgeOrShoeSize → Foo
to (Age a) = Foo False a
to (ShoeSize s) = Foo True s
from ∷ Foo → AgeOrShoeSize
from (Foo False a) = Age a
from (Foo True s) = ShoeSize s
So AgeOrShoeSize ≅ Foo. Which one do you prefer?
Fun fact: The type Int is not relevant, so we have just proved that a + a = 2 × a, but more on this later...
Show that the following type and a are isomorphic:
data Bar a = Bar (() → a)
to ∷ Bar a → a
to (Bar f) = f ()
from ∷ a → Bar a
from x = Bar (λ() → x)
So there is no distinction between a value and a (pure) program computing this value.
Another fun fact: We have just proved that a1 = a.
Often we want a short notation for our existing types.
“typedefs” and “type aliases” in C++ :
typedef int Age; // old-skool for: using Age = int;
using Names = std::vector<std::string>;
“Type aliases” in Python:
Permissions = dict[str, int]
“Type synonyms” in Haskell :
type ShowS = String → String
“Type aliases” in TypeScript:
type ID = number | string;
“Type aliases” in Rust :
type Point = (i32, i32);
Doesn't create new types, just a kind of macros. Continuing the Rust example above:
type AgeAndShoeSize = (i32, i32);
let p: Point = (12, 34);
let a: AgeAndShoeSize = p; // looks weird, but works
Often a type synonym is not enough, so we really need to create a new type, but still keep the overhead low.
Age = NewType('Age', int) # effectively a 'subclass' of int
# creates def Age(x: int): return x
def to_str(a: Age) -> str: ...
def from_str(s: str) -> Age: ...
x = to_str(42) # Error!
y = to_str(Age(42)) # OK
z = from_str('one') + 12 # OK, but questionable
print(type(Age(42))) # ⇒ <type 'int'>
Types constructed by NewType keep the underlying representation, they are just a marker for the type checker. Use at least this instead of type synonyms!
Haskell's newtype creates a real, brand-new type with the same representation:
newtype Age = Age Int
to_str ∷ Age → String
to_str a = ...
from_str ∷ String → Age
from_str s = ...
x = to_str 42 -- Error!
y = to_str (Age 42) -- OK
z = from_str 'one' + 12 -- Error!
If you want to lift some operations of the wrapped type (e.g. equality), you can use a deriving clause.
In Rust you can use a one-element tuple struct to emulate a data type renaming:
struct Age(i32);
let age = Age(42);
Accessing the wrapped value is possible in two ways:
let age_prim1: i32 = age.0; // tuple access, slightly dirty
let Age(age_prim2) = age; // pattern matching, nicer
TypeScript doesn't have a built-in mechanism, but emulations with “unique symbol” are possible.
In C++ overloading is resolved via the parameters and const-ness of a function, not via the return type.
struct Foo {
int bar(float x) { return 42; }
void bar(const char *c) { /* format hard disk >:-) */ }
const char *bar(float x) const { return "Hi!"; }
};
All overloads should better be related semantically somehow, otherwise things get really confusing.
Overload resolution gets quite “interesting” with implicit conversions and multiple parameters...
Traditional linkers don't really understand C++, so the compiler has to dumb down the names a bit. You can use “nm --syms” to inspect object files:
0000000000000000 T _ZN3Foo3barEf 0000000000000010 T _ZN3Foo3barEPKc 0000000000000020 T _ZNK3Foo3barEf
Adding the “--demangle” flag, you get:
0000000000000000 T Foo::bar(float) 0000000000000010 T Foo::bar(char const*) 0000000000000020 T Foo::bar(float) const
Look up the gory details in the Itanium C++ ABI! 😱
mypy offers a way to improve types of “overloads”:
@overload
def bar(x: float) -> int: ... # just for mypy, not runtime
@overload
def bar(x: str) -> None: ... # just for mypy, too
def bar(x: float | str) -> int | None:
if isinstance(x, float): return 42
if isinstance(x, str): return None
raise TypeError()
i: int = bar(3.14) # doesn't work without @overload
n: None = bar('Hurz!') # this neither
Avoid this: You can easily lie in the @overloads, and mypy will still be happy. Remember Rices's theorem!
Sometimes something less ad hoc is more preferable:
class Greeter(Protocol):
def hi(self) -> str: ...
class Gangsta:
def hi(self) -> str: return 'Yo wazzzup, Bro?'
class Bavarian: # NOTE: A Bavarian is not a Gangsta!
def hi(self) -> str: return 'Servus!'
def meet(g: Greeter) -> None: print(g.hi())
meet(Gangsta()) # OK
meet(Bavarian()) # OK
meet(42) # ERROR, ints can't say hi...
Tons of predefined protocols: Iterable[T], Sized, ...
Traits in Rust are basically the same idea:
trait Greeter { fn say_hi(&self) -> String; }
struct Gangsta;
impl Greeter for Gangsta {
fn hi(&self) -> String { "Yo wazzzup, Bro?".to_string() }
}
struct Bavarian;
impl Greeter for Bavarian {
fn hi(&self) -> String { "Servus!".to_string() }
}
fn meet(g: &impl Greeter) { println!("{}", g.say_hi()) }
meet(&Gangsta); // OK
meet(&Bavarian); // OK
meet(&42); // ERROR, trait bound not satified
Type classes originated from Haskell (1989):
class Greeter a where { hi :: a -> String }
data Gangsta = SchoollyD | IceT
instance Greeter Gangsta where { hi _ = "Yo wazzzup, Bro?" }
data Bavarian = Alois | Sepp
instance Greeter Bavarian where { hi _ = "Servus!" }
meet :: Greeter a => a -> IO ()
meet g = putStrLn (hi g) -- cooler: 😎 meet = putStrLn . hi
meet IceT -- OK
meet Sepp -- OK
meet 42 -- ERROR
What is the most general type of swap?
swap (x, y) = (y, x)
All of the following signatures would be OK:
swap ∷ (Int, Bool) → (Bool, Int)
swap ∷ (Char, Double) → (Double, Char)
swap ∷ (Float → Bool, String) → (String, Float → Bool)
-- etc. etc.
But none of these is really satisfying, we would like to state something more general...
Observation: swap doesn't really care about the actual types of the first and second part of the pair.
It works for all types contained in the pair, but it doesn't only swap the values, it swaps the types, too.
Idea: Types can have parameters, just like functions:
swap ∷ (a, b) → (b, a)
The type variables are implicitly universally quantified:
swap ∷ ∀a b. (a, b) → (b, a)
In Python: Same idea, but a bit more ceremony due to legacy/syntactical reasons:
from typing import Generic, TypeVar
A = TypeVar('A')
B = TypeVar('B')
class Pair(Generic[A, B]):
def __init__(self, fst: A, snd: B) -> None:
self.fst = fst
self.snd = snd
def swap(p: Pair[A, B]) -> Pair[B, A]:
return Pair(p.snd, p.fst)
Same idea in Java, but type variables are more local:
class Pair<A, B> {
public A fst; public B snd;
public Pair(A fst, B snd) {
this.fst = fst; this.snd = snd;
}
}
public class Main {
public static <X, Y> Pair<Y, X> swap(Pair<X, Y> p) {
return new Pair<>(p.snd, p.fst);
}
public static void main(String[] args) {
Pair<Integer, Boolean> p = swap(new Pair<>(true, 42));
System.out.println(p.fst);
}
}
Generic data types, functions and traits in Rust:
struct Pair<A, B> { fst: A, snd: B }
fn swap<A, B>(p: Pair<A, B>) -> Pair<B, A> {
match p {
Pair { fst, snd } => Pair { fst: snd, snd: fst },
}
}
fn swap<A, B>(p: Pair<A, B>) -> Pair<B, A> {
let Pair { fst, snd } = p;
Pair { fst: snd, snd: fst }
}
fn swap<A, B>(Pair { fst, snd }: Pair<A, B>) -> Pair<B, A> {
Pair { fst: snd, snd: fst }
}
Remember: (Pure & total) functions must work for all types parameter substitutions, not just for some.
Implement all functions with the following types:
f ∷ a → a
g ∷ a → b → a
h ∷ a → b
You have no real choice:
f x = x -- a.k.a. id
g x y = x -- a.k.a. const
h x = ??? -- nope...
(More fun stuff in “Theorems for free!” by P. Wadler)
One can define a partial ordering on types: A type S is more general than a type T if there is a substitution for the type variables in S making it equal to T.
type S a b c = [(a, b )] → c → Bool
type T x = [(x, Int)] → x → Bool
S is more general than T: \([a \mapsto x, b \mapsto Int, c \mapsto x]\)
“T is more special than S.” Or: “T is subsumed by S.”
The ubiquitous Hindley/Milner/Damas type system inferes the most general type of a term. 👍
Running example:
class Animal:
def breed(self): pass
class Pet(Animal):
def play(self): pass
class Cat(Pet):
def purr(self): pass
Pet is a subtype of Animal, notation: Pet <: Animal
Mnemonic: There are fewer pets than animals.
XX says: “Use behavioral subtyping!”
Liskov substitution principle (LSP), scary formulation:
Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S <: T.
Class hierarchies are basically meaningless without it!
In other, simpler words: Do not use implementation hierarchies, use conceptual hierarchies.
LSP in our running example:
“Whenever you expect an animal or a pet, you should be happy with a cat.” 😽
Basic idea: As long as something has all the fields and/or methods I expect, I'm happy with it!
Not many details here, just hinting at some topics:
template <typename T>
int foo(T f) { return f.bar(); }
How can we type the following function, assuming that we don't want to have mixed types for x and y?
def longer(x, y):
return x if len(x) >= len(y) else y
Fully parametric polymorphism doesn't work:
T = TypeVar('T')
def longer(x: T, y: T) -> T: # Yay! "No mixing" is enforced.
return x if len(x) >= len(y) else y
Claim: longer will work for all types T, which is a lie. There are objects without a __len__ method.
Next attempt, using the Sized protocol:
def longer(x: Sized, y: Sized) -> Sized:
return x if len(x) >= len(y) else y
longer('abc', 'de') # returns Sized
longer([1, 2], [3]) # returns Sized
longer('fg', [4]) # Hmmm...
Not a good solution, either:
Type bounds to the rescue!
T = TypeVar('T', bound=Sized) # a class works here, too
def longer(x: T, y: T) -> T:
return x if len(x) >= len(y) else y
longer('abc', 'de') # returns str
longer([1, 2], [3]) # returns list[int]
longer('fg', [4]) # mypy complains!
(PRO TIP: If confused, ask mypy via reveal_type()!)
Using type classes for type bounds in Haskell:
max ∷ Ord a ⇒ a → a → a
max x y = if x >= y then x else y
Bounded type parameters in Rust:
fn max<T: PartialOrd>(x: T, y: T) -> T {
if x > y { x } else { y }
}
Alternative syntax:
fn max<T>(x: T, y: T) -> T
where T: PartialOrd
{
if x > y { x } else { y }
}
Just syntactic sugar, more readable when there are more and/or complicated type bounds
Bounded type parameters in Java:
public class Main {
public static <T extends Comparable<T>> T max(T x, T y) {
return x.compareTo(y) >= 0 ? x : y;
}
public static void main(String[] args) {
System.out.println(max("foo", "bar")); /* ⇒ "foo" */
System.out.println(max(123, 456)); /* ⇒ 456 */
System.out.println(max("huh?", 789)); /* nope! */
}
}
In C++: Templates with concepts & constraints
“Bounded polymorphism” with unrelated classes:
AnyStr = TypeVar('AnyStr', str, bytes)
def concat(x: AnyStr, y: AnyStr) -> AnyStr:
return x + y
concat('a', 'b') # OK, returns str
concat(b'a', b'b') # OK, returns bytes
concat(1, 2) # Error: neither str nor bytes
concat('foo', b'bar') # Error: mixed types
Compare this to:
StrOrBytes = str | bytes
def broken(x: StrOrBytes, y: StrOrBytes) -> StrOrBytes:
return x + y # Error: "+" can't concatenate str and bytes
Value restrictions can't be mixed with bound=...
Scary name for a basic question: What happens when parametric polymorphism and subtyping meet?
The LSP says: It's OK to pass an S when a T is expected and S <: T holds. (Remember the cats! 😺)
Now: Given a type Foo with a parameter, when is it OK to pass a Foo[S] when a Foo[T] is expected?
In other words: What is the relationship between Foo[S] and Foo[T] when S <: T?
Haskell's function type syntax is straightforward:
type funny = Int → Bool → Float
Modern C++'s syntax is finally bearable:
typedef float (*argl)(int, bool); // 😱
using funny = std::function<float(int, bool)>; // 😎
Python functions are uncurried, so no surprises here:
Funny = Callable[[int, bool], float]
Neither C++ nor Python let you reuse these synonyms for function definitions themselves ⇒ redundancy
PetProducer = Callable[[], Pet]
def usePP(pp: PetProducer) -> None:
pp().breed()
pp().play()
def produceAnimal() -> Animal: return Animal()
def producePet() -> Pet: return Pet()
def produceCat() -> Cat: return Cat()
usePP(produceAnimal) # Error! Animals can't play.
usePP(producePet) # OK
usePP(produceCat) # OK
Formally: Functions are covariant in their return type:
PetConsumer = Callable[[Pet], None]
def usePC(pc: PetConsumer) -> None:
pc(Pet())
pc(Cat())
def consumeAnimal(a: Animal) -> None: a.breed()
def consumePet(p: Pet) -> None: p.play()
def consumeCat(c: Cat) -> None: c.purr()
usePC(consumeAnimal) # OK
usePC(consumePet) # OK
usePC(consumeCat) # Error! Pets can't purr.
So functions are contravariant in their argument type:
PetTransformer = Callable[[Pet], Pet]
def usePetTransformer(pt: PetTransformer) -> None:
pt(Pet()).breed(); pt(Pet()).play()
pt(Cat()).breed(); pt(Cat()).play()
def transformAnimal(a: Animal) -> Animal: a.breed(); return a
def transformPet(p: Pet) -> Pet: p.play(); return p
def transformCat(c: Cat) -> Cat: c.purr(); return c
usePetTransformer(transformAnimal)# Error: Animals can't play
usePetTransformer(transformPet) # OK
usePetTransformer(transformCat) # Error: Pets can't purr.
If a type is used in an argument position and a return position, the function is invariant in this type. There is no subtyping relationship whatsoever.
Containers are producers/consumers too:
All the variance results from functions apply here, too:
Examples from the standard Python library:
Actually, we have seen all of this before, because you can view containers as (tuples of) functions.
An extremely simple example, a one-element box:
def makeBox(p: Pet) -> tuple[PetProducer, PetConsumer]:
box = p
def put(v: Pet) -> None:
nonlocal box
box = v
return (lambda: box, put)
get, put = makeBox(Cat())
get() # ⇒ Cat
put(Pet())
get() # ⇒ Pet
Remember from logic: \(A \implies A \lor B\), so we have:
Special case:
The consequences are well-known:
ConsumePetProducer = Callable[[Callable[[], Pet]], None] # 🤯
# Haskell: type ConsumePetProducer = (() → Pet) → ()
def useCPP(cpp: ConsumePetProducer) -> None:
cpp(producePet)
cpp(produceCat)
def useAP(ap: Callable[[], Animal]) -> None: ap().breed()
def usePP(pp: Callable[[], Pet]) -> None: pp().play()
def useCP(cp: Callable[[], Cat]) -> None: cp().purr()
useCPP(useAP) # OK
useCPP(usePP) # OK
useCPP(useCP) # Error: Pets can't purr. 🤔
Pet is used in a contravariant position here, but why?
Elementary school math to the rescue!
For a type \(a \rightarrow b\) we have seen that \(a\) is used in a contravariant position and \(b\) is used in a covariant one.
Think about variance as a kind of “sign”:
Then use the sign rules from elementary school:
(+) × (+) = (+), (-) × (+) = (-), (+) × (-) = (-), (-) × (-) = (+)
Consider the function type:
So \(a\) and \(c\) are used in covariant positions, and \(b\) is used in a contravariant position.
mypy ties variance to type variables, not positions:
T_co = TypeVar('T_co', covariant=True)
T_contra = TypeVar('T_contra', contravariant=True)
T_inv = TypeVar('T_inv')
As a consequence, generic classes are very inflexible:
class Stack(Generic[T_inv]):
def __init__(self) -> None: self.items: list[T_inv] = []
def push(self, v: T_inv) -> None: self.items.append(v)
def pop(self) -> T_inv: return self.items.pop()
def empty(self) -> bool: return not self.items
Stack can only be used in an invariant way! 😢
Looking back at our box example, one can work around this mypy flaw, but it's verbose:
class ReadableBox(abc.ABC, Generic[T_co]):
@abc.abstractmethod
def get(self) -> T_co: ...
class WritableBox(abc.ABC, Generic[T_contra]):
@abc.abstractmethod
def put(self, v: T_contra) -> None: ...
class MyBox(ReadableBox[T_inv], WritableBox[T_inv]):
def __init__(self, v: T_inv): self._v = v
def get(self) -> T_inv: return self._v
def put(self, v: T_inv) -> None: self._v = v
This effectively triples the number of needed classes.
Java ties variance to the use site, which is much more flexible.
A covariant example:
ArrayList<? extends Pet> pp; // a Pet producer
pp = new ArrayList<Animal>(); // ERROR
pp = new ArrayList<Pet>(); // OK
pp = new ArrayList<Cat>(); // OK
Animal a = pp.get(0); // OK
Pet p = pp.get(0); // OK
Cat c = pp.get(0); // ERROR
pp.add(new Animal()); // ERROR
pp.add(new Pet()); // ERROR
pp.add(new Cat()); // ERROR
A contravariant example:
ArrayList<? super Pet> pc; // a Pet consumer
pc = new ArrayList<Animal>(); // OK
pc = new ArrayList<Pet>(); // OK
pc = new ArrayList<Cat>(); // ERROR
Animal a = pc.get(0); // ERROR
Pet p = pc.get(0); // ERROR
Cat c = pc.get(0); // ERROR
pc.add(new Animal()); // ERROR
pc.add(new Pet()); // OK
pc.add(new Cat()); // OK
Finally an invariant example:
ArrayList<Pet> ppc; // a Pet producer/consumer
ppc = new ArrayList<Animal>(); // ERROR
ppc = new ArrayList<Pet>(); // OK
ppc = new ArrayList<Cat>(); // ERROR
Animal a = ppc.get(0); // OK
Pet p = ppc.get(0); // OK
Cat c = ppc.get(0); // ERROR
ppc.add(new Animal()); // ERROR
ppc.add(new Pet()); // OK
ppc.add(new Cat()); // OK
Although Unions are not real algebraic data types, you can still dispatch on them, “peeling off” cases:
U = int | str | None
def strange(u: U) -> int:
# u has type int | str | None
if isinstance(u, int):
return u // 2 # u has type int
# u has type str | None
if u is None:
return 1234 # u has type None
# u has type str
return u.count('foo')
You can abuse assert to introduce typing backdoors:
U = int | str | None
def pray(u: U) -> int:
# u has type int | str | None
assert isinstance(u, str)
# now u has type str
return u.count('foo')
Don't do this! Fix your type, it's too permissive, i.e. wrong. Otherwise you wouldn't need the assert...
Casting is even worse, you won't get an exception early. Guaranteed debugging fun!
The language spec defines
assert expression
to be equivalent to
if __debug__: # True if -O commandline option not used
if not expression: raise AssertionError
Mypy assumes that __debug__ is always true, as you can see in the pray() example.
This assumption is wrong in general and can bite you!
You can use Type to talk about types, not values:
def twoPets(t: Type[Pet]) -> tuple[Pet,Pet]:
return t(), t()
twoPets(Animal) # ERROR! Type is covariant in its argument.
twoPets(Pet) # ⇒ tuple[Pet,Pet]
twoPets(Cat) # ⇒ tuple[Pet,Pet]
A more precise typing for this would be:
P = TypeVar('P', bound=Pet)
def twoPets(t: Type[P]) -> tuple[P, P]:
return t(), t()
twoPets(Pet) # ⇒ tuple[Pet,Pet]
twoPets(Cat) # ⇒ tuple[Cat,Cat]
Dispatching on values of type Type can be done via issubclass:
U = Type[Pet] | Type[Exception]
def obscure(u: U) -> int:
# u has type Type[Pet] | Type[Exception]
if issubclass(u, Pet):
u().play() # u has type Type[Pet] in this branch
return 42
# u has type Type[Exception]
return len(u('foo').args)
Basically the same idea as we have seen before with isinstance for values.
It is just meant as an escape hatch at the border to “untyped land”, easing transition of legacy code.
It claims to fulfill all your wishes, which is a lie:
cake: Any = 42
cake = "Trust me, I'm an int! >:-)"
doh: int = cake
ooops: list[float] = cake.a_funny_method(True, 3.14, None)
It is equivalent to casting (should be avoided, too!):
T = TypeVar('T')
def myCast(t: Type[T], v: Any) -> T: return v
s: str = myCast(str, 42)
If mypy doesn't find any stubs (.pyi files) and is told to ignore that fact, we get “Any”s all over the place:
import urllib3 # type: ignore[import]
http = urllib3.PoolManager()
r = http.request('GET', 'http://tribe29.com/robots.txt')
h = urllib3.HurzManager() # mypy is happy, but...
reveal_type((urllib3, http, r, h)) # ⇒ tuple[Any,Any,Any,Any]
To fix this, find stubs in the typeshed repo on GitHub, or write stubs for yourself, or help mypy locally:
can_read1 = r.readable() # ⇒ type is Any
can_read2: bool = r.readable() # ⇒ type is bool
Writing stub files is easy: Just use the normal Python 3 syntax, but leave out “code” parts, replacing them with an ellipsis (...):
x: int # no assignment
def funky(code: str) -> int: ... # no function body
class FooClass:
def __init__(self, v: int) -> None: ...
# no default value for b, just the fact that there is one
def meth(self, a: bool, b: int = ...) -> str: ...
No need to type everything, just the subset used!
Place the .pyi file for a module at the corresponding place somewhere below a directoy in the MYPYPATH.
Using dict[str,Any] all over the place has various severe problems:
Refactoring from dict[str,Any] to something saner is non-trivial, so let's do it in small steps:
A TypedDict is a hint for the type checker, at runtime it's just a plain old dict.
p1 = {'name': 'John Doe', 'age': 42} # old-skool dict
class Person(TypedDict): # actually not a real class...
name: str
age: int
p2: Person = {'name': 'John Doe', 'age': 42}
p3: Person = {'name': 'John Doe'} # ERROR! missing key age
p4 = Person(name='John Doe', age=42)
Remember: All TypedDicts are just dicts, so you can't distinguish them at runtime via isinstance. Tag them somehow if you must.
TypedDicts use a variation of structural typing:
def usePerson(pers: Person) -> None: pass
usePerson({'name': 'Jane Doe'}) # ERROR! missing key age
usePerson({'name': 'Jane Doe', 'age': 40})
usePerson({'name': 'Jane Doe', 'age': 40, 'x': 12}) # ERROR!
If all keys are optional, use total=False:
class Person2(TypedDict, total=False):
name: str; age: int
def usePerson2(pers: Person2) -> None: pass
usePerson2({'name': 'Jane Doe'})
usePerson2({'name': 'Jane Doe', 'age': 40})
usePerson2({'name': 'Jane Doe', 'age': 40, 'x': 12}) # ERROR!
If only some keys are optional, use subclassing. Or wait for Python 3.11, which provides (Not)Required.
From “The Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to Expunge Them”:
Shotgun parsing is a programming antipattern whereby parsing and input-validating code is mixed with and spread across processing code — throwing a cloud of checks at the input, and hoping, without any systematic justification, that one or another would catch all the “bad” cases.
Trying to find invalid input in the middle of processing it is bad:
Cunning idea: Let's just validate everything upfront!
But this is still bad:
An example using a JSON value with key "color" and possible values "red", "green", or "blue":
value = json.loads('{"color": "red"}') # type is Any
def validate(j: Any) -> None:
c = j.get('color')
if c is None:
raise ValueError('missing key')
if c not in ['red', 'green', 'blue']:
raise ValueError('wrong color')
def use(j: Any) -> int: # clueless type 🤡
return 42 if j['colour'] == 'bleu' else 314
validate(value) # 👍
use(value) # 💥
Make invalid state unrepresentable:
Color = Enum('Color', 'RED GREEN BLUE')
def parse(j: Any) -> Color:
c = j.get('color')
if c is None: # for better error message only
raise ValueError('missing key')
if c == 'red': return Color.RED
if c == 'green': return Color.GREEN
if c == 'blue': return Color.BLUE
raise ValueError('wrong color')
def use(c: Color) -> int:
return 42 if c == Color.BLUE else 314
use(parse(value))
Use the data in the way you want, not in the way given.