Friday, October 1, 2010

Rust Features I: Type Inference

As the Rust programming language slowly starts to get up and running, I thought I'd start a series of short blog posts to show off some of the more interesting features of the language. I'm going to do my best to restrain my enthusiasm and present only bits of the language that are currently working in the compiler. (Nevertheless, all of this can change at any time as the language evolves! This goes double for syntax; we have to make sure the semantics of the language is nailed down before we can start designing how the language looks and feels.)

I'll start with a simple feature: type inference. Along with structural typing, this allows Rust to express some pretty powerful concepts without the boilerplate seen in other languages. Rust features what I call feed-forward type inference, which is summed up by this rule: Variables have no type until they are assigned a value, at which point their type can't change.

Let's take a look at some simple examples to see how this works.

    auto x = 5;
auto y = "foo";


Here, we see the auto keyword, which declares a variable and leaves the compiler to figure out the type. In this form, it's similar to var in C# or Go. Because a value is immediately assigned to the variable, Rust infers that x has the type int and that y has the type string.

Now for a more interesting example:

    auto x;
if (localtime().hours >= 8) {
x = "awake!"
} else {
x = "asleep, go away."
}
log "I'm " + x;


Here, we didn't initialize x in its declaration, but this code still compiles and runs properly. This is because Rust is able to wait until x is assigned to determine its type. Unlike, say, C# (and, of course, languages like ML in which variables must be assigned a value at declaration time), Rust doesn't require that auto declarations be initially assigned a value to determine a type for them. At the same time, unlike C, the compiler will emit an error if you ever forget to assign a value to the variable in any branch of your code. For instance, if you omit the else branch above, the code won't compile.

Where type inference becomes especially interesting is in combination with structural record types. Take this very simple example (note that a couple of these APIs are currently missing, so this won't work if you try it right now):

    log "Enter two vectors in the format 'x1 y1 x2 y2' and I'll give you the dot product:"
auto fields = _str.split(input.read_line(), ' ');
auto ints = _vec.map(fields, _int.of_str); // convert to ints
auto a = rec(x=ints.(0), y=ints.(1));
auto b = rec(x=ints.(2), y=ints.(3));
auto ans = a.x * b.x + a.y * b.y;
log ans;


Take a look at the two lines that begin with auto a = and auto b =. They initialize a record that wasn't defined anywhere else in the program. The C99 equivalent of this would be:

    struct point {
int x;
int y;
};

...

point a = { ints[0], ints[1] };
point b = { ints[2], ints[3] };


C doesn't have structural typing, so we're forced to declare the point type up front. In contrast, Rust allows you to effectively pull new record types out of thin air at any time, by simply listing their keys and values inside a rec() constructor. Combined with type inference, this affords you much of the convenience of the (roughly-)equivalent idiomatic Python declaration:

    a = { 'x': ints[0], 'y': ints[1] }
b = { 'x': ints[2], 'y': ints[3] }


But you gain the type safety of Rust and performance comparable to the C version. (You do lose the dynamism that Python dictionaries provide, however; for example, a["x"] isn't allowed. For that, you'll want the hashmap type available in the standard library.)

Hopefully, these examples give you a feel for the type inference in Rust and what it can provide you. Although Rust isn't a scripting language, it's designed to bring much of the convenience and readability of scripting languages (dare I say "agile development"?) to large-scale projects.

6 comments:

  1. What happens in this case?

    log "do you want to add strings or ints?"
    auto use_str = input.readline == "strings"
    if (use_str) {
    a = "hello "
    b = "world!"
    } else {
    a = 1
    b = 1
    }

    log a+b

    The types of a and b are selectable at runtime, but then are bound to that variable for good.

    I am guessing that no, you don't support it. If that is the case, then it is somewhat misleading to say that the type assignment is deferred until assignment. Further, using auto just obscures the type of the variable for human readers, and it is just better to declare the type up front.

    If you do allow late binding of the type, then I guess it use mostly useful for subroutine parameters as a type of polymorphisim, eg

    sub add(auto a, auto b) {
    return a+b
    }

    ReplyDelete
  2. Hi frustum,

    In your first example, assuming that you've declared "auto a; auto b;" above, you're correct that this is not supported (unless you use the "any" type, but then addition via '+' will not be supported). In other words, the rule isn't flow-sensitive; as stated, I agree that wasn't as clear as it could have been.

    I don't agree, however, with the blanket statement that using "auto" hurts clarity. The success of dynamic languages has shown that programmers don't need type annotations on every variable to make sense of code.

    As for the last example, "auto" isn't supported for parameters. The main reason is that we want separate compilation without having to do whole-program HM style inference. Instead, you can use parametric polymorphism if you're fine with the types being opaque, or you can use "any", which would be what you'd want to write that "add" function. Both of these are for another post :)

    ReplyDelete
  3. Non-flow-sensitive types seem like a *feature*.

    ReplyDelete
  4. Seeing your example of structural record types, two key ideas come to mind for Rust.

    First of all, do you support algebraic data types with alternatives? For instance, an analogue to the Haskell "data Maybe a = Just a | Nothing"? Every time I use a language without such types, I find myself wanting them, and recreating them with tagged unions.

    Second, do you support pattern matching?

    ReplyDelete
  5. Yep, you can create discriminated unions/ADTs. In fact, they're the only way to create recursive types (although we'll probably introduce "self" types later for objects).

    We also have full ML-style pattern matching. Not all of it is implemented at the moment, but I'm pushing for feature parity with OCaml's "match" statement.

    ReplyDelete
  6. Maybe I missed this somewhere, but I'm curious, what is the ultimate aim or byproduct of Rust?

    Are we looking to begin building Mozilla technologies (e.g. Firefox) with it?

    ReplyDelete