My roots as a developer are in C and Assembly, and I remember fondly the time I spent learning those languages using Borland TASM or Turbo C under MS-DOS. Those languages, and the technology that surrounded them, forced me to understand the low-level architecture of computers, and I still consider those details extremely interesting.

Rust, just like C and C++, is a low-level language that exposes details of the underlying architecture, and using it proficiently requires a certain understanding of concepts like memory management that are mostly hidden by high-level languages like Python or JavaScript.

This post has been written for beginners who are used to dynamically typed languages. I will explore the intricacies of passing arguments to Rust functions, trying to clarify exactly what happens behind the scenes and why the language provides syntactic devices like & or mut.

For the sake of clarity, I will split the discussion into three separate topics: call conventions, ownership, and mutability. However, it is important to keep in mind that those concepts are not independent, and that the separation is purely a strategy to make them more digestible.

Computer architecture

Throughout the article, I will mention and show what happens at CPU level when code is written in a certain way.

You do not need previous knowledge of computer architecture to follow those sections, as I will explain what happens step by step, but I highly recommend eventually becoming familiar with concepts like the stack and CPU calling conventions. You will find some useful links in the Resources section at the end of the article.

Assembly and machine code

Formally, machine code is the sequence of binary values that are given to the CPU: everything in machine code is just a binary number. Assembly is a slightly higher-level language that uses mnemonics and register names to make it simpler for humans to read and write machine code.

For all practical purposes, in this article we can assume that Assembly and machine code are the same thing and use the two words interchangeably.

Call conventions

When we define a function, we list a set of parameters, that is, values that have to be given to the function when we call it. As Rust is a statically typed language, functions need to declare the type of each parameter, and the arguments need to match those types.

Arguments and parameters

Strictly speaking, arguments and parameters are different. However, this explanation from the Rust book looks like a good informal approach.

We can define functions to have parameters, which are special variables that are part of a function’s signature. When a function has parameters, we can call it providing concrete values for those parameters. Technically, the concrete values are called arguments, but in casual conversation, people tend to use the words parameter and argument interchangeably for either the variables in a function’s definition or the concrete values passed in when you call a function.

In Rust, there are many built-in types like integers, floats, booleans, and an unlimited number of user-defined types, but when we define function parameters there are two macro categories in which they can fall: values and references.

This is the first and most important concept to learn. We can call a function passing values in only two different ways, regardless of the data type. Depending on the language and the speaker, such techniques are named in different ways, and in this article I will use call by value and call by reference. We call these two different strategies call conventions.

Values and references

Let's review this important distinction between values and references before we jump into the details of function calling.

Consider this situation: I want to give a friend a card for their birthday. I can hand them the card itself, passing the physical object.

Let's consider a different scenario: a friend wants to borrow my car to go shopping. It would be crazy for me to try and get my car to hand it physically. I will probably just tell my friend where the car is parked.

These two examples are exactly what happens when a computer manages data. Ultimately, data is just bytes stored somewhere in memory (typically in RAM, but the same is true for caches or long-term storage like hard drives). When I want to send data to another component in the system I can either give the data itself (value) or tell the component where to find the value (reference).

Memory location

Technically, a memory location is called an address. Ultimately, it's just a number that identifies a specific location in memory that hosts data. A pointer or a reference is a variable that contains a memory address. It is usually an unsigned integer whose size depends on the computer architecture. For an Intel x86-64 machine it's a 64-bit number (8 bytes).

Strictly speaking, while C pointers are literally just an integer, Rust references can carry additional data (for example, a reference to a slice also stores its length), but for the sake of simplicity in this post we will ignore the difference. If you want to dig into this you can read the documentation.

This means that, strictly speaking, arguments can be passed only by value. References are values (addresses) that are used to find other values.
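The idea that a reference is ultimately an address can be sketched with a few lines of Rust. This is only an illustration: the variable names are made up, the printed address changes between runs, and the 8 bytes mentioned in the comments assume a 64-bit target.

```rust
use std::mem::size_of;

fn main() {
    let value: u16 = 42;
    let reference: &u16 = &value;

    // {:p} prints the address stored in the reference.
    // The exact number changes between runs and machines.
    println!("value {} is stored at {:p}", value, reference);

    // A reference to a simple type like u16 takes as much space
    // as a pointer-sized integer (8 bytes on x86-64).
    println!("size of &u16: {} bytes", size_of::<&u16>());
    println!("size of usize: {} bytes", size_of::<usize>());
}
```

The last two lines print the same number, which is the size of an address on the machine where the code runs.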

Why two different ways?

Why do computers provide two ways to pass arguments to functions?

Let's consider a real world example. I won a huge sum of money at the national lottery, and decided to get a safe deposit box at the bank and put the money there. To make sure I don't forget the code to open the safe I write it on a piece of paper. One day I need to get some money from the safe, but I cannot go myself, so I ask a friend to go, open the safe, and get the money for me. To do this, I have to give this person the piece of paper with the code, and they will give it back once they are done. As this happens multiple times, I decide that it might be simpler to copy the code on another piece of paper and to give that to them.

Everything works well, until one day the bank asks us to change the code every time we open the safe, as a security measure. Now, the one who opens the safe has to come up with a new code and update their own piece of paper. This means that the copy owned by the other person is instantly invalidated. Assuming we don't have any other way to communicate the change, we need to review our strategy. In this case there is only one way to deal with the situation. We need a single copy of the piece of paper and each one of us knows where it is hidden. This way, the one who changes the code can go and update the only copy of the note in existence.

At CPU level, when we call a function (the friend) we might need to pass data (the code to open the safe). As long as data is not changed we can safely copy it into the function's memory space (copy the code on another piece of paper), but as soon as the value changes we need a different method that allows the function to change the original value (the only copy of the note). We can pass the location of the data (the place where the note is hidden) so that the function can use it and change it.

In the first two parts of the article we will deal with read-only data, so for a while it will look like there is no need to use references. In the third part, we will introduce mutability that will finally justify them.

Pass by value in Rust

The following code shows how we can pass function arguments by value in Rust

struct Item {
    pub value: u16,
}

fn process(item: Item) -> u16 {
    item.value // 3
}

fn main() {
    let i = Item { value: 42 }; // 1

    process(i); // 2
}

As you can see, the code is extremely simple. The function main creates a variable of type Item 1 and calls the function process 2. This in turn extracts the field value 3 and returns it (the returned value is ignored by main).

The function prototype accepts an argument of type Item. This means that the function wants to receive the physical data (the birthday card).

In this case we say that the caller passes the variable i by value.

Behind the scenes

Traditionally, we assume that the compiler will byte copy the value into the function stack, and this is still a valid mental model. However, remember that the compiler's task is to optimise code, so it might decide to do something completely different at machine language level.

We can see what happens in this case using the amazing Compiler Explorer. Please note that to use it you need to make the function main public, as the compiler expects a library, not an executable. See the section "Disassemble Rust" at the end of the post if you want to do it locally.

Using rustc 1.83.0, the code above becomes

example::process:
        mov     ax, di # 6
        ret # 7

example::main:
        push    rax # 1
        mov     edi, 42 # 3
        call    example::process # 5
        pop     rax # 2
        ret

For those unfamiliar with (Intel) Assembly, let me clarify what happens here:

  • First of all, there is no concept of struct in machine code. Here, the compiler figures out there is only one field so the struct i is just the 16-bit value 42.
  • In example::main the code pushes 1 and pops 2 the 64-bit register rax. The content of the register is not important here: pushing it is a compact way to move the stack pointer by 8 bytes and keep the stack aligned, as required by the low-level calling convention. By the same convention, rax is the register used to store a function's return value.
  • The value of the variable i is stored in the register edi 3. Here, the compiler decided not to use the stack, as the value of the variable is ultimately just a single u16 that can be hosted by a register.
  • The function example::process is called 5. The code of the function in Rust is very simple, and it needs only to extract the field value. In Assembly, there is no field, and the return value of the function is basically just its input. The function stores the output in the register ax 6 which is the lower 16-bit part of rax.
  • The function returns 7.

Leaving aside the complexity of the low-level architecture and conventions, the main concept we need to retain is: passing parameters by value copies the value of the arguments.

Again, keep in mind that the compiler writes machine code with optimisation in mind, so what we do in Rust is not always reflected in what happens at CPU level. Later, we will see an example of this.

Pass by reference in Rust

The following code is a slight modification of the previous example and shows how we can pass function arguments by reference in Rust

struct Item {
    pub value: u16,
}

fn process(item: &Item) -> u16 { // 1
    item.value
}

pub fn main() {
    let i = Item { value: 42 };

    process(&i); // 2
}

As you can see, the only difference is that the function process accepts &Item 1, that is a reference to Item. The function call is modified accordingly, passing &i 2, which is the reference to i.

Here, we are doing what we did when we lent a car. Instead of handing the car itself we tell the recipient where to find it.

Automatic referencing and dereferencing

The code above might surprise those who are used to coding in C/C++, as the function process receives &Item but then reads one of the fields as if the variable was an Item.

Formally, we should first dereference the pointer, which in Rust can be done with *item, and then access the field. Indeed, the following code works

fn process(item: &Item) -> u16{
    (*item).value
}

In C/C++ the syntax (*item).value can be written item->value, but Rust doesn't have the arrow operator.

Rust has a feature called automatic referencing and dereferencing that can automatically add &, &mut, or * to match the required data type. In this case, it automatically transforms item.value into (*item).value.

It is extremely important to remember that Rust silently adjusts calls. The feature is useful, as it simplifies the syntax of the language, but if we want to understand what happens behind the scenes we need to be aware of this behaviour.

You can read more about automatic referencing and dereferencing in chapter 5.3 of the Rust book.
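The following sketch puts the implicit and the explicit forms side by side, and adds a method call to show automatic referencing. The names implicit, explicit, and double are made up for this illustration.

```rust
struct Item {
    pub value: u16,
}

impl Item {
    // A method that borrows the item.
    fn double(&self) -> u16 {
        self.value * 2
    }
}

// Field access through a reference, dereferenced automatically.
fn implicit(item: &Item) -> u16 {
    item.value
}

// The same access with an explicit dereference.
fn explicit(item: &Item) -> u16 {
    (*item).value
}

fn main() {
    let i = Item { value: 42 };

    // Both forms read the same field.
    assert_eq!(implicit(&i), explicit(&i));

    // Method calls are adjusted as well: i.double() is
    // treated as Item::double(&i).
    assert_eq!(i.double(), Item::double(&i));

    println!("{} {}", implicit(&i), i.double());
}
```

Here i is an Item and not a reference, but the call i.double() compiles because Rust adds the & required by the signature of the method.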

Behind the scenes

This time, the mental model is that we are handing the function the location of our data, without copying it to new memory locations. Let's see what happens at CPU level.

Again, using the Compiler Explorer and rustc 1.83.0, the code above becomes

example::process:
        mov     ax, word ptr [rdi] # 6
        ret # 7

example::main:
        push    rax # 1
        mov     word ptr [rsp + 6], 42 # 3
        lea     rdi, [rsp + 6] # 4
        call    example::process # 5
        pop     rax # 2
        ret

The machine code is clearly different from the previous version. Let's have a closer look:

  • As happened before, the code pushes 1 and pops 2 the 64-bit register rax. Once again, this moves the stack pointer by 8 bytes to keep the stack aligned (calling convention), and this time the space reserved on the stack is also used to host the variable i.
  • The value 42 is stored on the stack 3. This instruction looks complicated because of the offset and the size specifier, but it basically moves the value 42 to a stack address computed from the stack pointer register rsp. Let's decompose it
    • In Intel Assembly mov [rsp], 42 would move the value 42 to the address stored in the register rsp. The value is not moved into the register. Rather, the value in the register is used as an address.
    • The CPU is very picky when it comes to the size of data that we want to move, so the code clarifies that we want to treat the address as the space that hosts 2 bytes (a word that corresponds to 16 bits) with mov word ptr [rsp], 42.
    • Last, the x86-64 calling convention wants the stack pointer rsp to be aligned to a 16-byte boundary when a function is called. The function main already pushed rax, which moved the stack pointer by 8 bytes and reserved an 8-byte slot on the stack. We are going to store only 2 bytes (word), and the compiler places them at the end of that slot, leaving the first 6 bytes unused. This is the reason why the code above uses [rsp + 6] instead of just [rsp].
  • The address rsp + 6 is loaded into the register rdi 4. The instruction lea (Load Effective Address) calculates the result of rsp + 6 as an address and loads it into rdi. This is different from what mov would do, which is to copy the value stored at that address.
  • As before, the function process is called 5.
  • The code of the function is different as well because now rdi doesn't contain a value but the address of the value. Thus, the function uses word ptr [rdi] 6 to move the 16-bit value at that address into ax (which is still the conventional register for the return value) and returns 7.

Once again, the Assembly code is complicated because of conventions and low-level architectural details, but the main concept is: passing parameters by reference gives the function the location of the data and not the data itself.
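The same concept can be observed from Rust, without reading Assembly. In this sketch the helper location is a made-up name: it converts the reference it receives into a number, which main compares with the address of its own variable.

```rust
struct Item {
    pub value: u16,
}

// Returns the address received by the function as a number.
fn location(item: &Item) -> usize {
    item as *const Item as usize
}

fn main() {
    let i = Item { value: 42 };

    // The address of i as seen by main.
    let address_in_main = &i as *const Item as usize;

    // The function received that same address, not a copy of the data.
    assert_eq!(location(&i), address_in_main);

    println!("i ({}) lives at {:#x}", i.value, address_in_main);
}
```

The printed address changes between runs, but the assertion always holds, because the function and the caller look at the same memory location.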

The compiler's role

I mentioned several times that the compiler's task is to optimise code, which means that the machine code might or might not correspond to what we wrote in Rust (or any other high level language). A simple example is the following

struct LargeItem {
    value: [u16; 1024 * 1024], // 2
}

fn process(item: LargeItem) -> u16 {
    item.value[0]
}

pub fn main() {
    let li = LargeItem {
        value: [0; 1024 * 1024],
    };

    process(li); // 1
}

Here, we are passing the variable li by value 1, but the size of the type is rather huge. It's a struct of 2 MiB (2 bytes * 2^20) 2 which cannot be stored in a register. This means that, even though in Rust we pretend we pass the value, the compiler is forced to call the function passing the address. The relevant part of the machine code this time is

example::process:
        mov     ax, word ptr [rdi]
        ret

example::main:
        # Calls to memset and memcpy
        # to set up the large struct
        ...
        lea     rdi, [rsp + 8] # 1
        call    example::process
        ...
        ret

where I omitted the rest of the code that deals with the memory initialisation of the large struct.

As you can see, the compiler ignores our "directive" to pass by value and uses the address of the struct 1 as it did with the Rust code that uses references. The function extracts the first value of the array item.value, so the machine code reads the value at the address 2.

The bottom line is: both passing by value and passing by reference can result in the same machine code.

This might sound surprising, but if you think about it, that's exactly the reason why we use a compiler and why it's such a fascinating and important piece of software.
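The sizes involved in the two examples can be checked with std::mem::size_of. This is a small sketch that only measures the types and never creates a LargeItem; the comment about the size of the reference assumes nothing more than a reference being as large as an address.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Item {
    value: u16,
}

#[allow(dead_code)]
struct LargeItem {
    value: [u16; 1024 * 1024],
}

fn main() {
    // A single u16: 2 bytes, small enough for a register.
    println!("Item: {} bytes", size_of::<Item>());

    // 1024 * 1024 elements of 2 bytes each: 2 MiB.
    println!("LargeItem: {} bytes", size_of::<LargeItem>());

    // A reference has the size of an address, regardless
    // of the size of the type it points to.
    println!("&LargeItem: {} bytes", size_of::<&LargeItem>());
}
```

Whatever the size of the struct, its address fits in a register, which is why the compiler falls back to passing it.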

As I mentioned before, the reason why we should pass arguments by reference instead of by value will be clear once we introduce mutability. To get there, however, we first need to discuss another feature of the language: ownership.

Ownership

This is the second aspect of function calls that I want to discuss, together with call conventions and mutability. Once again, such concepts are interconnected and are separated here just for the sake of clarity.

To discuss ownership, let's once again have a look at real world examples.

In the first scenario, I have a car (once again!) that I don't use any more, so I sell it. From today, I cannot drive that car any more, as it belongs to someone else.

The second situation is: I own a holiday home in a beautiful place, and a friend asked me to use it for a couple of weeks. The ownership of the house doesn't change, but for a while it will be used by another person.

The third case is similar to a previous example. I wrote down the address of a shop on a piece of paper. A friend expresses interest in the same shop, so I copy the note and give it to them. Now, we both have the same information in two different physical locations.

The difference between the cases is clear. Once the car is sold it's not part of my possession any more. I can't drive it, but at the same time I don't have to pay insurance or to deal with it when it's time to demolish it. In the second case, the house is still mine, and I am responsible for council tax and repairs, but for a while it will be given to another. In the third case, each one of us owns their own copy of the information and is responsible for the data.

These three cases can be connected with the way function arguments are treated in Rust. When we pass variables to a function we need to consider the ownership of those variables (actually, the ownership of the data stored in those variables).

In Rust, each argument can be passed in one of the three possible ways:

  • Copy: we make a copy of the value and end up with two owners (like the address of the shop).
  • Move: we transfer ownership (like selling the car).
  • Borrow: we lend the value to a function but we want to have it back (like we did with the house).

It is important to understand that ownership is an additional check introduced by the Rust compiler, and that there is no such thing at CPU level.

Copy: two values and two owners

Let's consider the following code

fn process(value: u16) -> u16 {
    value * 2
}

fn main() {
    let v: u16 = 42; // 1

    process(v); // 2

    println!("Value {}", v); // 3
}

Here, we initialise a value v 1 and we pass it by value to the function process 2.

The value of v is copied into the variable value when the function is called. This means that when we call process there are two independent areas of memory:

  • The area labelled v which is owned by main.
  • The area labelled value which is owned by process.

Both areas of memory (variables) contain the same value initially, but they are independent so they can change without affecting each other. We will see an example when we discuss mutability later.

There are two important facts to consider here:

  1. We can use v 3 after we called process. As we copied the value into the function, our variable is still accessible.
  2. The variable has been copied automatically by Rust. This happens because v is a simple type and implements the trait Copy out of the box.

The Copy trait

In Rust, the Copy trait is associated with bitwise copy, that is a trivial memory copy between memory areas. Going back to real world examples we can think of a photocopy of a document, where the two copies are exactly identical at the end of the process.

This is not always what we want, and this is where the trait Clone comes into play. I won't discuss it further in this article, so make sure you read what the Rust book says about Memory and Allocation.

Some simple types in Rust implement the trait Copy out of the box: integers, floats, booleans, and char are among those. If you are unsure, you can always check the documentation. For example, u32 implements Copy as stated here.

It's important to remember that a struct doesn't implement Copy automatically. This was the case originally, but it was changed around 2014, and you can read a long and detailed explanation in RFC #19.
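Besides reading the documentation, we can ask the compiler whether a type implements Copy. The sketch below uses a generic function with a trait bound; assert_is_copy and Plain are made-up names, not part of the standard library.

```rust
// A helper that compiles only if T implements Copy.
// The name is illustrative, it is not part of the standard library.
fn assert_is_copy<T: Copy>() {}

#[allow(dead_code)]
struct Plain {
    value: u16,
}

fn main() {
    // Simple built-in types implement Copy out of the box.
    assert_is_copy::<u16>();
    assert_is_copy::<u32>();
    assert_is_copy::<f64>();
    assert_is_copy::<bool>();
    assert_is_copy::<char>();

    // Uncommenting the next line makes the compilation fail,
    // because a struct does not implement Copy automatically.
    // assert_is_copy::<Plain>();

    println!("All the checks happened at compile time");
}
```

The function body is empty because the whole check is done by the type system: if the code compiles, the types implement Copy.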

Move: transfer ownership

What happens when a variable is passed by value and its type doesn't implement the Copy trait? In Rust, the variable is moved to the function and the ownership is transferred (selling the car).

Let's have a look at the following code

struct Item {
    pub value: u16,
}

fn process(item: Item) -> u16 {
    item.value
}

fn main() {
    let i = Item { value: 42 };

    process(i);

    // This fails: value moved to process()
    println!("Item value {}", i.value);
}

The compiler won't accept it and will return the following error

    |
    |     let i = Item { value: 42 };
    |         - move occurs because `i` has type `Item`,
    |           which does not implement the `Copy` trait
    |
    |     process(i);
    |             - value moved here
...
    |     println!("Item value {}", i.value);
    |                               ^^^^^^^ value borrowed
    |                                       here after move
    |

After what we said in the previous sections, I believe the error messages are very clear.

When we called process passing i by value we gave up ownership of the variable, so it is not acceptable to use it after that instruction. We are basically trying to drive the car after we sold it.

Implementing Copy for structs

As you see, the Copy trait is explicitly mentioned. If we implement it for Item we should be able to go back to the previous case, with two copies of the value (i in main and item in process). When we define a struct that contains only types that implement Copy we can ask Rust to implement the trait for us using #[derive(Copy, Clone)].

#[derive(Copy, Clone)]
struct Item {
    pub value: u16,
}

fn process(item: Item) -> u16 {
    item.value
}

fn main() {
    let i = Item { value: 42 };

    process(i);

    // This succeeds: value copied to process()
    println!("Item value {}", i.value);
}

This code compiles as now Item can be copied when passed as an argument to process. The only field of Item is a u16, a type that implements the trait Copy, so it is sufficient to derive the trait to implement it for the structure.

To prove that ownership is a protection mechanism that exists only in Rust, it is interesting to compare the Assembly code for the two cases of a struct that doesn't implement Copy and for one that does. The code is exactly the same for both cases (we saw it in the first example of the article).

Borrow: lend the value to a function

So far we saw two cases that happen when we pass arguments by value. We learned however that we can also pass arguments by reference, so let's have a look at what happens in that case.

struct Item {
    pub value: u16,
}

fn process(item: &Item) -> u16 {
    item.value
}

fn main() {
    let i = Item { value: 42 };

    process(&i);

    println!("Item value {}", i.value);
}

Here, we pass to the function a reference to the value, just like we did with the house in the example. We are basically telling the function that it is allowed to access the value for a while, but that we retain ownership. The code compiles without errors.
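As a small extension of the example above, the following sketch borrows the same variable several times. Each borrow ends when it is no longer used, and read-only references can coexist, so main never loses ownership of i.

```rust
struct Item {
    pub value: u16,
}

fn process(item: &Item) -> u16 {
    item.value
}

fn main() {
    let i = Item { value: 42 };

    // Each call borrows i only for the duration of the call.
    let first = process(&i);
    let second = process(&i);

    // Several read-only references can also exist at the same time.
    let a = &i;
    let b = &i;
    let sum = process(a) + process(b);

    // i was never moved, so main still owns it.
    println!("{} {} {} {}", first, second, sum, i.value);
}
```

The same sequence of calls would not compile if process accepted Item by value, as the first call would move i.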

This seems to be exactly what happened when we passed by value a type that implements Copy, so what is the difference? As we said before, when data is read-only passing arguments by value and by reference might produce identical results, and using references might look like an unnecessary complication.

Sooner or later, however, we will need to change the values of our variables. Time to discuss mutability.

Mutability

In Rust, variables are immutable unless they are declared as mutable. However, as was the case for ownership, it is important to remember that at CPU level everything is mutable.

Mutability, like ownership, is a feature that the compiler introduces to help us to write code that is more correct. Forcing us to declare a variable as mutable gives us the chance to ask ourselves if we need that and to avoid bugs. Mutability and ownership work together to ensure that we write safer code.

While the role of mut in front of variables is usually simple to grasp, mutable references might prove more complicated. For this reason, to discuss mutability we will consider separately the case of arguments passed by value and arguments passed by reference.

Mutability and arguments passed by value

Looking back at two of the real world examples in the previous section, we can quickly understand what happens if we introduce mutability with parameters passed by value.

When I sell the car (pass by value, move) or give away the address of the shop (pass by value, copy), the new owner is free to do whatever they want with the object they receive. They can decide to paint the car, or to burn the note, and those actions do not affect me at all.

The same happens with Rust variables. If the variable is moved (because it doesn't implement Copy), the ownership is transferred to the function, which is free to do whatever it wants.

struct Item {
    pub value: u16,
}

fn process(mut item: Item) {
    item.value += 1;
}

fn main() {
    let i = Item { value: 42 };

    // Variable moved
    process(i);
}

If the variable is copied (because it implements Copy) the function receives a copy of the value, and once again it's free to do whatever it wants with it.

#[derive(Copy, Clone)]
struct Item {
    pub value: u16,
}

fn process(mut item: Item) {
    item.value += 1;
}

fn main() {
    let i = Item { value: 42 };

    // Variable copied
    process(i);

    // i.value is 42
    println!("Value {}", i.value);
}

Here, we can access i after the call because it implements Copy, but the function changed the value of the copied value and not that of the original variable.

The two examples show what we said at the beginning of the article: passing arguments by value doesn't allow us to change the original variables. In Rust, when a variable is passed by value, the caller doesn't need to care about mutability: whatever the function does to its parameter cannot affect the original variable.

Declaring an argument as mutable

There is an important point to clarify. Let's have a look at the following code

fn process(mut value: u16) {
    value += 1;
}

fn main() {
    let i: u16 = 42;

    process(i);
}

As you see, the variable i is passed by value. This means that the function process is free to do whatever it wants with the argument value. However, i is not mutable, so how can the function increment value?

The syntax mut value in the function signature means: create a local mutable variable that will host a value coming from the outside.

This is paramount to understand: mut value doesn't mean that the argument passed in main has to be mutable. It just means that the argument will be hosted by a mutable local variable.

As a matter of fact, the following code compiles

fn process(mut value: u16) {
    value += 1;
}

fn main() {
    let mut i: u16 = 42;

    process(i);
}

But the compiler gives us a warning

warning: variable does not need to be mutable
    |
    |     let mut i: u16 = 42;
    |         ----^
    |         |
    |         help: remove this `mut`
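One way to read mut value in a signature is as a shortcut for an immutable parameter followed by a mutable local variable. The sketch below compares the two forms; the function names are made up for the example.

```rust
// The parameter is declared mutable in the signature.
fn increment(mut value: u16) -> u16 {
    value += 1;
    value
}

// The parameter is immutable and is copied into a new
// mutable local variable with the same name.
fn increment_rebound(value: u16) -> u16 {
    let mut value = value;
    value += 1;
    value
}

fn main() {
    let i: u16 = 42;

    // Both functions work on their own copy of i.
    assert_eq!(increment(i), 43);
    assert_eq!(increment_rebound(i), 43);

    // The original variable is untouched and was never mutable.
    assert_eq!(i, 42);

    println!("i is still {}", i);
}
```

In both cases the mutability belongs to the local variable inside the function, which is why the caller doesn't need to declare i as mutable.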

Mutable arguments by reference

When a friend borrows my house, I am expecting them to vacate it after a while, and its management is my responsibility. While they live there, I might not like the fact that they paint the walls or change furniture. I need to be clear if I am giving them an object that they can alter or not.

This is why in Rust we can pass arguments to functions by reference in two different ways. We can pass a normal reference (&) or a mutable reference (&mut).

Mutable references

The name "mutable reference" can be misleading, as &mut is a reference that allows us to mutate the referenced variable. The reference itself, like any other variable, is immutable unless stated otherwise.
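The difference between a reference that allows mutation and a variable that can be reassigned can be sketched as follows. The two functions are made up for the example and exist only to isolate the two cases.

```rust
// The binding r is not declared with mut, yet it can be used to
// change a, because its type is &mut u16.
fn change_through_reference(mut a: u16) -> u16 {
    let r = &mut a;
    *r += 10;
    a
}

// The binding s is declared with mut, so it can be pointed to a
// different variable, but its type is &u16, so it cannot be used
// to change the values it refers to.
fn repoint_reference(a: u16, b: u16) -> u16 {
    let mut s = &a;
    let first = *s;
    s = &b;
    first + *s
}

fn main() {
    println!("{}", change_through_reference(1));
    println!("{}", repoint_reference(1, 2));
}
```

In the first function the mutability is a property of the reference type, while in the second it is a property of the variable that holds the reference.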

We already saw how to pass a normal reference

fn process(value: &u16) -> u16 {
    *value
}

fn main() {
    let i: u16 = 42;

    process(&i);

    // i is 42
    println!("Item value {}", i);
}

Please note that everything is coherent here. The variable i is not mutable (because we don't need it), and is passed by (immutable) reference to the function. This means that the function receives a reference to a value and Rust knows that it is not allowed to change the referenced value.

If we want to change the value inside the function we need to alter the code in many places

fn process(value: &mut u16) { // 1
    *value += 1
}

fn main() {
    let mut i: u16 = 42; // 3

    process(&mut i); // 2

    // i is 43
    println!("Item value {}", i);
}

The function process needs to receive &mut u16 1 because we want to change the referenced value. This means that main has to call the function passing &mut i 2 and not &i, as the types need to be coherent. We cannot create a mutable reference to an immutable value, though, so i has to be mutable as well 3.

The functional way

There is a last option that is a standard strategy in functional languages, where often variables cannot be declared as mutable at all.

struct Item {
    pub value: u16,
}

fn process(mut item: Item) -> Item { // 2
    item.value += 1;
    item // 3
}

fn main() {
    let mut i = Item { value: 42 };

    // Variable moved
    // Reassigned to retain ownership
    i = process(i); // 1

    println!("Value {}", i.value)
}

Here, we pass the variable i by value 1, which will copy or move the variable (depending on Copy). The function process declares the parameter as mutable 2 and changes its value. Then, it returns the same type it accepted 3 and the caller reassigns the old variable 1, that at this point has to be mutable.
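A possible variation, shown here only as a sketch, is to declare a new variable with the same name instead of reassigning the old one (shadowing). In that case the variable in main doesn't have to be mutable.

```rust
struct Item {
    pub value: u16,
}

fn process(mut item: Item) -> Item {
    item.value += 1;
    item
}

fn main() {
    // No mut here.
    let i = Item { value: 42 };

    // The old i is moved into process, and a new variable
    // with the same name receives the returned value.
    let i = process(i);

    println!("Value {}", i.value);
}
```

The function is unchanged. The only difference is on the caller's side, where let creates a new immutable variable that takes the place of the one that was moved.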

A quick recap

As you can see, it is very hard to separate call conventions, ownership, and mutability. They are all different aspects of function arguments in Rust, and they work together to ensure the code is safe. Before wrapping up, it might be useful to summarise how we decide the correct strategy.

Do you want to modify the original value?

  • YES: Variable has to be mutable. Pass by mutable reference OR pass by value, return, and reassign.
  • NO: Do you need to retain ownership?
    • YES: Pass by reference OR implement Copy and pass by value.
    • NO: Pass by value OR pass by reference.

Disassemble Rust

If you want to disassemble the Rust code on your machine (for example to explore a different architecture) you can follow these steps.

  • Install cargo-binutils.
  • Create a project with Cargo: cargo new proto
  • Open Cargo.toml and turn off debug for the profile dev
Cargo.toml
[package]
name = "proto"
version = "0.1.0"
edition = "2021"

[dependencies]

[profile.dev]
debug=0
  • Use llvm-objdump options and awk to get the output you want. For example
$ cargo objdump -- \
   --disassemble \
   --x86-asm-syntax=intel \
   --demangle \
   --no-show-raw-insn \
   --no-print-imm-hex \
   | awk -v RS="" '/^[[:xdigit:]]+ <proto::/'

The awk script is useful to isolate the functions in your code only, skipping the boilerplate that the compiler has to put into an executable. Make sure you use the correct name of your project in the awk pattern if you chose something other than proto.

Final words

This was quite a ride! I hope it was useful. Writing it definitely helped me to clarify in my head the available options and the reasons behind them. Happy coding!

Resources