Programming Languages and Paradigms -- Part 1

A programming paradigm is a fundamental style of programming used to classify programming languages.

Main Programming Paradigms and Sub-Paradigms

Imperative

  • Structured (or procedural)

    • Eg. C
  • Object Oriented

    • Encapsulation of code and the associated data into objects
    • Suit to problems of huge amounts of state/operations
      • GUI, simulators, video games, business management software
    • Easy to organize large codebases
    • Eg. C#, C++, Java, Python
  • Concurrent

    • Shared-memory threads/processes in C/C++, Java, Python, etc.
    • Message passing (for example MPI) in C/C++/FORTRAN
    • Semi-automatic loop paralelization with libraries such as OpenMP
    • GPU programming with CUDA
    • Use cases: HPC, distributed computing, graphic processing, etc.
  • Describes sequences of statements manipulating the program state

    • describes how to obtain the computation results
  • Uses advanced control flow operations

    • Loops, conditionals, procedures
    • easier to describe/reason about complex/large programs
  • Uses execution threads/processes to describe interleaving and/or parallel computation flows

    • Example in C:
      1
      2
      3
      4
      static void *thread_function(void *argument) {     
      int id = *(int *)argument;
      for(int i=0; i<10; i++)
      printf("Thread %d running on core %d\n", id, sched_getcpu()); }

      Declarative

  • Functional

    • First-class/higher-order functions
    • Loops implemented with recursion
    • Pure functions have no side-effects
    • Languages: Haskell, Scala, F#, etc.
  • Concurrent

    • Less need for synchronization
    • Use cases: distributed applications, web services, etc.
  • Describes the meaning/result of computations

  • High level of abstraction Code can easily become convoluted

  • Languages: HTML, SQL, XML, CSS, Latex (non-Turing complete)

  • Usage: document rendering, structured data storage and manipulation

  • Calling and composing functions to describes the program

Multi-Paradigm Languages

  • Haskell: purely functional and does not allow OO style
  • OCaml: mainly functional, allows OO and imperative constructs
  • C, C++: imperative but allow some functional idoms:
  • Many languages are multi-paradigm

Introduction to C

Pros:

  • Syntax
  • Low level
  • Fast
  • Controlled memory footprint (occupation)控制内存占用

    Cons:

  • Large area for making mistakes
  • Lack of memory safety, risks of undefined behavior at runtime

    Hello World

    1
    2
    3
    4
    5
    6
    7
    8
    9
    #include <stdio.h> // to get access to functions from external libraries
    // will run first when the program is executed
    // returns an integer and takes no parameters
    int main() {
    // print text on the standard output
    printf("hello, world!\n");
    // return to go back to the calling context, and exit if returning from main
    return 0;
    }
    Assuming the code is in listing1.c, to compile and run on the linux command line:
    1
    2
    3
    gcc listing1.c -o listing1
    ./listing1
    hello, world! (This is the output)

    C – Variables, Types, Printing to the Console

    Variables:

    1
    2
    3
    4
    5
    6
    7
    #include <stdio.h> 
    int main() {
    int a; // declare a of type int (signed integer)
    a = 12; // set the value of a to 12
    printf("a: %d\n", a); // print the value of a
    return 0;
    }
  • Variable:name,type,value
  • Must be declared before being used
    • Declare with <variable type><variable name>
  • A variable’s name should start with a letter or underscore
    • It can contain a combination of letters/numbers/underscores
      • E.g: int _x; int sum; int combination_int3;

        Types:

  • Defines the amount of space in memory allocated to the variable
  • Compiler checks the validty of operations on the variable at build time
  • 3 basic types categories: integers, floating-point numbers, characters
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    // INTEGER
    int a; // signed integer takes 4 bytes in memory on the Intel x86-64 architecture
    // CHARACTERS
    char my_char_variable;
    my_char_variable = 'x'; // using single quotes for characters
    // FLOATS
    float my_float_var;
    my_float_var = 12.4;
    // MIXING INTEGERS AND FLOATS
    int int1 = 2;
    float float1 = 3.8;
    int int2 = int1 + float1; // int2 = 5;
    int int3 = int1 * float1; // int3 = 7;
    /* when mixed with floats in arithmetics, integers are promoted to floats */
    /* when stored in an integer variable, floats are _truncated_ */
  • Types and Qualifiers
    • Types: char, int, float, double
      • The function sizeof can be used to get the exact size of a type on a given manchine
        1
        2
        3
        int so_short = sizeof(short int);     
        int so_int = sizeof(int);
        printf("size of short:%d bytes\n", so_short);
    • Qualifers: signed, unsigned(positive), short, long
    • Determines the storage size in memory
    • Exact implementation is architecture dependent 依赖于架构
      1
      2
      3
      4
      5
      6
      7
      8
      9
      short int a;              // signed, at least 16 bits: [−32,767,        +32,767] i
      int b; // signed, at least 16 bits: [−32,767, +32,767]
      unsigned int c; // unsigned: [0, +65,535]
      long int d; // at least 32 bits: [−2,147,483,647, +2,147,483,647]
      unsigned long int e; // unsigned: [0, +4,294,967,295]
      long long int f; // at least 64 bits: [-9x10^18, +9x10^18]
      long long unsigned int g; // unsigned: [0, +18x10^18]
      float h; // storage size unspecified, generally 32 bits
      double i; // storage size unspecified, generally 64 bits

      Printing to the Console: printf

  • Accepts one or more arguments: format string + variable list
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    int int_var = -1; 
    unsigned int uint_var = 12;
    long int lint_var = 10;
    float float_var = 2.5;
    double double_var = 2.5;
    char char_var = 'a';
    char string_var[] = "hello";
    printf("Integer: %d\n", int_var);
    printf("Unsigned integer: %u\n", uint_var);
    printf("Long integer: %ld\n", lint_var);
    printf("Float: %f\n", float_var);
    printf("Double: %lf\n", double_var);
    printf("Characters: %c\n", char_var);
    printf("String: %s\n", string_var);
    printf("Several variables: %d, %lf, %s\n", int_var, double_var, string_var);
    // Format string accepts escaped character such as \n for line break
  • Make sure to use the right marker because the compiler won’t tell you the mistakes and incorrect values
    • It will show warning like:”format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long long unsigned int’ “
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      int my_variable = -42; 
      printf("-42 printed with %%d: %d\n", my_variable);
      printf("-42 printed with %%u: %u\n", my_variable); // -42 interpreted as an unsigned integer
      // The outcome is 4294967254
      unsigned long long int ull = 5467513055454315;
      printf("ull printed with %%llu: %llu\n", ull);
      printf("ull printed with %%u: %u\n", ull); // interprets only 4 bytes vs. 8 for llu
      // The outcome is 2507777131
      printf("ull printed with %%d: %d\n", ull); // first 4 bytes, interpreted as unsigned
      // The outcome is -1787190165

      Arrays, Strings, Command Line Arguments

      Arrays

      1
      2
      3
      4
      5
      6
      7
      8
      9
      int array[4];     // declaration 
      array[0] = 10; // write a slot
      array[1] = 20;
      array[2] = 30;
      array[3] = 40;
      int x = array[3]; // read a slot
      printf("array[%d] contains %d\n", 0, array[0]); //array[0] contains 10
      printf("array[%d] contains %d\n", 1, array[1]); //array[1] contains 20
      printf("array[%d] contains %d\n", 2, array[2]); //array[2] contains 30
  • Arrays are stored contiguously in memory;
  • Multi-Dimensional Arrays
    1
    2
    int my_2d_array[3][2]; 
    my_2d_array[0][0] = 0;
  • Static initialization allows to omit the size (of the first dimension)
    1
    2
    int array[] = {1, 2, 3, 4, 5}; // do not have the size 
    int array_2d[][2] = {{1, 2}, {3, 4}, {5, 6}};
  • Character arrays, i.e. Strings
    • String: array of characters that ends with the special \0 end of string character marker
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      char string1[6] = "hello"; 
      char string2[] = "hello";
      char string3[6];
      string3[0] = 'h';
      string3[1] = 'e';
      string3[2] = 'l';
      string3[3] = 'l';
      string3[4] = 'o';
      string3[5] = '\0'; // Important here
      printf("%s\n", string1); // hello
      printf("%s\n", string2); // hello
      printf("%s\n", string3); // hello

      Command Line Arguments

      1
      gcc test.c -o program
      gcc is program; the rest are arguments
  • main parameters:
    • argc (integer)
    • argv: array containing the command line arguments
1
2
3
4
5
6
7
8
9
int main(int argc, char **argv) { // 'char ** argv' means 'char argv[][]'     
printf("Number of command line arguments: %d\n", argc);

// This is a bit dangerous...
printf("argument 0: %s\n", argv[0]);
printf("argument 1: %s\n", argv[1]);
printf("argument 2: %s\n", argv[2]);
return 0;
}
  • If the command line input is: ./test one two three

The output is:

1
2
3
4
Number of command line arguments: 4
argument 0: ./test
argument 1: one
argument 2: two
  • If the command line input is: ./test one

The output is:

1
2
3
4
Number of command line arguments: 2
argument 0: ./test
argument 1: one
argument 2: (null)
  • If the command line input is: ./test

The output is:

1
2
3
4
Number of command line arguments: 1
argument 0: ./test
argument 1: (null)
argument 2: LC_ALL=C.UTF-8

This is quite DANGEROUS

  • Arguments are strings, to convert to integer use atoi:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    #include <stdio.h> 
    #include <stdlib.h> // needed for atoi
    /* Sum the two integers passed as command line integers */
    int main(int argc, char **argv) {
    int a, b;
    // Once again, dangerous!
    a = atoi(argv[1]);
    b = atoi(argv[2]);
    printf("%d + %d = %d\n", a, b, a+b);
    return 0;
    }
    • To convert to a floating-point number (type double) use atof

      Conditionals and Loops in C

  • Statements end with ; and are executed sequentially

    Conditionals

  • Condition: <, <=, >, >=, ==!=
  • “false” is 0, “true” is everything except 0
  • Boolean operations:&& AND, || OR, ! negation
    • Operators precedence优先级: !first, then&&, the||
    • Use parentheses to modify/clarify evaluation order
  • Switch statement
    • Test the equality of an integer against a list of values
    • Use break to avoid fallthrough坠落 in cases
    • default path taken when none of the other cases matches
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      switch(x) {     
      case 1:
      printf("x is 1\n");
      break;
      case 2:
      printf("x is 2\n");
      break;
      case 3:
      printf("x is 3\n");
      break;
      default:
      printf("x is neither 1, nor 2, nor 3\n"); }

      Loops

  • While loop
    • while... checks the condition before entering the loop body
    • do... while checks after a first iteration
  • For Loops
    • for (<init statement> ; <condition>; <update statement>) {<body>}
      • <init statement> executed once before the first iteration
      • <condition> checked before each iteration
      • <update statement> executed after each iteration
      • E.g for(int i = 0; i<10; i = i + 1) {...}
    • break simply exits the loop
    • continue directly goes to the next iteration

      Back to fix command line potential danger

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      int print_help_and_exit(char **argv) {     
      printf("Usage: %s <number 1> <number 2>\n", argv[0]);
      exit(-1); }
      /* Sum the two integers passed as command line integers */
      int main(int argc, char **argv) {
      int a, b;
      // To make sure there are only two numbers in command line
      if(argc != 3)
      print_help_and_exit(argv);
      a = atoi(argv[1]);
      b = atoi(argv[2]);
      printf("%d + %d = %d\n", a, b, a+b);
      return 0; }

      Braces in Conditional and Loops

  • Can omit(ignore) the braces if conditional/loop body is a single statement

    Functions

  • Contains name, zero or more parameters with types, and a return type
    1
    2
    3
    4
    int add_two_integers(int a, int b) {     
    int result = a + b;
    return result;
    }
  • The absence缺席 of return value can be indicated with the void type
    1
    2
    3
    void print_hello(void) {     
    printf("hello!\n");
    return; // we can even omit this as the function does not return anything }

    Function Calls

  • Function parameters and return value are passed as copies
    1
    2
    3
    4
    5
    6
    7
    8
    9
    void my_function(int parameter) {     
    parameter = 12; // does not update x in main
    }
    int main() {
    int x = 10;
    my_function(x);
    printf("x is %d\n", x); // x is 10
    return 0;
    }

    Forward Declarations

  • Forward declaration, also called function _prototype_ 原型
  • The compiler parses解析 source files from top to bottom
    1
    2
    3
    4
    5
    6
    7
    8
    int add(int a, int b); 
    int main(){
    ....
    }
    /* The actual function definition */
    int add(int a, int b) {
    return a + b;
    }

    Variable Scope and Lifetime

  • Global variables declared outside functions
    • visible at the level of the entire source file
  • Try to limit the use of globals as much as possible
    • A lot of global state makes it harder to reason about the program
  • Within a a function, a declared local variable is accesible within its enclosing braces (a block of code)
    1
    2
    3
    4
    5
    int j; 
    for(j=0; j<10; j++) {
    printf("In loop body, j is %d\n", j);
    }
    printf("Out of loop body, j is %d\n", j); // Out of loop body, j is 10

    Custom Types and Data Structures in C

    Custom Types

  • Use typedef to alias别名 a type
    1
    2
    3
    4
    5
    6
    7
    8
    9
    #include <stdio.h> 
    // 'my_int' is now equivalent to 'long long unsigned int'
    // To make it shorter
    typedef long long unsigned int my_int;
    int main(int argc, char **argv) {
    my_int x = 12;
    printf("x is: %llu\n", x);
    return 0;
    }

    Custom Data Structures

  • Use struct to define a data structure with named fields
  • Use struct <struct name> to refer to it as a type
  • Access the fields of a variable with <variable_name>.<field_name>
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    #include <stdio.h> 
    #include <string.h> // needed for strcpy (string copy)
    // Definition of the struct:
    struct person {
    char name[10];
    float size_in_meters;
    int weight_in_grams;
    };
    void print_person(struct person p) {
    /* Access fields with '.' */
    printf("%s has a size of %f meters and weights %d grams\n", p.name, p.size_in_meters, p.weight_in_grams);
    }
    int main(int argc, char **argv) {
    // declare a variable of the struct type:
    struct person p1;
    // sets the variable field
    p1.size_in_meters = 1.6;
    p1.weight_in_grams = 60000;
    strcpy(p1.name, "Julie");
    struct person p2 = {"George", 1.8, 70000};
    print_person(p1);
    print_person(p2);
    return 0;
    }
  • Fields are place consecutively连续地 in memory (compiler may insert padding)
  • Structures can be typedef
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    #include <stdio.h> 
    #include <string.h> // needed for strcpy
    struct s_person {
    /* fields declaration ... */
    };
    typedef struct s_person person;
    void print_person(person p) { /* ... */}
    int main(int argc, char **argv) {
    person p1;
    person p2 = {"George", 1.8, 70000};
    /* ... */ }
    A faster way to define structures:
    1
    2
    3
    typedef struct {
    /* fields declaration ... */
    } person;

    Enumeration

  • User defined type for assigning textual names to integer constants
  • The compiler assigns an integer constant to each value of an enum
    1
    2
    3
    4
    5
    enum color{
    RED,
    BLUE,
    GREEN
    }
  • Enumerations can be typedef‘d

typedef enum color mycolor

or in once:

1
2
3
4
5
typedef enum {     
BLUE,
/* ... */
} color;

Introduction to Pointers

Program Memory Layout

  • All the program’s code and data is present somewhere in memory
  • The area of memory accessible by the program is the address space
    • Large array of contiguous bytes
    • Addresses are indexes in that array
  • This is Virtual memory:
    • Address space size independent from the amount of RAM
    • Each program gets its own address space
  • Address of a variable: the address in memory of the first byte storing that variable
  • Use the & operator to get the address of a variable
  • With modern processeors an address is a 64 bit value so we need the right format specifier: ‘%lu’ or ‘%lx’ if we want to print in hexadecimal

    Pointers

  • Pointer: variable that contains an address(possibly of another variable)
    • Declaration: <pointed type> * <pointer name>
      1
      2
      int v = 42; 
      int *ptr = &v;
  • Access the pointed value with *
    1
    2
    printf("pointer value:   0x%lx\n", ptr);  // 0xF123
    printf("pointed value: %d\n", *ptr); // 42

    Pointers Applications

    Allow a Function to Access the Calling Context

  • Can’t change a fuction’s parameter unless to use pointer
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    void add_one(int *param) {
    *param++; // access the pointed value
    }
    int main(int argc, char **argv) {
    int x = 20;
    printf("before call, x is %d\n", x); // print 20
    add_one(&x); // address of x
    printf("after call, x is %d\n", x); // print 21
    return 0;
    }
  • A function can ‘return’ more than a single value if…
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    // we want this function to return 3 things: the product and quotient of n1 by n2, 
    // as well as an error code in case division is impossible
    int multiply_and_divide(int n1, int n2, int *product, int *quotient) {
    /* Can't divide if n2 is 0 */
    if(n2 == 0) return -1;
    *product = n1 * n2;
    *quotient = n1 / n2;
    return 0;
    }
    int main(int argc, char **argv) {
    int p, q, a = 10, b = 5;
    if(multiply_and_divide(a, b, &p, &q) == 0) {
    printf("10*5 = %d\n", p); // 10*5 = 50
    printf("10/5 = %d\n", q); // 10/5 = 2
    }
    }

    Getting over C FUnction Call Limitations

  • Assume we want to write a function updating a large struct variable
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    typedef struct {
    // lots of large (8 bytes) fields:
    double a; double b; double c; double d; double e; double f;
    } large_struct;

    large_struct f(large_struct s) { // very inefficient in terms of performance and memory usage!
    s.a += 42.0;
    return s;
    }//set a large struct and only change one variable in it

    int main(int argc, char **argv) {
    large_struct x = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    large_struct y = f(x);
    printf("y.a: %f\n", y.a); // 43.0000
    }
  • Huge performance loss and memory footprint increase!
  • Solution: With pointer: maintain a single copy of the variable
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    typedef struct { 
    double a; double b; double c; double d; double e; double f;
    } large_struct;
    void f(large_struct *s) { // now takes a pointer parameter
    (*s).a += 42.0; // dereference to access x
    return;
    }
    int main(int argc, char **argv) {
    large_struct x = {1, 2, 3, 4, 5, 6};
    f(&x); // pass x's address
    printf("x.a: %f\n", x.a);
    return 0;
    }
    This is much faster and efficient

    C Arrarys are Pointers!

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    void negate_int_array(int *ptr, int size) { // functions taking arrays as parameters
    // also need the size to iterate properly
    for(int i=0; i<size; i++)
    // use square brackets like a standard array
    ptr[i] = -ptr[i];
    }

    int main(int argc, char **argv) {
    int array[] = {1, 2, 3, 4, 5, 6, 7};
    negate_int_array(array, 7); // to get the pointer just use the array's name
    for(int i=0; i<7; i++)
    printf("array[%d] = %d\n", i, array[i]); // the result will be negative
    return 0;
    }
  • We can call the elemnt in array by using *(arrayname + index)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    int int_array[2] = {1, 2};
    double double_array[2] = {4.2, 2.4};
    char char_array[] = "ab";
    printf("int_array[0] = %d\n", int_array[0]); //=1
    printf("int_array[1] = %d\n", int_array[1]); //=2
    printf("*(int_array+0) = %d\n", *(int_array+0)); // pointer arithmetic!, =1
    printf("*(int_array+1) = %d\n", *(int_array+1)); // +1 means + sizeof(array type), =2
    printf("double_array[0] = %f\n", double_array[0]); //=4.200
    printf("double_array[1] = %f\n", double_array[1]); //=2.400
    printf("*(double_array+0) = %f\n", *(double_array+0));//=4.200
    printf("*(double_array+1) = %f\n", *(double_array+1));//=2.400
    printf("char_array[0] = %c\n", char_array[0]); //=a
    printf("char_array[1] = %c\n", char_array[1]); //=b
    printf("*(char_array+0) = %c\n", *(char_array+0)); //=a
    printf("*(char_array+1) = %c\n", *(char_array+1)); //=b

    Pointers and Structures

  • . takes precedence优先级 over *
  • s->x is a shortcut for (*s).x
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    typedef struct {
    int int_member1;
    int int_member2;
    int *ptr_member;
    } my_struct;
    /* declare and initialise ms of type my_struct ... */
    my_struct *p = &ms;
    (*p).int_member1 = 1; // don't forget the parentheses! . takes precedence over *
    p->int_member2 = 2; // s->x is a shortcut for (*s).x
    p->ptr_member = &(p->int_member2);
    printf("p->int_member1 = %d\n", p->int_member1); //1
    printf("p->int_member2 = %d\n", p->int_member2); //2
    printf("p->ptr_member = %p\n", p->ptr_member); //0x7fff38692954
    printf("*(p->ptr_member) = %d\n", *(p->ptr_member)); //2
    p->int_member2 is (*p).int_member2 = 2; &(p->int_member2) is the address of this variable; *(p->ptr_member) is the actual value of this address point at

    Pointer Chains

  • A pointer is a variable and has itself an address, i.e. a location in memory, so it can be pointed to!
    1
    2
    3
    4
    5
    6
    7
    int value = 42;         // integer
    int *ptr1 = &value; // pointer of integer
    int **ptr2 = &ptr1; // pointer of pointer of integer
    int ***ptr3 = &ptr2; // pointer of pointer of pointer of integer
    printf("ptr1: %p, *ptr1: %d\n", ptr1, *ptr1);
    printf("ptr2: %p, *ptr2: %p, **ptr2: %d\n", ptr2, *ptr2, **ptr2);
    printf("ptr3: %p, *ptr3: %p, **ptr3: %p, ***ptr3: %d\n", ptr3, *ptr3, **ptr3, ***ptr3);
    The output is:
    1
    2
    3
    ptr1: 0x7ffc7e51775c, *ptr1: 42
    ptr2: 0x7ffc7e517760, *ptr2: 0x7ffc7e51775c, **ptr2: 42
    ptr3: 0x7ffc7e517768, *ptr3: 0x7ffc7e517760, **ptr3: 0x7ffc7e51775c, ***ptr3: 42

    Dynamic Memory Allocation

    Motivation

  • Many scenarios where an array size is determined only at runtime
  • Variable size arrays are rarely used due to support and space restrictions
    • Are not supported for all versions of C, if we try a very large number, we will get an error “segmentation fault”
  • So we need another solution
  • Static data: large size, all allocations fixed at compile time. Store global variables
  • Stack: small size, most allocations fixed at compile time.
  • Heap: lagre size, all allocations made dynamically at runtime

    Allocating on the Heap: malloc

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    #include <stdio.h>
    #include <stdlib.h> // needed for malloc
    int process_array(int *array, int size) { /* do something with array */ }
    int main(int argc, char **argv) {
    int *heap_array;
    /* ... */
    int size = atoi(argv[1]);
    heap_array = malloc(size * sizeof(int));
    if(heap_array == NULL) return -1;
    process_array(heap_array, size);
    free(heap_array); //free up memory when you done with it
    return 0;
    }
  • malloc allocates a chunk of memory on the heap and returns a pointer to it
    • Memory is contiguous and can be used as an array
  • Returns NULL (0) if the allocation fails
    • Always check the return value
  • void *: generic pointer type
    • Void pointer is a specific pointer type – void * – a pointer that points to some data location in storage, which doesn’t have any specific type. Void refers to the type. Basically the type of data that it points to is can be any. If we assign address of char data type to void pointer it will become char Pointer, if int data type then int pointer and so on. Citation
  • size_t: long unsigned int
    1
    2
    void *malloc(size_t size); 
    void free(void *ptr);

    Multidimensional Arrays and malloc

  • 3 x 2 int array
    double array in heap and stack
1
2
3
4
5
6
7
8
9
10
11
12
13
int a = 3; int b = 2;
int **array;
array = malloc(a * sizeof(int *)); // return the pointer of 1d array pointers
if(array == NULL) return -1;
for(int i=0; i<a; i++) {
array[i] = malloc(b * sizeof(int)); //return the pointer of numbers
if(array[i] == NULL) return -1;
}
array[2][0] = 12;
/* ... */
for(int i=0; i<a; i++)
free(array[i]);
free(array);

Memory Leaks

  • Don’t forget to free memory!
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    void print_ten_integers() {
    int *array = malloc(10 * sizeof(int));
    if(!array) {
    printf("cannot allocate memory ...\n");
    return;
    }
    for(int i=0; i<10; i++) {
    array[i] = rand()%100;
    printf("%d ", array[i]);
    }
    printf("\n");
    /* array is neve free, leaking 10*sizeof(int) of memory each iteration */
    }
    int main(int argc, char **argv) {
    int iterations = atoi(argv[1]);
    for(int i=0; i<iterations; i++)
    print_ten_integers();
    return 0;
    }
  • Can use Valgrind to detect memory leaks

    The C Standard Library

  • Already saw: printf, malloc, atoi, etc
  • can use man <function name> to find function description, prototype, required headers, return value in a Linux terminal

    String Copy

    The format of String copy:
    1
    2
    char *strcpy(char *dest, const char *src); 
    char *strncpy(char *dest, const char *src, size_t n);
    Example:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    int main(int argc, char **argv) {
    char *string1 = "hello";
    char *string2 = string1; // this is not a string copy!
    char string3[10];

    /* not super safe, what happen if string1 > string3? */
    strcpy(string3, string1);

    /* better */
    strncpy(string3, string1, 10);

    printf("string1 @%p: %s\n", string1, string1); // string1 @0x55bcccbf98a4: hello
    printf("string2 @%p: %s\n", string2, string2); // string1 @0x55bcccbf98a4: hello
    printf("string3 @%p: %s\n", string3, string3); // string3 @0x7ffda803dbee: hello
    return 0;
    }

    String Concatenation 连结

    The format of String Concatenation:
    1
    2
    char *strcat(char *dest, const char *src);
    char *strncat(char *dest, const char *src, size_t n);
    Example:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
    char world[6] = "world";
    char s1[32];
    char s2[32];

    strcpy(s1, "hello ");
    strcpy(s2, "hello ");

    strcat(s1, world); // not very safe
    strncat(s2, world, 32); // better

    printf("s1: %s\n", s1); //s1: hello world
    printf("s2: %s\n", s2); //s2: hello world
    return 0;
    }

    Format-based String Creation

    Format is:

int sprintf(char *str, const char *format, ...);
Fill a string with printf-like specifiers 用类似 printf 的说明符填充字符串

1
2
3
4
5
6
7
8
9
10
11
12
13
#include <stdio.h>

int main(int argc, char **argv) {
int a = 12;
float b = 4.5;
char *s = "hello";
char string[64];

sprintf(string, "a is %d, b is %f, s is %s\n", a, b, s);
printf("%s", string); // a is 12, b is 4.500000, s is hello

return 0;
}

String Manipulation

  • Get string length with strlen
  • Compare two strings with strcmp, returns 0 if they match
    The formats of them are:

size_t strlen(const char *s)

int strcmp(const char *s1, const char *s2);

Example:

1
2
3
4
5
6
7
8
9
#include <string.h>
/* ... */
char *s1 = "hello"; char *s2 = "hello";
char *s3 = "not hello";
printf("strcmp(s1, s2) returns: %d\n", strcmp(s1, s2)); // 0
printf("strcmp(s1, s3) returns: %d\n", strcmp(s1, s3)); // -6
printf("length of s1: %d\n", strlen(s1)); // 5
printf("length of s2: %d\n", strlen(s2)); // 5
printf("length of s3: %d\n", strlen(s3)); // 9

Console Input

Formats:

1
2
char *fgets(char *s, int size, FILE *stream);
int scanf(const char *format, ...);
  • fgets() can read from any open file, but scanf() only reads standard input.
  • fgets() reads ‘a line of text’ from a file; scanf() can be used for that but also handles conversions from string to built in numeric types.
  • Many people will use fgets() to read a line of data and then use scanf() to dissect it. From stackoverflow

Example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
int main(int argc, char **argv) {
int int1, int2;
double double1;
float float1;
char s[128];
printf("Please input a string:\n");
fgets(s, 128, stdin);
printf("Please input an integer:\n");
scanf("%d", &int1);
printf("Please input a float:\n");
scanf("%lf", &double1); /* make sure to us %lf for double and %f for float */
printf("Please enter an integer and a float separated by a space\n");
scanf("%d %f", &int2, &float1);
printf("You have entered: %d, %d, %lf, %f, and %s\n", int1, int2, double1,
float1, s);
return 0;
}

Memory Manipulation

  • Writing bytes to memory, Copying memory
    Formats:
    1
    2
    void *memset(void *s, int c, size_t n);
    void *memcpy(void *dest, void *src, size_t n);
    memset sets for char (or other type) variable, make n size of it to c

memcpy copy from src to dest in the size of n
Example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#include <stdio.h>
#include <string.h> // needed for memcpy and memset
#include <stdlib.h>
int main(int argc, char **argv) {
int buffer_size = 10;
char *ptr1 = malloc(buffer_size);
char *ptr2 = malloc(buffer_size);
if(ptr1 && ptr2) {
memset(ptr1, 0x40, buffer_size); // 0x40 is ascii code for @
memcpy(ptr2, ptr1, buffer_size);
for(int i=0; i<buffer_size; i++) {
printf("ptr1[%d] = %c\n", i, ptr1[i]); //ptr1[0] = @
printf("ptr2[%d] = %c\n", i, ptr2[i]); //ptr2[0] = @ all are set to @!
}
}
return 0;
}

Math functions

1
2
3
4
5
double ceil(double x); // the ceil function returns the smallest integer that is greater than or equal to x
float ceilf(float x); // the same result as ceil
double sqrt(double x); // square root
double pow(double x); // power
double cos(double x); //cosine function

Example:

1
2
3
4
5
6
7
8
9
10
#include <stdio.h>
#include <math.h> // needed for math functions
int main(int argc, char **argv) {
printf("ceil 2.5: %f\n", ceil(2.5)); //ceil 2.5: 3.000000
printf("ceilf 2.5: %f\n", ceilf(2.5)); //ceilf 2.5: 3.000000
printf("floor 2.5: %f\n", floor(2.5)); //floor 2.5: 2.000000
printf("2^5: %f\n", pow(2, 5)); //2^5: 32.000000
printf("sqrt(4): %f\n", sqrt(4)); //sqrt(4): 2.000000
return 0;
}

Time

  • Sleeping
    • format
      1
      2
      unsigned int sleep(unsigned int seconds); // parameter as second
      int usleep(useconds_t usec); //suspends the current process for the number of microseconds passed to it
    • 1 s = 1000000 µs
    • Example
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      #include <stdio.h>
      #include <unistd.h> // needed for sleep and usleep
      int main(int argc, char **argv) {
      printf("hello!\n");
      printf("Sleeping for 2 seconds ...\n");
      sleep(2);
      printf("Now sleeping for .5 seconds ...\n");
      usleep(500000);
      return 0;
      }
  • Current Time
    • format
      1
      time_t time(time_t *tloc);   // time_t is generally a long long unsigned int
    • Example
      1
      2
      3
      4
      5
      6
      7
      8
      #include <stdio.h>
      #include <time.h> // needed for time()
      int main(int argc, char **argv) {
      unsigned long long t = time(NULL);
      printf("Number of seconds elapsed since the epoch (01/01/1970):\n");
      printf("%llu\n", t); // 1650917906
      return 0;
      }
  • Measuring Execution Time
    • Format
      1
      2
      3
      4
      5
      6
      int gettimeofday(struct timeval *tv, struct timezone *tz);
      struct timeval {
      time_t tv_sec; /* seconds (type: generally long unsigned) */
      suseconds_t tv_usec; /* microseconds (type: generally long unsigned) */
      };
      // A pointer to a struct timeval as parameter
    • Example
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      #include <stdio.h>
      #include <sys/time.h> // needed for gettimeofday
      int main(int argc, char **argv) {
      struct timeval tv, start, stop, elapsed;
      gettimeofday(&tv, NULL);
      printf("Seconds since the epoch: %lu.%06lu\n", tv.tv_sec, tv.tv_usec); // get second and microsecond
      gettimeofday(&start, NULL); // set a start time stamp
      for(int i=0; i<1000000000; i++); // implementing a large loop
      gettimeofday(&stop, NULL); // set an ending time stamp
      timersub(&stop, &start, &elapsed); // stop-start = elapsed
      printf("Busy loop took %lu.%06lu seconds\n", elapsed.tv_sec,
      elapsed.tv_usec);
      return 0;
      }

      File I/O

      1
      int open(const char *pathname, int flags, mode_t mode);
  • open creates a reference to a file: file descriptor
    • pathname: path of the file
    • flags:
      • Access mode: O_RDONLY, O_WRONLY, O_RDWR
      • create the file if it doesn’t exist: O_CREAT
      • Truncate截短 the file size to 0 if it exists: O_TRUNC
        1
        2
        3
        4
        // open /home/pierre/test, read-write mode, and create the file if it does not
        // exists with r/w/x permissions
        // I_IRWXU means read, write, execute/search by
        int file_descriptor = open("/home/pierre/test", O_RDWR | O_CREAT, S_IRWXU);
        1
        ssize_t read(int fd, void *buf, size_t count);
  • read attempts to read bytes from a file
    • fd is the file descriptor, previously created with open
    • buf is the address of the buffer that will receive the content road
    • count is the number of bytes to try to read
    • return the number of bytes actually read
      1
      2
      3
      4
      5
      6
      int bytes_read = (file_descriptor, buffer, 10);
      if(bytes_read == -1) {
      /* error */
      } else if(bytes_read != 10) {
      /* technically not an error */
      }
      1
      ssize_t write(int fd, const void *buf, size_t count);
  • write attemps to write count bytes from buf int the file corresponding to fd
    • Returns the number of bytes actually written
    • File system can be full
      1
      2
      3
      4
      5
      6
      int bytes_written = (file_descriptor, buffer, 10);
      if(bytes_written == -1) {
      /* error */
      } else if(bytes_written != 10) {
      /* technically not an error */
      }
      1
      int close(int fd)
    • close terminates file I/O on a given descriptor
      Example:
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      #include <stdio.h>
      #include <sys/types.h> // needed for open
      #include <sys/stat.h> // needed for open
      #include <fcntl.h> // needed for open
      #include <unistd.h> // needed for read and write
      #include <string.h>
      int main(int argc, char **argv) {
      int fd1;
      char *buffer = "hello, world!";
      fd1 = open("./test", O_WRONLY | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
      if(fd1 == -1) {
      printf("error with open\n");
      return -1;
      }
      /* write 'hello, world!' in the file */
      if(write(fd1, buffer, strlen(buffer)) != strlen(buffer)) {
      // write will return the number of bytes actualy written
      printf("issue writing\n");
      close(fd1); return -1;
      }
      /* write it again */
      if(write(fd1, buffer, strlen(buffer)) != strlen(buffer)) {
      printf("issue writing\n");
      close(fd1); return -1;
      }
      close(fd1);
      return 0;
      }
      The result is a txt file called “test” with “hello, world!hello, world!
      “ in it.

Read Example which assume the last example has executed

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Assume ./test was previously created with the write example program
char buffer2[10];
int fd2 = open("./test", O_RDONLY, 0x0);
int bytes_read;
if(fd2 == -1) { printf("error open\n"); return -1; }
/* read 9 bytes */
if(read(fd2, buffer2, 9) != 9) { //read 9 bytes from fd2 to buffer 2
printf("error reading\n"); close(fd2); return -1;
}
/* fix the string and print it */
buffer2[10] = '\0'; //read: 'hello, wo'
printf("read: '%s'\n", buffer2);
/* read 9 bytes again */
bytes_read = read(fd2, buffer2, 9);
if(bytes_read != 9) {
printf("error reading\n"); close(fd2); return -1;
}
/* fix the string and print it */
buffer2[10] = '\0';
printf("read: '%s'\n", buffer2); //read: 'rld!hello'
close(fd2);

Random Numbers

  • V1: get numbers between 0 and RAND_MAX

    1
    2
    3
    4
    5
    6
    7
    8
    #include <stdio.h>
    #include <stdlib.h>
    int main(int argc, char **argv) {
    for(int i=0; i<10; i++)
    printf("%d ", rand());
    printf("\n"); // 1804289383 846930886 1681692777 1714636915 1957747793 424238335 719885386 1649760492 596516649 1189641421
    return 0;
    }
  • V2: numbers between 0 and 99 with modulo

    1
    2
    3
    4
    5
    6
    7
    8
    #include <stdio.h>
    #include <stdlib.h>
    int main(int argc, char **argv) {
    for(int i=0; i<10; i++)
    printf("%d ", rand()%100); // 83 86 77 15 93 35 86 92 49 21
    printf("\n");
    return 0;
    }
  • V3: display a different sequence each time we launch the program

    • makes use of the computer’s internal clock to control the choice of the seed. Since time is continually changing, the seed is forever changing.
    • void srand(unsigned int seed) seeds the random number generator used by the function rand.
    • seed − This is an integer value to be used as seed by the pseudo-random number generator algorithm.
    • Because the number is set after compiling the program!
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      int main(int argc, char **argv) {
      srand(time(NULL));

      for(int i=0; i<10; i++)
      printf("%d ", rand()%100);
      printf("\n");

      return 0;
      }

      Error Management

  • The variable errno can be used to get more information about the failure of many functions of the standard library

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <errno.h> // needed for errno and perror

    int main(int argc, char **argv) {
    int fd = open("a/file/that/does/not/exist", O_RDONLY, 0x0);

    /* Open always returns -1 on failure, but it can be due to many different
    * reasons */
    if(fd == -1) {
    printf("open failed! errno is: %d\n", errno);
    // the errno is 2
    /* errno is an integer code corresponding to a given reason. To format
    * it in a textual way use perror: */
    perror("open");
    }

    return 0;
    }
  • Errors are listed, more can find in the function man page

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
     1   EPERM             Operation not permitted
    2 ENOENT No such file or directory
    3 ESRCH No such process
    4 EINTR Interrupted system call
    5 EIO Input/output error
    6 ENXIO No such device or address
    7 E2BIG Argument list too long
    8 ENOEXEC Exec format error
    9 EBADF Bad file descriptor
    10 ECHILD No child processes

    Classes and Objects in C++

  • Superset of C with many additional functionalities, in particular Object-Oriented Programming (OOP)

    • Everything we saw for C will also work in C++
  • 3 fundamental concepts of OOP:

    • Encapsulation of data and code manipulating it with classes and objects
    • Inheritance, that helps to reuse code for components of a program that share similar characteristics
    • Polymorphism多态性 lets the programmer define multiple implementation of functions with the same name

      Classes and Objects

  • Object Oriented Programming binds together some data and the code that is manipulating it into objects

  • The code for manipulating similar objects is defined once into a class

  • The programmer uses the class as a pattern to create (instantiate) as many objects as needed

  • By convention, class names start with a capital letter and use camel case

  • Member variables represent data attached to the object created from the class

    • Different objects of the same class can have different values for member variables
  • Member variables represent data attached to the object created from the class Member functions are functions, i.e. the code manipulating the data

    • Prototype原型 in the class, implemented outside with ::
  • Example:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    class SampleClass {
    public:
    int data;
    void add(int val);
    };
    void SampleClass::add(int val) {
    data = data + val;
    }
    int main(int argc, char **argv) {
    // create two objects, instances of SampleClass
    SampleClass obj1, obj2;
    obj1.data = 42;
    obj2.data = 12;
    obj1.add(58);
    obj2.add(8);
    printf("obj1 data: %d\n", obj1.data); //obj1 data: 100
    printf("obj2 data: %d\n", obj2.data); //obj2 data: 20
    return 0;
    }

    Encapsulation

  • Encapsulation: prevent direct access to part of the member variables/functions of a class from outside of this class

    • To force the outside code to use a well-defined interface, i.e. the member functions and (rarely) variables still available to it
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      38
      #include <stdio.h>
      class BankAccount {
      // All attributes/methods set as private are unacessible from outside the class
      private:
      int _balance = 0;
      int _transaction_num = 0;

      public:
      void update_balance(int val);
      int get_balance();
      int get_transaction_num();
      };

      /* Add val to the balance and increment the number of transactions */
      void BankAccount::update_balance(int val) {
      _balance += val;
      _transaction_num++;
      }

      /* Return the current value of the balance */
      int BankAccount::get_balance() {
      return _balance;
      }

      /* Return the number of transactions */
      int BankAccount::get_transaction_num() {
      return _transaction_num;
      }

      int main(int argc, char **argv) {
      BankAccount ba;
      ba.update_balance(100); // payday 发薪日
      ba.update_balance(-20); // groceries 杂货
      printf("The account balance is %d after %d transactions\n",
      ba.get_balance(), ba.get_transaction_num());
      // output: The account balance is 80 after 2 transactions
      return 0;
      }
  • Have as much member variables private as possible

  • Have funcions to read/write these only if needed

    • Use accessors or getters and setters
    • Example:
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      #include <stdio.h>
      class MyClass {
      private:
      int _foo = 0;
      double _bar = 0.0;
      public:
      void set_foo(int val); // setter
      int get_foo(); // getter
      double get_bar(); // getter
      /* ... */
      };

      void MyClass::set_foo(int val) {
      _foo = val;
      }

      int MyClass::get_foo() {
      return _foo;
      }

      double MyClass::get_bar() {
      return _bar;
      }

      int main(int argc, char **argv) {
      MyClass c1, c2;
      c1.set_foo(42);
      c2.set_foo(24);
      printf("c1's foo: %d, c2's foo: %d\n", c1.get_foo(), c2.get_foo());
      // c1's foo: 42, c2's foo: 24
      return 0;
      }

      Constructors

  • Initialise member variables at object creation time with a constructor

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    #include <stdio.h>
    class BankAccount {
    // All attributes/methods set as private are unacessible from outside the class
    private:
    int _balance;
    int _transaction_num;

    public:
    BankAccount(int initial_balance);
    void update_balance(int val);
    int get_balance();
    int get_transaction_num();
    };

    /* Constructor */
    // now can set any value as intitial balance
    BankAccount::BankAccount(int initial_balance) {
    _balance = initial_balance;
    _transaction_num = 0;
    }

    Destructors

  • Member functions called when the object is destroyed
    I.e. when it goes out of scope for the stack-allocated objects we have seen this far

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    class BankAccount {
    // All attributes/methods set as private are unacessible from outside the class
    private:
    int _balance;
    int _transaction_num;
    char *_account_name;

    public:
    BankAccount(char *name, int initial_balance);
    ~BankAccount();
    void update_balance(int val);
    int get_balance();
    int get_transaction_num();
    char *get_name();
    };

    /* Constructor */
    BankAccount::BankAccount(char *name, int initial_balance) {
    _balance = initial_balance;
    _transaction_num = 0;
    _account_name =
    (char *)malloc((strlen(name) + 1) * sizeof(char));
    strcpy(_account_name, name);
    }

    /* destructor */
    BankAccount::~BankAccount() {
    free(_account_name);

    // printf here to see when the destructor is called:
    printf("destructor called\n");
    }

    /* Add val to the balance and increment the number of transactions */
    void BankAccount::update_balance(int val) {
    _balance += val;
    _transaction_num++;
    }

    /* Return the current value of the balance */
    int BankAccount::get_balance() {
    return _balance;
    }

    /* Return the number of transactions */
    int BankAccount::get_transaction_num() {
    return _transaction_num;
    }

    /* Return the name of the account */
    char *BankAccount::get_name() {
    return _account_name;
    }

    int main(int argc, char **argv) {
    char name[11] = "my_account";
    BankAccount ba(name, 100); //set name and initial balance

    ba.update_balance(100); // payday
    ba.update_balance(-20); // groceries

    printf("The account \"%s\" balance is %d after %d transactions\n",
    ba.get_name(), ba.get_balance(), ba.get_transaction_num());

    return 0;
    }
    // The output is:
    //The account "my_account" balance is 180 after 2 transactions
    // destructor called

    This Pointer

  • In any member function (including destructors and constructors), one can obtain a pointer to the current object with this

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    #include <stdio.h>
    class Test {
    private:
    int x;
    int y;

    public:
    Test(int x, int y);
    void print(void);
    };

    /* This constructor's parameter have the same names as the class member
    * variables */
    Test::Test(int x, int y) {
    this->x = x; // x corresponds to the parameter, this->x to the member
    this->y = y; // this->x equal to (*this).x
    }

    void Test::print() {
    printf("x: %d, y: %d\n", x, y);
    printf("x: %d, y: %d\n", this->x, this->y); // same thing
    }

    int main(int argc, char **argv) {
    Test t(1, 2);
    t.print();
    return 0;
    }

    C++: Dynamic Object Allocation and Dynamic Arrays

    Dynamic Object Allocation

  • malloc and free are available in C++

    • Good for allocating arrays/buffers of primitive原始 types(int, char, etc.)
  • new and delete to allocate objects dynamically

    • Use malloc and free under the hood
    • Allocate memory and invoke contructors/destructors
      • Should be preferred to malloc/free 优先于malloc/free

        Dynamic Arrays

  • vector class

  • Variable size arrays, their allocated memory (on the heap) is managed automatically by the language

  • They also embed their size

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    #include <vector> // c++ include
    // vector is in the std namespace
    using namespace std;
    /* apply a few random transactions on account pairs within the array
    * Note that we have a dynamic array which embeds its size so no need to pass
    * the number of accounts as parameter anymore
    * Also note that we pass a pointer to the vector rather than the vector
    * to avoid a costly copy with long vectors! */
    void rnd_transac(vector<BankAccount *> *b) {
    int num = rand()%10 + 1;
    int n = b->size(); // returns the amount of elements in the vector
    for(int i=0; i<num; i++) {
    BankAccount *source = (*b)[rand()%n]; // we can access a vector with []
    BankAccount *target = (*b)[rand()%n];
    int amount = rand()%15;
    target->update_balance(amount);
    source->update_balance(-amount);
    }
    }
    int main(int argc, char **argv) {
    // a vector of BankAccount pointers:
    vector<BankAccount *> accounts;
    int n;
    n = atoi(argv[1]);
    for(int i=0; i<n; i++) {
    char name[32];
    sprintf(name, "account %d", i);
    BankAccount *b = new BankAccount(name, 100);
    accounts.push_back(b); // add an element at the end of the vector
    }
    rnd_transac(&accounts);
    for(int i=0; i<n; i++) {
    BankAccount *b = accounts[i];
    printf(/* ... */);
    // this frees what is pointed by b but does not remove b from the
    // vector, we'll clear it out below as we are currently iterating on
    // that vector
    delete b;
    }
    accounts.clear(); // remove all elements from the vector
    }
    }

    Inheritance in C++

  • Goal: reusing code between similar classes

    • Class Person (Base class) — Class Student (Derived classes)
    •                  |___ Class Professor
      
  • Base class declaration:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    class Person {
    private:
    char *_name;
    int _id;
    public:
    Person(const char *name, int id);
    ~Person();
    char *get_name();
    int get_id();
    };
  • Derived classes declarations:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    class Student : public Person {
    private:
    double _gpa;
    public:
    Student(const char *name, int id,
    double gpa);
    void update_gpa(double new_gpa);
    double get_gpa();
    };
    class Professor : public Person {
    private:
    int _courses_taught;
    public:
    Professor(const char *name, int id,
    int courses_taught);
    void update_courses_taught(int n);
    int get_courses_taught();
    };
  • Methods implementations

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    Person::Person(const char *name, int id) {
    _name = new char[strlen(name)+1];
    strcpy(_name, name);
    _id = id;
    }
    Person::~Person() {
    delete [] _name;
    }
    char *Person::get_name() {
    return _name;
    }
    int Person::get_id() {
    return _id;
    }

    Student::Student(const char *name, int id, double gpa) : Person(name, id)
    { _gpa = gpa; }
    void Student::update_gpa(double new_gpa)
    { _gpa = new_gpa; }
    double Student::get_gpa()
    { return _gpa; }
    Professor::Professor(const char *name, int id, int courses_taught) : Person(name, id)
    { _courses_taught = courses_taught; }
    void Professor::update_courses_taught(int n)
    { _courses_taught = n; }
    int Professor::get_courses_taught()
    { return _courses_taught; }
  • usage in external code:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    int main(int argc, char **argv) {
    Person p1("James", 1000);
    Student p2("George", 1242, 3.8);
    Professor p3("Jane", 1019, 2);
    /* These methods is callable on all Person/Student/Professor object */
    printf("%s's id: %d\n", p1.get_name(), p1.get_id());
    printf("%s's id: %d\n", p2.get_name(), p2.get_id());
    printf("%s's id: %d\n", p3.get_name(), p3.get_id());
    /* Only callable on a Student object */
    printf("%s's GPA: %lf\n", p2.get_name(), p2.get_gpa());
    /* Only callable on a Professor object */
    printf("%s teaches %d courses\n", p3.get_name(), p3.get_courses_taught());
    return 0;
    }

    Public Inheritance and Members Visibility

  • Public members of the base class visible in the derived class

  • Private members of the base class not visible in the derived class

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    class Base {
    private: int b;
    public: int a;
    void f1() {
    printf("f1 called\n");
    a = 10; // working
    b = 10; // working
    }
    };
    class Derived : public Base {
    public:
    void f2() {
    printf("f2 called\n");
    f1(); // working
    a = 12; // working
    // b = 12; // can't do that,
    // b is not visible, because it's private
    }
    };

    Polymorphism in C++

  • Having several implementations for a function with a given name

  • Polymorphism

    • Function overloading compile-time polymorphism
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      // multiple implementation of the function print, differing in types and number of arguments
      void print(int x) {
      printf("an int: %d\n", x);
      }
      void print(char c) {
      printf("a char: %c\n", c);
      }
      void print(double d1, double d2) {
      printf("a couple of doubles:"
      "%lf and %lf\n", d1, d2);
      }
      int main(int argc, char **argv) {
      print(1);
      print('a');
      print(5.5, 1.2);
      return 0;
      }
      • A more complex example
        1
        2
        3
        4
        5
        6
        7
        8
        9
        10
        11
        12
        13
        14
        15
        16
        17
        18
        19
        20
        21
        22
        23
        24
        25
        26
        27
        28
        29
        30
        31
        32
        33
        34
        35
        36
        37
        38
        39
        40
        41
        42
        43
        44
        45
        46
        47
        48
        49
        50
        51
        52
        53
        54
        55
        56
        57
        #include <stdio.h>
        #include <string.h>

        class Person {
        private:
        char _name[128];
        int _id;
        char _postcode[7];

        public:
        Person(const char *name);
        Person(const char *name, int id);
        Person(const char *name, const char *postcode);
        Person(const char *name, int id, const char *postcode);
        void print();
        };

        Person::Person(const char *name) {
        strcpy(_name, name);
        _id = 0;
        _postcode[0] = '\0';
        }

        Person::Person(const char *name, int id) {
        strcpy(_name, name);
        _id = id;
        _postcode[0] = '\0';
        }

        Person::Person(const char *name, const char *postcode) {
        strcpy(_name, name);
        _id = 0;
        strcpy(_postcode, postcode);
        }

        Person::Person(const char *name, int id, const char *postcode) {
        strcpy(_name, name);
        _id = id;
        strcpy(_postcode, postcode);
        }

        void Person::print() {
        printf("name: %s, id: %d, postcode: %s\n", _name, _id, _postcode);
        }

        int main(int argc, char **argv) {
        Person p1("James");
        Person p2("Jane", 1234);
        Person p3("George", "M139PL");
        Person p4("Julie", 1235, "M188HN");

        p1.print();//name: James, id: 0, postcode:
        p2.print();//name: Jane, id: 1234, postcode:
        p3.print();//name: George, id: 0, postcode: M139PL
        p4.print();//name: Julie, id: 1235, postcode: M188HN
        return 0;
        }
    • Function overriding Runtime polymorphism
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      #include <stdio.h>

      class Base {
      public:
      void f();
      };

      class Derived : public Base {
      public:
      void f(); // overrides f from Base class
      };

      void Base::f() {
      printf("Base's f() called\n");
      }

      void Derived::f() {
      printf("Derived's f() called\n");
      }

      int main(int argc, char **argv) {
      Base b;
      Derived d;
      Base *base_ptr = &d;

      b.f(); // base f called
      d.f(); // derived f called
      base_ptr->f(); // compiler sees that base_ptr is of the type Base * so !Base's f() is called!
      return 0;
      }

      Virtual Function Overriding

  • A virtual function is a member function which is declared within a base class and is re-defined (overridden) by a derived class. When you refer to a derived class object using a pointer or a reference to the base class, you can call a virtual function for that object and execute the derived class’s version of the function. By GeeksforGeeks

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    #include <stdio.h>

    class Base {
    public:
    virtual void f();
    };

    class Derived : public Base {
    public:
    void f();
    };

    void Base::f() {
    printf("Base's f() called\n");
    }

    void Derived::f() {
    printf("Derived's f() called\n");
    }

    void call_f(Base *b) {
    b->f();
    }

    int main(int argc, char **argv) {
    Base b;
    Derived d;

    call_f(&b); // Base f() called
    call_f(&d); // Derive f() called
    return 0;
    }

    Abstract Classes

  • A class with a least one pure virtual function is named an abstract class

    1
    2
    3
    Class C {
    virtual void pure_virtual_function(int a, int b) = 0;
    }
  • Cannot be instantiated

  • Requires all derived classes to override all pure virtual functions

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    #include <stdio.h>

    class Animal {
    public:
    void sleep(); // all animals sleep similarly
    virtual void sound() = 0; // all animals make noise, but different noises
    };

    class Dog : public Animal {
    public:
    void sound();
    };

    class Cat : public Animal {
    public:
    void sound();
    };

    void Animal::sleep() {
    printf("zzzzz\n");
    }

    void Dog::sound() {
    printf("woof\n");
    }

    void Cat::sound() {
    printf("meow\n");
    }

    int main(int argc, char **argv) {
    Dog d;
    Cat c;

    d.sound(); // woof
    d.sleep(); // zzzzz
    c.sound(); // meow
    c.sleep(); // zzzzz
    return 0;
    }

    The Preprocessor

  • gcc src.c -o prog

  • src.c(source file) -preprocessor-> src-tmp.c (Preprocesses Source) -compiler-> prog(executable program)

  • Run by the C/C++ compiler transparently 透明地

  • Performs textual transformations in the sources

    • Include headers to access functions, structures, classes, and other definition from other source files
    • Expand tokens named macros into more complex bits of code
    • Conditionally enable/disable some bits of code

      Header Inclusion

  • Header files: source file with .h extension, contain what is necessary to use foreign libraries/source files

    • Headers should contain only declarations: functions prototypes, variables/custom types/structs/C++ classes declaration
    • No implementation!
      1
      2
      3
      // In the .c source file, include headers like that:
      #include <stdio.h> // Use <> to include a file from the default include path
      #include "my-custom-header.h" // Use "" for the local and user-supplied include directories
  • Can use whereis stdio.h to find location of stdio.h

    Macro Expansions

  • Textual substitutions, useful for compile-time defined constants

    • Generally indicated in capital letters as a convention
1
2
3
4
5
6
7
8
9
10
11
12
#include <stdio.h>
// If we want to change the array size,
// we only need to update this:
#define ARRAY_SIZE 10
int main(int argc, char **argv) {
int array[ARRAY_SIZE];
for(int i=0; i<ARRAY_SIZE; i++) {
array[i] = i;
printf("array[%d] = %d\n", i, array[i]);
}
return 0;
}
  • In general, try to have a less hardcoded values as possible
    • When you see that you are writing a constant value more then once, check if using a macro is possible!
    • Even if you have a constant used only once a macro can make it look more meaningful
  • Macros can also be used to write macro “functions”
  • Be careful with operator precedence! Use parentheses
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    #define SIZE_1  10
    #define SIZE_2 10
    // We can use a macro in another macro's definiiton:
    // More than 1 macro in another macro's definition? use parentheses
    #define TOTAL (SIZE_1 + SIZE_2)
    int main(int argc, char **argv) {
    // correct, expands to:
    // (10 + 10) * 2
    printf("total twice = %d\n",
    TOTAL * 2);
    return 0;
    }

    Modular Compilation

  • Large programs: need to organise the code in several source files
    • Linux kernel sources (52K source files!)
  • Modular compilation can break up the program code into different files representing well isolated compilation units?
  • Also, can’t run gcc <52K source files> each time we do a small update to one or a few source files
  • Automated compilation can automate efficiently the build process

    Compilation Phase

  • gcc -c src.c -o src.o, gcc src.o -o prog
  • src.c(source file) (text) -compiler-> src.o (Object file)(binary, non-executable) -linker-> prog(executable program) (binary, executable)

    Breaking down the program into modules

  • Hypothetical假设 example of a server application implemented into 3 .c source files, also named modules:
    breaking example
    1
    2
    3
    4
    gcc -c network.c -o network.o
    gcc -c parser.c -o parser.o
    gcc -c main.c -o main.o
    gcc main.o network.o parser.o -o prog
  • Each module (.c source file) exposes公开 an interface callable from the other source files
    • Should be as small as possible to hide internal code/data
      • Same principle as for C++ classes public/private member functions/variables
  • Assume the following interactions:
    interaction

    Header File for Modular Compilation

  • There is generally 1 header file per module, defining its interface.
  • Can be included in several .c files so must contain only declarations
    • Function prototypes, struct/typedefs/enums declarations, variable declarations
    • No definitions (i.e. function body/global variable initialisation)
  • For example network.h:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    #ifndef NETWORK_H // include guard
    #define NETWORK_H
    typedef struct {
    int id;
    char content[128];
    } request; // only declarations
    void init_network();
    int rcv_request(request *r);
    #endif /* NETWORK_H */
  • in network.c:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    /* std includes here */
    #include "network.h"
    // this function and variable are internal
    // so they are not declared in network.h
    // the keyword static force their use
    // to be only within the network.c file
    static void generate_request(request *r);
    static int request_counter = 0;
    void init_network() {
    /* init code here ... */
    }
    int rcv_request(request *r) {
    generate_request(r);
    /* ... */
    }
    static void generate_request(request *r) {
    /* ... */
    }
  • in parser.h
    1
    2
    3
    4
    5
    6
    #ifndef PARSER_H
    #define PARSER_H
    /* needed for the definition of request: */
    #include "network.h"
    void parse_req(request *r);
    #endif
  • in parser.c
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    #include <stdio.h>
    #include "parser.h"
    /* These two function are internal and their prototype is not in the header */
    static void internal1(request *r);
    static void internal2(request *r);
    void parse_req(request *r) {
    internal1(r);
    internal2(r);
    }
    static void internal1(request *r) {
    /* ... */
    }
    static void internal2(request *r) {
    /* ... */
    }
  • in main.c
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    #include <stdio.h>
    #include <unistd.h>

    #include "network.h"
    #include "parser.h"

    int main(int argc, char **argv) {
    request req;

    printf("[main] calling init_network()\n");
    init_network();

    while(1) {
    printf("[main] calling rcv_request()\n");
    if(rcv_request(&req) != 0)
    break;

    printf("[main] calling parse_req()\n");
    parse_req(&req);
    sleep(1);
    }

    printf("[main] exiting now\n");
    return 0;
    }

    C++ Source Management

  • Common practice to divide the source code of a C++ program into multiple files as follows:
    • one .h file per class or groups of related classes, containing class declarations
    • one .cpp file per class or group of related classes, containing member functions implementations

      Automated Compilation

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      # Initial build:
      gcc -c network.c -o network.o
      gcc -c parser.c -o parser.o
      gcc -c main.c -o main.o
      gcc main.o network.o parser.o -o prog
      # Assume we updated parser.c, to rebuild, we don't need to recompile everything:
      gcc -c parser.c -o parser.o
      gcc main.o network.o parser.o -o prog
      # Now we updated parser.h, remember it's included in both parser.c and main.c, so:
      gcc -c parser.c -o parser.o
      gcc -c main.c -o main.o
      gcc main.o network.o parser.o -o prog

      Makefiles: Automated Build and Dependency Management

  • Let’s consider the dependencies in our example program
    makefile1
    makefile2
  • Text file named Makefile in the local source directory
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    # The first rule is executed when the
    # command make is typed in the local folder:
    all: prog
    # executable deps and command to buil1d it:
    prog: main.o network.o parser.o
    gcc main.o network.o parser.o -o prog
    # network.o deps and command to build it:
    network.o: network.c network.h
    gcc -c network.c -o network.o
    parser.o: parser.c parser.h network.h
    gcc -c parser.c -o parser.o
    main.o: main.c network.h parser.h
    gcc -c main.c -o main.o
    # Special rule to remove all build files
    clean:
    rm -rf *.o prog
  • Rule Syntax:
    1
    2
    <target>: <dependencies>
    [TAB] <command to build target from deps>
  • Use make file
    • Type make in terminal
      1
      2
      3
      4
      gcc -c main.c -o main.o
      gcc -c network.c -o network.o
      gcc -c parser.c -o parser.o
      gcc main.o network.o parser.o -o prog
    • Type touch network.c in terminal to update network.c, then make
      1
      2
      gcc -c network.c -o network.o
      gcc main.o network.o parser.o -o prog
    • Type make clean to remove all build files
      1
      rm -rf *.o prog

      Type Conversion and Casting

      Implicit Type Conversions

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      #include <stdio.h>
      int main(int argc, char **argv) {
      char char_var = 12; // 1 byte, -128 -to 127
      int int_var = 1000; // 4 bytes, -2*10^9 to 2*10^9
      long long ll_var = 0x840A1231AC154; // 8 bytes, -9*10^18 to 9*10^18
      // here, char_var is first promoted to int, then the result of the first
      // addition is promoted to long long
      long long result = (char_var + int_var) + ll_var;
      printf("%lld\n", result); //2322860636554568
      return 0;
      }

      Integer Promotion

      integer promotion
  1. Same type? no promotion
  2. Same signed/unsigned attribute? lesser rank promoted to greater rank
  3. Unsigned rank >= other operand rank? signed promoted to the unsigned rank
  4. Signed type can represent all the values of the unsigned type? unsigned operand promoted to signed type
  5. Otherwise, both operands converted to the unsigned type corresponding to the signed operand’s type
    1
    2
    3
    int si = -1;
    unsigned int ui = 1;
    printf("%d\n", si < ui); // prints 0! si converted to unsigned int; 0 means false
  • Always remember the storage sizes of types
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    int main(int argc, char **argv) {
    int i = -1000;
    unsigned int ui = 4294967295;
    printf("%d\n", i + ui); // prints -1001
    // i: 11111111111111111111110000011000 (A)
    // ui: 11111111111111111111111111111111 (B)
    // i + ui: 111111111111111111111110000010111 (C)
    // final: 11111111111111111111110000010111 (D)
    // (A) originally 2's complement promoted to unsigned
    // (B) standard unsigned representation, max number an unsigned int can store
    // (C) addition result
    // (D) result overflows 32 bits as an int (expected by %d), loosing MSB
    // Solution: use %ld rather than %d to store the result on 64 bits
    return 0;
    }

    Integer to Floating-Point Conversion

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    int main(int argc, char **argv) {
    //prints 7: 25/10 rounds to 2; 2 * 15 = 30; 30/4 rounds to 7
    printf("%d\n", 25 / 10 * 15 / 4);
    // prints 7.5: 25/10 rounds to 2; 2*15 = 30; 30 gets converted to
    // 30.0 (double) and divided by 4.0 (double) giving result 7.5 (double)
    printf("%lf\n", 25 / 10 * 15 / 4.0);
    // prints 9.375: 25.0 / 10.0 (converted from 10) is 2.5, multiplied by 15.0
    // (converted from 15) gives 37.5, divided by 4.0 (converted from 4) gives
    // 9.375
    printf("%lf\n", 25.0 / 10 * 15 / 4);
    // prints garbage, don't try to interpret a double as an int!
    printf("%d\n", 25.0 / 10 * 15 / 4);
    return 0;
    }

    Parameter Passing

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    void f1(int i) {
    printf("%d\n", i);
    }
    void f2(double d) {
    printf("%lf\n", d);
    }
    void f3(unsigned int ui) {
    printf("%u\n", ui);
    }
    int main(int argc, char **argv) {
    char c = 'a';
    unsigned long long ull = 0x400000000000;
    f1(c); // prints 97 (ascii code for 'a')
    f2(c); // prints 97.0
    f3(ull); // overflows int ... prints 0 (lower 32 bits of 0x400000000000)
    return 0;
    }

    Type Casting 类型铸造

  • Casting allows to force the type of an expression
  • syntax: (type)expression
    1
    2
    3
    4
    5
    6
    7
    8
    // prints 3.75: 3 gets converted to 3.7500
    printf("%lf\n", (float)15/4);
    // prints 4: 2.5 converted to 2 (int), multiplied by 12 gives 24, divided
    // by 5 gives 4
    printf("%d\n", ((int)2.5 * 12)/5);
    // prints 4.8: 2*12 = 24, converted to 24.0, divided by 5.0 gives 4.8
    printf("%lf\n", ((int)2.5 * 12)/(double)5);
    return 0;
    1
    2
    3
    int si = -1;
    unsigned int ui = 1;
    printf("%d\n", si < (int)ui); // prints 1

    Generic Pointers

  • void * can be used as a generic pointer to pass data of different types
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    #include <stdio.h>

    typedef enum {
    CHAR,
    INT,
    DOUBLE
    } type_enum;

    void print(void *data, type_enum t) {
    switch(t) {
    case CHAR:
    printf("character: %c\n", *(char *)data);
    break;
    case INT:
    printf("integer: %d\n", *(int *)data);
    break;
    case DOUBLE:
    printf("double: %lf\n", *(double *)data);
    break;
    default:
    printf("Unknown type ...\n");
    }
    }

    int main(int argc, char **argv) {
    char c = 'x';
    int i = 42;
    double d = 11.5;

    print((void *)&c, CHAR); //character: x
    print((void *)&i, INT); //integer: 42
    print((void *)&d, DOUBLE); //double: 11.500000
    return 0;
    }

    Debugging with GDB

  • Compile with -g flag to include debug symbols
  • To launch gdb on a particular binary : gdb program in a shell
  • b test.c:12 places a breakpoint on the instruction at line 18 in test.c
  • b my_func places a breakpoint on the first instruction of my_func
  • b test.c:12 if x == 42 is a conditional breakpoint, will pop only if x is equal to 42 when the corresponding statement is executed
  • run starts the program
  • In single-step mode:
    • up and down allows to navigate the function call stack
    • next advances the execution, jumping over function calls
    • step advances the execution, diving into function calls
    • continue exits single step mode

      Case study: HPC

  • C/C++ are extensively used in High Performance Computing (HPC) because of their speed
  • Due to many reasons, including the fact that they gives the programmer control over the data memory layout

    Controlling Memory Layout

    memory example
  • For performance reasons it is very important to fit as much of the data set as possible in the cache
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    typedef struct {
    char c[60];
    int i;
    double d;
    } my_struct;
    #define N 100000000
    my_struct array[N];
    int main(int argc, char **argv) {
    struct timeval start, stop, res;
    my_struct s;
    gettimeofday(&start, NULL);
    /* Randomly access N elements */
    for(int i=0; i<N; i++)
    memcpy(&s, &array[rand()%N],
    sizeof(my_struct));
    gettimeofday(&stop, NULL);
    timersub(&stop, &start, &res);
    printf("%ld.%06ld\n", res.tv_sec,
    res.tv_usec); // takes 14.896576s
    return 0; }
  • Struct size: 60 + 4 (int) + 8 (double) = 72 bytes
    • Larger and not a multiple of the cache line size (64 bytes)
    • Most objects in the array will require to fetch 2 cache lines from main memory cacheLine
      1
      2
      3
      4
      5
      6
      typedef struct {
      char c[52]; // down from 60, we have 52 + 4 + 8 == 64 bytes i.e. a cache line
      int i;
      double d;
      } my_struct;
      my_struct array[N] __attribute__ ((aligned(64))); /* force alignment of the array itself */ //takes 13.696097s
  • Should be 25% faster, mine is not so explicit because I’m using sandbox50

    Case study - C Standard Library

  • Provides and implement all the functions from stdio.h, stdlib.h, etc.
  • C + a tiny bit of assembly
  • Default libc in most Linux distribution Glibc
  • Simpler: Musl
  • Application request services from the OS through system calls
  • Generally not made directly by user code but rather the libc
  • Upon syscall 在系统调用时: user/kernel “world switch”
  • System calls have a calling convention
    • Machine instructions used to trigger a syscall
  • On Intel x86-64, this convention is as follows:
  1. Syscall number in %rax
  2. Syscall parameters in order in %rsi, %rdx, %r10, %r8, and %r9
  3. Execute the syscall instruction
  4. Return value in %rax

    Musl’s printf Implementation

    printf("value: %d %lf\n", int_val, double_val);
  • In src/stdio/printf.c:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    #include <stdio.h>
    #include <stdarg.h>
    int printf(const char *restrict fmt, ...)
    {
    int ret;
    va_list ap; // variable argument list management
    va_start(ap, fmt);
    ret = vfprintf(stdout, fmt, ap); // stdout: data structure representing the std output
    va_end(ap);
    return ret;
    }
  • In src/stdio/vfprintf.c:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    /* lots of logic for format string parsing ... */
    int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap) {
    /* ... */
    ret = printf_core(f, fmt, &ap2, nl_arg, nl_type);
    /* ... */
    }
    static int printf_core(FILE *f, const char *fmt, va_list *ap, /* ... */) {
    /* ... */
    if (f) out(f, a, l);
    /* ... */
    }
    static void out(FILE *f, const char *s, size_t l)
    {
    if (!(f->flags & F_ERR)) __fwritex((void *)s, l, f);
    }
  • All these transforms something like “hello: %d, %lf”, 12, 12.0” into “hello: 12, 12.00000”
  • In src/stdio/fwrite.c:
    1
    2
    3
    4
    5
    6
    size_t __fwritex(const unsigned char *restrict s, size_t l, FILE *restrict f)
    {
    /* ... */
    size_t n = f->write(f, s, i); /* this is actually a call to __stdout_write(f, s, i);
    /* ... */
    }
  • In src/stdio/__stdout_write.c:
    1
    2
    3
    4
    5
    6
    7
    8
    size_t __stdout_write(FILE *f, const unsigned char *buf, size_t len)
    {
    struct winsize wsz;
    f->write = __stdio_write;
    if (!(f->flags & F_SVB) && __syscall(SYS_ioctl, f->fd, TIOCGWINSZ, &wsz))
    f->lbf = -1;
    return __stdio_write(f, buf, len);
    }
  • In src/stdio/__stdio_write.c:
    1
    2
    3
    4
    5
    6
    size_t __stdio_write(FILE *f, const unsigned char *buf, size_t len)
    {
    /* ... */
    cnt = syscall(SYS_writev, f->fd, iov, iovcnt);
    /* ... */
    }
  • In C, printing to the standard output corresponds to a write (or writev) syscall on a file descriptor representing the standard output]
  • In src/internal/syscall.h:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    /* ... */
    #define __SYSCALL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
    #define __SYSCALL_NARGS(...) __SYSCALL_NARGS_X(__VA_ARGS__,7,6,5,4,3,2,1,0,)
    #define __SYSCALL_CONCAT_X(a,b) a##b
    #define __SYSCALL_CONCAT(a,b) __SYSCALL_CONCAT_X(a,b)
    #define __SYSCALL_DISP(b,...) __SYSCALL_CONCAT(b,__SYSCALL_NARGS(__VA_ARGS__))(__VA_ARGS__)
    #define __syscall(...) __SYSCALL_DISP(__syscall,__VA_ARGS__)
    #define syscall(...) __syscall_ret(__syscall(__VA_ARGS__))
    /* ... */
    1
    2
    3
    // This is a set of macros that transforms "syscall(SYS_writev, f->fd, iov, iovcnt); we saw
    // earlier into:
    __syscall_3(SYS_writev, f->fd, iov, iovcount);
  • We now reach the architecture-specific part, let’s look at the x86-64 code, in arch/x86_64/syscall_arch.h:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    static __inline long __syscall3(long n, long a1, long a2, long a3) {
    unsigned long ret;
    __asm__ __volatile__ ( "syscall" : // 5) the syscall instruction
    "=a"(ret) : // 6) we'll get the return value in rax
    "a"(n), // 1) syscall number in rax
    "D"(a1), // 2) argument 1 in rdi
    "S"(a2), // 3) argument 2 in rsi
    "d"(a3) : // 4) argument 3 in rdx
    "rcx", "r11", "memory");
    return ret;
    }
    1
    2
    3
    4
    5
    6
    7
    401bae:    41 be 14 00 00 00        mov    $0x14,%r14d
    ...
    401beb: 48 63 7b 78 movslq 0x78(%rbx),%rdi
    401bef: 49 63 d5 movslq %r13d,%rdx
    401bf2: 4c 89 f0 mov %r14,%rax
    401bf5: 48 89 ee mov %rbp,%rsi
    401bf8: 0f 05 syscall

    Printing Without Libc

  • Now that we know how to print on stdout, we can make a syscall directly from our program without involving libc
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    /* Without libc the default entry point is _start */
    void _start() {
    unsigned long ret;
    /* write(1, "hello!\n", 7); */
    __asm__ __volatile(
    "syscall" : // the syscall instruction
    "=a"(ret) : // we'll get the return value in rax
    "a"(1), // syscall number (1 for write)
    "D"(1), // argument1: file descriptor (1 for stdout)
    "S"((long)"hello!\n"), // argument2: char array to print
    "d"(7) : // argument3: number of bytes to write
    "rcx", "r11", "memory");
    /* exit(0); */
    __asm__ __volatile( "syscall" : : // syscall instruction
    "a"(60), // exit's syscall number
    "D"(0) : // exit parameter: 0
    "rcx", "r11", "memory");
    }
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    0000000000001000 <_start>:
    1000: 55 push %rbp
    1001: 48 89 e5 mov %rsp,%rbp
    1004: 48 8d 35 f5 0f 00 00 lea 0xff5(%rip),%rsi # argument 2, address of "hello!"
    100b: b8 01 00 00 00 mov $0x1,%eax # syscall number, 1 == write
    1010: bf 01 00 00 00 mov $0x1,%edi # argument 1, fd 1 == stdout
    1015: ba 07 00 00 00 mov $0x7,%edx # argument 3, 7 bytes to write
    101a: 0f 05 syscall
    101c: 48 89 45 f8 mov %rax,-0x8(%rbp)
    1020: b8 3c 00 00 00 mov $0x3c,%eax # syscall number, 60 == exit
    1025: ba 00 00 00 00 mov $0x0,%edx # argument 1: 0
    102a: 89 d7 mov %edx,%edi
    102c: 0f 05 syscall
    102e: 90 nop
    102f: 5d pop %rbp
    1030: c3 retq

    Case Study - OS Kernel

  • C is the default language to write OS kernels

    System Calls in a Nutshell

  1. Stack pointer (in CPU) -> Task stack (in Memory)
    lnstr.pointer (in CPU) -> Task code (in Memory)
  2. Stack pointer (in CPU) -> Kernel stack (in Memory)
    lnstr.pointer (in CPU) -> Kernel code (in Memory)
  3. CPU push all registers to Kernel stack
  4. Kernel stack pop all registers to CPU
  5. Stack pointer (in CPU) -> Task stack (in Memory)
    lnstr.pointer (in CPU) -> Task code (in Memory)

    Context Switches in a Nutshell

  6. Stack pointer (in CPU) -> Task A’s stack (in Memory)
    lnstr.pointer (in CPU) -> Task A’s code (in Memory)
  7. Stack pointer (in CPU) -> Task A’s Kernel stack (in Memory)
    lnstr.pointer (in CPU) -> Kernel code (in Memory)
  8. CPU push all registers to Task A’s Kernel stack
  9. Task B’s Kernel stack pop all registers to CPU
  10. Stack pointer (in CPU) -> Task B’s Kernel stack (in Memory)
    lnstr.pointer (in CPU) -> Kernel code (in Memory)
  11. Stack pointer (in CPU) -> Task B’s stack (in Memory)
    lnstr.pointer (in CPU) -> Task B’s code (in Memory)

    System Call and Context Switch Implementations in a Real OS

  • HermiTux
  • Context switch & syscall handling can only be done by the kernel
  • The only way to enter the kernel is through an interrupt:
  • Interrupt traps to the kernel which saves the state of the interrupted task for resuming properly later. Interrupt traps(保存状态)-enter kernel -恢复使用
  • Example with sched_yield syscall (syscall + context switch):sched_yield
  • Syscall entry point is in arch/x86/kernel/entry.asm:
    1
    2
    3
    4
    5
    6
    7
    8
    isyscall:
    cli ; disable interrupts
    push rax ; start pushing the task state on the stack
    push rcx
    ; ... push many other registers
    mov rdi, rsp ; prepare a datastructure containing register values for use by the kernel
    sti ; enable interrupts
    call syscall_handler ; jump to kernel (C) code managing the syscall
  • syscall_handler is in arch/x86/kernel/isrs.c:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    void syscall_handler(struct state *s) {
    switch(s->rax) {
    // ... one "case" for each syscall number
    case 24: // sched_yield's number is 24
    s->rax = sys_sched_yield();
    break;
    /* ... */
    }
    }
  • sys_sched_yield (in kernel/syscalls/sched_yield) simply calls check_scheduling, in kernel/tasks.c:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    void check_scheduling(void)
    {
    /* ... */
    uint32_t prio = get_highest_priority();
    task_t* curr_task = per_core(current_task);
    if (prio > curr_task->prio) {
    reschedule();
    /* ... */
    }
  • Finally, reschedule calls switch_context (in kernel/tasks.c)
  • switch_context is in arch/x86/kernel/entry.asm:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    switch_context:
    push rax ; ... start pushing the state of the scheduled out task on the stack
    push rcx
    push rdx
    ; ...
    jmp common_switch
    ; ...
    common_switch:
    ; ...
    call get_current_stack ; get new rsp
    mov rsp, rax ; switch stack to the new task's one
    ; ...
    pop r15 ; start restoring the scheduled in task state (reverse order)
    pop r14
    ; ...
    add rsp, 16 ; at that point the stack pointer points to the saved instruction pointer
    iretq ; restore instruction pointer from the stack, i.e. jumps there

  • Executes return path in the kernel until we reach the kernel/user boundary
    1
    2
    3
    4
    5
    6
    7
    8
    9
    isyscall:
    ; ...
    call syscall_handler
    ; ...
    pop r15 ; start restoring the userland state of the task
    pop r14
    pop r13
    ; ...
    sysret ; return from syscall handler to userspace

    Memory Unsafety

  • C/C++ are inherently天生的 memory unsafe
  • There is absolutely no check at runtime if memory accesses is initialised, allocated, or mapped
  • Generally the compiler won’t warn you either
  • A few examples of memory errors
    • Out of band accesses on buffers/arrays accesses
    • Failure to initialise before read stack/heap allocated variables
    • Leaks, double free, use-after-free
  • Memory errors are hard to detect, hard to debug
    • No compiler/runtime warning
    • Undefined behaviour: sometimes it crashes, sometimes not…

      Example 1: Infoleak

      Original code:
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      // char *welcome_message = "Hi there! How is it going?\n"; // 27 char
      // the it's updated to a shortened welcome message, only 11 chars now
      char *welcome_message = "Hi there!\n";
      char *password = "secret";
      char entered_password[128];
      int main(int argc, char **argv) {
      // Print welcome message character by character
      // but the loop is still 27!!
      for(int i=0; i<27; i++)
      printf("%c", welcome_message[i]);
      printf("Please input the password:\n");
      scanf("%s", entered_password);
      if(!strcmp(entered_password, password)) { printf("Passowrd ok!\n"); /* ... */ }
      else { printf("Wrong password! aborting\n"); }
      return 0;
      }
  • It read overflows welcome message, and password leaks to stdout!

    Example 2” Sensitive Data Tampering

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    char user_input[32] = "0000000000";
    char password[32] = "secret";
    int main(int argc, char **argv) {
    if(argc != 2) { printf("Usage: %s <password>\n", argv[0]); return 0; }
    strcpy(user_input, argv[1]);
    if(!strncmp(password, user_input, strlen(password))) {
    printf("login success!\n");
    /* do important stuff ... */
    } else {
    printf("wrong password!\n");
    }
    return 0;
    }
  • If user_input is write overflows, in “strcmp”, strings are equal so password will be accepted

    Example 3: Stack Smashing

  • Classic control flow hijacking劫持 attack
  • Originally proposed in 1996 here: http://www.phrack.org/archives/issues/49/14.txt
  • See how the CPU manage function calls/returns
  1. The machine push return address on the stack and jump to the function
  2. CPU pops return address from the stack and jumps to the end of calling function line
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    char *password = "super-secret-password";
    void security_critical_function() {
    printf("launching nukes!!\n"); //发射核武器
    }
    void preprocess_input(char *string) {
    char local_buffer[16];
    strcpy(local_buffer, string);
    /* work on local buffer ... */
    }
    int main(int argc, char **argv) {
    if (argc != 2) { /* ... */ }
    preprocess_input(argv[1]); // This line makes mistake
    if(!strncmp(password, argv[1],
    strlen(password)))
    security_critical_function();
    else
    printf("Unauthorised user!\n");
    return 0;
    }
  3. Stack layout memory when preprocess_input runs
  4. Overflow local_buffer and rewrite the return address
  5. Then the return -> CPU jumps to security_critical_function()!

    Example 4: Use-After-Free

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    typedef struct {
    int member1;
    void (*member2)(int);
    } my_struct;
    void print_hello(int x) {
    printf("Hello, parameter: %d\n", x);
    }
    void security_critical_function() {
    printf("Auto destruction engaged!\n");
    /* ... */
    }
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    int main(int argc, char **argv) {
    /* allocate and init ms */
    my_struct *ms = malloc(sizeof(my_struct));
    ms->member1 = 42;
    ms->member2 = &print_hello;
    /* call the function pointer */
    ms->member2(12);
    free(ms);
    char *buffer = malloc(16);
    memcpy(buffer, argv[1], 16);
    ms->member2(12);
    exit(0);
    }
    use after free 1
    use after free 2
    use after free 3
    use after free 4
  • Because I can control what to input in command line, can overwrite the function pointer with the address of security

    Good Practices

    Writing Good Code

  • The compiler won’t catch all mistakes
  • The tools have their limitations
  • Writing good code from the start is super important
    • It will save you a lot of debugging time
    • It will save you from introducing serious security issues
  • Never use these functions, always unsafe:
    • gets (use fgets)
    • getwd (use getcwd)
    • readdir_r (use readdir)
    • More here: https://bit.ly/3ef4TBc

      String Manipulation Functions

  • Use the n methods,
    • strcpy -> strncpy
    • sprintf -> snprintf
  • Even with the versions with n, some specificities
    • strncpy won’t add \0 at the end of the target buffer
      1
      2
      3
      4
      char string1[] = "hello, world";
      char string2[32] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
      strncpy(string2, string1, strlen(string1));
      printf("%s\n", string2); // prints "hello, worldxxxxxxxxxxxxxxxxxxx"

      Dynamic Memory Allocation

  • Check malloc return value
  • After free, the pointer is invalid
    • Cannot be dereferenced
    • Cannot be used (ex: comparison)
  • realloc returns null upon failure but does not free the old pointer
    • so this: ptr = realloc(ptr, new_size) is a leak

Quiz review

Quiz 2

  • Int is 4 bytes, short is 2 bytes, float is 4 bytes, double is 8 bytes, char is 1 bytes, unsigned long int is 8 bytes.
  • The compiler, parsing the sources from top to bottom, wants to have either the full implementation or at least a forward declaration of negate_int before a first call to that function is encoutered.

    Quiz 3

  • strcmp returns 0 when the strings match
  • 11 in base 10 corresponds to 0xB In hexadecimal.
  • struct members are placed in order contiguously in memory, the address of the first member of the struct is the same as the address of the struct itself, so the address of the second member is 0x7ffd04d39a48 + 4 bytes (the size of the 1st member) = 0x7ffd04d39a4c
  • Pointers are useful to allow a function to modify its calling context and to avoid passing large parameters as copy. 
  • y is a local variable and it is thus located on the stack. x being a global variable, it is located in static memory.
  • We use O_RDWR to indicate read and write mode, O_TRUNC to destroy the file content if it exists, and O_CREAT to create the file if it does not exist

    Quiz 4

  • For a parent function C1 with virtual int function:
    • it cannot be instantiated
    • all classes inheriting from C1 need to provide an implementation for function(
  • The vector needs to be declared with the containing type between < >. An empty vector can be filled with push_back(). The vector constructor also takes an optional parameter that initializes empty elements, allowing them to be updated with[ ]without prior calls to push_back().
  • strcpy should be used rather than = to ensure a copy of the string itself and not a simple copy of the pointer used to refer to the character array
  • Dynamic object allocation is made with new, that returns a pointer to an object and not the object itself
    • Library *l = new Library(100);
  • Overloading corresponds to compile-time polymorphism, while overriding regards runtime polymorphism
  • Constructor’s implementation with same name
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
        class C {
    private:
    int member;

    public:
    C(int member);
    };

    C::C(int member) {
    this->member = member;
    }
  • Call the parent constructor: following a colon placed right after the child’s constructor parameter list. _a cannot be accessed directly in the child’s constructor (or in any of the child member functions) because it is private in Parent.
    1
    2
    3
    Child::Child(int a, int b) : Parent(a) {
    _b = b;
    }

    Quiz 5

  • By convention, up to 6 arguments for system calls are passed in order in %rdi, %rsi, %rdx, %r10, %r8, and %r9
    • By convention, on x86-64 the return value of a system call is placed in %rax
  • During the execution of printf, Musl uses writev to write to the file descriptor corresponding to the standard output under the hood by the Musl C library
  • The preprocessor is used for including headers, expanding macros, and enabling/disabling bits of code conditionally.
  • The int gets promoted to a double and the result of the addition is then a double, which gets interpreted as an int due to the %d specifier, printing garbage.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    #include <stdio.h>

    int main(int argc, char **argv) {
    int i = 12;
    double d = 4.6;

    printf("%d\n", i + d);
    return 0;
    }
  • Access latency rank: registers are the fastest, followed by the cache, followed by main memory, then we have the disk.
  • interpreted as an unsigned int (4 bytes), the long long unsigned int (8 bytes) gets its higher 4 bytes truncated截断.
    • E.g: long long unsigned int x = 0x400000000001; turn to unsigned int will become 1
  • The addition overflow the 4 bytes reserved for an int, and the variable wraps over to the smallest value that can be stored in an int: -2147483648
    • int i = 2147483647 + 1
    • 2147483647 is the max positive value that can be stored in an int
  • The addition of 1 triggers an overflow of the unsigned int type that is stored on 4 bytes, an as it is unsigned it wraps over to 0.
    1
    2
    3
    4
    5
    6
    7
    8
    #include <stdio.h>

    int main(int argc, char **argv) {
    unsigned int ui = 4294967295; // this is 32 bits full of 1's

    printf("%u\n", ui+1);
    return 0;
    }
  • Illustrate preprocessor:
    • The preprocessor produces source files as output
    • The preprocessor performs textual transformations on C sources
    • The preprocessor is generally called under the hood by the compiler
    • During the program build process, the preprocessor pass is made before the compilation (transformation of sources into machine code) pass
    • The preprocessor takes source files as input
  • Include guards allow to avoid the issue that the compiler encountering the definition of the struct twice while there should be only a single definition per struct over the entire program.
  • C/C++ are fast because they integrated well with assembly, and have no runtime overheads due to things such as the lack of memory access checks or garbage collection. They also give the programmer a high degree of control over the program memory layout, which helps for various things such as fitting as much data/code as possible in the CPU cache.

    Quiz 6

  • Use-after-free bugs are generally not detected by the compiler, they put the program into undefined behavior and can allow an attacker to subvert its control flow. They do not necessarily lead to a crash at runtime.
  • the mechanism(s) used to save the user context of a task during a syscall, and to save the kernel context of a task during a context switch by Pushing the CPU registers’ content on the stack
  • Mistakes that corresponds to memory errors
    • Reading an uninitialized variable on the stack
    • Double free of a pointer
    • Reading the content of an uninitialized buffer on the heap
    • Use-after-free
    • Out-of bound array/buffer access
    • Dereferencing a NULL pointer
  • System call handling and context switching in an OS are generally implemented
    • with a mix of C and assembly
    • with memory unsafe languages
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      int f(char *user_buf) {
      char local[128];
      strcpy(local, user_buf);
      printf("you entered: %s\n", user_buf);
      /* ... */
      }

      int main(int argc, char **argv) {
      if(argc != 2) { /* ... */ }
      f(argv[1]);
      /* ... */
      }
  • The possibility to overflow in write mode a local buffer with strcpy can lead to a stack smashing attack.