Programming Languages and Paradigms -- Part 1
A programming paradigm is a fundamental style of programming used to classify programming languages.
Main Programming Paradigms and Sub-Paradigms
Imperative
Structured (or procedural)
- Eg. C
Object Oriented
- Encapsulation of code and the associated data into objects
- Suit to problems of huge amounts of state/operations
- GUI, simulators, video games, business management software
- Easy to organize large codebases
- Eg. C#, C++, Java, Python
Concurrent
- Shared-memory threads/processes in C/C++, Java, Python, etc.
- Message passing (for example MPI) in C/C++/FORTRAN
- Semi-automatic loop paralelization with libraries such as OpenMP
- GPU programming with CUDA
- Use cases: HPC, distributed computing, graphic processing, etc.
Describes sequences of statements manipulating the program state
- describes how to obtain the computation results
Uses advanced control flow operations
- Loops, conditionals, procedures
- easier to describe/reason about complex/large programs
Uses execution threads/processes to describe interleaving and/or parallel computation flows
Functional
- First-class/higher-order functions
- Loops implemented with recursion
- Pure functions have no side-effects
- Languages: Haskell, Scala, F#, etc.
Concurrent
- Less need for synchronization
- Use cases: distributed applications, web services, etc.
Describes the meaning/result of computations
High level of abstraction Code can easily become convoluted
Languages: HTML, SQL, XML, CSS, Latex (non-Turing complete)
Usage: document rendering, structured data storage and manipulation
Calling and composing functions to describes the program
Multi-Paradigm Languages
- Haskell: purely functional and does not allow OO style
- OCaml: mainly functional, allows OO and imperative constructs
- C, C++: imperative but allow some functional idoms:
- Many languages are multi-paradigm
Introduction to C
Pros:
- Syntax
- Low level
- Fast
- Controlled memory footprint (occupation)控制内存占用
Cons:
- Large area for making mistakes
- Lack of memory safety, risks of undefined behavior at runtime
Hello World
Assuming the code is in listing1.c, to compile and run on the linux command line:1
2
3
4
5
6
7
8
9
// will run first when the program is executed
// returns an integer and takes no parameters
int main() {
// print text on the standard output
printf("hello, world!\n");
// return to go back to the calling context, and exit if returning from main
return 0;
}1
2
3gcc listing1.c -o listing1
./listing1
hello, world! (This is the output)C – Variables, Types, Printing to the Console
Variables:
1
2
3
4
5
6
7
int main() {
int a; // declare a of type int (signed integer)
a = 12; // set the value of a to 12
printf("a: %d\n", a); // print the value of a
return 0;
} - Variable:name,type,value
- Must be declared before being used
- Declare with <variable type><variable name>
- A variable’s name should start with a letter or underscore
- Defines the amount of space in memory allocated to the variable
- Compiler checks the validty of operations on the variable at build time
- 3 basic types categories: integers, floating-point numbers, characters
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15// INTEGER
int a; // signed integer takes 4 bytes in memory on the Intel x86-64 architecture
// CHARACTERS
char my_char_variable;
my_char_variable = 'x'; // using single quotes for characters
// FLOATS
float my_float_var;
my_float_var = 12.4;
// MIXING INTEGERS AND FLOATS
int int1 = 2;
float float1 = 3.8;
int int2 = int1 + float1; // int2 = 5;
int int3 = int1 * float1; // int3 = 7;
/* when mixed with floats in arithmetics, integers are promoted to floats */
/* when stored in an integer variable, floats are _truncated_ */ - Types and Qualifiers
- Types: char, int, float, double
- The function
sizeof
can be used to get the exact size of a type on a given manchine1
2
3int so_short = sizeof(short int);
int so_int = sizeof(int);
printf("size of short:%d bytes\n", so_short);
- The function
- Qualifers: signed, unsigned(positive), short, long
- Determines the storage size in memory
- Exact implementation is architecture dependent 依赖于架构
1
2
3
4
5
6
7
8
9short int a; // signed, at least 16 bits: [−32,767, +32,767] i
int b; // signed, at least 16 bits: [−32,767, +32,767]
unsigned int c; // unsigned: [0, +65,535]
long int d; // at least 32 bits: [−2,147,483,647, +2,147,483,647]
unsigned long int e; // unsigned: [0, +4,294,967,295]
long long int f; // at least 64 bits: [-9x10^18, +9x10^18]
long long unsigned int g; // unsigned: [0, +18x10^18]
float h; // storage size unspecified, generally 32 bits
double i; // storage size unspecified, generally 64 bitsPrinting to the Console: printf
- Types: char, int, float, double
- Accepts one or more arguments: format string + variable list
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16int int_var = -1;
unsigned int uint_var = 12;
long int lint_var = 10;
float float_var = 2.5;
double double_var = 2.5;
char char_var = 'a';
char string_var[] = "hello";
printf("Integer: %d\n", int_var);
printf("Unsigned integer: %u\n", uint_var);
printf("Long integer: %ld\n", lint_var);
printf("Float: %f\n", float_var);
printf("Double: %lf\n", double_var);
printf("Characters: %c\n", char_var);
printf("String: %s\n", string_var);
printf("Several variables: %d, %lf, %s\n", int_var, double_var, string_var);
// Format string accepts escaped character such as \n for line break - Make sure to use the right marker because the compiler won’t tell you the mistakes and incorrect values
- It will show warning like:”format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long long unsigned int’ “
1
2
3
4
5
6
7
8
9
10int my_variable = -42;
printf("-42 printed with %%d: %d\n", my_variable);
printf("-42 printed with %%u: %u\n", my_variable); // -42 interpreted as an unsigned integer
// The outcome is 4294967254
unsigned long long int ull = 5467513055454315;
printf("ull printed with %%llu: %llu\n", ull);
printf("ull printed with %%u: %u\n", ull); // interprets only 4 bytes vs. 8 for llu
// The outcome is 2507777131
printf("ull printed with %%d: %d\n", ull); // first 4 bytes, interpreted as unsigned
// The outcome is -1787190165Arrays, Strings, Command Line Arguments
Arrays
1
2
3
4
5
6
7
8
9int array[4]; // declaration
array[0] = 10; // write a slot
array[1] = 20;
array[2] = 30;
array[3] = 40;
int x = array[3]; // read a slot
printf("array[%d] contains %d\n", 0, array[0]); //array[0] contains 10
printf("array[%d] contains %d\n", 1, array[1]); //array[1] contains 20
printf("array[%d] contains %d\n", 2, array[2]); //array[2] contains 30
- It will show warning like:”format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long long unsigned int’ “
- Arrays are stored contiguously in memory;
- Multi-Dimensional Arrays
1
2int my_2d_array[3][2];
my_2d_array[0][0] = 0; - Static initialization allows to omit the size (of the first dimension)
1
2int array[] = {1, 2, 3, 4, 5}; // do not have the size
int array_2d[][2] = {{1, 2}, {3, 4}, {5, 6}}; - Character arrays, i.e. Strings
- String: array of characters that ends with the special
\0
end of string character marker1
2
3
4
5
6
7
8
9
10
11
12char string1[6] = "hello";
char string2[] = "hello";
char string3[6];
string3[0] = 'h';
string3[1] = 'e';
string3[2] = 'l';
string3[3] = 'l';
string3[4] = 'o';
string3[5] = '\0'; // Important here
printf("%s\n", string1); // hello
printf("%s\n", string2); // hello
printf("%s\n", string3); // helloCommand Line Arguments
gcc is program; the rest are arguments1
gcc test.c -o program
- String: array of characters that ends with the special
main
parameters:- argc (integer)
- argv: array containing the command line arguments
1 | int main(int argc, char **argv) { // 'char ** argv' means 'char argv[][]' |
- If the command line input is:
./test one two three
The output is:
1 | Number of command line arguments: 4 |
- If the command line input is:
./test one
The output is:
1 | Number of command line arguments: 2 |
- If the command line input is:
./test
The output is:
1 | Number of command line arguments: 1 |
This is quite DANGEROUS
- Arguments are strings, to convert to integer use
atoi
:1
2
3
4
5
6
7
8
9
10
11
/* Sum the two integers passed as command line integers */
int main(int argc, char **argv) {
int a, b;
// Once again, dangerous!
a = atoi(argv[1]);
b = atoi(argv[2]);
printf("%d + %d = %d\n", a, b, a+b);
return 0;
} - Statements end with
;
and are executed sequentiallyConditionals
- Condition:
<
,<=
,>
,>=
,==
,!=
- “false” is
0
, “true” is everything except0
- Boolean operations:
&&
AND,||
OR,!
negation- Operators precedence优先级:
!
first, then&&
, the||
- Use parentheses to modify/clarify evaluation order
- Operators precedence优先级:
- Switch statement
- Test the equality of an integer against a list of values
- Use
break
to avoid fallthrough坠落 in cases default
path taken when none of the other cases matches1
2
3
4
5
6
7
8
9
10
11
12switch(x) {
case 1:
printf("x is 1\n");
break;
case 2:
printf("x is 2\n");
break;
case 3:
printf("x is 3\n");
break;
default:
printf("x is neither 1, nor 2, nor 3\n"); }Loops
- While loop
while...
checks the condition before entering the loop bodydo... while
checks after a first iteration
- For Loops
- for (<init statement> ; <condition>; <update statement>) {<body>}
- <init statement> executed once before the first iteration
- <condition> checked before each iteration
- <update statement> executed after each iteration
- E.g
for(int i = 0; i<10; i = i + 1) {...}
break
simply exits the loopcontinue
directly goes to the next iterationBack to fix command line potential danger
1
2
3
4
5
6
7
8
9
10
11
12
13int print_help_and_exit(char **argv) {
printf("Usage: %s <number 1> <number 2>\n", argv[0]);
exit(-1); }
/* Sum the two integers passed as command line integers */
int main(int argc, char **argv) {
int a, b;
// To make sure there are only two numbers in command line
if(argc != 3)
print_help_and_exit(argv);
a = atoi(argv[1]);
b = atoi(argv[2]);
printf("%d + %d = %d\n", a, b, a+b);
return 0; }Braces in Conditional and Loops
- for (<init statement> ; <condition>; <update statement>) {<body>}
- Can omit(ignore) the braces if conditional/loop body is a single statement
Functions
- Contains name, zero or more parameters with types, and a return type
1
2
3
4int add_two_integers(int a, int b) {
int result = a + b;
return result;
} - The absence缺席 of return value can be indicated with the
void
type1
2
3void print_hello(void) {
printf("hello!\n");
return; // we can even omit this as the function does not return anything }Function Calls
- Function parameters and return value are passed as copies
1
2
3
4
5
6
7
8
9void my_function(int parameter) {
parameter = 12; // does not update x in main
}
int main() {
int x = 10;
my_function(x);
printf("x is %d\n", x); // x is 10
return 0;
}Forward Declarations
- Forward declaration, also called function _prototype_ 原型
- The compiler parses解析 source files from top to bottom
1
2
3
4
5
6
7
8int add(int a, int b);
int main(){
....
}
/* The actual function definition */
int add(int a, int b) {
return a + b;
}Variable Scope and Lifetime
- Global variables declared outside functions
- visible at the level of the entire source file
- Try to limit the use of globals as much as possible
- A lot of global state makes it harder to reason about the program
- Within a a function, a declared local variable is accesible within its enclosing braces (a block of code)
1
2
3
4
5int j;
for(j=0; j<10; j++) {
printf("In loop body, j is %d\n", j);
}
printf("Out of loop body, j is %d\n", j); // Out of loop body, j is 10Custom Types and Data Structures in C
Custom Types
- Use
typedef
to alias别名 a type1
2
3
4
5
6
7
8
9
// 'my_int' is now equivalent to 'long long unsigned int'
// To make it shorter
typedef long long unsigned int my_int;
int main(int argc, char **argv) {
my_int x = 12;
printf("x is: %llu\n", x);
return 0;
}Custom Data Structures
- Use struct to define a data structure with named fields
- Use struct <struct name> to refer to it as a type
- Access the fields of a variable with <variable_name>.<field_name>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Definition of the struct:
struct person {
char name[10];
float size_in_meters;
int weight_in_grams;
};
void print_person(struct person p) {
/* Access fields with '.' */
printf("%s has a size of %f meters and weights %d grams\n", p.name, p.size_in_meters, p.weight_in_grams);
}
int main(int argc, char **argv) {
// declare a variable of the struct type:
struct person p1;
// sets the variable field
p1.size_in_meters = 1.6;
p1.weight_in_grams = 60000;
strcpy(p1.name, "Julie");
struct person p2 = {"George", 1.8, 70000};
print_person(p1);
print_person(p2);
return 0;
} - Fields are place consecutively连续地 in memory (compiler may insert padding)
- Structures can be
typedef
A faster way to define structures:1
2
3
4
5
6
7
8
9
10
11
struct s_person {
/* fields declaration ... */
};
typedef struct s_person person;
void print_person(person p) { /* ... */}
int main(int argc, char **argv) {
person p1;
person p2 = {"George", 1.8, 70000};
/* ... */ }1
2
3typedef struct {
/* fields declaration ... */
} person;Enumeration
- User defined type for assigning textual names to integer constants
- The compiler assigns an integer constant to each value of an enum
1
2
3
4
5enum color{
RED,
BLUE,
GREEN
} - Enumerations can be
typedef
‘d
typedef enum color mycolor
or in once:
1 | typedef enum { |
Introduction to Pointers
Program Memory Layout
- All the program’s code and data is present somewhere in memory
- The area of memory accessible by the program is the address space
- Large array of contiguous bytes
- Addresses are indexes in that array
- This is Virtual memory:
- Address space size independent from the amount of RAM
- Each program gets its own address space
- Address of a variable: the address in memory of the first byte storing that variable
- Use the
&
operator to get the address of a variable - With modern processeors an address is a 64 bit value so we need the right format specifier: ‘%lu’ or ‘%lx’ if we want to print in hexadecimal
Pointers
- Pointer: variable that contains an address(possibly of another variable)
- Declaration: <pointed type> * <pointer name>
1
2int v = 42;
int *ptr = &v;
- Declaration: <pointed type> * <pointer name>
- Access the pointed value with
*
1
2printf("pointer value: 0x%lx\n", ptr); // 0xF123
printf("pointed value: %d\n", *ptr); // 42Pointers Applications
Allow a Function to Access the Calling Context
- Can’t change a fuction’s parameter unless to use pointer
1
2
3
4
5
6
7
8
9
10void add_one(int *param) {
*param++; // access the pointed value
}
int main(int argc, char **argv) {
int x = 20;
printf("before call, x is %d\n", x); // print 20
add_one(&x); // address of x
printf("after call, x is %d\n", x); // print 21
return 0;
} - A function can ‘return’ more than a single value if…
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16// we want this function to return 3 things: the product and quotient of n1 by n2,
// as well as an error code in case division is impossible
int multiply_and_divide(int n1, int n2, int *product, int *quotient) {
/* Can't divide if n2 is 0 */
if(n2 == 0) return -1;
*product = n1 * n2;
*quotient = n1 / n2;
return 0;
}
int main(int argc, char **argv) {
int p, q, a = 10, b = 5;
if(multiply_and_divide(a, b, &p, &q) == 0) {
printf("10*5 = %d\n", p); // 10*5 = 50
printf("10/5 = %d\n", q); // 10/5 = 2
}
}Getting over C FUnction Call Limitations
- Assume we want to write a function updating a large struct variable
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15typedef struct {
// lots of large (8 bytes) fields:
double a; double b; double c; double d; double e; double f;
} large_struct;
large_struct f(large_struct s) { // very inefficient in terms of performance and memory usage!
s.a += 42.0;
return s;
}//set a large struct and only change one variable in it
int main(int argc, char **argv) {
large_struct x = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
large_struct y = f(x);
printf("y.a: %f\n", y.a); // 43.0000
} - Huge performance loss and memory footprint increase!
- Solution: With pointer: maintain a single copy of the variableThis is much faster and efficient
1
2
3
4
5
6
7
8
9
10
11
12
13typedef struct {
double a; double b; double c; double d; double e; double f;
} large_struct;
void f(large_struct *s) { // now takes a pointer parameter
(*s).a += 42.0; // dereference to access x
return;
}
int main(int argc, char **argv) {
large_struct x = {1, 2, 3, 4, 5, 6};
f(&x); // pass x's address
printf("x.a: %f\n", x.a);
return 0;
}C Arrarys are Pointers!
1
2
3
4
5
6
7
8
9
10
11
12
13
14void negate_int_array(int *ptr, int size) { // functions taking arrays as parameters
// also need the size to iterate properly
for(int i=0; i<size; i++)
// use square brackets like a standard array
ptr[i] = -ptr[i];
}
int main(int argc, char **argv) {
int array[] = {1, 2, 3, 4, 5, 6, 7};
negate_int_array(array, 7); // to get the pointer just use the array's name
for(int i=0; i<7; i++)
printf("array[%d] = %d\n", i, array[i]); // the result will be negative
return 0;
} - We can call the elemnt in array by using *(arrayname + index)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15int int_array[2] = {1, 2};
double double_array[2] = {4.2, 2.4};
char char_array[] = "ab";
printf("int_array[0] = %d\n", int_array[0]); //=1
printf("int_array[1] = %d\n", int_array[1]); //=2
printf("*(int_array+0) = %d\n", *(int_array+0)); // pointer arithmetic!, =1
printf("*(int_array+1) = %d\n", *(int_array+1)); // +1 means + sizeof(array type), =2
printf("double_array[0] = %f\n", double_array[0]); //=4.200
printf("double_array[1] = %f\n", double_array[1]); //=2.400
printf("*(double_array+0) = %f\n", *(double_array+0));//=4.200
printf("*(double_array+1) = %f\n", *(double_array+1));//=2.400
printf("char_array[0] = %c\n", char_array[0]); //=a
printf("char_array[1] = %c\n", char_array[1]); //=b
printf("*(char_array+0) = %c\n", *(char_array+0)); //=a
printf("*(char_array+1) = %c\n", *(char_array+1)); //=bPointers and Structures
- . takes precedence优先级 over *
- s->x is a shortcut for (*s).xp->int_member2 is (*p).int_member2 = 2; &(p->int_member2) is the address of this variable; *(p->ptr_member) is the actual value of this address point at
1
2
3
4
5
6
7
8
9
10
11
12
13
14typedef struct {
int int_member1;
int int_member2;
int *ptr_member;
} my_struct;
/* declare and initialise ms of type my_struct ... */
my_struct *p = &ms;
(*p).int_member1 = 1; // don't forget the parentheses! . takes precedence over *
p->int_member2 = 2; // s->x is a shortcut for (*s).x
p->ptr_member = &(p->int_member2);
printf("p->int_member1 = %d\n", p->int_member1); //1
printf("p->int_member2 = %d\n", p->int_member2); //2
printf("p->ptr_member = %p\n", p->ptr_member); //0x7fff38692954
printf("*(p->ptr_member) = %d\n", *(p->ptr_member)); //2Pointer Chains
- A pointer is a variable and has itself an address, i.e. a location in memory, so it can be pointed to!The output is:
1
2
3
4
5
6
7int value = 42; // integer
int *ptr1 = &value; // pointer of integer
int **ptr2 = &ptr1; // pointer of pointer of integer
int ***ptr3 = &ptr2; // pointer of pointer of pointer of integer
printf("ptr1: %p, *ptr1: %d\n", ptr1, *ptr1);
printf("ptr2: %p, *ptr2: %p, **ptr2: %d\n", ptr2, *ptr2, **ptr2);
printf("ptr3: %p, *ptr3: %p, **ptr3: %p, ***ptr3: %d\n", ptr3, *ptr3, **ptr3, ***ptr3);1
2
3ptr1: 0x7ffc7e51775c, *ptr1: 42
ptr2: 0x7ffc7e517760, *ptr2: 0x7ffc7e51775c, **ptr2: 42
ptr3: 0x7ffc7e517768, *ptr3: 0x7ffc7e517760, **ptr3: 0x7ffc7e51775c, ***ptr3: 42Dynamic Memory Allocation
Motivation
- Many scenarios where an array size is determined only at runtime
- Variable size arrays are rarely used due to support and space restrictions
- Are not supported for all versions of C, if we try a very large number, we will get an error “segmentation fault”
- So we need another solution
- Static data: large size, all allocations fixed at compile time. Store global variables
- Stack: small size, most allocations fixed at compile time.
- Heap: lagre size, all allocations made dynamically at runtime
Allocating on the Heap:
malloc
1
2
3
4
5
6
7
8
9
10
11
12
13
int process_array(int *array, int size) { /* do something with array */ }
int main(int argc, char **argv) {
int *heap_array;
/* ... */
int size = atoi(argv[1]);
heap_array = malloc(size * sizeof(int));
if(heap_array == NULL) return -1;
process_array(heap_array, size);
free(heap_array); //free up memory when you done with it
return 0;
} malloc
allocates a chunk of memory on the heap and returns a pointer to it- Memory is contiguous and can be used as an array
- Returns NULL (0) if the allocation fails
- Always check the return value
void *
: generic pointer type- Void pointer is a specific pointer type – void * – a pointer that points to some data location in storage, which doesn’t have any specific type. Void refers to the type. Basically the type of data that it points to is can be any. If we assign address of char data type to void pointer it will become char Pointer, if int data type then int pointer and so on. Citation
size_t
: long unsigned int1
2void *malloc(size_t size);
void free(void *ptr);Multidimensional Arrays and malloc
- 3 x 2 int array
1 | int a = 3; int b = 2; |
Memory Leaks
- Don’t forget to free memory!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19void print_ten_integers() {
int *array = malloc(10 * sizeof(int));
if(!array) {
printf("cannot allocate memory ...\n");
return;
}
for(int i=0; i<10; i++) {
array[i] = rand()%100;
printf("%d ", array[i]);
}
printf("\n");
/* array is neve free, leaking 10*sizeof(int) of memory each iteration */
}
int main(int argc, char **argv) {
int iterations = atoi(argv[1]);
for(int i=0; i<iterations; i++)
print_ten_integers();
return 0;
} - Can use Valgrind to detect memory leaks
The C Standard Library
- Already saw:
printf
,malloc
,atoi
, etc - can use
man <function name>
to find function description, prototype, required headers, return value in a Linux terminalString Copy
The format of String copy:Example:1
2char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16int main(int argc, char **argv) {
char *string1 = "hello";
char *string2 = string1; // this is not a string copy!
char string3[10];
/* not super safe, what happen if string1 > string3? */
strcpy(string3, string1);
/* better */
strncpy(string3, string1, 10);
printf("string1 @%p: %s\n", string1, string1); // string1 @0x55bcccbf98a4: hello
printf("string2 @%p: %s\n", string2, string2); // string1 @0x55bcccbf98a4: hello
printf("string3 @%p: %s\n", string3, string3); // string3 @0x7ffda803dbee: hello
return 0;
}String Concatenation 连结
The format of String Concatenation:Example:1
2char *strcat(char *dest, const char *src);
char *strncat(char *dest, const char *src, size_t n);1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int main(int argc, char **argv) {
char world[6] = "world";
char s1[32];
char s2[32];
strcpy(s1, "hello ");
strcpy(s2, "hello ");
strcat(s1, world); // not very safe
strncat(s2, world, 32); // better
printf("s1: %s\n", s1); //s1: hello world
printf("s2: %s\n", s2); //s2: hello world
return 0;
}Format-based String Creation
Format is:
int sprintf(char *str, const char *format, ...);
Fill a string with printf-like specifiers 用类似 printf 的说明符填充字符串
1 |
|
String Manipulation
- Get string length with
strlen
- Compare two strings with
strcmp
, returns 0 if they match
The formats of them are:
size_t strlen(const char *s)
int strcmp(const char *s1, const char *s2);
Example:
1 |
|
Console Input
Formats:
1 | char *fgets(char *s, int size, FILE *stream); |
fgets()
can read from any open file, butscanf()
only reads standard input.fgets()
reads ‘a line of text’ from a file;scanf()
can be used for that but also handles conversions from string to built in numeric types.- Many people will use
fgets()
to read a line of data and then usescanf()
to dissect it. From stackoverflow
Example:
1 | int main(int argc, char **argv) { |
Memory Manipulation
- Writing bytes to memory, Copying memory
Formats:1
2void *memset(void *s, int c, size_t n);
void *memcpy(void *dest, void *src, size_t n);memset
sets for char (or other type) variable, maken
size of it toc
memcpy
copy from src
to dest
in the size of n
Example:
1 |
|
Math functions
1 | double ceil(double x); // the ceil function returns the smallest integer that is greater than or equal to x |
Example:
1 |
|
Time
- Sleeping
- format
1
2unsigned int sleep(unsigned int seconds); // parameter as second
int usleep(useconds_t usec); //suspends the current process for the number of microseconds passed to it - 1 s = 1000000 µs
- Example
1
2
3
4
5
6
7
8
9
10
int main(int argc, char **argv) {
printf("hello!\n");
printf("Sleeping for 2 seconds ...\n");
sleep(2);
printf("Now sleeping for .5 seconds ...\n");
usleep(500000);
return 0;
}
- format
- Current Time
- format
1
time_t time(time_t *tloc); // time_t is generally a long long unsigned int
- Example
1
2
3
4
5
6
7
8
int main(int argc, char **argv) {
unsigned long long t = time(NULL);
printf("Number of seconds elapsed since the epoch (01/01/1970):\n");
printf("%llu\n", t); // 1650917906
return 0;
}
- format
- Measuring Execution Time
- Format
1
2
3
4
5
6int gettimeofday(struct timeval *tv, struct timezone *tz);
struct timeval {
time_t tv_sec; /* seconds (type: generally long unsigned) */
suseconds_t tv_usec; /* microseconds (type: generally long unsigned) */
};
// A pointer to a struct timeval as parameter - Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
int main(int argc, char **argv) {
struct timeval tv, start, stop, elapsed;
gettimeofday(&tv, NULL);
printf("Seconds since the epoch: %lu.%06lu\n", tv.tv_sec, tv.tv_usec); // get second and microsecond
gettimeofday(&start, NULL); // set a start time stamp
for(int i=0; i<1000000000; i++); // implementing a large loop
gettimeofday(&stop, NULL); // set an ending time stamp
timersub(&stop, &start, &elapsed); // stop-start = elapsed
printf("Busy loop took %lu.%06lu seconds\n", elapsed.tv_sec,
elapsed.tv_usec);
return 0;
}File I/O
1
int open(const char *pathname, int flags, mode_t mode);
- Format
open
creates a reference to a file: file descriptorpathname
: path of the fileflags
:- Access mode:
O_RDONLY
,O_WRONLY
,O_RDWR
- create the file if it doesn’t exist:
O_CREAT
- Truncate截短 the file size to 0 if it exists:
O_TRUNC
1
2
3
4// open /home/pierre/test, read-write mode, and create the file if it does not
// exists with r/w/x permissions
// I_IRWXU means read, write, execute/search by
int file_descriptor = open("/home/pierre/test", O_RDWR | O_CREAT, S_IRWXU);1
ssize_t read(int fd, void *buf, size_t count);
- Access mode:
read
attempts to read bytes from a filefd
is the file descriptor, previously created withopen
buf
is the address of the buffer that will receive the content roadcount
is the number of bytes to try to read- return the number of bytes actually read
1
2
3
4
5
6int bytes_read = (file_descriptor, buffer, 10);
if(bytes_read == -1) {
/* error */
} else if(bytes_read != 10) {
/* technically not an error */
}1
ssize_t write(int fd, const void *buf, size_t count);
write
attemps to writecount
bytes frombuf
int the file corresponding tofd
- Returns the number of bytes actually written
- File system can be full
1
2
3
4
5
6int bytes_written = (file_descriptor, buffer, 10);
if(bytes_written == -1) {
/* error */
} else if(bytes_written != 10) {
/* technically not an error */
}1
int close(int fd)
close
terminates file I/O on a given descriptor
Example:The result is a txt file called “test” with “hello, world!hello, world!1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
int main(int argc, char **argv) {
int fd1;
char *buffer = "hello, world!";
fd1 = open("./test", O_WRONLY | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
if(fd1 == -1) {
printf("error with open\n");
return -1;
}
/* write 'hello, world!' in the file */
if(write(fd1, buffer, strlen(buffer)) != strlen(buffer)) {
// write will return the number of bytes actualy written
printf("issue writing\n");
close(fd1); return -1;
}
/* write it again */
if(write(fd1, buffer, strlen(buffer)) != strlen(buffer)) {
printf("issue writing\n");
close(fd1); return -1;
}
close(fd1);
return 0;
}
“ in it.
Read Example which assume the last example has executed
1 | // Assume ./test was previously created with the write example program |
Random Numbers
V1: get numbers between 0 and RAND_MAX
1
2
3
4
5
6
7
8
int main(int argc, char **argv) {
for(int i=0; i<10; i++)
printf("%d ", rand());
printf("\n"); // 1804289383 846930886 1681692777 1714636915 1957747793 424238335 719885386 1649760492 596516649 1189641421
return 0;
}V2: numbers between 0 and 99 with modulo
1
2
3
4
5
6
7
8
int main(int argc, char **argv) {
for(int i=0; i<10; i++)
printf("%d ", rand()%100); // 83 86 77 15 93 35 86 92 49 21
printf("\n");
return 0;
}V3: display a different sequence each time we launch the program
- makes use of the computer’s internal clock to control the choice of the seed. Since time is continually changing, the seed is forever changing.
- void srand(unsigned int seed) seeds the random number generator used by the function rand.
-
seed
− This is an integer value to be used as seed by the pseudo-random number generator algorithm. - Because the number is set after compiling the program!
1
2
3
4
5
6
7
8
9
10
11
12
13
int main(int argc, char **argv) {
srand(time(NULL));
for(int i=0; i<10; i++)
printf("%d ", rand()%100);
printf("\n");
return 0;
}Error Management
The variable
errno
can be used to get more information about the failure of many functions of the standard library1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
int main(int argc, char **argv) {
int fd = open("a/file/that/does/not/exist", O_RDONLY, 0x0);
/* Open always returns -1 on failure, but it can be due to many different
* reasons */
if(fd == -1) {
printf("open failed! errno is: %d\n", errno);
// the errno is 2
/* errno is an integer code corresponding to a given reason. To format
* it in a textual way use perror: */
perror("open");
}
return 0;
}Errors are listed, more can find in the function man page
1
2
3
4
5
6
7
8
9
101 EPERM Operation not permitted
2 ENOENT No such file or directory
3 ESRCH No such process
4 EINTR Interrupted system call
5 EIO Input/output error
6 ENXIO No such device or address
7 E2BIG Argument list too long
8 ENOEXEC Exec format error
9 EBADF Bad file descriptor
10 ECHILD No child processesClasses and Objects in C++
Superset of C with many additional functionalities, in particular Object-Oriented Programming (OOP)
- Everything we saw for C will also work in C++
3 fundamental concepts of OOP:
Object Oriented Programming binds together some data and the code that is manipulating it into objects
The code for manipulating similar objects is defined once into a class
The programmer uses the class as a pattern to create (instantiate) as many objects as needed
By convention, class names start with a capital letter and use camel case
Member variables represent data attached to the object created from the class
- Different objects of the same class can have different values for member variables
Member variables represent data attached to the object created from the class Member functions are functions, i.e. the code manipulating the data
- Prototype原型 in the class, implemented outside with ::
Example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19class SampleClass {
public:
int data;
void add(int val);
};
void SampleClass::add(int val) {
data = data + val;
}
int main(int argc, char **argv) {
// create two objects, instances of SampleClass
SampleClass obj1, obj2;
obj1.data = 42;
obj2.data = 12;
obj1.add(58);
obj2.add(8);
printf("obj1 data: %d\n", obj1.data); //obj1 data: 100
printf("obj2 data: %d\n", obj2.data); //obj2 data: 20
return 0;
}Encapsulation
Encapsulation: prevent direct access to part of the member variables/functions of a class from outside of this class
- To force the outside code to use a well-defined interface, i.e. the member functions and (rarely) variables still available to it
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
class BankAccount {
// All attributes/methods set as private are unacessible from outside the class
private:
int _balance = 0;
int _transaction_num = 0;
public:
void update_balance(int val);
int get_balance();
int get_transaction_num();
};
/* Add val to the balance and increment the number of transactions */
void BankAccount::update_balance(int val) {
_balance += val;
_transaction_num++;
}
/* Return the current value of the balance */
int BankAccount::get_balance() {
return _balance;
}
/* Return the number of transactions */
int BankAccount::get_transaction_num() {
return _transaction_num;
}
int main(int argc, char **argv) {
BankAccount ba;
ba.update_balance(100); // payday 发薪日
ba.update_balance(-20); // groceries 杂货
printf("The account balance is %d after %d transactions\n",
ba.get_balance(), ba.get_transaction_num());
// output: The account balance is 80 after 2 transactions
return 0;
}
- To force the outside code to use a well-defined interface, i.e. the member functions and (rarely) variables still available to it
Have as much member variables private as possible
Have funcions to read/write these only if needed
- Use accessors or getters and setters
- Example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
class MyClass {
private:
int _foo = 0;
double _bar = 0.0;
public:
void set_foo(int val); // setter
int get_foo(); // getter
double get_bar(); // getter
/* ... */
};
void MyClass::set_foo(int val) {
_foo = val;
}
int MyClass::get_foo() {
return _foo;
}
double MyClass::get_bar() {
return _bar;
}
int main(int argc, char **argv) {
MyClass c1, c2;
c1.set_foo(42);
c2.set_foo(24);
printf("c1's foo: %d, c2's foo: %d\n", c1.get_foo(), c2.get_foo());
// c1's foo: 42, c2's foo: 24
return 0;
}Constructors
Initialise member variables at object creation time with a constructor
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class BankAccount {
// All attributes/methods set as private are unacessible from outside the class
private:
int _balance;
int _transaction_num;
public:
BankAccount(int initial_balance);
void update_balance(int val);
int get_balance();
int get_transaction_num();
};
/* Constructor */
// now can set any value as intitial balance
BankAccount::BankAccount(int initial_balance) {
_balance = initial_balance;
_transaction_num = 0;
}Destructors
Member functions called when the object is destroyed
I.e. when it goes out of scope for the stack-allocated objects we have seen this far1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
class BankAccount {
// All attributes/methods set as private are unacessible from outside the class
private:
int _balance;
int _transaction_num;
char *_account_name;
public:
BankAccount(char *name, int initial_balance);
~BankAccount();
void update_balance(int val);
int get_balance();
int get_transaction_num();
char *get_name();
};
/* Constructor */
BankAccount::BankAccount(char *name, int initial_balance) {
_balance = initial_balance;
_transaction_num = 0;
_account_name =
(char *)malloc((strlen(name) + 1) * sizeof(char));
strcpy(_account_name, name);
}
/* destructor */
BankAccount::~BankAccount() {
free(_account_name);
// printf here to see when the destructor is called:
printf("destructor called\n");
}
/* Add val to the balance and increment the number of transactions */
void BankAccount::update_balance(int val) {
_balance += val;
_transaction_num++;
}
/* Return the current value of the balance */
int BankAccount::get_balance() {
return _balance;
}
/* Return the number of transactions */
int BankAccount::get_transaction_num() {
return _transaction_num;
}
/* Return the name of the account */
char *BankAccount::get_name() {
return _account_name;
}
int main(int argc, char **argv) {
char name[11] = "my_account";
BankAccount ba(name, 100); //set name and initial balance
ba.update_balance(100); // payday
ba.update_balance(-20); // groceries
printf("The account \"%s\" balance is %d after %d transactions\n",
ba.get_name(), ba.get_balance(), ba.get_transaction_num());
return 0;
}
// The output is:
//The account "my_account" balance is 180 after 2 transactions
// destructor calledThis Pointer
In any member function (including destructors and constructors), one can obtain a pointer to the current object with this
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Test {
private:
int x;
int y;
public:
Test(int x, int y);
void print(void);
};
/* This constructor's parameter have the same names as the class member
* variables */
Test::Test(int x, int y) {
this->x = x; // x corresponds to the parameter, this->x to the member
this->y = y; // this->x equal to (*this).x
}
void Test::print() {
printf("x: %d, y: %d\n", x, y);
printf("x: %d, y: %d\n", this->x, this->y); // same thing
}
int main(int argc, char **argv) {
Test t(1, 2);
t.print();
return 0;
}C++: Dynamic Object Allocation and Dynamic Arrays
Dynamic Object Allocation
malloc
andfree
are available in C++- Good for allocating arrays/buffers of primitive原始 types(
int
,char
, etc.)
- Good for allocating arrays/buffers of primitive原始 types(
new
anddelete
to allocate objects dynamicallyvector class
Variable size arrays, their allocated memory (on the heap) is managed automatically by the language
They also embed their size
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
// vector is in the std namespace
using namespace std;
/* apply a few random transactions on account pairs within the array
* Note that we have a dynamic array which embeds its size so no need to pass
* the number of accounts as parameter anymore
* Also note that we pass a pointer to the vector rather than the vector
* to avoid a costly copy with long vectors! */
void rnd_transac(vector<BankAccount *> *b) {
int num = rand()%10 + 1;
int n = b->size(); // returns the amount of elements in the vector
for(int i=0; i<num; i++) {
BankAccount *source = (*b)[rand()%n]; // we can access a vector with []
BankAccount *target = (*b)[rand()%n];
int amount = rand()%15;
target->update_balance(amount);
source->update_balance(-amount);
}
}
int main(int argc, char **argv) {
// a vector of BankAccount pointers:
vector<BankAccount *> accounts;
int n;
n = atoi(argv[1]);
for(int i=0; i<n; i++) {
char name[32];
sprintf(name, "account %d", i);
BankAccount *b = new BankAccount(name, 100);
accounts.push_back(b); // add an element at the end of the vector
}
rnd_transac(&accounts);
for(int i=0; i<n; i++) {
BankAccount *b = accounts[i];
printf(/* ... */);
// this frees what is pointed by b but does not remove b from the
// vector, we'll clear it out below as we are currently iterating on
// that vector
delete b;
}
accounts.clear(); // remove all elements from the vector
}
}Inheritance in C++
Goal: reusing code between similar classes
- Class Person (Base class) — Class Student (Derived classes)
|___ Class Professor
Base class declaration:
1
2
3
4
5
6
7
8
9
10class Person {
private:
char *_name;
int _id;
public:
Person(const char *name, int id);
~Person();
char *get_name();
int get_id();
};Derived classes declarations:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18class Student : public Person {
private:
double _gpa;
public:
Student(const char *name, int id,
double gpa);
void update_gpa(double new_gpa);
double get_gpa();
};
class Professor : public Person {
private:
int _courses_taught;
public:
Professor(const char *name, int id,
int courses_taught);
void update_courses_taught(int n);
int get_courses_taught();
};Methods implementations
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27Person::Person(const char *name, int id) {
_name = new char[strlen(name)+1];
strcpy(_name, name);
_id = id;
}
Person::~Person() {
delete [] _name;
}
char *Person::get_name() {
return _name;
}
int Person::get_id() {
return _id;
}
Student::Student(const char *name, int id, double gpa) : Person(name, id)
{ _gpa = gpa; }
void Student::update_gpa(double new_gpa)
{ _gpa = new_gpa; }
double Student::get_gpa()
{ return _gpa; }
Professor::Professor(const char *name, int id, int courses_taught) : Person(name, id)
{ _courses_taught = courses_taught; }
void Professor::update_courses_taught(int n)
{ _courses_taught = n; }
int Professor::get_courses_taught()
{ return _courses_taught; }usage in external code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14int main(int argc, char **argv) {
Person p1("James", 1000);
Student p2("George", 1242, 3.8);
Professor p3("Jane", 1019, 2);
/* These methods is callable on all Person/Student/Professor object */
printf("%s's id: %d\n", p1.get_name(), p1.get_id());
printf("%s's id: %d\n", p2.get_name(), p2.get_id());
printf("%s's id: %d\n", p3.get_name(), p3.get_id());
/* Only callable on a Student object */
printf("%s's GPA: %lf\n", p2.get_name(), p2.get_gpa());
/* Only callable on a Professor object */
printf("%s teaches %d courses\n", p3.get_name(), p3.get_courses_taught());
return 0;
}Public Inheritance and Members Visibility
Public members of the base class visible in the derived class
Private members of the base class not visible in the derived class
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19class Base {
private: int b;
public: int a;
void f1() {
printf("f1 called\n");
a = 10; // working
b = 10; // working
}
};
class Derived : public Base {
public:
void f2() {
printf("f2 called\n");
f1(); // working
a = 12; // working
// b = 12; // can't do that,
// b is not visible, because it's private
}
};Polymorphism in C++
Having several implementations for a function with a given name
Polymorphism
- Function overloading compile-time polymorphism
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17// multiple implementation of the function print, differing in types and number of arguments
void print(int x) {
printf("an int: %d\n", x);
}
void print(char c) {
printf("a char: %c\n", c);
}
void print(double d1, double d2) {
printf("a couple of doubles:"
"%lf and %lf\n", d1, d2);
}
int main(int argc, char **argv) {
print(1);
print('a');
print(5.5, 1.2);
return 0;
}- A more complex example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
class Person {
private:
char _name[128];
int _id;
char _postcode[7];
public:
Person(const char *name);
Person(const char *name, int id);
Person(const char *name, const char *postcode);
Person(const char *name, int id, const char *postcode);
void print();
};
Person::Person(const char *name) {
strcpy(_name, name);
_id = 0;
_postcode[0] = '\0';
}
Person::Person(const char *name, int id) {
strcpy(_name, name);
_id = id;
_postcode[0] = '\0';
}
Person::Person(const char *name, const char *postcode) {
strcpy(_name, name);
_id = 0;
strcpy(_postcode, postcode);
}
Person::Person(const char *name, int id, const char *postcode) {
strcpy(_name, name);
_id = id;
strcpy(_postcode, postcode);
}
void Person::print() {
printf("name: %s, id: %d, postcode: %s\n", _name, _id, _postcode);
}
int main(int argc, char **argv) {
Person p1("James");
Person p2("Jane", 1234);
Person p3("George", "M139PL");
Person p4("Julie", 1235, "M188HN");
p1.print();//name: James, id: 0, postcode:
p2.print();//name: Jane, id: 1234, postcode:
p3.print();//name: George, id: 0, postcode: M139PL
p4.print();//name: Julie, id: 1235, postcode: M188HN
return 0;
}
- A more complex example
- Function overriding Runtime polymorphism
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
class Base {
public:
void f();
};
class Derived : public Base {
public:
void f(); // overrides f from Base class
};
void Base::f() {
printf("Base's f() called\n");
}
void Derived::f() {
printf("Derived's f() called\n");
}
int main(int argc, char **argv) {
Base b;
Derived d;
Base *base_ptr = &d;
b.f(); // base f called
d.f(); // derived f called
base_ptr->f(); // compiler sees that base_ptr is of the type Base * so !Base's f() is called!
return 0;
}Virtual Function Overriding
- Function overloading compile-time polymorphism
A virtual function is a member function which is declared within a base class and is re-defined (overridden) by a derived class. When you refer to a derived class object using a pointer or a reference to the base class, you can call a virtual function for that object and execute the derived class’s version of the function. By GeeksforGeeks
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
class Base {
public:
virtual void f();
};
class Derived : public Base {
public:
void f();
};
void Base::f() {
printf("Base's f() called\n");
}
void Derived::f() {
printf("Derived's f() called\n");
}
void call_f(Base *b) {
b->f();
}
int main(int argc, char **argv) {
Base b;
Derived d;
call_f(&b); // Base f() called
call_f(&d); // Derive f() called
return 0;
}Abstract Classes
A class with a least one pure virtual function is named an abstract class
1
2
3Class C {
virtual void pure_virtual_function(int a, int b) = 0;
}Cannot be instantiated
Requires all derived classes to override all pure virtual functions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
class Animal {
public:
void sleep(); // all animals sleep similarly
virtual void sound() = 0; // all animals make noise, but different noises
};
class Dog : public Animal {
public:
void sound();
};
class Cat : public Animal {
public:
void sound();
};
void Animal::sleep() {
printf("zzzzz\n");
}
void Dog::sound() {
printf("woof\n");
}
void Cat::sound() {
printf("meow\n");
}
int main(int argc, char **argv) {
Dog d;
Cat c;
d.sound(); // woof
d.sleep(); // zzzzz
c.sound(); // meow
c.sleep(); // zzzzz
return 0;
}The Preprocessor
gcc src.c -o prog
src.c
(source file) -preprocessor->src-tmp.c
(Preprocesses Source) -compiler->prog
(executable program)Run by the C/C++ compiler transparently 透明地
Performs textual transformations in the sources
Header files: source file with .h extension, contain what is necessary to use foreign libraries/source files
- Headers should contain only declarations: functions prototypes, variables/custom types/structs/C++ classes declaration
- No implementation!
1
2
3// In the .c source file, include headers like that:
Can use
whereis stdio.h
to find location ofstdio.h
Macro Expansions
Textual substitutions, useful for compile-time defined constants
- Generally indicated in capital letters as a convention
1 |
|
- In general, try to have a less hardcoded values as possible
- When you see that you are writing a constant value more then once, check if using a macro is possible!
- Even if you have a constant used only once a macro can make it look more meaningful
- Macros can also be used to write macro “functions”
- Can be error-prone易出错, see here: https://bit.ly/3EaQ2DA
- Be careful with operator precedence! Use parentheses
1
2
3
4
5
6
7
8
9
10
11
12
// We can use a macro in another macro's definiiton:
// More than 1 macro in another macro's definition? use parentheses
int main(int argc, char **argv) {
// correct, expands to:
// (10 + 10) * 2
printf("total twice = %d\n",
TOTAL * 2);
return 0;
}Modular Compilation
- Large programs: need to organise the code in several source files
- Linux kernel sources (52K source files!)
- Modular compilation can break up the program code into different files representing well isolated compilation units?
- Also, can’t run gcc <52K source files> each time we do a small update to one or a few source files
- Automated compilation can automate efficiently the build process
Compilation Phase
gcc -c src.c -o src.o
,gcc src.o -o prog
src.c
(source file) (text) -compiler->src.o
(Object file)(binary, non-executable) -linker->prog
(executable program) (binary, executable)Breaking down the program into modules
- Hypothetical假设 example of a server application implemented into 3 .c source files, also named modules:
1
2
3
4gcc -c network.c -o network.o
gcc -c parser.c -o parser.o
gcc -c main.c -o main.o
gcc main.o network.o parser.o -o prog - Each module (.c source file) exposes公开 an interface callable from the other source files
- Should be as small as possible to hide internal code/data
- Same principle as for C++ classes public/private member functions/variables
- Should be as small as possible to hide internal code/data
- Assume the following interactions:
Header File for Modular Compilation
- There is generally 1 header file per module, defining its interface.
- Can be included in several .c files so must contain only declarations
- Function prototypes, struct/typedefs/enums declarations, variable declarations
- No definitions (i.e. function body/global variable initialisation)
- For example network.h:
1
2
3
4
5
6
7
8
9
typedef struct {
int id;
char content[128];
} request; // only declarations
void init_network();
int rcv_request(request *r); - in network.c:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18/* std includes here */
// this function and variable are internal
// so they are not declared in network.h
// the keyword static force their use
// to be only within the network.c file
static void generate_request(request *r);
static int request_counter = 0;
void init_network() {
/* init code here ... */
}
int rcv_request(request *r) {
generate_request(r);
/* ... */
}
static void generate_request(request *r) {
/* ... */
} - in parser.h
1
2
3
4
5
6
/* needed for the definition of request: */
void parse_req(request *r); - in parser.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/* These two function are internal and their prototype is not in the header */
static void internal1(request *r);
static void internal2(request *r);
void parse_req(request *r) {
internal1(r);
internal2(r);
}
static void internal1(request *r) {
/* ... */
}
static void internal2(request *r) {
/* ... */
} - in main.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
int main(int argc, char **argv) {
request req;
printf("[main] calling init_network()\n");
init_network();
while(1) {
printf("[main] calling rcv_request()\n");
if(rcv_request(&req) != 0)
break;
printf("[main] calling parse_req()\n");
parse_req(&req);
sleep(1);
}
printf("[main] exiting now\n");
return 0;
}C++ Source Management
- Common practice to divide the source code of a C++ program into multiple files as follows:
- one .h file per class or groups of related classes, containing class declarations
- one .cpp file per class or group of related classes, containing member functions implementations
Automated Compilation
1
2
3
4
5
6
7
8
9
10
11
12# Initial build:
gcc -c network.c -o network.o
gcc -c parser.c -o parser.o
gcc -c main.c -o main.o
gcc main.o network.o parser.o -o prog
# Assume we updated parser.c, to rebuild, we don't need to recompile everything:
gcc -c parser.c -o parser.o
gcc main.o network.o parser.o -o prog
# Now we updated parser.h, remember it's included in both parser.c and main.c, so:
gcc -c parser.c -o parser.o
gcc -c main.c -o main.o
gcc main.o network.o parser.o -o progMakefiles: Automated Build and Dependency Management
- Let’s consider the dependencies in our example program
- Text file named
Makefile
in the local source directory1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16# The first rule is executed when the
# command make is typed in the local folder:
all: prog
# executable deps and command to buil1d it:
prog: main.o network.o parser.o
gcc main.o network.o parser.o -o prog
# network.o deps and command to build it:
network.o: network.c network.h
gcc -c network.c -o network.o
parser.o: parser.c parser.h network.h
gcc -c parser.c -o parser.o
main.o: main.c network.h parser.h
gcc -c main.c -o main.o
# Special rule to remove all build files
clean:
rm -rf *.o prog - Rule Syntax:
1
2<target>: <dependencies>
[TAB] <command to build target from deps> - Use make file
- Type
make
in terminal1
2
3
4gcc -c main.c -o main.o
gcc -c network.c -o network.o
gcc -c parser.c -o parser.o
gcc main.o network.o parser.o -o prog - Type
touch network.c
in terminal to update network.c, thenmake
1
2gcc -c network.c -o network.o
gcc main.o network.o parser.o -o prog - Type
make clean
to remove all build files1
rm -rf *.o prog
Type Conversion and Casting
Implicit Type Conversions
1
2
3
4
5
6
7
8
9
10
11
int main(int argc, char **argv) {
char char_var = 12; // 1 byte, -128 -to 127
int int_var = 1000; // 4 bytes, -2*10^9 to 2*10^9
long long ll_var = 0x840A1231AC154; // 8 bytes, -9*10^18 to 9*10^18
// here, char_var is first promoted to int, then the result of the first
// addition is promoted to long long
long long result = (char_var + int_var) + ll_var;
printf("%lld\n", result); //2322860636554568
return 0;
}Integer Promotion
- Type
- Same type? no promotion
- Same signed/unsigned attribute? lesser rank promoted to greater rank
- Unsigned rank >= other operand rank? signed promoted to the unsigned rank
- Signed type can represent all the values of the unsigned type? unsigned operand promoted to signed type
- Otherwise, both operands converted to the unsigned type corresponding to the signed operand’s type
1
2
3int si = -1;
unsigned int ui = 1;
printf("%d\n", si < ui); // prints 0! si converted to unsigned int; 0 means false
- Always remember the storage sizes of types
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15int main(int argc, char **argv) {
int i = -1000;
unsigned int ui = 4294967295;
printf("%d\n", i + ui); // prints -1001
// i: 11111111111111111111110000011000 (A)
// ui: 11111111111111111111111111111111 (B)
// i + ui: 111111111111111111111110000010111 (C)
// final: 11111111111111111111110000010111 (D)
// (A) originally 2's complement promoted to unsigned
// (B) standard unsigned representation, max number an unsigned int can store
// (C) addition result
// (D) result overflows 32 bits as an int (expected by %d), loosing MSB
// Solution: use %ld rather than %d to store the result on 64 bits
return 0;
}Integer to Floating-Point Conversion
1
2
3
4
5
6
7
8
9
10
11
12
13
14int main(int argc, char **argv) {
//prints 7: 25/10 rounds to 2; 2 * 15 = 30; 30/4 rounds to 7
printf("%d\n", 25 / 10 * 15 / 4);
// prints 7.5: 25/10 rounds to 2; 2*15 = 30; 30 gets converted to
// 30.0 (double) and divided by 4.0 (double) giving result 7.5 (double)
printf("%lf\n", 25 / 10 * 15 / 4.0);
// prints 9.375: 25.0 / 10.0 (converted from 10) is 2.5, multiplied by 15.0
// (converted from 15) gives 37.5, divided by 4.0 (converted from 4) gives
// 9.375
printf("%lf\n", 25.0 / 10 * 15 / 4);
// prints garbage, don't try to interpret a double as an int!
printf("%d\n", 25.0 / 10 * 15 / 4);
return 0;
}Parameter Passing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17void f1(int i) {
printf("%d\n", i);
}
void f2(double d) {
printf("%lf\n", d);
}
void f3(unsigned int ui) {
printf("%u\n", ui);
}
int main(int argc, char **argv) {
char c = 'a';
unsigned long long ull = 0x400000000000;
f1(c); // prints 97 (ascii code for 'a')
f2(c); // prints 97.0
f3(ull); // overflows int ... prints 0 (lower 32 bits of 0x400000000000)
return 0;
}Type Casting 类型铸造
- Casting allows to force the type of an expression
- syntax:
(type)expression
1
2
3
4
5
6
7
8// prints 3.75: 3 gets converted to 3.7500
printf("%lf\n", (float)15/4);
// prints 4: 2.5 converted to 2 (int), multiplied by 12 gives 24, divided
// by 5 gives 4
printf("%d\n", ((int)2.5 * 12)/5);
// prints 4.8: 2*12 = 24, converted to 24.0, divided by 5.0 gives 4.8
printf("%lf\n", ((int)2.5 * 12)/(double)5);
return 0;1
2
3int si = -1;
unsigned int ui = 1;
printf("%d\n", si < (int)ui); // prints 1Generic Pointers
- void * can be used as a generic pointer to pass data of different types
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
typedef enum {
CHAR,
INT,
DOUBLE
} type_enum;
void print(void *data, type_enum t) {
switch(t) {
case CHAR:
printf("character: %c\n", *(char *)data);
break;
case INT:
printf("integer: %d\n", *(int *)data);
break;
case DOUBLE:
printf("double: %lf\n", *(double *)data);
break;
default:
printf("Unknown type ...\n");
}
}
int main(int argc, char **argv) {
char c = 'x';
int i = 42;
double d = 11.5;
print((void *)&c, CHAR); //character: x
print((void *)&i, INT); //integer: 42
print((void *)&d, DOUBLE); //double: 11.500000
return 0;
}Debugging with GDB
- Compile with
-g
flag to include debug symbols - To launch gdb on a particular binary :
gdb program
in a shell b test.c:12
places a breakpoint on the instruction at line 18 in test.cb my_func
places a breakpoint on the first instruction ofmy_func
b test.c:12 if x == 42
is a conditional breakpoint, will pop only if x is equal to 42 when the corresponding statement is executedrun
starts the program- In single-step mode:
- C/C++ are extensively used in High Performance Computing (HPC) because of their speed
- Due to many reasons, including the fact that they gives the programmer control over the data memory layout
Controlling Memory Layout
- For performance reasons it is very important to fit as much of the data set as possible in the cache
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20typedef struct {
char c[60];
int i;
double d;
} my_struct;
my_struct array[N];
int main(int argc, char **argv) {
struct timeval start, stop, res;
my_struct s;
gettimeofday(&start, NULL);
/* Randomly access N elements */
for(int i=0; i<N; i++)
memcpy(&s, &array[rand()%N],
sizeof(my_struct));
gettimeofday(&stop, NULL);
timersub(&stop, &start, &res);
printf("%ld.%06ld\n", res.tv_sec,
res.tv_usec); // takes 14.896576s
return 0; } - Struct size: 60 + 4 (int) + 8 (double) = 72 bytes
- Larger and not a multiple of the cache line size (64 bytes)
- Most objects in the array will require to fetch 2 cache lines from main memory
1
2
3
4
5
6typedef struct {
char c[52]; // down from 60, we have 52 + 4 + 8 == 64 bytes i.e. a cache line
int i;
double d;
} my_struct;
my_struct array[N] __attribute__ ((aligned(64))); /* force alignment of the array itself */ //takes 13.696097s
- Should be 25% faster, mine is not so explicit because I’m using sandbox50
Case study - C Standard Library
- Provides and implement all the functions from stdio.h, stdlib.h, etc.
- C + a tiny bit of assembly
- Default libc in most Linux distribution Glibc
- Complex and hard to study
- https://www.gnu.org/software/libc/
- Simpler: Musl
- https://www.musl-libc.org/
Using Musl - System Calls
- https://www.musl-libc.org/
- Application request services from the OS through system calls
- Generally not made directly by user code but rather the libc
- Upon syscall 在系统调用时: user/kernel “world switch”
- System calls have a calling convention
- Machine instructions used to trigger a syscall
- On Intel x86-64, this convention is as follows:
- Syscall number in %rax
- List here: https://filippo.io/linux-syscall-table/
- Syscall parameters in order in %rsi, %rdx, %r10, %r8, and %r9
- Execute the syscall instruction
- Return value in %rax
Musl’s printf Implementation
printf("value: %d %lf\n", int_val, double_val)
;
- In src/stdio/printf.c:
1
2
3
4
5
6
7
8
9
10
11
int printf(const char *restrict fmt, ...)
{
int ret;
va_list ap; // variable argument list management
va_start(ap, fmt);
ret = vfprintf(stdout, fmt, ap); // stdout: data structure representing the std output
va_end(ap);
return ret;
} - In src/stdio/vfprintf.c:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15/* lots of logic for format string parsing ... */
int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap) {
/* ... */
ret = printf_core(f, fmt, &ap2, nl_arg, nl_type);
/* ... */
}
static int printf_core(FILE *f, const char *fmt, va_list *ap, /* ... */) {
/* ... */
if (f) out(f, a, l);
/* ... */
}
static void out(FILE *f, const char *s, size_t l)
{
if (!(f->flags & F_ERR)) __fwritex((void *)s, l, f);
} - All these transforms something like “hello: %d, %lf”, 12, 12.0” into “hello: 12, 12.00000”
- In src/stdio/fwrite.c:
1
2
3
4
5
6size_t __fwritex(const unsigned char *restrict s, size_t l, FILE *restrict f)
{
/* ... */
size_t n = f->write(f, s, i); /* this is actually a call to __stdout_write(f, s, i);
/* ... */
} - In src/stdio/__stdout_write.c:
1
2
3
4
5
6
7
8size_t __stdout_write(FILE *f, const unsigned char *buf, size_t len)
{
struct winsize wsz;
f->write = __stdio_write;
if (!(f->flags & F_SVB) && __syscall(SYS_ioctl, f->fd, TIOCGWINSZ, &wsz))
f->lbf = -1;
return __stdio_write(f, buf, len);
} - In src/stdio/__stdio_write.c:
1
2
3
4
5
6size_t __stdio_write(FILE *f, const unsigned char *buf, size_t len)
{
/* ... */
cnt = syscall(SYS_writev, f->fd, iov, iovcnt);
/* ... */
} - In C, printing to the standard output corresponds to a write (or writev) syscall on a file descriptor representing the standard output]
- In src/internal/syscall.h:
1
2
3
4
5
6
7
8
9/* ... */
/* ... */1
2
3// This is a set of macros that transforms "syscall(SYS_writev, f->fd, iov, iovcnt); we saw
// earlier into:
__syscall_3(SYS_writev, f->fd, iov, iovcount); - We now reach the architecture-specific part, let’s look at the x86-64 code, in arch/x86_64/syscall_arch.h:
1
2
3
4
5
6
7
8
9
10
11static __inline long __syscall3(long n, long a1, long a2, long a3) {
unsigned long ret;
__asm__ __volatile__ ( "syscall" : // 5) the syscall instruction
"=a"(ret) : // 6) we'll get the return value in rax
"a"(n), // 1) syscall number in rax
"D"(a1), // 2) argument 1 in rdi
"S"(a2), // 3) argument 2 in rsi
"d"(a3) : // 4) argument 3 in rdx
"rcx", "r11", "memory");
return ret;
}1
2
3
4
5
6
7401bae: 41 be 14 00 00 00 mov $0x14,%r14d
...
401beb: 48 63 7b 78 movslq 0x78(%rbx),%rdi
401bef: 49 63 d5 movslq %r13d,%rdx
401bf2: 4c 89 f0 mov %r14,%rax
401bf5: 48 89 ee mov %rbp,%rsi
401bf8: 0f 05 syscallPrinting Without Libc
- Now that we know how to print on stdout, we can make a syscall directly from our program without involving libc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18/* Without libc the default entry point is _start */
void _start() {
unsigned long ret;
/* write(1, "hello!\n", 7); */
__asm__ __volatile(
"syscall" : // the syscall instruction
"=a"(ret) : // we'll get the return value in rax
"a"(1), // syscall number (1 for write)
"D"(1), // argument1: file descriptor (1 for stdout)
"S"((long)"hello!\n"), // argument2: char array to print
"d"(7) : // argument3: number of bytes to write
"rcx", "r11", "memory");
/* exit(0); */
__asm__ __volatile( "syscall" : : // syscall instruction
"a"(60), // exit's syscall number
"D"(0) : // exit parameter: 0
"rcx", "r11", "memory");
}1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
160000000000001000 <_start>:
1000: 55 push %rbp
1001: 48 89 e5 mov %rsp,%rbp
1004: 48 8d 35 f5 0f 00 00 lea 0xff5(%rip),%rsi # argument 2, address of "hello!"
100b: b8 01 00 00 00 mov $0x1,%eax # syscall number, 1 == write
1010: bf 01 00 00 00 mov $0x1,%edi # argument 1, fd 1 == stdout
1015: ba 07 00 00 00 mov $0x7,%edx # argument 3, 7 bytes to write
101a: 0f 05 syscall
101c: 48 89 45 f8 mov %rax,-0x8(%rbp)
1020: b8 3c 00 00 00 mov $0x3c,%eax # syscall number, 60 == exit
1025: ba 00 00 00 00 mov $0x0,%edx # argument 1: 0
102a: 89 d7 mov %edx,%edi
102c: 0f 05 syscall
102e: 90 nop
102f: 5d pop %rbp
1030: c3 retqCase Study - OS Kernel
- C is the default language to write OS kernels
System Calls in a Nutshell
- Stack pointer (in CPU) -> Task stack (in Memory)
lnstr.pointer (in CPU) -> Task code (in Memory) - Stack pointer (in CPU) -> Kernel stack (in Memory)
lnstr.pointer (in CPU) -> Kernel code (in Memory) - CPU push all registers to Kernel stack
- Kernel stack pop all registers to CPU
- Stack pointer (in CPU) -> Task stack (in Memory)
lnstr.pointer (in CPU) -> Task code (in Memory)Context Switches in a Nutshell
- Stack pointer (in CPU) -> Task A’s stack (in Memory)
lnstr.pointer (in CPU) -> Task A’s code (in Memory) - Stack pointer (in CPU) -> Task A’s Kernel stack (in Memory)
lnstr.pointer (in CPU) -> Kernel code (in Memory) - CPU push all registers to Task A’s Kernel stack
- Task B’s Kernel stack pop all registers to CPU
- Stack pointer (in CPU) -> Task B’s Kernel stack (in Memory)
lnstr.pointer (in CPU) -> Kernel code (in Memory) - Stack pointer (in CPU) -> Task B’s stack (in Memory)
lnstr.pointer (in CPU) -> Task B’s code (in Memory)System Call and Context Switch Implementations in a Real OS
- HermiTux
- Very small OS for cloud applications
- Originally based on HermitCore (https://github.com/hermitcore/libhermit)
- ~10K lines of C code and a bit of assembly, great to study and learn about OSes
- https://github.com/ssrg-vt/hermitux
- Very small OS for cloud applications
- Context switch & syscall handling can only be done by the kernel
- The only way to enter the kernel is through an interrupt:
- Interrupt traps to the kernel which saves the state of the interrupted task for resuming properly later. Interrupt traps(保存状态)-enter kernel -恢复使用
- Example with sched_yield syscall (syscall + context switch):
- Syscall entry point is in arch/x86/kernel/entry.asm:
1
2
3
4
5
6
7
8isyscall:
cli ; disable interrupts
push rax ; start pushing the task state on the stack
push rcx
; ... push many other registers
mov rdi, rsp ; prepare a datastructure containing register values for use by the kernel
sti ; enable interrupts
call syscall_handler ; jump to kernel (C) code managing the syscall - syscall_handler is in arch/x86/kernel/isrs.c:
1
2
3
4
5
6
7
8
9void syscall_handler(struct state *s) {
switch(s->rax) {
// ... one "case" for each syscall number
case 24: // sched_yield's number is 24
s->rax = sys_sched_yield();
break;
/* ... */
}
} - sys_sched_yield (in kernel/syscalls/sched_yield) simply calls check_scheduling, in kernel/tasks.c:
1
2
3
4
5
6
7
8
9void check_scheduling(void)
{
/* ... */
uint32_t prio = get_highest_priority();
task_t* curr_task = per_core(current_task);
if (prio > curr_task->prio) {
reschedule();
/* ... */
} - Finally, reschedule calls switch_context (in kernel/tasks.c)
- switch_context is in arch/x86/kernel/entry.asm:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18switch_context:
push rax ; ... start pushing the state of the scheduled out task on the stack
push rcx
push rdx
; ...
jmp common_switch
; ...
common_switch:
; ...
call get_current_stack ; get new rsp
mov rsp, rax ; switch stack to the new task's one
; ...
pop r15 ; start restoring the scheduled in task state (reverse order)
pop r14
; ...
add rsp, 16 ; at that point the stack pointer points to the saved instruction pointer
iretq ; restore instruction pointer from the stack, i.e. jumps there - Executes return path in the kernel until we reach the kernel/user boundary
1
2
3
4
5
6
7
8
9isyscall:
; ...
call syscall_handler
; ...
pop r15 ; start restoring the userland state of the task
pop r14
pop r13
; ...
sysret ; return from syscall handler to userspaceMemory Unsafety
- C/C++ are inherently天生的 memory unsafe
- There is absolutely no check at runtime if memory accesses is initialised, allocated, or mapped
- Generally the compiler won’t warn you either
- A few examples of memory errors
- Out of band accesses on buffers/arrays accesses
- Failure to initialise before read stack/heap allocated variables
- Leaks, double free, use-after-free
- Memory errors are hard to detect, hard to debug
- No compiler/runtime warning
- Undefined behaviour: sometimes it crashes, sometimes not…
Example 1: Infoleak
Original code:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16// char *welcome_message = "Hi there! How is it going?\n"; // 27 char
// the it's updated to a shortened welcome message, only 11 chars now
char *welcome_message = "Hi there!\n";
char *password = "secret";
char entered_password[128];
int main(int argc, char **argv) {
// Print welcome message character by character
// but the loop is still 27!!
for(int i=0; i<27; i++)
printf("%c", welcome_message[i]);
printf("Please input the password:\n");
scanf("%s", entered_password);
if(!strcmp(entered_password, password)) { printf("Passowrd ok!\n"); /* ... */ }
else { printf("Wrong password! aborting\n"); }
return 0;
}
- It read overflows welcome message, and password leaks to stdout!
Example 2” Sensitive Data Tampering
1
2
3
4
5
6
7
8
9
10
11
12
13char user_input[32] = "0000000000";
char password[32] = "secret";
int main(int argc, char **argv) {
if(argc != 2) { printf("Usage: %s <password>\n", argv[0]); return 0; }
strcpy(user_input, argv[1]);
if(!strncmp(password, user_input, strlen(password))) {
printf("login success!\n");
/* do important stuff ... */
} else {
printf("wrong password!\n");
}
return 0;
} - If user_input is write overflows, in “strcmp”, strings are equal so password will be accepted
Example 3: Stack Smashing
- Classic control flow hijacking劫持 attack
- Originally proposed in 1996 here: http://www.phrack.org/archives/issues/49/14.txt
- See how the CPU manage function calls/returns
- The machine push return address on the stack and jump to the function
- CPU pops return address from the stack and jumps to the end of calling function line
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19char *password = "super-secret-password";
void security_critical_function() {
printf("launching nukes!!\n"); //发射核武器
}
void preprocess_input(char *string) {
char local_buffer[16];
strcpy(local_buffer, string);
/* work on local buffer ... */
}
int main(int argc, char **argv) {
if (argc != 2) { /* ... */ }
preprocess_input(argv[1]); // This line makes mistake
if(!strncmp(password, argv[1],
strlen(password)))
security_critical_function();
else
printf("Unauthorised user!\n");
return 0;
} - Stack layout memory when preprocess_input runs
- Overflow local_buffer and rewrite the return address
- Then the return -> CPU jumps to security_critical_function()!
Example 4: Use-After-Free
1
2
3
4
5
6
7
8
9
10
11typedef struct {
int member1;
void (*member2)(int);
} my_struct;
void print_hello(int x) {
printf("Hello, parameter: %d\n", x);
}
void security_critical_function() {
printf("Auto destruction engaged!\n");
/* ... */
}1
2
3
4
5
6
7
8
9
10
11
12
13int main(int argc, char **argv) {
/* allocate and init ms */
my_struct *ms = malloc(sizeof(my_struct));
ms->member1 = 42;
ms->member2 = &print_hello;
/* call the function pointer */
ms->member2(12);
free(ms);
char *buffer = malloc(16);
memcpy(buffer, argv[1], 16);
ms->member2(12);
exit(0);
}
- Because I can control what to input in command line, can overwrite the function pointer with the address of security
Good Practices
Writing Good Code
- The compiler won’t catch all mistakes
- The tools have their limitations
- Writing good code from the start is super important
- It will save you a lot of debugging time
- It will save you from introducing serious security issues
- Never use these functions, always unsafe:
- gets (use fgets)
- getwd (use getcwd)
- readdir_r (use readdir)
- More here: https://bit.ly/3ef4TBc
String Manipulation Functions
- Use the n methods,
- strcpy -> strncpy
- sprintf -> snprintf
- Even with the versions with n, some specificities
- Check malloc return value
- After free, the pointer is invalid
- Cannot be dereferenced
- Cannot be used (ex: comparison)
- realloc returns null upon failure but does not free the old pointer
- so this: ptr = realloc(ptr, new_size) is a leak
Quiz review
Quiz 2
- Int is 4 bytes, short is 2 bytes, float is 4 bytes, double is 8 bytes, char is 1 bytes, unsigned long int is 8 bytes.
- The compiler, parsing the sources from top to bottom, wants to have either the full implementation or at least a forward declaration of negate_int before a first call to that function is encoutered.
Quiz 3
- strcmp returns 0 when the strings match
- 11 in base 10 corresponds to 0xB In hexadecimal.
- struct members are placed in order contiguously in memory, the address of the first member of the struct is the same as the address of the struct itself, so the address of the second member is 0x7ffd04d39a48 + 4 bytes (the size of the 1st member) = 0x7ffd04d39a4c
- Pointers are useful to allow a function to modify its calling context and to avoid passing large parameters as copy.
- y is a local variable and it is thus located on the stack. x being a global variable, it is located in static memory.
- We use O_RDWR to indicate read and write mode, O_TRUNC to destroy the file content if it exists, and O_CREAT to create the file if it does not exist
Quiz 4
- For a parent function C1 with
virtual int function
:- it cannot be instantiated
- all classes inheriting from C1 need to provide an implementation for function(
- The vector needs to be declared with the containing type between
< >
. An empty vector can be filled withpush_back()
. The vector constructor also takes an optional parameter that initializes empty elements, allowing them to be updated with[ ]
without prior calls to push_back(). - strcpy should be used rather than = to ensure a copy of the string itself and not a simple copy of the pointer used to refer to the character array
- Dynamic object allocation is made with new, that returns a pointer to an object and not the object itself
-
Library *l = new Library(100);
-
- Overloading corresponds to compile-time polymorphism, while overriding regards runtime polymorphism
- Constructor’s implementation with same name
1
2
3
4
5
6
7
8
9
10
11class C {
private:
int member;
public:
C(int member);
};
C::C(int member) {
this->member = member;
} - Call the parent constructor: following a colon placed right after the child’s constructor parameter
list. _a
cannot be accessed directly in the child’s constructor (or in any of the child member functions) because it is private in Parent.1
2
3Child::Child(int a, int b) : Parent(a) {
_b = b;
}Quiz 5
- By convention, up to 6 arguments for system calls are passed in order in %rdi, %rsi, %rdx, %r10, %r8, and %r9
- By convention, on x86-64 the return value of a system call is placed in %rax
- During the execution of printf, Musl uses
writev
to write to the file descriptor corresponding to the standard output under the hood by the Musl C library - The preprocessor is used for including headers, expanding macros, and enabling/disabling bits of code conditionally.
- The int gets promoted to a double and the result of the addition is then a double, which gets interpreted as an int due to the %d specifier, printing garbage.
1
2
3
4
5
6
7
8
9
int main(int argc, char **argv) {
int i = 12;
double d = 4.6;
printf("%d\n", i + d);
return 0;
} - Access latency rank: registers are the fastest, followed by the cache, followed by main memory, then we have the disk.
- interpreted as an unsigned int (4 bytes), the long long unsigned int (8 bytes) gets its higher 4 bytes truncated截断.
- E.g:
long long unsigned int x = 0x400000000001;
turn to unsigned int will become1
- E.g:
- The addition overflow the 4 bytes reserved for an int, and the variable wraps over to the smallest value that can be stored in an int: -2147483648
- int i = 2147483647 + 1
- 2147483647 is the max positive value that can be stored in an int
- The addition of 1 triggers an overflow of the unsigned int type that is stored on 4 bytes, an as it is unsigned it wraps over to 0.
1
2
3
4
5
6
7
8
int main(int argc, char **argv) {
unsigned int ui = 4294967295; // this is 32 bits full of 1's
printf("%u\n", ui+1);
return 0;
} - Illustrate preprocessor:
- The preprocessor produces source files as output
- The preprocessor performs textual transformations on C sources
- The preprocessor is generally called under the hood by the compiler
- During the program build process, the preprocessor pass is made before the compilation (transformation of sources into machine code) pass
- The preprocessor takes source files as input
- Include guards allow to avoid the issue that the compiler encountering the definition of the struct twice while there should be only a single definition per struct over the entire program.
- C/C++ are fast because they integrated well with assembly, and have no runtime overheads due to things such as the lack of memory access checks or garbage collection. They also give the programmer a high degree of control over the program memory layout, which helps for various things such as fitting as much data/code as possible in the CPU cache.
Quiz 6
- Use-after-free bugs are generally not detected by the compiler, they put the program into undefined behavior and can allow an attacker to subvert its control flow. They do not necessarily lead to a crash at runtime.
- the mechanism(s) used to save the user context of a task during a syscall, and to save the kernel context of a task during a context switch by Pushing the CPU registers’ content on the stack
- Mistakes that corresponds to memory errors
- Reading an uninitialized variable on the stack
- Double free of a pointer
- Reading the content of an uninitialized buffer on the heap
- Use-after-free
- Out-of bound array/buffer access
- Dereferencing a NULL pointer
- System call handling and context switching in an OS are generally implemented
- with a mix of C and assembly
- with memory unsafe languages
1
2
3
4
5
6
7
8
9
10
11
12int f(char *user_buf) {
char local[128];
strcpy(local, user_buf);
printf("you entered: %s\n", user_buf);
/* ... */
}
int main(int argc, char **argv) {
if(argc != 2) { /* ... */ }
f(argv[1]);
/* ... */
}
- The possibility to overflow in write mode a local buffer with strcpy can lead to a stack smashing attack.