Distributed System

Introduction to Distributed System

The Evolution

  • From 1945 to 1985
    • Computers were large and expensive
    • It was not possible to connect computers, so they operated independently from one another.
    • From 1985
      • Advances in technology began to change that situation, including:
        • the development of powerful microprocessors,
        • the invention of high-speed computer networks, and
        • the miniaturization 小型化 of computer systems.
    • The result of these technologies is that it is, now, possible and relatively easy to put together a computing system composed of large numbers of networked computers, large or small.
      • As these computers are generally dispersed分散的, they are said to form a distributed system.

        Examples of Distributed Systems

  • The Web over the Internet
    • Documents, email, media, commerce, etc.
  • Mobile telephony
    • Calls, texts, location, sensing, etc.
  • Electronic funds transfer
    • Banking, credit cards, etc.
  • Instant messaging
  • Video conferencing 会议
  • Home entertaining systems
  • Global positioning systems

    Why Distributed Systems?

  • To share resources:
    • One publisher, many beneficiaries!
  • To bind customers and suppliers
  • To allow us to do things we could not otherwise do, due to
    performance, scalability, reliability and availability issues.
    • Performance issues: e.g.,if using 1 computer,it takes 60 minutes, I’ll use 100 computers and it will take 0.6 minutes!
    • Scalability issues: e.g.,if there’s 10 times more to do in th 0.6
      minutes I have, I’ll use 1,000 computers, then, if it things quieten
      down again, I’ll go back to using 100 computers!
    • Reliability and availability issues: e.g., if 1% of computers fails
      every day, add an extra 1% to keep the 0.6 stable and on!
      In other words:
  • To make continuously-evolving不断发展, remote resources accessible for
    sharing.
  • To open proprietary 所有权 processes to external interaction in order to foster cooperation 促进合作.
  • To achieve better performance/cost ratios.
  • To scale effectively and efficiently if demand需求 for resources changes significantly.
  • To scale through modular, incremental expansion and contraction.
  • To achieve high levels of reliability and availability.

    What is a Distributed System

    “A distributed system is a collection of autonomous 自主 computing elements that appears to its users as a single coherent相关 system”

  • A computing element or a node, can be either a hardware
    device or a software process
    • Note that a node can be anything from a high-performance mainframe computer to a small device in a sensor network. Also, nodes can be interconnected in anyway.
  • In a single coherent system, users (i.e., people or applications) believe they are dealing with a single system.
    • Note that, for this to be possible, the autonomous nodes(each node will have its own notion of time) need to
      collaborate. The way in which this collaboration is established represents the most fundamental property that distinguishes between different distributed systems.

      Characteristics of Distributed System

  • Distributed systems can also vary in:
    • Size(e.g.,from a few to millions of computers).
    • The way in which nodes are interconnected (e.g.,via a wired, wireless or a hybrid混合 (a combination of both) network)
    • The way in which node membership is handled(e.g.,by being open or closed group when allowing new nodes to join).
  • Note that, although nodes can act independently from one another, they cannot ignore one another. Otherwise, there is no point in putting them to compose构成 the same system!
  • Nodes are programmed to achieve common goals, and they do this by exchanging messages with each other.
  • To appear to users as a single coherent system, distribution transparency is an important goal of distributed systems.
  • Computation is concurrent
  • There is no shared state
    • No single global clock that all programs can agree to follow
  • Failures occur may will not be noticeable
  • Communication events have non-negligible不可忽略 duration
    • communication costs may be more siginificant than processing time
  • Components may exchange data at variable rates可变速率.
  • Different components process data at different rates.
    • Asynchrony异步 (i.e., non-aligned 不对齐的 timelines) is unavoidable.

      Fallacies 谬误 about Distributed System

      “Distirbuted systems differ from traditional software because components are dispersed across a network. Not taking this dispersion into account during design time is what makes so many systems needlessly complex and results in flaws that need to be patched later on”.”
      False assumptions that are commonly made when developing a distributed application:

  1. The network is reliable
  • Various way of nodes linking (network topology)
  • DS is composed of different and varied (i.e., heterogeneous) resources (software and hardware)
    • Example of Lack of software reliability: need for reliable message exchange between nodes (e.g., functionality for retrying messages, acknowledging承认 messages, verifying message integrity, etc.)
    • In essence, the result is a network of networks, each managed with local scope by a different group of administrators 每个网络由不同的管理员组在本地范围内管理
    • As a consequence, no (sufficiently足够的 complex to be useful) network of networks is reliable. 如果不足够复杂以至于有用的话,网络就不是可靠的
  1. The network is secure
  • You may need to build security into your applications from Day 1.
  • As a result of security considerations, you might not be able to access networked resources, different user accounts may have different privileges.
  1. The network is homogeneous
  • Interoperability 互操作性 will be needed.
  • The use of standard technologies, such as XML (a W3C recommended general- purpose markup language) or JSON will be necessary.
  1. The topology does not change
  • Network topology is the way in which nodes of a distributed system are linked.
  • In the wild, servers may be added and removed often, clients (laptops, wireless ad hoc networks) are coming and going: the topology is changing constantly.
  • Do not rely on specific endpoints or routes.
  • Abstract from the physical structure of the network, by (using the most obvious example) using DNS names as opposed to IP addresses (which may vary) for referring to an endpoint.
    • DNS (Internet Domain Name System) is a hierarchical and decentralized naming system for computers, services, or other resources connected to the Internet or a private network. It is commonly used to translate more readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols.
  1. Latency is zero
  • The minimum round-trip time between two points on earth is determined by the maximum speed of information transmission: the speed of light.
    • At 300,000 km/sec, it will take at least 30msec to send a ping from Europe to the USA and back.
      • Ping is a computer network administration software utility that measures the round-trip time for messages sent from the originating host to a destination computer that are echoed back to the source.
  1. Bandwidth is infinite
  • Bandwidth is a measure of how much data it is possible to transfer over a period of time (may be measured in bits/second).
  • To avoid congestion拥塞 and increase connection throughput, the loss of data packets being transmitted from one point to another can be performed.数据包丢失的现象可能会随着增加链接吞吐量而发生
    • Throughput is a measure of how much data is successfully transferred from source to destination within a given timeframe. In other words, it is used to measure how many packets arrive at their destinations successfully. It can be measured in bits/second. Throughput测量数据包传递的成功率
    • To avoid packet loss, we may want to use larger packet sizes.
  1. Transport cost is zero
  • Information needs to be serialised (by marshalling编组) to get data onto the wire.
    • Marshalling is the process of transforming the memory representation of an object to a data format suitable for storage or transmission.
    • The cost (in terms of money) for setting and running the network is not zero.
  1. There is one administrator
  • Unless we refer to a small LAN, there will be different administrators associated with the network with different degrees of expertise.

    Distributed System Organization

  • To assist the development of distributed applications, distributed systems are often organized to have a separate layer of software that is logically placed on top of the respective各自的 operating systems of the computers that are part of the system.
  • This separate layer is called middle layer and offers each application the same interface as well as the following:
    • Facilities for inter-application communication.
    • Security services.
    • Support for transaction management.
    • Recovery from failures.
    • Support for Web services composition.
    • Etc.

      Designed for Transparency

      An object can be a resource or a process.
  • Access - Hide differences in data representation and how an object is accessed
  • Location - Hide where an object is located.
  • Relocation - Hide that an object may be moved to another location while in use.
  • Migration - Hide that an object may move to another location.
  • Replication - Hide that an object is replicated.
  • Concurrency - Hide that an object may be shared by several independent users.
  • Failure - Hide the failure and recovery of an object.

Basic Concepts: Synchronization

  • Synchronisation refers to data synchronisation and process synchronisation
  • Data synchronisation: about keeping multiple copies of a dataset in coherence with one another, given that the various copies are located in different nodes.
    • E.g: Copy photos from mobile device to laptop
  • Process synchronisation : about multiple processes needing to act together to achieve a certain overall purpose.
  • Synchornisation requires fast and reliable communication between the process
  • None of the assumptions is safe, synchronisation behaviour is often challenging

    Synchronization is needed specially when…

  • Multiple processes need to agree on the ordering of events, such as whether message m1 from process P was sent before or after message m2 from process Q.
  • Multiple processes try to simultaneously 同时 access a shared resource, such as a printer, and should, instead, cooperate in granting 授予 each other temporary临时的 exclusive独家的 access.

    Synchronization Challenges in Distributed Systems

  • Since nodes in a distributed system are connected via a network, and networks are not always reliable, coordination 协作 of actions that depend on communication over a network is quite challenging.
    • For example, the same message sent to different nodes can take a different time interval to arrive at each node; if sent multiple times to the same node, it may take different times to arrive at the node, or sometimes it may not arrive at all.
  • Sending a synchronisation message from one node to another could be considered, however, because there is no way of knowing exactly how long a message is going to take to arrive at its destination, this does not represent a good solution to this problem.
  • Another alternative选择 could be to timestamp a message as its sent out, but, as there is no way of knowing how long the message took to arrive at its destination, unless the recipient’s clock is in sync with the sender’s clock in the first place, which cannot be not guaranteed, as nodes are independent.
  • In fact, about two nodes with independent clocks, the only thing that is certain is that a message will not arrive at its destination before it is sent.

    Definition

  • a -> b is “event a happens before event b”.
  • If a and b are events in the same process, and a occurs before b, then a -> b is true.
  • If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a -> b is also true.
  • Message transmission takes a finite, nonzero amount of time
  • Happens-before (->) is a transitive relation, so if a -> b and b -> c, then a -> c.
  • If two events, x and y, happen in different processes that do not exchange messages (not even indirectly via third parties), then x -> y is not true, but neither is y -> x. These events are said to be concurrent

    Logical Clocks

  • Logical clocks take advantage of the fact that an implicit partial ordering of events can be obtained from the simple sending and receiving of messages between processes in a distributed system. As such, they do not measure “real time” but, instead, provide a distributed incremental pseudo time to events in a distributed system.
  • A simple example is Lamport’s logical clock. It assumes that each processor i has a Logical Clock, LCi.
    1. When an event occurs on processor i, LCi is incremented by one.
    2. When processor X sends a message to Y, it also sends its Logical Clock, LCx.
    3. When Y receives the message, if its local Logical Clock is already in advance of the clock it has just received plus one timestep, it keeps its current Logical Clock, otherwise it sets its local Logical Clock to be the one it has just received plus one timestep, i.e.:if LCy < (LCx + 1): LCy = LCx + 1
  • Notes on Lamport’s Logical Clock
    1. the receipt of a message forces the recipient to move its clock forward so that the “happened after” relationship is preserved 保存 at that point.
    2. if a -> b (a happens before b) then it is true that LCa < LCb. But it is not necessarily true that just because LCa < LCb then a -> b. This means that we cannot infer 推断 a causal 因果 relationship just by looking at timestamps.
    3. by using logical clocks, we can obtain a basic partial ordering of events, i.e., we can tell that one event happened after another one, but we do not obtain a perfectly synchronised global time.

Example 2: Stopping Events from Happening Simultaneously When Sharing Resources

  • Assume that there are multiple distributed components collaborating to achieve one goal.
  • For example: the selling of a game ticket to a client. The involved components include a database of available tickets, at least two bank accounts (one for the ticket vendor, one for the person buying the ticket), and possibly other systems for validating addresses, and arranging for the ticket to be delivered electronically.
  • These systems need to choreographed to act as one for the purposes of selling a ticket, to avoid situations such as the following: (1) one ticket is selected by a client, which happens to be the last ticket. But, just before the client is able to pay for it, the ticket is purchased by another client who had also selected it; or (2) when paying for the ticket, the client confirms the payment transaction, and so the money leaves his account, but a system crash prevents the money from arriving at the seller’s account.
  • Possible solutions:
    • Centralised Lock Server (and Mutual Exclusion Locks (mutex))
    • Two Phase Commit Algorithm

      Centralised Lock Server and Mutual Exclusion Locks (Mutex)

  • Client:
  1. Sends a request to the lock server for a mutex on a given resource.
  2. When a reply comes back, it starts executing its critical process over the resource.
  3. When its finished, it sends a message to release the mutex.
  • The Lock Server:
  1. If the resource is available, it marks it as being used by the client and sends a reply to say that the lock has been granted. Otherwise, it puts the request in a queue.
  2. When the mutex is released by the client, if another client wants the resource (i.e., is in the queue waiting), pass the mutex to them, otherwise mark it as being available.
  • This solution presents a basic limitation: a single point of failure (i.e. the central lock server).
  • Although the solution is able to protect sequences of events that need to be treated as ‘atomic’ (indivisible) operations, it cannot make sure that, if any part of the sequence of event fails, the state of the system remains unchanged.

    Two Phase Commit Algorithm

  • The Two Phase Commit Algorithm ensures that a sequence of events either runs to successful completion, or, if the sequence fails, that the overall system is returned to its original state as though nothing had happened. In other words, it allows intermediate steps that have occurred to be undone (i.e., rolledback) to return things back to a safe, sensible state, if a fault is detected.
  • A coordinator node requests a transaction, and sends a request to all participants nodes
    • • e.g., to node C1, it sends ‘request to remove X pounds from account’, and to C2 sends ‘request to add X pounds to account’.
  • All participants respond as to whether they are willing and able to execute the request, and send VOTE_COMMIT or VOTE_ABORT.
    • They log their current state, and then perform the transaction
    • All participants log their vote
  • The coordinator looks at the votes. If everyone has voted to commit, then the co-ordinator sends a GLOBAL_COMMIT to everyone; otherwise it sends a GLOBAL_ABORT.
  • On receiving the decision from the coordinator, all participants record the decision locally. If it was an ABORT, participants ROLL BACK to their previous safe state.

    Electing a Coordinator Node

  • The Bully Algorithm is a mechanism for choosing a coordinator from a set of candidate nodes.
    • The algorithm gets its name from the fact that higher numbered nodes ‘bully’ lower numbered nodes into submission.
  • Note 1: the algorithm relies on the use of timeouts to decide when to ‘give up’ waiting for responses from nodes that have potentially died (so the usual problem of ‘how long is it reasonable to wait’ applies here).
  • Note 2: the algorithm assumes that the participating nodes are ordered.
  • P sends an ELECTION message to all nodes with higher numbers.
  • If no one responds, P wins the election and becomes the co-ordinator. It informs all the other nodes that it is now the coordinator.
  • If one of the higher-numbered nodes Q answers, P concedes that it is not the winner, and Q begins the election process again until one node eventually wins.

    Clock Synchronization

  • In distributed systems, clock synchronisation is not completely possible by the fact that messages are not sent instantaneously over real networks, and that there usually is some degree of variation in the time messages take to arrive at their destination.
  • As long some amount of error is acceptable, there are at least two widely acceptable ways for getting clocks in different parts of a system into near-synchronisation.
    • Cristian’s Algorithm
    • The Berkeley Algorithm

      Cristian’s Algorithm

  • Cristian’s algorithm works between a process P, and a time server S connected to a source of UTC (Coordinated Universal Time).
  • It relies on the accuracy of an estimate based on the Round Trip Time (RTT), which is the elapsed time between the time when a request is sent from P to S, and the time when a response is received by P from S.
  • P requests the time from S.
  • After receiving the request from P,S prepares a response and appends the time T from its own clock. The response is sent to P.
  • P then sets its time to be T+RTT/2, where RTT is Round Trip Time of the request P made to S.
    • This method assumes that the RTT is split equally between both request and response, which may not always be the case but is a reasonable assumption on a LAN connection.
  • But what happens if the RTT estimate is not accurate, i.e., it is different when the update is sent to that which was measured?
    • Clocks tend to drift (incorrect)
  • And what happens if send and receive do not take the same amount of time, which affects RTT/2?
  • These issues are dealt with by a similar but more complex algorithm, The Berkeley Algorithm.

    The Berkeley Algorithm

  1. A master is chosen via an election process such as the Bully Algorithm.
  2. The master polls the slaves who reply with their time in a similar way to Cristian’s algorithm.
  3. The master observes the round-trip time (RTT) of the messages and estimates the time of each slave and its own.
  4. The master then averages the clock times, ignoring any values it receives far outside the values of the others.
  5. Instead of sending the updated current time back to the other process, the master then sends out the amount (positive or negative) that each slave must adjust its clock. This avoids further uncertainty due to RTT at the slave processes.
  • Unlike Cristian’s algorithm, the server process in the Berkeley algorithm, called the master, periodically polls other slave processes.
  • The clock synchronisation method used in this algorithm, the average, cancels out individual clock’s tendencies to drift.

    Deadlock

  • Deadlock occurs when four particular conditions hold simultaneously in a system:
  1. Mutual exclusion: at least one resource must be non-shareable, i.e. only one process can use that resource at a given time.
  2. Hold and wait: a process is currently holding at least one resource, and is requesting at least one more resource that is currently being held by another process (i.e., it is waiting for a process to release something).
  3. No pre-emption: once a process has acquired a resource, nothing (e.g., the operating system or some external factor) can force it to relinquish that resource; it has to do so voluntarily.
  4. Circular wait: a process must be waiting for a resource that is being held by another process, which in turn and possibly indirectly, is waiting for the first process to release a resource.
    • Put another way ,processes are waiting for one another in such a way that there is a cycle of dependencies between them. For example, if A waits for B, B waits for C, and C waits for A, we have a cycle of waiting that would fulfil this condition.

      Example

  • Consider a situation where two processes are running at the same time; one of them is using a printer, and the other, a scanner. It comes a point in time when:
    • Both processes can no longer continue to do their work until they can get access to the device that is being used by the other process.
    • The processes will not relinquish control of the device they are currently using until they get it.
  • The result is: now both processes are in a deadlock state—neither can continue because they are waiting for the other.

    Dealing with a Deadlock

  • There are three potentially sensible strategies for dealing with a deadlock:
    • Prevent deadlock from occurring in the first place by making sure that it is never possible for all four of the conditions to be true at the same time (you might like to think about what this means in each of the four cases).
    • Avoid deadlock by making sure that, even though it is potentially possible for all four conditions to hold at the same time, that they never do.
    • Detect a deadlock and deal with the consequences (this last approach is very similar to ‘avoid’ really, in that it requires having some mechanism for detecting whether all four conditions are about to hold (for avoid) or actually do hold (for detect)).
    • Whether you decide to go for a detect or avoid approach really depends on how likely or frequently a deadlock is likely to happen—if some property of your system means that deadlock is possible, but hugely unlikely, it might make sense to allow it to occur once in a while and to then deal with the consequences, rather than to incur the overhead of trying to avoid it from happening in the first place.

Introduction to Basic Concepts: Naming Schemes

  • In distributed systems, names are used to uniquely identify entities, to refer to locations, and more.
  • An important issue with naming is that a name can be resolved to the entity it refers to; name resolution thus refers to the means by which a process is allowed to access a named entity, which is supported by a naming system.
  • The implementation of a naming system is itself often distributed across multiple machines.

    Example: Internet Naming

  • As in any distributed system, every computer connected to the Internet needs to be “addressable”, so that other computers on the net are able to “talk” to each other. Naming entities is the addressing mechanism via which a computer on the Internet is uniquely identified.
    • The Internet is the biggest distributed system of all, being a huge network of networks.
  • Possible approaches to addressing mechanisms include:
    • Centralised
    • Free-for-all
    • By delegating (give) naming responsibilities

      Centralised Naming Approach

  • Centralised naming is the most obvious approach to guaranteeing that any name is handed out once and only once. In this approach, there is a single point of contact, that either validates that a name is unique, or alternatively makes up a unique name and hands that out on demand.
  • Its main limitation is that the single point of contact has to deal with every request, and, as a consequence, it is not a very scalable solution, and it creates a single point of failure.

    Free-for-All Naming Approach

  • Free-for-all allows any object that wants a name to make up its own name.
  • Although this is a massively ‘distributed’ solution which avoids a single point of failure, it does not guarantee uniqueness.

    The ‘Delegating Naming Responsibilities’ Approach

  • In this approach, the authority to allocate names is delegated to smaller parts of the system, and governed by some rules.
  • This approach better balances the conflicting issues associated with single points of failure and scalability, but it raises questions as to what rules are appropriate for each system.
    • E.g., a rule could state that every device in a particular organisation has to have a name that includes the name of the organisation; another rule could state that every device in a particular country has to have a name that includes the name of the country; etc.

      MAC Addresses

  • The Media Access Control (MAC) address is a unique identifier given to each network device in a system: this means that every ethernet or wifi card in a computer has one MAC address.
  • Note that, since most computers have several network devices, there are more MAC addresses than there are computers. A MAC address is a 48 bit number, so that means there are 281,474,976,710,656 different numbers, or enough for every person living on Earth to have lots of different network devices.
  • A MAC consists of two main parts: the ‘Organisationally Unique Identifier’ (OUI) and the ’Network Interface Controller’ (NIC).
  • A MAC address does not tell you where a device is on a network.
  • The OUI is a 24 bit number that is purchased from the Institute of Electrical and Electronics Engineers (IEEE), which acts as a central authority from which vendors (selling provider) of hardware can purchase unique identifiers. Once a vendor has an OUI to use in the MAC addresses for its hardware, as long as that particular vendor makes sure that the NIC part of the address is unique, then the combination of the OUI and the NIC will be unique.

    IP Addresses

  • An Internet Protocol Address (IP Address) serves two purposes: it is a ‘unique’ identifier and also contains some information about ‘where’ a device is on a network.
  • Most IP addresses are 32 bit numbers, but are most often written as four 8 bit numbers separated by dots (e.g. 130.88.192.9).
  • The top-level authority for IP addresses is the Internet Assigned Numbers Authority (IANA), which delegates ranges of the address space to five Regional Internet Registries, which in turn delegate sub ranges of their space to Internet service providers.
  • Note that, unlike MAC addresses, the delegation of IP addresses takes place initially to geographical regions (rather than ‘hardware vendors’),
  • And, for this reason, IP addresses can tell you some information about the location of a device on a network.
  • There are not enough IP addresses for each person on Earth, and so, IPv6W, the latest generation of IP address, uses 128 bit numbers to overcome this problem.

    Domain Names

  • Domain Names were created because humans find IP addresses hard to read.
  • The Domain Name System (DNS) is itself a Distributed System built on top of the Internet, used to create associations between human-readable names and IP addresses.
    • For example, www.bbc.co.uk, will be “translated into” an IP address that you can use to communicate with the appropriate computer.
  • The ‘delegation’ model used by the DNS is complex, as it has aspects of geographical delegation
    • For example, domain names ending in ‘.co.uk’ are for UK-based companies.
  • But it also separates matters out in other ways as well.
    • For example, domains ending in ’.com’ and ’.org’ have no geographical implications, but refer to companies and organisations.
  • Unlike MAC and IP addresses, DNS cannot allocate ‘batches’ of names up- front, and needs to respond in real-time to requests to translate names into IP addresses.
    • It achieves this by being in itself a Distributed System, consisting of a hierarchy of servers with the most authoritative server at the ‘top’ of the hierarchy, dealing with requests from users.

Introduction to Basic Concepts: Protocols

  • Protocols define sets of rules governing how two or more objects should interact with one another. Protocols serve as specifications rather than implementations of a piece of technology.
  • An example of an relevant protocol is the HyperText Transport Protocol (HTTP), which provides an specification (i.e., a vocabulary) that allows client applications to request resources from Web servers, and Web servers to respond to these requests.
  • Examples of “verbs” used by client applications and Web servers when communicating via the HTTP protocol over the Internet are shown below.
    • HEAD - Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.
    • GET - Requests a representation of the specified resource. Requests using GET should only retrieve data and should have no other effect.
    • POST - ubmits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource or the updates of existing resources or both.
    • OPTIONS - Returns a list of the commands supported by this particular server.
    • DELETE - It is used to delete a resource. It may return the a representation of the removed resource.

      Statelessness

  • By following the HTTP protocol, Web servers are said to be stateless, meaning that once a request from a client application is fulfilled, the Web server disconnects from the client and “forgets” that the client ever connected.
  • The stateless nature of the communication between client and server allows the system to treat each request for content as an independent transaction that can be completed.
  • However, there are ways in which state information can be preserve. These are outside of the scope of the protocol, but encoded in applications and servers that use it.

    Email

  • Electronic mail (email) seems like a simpler system than ‘the Web’, in many ways it is more complex.
  • The expectation is that an email has to be in exactly one place at any one time.
  • If it is in two places at the same time then it has been duplicated accidentally.
  • If it is in zero places then it has been lost ‘in the system’, in detriment(damage) of the sender and the recipient of that email.

    Email Associated Protocols

    The Simple Mail Transport Protocol (SMTP)
  • Like HTTP, SMTP is a text-based protocol. But unlike HTTP, it is ‘connection based’, meaning that a client (in this case, the Mail User Agent) can issue multiple consecutive (following) comments to the SMTP server, and should explicitly terminate its connection when its finished.
  • At the end of this exchange, Bob’s email has successfully been moved from his mail client, to his ‘outgoing mail server’.

Sequential vs. Multi Processing, Concurrent, Parallel and Distributed Computing

  • A machine makes available computing and storage resources.
  • By computing resource we mean, e.g., a processor, such as a CPU.
  • By storage resource we mean, e.g., a given amount of primary memory.
  • A process is an executing instance of a program.
  • Resource usage is typically controlled by an operating system (OS).
  • Since resources are scarce (rare) and differ in their capabilities, an OS aims to make the most efficient use possible of those resources.
  • The OS assigns a unique identity to each process and then controls how a process is granted access to computing resources.
  • The OS also controls how a process is granted access to storage resources by assigning an address space to that process.
  • When the OS ensures that each process P has a single address space A that is exclusive (only) to P , we are allowed a sequential reading of the steps that comprise the process.

    Sequential Processing

  • If foo and bar take long to run, we may wish to run them concurrently, in different processes, and perhaps even better, in parallel, in different processors.
  • If foo and bar are proprietary services held in remote machines, we may not be able to hold local copies of them.
  • Sequential, isolated processing is simple , but bounded and limiting.
  • Non sequential, non isolated processing expands the bounds and limits with respect to performance.
  • It is possible to switch between a process P and a process P’ if, for example, P is idle (free) waiting for something (like I/O) to complete; also, we get more responsiveness, as:
    • While printing a long document, your machine still allows you to go on doing other things.
    • While downloading a file, a web browser still allows you to traverse (iterate) a link.
    • While you ponder(think) what to do next, the same machine goes about attending to someone else.

      Multi-Processing vs. Multi-Tasking

  • Two different concepts:
  • An OS multi tasks by:
    • allowing more than one process to be underway by controlling how each one makes use of the resources allocated to it
    • implementing a scheduling policy, which grants each active process a time slice during which it can access the resources allocated to it.
  • Taking it literally, in multi-tasking, processes are not really executing concurrently.
    • Concurrent execution is only apparent.
  • The appearance of concurrent execution stems from an effective scheduling policy.
  • If all processes get a fair share of the resource and they get it sufficiently often, it seems to users that all processes are executing concurrently.
  • For example, while a process P is waiting on a slow output device, the OS may schedule another process P’ to make use of the CPU. It seems to users that the machine is both printing for P and running P’.

    Multi-Processing by Forking

  • When a process forks (using an OS call) it causes two copies of itself to be active concurrently.
  • The child process is given a copy of the parent process’s address space. The address spaces however are distinct. And so, if either process modifies a variable in its address space, this change is not visible to the other process.
    • The child process starts executing after the OS call.
    • The parent can continue or wait for the child to execute.
    • Ultimately (finally) the parent must mean to find out how and when the child completes execution.
  • Forking is quite common in the client-server type of distributed computing, as a server typically forks a child process for each request it receives
  • Because of the copying , forking can be expensive.
  • In practice, modern OSs have strategies that make the actual cost quite affordable.
  • Forking is reasonably safe because the address spaces are distinct
    • Discipline in adhering to best practice is nonetheless required (e.g., to avoid zombie processes, to avoid unintended sharing of references to files, etc.)

      Multi-Processing by Threading

  • Forking imposes (increase) a certain degree of isolation. And so, if parent and child need to interact and share, threading may be a better approach to multi tasking.
  • With threading, the address space is not copied, it is shared.
    • This means that if one process changes a variable, all other processes see it.
    • This makes threading less expensive, but also less safe than forking.

      Concurrent Computing

  • Consider many application processes.
  • Processes are often threads
  • The OS schedules the execution of n copies of a process Pi, 1 =< i=< n, to run in the same processor, typically sharing a single address space.

    Parallel Computing

  • There are now many processors bound by an interconnect (e.g., a bus across processors).
  • There is truly many processes running at the same time, not just multi threading, but true parallelism.
  • The n copies of a process Pi, 1 =<i =< n, can, each, run in one of m, 1
    =<j =< m, different processors Cj, possibly (but not necessarily)
    sharing a single address space A.

    Distributed Computing

  • There are many independent, self-sufficient, autonomous, heterogeneous machines.
  • We now have spatial (in space) separation.
  • Message exchange is needed, network effects are felt.
  • Complexity may reach a point in which applications are not written against OS services. Instead, they are written against a middleware API. The middleware then takes some of the complexity upon itself.
  • The n (not necessarily identical) processes Pi, 1 =< i =< n, each run in one of m, 1 =< j =< m, different machines Mj, that cannot share a single address space A (and therefore must communicate).

Architectures of Distributed Systems

  • An obvious way to distinguish between distributed systems is on the organisation of their software components and how they interact. In other words, their software architecture.
  • Centralised architectures, e.g., traditional client-server, where a single server implements most of the software components (and thus functionality), while remote clients can access that server using simple communication means.
  • Decentralised architectures, e.g., peer-to-peer architectures in which all nodes more or less play equal roles.
  • Hybrid architectures, i.e., combining elements from centralized and decentralized architectures.

    Software Architectural Styles

  • A software architectural style is formulated in terms of components, the way that components are connected to each other, the data exchanged between components, and finally how these elements are jointly configured into a system.
    • A component is a modular unit with well-defined required and provided interfaces that is replaceable within its environment.
    • A connector is a mechanism that mediates communication, coordination, or cooperation among components . In other words, a connector allows for the flow of control and data between components .
  • A variety of architectural styles result from the creation of systems configurations using components and connectors, including:
    • Layered architectural styles
      • In this architectural style, components are organized in a layered fashion where a component at layer Lj can make a downcall to a component at a lower-level layer Li (with i < j) and generally expects a response.
      • The layers on the bottom provide a service to the layers on the top. The request flows from top to bottom, whereas the response is sent from bottom to top. In this approach, the calls always follow a predefined path.
    • Object-based architectural styles
      • In this architectural style, each object corresponds to a component, and these components are connected through a procedure call mechanism, that can take place over a network, if the calling object is not on the same machine as the called object.
      • Object-based architectures provide a natural way of encapsulating data and the operations that can be performed on that data into a single entity.
      • And so, communication between objects happen as method invocations. These are generally called Remote Procedure Calls (RPC).
    • Resource-centered architectural styles
      • In this architectural style, a distributed system is viewed as a huge collection of resources that are individually managed by components. Resources may be added or removed by (remote) applications, and likewise can be retrieved or modified.
      • It is based on a data center, where the primary communication happens via a central data repository.
      • This approach has been widely adopted for the Web.
    • Event-based architectural styles
      • In this architectural style, processes running on the various components are both referentially decoupled and temporally coupled. In other words, one process does not explicitly know any other process, but for coordination to take place processes need to be running at the same time.
      • In such scenarios, the only thing a process can do is publish a notification describing the occurrence of an event).
      • Assuming that notifications come in all sorts and kinds, processes may subscribe to a specific kind of notification.
      • Component send event notification to Event bus, and the bus deliver to other component

        Centralised System Architectures: Examples

        simple Client-Server Architectures
  • Processes in a distributed system are divided into two main groups: Clients and Servers.
  • A server is a process implementing a specific service, for example, a file system service or a database service.
  • A client is a process that requests a service from a server by sending it a request and, subsequently, waiting for the server’s reply.
  • The client-server interaction is known as request-reply behaviour.

    Multi-Tiered Client-Server Architectures

  • Typically presents three logical tiers (layer).
  • The distinction (difference) into three logical tiers suggests a number of possibilities for physically distributing a client-server application across several machines. However, the simplest organization is to have only two types of machines:
    • A client machine containing only (part of) the user-interface level.
    • A server machine containing the rest, i.e., the programs implementing the processing and data management functionalities.
  • In the most typical organization, all functionality is handled by the server, while the client is essentially no more than a dumb terminal.
  • Many distributed applications are divided into the three layers.
    • User interface layer,
    • Processing layer,
    • Data layer.
  • Thus, the main challenge to clients and servers is to distribute these layers across different machines.
  • A server may sometimes need to act as a client, as shown, typically leading to a physically three-tiered architecture.
  • An example of use of this architecture is in the organisation of Web sites.

  • Multi-tiered client-server architectures are a consequence of dividing distributed applications into a user interface, processing, and data-management components, where the different tiers correspond directly with the logical organization of applications. This type of distribution is called vertical distribution, achieved by placing logically different components on different machines.

    Decentralised System Architectures: Peer-to-Peer Systems

  • In decentralised architectures, there is a greater concern about distributing client and server functionality more evenly across machines to achieve better workload balance.
  • As a consequence, a client (or a server) may be physically split up into a number of logical parts, with each part operating on “its own share of the complete data set”. This is a horizontal distribution of functionality.
  • A class of modern system architectures that support this horizontal distribution is known as peer-to-peer systems.
  • From a high-level perspective, the processes that constitute a peer-to-peer system are all equal.
  • Much of the interaction between processes is symmetric: each process will act as a client and a server.
  • Due to the symmetric behaviour of processes in peer-to-peer architectures, processes are organised in an overlay network;
    • i.e., its nodes are formed by the processes and its links represent the possible communication channels (TCP connections).
      • The Transmission Control Protocol (TCP) is one of the main Internet Protocols.
    • Thus node may not be able to communicate directly with an arbitrary other node, but is required to send messages through the available communication channels.
  • Two types of overlay networks exist, characterising peer-to-peer systems as:
    • Structured
    • Unstructured

      Structured Peer-to-Peer Systems

  • In structured P-2-P systems, nodes are organized in an overlay that adheres (persist) to a specific, deterministic topology: a ring, a binary tree, a grid, etc.
  • This topology is used to efficiently look up data that is maintained by the system,
    • i.e., each data item is uniquely associated with a key, typically obtained by a hash function on the data item’s value. This key is used as an index, since it identifies a node in the system.
    • key(data item) = hash(data item’s value)
  • The topology of a structured peer-to-peer system plays a crucial role: any node can be asked to look up a given key, i.e., to efficiently route a request for data to the node responsible for storing the data associated with the given key.

    Structured P2P System example

  • A peer-to-peer system with a fixed number of nodes, organised into a hypercube.
  • Each data item is associated with one of the 16 nodes of the hypercube - by hashing the value of a data item to a key k ε {0, 1, 2, . . .,24 –1}.

    Unsrtuctured Peer-to-Peer Systems

  • In an unstructured peer-to-peer system each node maintains an ad hoc list of neighbours, such that the resulting overlay resembles a random graph: a graph in which an edge <u, v> between two nodes u and v exists only with a certain probability P[<u, v>].
  • When a node joins in, it often contacts a well-known node to obtain a starting list of other peers in the system. This list can then be used to find more peers, and perhaps ignore others, and so on. In practice, a node generally changes its local list almost continuously.
  • Unlike structured P-2-P systems, looking up data cannot follow a predetermined route, because lists of neighbours are constructed in an ad hoc fashion. Instead, searching for data is necessary.

    Example of Searching Methods

    Flooding

  • Assume an issuing node, u, passes a request for a data item to all its neighbours.
  • Each of its neighbours, v:
    • Ignores the request when it has seen it before, otherwise, searches locally for the requested data item.
    • If it has the required data, it can either respond directly to the issuing node u, or send it back to the original forwarder, who will then return it to its original forwarder, and so on.
    • If it does not have the requested data, it forwards the request to all of its own neighbours.
  • Obviously, flooding is expensive, for which reason a request often has an associated time-to-live or TTL value, giving the maximum number of hops a request is allowed to be forwarded.

    Random Walks

  • An issuing node, u, tries to find a data item by asking a randomly chosen neighbour, v.
  • If v does not have the data, it forwards the request to one of its randomly chosen neighbours, and so on.
  • Generally, a random walk imposes less network traffic than Flooding, but it may take longer before a node is reached that has the requested data. To decrease the waiting time, an issuer can simply start n random walks simultaneously.
  • A random walk also needs to be stopped. To this end, a TTL can be used, or alternatively, when a node receives a lookup request, it can check with the issuer whether forwarding the request to another randomly selected neighbour is still needed.
  • Notably in unstructured P-2-P systems, locating relevant data items can become problematic as the network grows, causing a scalability problem.

    Making Data Search more Scalable in Unstructured P-2-P Systems

  • To improve scalability of data search, P-2-P systems can make use of special nodes that maintain an index of data items, abandoning their symmetric nature by creating “special” collaborations among nodes.
  • For example, in a collaborative content delivery network (CDN), nodes may offer storage for hosting copies of Web documents, allowing Web clients to access pages nearby, and thus to access them quickly.

    Hybrid Architectures

  • Encompass classes of distributed systems in which client-server solutions are combined with decentralized architectures. Examples:
    • Edge-server systems
    • Collaborative distributed systems
  • One of the main motivations for these hybrid architectures, is the scalability problems of unstructured peer-to-peer systems, and the difficulties in workload balancing in traditional client-server architectures.

    Collaborative Distributed Systems

  • Combine traditional client-server structures (when nodes are joining the system) and fully decentralised structures (once a node has joined the system). An example of such a system is BitTorrent, a peer-to-peer file downloading system.
  • In BitTorrent, an end user, looking for a file, downloads chunks of the file from other users, until the downloaded chunks can be assembled together, yielding the complete file.
    • To download a file, a user needs to access a global directory containing references to torrent files.
    • A torrent file contains the information that is needed to download a specific file, such as a link to a file tracker, a server that keeps an accurate account of active nodes that have (chunks of) the requested file.
      • There will be many different trackers, although there will generally be only a single tracker per file (or collection of files).
    • Once the nodes have been identified from where chunks can be downloaded, the downloading node effectively becomes active.

      Edge-Server Systems

  • Are characterised by the following main properties:
    • Are deployed on the Internet
    • Their servers are placed “at the edge” of the network (i.e., the boundary between enterprise networks and the actual Internet).
  • The edge server’s main purpose is to serve content, possibly after applying filtering and transcoding functions.
    • For a specific organization, one edge server acts as an origin server from which all content originates. That server can use other edge servers for replicating high-demand content, e.g., Web pages.
  • Edge-server systems have recently been used to assist data centers in cloud computations and storage, leading to distributed cloud systems. In the case of fog computing, even end-user devices form part of the system and are (partly) controlled by a cloud-service provider.

Inter-Process Communication

  • Inter-process communication encompasses (include) the ways that processes on different machines can exchange information.
  • Traditional inter-process communication has always been based on low-level message passing, as offered by the underlying network; thus it is harder to realise than communication based on shared memory, as available for non-distributed platforms.
  • As modern distributed systems often consist of thousands or even millions of processes scattered (dispersed) across a network with unreliable communication such as the Internet, development of large-scale distributed applications is very difficult.
  • Two widely-used models for communication are:
    • Remote Procedure Call (RPC)
      • Communication transparency cannot be achieved with traditional inter-process communication, as low-level operations (e.g., send and receive) do not conceal (hide) communication.
      • Based on the idea that programs can call procedures located on other machines, an RPC aims at hiding most of the intricacies of message passing, and is ideal for client-server applications.
      • For example: when a process on machine A calls a procedure on machine B, the calling process on A is suspended (stopped), and execution of the called procedure takes place on B. Information can then be transported from the caller to the callee in the parameters and can come back in the procedure result. This way, no message passing is visible to the programmer.
      • A more detailed example
        • A program has access to a database that allows it to append data to a stored list, after which it returns a reference to the modified list. The operation is made available to a program by means of a routine append: newlist = append(data, dbList)
        • When append is a remote procedure, a different version of append (a.k.a. client stub) is offered to the calling client. The client stub packs the parameters into a message and requests that message to be sent to the server, by calling send. Following the call to send, the client stub calls receive, blocking itself until the reply comes back.
        • When the message arrives at the server, the server’s OS passes it to a server stub. The server stub unpacks the parameters from the message and then calls the server procedure in the usual way. The server performs its work and then returns the result to the caller (the server stub).
          • A server stub is the server-side equivalent of a client stub: it is a piece of code that transforms requests coming in over the network into local procedure calls.
      • In summary:
      1. The client procedure calls the client stub in the normal way.
      2. The client stub builds a message and calls the local operating system.
      3. The client’s OS sends the message to the remote OS.
      4. The remote OS gives the message to the server stub.
      5. The server stub unpacks the parameter(s) and calls the server.
      6. The server does the work and returns the result to the stub.
      7. The server stub packs the result in a message and calls its local OS.
      8. The server’s OS sends the message to the client’s OS.
      9. The client’s OS gives the message to the client stub.
      10. The stub unpacks the result and returns it to the client.
      • Parameter Passing in RPCs
        • The function of the client stub is to take its parameters, pack them into a message, and send them to the server stub.
        • Packing parameters into a message is called Parameter Marshalling (Not straightforward).
        • The server just sees a series of bytes coming, which constitute (compose) the original message sent by the client, i.e., no additional information on how those bytes should be interpreted is provided with the message.
        • The meta-information can be recognized as original message by transforming data into a machine-and network-independent format, making sure that both communicating parties expect the same message data type to be transmitted.
          • The latter can typically be solved at the level of programming languages.
          • The former is accomplished (finish) by using machine-dependent routines that transform data to and from machine- and network-independent formats.
        • Marshalling is all about this transformation to neutral (fair) formats and forms, an essential part of remote procedure calls.
      • Passing References to Objects
        • Pointers or references passed by copying the entire data structure to which the parameter is referring, effectively replacing the copy-by-reference mechanism by copy-by-value/restore.
      • Asynchronous RPCs
        • To support situations in which there is simply no result to return to the client, RPC systems may provide facilities for what are called asynchronous RPCs.
        • With asynchronous RPCs, the server immediately sends a reply back to the client the moment the RPC request is received, after which it locally calls the requested procedure. The reply acts as an acknowledgment to the client that the server is going to process the RPC. The client will continue without further blocking as soon as it has received the server’s acknowledgment.
        • The feedback is only means the remote procedure has received the client’s call!
    • Message-Oriented Middleware (MOM)
      • MOM relies on message-queuing mechanisms for providing extensive support for persistent asynchronous communication. As such, message-queuing systems offer intermediate(middle)-term storage capacity for messages, without requiring either the sender or receiver to be active during message transmission.
      • Message-queuing systems are typically targeted to support message transfers that are allowed to take minutes instead of seconds or milliseconds.
      • The sender posts a message in the queue, and the receiver retrieves the message from the queue.
      • Applications communicate by inserting messages in specific queues which are then forwarded until eventually delivered to the destination.
      • A sender is generally given only the guarantees that its message will eventually be inserted in the recipient’s queue.
      • Message-queuing systems are often used to assist the integration of dispersed collections of databases as well as in publish-subscribe systems.
      • “Message queues enable asynchronous communication, which means that the endpoints that are producing and consuming messages interact with the queue, not each other. Producers can add requests to the queue without waiting for them to be processed. Consumers process messages only when they are available. No component in the system is ever stalled (stopped) waiting for another, optimizing data flow.”

Replication of Data in Distributed Systems

Data Replication

  • Replication is used to
    • Enhance system reliability
    • Improve performance.
    • Potentially improve system scalability
  • Major challenge: keeping data replicas consistent

    Increasing Reliability through Replication

    Motivation:
  • If a file system has been replicated, it may be possible to continue working with it after one replica crashes, by simply switching to one of the other replicas.
  • By maintaining multiple replicas of the same data, it becomes possible to provide better protection against corrupted 损坏的 data.

    Improving Performance through Replication

    Motivation:
  • A distributed system needs to scale in terms of size
    • E.g., when an increasing number of processes needs to access data that are managed by a single server. 当越来越多的进程需要访问单独的服务器的数据时
  • A distributed system needs to scale in terms of the geographical area it covers
    • E.g., by placing a copy of data in proximity of the process using them,the time to access the data decreases. As a consequence, performance, as perceived by that process, increases.
    • 通过将数据副本放置在使用它们的进程附近,访问数据的时间会减少。 因此,该过程所感知的性能会提高。
    • Note that, although a client process may perceive better performance, it may also be the case that more network bandwidth is consumed to keep all replicas up to date

      The Price of Replication

      The problem with replication is that having multiple copies may lead to consistency problems.

  • Whenever a copy is modified, that copy becomes different from the other copies.
  • Consequently, modifications have to be carried out on all copies to ensure consistency.
  • Exactly when and how those modifications need to be carried out determines the price of replication.
    Example - Trying to improve access times to Web pages:
  • To imporve performance, Web browsers often cache a Web page
    • Pros: excellent access time from a user viewpoint
    • Cons: requires refetching for latest version of a page
  • Solution 1: Always fetch pages from the server
    • Poor access time if there is no local copy
  • Solution (2): Allow Web server to invalidate or update each cached copy
    • Degrade 降低 the overall performance

      Replication for Performance

  • Local data replication helps to reduce access time and solve scalability problems.
  • However, the following problems arise from data replication:
    • Keeping multiple copies up to date may require more network bandwidth
    • Keeping multiple copies consistent may itself be subject to serious scalability problems.
      Intuitively, a collection of copies is consistent when the copies are always the same.

      Tight Consistency through Synchronous Replication

  • To guarantee tight consistency, a data update should be performed to all data copies at the same time
  • In other words, an update must performed at all copies as a single atomic operation.
  • Implementation of such atomic operation, involving potentially a large number replicas, is inherently 天生地 difficult:
    • The replicas may be widely dispersed across a large-scale network.
    • Operations on the replicas may be required to complete quickly.

      Relaxing Tight Consistency

  • Global synchronization takes a lot of communication time, especially when replicas are spread across a wide-area network.
    • Solution: relax the consistency constraints.
  • The (instantaneous 瞬间的 ) global synchronizations are avoided if consistency constraints are relaxed, performance may be improved.
    • Cons: replicas may not always be the same everywhere.
  • The extent 程度 to which consistency can be relaxed highly depends on the following:
    • the access and update patterns of the replicated data,
    • the purpose for which the data are used.

      Consistency Models - Assumptions

  • Assuming that there is a data store to which multiple processes running on different machines have access, a consistency model is a “contract” between the processes and this data store.
    • The store “promises” to work correctly provided that the defined rules are obeyed by the processes.
  • Local write operations (performed on copy of the data) are propagated 传播 to the other copies.
  • A data operation is classified as a write operation when it changes the data, otherwise, is a read operation.
    • Any consistency model restricts the values that a read operation on a data item can return.

    Sequential Consistency Model

  • A data store is sequentially consistent when it satisfies the condition:
    • “The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.”
  • There is no involvement of time; no reference to the “most recent” write operation.

Example 1

  • Consider four processes operating on the same data item x.
    • Process P1 first performs a write operation W1(x)a on x by setting the value of x to a.
    • Later, process P2 also performs a write operation W2(x)b, by setting the value of x to b .
  • Although, both processes P3 and P4 first read value b, and later value a (i.e., the write operation W2(x)b of process P2 appears to have taken place before W1(x)a of P1), the figure shows sequentially consistent store.
    1
    2
    3
    4
    P1: W(x)a
    P2: W(x)b
    P3: R(x)b R(x)a
    P4: R(x)b R(x)a
    Example 2
  • In contrast, the scenario shown bellow violates sequential consistency because not all processes see the same interleaving of write operations.
    • P3 will have ‘a’ as final value of x whereas P4 will conclude that the final value is ‘b’.
      1
      2
      3
      4
      P1: W(x)a
      P2: W(x)b
      P3: R(x)b R(x)a
      P4: R(x)a R(x)b

      Causal Consistency Model

  • A data store is causally consistent if it satisfies the condition:
    • “Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.”
  • This model distinguishes between events that are possibly causally related and those that are not.
    • For example, if event b is caused or influenced by an earlier event a, causality requires that “everyone else” first sees a, then see b.
  • Operations that are not causally related are said to be concurrent

Example 1

  • The following event sequence is allowed with causally consistent store only (i.e., it is not a sequentially consistent store).
  • Note that the writes W2(x)b and W1(x)c are concurrent, so it is not required that all processes see them in the same order.
    1
    2
    3
    4
    P1: W(x)a                          W(x)c
    P2: R(x)a W(x)b
    P3: R(x)a R(x)c R(x)b
    P4: R(x)a R(x)b R(x)c
    Example 2
  • Consider the following event sequence where W2(x)b potentially depends on W1(x)a because writing the value b into x may be a result of a computation involving the previously read value by R2(x)a.
  • Note that the two writes are causally related, so all processes must see them in the same order. Therefore, the event sequence is incorrect.
    1
    2
    3
    4
    P1: W(x)a                     
    P2: R(x)a W(x)b
    P3: R(x)b R(x)a
    P4: R(x)a R(x)b
    Example 3
  • Consider the following event sequence where W1(x)a and W2(x)b are concurrent writes.
  • As a causally consistent store does not require concurrent writes to be globally ordered, this event sequence is correct. Note, however, that it reflects a situation that would not be acceptable for a sequentially consistent store.
    1
    2
    3
    4
    P1: W(x)a                     
    P2: W(x)b
    P3: R(x)b R(x)a
    P4: R(x)a R(x)b
    Eventual consistency: given a sufficiently long period of time over which no changes are sent, all updates are expected to propagate, and eventually all replicas will be consistent.

    Replica Management

  • A Greedy Heuristic to Find Locations (minimum k-median problem)
    1. Find the total cost of accessing each site from all the other sites. Choose the site with the minimum total cost.
    2. Repeat (1) above, taking also into account sites hosting replicas (i.e , recalculate costs).

Security in Distributed Systems

  • Security in distributed systems can roughly be divided into two parts:
    1. Communication between users or processes, possibly residing 位于 on different machines.
      • Use secure channel to deal.
    2. Authorization, to ensure that a process gets only those access rights to the resources in a distributed system it is entitled 有资格 to.
      • Access control mechanisms

        Relationship between Security and Dependability

  • Security in a computer system is associated with the notion of dependability, because a dependable system is one that is trusted to deliver its services.
  • Recall that Dependability is about availability, reliability, safety, and maintainability
    • Which indirectly suggest TRUST (i.e., also include confidentiality and integrity 保密性和完整性).
  • Confidentiality refers to the property of a computer system whereby its information is disclosed 披露 only to authorized parties.
  • Integrity is the characteristic that alterations 更改 to a system’s assets can be made only in an authorized way.

    Types of Security Threats

  • Security in computer systems involves mechanisms to protect system services and data against security threats.
  • Four types of security threats:
    • Interception: 拦截 an unauthorized party has gained access to a service or data.
      • Example: When communication between two parties has been overheard by someone else
    • Interruption: services or data become unavailable, unusable, destroyed, and so on.
      • Examples: denial of service attacks by which someone maliciously attempts to make a service inaccessible to other parties; when a file is corrupted 损坏 or lost, etc.
    • Modification: unauthorized changing of data or tampering 篡改 with a service
      • Examples: intercepting and subsequently changing transmitted data, tampering with database entries, etc.
    • Fabrication 制造: additional data or activity are generated that would normally not exist.
      • Examples: an intruder 入侵者 may attempt to add an entry into a password file or database; breaking into a system by replaying previously sent messages.

        Security Mechanisms

  • A description of actions that are allowed to be taken and actions that are prohibited to guarantee system security is what we call Security Policy, which impact users, services, data, machines, etc.
  • Based on a system’s Security Policy, mechanisms can be put in place to enforce 执行 this policy:
    • Encryption 加密
      • Encryption transforms data into something an attacker cannot understand.
      • As such, encryption is not only able to implement data confidentiality, but it also supports integrity checks, because it allows checking of whether data has been modified.
        • Two types of encryption: symmetric and asymmetric encryption.
        • In symmetric encryption the same secret value (the key) is used for encryption and decryption whereas different keys are used in asymmetric encryption.
    • Authentication 验证
      • Authentication is used to verify the claimed identity of a user, client, server, host, or other entity.
      • Authentication is based on the possession 所有权 of some secret information, like password, known only to the entities participating in the authentication.
      • When an entity wants to authenticate another entity, the former will verify if the latter possesses the knowledge of the secret
      • Authentication can be one-way or mutual.
        • In one-way authentication, only one entity verifies the identity of the other entity.
        • In mutual authentication, both communicating entities verify each other’s identity.
    • Authorization
      • Authorization checks whether a client is authorized to perform the action requested.
        • Example: For accessing records in a medical database, such that depending on who accesses the database, permission may be granted to read records, to modify certain fields in a record.
    • Auditing 审计
      • Auditing tools are used to trace which clients accessed what, and in which way.
      • Audit logs can be extremely useful for the analysis of a security breach 安全漏洞, and subsequently taking measures against intruders.

        Cryptography 密码学

  • Cryptography is a fundamental security technique in distributed systems
  • Consider a sender S wanting to transmit message m to a receiver R. To protect the message against security threats, the sender
    1. encrypts it into an unintelligible message m’,
    2. sends m’ to R
    3. R, in turn, must decrypt the received message into its original form m.
  • The original form of the message that is sent is called the plaintext, shown as P; the encrypted form is referred to as the ciphertext, illustrated as C.

    Security in Distributed System-Recap

  • Two main issues that need to be addressed:
  1. Making the communication between clients and servers secure.
    • This may involve the following: authentication of the communicating parties as well as integrity and confidentiality of messages.
  2. Controlling access to resources.
    • Once a server has accepted a request from a client, how can it find out whether that client is authorized to have that request carried out?

      Secure channels

      “The issue of protecting communication between clients and servers, can be thought of in terms of setting up a secure channel between communicating parties.”

  • A secure channel protects senders and receivers against interception, modification, and fabrication of messages.
  • Protecting messages against interception is done by ensuring confidentiality: the secure channel ensures that its messages cannot be eavesdropped 窃听 by intruders.
  • Protecting against modification and fabrication by intruders is done through protocols for mutual 相互的 authentication and message integrity.

    Example of Autentication Protocol

  • Consider that Alice and Bob want to communicate, and that Alice takes the initiative 倡议 in setting up a channel.
  • Alice starts by sending a message to Bob, or otherwise to a trusted third party who will help set up the channel.
  • Once the channel has been set up, Alice knows for sure that she is talking to Bob, and Bob knows for sure he is talking to Alice, they can exchange messages.

    Secure channels: Message integrity and confidentiality

  • Besides authentication, a secure channel should also provide guarantees for message integrity and confidentiality
  • Confidentiality is established by simply encrypting a message before sending it.
  • Message integrity can be obtained via the use of digital signatures.

    Message integrity via Digital Signature

  • Consider the situation in which Bob has just sold Alice a collector’s item of some vinyl record 黑胶唱片 for $500. The whole deal was done through e-mail.
  • Besides authentication, at least two concerns that need to be addressed:
    1. Alice needs to be assured that Bob will not maliciously change the $500 and claim she promised more than $500.
    2. Bob needs to be assured that Alice cannot deny ever having sent the message.
  • Possible solution: Alice digitally signs the message, uniquely binding her signature to its content.
    • The unique association between a message and its signature prevents illegitimate modifications and backing out from the agreement.
  • One common form of digital signature is to use a public-key cryptosystem.
  • Subject -(Request for operation)-> Reference monitor -(Authorized request)-> Object
  • Controlling the access to an object means protecting it against requests
    generated by unauthorized subjects
  • Protection is often enforced by a program called a reference monitor
    • A reference monitor records which subject may do what, and decides whether a subject is allowed to have a specific operation carried out.
  • Reference monitor should be impenetrable 不可穿透的 by its very nature

Service-Oriented Architecture

  • This architectural style encapsulates services into independent units.
  • A service is considered as a discrete unit of functionality that can be accessed remotely, via a network, and updated independently. However, a service can possibly make use of other services.
    • Example: a procedure to retrieve a credit card statement online.
  • Distributed systems or distributed applications constructed using this architectural style are said to be have a Service-Oriented Architecture (SOA)
  • A distributed system constructed as SOA is a essentially a composition of many different services.

    Example of a SOA Application

  • Consider a distributed system application composed of several services for processing e-book orders from a Web Shop.
    • Note that not all services composing a distributed system application may belong to the same administrative organization.
    • On-line order processing:
      • Organisation A: select items -> check delivery channel ->
      • Organisation B: process payment ->
      • Organisation A: Finish the payment and do other things

        Service Composition

  • One of the main challenges of developing a distributed system is service composition and of making sure that the services operate in harmony. 和谐
  • For service composition to be possible, each service must offer an interface (including the allowed input and output messages).
  • Service composition is a far from trivial problem. 不是一个微不足道的问题

Quiz

Week 1

  • Distributed systems example are:
    • The Lloyds Internet banking system.
    • The Netflix home entertainment system.
    • The Galileo global navigation satellite system.
  • Distributed systems can be used to connect potential customers to retailers.
  • Distributed systems can be used to speed up the execution of a complex computing task.
  • Distributed systems can be used to allow multiple book authors to edit the same manuscript at the same time.
  • Distributed systems increase system reliability levels, leading to decreases in system down time.
  • Any two nodes in a distributed system will not show the same execution time, even when executing the exact same task.

    Week 2

  • In a distributed system, data sharing between processes running on distinct nodes is a major challenge due to the fact that processes do not share a common storage space.
  • Data synchronisation is about data consistency across multiple copies of the same dataset, each located on a different node of a distributed system.
  • Process synchronisation is a complex task because its success depends on the reliable communication between processes running on distinct nodes.
  • Synchronisation in a client-server scenario where the client is a Web browser application and the server is serves Web pages to its clients means that both client and server need to be active at the same time.
  • Synchronisation in a (client-server) email exchange scenario does not require components to be active at the same time, as a chain of servers between the sender and receiver can arrange for messages have to ‘live’ somewhere in between being sent and being received.
  • The use of mutual exclusion locks, provided by a central lock server, represents one of the existing means by which two processes can share exclusive access to a single resource.
  • Lamport’s Logical Clock solution to event ordering is based on the notion that a message only arrives at one node after being sent by another node.
  • A synchronisation solution based on mutual exclusion locks fails to guarantee recovery from failures in the event sequences being synchronised, potentially leaving the system in an inconsistent state.
  • The Two Phase Commit Algorithm guarantees recovery from failures in the event sequences being synchronised, which could potentially leave the system in an inconsistent state.
  • The Cristian’s Algorithm relies on the assumption that the time taken for a message to be sent from one process to a time server is the same as the time taken for a message from the server to be returned to the process. Unfortunately, this assumption may not always hold.

    Week 3

  • In distributed systems, naming resolution refers to the way in which an entity can be accessed by a process using its unique name.
  • In distributed systems, an addressing mechanism should guarantee that each entity in the system is given a unique name.
  • The ‘Free-for-all’ addressing mechanism represents the most scalable approach.
  • With the IP address of a device connected to a network, it is possible to obtain location information about that device, but not its exact location.
  • A device’s Domain Name can be mapped into the device’s IP address.
  • In distributed systems, objects typically follow protocols to communicate with one another.
  • HTTP is a content exchange protocol associated with client and server processes communicating over the Internet.
  • The process of navigating from one Web page to another using a Web Browser like Firefox is a client-server scenario where HTTP is the main protocol.
  • HTTP and SMTP are both content exchange protocols.
  • In SMTP, a series of client-server interactions can take place before the connection between the client and the server are explicitly terminated.
  • SMTP is an old and outdated email protocol.

    Week 4

  • A single computer program may be associated with multiple processes running in a single machine.
  • Access to computing and storage resources by a process is controlled by the Operating System.
  • When a process is granted an exclusive address space, it is possible to obtain a sequential reading of the steps that comprise the process.
  • Concurrent processing can occur in parallel.
  • One characteristic of multi-processing is the ability to switch between processes, transferring resources from one to another.
  • The benefits of multi-processing become more obvious in situations where a process may be idle waiting for a resource to be released, while another may be ready to start.
  • The Operating System can perform resource time-slicing, to enable multiple processes to be active at the same time.
  • Forking is an expensive operation because if involves the copying of an address space.
  • When forking is used in multi-processing, consistency of processing results is more likely to occur than in cases where multi-processing is achieved by threading
  • In parallel computing, multiple processors may be used in the execution of a single program.
  • In distributed computing, multiple machines may be used in the execution of multiple programs.

    Week 5

  • In a centralised software architecture, most of the distributed system functionality is implemented on the server, rather than distributed across multiple clients.
  • In a decentralised software architecture, the system functionality is distributed across nodes in a more balanced manner.
  • In a hybrid software architecture, functionality distribution is more complex, combining some level of centralisation and decentralisation across nodes for subsets of the functionality.
  • The TCP/IP reference model for network communication is an example of layered software architectural style.
  • In the object-based architectural style, calls made by one system entity to another can be realised as message passing via the network, in case the entities reside in distinct nodes, or as a local function call, in case the entities reside on the same node.
  • In the resource-oriented architectural style, reuse of resources for different purposes can be made possible, if appropriate interfaces for each resource is provided.
  • A Cloud environment offering data services is an example of a distributed system that primarily follows the resource-oriented architectural style.
  • An event-based distributed system is one that has its various components running simultaneously on different nodes, and use events as main vehicle to organise component intercommunication.
  • Multi-tiered client-server system is a result of a logical organisation of application functionality into three parts.
  • Multi-tiered client-server systems support a vertical distribution of the system functionality, while peer-to-peer systems support a horizontal distribution of the system functionality.
  • Peer-to-peer distributed systems are characterised by overlay networks determining communication channels.
  • In an unstructured peer-to-peer distributed system, despite the presence of an overlay network, searching for data is still necessary.

Week 6

  • Remote Procedure Call (RPC) is a communication model that, to some extent, successfully conceals the complexities of low-level message passing.
  • The Remote Procedure Call (RPC) model promotes communication transparency and, thus, is widely used in client-server scenarios.
  • Without client and server stub mechanisms, communication transparency in Remote Procedure Calls could not be achieved.
  • Parameter marshalling is a complex data transformation process that makes it possible the transfer of values of input parameters to RPCs via the network.
  • Asynchronous RPCs can be used when no RPC result is expected from the server process.

Week 7

  • Data replication in distributed systems is a commonly used mechanism to improve system scalability through increases in performance and responsiveness.
  • Data replication in distributed systems can potentially damage users’ trust in the quality of the data obtained through interaction with the system.
  • Data replication in distributed systems can potentially damage users’ expectations for fast responses from the system.
  • When used to enhance system reliability, data replication is associated with mechanisms for recovery from system failures.
  • Consistency of data replicas distributed across the network is typically associated with increases in the consumption of network bandwidth.
  • The cost of data replication does not solely depend on the frequency with which propagation of updates made on individual replicas is carried out.
  • There usually is a trade-off between data access time and data freshness in distributed systems.
  • Tight consistency is difficult to be achieved, due to the need for atomicity of update propagation operations over multiple data replicas.
  • In Causal Consistency, events that are causally related need to follow the Sequential Consistency Model.
  • Considerations about data access and update patterns can be made when deciding the extent to which tight consistency can be relaxed.

    Week 10

  • Distributed systems that remain operational when a failure occurs are said to be fault tolerant.
  • Distributed systems that remain operational when a failure occurs are said to present dependability requirements.
  • The level of safety of a distributed system can be measured by severity of the consequences of any type of system failure.
  • A system is said to have failed when it cannot deliver the services it is supposed to.
  • A system is said to have failed when it can deliver the services it is supposed to, if the services do not meet the quality requirements as promised.
  • The ability to control system faults or failure is an important property of fault tolerant systems.