ECE 428/CS 425/CSE 424 (Fall 2005) Distributed Systems Homework 3 Assigned 9/22/05 Due by start of class on 9/29/05 (a) The OM(m) algorithm assumed that each node has a reliable channel to every other host. This question explores whether the algorithm performs correctly if the above assumption is not true. Suppose that we have a network of 4 nodes, A, B, C and D that contain the following bi-directional links: (A,B), (B,C), (C,D), and (D,A). Thus, not all hosts have direct links to each other. For example, if host A wants to send a message to host C, that message must be routed through B or D. If host B or D is faulty, it may modify a forwarded message in arbitrary manner before forwarding, and the modification will not necessarily be detected by the recipient of the message. The message may also be dropped by the intermediate node. Suppose that we use OM(1) algorithm in the above system, with node A being the general. When possible, the messages are routed on a direct link from source to destination; otherwise, it is routed through one intermediate node. Will algorithm OM(1) always achieve agreement with at most 1 Byzantine failure? If you answer yes, explain why. If you answer no, present a situation in which the algorithm fails. (b) Suppose that in a certain system containing 10 nodes, 2 of the 10 nodes make most of the requests for critical section entry, whereas the remaining 8 hosts only rarely request critical section entry. Design an algorithm that will reduce the number of messages required per critical section entry (including exit) as compared to an algorithm that is designed for uniform request rates. Explain why you believe your algorithm will work better under the above circumstances. (c) Consider a variation on the central coordinator algorithm for mutual exclusion. Instead of one coordinator, suppose we have two coordinator, such that the algorithm will continue to operate even if at most one coordinator fails. Suppose that the two coordinators are connected with a channel with maximum delay = D. Other channels are potentially asynchronous. When a process needs to enter critical section, it sends a request to both the coordinators. The process may enter the critical section when it receives a grant message from either of two coordinators. What protocol should the coordinators implement to ensure that mutual exclusion and liveness is guaranteed up to a single coordinator failure? If this condition cannot be met, explain why. Suggested exercise: What if the channel between the coordinators is asynchronous? (d) Problem 12.16 from 4th edition (or 11.16 from 3rd edition). ---------------------------------------------- Suggested exercises (you do not need to turn in solutions to these): * Show that OM(1) with 2 faults can lead to incorrect result for the Byzantine agreement algorithm even if number of hosts is at least 7. * Show that OM(2) with 2 faults can lead to incorrect result for the Byzantine agreement algorithm with less than 7 hosts. * Can the mutual exclusion algorithms we discussed be generalized to K-mutual exclusion (wherein at most K processes may be in the critical section simultaneousl)? * Try other exercises from the textbook.