In “Improving the Safety, Scalability, and Efficiency of Network Function State Transfers,” Gember and Akella describe ways to overcome the shortfalls in their OpenNF system. OpenNF enables moving network functions (NFs, such as firewalls and intrusion detection systems) from one physical piece of equipment to another within a network. This problem concerns industry professionals when equipment needs to be replaced or moved to accommodate more bandwidth.
Problems with OpenNF
The system the authors previously described in “OpenNF: Enabling Innovation in Network Function Control” has two important problems: buffer overflow and latency. Both stem from the fact that OpenNF depends on a controller that is responsible for buffering packets that need to be processed at the new location. If these packets aren’t processed at the new location, important state information about the system could be lost. The consequence of such a loss is a less secure system, which an adversary could exploit.
Suppose we have three entities, A, B, and C, for the old location, new location, and controller respectively. The controller has limited buffer space to store incoming packets from the old location, A. Previously packets were processed at A, but while the architecture of the system is being changed, the packets A receives are forwarded to C and not processed. If C’s buffer runs out of space before B is ready to take over for A, then any packets that A sends to C are lost.
To overcome this fault the authors offer a stronger solution: process packets at A while the architecture is changing, and forward the packets to C only if they will affect the state of A and B. If C’s buffer becomes full, B is updated with a snapshot of A’s state.
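The overflow fallback described above can be illustrated with a small simulation. This is only a sketch under assumptions that are mine, not the authors’: the names `NF`, `Controller`, and `migrate` are hypothetical, NF state is reduced to a per-flow packet counter, every packet is assumed to affect state, and the buffer is simply discarded on overflow before A’s state is re-exported to B.

```python
from collections import deque


class NF:
    """Toy network function: its state is a per-flow packet count (assumption)."""

    def __init__(self):
        self.state = {}

    def process(self, flow):
        self.state[flow] = self.state.get(flow, 0) + 1

    def snapshot(self):
        # Copy the state so the receiver does not share the sender's dict.
        return dict(self.state)


class Controller:
    """Hypothetical controller C with a bounded packet buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, flow):
        """Buffer a packet; return False (and drop the buffer) on overflow."""
        if len(self.buffer) >= self.capacity:
            self.buffer.clear()
            return False
        self.buffer.append(flow)
        return True


def migrate(a, b, controller, packets):
    """Move state from A to B while packets keep arriving at A.

    Returns the number of times A's state had to be re-exported to B.
    """
    b.state = a.snapshot()           # initial state export
    re_exports = 0
    for flow in packets:
        a.process(flow)              # A keeps processing during the move
        if not controller.offer(flow):
            # Overflow: the buffered packets are gone, so B is refreshed
            # with a snapshot of A that already reflects all of them.
            b.state = a.snapshot()
            re_exports += 1
    while controller.buffer:         # replay what is still buffered at C
        b.process(controller.buffer.popleft())
    return re_exports
```

In this toy model, B ends with the same state as A regardless of how small the buffer is, because every overflow is followed by a fresh snapshot. What the sketch leaves out is exactly what the paper also leaves open: what each snapshot costs.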
The authors prove that sending a snapshot of A to B ensures no packets are lost in the face of buffer overflow, but they don’t include details on the cost of replicating A’s state to B. If such an action were cost-free, then state could be continuously replicated at B and there would be no need for a packet buffer at all.
Latency is a measure of the time that it takes for a message to be received. Ensuring that updates from A to B are loss-free is costly by this metric. In the original description of OpenNF, a message couldn’t be received until it was released from the buffer at C. The authors’ solution to the buffer overflow problem also helps mitigate the latency problem, as any packet that is buffered at C is first processed at A. However, the time to complete the migration from A to B is still a problem.
To reduce the migration time the authors propose a peer-to-peer migration in which packets are forwarded directly from A to B rather than passing through a controller, C. With a peer-to-peer transfer, the time to complete a migration stays relatively constant with respect to the number of flows moved from A to B, whereas with a controller-directed transfer the time to complete the migration increases linearly.
The analysis of the buffer-overflow solution should be looked into more closely. The authors’ proof guarantees loss-free transfer by re-exporting state for some flow from A to B if an overflow occurs. Will this still work if there is so much state to transfer that the overflow occurs multiple times in quick succession? Is there a point at which the cost of copying the state from A to B is too great to honor that loss-free guarantee?
The authors seem to have provided non-overlapping solutions to their problems. Perhaps they intend for an approach to be chosen based on case-specific needs. For instance, if controller constraints are not a problem then the safe buffer approach is enticing, but if you need to scale the system to accommodate many controllers then the peer-to-peer solution looks more helpful. The authors did not mention any trade-offs between these two approaches in their conclusion.
Together, their papers offer a great deal of insight into the problem of state transfer in a networked environment. Their solutions are novel, and they have considered a great amount of previous work. Although they present their solutions as a way to enable state transfer of network functions, I think their methods might find related applications in other areas of distributed systems. Their papers are highly readable, and they deserve a lot of credit for that as well.