Understanding Red-Black Trees

A Red-Black(RB) Tree is a self-balancing binary search tree. It maintains balance with the help of a few properties:

1. Every node is either red or black.

Every node is assigned a color.

2. The root and the leaves(nil’s) are all black.

Given a RB Tree, extend the existing leaves by adding two black children(which don’t have any value and hence called nil’s) for each of them. Root must always be black.

3. Every red node has a black parent.

Imposes a limit on the number of red nodes we can have in the tree. This is helpful because we will be working mostly with Red nodes.

4. All simple paths from a node x to some descendant leaf have the same number of black nodes = BlackHeight(x).

This and the previous property play a key role in the height balancing of the tree.

Example –

Height – 

For n keys, the height h satisfies the relation:

h <= 2 * (log n + 1)

Proof –

I am trying to give an outline here. Merge all the red nodes back to their black parents. Now, we get what is called a 2,3,4 – tree where each node has 2,3 or 4 children. All leaves in this tree have uniform depth(remember BlackHeight? Property 4).

Let’s calculate the height h’ of this tree. We know that h’>= h/2 since at most half the nodes on any path are red. The number of leaves of a binary tree with n internal nodes = n + 1. This leads to:

2**h’  <= n + 1 -> h’  <= log(n + 1) [Taking log on both sides] -> h/2 <= log(n + 1) -> h <= 2*log(n+1) ~ O(log n)

Rotations

A picture is worth more than a thousand words :). The arrow from left-to-right shows the right rotation and the reverse denotes the left.

Left Rotation


void leftrotate(tree *t, node *x)

{

     node *y;

     node *nil = t->nil;

     y = x->right;

     x->right = y->left;

     if(y->left != nil)

              y->left->parent = x;

     y->parent = x->parent;

     if(x == x->parent->left)

              x->parent->left = y;

     else 
              x->parent->right = y;

     y->left = x;

     x->parent = y;

     return ;

}

Right Rotation


void rightrotate(tree *t, node *x)

{

     node *y;

     node *nil = t->nil;

     x = y->left;

     y->left = x->right;

     if(x->right != nil)

                x->right->parent = y;

     x->parent = y->parent;

     if(y == y->parent->left)

                y->parent->left = x;

     else 
                y->parent->right = y;

     x->right = y;

     y->parent = x;

     return ;

}

Insertion

Now that we learnt rotations, we will see how they are used in Insertion.

Given a node, it is colored red and inserted like it normally is inserted into a Binary Search Tree. Its easy to see that only property 3 will be violated. In order to balance this violation, we apply 3 cases and their reverse(interchanging left and right) as well.

color = 1 denotes red and color = 0 denotes black.

Code for insertion – http://pastebin.com/HQJn9QcB

Deletion

Deletion is similar to insertion but is slightly more complicated. CLRS is a good source for looking up the Delete and Delete Fix-up functions.

Applications

C++ STL containers map is implemented using Red-Black Trees.

Linux Kernel’s completely fair scheduler makes use of Red-Black Trees.

Problems

http://www.spoj.com/problems/ORDERSET/ – This problem can be solved by augmenting red black trees to store the information of how many times each value has occurred. Insertion and Deletion functions are the same. Kth smallest can be found by normal BST search with the count information.

Red-Black Trees can also solve many problems which can be solved using Heaps(the ones related to order statistics).

Inspiration

The inspiration for this post came after watching Erik Demaine’s lecture on Red-Black Trees – http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-10-red-black-trees-rotations-insertions-deletions/. What a lecture !

 

Advertisements

Secure Communication over Insecure Channels – Merkle’s Puzzles

Let’s say that Alice(X) and Bob(Y) want to share some secrets between them. They are looking for a secure way to do this. It is obvious that there exists Eve(Z), who is going to eavesdrop on their conversation,  but they wish to establish a secure communication in spite of that. The normal and key channels are completely exposed to Z, which implies that she will come to know of the secure key at some point of time.

X and Y decide to relax their restriction a bit. They say that Z should put polynomially more effort than X and Y to figure out their secure key.

Is it possible ? 

Merkle proved that it is and “Merkle’s Puzzles” was born.

How?

X and Y agree upon a number N beforehand. Y generates N puzzles* and assigns an unique identifier to each of them randomly.  Y sends these puzzles to X. X chooses one of the puzzles randomly and solves it. X sends back the unique identifier of the puzzle she solved to Y. Y knows which puzzle X has solved with the help of the unique identifier. X and Y have in fact established a secure communication.

What will Z do now ?

Z has access to two things – N puzzles and the identifier. She can’t do anything with the identifier as she doesn’t know which puzzle is assigned to which identifier. Hence, she decides to solve all the puzzles. If each puzzle takes O(N) time (equally hard to solve), then the total time taken by Z is O(N * N) [as there are N puzzles]. Hence, Z’s effort is quadratic compared to X and Y’s linear efforts.

Why is it of interest ? 

Merkle came up with this idea when he was an undergraduate student in 1974. This technique shows signs of public-key cryptosystem even before it became a terminology. Very recently, [1] showed that any random key exchange protocol in which the honest users make n queries to the random oracle, can be broken by an adversary. making O(n^2) queries to the oracle. I am planning to drill into [1] in the next post.

*puzzle – In this context, puzzle is a cryptogram which is designed to be broken.

[1]”Merkle Puzzles are Optimal — an O(n^2)-query attack on key exchange from a random oracle”(http://www.boazbarak.org/Papers/merkle.pdf)

Tackling an open problem – Bounding number of pairwise distinctive sets for a Regular Language

From the Myhill-Nerode Theorem(sai16vicky.wordpress.com/2014/09/24/myhill-nerode-theorem-a-beautiful-alternative-to-pumping-lemma), its can be concluded that Regular Languages have finite pairwise distinctive(PD) sets. This leads to a very interesting problem – Given a Regular Language in the form of a Regular Expression, is it possible to come up with bounds on the number of PD sets ?

Let’s say we are given a Regular Expression R of length K. We convert R to its lowest terms (R’). K now becomes length(R’).

Example

If R = (0* + 1*).(2* + 3*), we expand it to make R’= 0*2* + 0*3* + 1*2* + 1*3*.  K = length(R’) = 8.

Intuition – 

The immediate idea would be to visualize R’ as a string S with length K(considering only the alphabets allowed in the language). If we generate all subsets of this string and match every pair of them, then we will be getting some PD sets(say Val). Can we conclude that Val is the total number of PD sets? It looks tempting to say so,  but we need a solid proof.

Whenever there is something to be proved and we don’t know how to go about it,  Induction comes in handy :).

Proof – 

For a string of length 1, we get a 2*2 matrix. The number of PD sets can at most be 1 and hence, this matrix is sufficient to conclude.

For a string of length k, let’s assume that there exists a (2**k)*(2**k) matrix for which this property is satisfied.

For a string of length k + 1,

There already exists a string of length k which satisfies this property. We are adding an additional element c’ now. This element is added to the string,  and has to be added to every row and column of the subset matrix. Hence, we need an additional row and additional column into the matrix which makes it a ( 2**k’)*(2**k’) matrix, where k’ = k + 1.

We ended up proving our intuition correctly. Once we fix the dimensions of the matrix, we can also conclude that every PD set will be of length at most 2*K, where K is the length of the Regex in its lowest terms.

Moving forward

Though we got a solution by reducing the Regex to its lowest terms,  it is not very efficient. Is it possible to solve this problem without this reduction? Any Suggestions ?

Warning – 

The bounds we found are invalid if the Regex is not reduced to its lowest terms. Example – (0* + 1*).(2* + 3*).(4* + 5*). Here if K = 6,  a 6*6 matrix does not cover all the subsets of this string.