Developers Notebook-Huffman
From WxWiki
Contents |
[edit] Huffman
Part of Developers_Notebook-Huffman-Compression Methods
Sometimes called "entropy coding".
bottom-down
optimal when all symbol probabilities are powers of 1/2.
[edit] Static Huffman
General Algorithm (sort of taken from the comp.compression FAQ) -
- Rank all symbols (bytes) in order of probability of occurance.
- Successively combine the two synbols of the lowest probability to form a new composite symbol';' eventually we will build a binary tree where each node is the probability of all nodes beneath it.
- Trace a path to each leaf, noticing the direction at each node.
[edit] Canonocal Static Huffman
Static huffman that follows two rules -
- Codes with shorter lengths have higher values than codes with longer lengths
- Codes with the same length increase the code value as the symbol value (in the alphabet) increases
Psuedo-code - <html><code>
//example - #codes[16]={0,2,2,3,2,0,0,0,0,0,0,0}
//Compute the first code for the maximum length, it is MAX_LENGTH bits long and it's value is all 0's
Count = 5; //max code index + 1 (max code)
Code = 0;
Start_code[Count] = Code;
//Set last_#_codes to the number of codes that the maximum length has
last_#_codes = #codes[Count];
--Count;
for (; Count > 0; --Count) //down to 0 but not 0
{
Code += last_#_codes;
Code >>= 1;
Start_code[Count] = code;
last_#_codes = #codes[Count]; //number codes in current length
}
</code></html>
[edit] Dynamic/Adaptive Huffman
Rather than making two runs of the file for character count and encoding, Dynamic Huffman readjusts the tree while scanning the file, so there's no need to run through a file twice.
There are two algorithms for Dynamic Huffman - FGK & V. V is FGK with more strictly enforced rules, and is preferred over FGK.
[edit] Decoding
<code><pre> if (Direction exists in tree) {
if (Position == 0-node)
{
AddToTree();
}
UpdateModel();
} else {
OutputCode(); IncrementFrequency(); UpdateModel();
} </pre></code>
[edit] Encoding
<code><pre>
</pre></code>
[edit] FGK
<code><pre> UpdateModel() {
IncrementFrequencyOfAllParents(); //...
} </pre></code>
[edit] V
<code><pre>
</pre></code>
[edit] Implementations
[edit] Static
- Part of Karl Malbrain's BWT Implementation is a good tiny version of one.
- Local copy at Karl Marlbrian BWT.
[edit] Dynamic
- Karl Malbrain has has a c version of the Vitter model.
[edit] Resources
[edit] Static
- Arturo Campos has an ok tutorial.
- Canonical algorithm.
- Somewhat poor english - difficult to understand.
- Canonical algorithm.
[edit] Dynamic
- A rather definitive Resource.
- Prof Jonathan Low has a great tutorial also
