Tuesday, 13 September 2022

How Much Information Does a Single Letter Carry in an N-Gram?

Information content of a single letter

Shannon's information theory is very concise when it comes to calculating the information content of a single letter a..z:

Source: https://de.wikipedia.org/wiki/Informationsgehalt 
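Written out, the standard definition of Shannon information content looks as follows. The symbols x (a letter) and p(x) (its probability of occurrence) are notation chosen here for illustration; the base-2 logarithm gives the result in bits.

```latex
I(x) = -\log_2 p(x) = \log_2 \frac{1}{p(x)} \quad \text{[bit]}
```

A rare letter (small p(x)) therefore carries more information than a frequent one.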

So when we know the probability of occurrence of a single letter, e.g. 'e', we can calculate its information content:
Source: https://de.wikipedia.org/wiki/Buchstabenhäufigkeit

For example, the information delivered by the letter 'e' at its single occurrence in the word 'household' can be calculated as follows:
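The calculation can be checked with a minimal Python sketch. It assumes the roughly 17.4% frequency for 'e' in German text that this post uses; the names `P_E` and `information_content` are made up for this illustration.

```python
import math

# Assumed relative frequency of 'e' in German text (~17.4 %, the figure used in this post).
P_E = 0.174


def information_content(p: float) -> float:
    """Return the Shannon information content, in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -math.log2(p)


bits_e = information_content(P_E)
print(f"I('e') = {bits_e:.2f} bit")  # prints: I('e') = 2.52 bit
```

The word 'household' does not enter the calculation at all; only the probability of the letter does.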

But this value ignores the context in which 'e' appears. If we look at another occurrence of 'e', in the word 'enchiladas', the information content of 'e' would be the same 2.52 bits.

We can conclude: the information content calculation is based on the average frequency of the single, isolated letter 'e' across a huge German text corpus (~17.4%).
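To make this concrete, here is a small sketch of how such a context-free estimate could be derived from a corpus. The two-word sample text and the function names are purely illustrative assumptions; a realistic estimate like the ~17.4% above needs a very large corpus. Only the letters a..z are counted, so umlauts are ignored.

```python
import math
from collections import Counter


def letter_probabilities(corpus: str) -> dict[str, float]:
    """Relative frequency of each letter a..z in the corpus (case-insensitive)."""
    letters = [ch for ch in corpus.lower() if "a" <= ch <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}


def letter_information(word: str, probs: dict[str, float]) -> list[tuple[str, float]]:
    """Context-free information content (bits) for each letter of the word."""
    return [(ch, -math.log2(probs[ch])) for ch in word.lower() if ch in probs]


# Tiny made-up sample, far too small for realistic frequencies.
sample = "household enchiladas"
probs = letter_probabilities(sample)

e_in_household = dict(letter_information("household", probs))["e"]
e_in_enchiladas = dict(letter_information("enchiladas", probs))["e"]
print(e_in_household == e_in_enchiladas)  # prints: True
```

Because the lookup depends only on the letter itself, 'e' gets the same value in both words, whatever its neighbours are.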