2020-03-20 - Talk 6 - Saurabh B
Recap: the introduction of noise in a channel can be compensated for by the introduction of redundancy, for example by sending multiple copies of every bit.
We looked at various data compression techniques.
We mostly considered random input vectors drawn from simple probability distributions, where each component was independent of the others.
Now we need to know how to work with models that include dependences:
Better compression for real-world data, which often has interesting correlations.
A noisy channel with input x and output y can be viewed as a joint ensemble in which x and y are dependent;
if they were independent, it would be impossible to communicate over the channel.
This motivates looking at information content in the context of noisy channels.

H(X) = \sum_{x \in A_X} P(x) \log_2 \frac{1}{P(x)}
H(X,Y) = \sum_{xy \in A_X A_Y} P(x,y) \log_2 \frac{1}{P(x,y)}
H(X,Y) = H(X) + H(Y) \iff P(x,y) = P(x) P(y)
H(X \mid y=b_k) = \sum_{x \in A_X} P(x \mid y=b_k) \log_2 \frac{1}{P(x \mid y=b_k)}
H(X \mid Y) = \sum_{y \in A_Y} P(y) \left[ \sum_x P(x \mid y) \log_2 \frac{1}{P(x \mid y)} \right]
Chain rule: H(X,Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)
Proof:
H(X,Y) = \sum_{xy \in A_X A_Y} P(x,y) \log_2 \frac{1}{P(x,y)}
       = \sum_{xy \in A_X A_Y} P(x,y) \log_2 \frac{1}{P(x \mid y) P(y)}
       = -\sum_{xy \in A_X A_Y} P(x,y) \log_2 [P(x \mid y) P(y)]
       = -\sum_x \sum_y P(x,y) \log_2 P(x \mid y) - \sum_x \sum_y P(x,y) \log_2 P(y)
       = -\sum_x \sum_y P(x,y) \log_2 P(x \mid y) - \sum_y \sum_x P(x,y) \log_2 P(y)
       = -\sum_x \sum_y P(x,y) \log_2 P(x \mid y) - \sum_y P(y) \log_2 P(y)
       = H(X \mid Y) + H(Y)   QED
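A quick numerical check of the chain rule on a toy joint distribution (the 2x2 joint below is my own illustrative choice, not from the talk):

```python
import numpy as np

# Minimal sketch: check H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
# on an assumed toy joint distribution P(x, y).
P_xy = np.array([[0.30, 0.10],    # rows index x, columns index y
                 [0.15, 0.45]])

def H(p):
    """Entropy in bits of a probability vector (zero entries ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

P_x = P_xy.sum(axis=1)   # marginal P(x)
P_y = P_xy.sum(axis=0)   # marginal P(y)

H_XY = H(P_xy)
# Conditional entropies computed directly from the conditional distributions
H_Y_given_X = sum(P_x[i] * H(P_xy[i, :] / P_x[i]) for i in range(len(P_x)))
H_X_given_Y = sum(P_y[j] * H(P_xy[:, j] / P_y[j]) for j in range(len(P_y)))

print(H_XY)                    # the three printed values agree
print(H(P_x) + H_Y_given_X)
print(H(P_y) + H_X_given_Y)
```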
Mutual information: the decrease of the uncertainty about the outcome of one random variable given the outcome of another.
Mutual information measures the information X contains about Y, or vice versa.
Relative entropy (KL divergence) between P(x) and Q(x):
D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}
I(X;Y) = D_{KL}\big(P(x,y) \,\|\, P(x) P(y)\big)
I(X;Y) = \sum_{xy \in A_X A_Y} P(x,y) \log_2 \frac{P(x,y)}{P(x) P(y)}
       = \sum_{xy \in A_X A_Y} P(x,y) \log_2 \frac{P(x \mid y) P(y)}{P(x) P(y)}
       = \sum_{xy \in A_X A_Y} P(x,y) \log_2 \frac{P(x \mid y)}{P(x)}
       = \sum P(x,y) \log_2 P(x \mid y) - \sum P(x,y) \log_2 P(x)
       = \sum P(x,y) \log_2 P(x \mid y) - \sum P(x) \log_2 P(x)
       = -\sum P(x) \log_2 P(x) + \sum P(x,y) \log_2 P(x \mid y)
       = \sum P(x) \log_2 \frac{1}{P(x)} - \sum P(x,y) \log_2 \frac{1}{P(x \mid y)}
I(X;Y) = H(X) - H(X \mid Y)   QED
I(X;Y) = I(Y;X): mutual information is symmetric.
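A short sketch, on the same kind of assumed toy joint distribution, showing that the KL form and the entropy form of I(X;Y) give the same number:

```python
import numpy as np

# Compute I(X;Y) two ways and check they agree:
# as D_KL(P(x,y) || P(x)P(y)) and as H(X) - H(X|Y).
P_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# I(X;Y) as a KL divergence between the joint and the product of marginals
prod = np.outer(P_x, P_y)
mask = P_xy > 0
I_kl = float(np.sum(P_xy[mask] * np.log2(P_xy[mask] / prod[mask])))

# I(X;Y) = H(X) - H(X|Y)
H_X_given_Y = sum(P_y[j] * H(P_xy[:, j] / P_y[j]) for j in range(P_xy.shape[1]))
I_ent = H(P_x) - H_X_given_Y

print(I_kl, I_ent)   # the two values agree
```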

A discrete memoryless channel Q is characterized by an input alphabet A_X, an output alphabet A_Y, and transition probabilities P(y|x).
Q defines the conditional probabilities P(y|x); if we also choose an input distribution P(x), we then have a joint distribution
P(x,y) = P(y \mid x) P(x)
and, by Bayes' theorem,
P(x \mid y) = \frac{P(y \mid x) P(x)}{P(y)}
Example: a binary symmetric channel (BSC) with flip probability f = 0.15 and input distribution p_0 = 0.9, p_1 = 0.1:
P(x=1 \mid y=0) = \frac{P(y=0 \mid x=1) P(x=1)}{P(y=0)}
                = \frac{0.15 \times 0.1}{0.9 \times 0.85 + 0.1 \times 0.15}
                = \frac{0.015}{0.78} = 0.019
I(X;Y) = H(Y) - H(Y \mid X)
With P(y=1) = 0.9 \times 0.15 + 0.1 \times 0.85 = 0.22 and H(Y \mid X) = H_2(f),
I(X;Y) = H_2(0.22) - H_2(0.15) = 0.76 - 0.61 = 0.15 bits
where H_2(p) = p \log_2 \frac{1}{p} + (1-p) \log_2 \frac{1}{1-p}
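A small script to reproduce the numbers above (the channel parameters f = 0.15 and input distribution (0.9, 0.1) come from the example; the code itself is just a sketch):

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# BSC from the example: flip probability f, input distribution (p0, p1)
f, p0, p1 = 0.15, 0.9, 0.1

# Posterior P(x=1 | y=0) via Bayes' theorem
P_y0 = p0 * (1 - f) + p1 * f
print(p1 * f / P_y0)               # ~0.019

# Mutual information I(X;Y) = H(Y) - H(Y|X) = H2(P(y=1)) - H2(f)
P_y1 = p0 * f + p1 * (1 - f)
print(H2(P_y1) - H2(f))            # ~0.15 bits, i.e. H2(0.22) - H2(0.15)
```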
C(Q) = \max_{P_x} I(X;Y)
The distribution P_x that achieves the maximum is called the optimal input distribution, denoted by P^*_x.
There may be multiple optimal input distributions achieving the same value of I(X;Y).
Eventually we will learn that C(Q) does indeed measure the maximum amount of error-free information that can be transmitted over the channel per unit time.
If we look at the BSC again with f = 0.15, p_0 = 0.9, p_1 = 0.1, the mutual information was:
I(X;Y) = H_2(0.22) - H_2(0.15) = 0.76 - 0.61 = 0.15 bits
By symmetry, the optimal input distribution is (0.5, 0.5), giving
C(Q_{BSC}) = H_2(0.5) - H_2(0.15) = 1 - 0.61 = 0.39 bits
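As a sanity check, the maximization over the input distribution can be done numerically; the brute-force scan below is my own choice of method, and should recover p_1 = 0.5 and C ≈ 0.39 bits for f = 0.15:

```python
import numpy as np

# Sketch: numerically maximize I(X;Y) over the input distribution (1-p1, p1)
# for the BSC with f = 0.15.
def H2(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

f = 0.15
p1 = np.linspace(0.0, 1.0, 10001)
P_y1 = p1 * (1 - f) + (1 - p1) * f   # P(y=1) under input distribution (1-p1, p1)
I = H2(P_y1) - H2(f)                 # I(X;Y) = H(Y) - H(Y|X); H(Y|X) = H2(f) for the BSC

best = np.argmax(I)
print(p1[best], I[best])             # ~0.5, ~0.39 bits
```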
Now consider the binary erasure channel: each input is erased (received as '?') with probability f.
I(X;Y) = H(X) - H(X \mid Y)
Note that this time we are writing I in terms of X rather than Y (for the BSC we used I(X;Y) = H(Y) - H(Y \mid X)).
We can again use the symmetry argument and use the (0.5, 0.5) input distribution, so H(X) = H_2(0.5).
As for H(X|Y): we are uncertain about x given y only when y = ?,
which happens with probability f/2 + f/2 = f, so H(X \mid Y) = f H_2(0.5).
So C = I(X;Y) = H_2(0.5) - f H_2(0.5) = 1 - f.
This can also be done without assuming symmetry first:
Consider an input distribution (p_0, p_1).
The probability of y = ? is p_0 f + p_1 f = f, and H(X \mid Y) = f H_2(p_1).
I(X;Y) = H_2(p_1) - f H_2(p_1) = (1-f) H_2(p_1)
I(X;Y) achieves its maximum when p_1 = 0.5, giving C = 1 - f again.
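The same brute-force check works for the erasure channel, assuming the expression I(X;Y) = (1 − f) H_2(p_1) derived above:

```python
import numpy as np

# Sketch: for the binary erasure channel, I(X;Y) = (1 - f) * H2(p1),
# so a scan over p1 should give the maximum at p1 = 0.5 with C = 1 - f.
def H2(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

f = 0.15
p1 = np.linspace(0.0, 1.0, 10001)
I = (1 - f) * H2(p1)
best = np.argmax(I)
print(p1[best], I[best], 1 - f)   # ~0.5, ~0.85, 0.85
```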
The Z channel Q_Z with error probability f: input 0 is always received as 0; input 1 is received as 1 with probability 1 - f and flipped to 0 with probability f.
P(y=1) = p_1 (1-f)
I(X;Y) = H(Y) - H(Y \mid X)
       = H_2(p_1(1-f)) - \big(p_0 H_2(0) + p_1 H_2(f)\big)
       = H_2(p_1(1-f)) - p_1 H_2(f)
Maximizing over the input distribution gives
p_1^* = \frac{1/(1-f)}{1 + 2^{H_2(f)/(1-f)}}
For f = 0.15 we get p_1^* = 0.445 to maximize the mutual information.
With this input distribution, C(Q_Z) = 0.685 bits.
This means we can communicate slightly more information by using input symbol 0 more frequently than 1.
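A numerical sketch to confirm the Z-channel numbers, using both a brute-force scan and the closed-form p_1^* (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Sketch: maximize I(X;Y) = H2(p1*(1-f)) - p1*H2(f) over p1 for f = 0.15.
def H2(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

f = 0.15
p1 = np.linspace(0.0, 1.0, 100001)
I = H2(p1 * (1 - f)) - p1 * H2(f)
best = np.argmax(I)
print(p1[best], I[best])   # ~0.445, ~0.685 bits

# Closed-form optimum from the formula above
p1_star = (1 / (1 - f)) / (1 + 2 ** (H2(f) / (1 - f)))
print(p1_star)             # ~0.445
```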

The noisy typewriter: 27 symbols (A, B, ..., Z and '-'); each input is received as itself or as one of its two neighbours, each with probability 1/3.
C = \max_{P_x} I(X;Y)
With the uniform input distribution,
C = H(Y) - H(Y \mid X) = \log(27) - \log(3) = \log(9) bits
To make sure we have actually understood it, let's have a quick and easy quiz:
What happens if we only have 26 letters, i.e. A, B, ..., Z, and exclude the last symbol '-'?
What probability distribution can we use? What is the capacity of this new channel?

C = H(Y) - H(Y \mid X) = \log(26) - \log(2) = \log(13) bits
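A sketch of the capacity computation, assuming the standard 27-symbol circular noisy typewriter where each input is received as itself or one of its two neighbours with probability 1/3 each:

```python
import numpy as np

# With a uniform input, I(X;Y) = H(Y) - H(Y|X) = log2(27) - log2(3) = log2(9).
A = 27
Q = np.zeros((A, A))                      # Q[y, x] = P(y | x)
for x in range(A):
    for y in (x - 1, x, x + 1):
        Q[y % A, x] = 1 / 3

p_x = np.full(A, 1 / A)                   # uniform input distribution
p_y = Q @ p_x                             # output distribution

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

H_Y = H(p_y)
H_Y_given_X = sum(p_x[x] * H(Q[:, x]) for x in range(A))
print(H_Y - H_Y_given_X, np.log2(9))      # both ~3.17 bits
```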
For a channel Q, there is a non-negative number C (the channel capacity) with the following property:
For any ε > 0 and R < C, for large enough N, there exists a block code of length N and rate ≥ R and a decoding algorithm, such that
the maximal probability of block error is ≤ ε.
For the noisy typewriter, we can define a probability distribution such that only every third letter is used.
These letters form a non-confusable subset of the input alphabet (a block code of length N = 1);
the error-free information rate of this system is log2(9) bits.
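A toy simulation of this length-1 code; the specific choice of every third symbol starting at 'B' and the nearest-codeword decoder are my own illustrative assumptions:

```python
import random

# Non-confusable code for the noisy typewriter: use only every third symbol,
# so each received symbol identifies the transmitted input uniquely.
A = 27                                    # symbols 0..26 standing for A..Z and '-'
codewords = list(range(1, A, 3))          # 9 non-confusable inputs

def channel(x):
    """Noisy typewriter: output x-1, x, or x+1 (mod 27), each with prob 1/3."""
    return (x + random.choice((-1, 0, 1))) % A

def decode(y):
    """Map a received symbol back to the unique codeword within circular distance 1."""
    return min(codewords, key=lambda c: min((y - c) % A, (c - y) % A))

random.seed(0)
sent = [random.choice(codewords) for _ in range(1000)]
decoded = [decode(channel(x)) for x in sent]
print(all(s == d for s, d in zip(sent, decoded)))   # True: zero errors, log2(9) bits per use
```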
If we use any channel many times (say N uses) in sequence, then that collection of uses behaves much like the noisy typewriter channel.
What applies to the noisy typewriter therefore applies to noisy channels in general.
Consider the extended BSC: each block (codeword) has length N, and the total input space X has size 2^N.
For the BSC the output alphabet is the same as the input alphabet, and an input can map to any number of outputs Y.

But for large N, there will be an output typical set of size 2^{N H(Y)}.
The size of each conditional typical set (the typical outputs for a given input) is 2^{N H(Y|X)}.

We can choose the input distribution P_x so that the typical output sets of different inputs are non-confusable.
Number of almost-certainly non-confusable inputs = \frac{2^{N H(Y)}}{2^{N H(Y \mid X)}}
= 2^{N(H(Y) - H(Y \mid X))} = 2^{N I(X;Y)} \le 2^{N C}
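A back-of-the-envelope version of this counting for the BSC with f = 0.15 and, as an illustrative assumption, N = 1000 channel uses:

```python
import numpy as np

# For the BSC with f = 0.15 and uniform inputs, the counting argument above
# gives about 2^(N*I) almost-certainly non-confusable inputs.
def H2(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

f, N = 0.15, 1000
H_Y = 1.0                 # uniform input => uniform output, H(Y) = 1 bit per use
H_Y_given_X = H2(f)       # noise entropy per use
I = H_Y - H_Y_given_X     # = C for the BSC, ~0.39 bits per use

print(N * H_Y)            # log2 of the output typical-set size (~1000)
print(N * H_Y_given_X)    # log2 of each conditional typical set (~610)
print(N * I)              # log2 of the number of non-confusable inputs (~390)
```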