Accurate entropy measurement

Measurement

We can (in a way) cheat here. A common model for a generic entropy source (repurposed for the Photonic Instrument) is:-

Photonic Instrument wrt generic model of an entropy source.

A degree of conditioning of the raw data is allowable as long as the total entropy rate of the source is not increased. No deterministic technique can increase it, but we can reduce it with our magical chop & stack transformation. The conditioning transforms a JPEG file whose non-deterministic length is normally distributed as $\mathcal{N}(\mu, \sigma^2)$ into a segment of fixed length. The technique is outlined below:-

Head & tailing the original JPEG file of length $\mathcal{N}(\mu, \sigma^2)$ to standardise its length.

Start by chopping off the first 1,000 bytes, as header information lives there. Then chop off everything after byte $\mu - 6 \sigma$. This leaves a segment $\mu - 6 \sigma - 1{,}000$ bytes long, which is therefore of fixed length. This length may have to be tweaked slightly so that the block folds (re-shapes) exactly into four equal pieces. The Six Sigma value is a somewhat arbitrary management fad, but $\pm 6 \sigma$ covers 99.999 999 802 7% of a normal distribution, so virtually all of our JPEGs are long enough, and it still allows for long term sensor drift. Then fold and stack the fixed length segment as:-

Folding standardised length JPEG segment and modular addition.

Once the folds are summed modulo 256, we should have an IID block of fixed length $\frac{\mu - 6 \sigma - 1{,}000 - t}{4}$ bytes, where $t$ is the tweak (2 bytes in this case, giving 4,373 bytes). This is implemented in Python 3 thusly:-

"""
Chop & stack JPEG method to create IID data.
Mean file/jpeg size = 19.46 kB
      Std. dev. = 0.161 kB

Works very well with 4 folds. 
So outputs will be 4,373 B each.
"""

import matplotlib.pyplot as plt
import numpy
import requests

url = 'http://192.168.***.***/snapshot.cgi?user=******&pwd=*************'
no_frames = 1_000
head_chop = 1_000
tail_chop = 19_460 - (6 * 161) - 2  # The (-2) tweak lets the segment fold exactly 4 times.

with open('/tmp/r', 'wb') as f:
    for i in range(no_frames):
        print(i)
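        response = requests.get(url)
        response.raise_for_status()  # Stop on a failed snapshot rather than folding an error page.
        a = numpy.frombuffer(response.content, dtype=numpy.uint8)
        b = a[head_chop:tail_chop]  # Head & tail.
        c = numpy.reshape(b, (4, -1))  # Stack folds.
        d = c.sum(axis=0) % 256  # Modular addition (the sum is promoted beyond uint8, so no overflow).
        f.write(d.astype(numpy.uint8).tobytes())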


# Graph last frame to visually check.
plt.plot(d, color='purple')
plt.xlabel('Position')
plt.ylabel('Value')
plt.show()
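
The head & tail arithmetic and the fold itself can also be exercised without a camera. The following is only a sketch under stated assumptions: `fold_geometry` and `chop_and_stack` are hypothetical helper names that do not appear in the script above, the frame is synthetic random data rather than a real JPEG, and the guard for short frames is an addition that the original script does not have.

```python
import numpy as np


def fold_geometry(mean_len, std_len, head_chop=1_000, folds=4, sigmas=6):
    """Return (tail_chop, tweak, out_len) for a given number of folds."""
    raw_len = mean_len - sigmas * std_len - head_chop  # Segment length before tweaking.
    tweak = raw_len % folds  # Bytes to drop so that the segment folds exactly.
    seg_len = raw_len - tweak
    return head_chop + seg_len, tweak, seg_len // folds


def chop_and_stack(frame, head_chop, tail_chop, folds):
    """Head & tail one frame, fold it, and sum the folds modulo 256.

    Returns None if the frame is shorter than tail_chop (assumed policy:
    such a frame is simply skipped).
    """
    if len(frame) < tail_chop:
        return None
    a = np.frombuffer(frame, dtype=np.uint8)
    segment = a[head_chop:tail_chop]      # Head & tail.
    stacked = segment.reshape(folds, -1)  # Stack folds.
    summed = stacked.sum(axis=0) % 256    # Modular addition.
    return summed.astype(np.uint8).tobytes()


# Figures from the docstring above: mean 19,460 B, std. dev. 161 B.
tail_chop, tweak, out_len = fold_geometry(19_460, 161)

# Synthetic stand-in for one snapshot (NOT real camera data).
rng = np.random.default_rng(0)
fake_frame = rng.integers(0, 256, size=19_500, dtype=np.uint8).tobytes()
block = chop_and_stack(fake_frame, 1_000, tail_chop, 4)
print(tail_chop, tweak, out_len, len(block))
```

With these figures the helper reproduces the hand-tuned values: a tweak of 2 bytes and 4,373 byte outputs for four folds. Calling `fold_geometry(19_460, 161, folds=3)` gives a tweak of 1 byte and 5,831 byte outputs.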

The following waveform and histogram demonstrate the quality of our conditioning. Compare them with a histogram of a raw JPEG from the Photonic Instrument here. But we will test the IID hypothesis rigorously for JPEG files of $\mathcal{N}(19460, 161^2)$ bytes…

Sample and histogram of transformation output.

From the above probability mass function, we can immediately visually estimate $H_{\infty} \approx -\log_2(0.004) = 7.97$ bits/byte. Let’s check for IIDness:-

Our fast IID test of transformation output.

Our slow IID test of transformation output, over 1,000 post-processed JPEG files.

NIST ea_iid test over 1,000 post-processed JPEG files.

Our fast and slow tests and the NIST SP 800-90B IID test all confirm that the JPEG segments are indeed IID. Whoopie! So we get min(H_original, 8 X H_bitstring): 7.940423 bits/byte. Let’s just say $H_{\infty} = 7.9$ bits/byte for 4,373 byte long transformed JPEG segments. That’s a min-entropy of 34.5 kbits per segment/frame.
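The visual estimate and the per-frame figure are easy to reproduce. The sketch below is a naive plug-in estimate based on the most common byte value; it is not the SP 800-90B estimator, which is more conservative, and `mcv_min_entropy` and `kbits_per_frame` are hypothetical names used here for illustration only.

```python
import math
from collections import Counter


def mcv_min_entropy(data):
    """Naive min-entropy in bits/byte: -log2 of the most common byte's frequency."""
    counts = Counter(data)
    p_max = max(counts.values()) / len(data)
    return -math.log2(p_max)


def kbits_per_frame(bytes_per_frame, bits_per_byte):
    """Min-entropy carried by one conditioned frame, in kbits."""
    return bytes_per_frame * bits_per_byte / 1_000


# Reading p_max = 0.004 off the histogram gives the visual estimate.
visual_estimate = -math.log2(0.004)

# Per-frame min-entropy at the adopted 7.9 bits/byte.
four_fold = kbits_per_frame(4_373, 7.9)

# A perfectly flat byte distribution gives the 8 bits/byte ceiling.
flat = mcv_min_entropy(bytes(range(256)) * 4)

print(round(visual_estimate, 2), round(four_fold, 1), flat)
```

To apply it to real output, the function can be fed the bytes of the test file, for example `mcv_min_entropy(open('/tmp/r', 'rb').read())`, assuming the file was written by the script above.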

This 4 MB test file (r), made from 1,000 conditioned Photonic JPEG frames, is available under Related Files at the bottom of the page.


But there’s more! Our chop & stack technique is so good that the IID output file passes the NIST Special Publication (SP) 800-22 Rev. 1a statistical test suite (STS) for cryptographic randomness, as detailed below:-

NIST STS test over 1,000 post-processed JPEG files.

Looking at the proportion of tests passed, it’s a pass 😏 This is interesting in that a 4 MB file has passed NIST’s STS with $H_{\infty} = 7.94$ bits/byte. We won’t use it like this though, as we’re aiming for a world-beating $H_{\infty} = 1 - 2^{-128}$ bits/bit.
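For readers unfamiliar with the proportion criterion: SP 800-22 accepts a test when the fraction of bitstreams that pass it lies within $\hat{p} \pm 3\sqrt{\hat{p}(1-\hat{p})/m}$, where $\hat{p} = 1 - \alpha$ and $m$ is the number of bitstreams. A minimal sketch follows. The number of bitstreams used for the run above is not stated, so the value of 100 here is purely an assumption for illustration, as is the function name.

```python
import math


def sts_proportion_bounds(num_sequences, alpha=0.01):
    """Acceptable pass-proportion interval from SP 800-22 (proportion of passes)."""
    p_hat = 1.0 - alpha
    half_width = 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / num_sequences)
    return p_hat - half_width, min(1.0, p_hat + half_width)


# Assumed run size of 100 bitstreams (hypothetical; not taken from the test above).
lower, upper = sts_proportion_bounds(100)
print(f"pass proportion must lie in [{lower:.4f}, {upper:.4f}]")
```

With fewer bitstreams the interval widens, so a small run tolerates proportionally more individual failures.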

Supplemental

Whilst the fixed length segment was folded and stacked into four pieces, we have also tried our IID transformation with three pieces. It still produces fully verified IID data, as shown by our slow IID test output below:-

Our slow IID test of 3 stack transformation output.

So the entropy rate can be safely increased to 46 kbits per 5,831 byte segment/frame, still at $H_{\infty} = 7.9$ bits/byte. The three stack variant unfortunately fails NIST STS randomness testing by a clear margin, but that’s not the objective at this point as randomness extraction will follow. The test file (r_3-stack) is available below:-

  • r (3906 kB)
  • r_3-stack (5694 kB)