# Entropy analysis

We first check that entropy is actually being generated by Mata Hari’s entropy source. We are trying to confirm that irreducible information is being generated in a near-homogeneous manner (ergodicity), rather than arising from some initial chaotic start-up event. The complexity test below shows good uniformity of entropy production, with all ten normalised segments having a value of around 1.
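One simple way to approximate this kind of segment-wise uniformity check — a sketch of the idea, not the exact complexity test used here — is to split the capture into ten segments, compress each with a general-purpose compressor, and normalise each segment’s compressed size by the mean across segments. Homogeneous entropy production gives values near 1 for every segment:

```python
import zlib

def segment_complexity(data: bytes, segments: int = 10) -> list[float]:
    """Compressed size of each segment, normalised by the mean across
    segments. Values near 1 for all segments suggest homogeneous
    entropy production rather than a one-off chaotic start-up burst."""
    seg_len = len(data) // segments
    sizes = [
        len(zlib.compress(data[i * seg_len:(i + 1) * seg_len], level=9))
        for i in range(segments)
    ]
    mean = sum(sizes) / len(sizes)
    return [s / mean for s in sizes]
```

Run over a multi-megabyte capture; a segment whose normalised value sits well below 1 is compressible, i.e. producing less irreducible information than its neighbours.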

Essentially the device works due to quantization error when sampling a random (uncorrelated) source at 10-bit resolution. And the beauty is that we can form an additive noise model for the circuit, whereby the Zener-generated noise rides atop any ambient noise such as radio, EMI or anything induced from the mains supply.
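As a sanity check on that additive model — a minimal simulation, not the actual circuit, with the noise amplitude and mains-hum figures being illustrative assumptions — we can generate log-normal-ish Zener noise plus a deterministic 50 Hz interferer, quantize at the ADC’s ~1.1 mV step, and observe that the least significant bit of each code comes out essentially unbiased:

```python
import math
import random

LSB_STEP_V = 1.1e-3  # ~1.1 mV per ADC code (1.1 V reference / 1024)

def lsb_stream(n: int, seed: int = 1) -> list[int]:
    """Additive-noise model sketch: log-normal 'Zener' noise (illustrative
    parameters, ~0.4 V scale) riding atop a small 50 Hz mains-hum term,
    quantized to 10-bit ADC codes; keep only each code's LSB."""
    rng = random.Random(seed)
    bits = []
    for i in range(n):
        zener = rng.lognormvariate(mu=math.log(0.4), sigma=0.25)
        hum = 0.05 * math.sin(2 * math.pi * 50 * i / 10_000)  # 50 mV @ 10 kSa/s
        code = int((zener + hum) / LSB_STEP_V) & 0x3FF        # 10-bit quantizer
        bits.append(code & 1)
    return bits
```

Because the noise spread is hundreds of LSB steps wide, the quantization error — and hence the LSB — is close to uniform regardless of the deterministic interferer, which is exactly why the additive model is so forgiving.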

The above probability density function shows the range of quantized signal values via the internal ADC sampling at the default nominal rate of 10 kSa/s $(\epsilon = 1.1 \text{ mV}, \tau = 112\ \mu\text{s})$, or 8.93 kSa/s actual. We have fitted a curve to it, and it appears slightly skewed, which we infer to be a log-normal distribution. This is strong evidence that some Zener avalanche effect occurs within our 8.2 V diode with slightly over 0.8 V of compliance (given a healthy battery).

But this signal can’t be used as is. `ent` gives a lag-1 autocorrelation ($R$) value of 0.049733. Typically $|R| \le 10^{-3}$ is required if the data set is to be considered IID. IID-ness dramatically increases our confidence in min-entropy ($H_{\infty}$) determination and bypasses all those dodgy NIST 90B non-IID shenanigans and hoo-ha detailed here. We will not even bother to perform an IID test on this raw data; it would fail.
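For reference, the lag-1 serial correlation that `ent` reports can be computed as below — a straightforward textbook sketch, not `ent`’s exact implementation:

```python
def lag1_autocorrelation(xs: list[float]) -> float:
    """Sample lag-1 autocorrelation: covariance of adjacent samples
    divided by the sample variance. Near 0 for IID data; near +/-1
    for strongly correlated or alternating data."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den
```

A perfectly alternating 0/1 stream scores close to $-1$, a heavily low-pass-filtered one close to $+1$; the 0.049733 above means adjacent raw 10-bit samples still remember each other.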

So in keeping with our Three Golden Rules, we will modify our $(\epsilon, \tau)$ sampling methodology and reduce $N_{\epsilon}$ to only one bit, equivalent to a one-bit 1.1 mV digitiser. We will sample as `(analogRead(portNo) & 0b1)`. Hopefully this will result in IID samples. Thus testing…

These two tests confirm that our sampling regime is IID. Therefore we can compress the individual bits into a byte without worry, to increase the efficiency of transferring the entropy off-circuit. We can do this using a left shift and some `OR`s. Incidentally, the extra processing has the effect of ever so slightly decreasing the sample rate ($\tau \uparrow$) and thus enhancing IID-ness, but we couldn’t exactly quantify this small difference ourselves $(\tau \approx 112\ \mu\text{s})$. The internal Mata Hari sample loop is like so:-

```cpp
for (int16_t i = 0; i < totalSamples; i++) {
  uint8_t sample = 0;
  // OR in the ADC's least significant bit, then shift left; after eight
  // reads the first-sampled bit sits in the byte's most significant position.
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  sample <<= 1;
  sample |= (analogRead(portNo) & 0b1);
  buffer[i] = sample;
}
```
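Note the bit order this loop produces: the first-sampled bit ends up in the most significant position of each byte. A host-side sketch of the packing and its inverse (our illustration for anyone post-processing the capture, not part of the firmware):

```python
def pack_bits(bits: list[int]) -> int:
    """Mirror of the firmware loop: shift left, OR in the next LSB;
    the first-sampled bit lands in the MSB of the byte."""
    byte = 0
    for b in bits:
        byte = (byte << 1) | (b & 1)
    return byte

def unpack_byte(byte: int) -> list[int]:
    """Recover the eight sampled bits in original sampling order (MSB first)."""
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]
```

For IID bits the order is statistically irrelevant, but getting it right matters if you ever want to line bytes back up against per-sample timing.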

`buffer` is then offloaded via a binary `Serial.write(buffer, totalSamples)` statement. Some firmware is in ‘Related files’. Testing the resulting 8-bit binary output, we now get:-

We find from `ent` over the 8 bits/byte compressed output:-

```
$ ent -b /tmp/mata-hari-10mb-x8bit.bin
Entropy = 0.999995 bits per bit.
Optimum compression would reduce the size
of this 80000000 bit file by 0 percent.
Chi square distribution for 80000000 samples is 587.11, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bits is 0.4986 (0.5 = random). <<<<
Monte Carlo value for Pi is 3.151026060 (error 0.30 percent).
Serial correlation coefficient is -0.000459 (totally uncorrelated = 0.0). <<<<
```

We can ignore all the metrics other than arithmetic mean and serial correlation. An arithmetic mean of 0.4986 suggests the expected bit bias of $\epsilon = 0.0014$, or $\epsilon = 2^{-9.5}$ away from $0.5$ towards $0$, calculated as $9.5 = \frac{\log \big(\frac{1}{0.5 - 0.4986}\big)}{\log(2)}$. Not exactly NIST’s $\epsilon \le 2^{-64}$ next-bit requirement, but then **this is not a TRNG. It’s the entropy source for a TRNG**. Randomness extraction follows, which reduces the bias to a negligible degree. And notice the correlation: $|R| = 0.000459$ is very good indeed, sitting well below the $10^{-3}$ expected of IID data. We further confirm this by three independent means…
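The bias exponent is a one-liner to verify against the `ent` figures above:

```python
import math

mean = 0.4986                      # arithmetic mean reported by ent
epsilon = 0.5 - mean               # bit bias towards 0
exponent = math.log2(1 / epsilon)  # express epsilon as 2^-exponent
# epsilon = 0.0014, exponent comes out around 9.48, i.e. roughly 2^-9.5
```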

Both our and NIST’s IID tests confirm that the compressed data sample is IID. Therefore the entropy rate $(H_{\infty})$ is simply $-\log_2 (p_{\max})$, or `min(H_original, 8 X H_bitstring): 7.947502` taken from the NIST test output above.
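The $-\log_2(p_{\max})$ figure itself is trivial to reproduce from a byte histogram — this is a sketch of the most-common-value definition only, not NIST’s full 90B estimator suite:

```python
import math
from collections import Counter

def min_entropy_per_byte(data: bytes) -> float:
    """Most-common-value min-entropy estimate: -log2 of the highest
    observed byte probability. Only valid as H_inf if the data is IID,
    which is exactly why the IID confirmation above matters."""
    counts = Counter(data)
    p_max = max(counts.values()) / len(data)
    return -math.log2(p_max)
```

A perfectly uniform byte stream scores 8.0 bits/byte; our 7.947502 reflects the residual $2^{-9.5}$ bit bias.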

Conclusion: that’s an internal on-board generation rate of **7.9 bits/byte @ 8.8 kbits/s of pure IID entropy**, using a 9 V battery. Of course this rate is constrained by transferring the entropy off-board, and we have chosen 9600 baud for security and device-compatibility reasons. Thus the Mata Hari kit feeds a client device over USB at **7.9 bits/byte @ 4.4 kbits/s**.

Please just remember: this is an entropy source, not a fully fledged TRNG. We then leverage the Leftover Hash Lemma to reduce the bias to a world-beating $2^{-128}$.
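For flavour, the extraction stage looks something like the following — a hedged sketch, not Mata Hari’s actual extractor, with the 512-byte block size being our illustrative choice. At 7.95 bits/byte of min-entropy, 512 raw bytes carry roughly 4,070 bits, comfortably above the $m + 2\log_2(1/\varepsilon) = 256 + 256$ bits the Leftover Hash Lemma asks for when outputting $m = 256$ bits at $\varepsilon = 2^{-128}$. (Strictly, the Lemma applies to universal hash families; a cryptographic hash such as SHA-256 is the common practical stand-in.)

```python
import hashlib

BLOCK_BYTES = 512  # raw entropy-source bytes consumed per output block (assumption)

def extract_block(raw: bytes) -> bytes:
    """Compress one block of slightly biased, IID entropy-source bytes
    into 256 near-uniform output bits via a cryptographic hash,
    standing in for a universal hash per the Leftover Hash Lemma."""
    assert len(raw) == BLOCK_BYTES
    return hashlib.sha256(raw).digest()
```

The large input-to-output compression ratio (4,096 bits in, 256 out) is what buys the $2^{-128}$ distance from uniform: the extractor throws away far more raw bits than the bias could ever have contaminated.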