A data scientist’s way to solve the 8809=6 problem

Last time, one of my colleagues posted this question. It is related to feature extraction of machine learning.

The question starts simple:

8809=6 7111=0 2172=0 6666=4 1111=0 3213=0 7662=2 9313=1 0000=4 2222=0 3333=0 5555=0 8193=3 8096=5 7777=0 9999=4 7756=1 6855=3 9881=5 5531=0
So, 2581=?

The answer itself was easy: number of circles. For these kind of numeric puzzles, I think there should be a general solution that solves these puzzles automatically.

By checking the pattern of the equation list, one can guess that, the answer can be the summation of these patterns, so one can list the digits by its frequency, and this problem becomes a linear regression for a 10-dimension (x0-x9) space.

For example, 8809=6 gives 100000021 = 6 by listing the frequency of each digits, where 1,2,1 are parameters for ‘0’, ‘8’ and ‘9’, which means the weight summation of each digits is 6. The regression function can be dot product for a linear function and tells us why it is 6 for 8+8+0+9.

Thus, one can solve it by linear regression. The regression function can be:

```def __residual(params, xdata, ydata):#guess the function is cosine dot
return (ydata - numpy.dot(xdata, params))
```

And one can use numpy’s least square for the linear regression:

```leastsq(__residual, x0, args=(xdata, ydata))
```

The full source code in python can be found at https://github.com/phunterlau/8809-6

Some discussions:

The output:

```(array([ 1.00000000e+00, 4.75790411e-24, 4.75709632e-24, 4.75588463e-24, 1.00000000e+00, 4.75951970e-24, 1.00000000e+00, 4.75790411e-24, 2.00000000e+00, 1.00000000e+00]), 2)
```

where each value in the array means the correlations between each digit and its’ weight. One can discover that, ‘0’,’4′,’6′,’9′ have weight of 1, and ‘8’ has the weight of 2.

However, there was a bug in the code: the initial value:

```x0=numpy.array([1,1,1,1,1,1,1,1,1,1])#initial guess
```

The initial guess was weight 1 for each digit. After checking with the input, one can find ‘4’ does not exist, so one should take off ‘4’ as weight 1.

2. Why so complicated?

Why it is associated with the topic of ‘feature extraction’ in machine learning?

If one is smart enough, one can tell that, the pattern is actually the number of circles in each digit, so ‘8’ has two circles and its weight is 2.

However, no one can be smart at all time. Some patterns are hidden. The procedure in the example is kind of latent feature extraction: one does not have to know the pattern is about the number of circles in each digit, one can still get that ‘8’ has weight of 2 by using this automatic code.