A data scientist’s way to solve the 8809=6 problem

Recently, one of my colleagues posted this question; it is related to feature extraction in machine learning.

The question starts simple:

8809=6 7111=0 2172=0 6666=4 1111=0 3213=0 7662=2 9313=1 0000=4 2222=0 3333=0 5555=0 8193=3 8096=5 7777=0 9999=4 7756=1 6855=3 9881=5 5531=0
So, 2581=?

The answer itself was easy: count the circles in each digit (so 2581 = 2). Still, for this kind of numeric puzzle, I think there should be a general approach that solves such puzzles automatically.

By examining the equation list, one can guess that the answer might be a weighted sum over the digits. If each number is encoded by the frequency of each digit, the problem becomes a linear regression in a 10-dimensional (x0-x9) space.

For example, 8809=6 gives the frequency vector (1,0,0,0,0,0,0,0,2,1) = 6, where the 1, 2 and 1 count the occurrences of '0', '8' and '9'. The regression model is then a dot product of this frequency vector with a weight vector, and the weighted sum over the digits of 8809 should come out to 6.
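In code, this encoding is just a digit count (a small illustration of my own, not taken from the original script):

import numpy

# '8809' -> one count per digit 0-9: digit 0 once, digit 8 twice, digit 9 once
print(numpy.bincount([int(c) for c in '8809'], minlength=10))
# prints [1 0 0 0 0 0 0 0 2 1]; its dot product with the weight vector should be 6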

Thus, one can solve it by linear regression. The residual function for the fit can be:

import numpy

def __residual(params, xdata, ydata):  # assume the model is a linear dot product
    return ydata - numpy.dot(xdata, params)

And one can use SciPy's least squares routine (scipy.optimize.leastsq) for the linear regression:

from scipy.optimize import leastsq

leastsq(__residual, x0, args=(xdata, ydata))

The full source code in Python can be found at https://github.com/phunterlau/8809-6.
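Putting the pieces together, a minimal self-contained sketch of the same idea looks roughly like this (my own illustration, not the repository code):

import numpy
from scipy.optimize import leastsq

equations = ['8809=6', '7111=0', '2172=0', '6666=4', '1111=0', '3213=0',
             '7662=2', '9313=1', '0000=4', '2222=0', '3333=0', '5555=0',
             '8193=3', '8096=5', '7777=0', '9999=4', '7756=1', '6855=3',
             '9881=5', '5531=0']

def digit_frequency(number_string):
    # 10-dimensional feature vector: how many times each digit 0-9 appears
    return numpy.bincount([int(c) for c in number_string], minlength=10).astype(float)

def __residual(params, xdata, ydata):
    # the model is a linear dot product of digit frequencies and digit weights
    return ydata - numpy.dot(xdata, params)

xdata = numpy.array([digit_frequency(eq.split('=')[0]) for eq in equations])
ydata = numpy.array([float(eq.split('=')[1]) for eq in equations])

x0 = numpy.ones(10)  # initial guess: weight 1 for each digit
weights = leastsq(__residual, x0, args=(xdata, ydata))[0]

print(weights)                                      # roughly [1 0 0 0 1 0 1 0 2 1]
print(numpy.dot(digit_frequency('2581'), weights))  # roughly 2.0, the puzzle's answer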

Some discussions:

1. About the digit '4'

The output:

(array([ 1.00000000e+00, 4.75790411e-24, 4.75709632e-24, 4.75588463e-24, 1.00000000e+00, 4.75951970e-24, 1.00000000e+00, 4.75790411e-24, 2.00000000e+00, 1.00000000e+00]), 2)

where each value in the array is the fitted weight of the corresponding digit. One can see that '0', '4', '6' and '9' have a weight of 1, and '8' has a weight of 2.

However, there was a bug hiding in the code, caused by the initial value:

x0 = numpy.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])  # initial guess: weight 1 for each digit

The initial guess was a weight of 1 for each digit. Checking the input shows that '4' never appears in any equation, so its weight is never constrained by the data and simply stays at the initial guess; the reported weight of 1 for '4' should be discarded.
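A quick way to confirm this (a small check of my own, not part of the original script) is to count how often each digit appears on the left-hand side of the equations:

from collections import Counter

left_hand_sides = ('8809 7111 2172 6666 1111 3213 7662 9313 0000 2222 '
                   '3333 5555 8193 8096 7777 9999 7756 6855 9881 5531')
digit_counts = Counter(ch for ch in left_hand_sides if ch.isdigit())
print(digit_counts.get('4', 0))  # 0 -- the digit '4' never appears in the inputs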

2. Why so complicated?

Why is it associated with the topic of 'feature extraction' in machine learning?

If one is smart enough, one can tell that the pattern is actually the number of circles in each digit, so '8' has two circles and its weight is 2.

However, no one can be smart all the time, and some patterns are well hidden. The procedure in this example is a kind of latent feature extraction: one does not have to know that the pattern is the number of circles in each digit; one can still discover that '8' has a weight of 2 by running this automatic code.

Some interview questions for data scientists

Recently, one of my friends asked me for some interview questions to test candidates for data scientist jobs. I think it is good to share the questions here. Later on, I may (if I get some free time) post detailed solutions and discussions for them.

1-D Facebook (difficulty: medium, problem solving)

We are living in a 3-D world, with x, y and z coordinates. In a 1-D world, there is only the x coordinate, and people can only see left and right. A startup social network company, let us call it '1-D-facebook.com', wants to build a 'find your friends' feature for the 1-D world. In the 1-D world, people have no size; each person is just a dot (no need to worry about diet 🙂 ). The company has the location information of 1M people, represented as an unsorted 1-D float array of length 1M. Now, given the person at array index N, find the two closest friends, one to the left and one to the right. The position array can be very long (but fits in memory) and is unsorted, so sorting plus searching works, but it is not yet the optimal solution. Preprocessing is OK, and there is no space limit.

Reference: this problem came from some real-life cases. For example, it is the final step of Amazon's 'people who bought this also bought' collaborative filter after all pairwise scores have been calculated (this problem uses distance, while Amazon uses division). Another example is a spelling corrector, which needs to find the closest-spelled words in a big dictionary, where the distance is the edit distance; a real spelling corrector is much trickier on the data-structure side, and here I simplified the edit distance to a position difference. PS: Google's spelling correction uses Bayesian models instead of edit distance.
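Back to the 1-D problem itself, here is one possible sketch (my own illustration, not necessarily the optimal answer the question is after): sort the positions once as preprocessing, remember each person's rank, and then every query only looks at the immediate neighbors in sorted order.

import numpy

def build_rank_index(positions):
    # preprocessing: sort once and remember where each original index lands
    order = numpy.argsort(positions)            # original indices, ordered by position
    rank_of = numpy.empty(len(positions), dtype=int)
    rank_of[order] = numpy.arange(len(positions))
    return order, rank_of

def closest_left_right(order, rank_of, n):
    # the nearest left/right friends of person n are its neighbors in sorted order
    r = rank_of[n]
    left = int(order[r - 1]) if r > 0 else None
    right = int(order[r + 1]) if r + 1 < len(order) else None
    return left, right

positions = numpy.array([3.2, 0.5, 7.1, 2.8, 9.0])  # made-up example data
order, rank_of = build_rank_index(positions)
print(closest_left_right(order, rank_of, 0))        # neighbors of person 0 at x=3.2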

2-D Facebook (difficulty: hard, problem solving + algorithm)

Since you have solved 1-D Facebook, a 2-D world Facebook company, 'flat-land.com', wants to hire you to build the same program for 2-D people. Each person has a position (x, y) and can see left/right and up/down; the task is to find the 4 closest friends. They do not have to be an up-friend, a down-friend and so on; any direction is OK.

Reference: this is a real Facebook problem (not an interview question, and with no time limit), called 'small world', from Facebook Puzzles (the webpage has since been taken down by Facebook).
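Again, just one possible sketch (my own, not the original puzzle's reference solution), using a k-d tree from SciPy as the preprocessing step:

import numpy
from scipy.spatial import cKDTree

positions = numpy.random.rand(100000, 2)  # made-up (x, y) positions, one row per person
tree = cKDTree(positions)                 # preprocessing: build the k-d tree once

def four_closest_friends(n):
    # k=5 because the closest point to person n is person n itself
    _, neighbors = tree.query(positions[n], k=5)
    return [int(i) for i in neighbors if i != n][:4]

print(four_closest_friends(0))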

DVD Auto-classification (difficulty: medium, problem solving + machine learning)

A DVD-rental company wants to beat Netflix, so they want to build a smart machine learning algorithm that auto-classifies DVD movies rather than manually labeling them all. Fortunately, they only host romantic movies and action movies, so things are easier than at Netflix. They have observed one thing: in romantic movies, people kiss a lot; in action movies, people fight a lot, though romantic movies can have fights too. Can you use this information to build a classifier that tells whether a new movie is action or romantic?

Reference: from the book 'Machine Learning in Action'. It also comes from a real-life problem in my current project, but I simplified it to numerical features.
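One common answer is k-nearest neighbors on two counted features, the number of fight scenes and the number of kiss scenes (a sketch of my own; the numbers below are made up for illustration):

import numpy

# made-up training data: (number of fight scenes, number of kiss scenes)
features = numpy.array([[3, 104], [2, 100], [1, 81],    # known romantic movies
                        [101, 10], [99, 5], [98, 2]])   # known action movies
labels = ['romance', 'romance', 'romance', 'action', 'action', 'action']

def classify_knn(new_movie, features, labels, k=3):
    # label a new movie by majority vote among its k nearest labeled neighbors
    distances = numpy.sqrt(((features - new_movie) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

print(classify_knn(numpy.array([18, 90]), features, labels))  # 'romance'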

Super long feature similarity (difficulty: medium-hard, programming+machine learning)

Some machine learning models produce a list of features for two soft drinks, for example the sugar content and so on. How can one compare the similarity of these two drinks? (The interviewee should answer cosine similarity, dot product, or some other distance function for comparing two feature vectors.)

Let's take cosine similarity as an example. The real situation is that there are millions of features from the machine learning models, and a drink may be missing many of them; in other words, the feature vectors are very sparse. So if we want to compare two drinks with sparse features, where one drink can have many features the other drink does not have, do we really need to multiply every feature, including all those zero values?
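The answer the question is driving at: the dot product only needs the features that both drinks actually share; everything else contributes zero. A minimal sketch with sparse feature vectors stored as dictionaries (the feature names are made up):

import math

def sparse_cosine(a, b):
    # cosine similarity for sparse vectors stored as {feature: value} dictionaries;
    # only features present in both vectors contribute to the dot product
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

drink_1 = {'sugar': 30.0, 'caffeine': 10.0, 'citric_acid': 1.5}
drink_2 = {'sugar': 28.0, 'sodium': 5.0}
print(sparse_cosine(drink_1, drink_2))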

Calculate the square root of an integer N (difficulty: medium-hard, numerical methods and programming)

This question can have some variations:

  • (easy) How to tell whether an integer N is a perfect square (N=k*k where k is an integer)?
  • (medium) Given a very large integer N, find the integer m such that m*m<=N and (m+1)*(m+1)>N (the integer square root).
  • (hard, needs a hint) How to determine whether a number N is a Fibonacci number? The hint should be given by the interviewer: N is a Fibonacci number if and only if 5*N**2+4 or 5*N**2-4 is a perfect square, so one simply tests whether either of those two values is a perfect square.
  • (medium-hard) How to determine whether a number is the sum of two perfect squares?

What if N is very large and one cannot build a table of square numbers?
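A sketch tying these variations together (my own illustration, not a reference answer), using an integer-only Newton iteration so that a very large N needs neither floating point nor a table of squares:

def integer_sqrt(n):
    # largest m with m*m <= n, via integer Newton iteration (works for huge n)
    if n < 2:
        return n
    m = n
    better = (m + n // m) // 2
    while better < m:
        m, better = better, (better + n // better) // 2
    return m

def is_perfect_square(n):
    m = integer_sqrt(n)
    return m * m == n

def is_fibonacci(n):
    # n is a Fibonacci number iff 5*n*n + 4 or 5*n*n - 4 is a perfect square
    return is_perfect_square(5 * n * n + 4) or is_perfect_square(5 * n * n - 4)

def is_sum_of_two_squares(n):
    # brute force: n = a*a + b*b for some integers 0 <= a <= b
    a = 0
    while 2 * a * a <= n:
        if is_perfect_square(n - a * a):
            return True
        a += 1
    return False

print(integer_sqrt(10**40 + 12345))               # fine for very large integers
print(is_perfect_square(144), is_fibonacci(144))  # True True (144 = 12*12 and F(12))
print(is_sum_of_two_squares(25))                  # True (25 = 9 + 16)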

Essay-copying (difficulty: medium-hard, NLP, machine learning, modeling)

In a university final exam, the professor received 200 essays from students, about 1000 words each. Unfortunately, he found that some students had copied other people's essays. But these students were smart: they did not copy an entire essay; they may have changed words in some sentences, and they may have copied from 2-3 other people (but surely not from all the other 200 students, not enough time 🙂 ). Please build a machine learning system to help the professor find these bad students.

Reference: clustering with natural language processing is very important in real life; this is one example.
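One possible sketch (assuming scikit-learn is available and each essay is a plain-text string; the 0.8 threshold is an arbitrary illustration): represent each essay as a TF-IDF vector and flag pairs with suspiciously high cosine similarity.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def suspicious_pairs(essays, threshold=0.8):
    # essays: list of plain-text strings; returns (i, j, similarity) for look-alike pairs
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(essays)
    similarity = cosine_similarity(tfidf)
    pairs = []
    for i in range(len(essays)):
        for j in range(i + 1, len(essays)):
            if similarity[i, j] > threshold:
                pairs.append((i, j, similarity[i, j]))
    return pairs

# hypothetical usage, assuming one text file per student:
# essays = [open(path).read() for path in essay_paths]
# print(suspicious_pairs(essays))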