Winning solution of the Kaggle Higgs competition: what can a single model do?

This post describes my solution to the Kaggle Higgs competition. It has a public score of 3.75+ and a private score of 3.73+, which ranked 26th. The solution uses a single classifier with feature engineering based on basic high-school physics, plus a few advanced but calculable physics features.

Github link to the source code.

1. The model

I chose XGBoost, a parallel implementation of gradient boosted trees. It is a great piece of work: parallel, fast, efficient, and easy to tune.

Parameter tuning is simple. GBM generally learns well when given a small shrinkage (eta) value. Based on a simple brute-force grid search over the parameters and the cross-validation scores, I chose:

  • eta = 0.01 (small shrinkage)
  • max_depth = 9 (the default of 6 is not deep enough)
  • subsample = 0.9 (some randomness to prevent overfitting)
  • num_rounds = 3000 (with slow shrinkage, many trees are needed)

and left the other parameters at their defaults. These settings work well; however, since I have only limited knowledge of GBM, the model is not optimal:

  • The shrinkage is slow and training takes long: a single training run takes about 30 minutes, and cross-validation takes even longer. On the forum, some faster parameter sets have been published that limit the training time to less than 5 minutes.
  • The subsample parameter introduces randomness, especially when complicated non-linear features are involved, so the submission AMS score is not stable and not 100% reproducible. There is a short discussion in the feature selection section.
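For concreteness, the settings above can be written out the way xgboost's Python API takes them. This is a sketch of my own, not the exact training script; in particular, the objective shown here is an assumption:

```python
# Sketch of the parameter set described above, in the dict form that
# xgboost's train() accepts. xgboost itself is not imported here.
params = {
    'eta': 0.01,                     # small shrinkage
    'max_depth': 9,                  # deeper than the default of 6
    'subsample': 0.9,                # randomness against overfitting
    'objective': 'binary:logitraw',  # assumption: raw-score binary objective
}
num_rounds = 3000                    # slow shrinkage needs many trees
# usage sketch: bst = xgb.train(params, dtrain, num_rounds)
```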

Cross-validation is done as usual. One reminder about the AMS: the AMS equation can be approximated as

{\rm AMS} = \frac{S}{\sqrt{B}}

which means that, if we evenly partition the training set for K-fold cross-validation, S and B are each scaled by (K-1)/K, so the AMS score from CV is artificially lowered by a factor of approximately \sqrt{(K-1)/K}. The same applies when estimating the test AMS: the weights should be rescaled according to the number of test samples.
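To make this scaling concrete, here is a small sketch (with made-up S and B values; the helper names are mine) of the full AMS, the simplified AMS, and the K-fold factor:

```python
import math

def ams(s, b, b_reg=10.0):
    """Full AMS from the competition definition."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))

def ams_simple(s, b):
    """The simplified form AMS ~ S / sqrt(B)."""
    return s / math.sqrt(b)

# A CV fold sees only (K-1)/K of the weighted events, so S and B shrink by
# that factor, and the simplified AMS shrinks by sqrt((K-1)/K).
K = 5
s, b = 600.0, 30000.0        # made-up weighted signal/background counts
factor = (K - 1.0) / K
lowered = ams_simple(s * factor, b * factor)
# lowered equals ams_simple(s, b) * sqrt((K-1)/K)
```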

XGBoost also supports customized loss functions. There are discussions in this forum thread and I made some attempts, but I did not use a customized loss function for my submission.

2. The features

The key to this solution is the feature engineering. With the basic physics features alone, I can reach a public leaderboard score of around 3.71; the advanced physics features push it to 3.75 public and 3.73 private.

2.1 General idea for feature engineering

The signal (positive) samples are Higgs-to-tau-tau events, where the two taus come from the same Higgs boson, while the background (negative) samples are tau-tau-like events whose particles have no correlation with a Higgs boson. The general idea is therefore that, in the signal, these particles should show kinematic correlations with the Higgs, while the background particles should not. This kinematic correlation can be represented:

  • Simply, as the open angles (see section 2.2)
  • Or, in a complicated way, as the CAKE features (see this link in the Higgs forum; this feature was not developed by me and was not used in my submission/solution)

2.2 Basic physics features

Following the general idea of finding "correlation" in the kinematic system, the correlation between each pair of particles can be useful. Possible features are:

  • The open angles in the transverse plane (phi angle) and the longitudinal plane (eta angle). A reminder: the phi angle difference must be mapped into ±pi, and the absolute values of these angles work better for XGBoost.
  • Careful study of the open angles shows that the tau-MET open angle in the phi direction is somewhat useless. This is understandable: in the tau-l-nu case, the identified tau angle has little useful correlation with the nu angle.
  • The Cartesian components of each momentum (px, py, pz): they work well with XGBoost.
  • The momentum correlation in the longitudinal (Z) direction; for example, the jet momentum in the Z direction vs. the tau-lepton momentum in the Z direction is important. The Z-direction momentum can be calculated from pt (the transverse momentum) and the eta angle.
  • The longitudinal eta open angle between the leading jet and the subleading jet: the physics reason comes from the jet production mechanism, but it is easy to notice by plotting PRI_jet_leading_eta – PRI_jet_subleading_eta, even without physics knowledge.
  • The ratios of the tau-lep, jet-jet, and MET transverse momenta to the total transverse momentum.
  • The min and max PT: this idea comes from the traditional cut-based analysis, where different physics channels (e.g. 2-jet VBF vs. 1-jet) have minimal and maximal PT cuts. Here I provide them as features instead of splitting the model, and XGBoost picks them up nicely.
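A few of the basic quantities above can be sketched as small helper functions (the names are mine, not dataset columns):

```python
import math

def delta_phi(phi1, phi2):
    """Open angle in the transverse plane, mapped into [-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi < -math.pi:
        dphi += 2.0 * math.pi
    return dphi

def pz(pt, eta):
    """Longitudinal (Z-direction) momentum from pt and eta."""
    return pt * math.sinh(eta)

def cartesian(pt, eta, phi):
    """(px, py, pz) components from the detector coordinates."""
    return (pt * math.cos(phi), pt * math.sin(phi), pz(pt, eta))
```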

Overlaying the feature distributions for signal and background is a common technique for visualizing features in experimental high-energy physics. For example, the jet-lepton eta open-angle distributions for the positive and negative samples can be visualized as follows:

Jet-lepton open angle distributions

2.3 Advanced physics features

If one reads the CMS/ATLAS Higgs-to-tau-tau analysis papers, one can find some advanced features for eliminating particular backgrounds. For example, this CMS talk covers the fundamental points of the Higgs-to-tau-tau search.

In the Higgs-to-tau-tau search, one of the most important backgrounds is the Z boson, which can decay into lepton pairs (l-l, tau-tau) and mimic the Higgs signal. Moreover, since the known Higgs mass of 126 GeV is close to the Z mass of 91 GeV, the tau-tau kinematics can be very similar for Z and Higgs.

The common technique for eliminating this background is reconstructing the Z invariant-mass peak around 91 GeV. The provided 'PRI' features only include the tau and lepton momenta, from which we cannot precisely reconstruct the Z invariant mass. However, the invariant-mass idea gives a hint: the pseudo transverse invariant mass can be a good feature, whose distribution looks like this:

Tau-lepton pseudo invariant mass
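One common way to write such a pseudo transverse invariant mass for a pair of (approximately massless) particles is the standard transverse-mass formula; this is a sketch of my own, not necessarily the exact feature used:

```python
import math

def pseudo_transverse_mass(pt1, phi1, pt2, phi2):
    """Transverse mass of a massless particle pair:
    mT^2 = 2 * pt1 * pt2 * (1 - cos(dphi))."""
    dphi = phi1 - phi2
    return math.sqrt(2.0 * pt1 * pt2 * (1.0 - math.cos(dphi)))
```

For a back-to-back pair (dphi = pi) with pt1 = pt2 = 45 GeV, this gives 90 GeV, close to the Z mass, which is what makes it a useful discriminating feature.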

The QCD and W+jet backgrounds are also important, where a jet can be mis-identified as a lepton/tau; however, these two have much lower cross-sections, so feature work on them is less important. Tau-jet pseudo-invariant-mass features are useful too.

Some other, less important features are:

  • For the tau-tau channel, the di-tau momentum sum can be roughly estimated by the tau-lep-jet combination, although it is far from the truth.
  • For 2-jets VBF channel, the jet-jet invariant mass should be high for signal.

Some comments about SVFIT and CAKE-A:

  • I know some teams (e.g. Lubos' team) used SVFIT, which is the CMS counterpart of ATLAS' MMC method (the DER_mass_MMC feature). SVFIT uses more variables and considers more effects than MMC, so it can reconstruct the Higgs invariant mass better. Deriving the SVFIT feature from the provided features is very hard, so I did not do it.
  • The CAKE-A feature is basically the likelihood that a mass peak belongs to the Z mass or the Higgs mass. The CAKE team claims it is a very complicated feature, and some positive reports exist on the leaderboard, so ATLAS should investigate this feature for their Higgs search model.

2.4 Feature selection

Gradient boosted tree methods are generally sensitive to confusing samples. In this particular problem of Higgs to tau-tau, many background events can have features very similar to the signal ones; thus, a good understanding of the features, and good feature selection, helps reduce confusion and build a better model.

The feature selection uses some logic and/or domain knowledge; I did not apply any automatic feature reduction techniques, e.g. PCA. The reason is that most such methods are designed to capture the maximum variance, while this competition is about maximizing the AMS score, so I was afraid automatic selection could accidentally drop important features.

Because the proton-proton collision happens along the longitudinal direction, the transverse plane is symmetric; thus the raw phi values of these particles are mostly useless and should be removed to reduce confusion in the boosting process. This is also the reason the ATLAS and CMS detectors are cylinders.

The raw tau and lep eta values do not help much either. The reason lies in the Higgs-to-tau-tau production mechanism, but one can easily see it from the symmetric distributions of these variables.

The raw jet eta values are important. The physics reason comes from the jet production mechanism, where the jets from the signal should be more central; but one can easily find this by overlaying the distributions of PRI_jet_(sub)leading_eta for signal and background, without physics knowledge.
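The selection logic of this section can be sketched as a simple column filter (the helper is hypothetical; the column names are from the competition dataset):

```python
# Drop all raw phi columns and the raw tau/lep eta columns;
# keep the jet eta columns, which carry signal information.
DROP_SUFFIXES = ('_phi',)
DROP_COLUMNS = {'PRI_tau_eta', 'PRI_lep_eta'}

def select_columns(columns):
    return [c for c in columns
            if not c.endswith(DROP_SUFFIXES) and c not in DROP_COLUMNS]

cols = ['PRI_tau_pt', 'PRI_tau_eta', 'PRI_tau_phi',
        'PRI_jet_leading_pt', 'PRI_jet_leading_eta', 'PRI_jet_leading_phi']
# keeps PRI_tau_pt, PRI_jet_leading_pt and PRI_jet_leading_eta
```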

2.5 Discussions

More advanced features or not? That is a question. I kept only physically meaningful features for the final submission. I experimented with some other tricky features, e.g. the transverse invariant mass weighted by the PT ratio, and some of them helped the score on the public LB; however, they showed no significant improvement in my CV score. To be safe, I spent the last 3 days before the deadline removing these 'tricky' features and keeping only the basic physics features (linear combinations) and the pseudo-invariant-mass features, in order not to overfit the public LB. After checking the private LB scores, I found some of them could have helped, but only a little. @Raising Spirit on the forum posted a feature, DER_mass_MMC*DER_pt_ratio_lep_tau/DER_sum_pt, and Lubos has a nice comment on whether it is a good idea.

CAKE feature effects. I used 3 test submissions for testing the CAKE-A and CAKE-B features. With CAKE A+B, both my public and private LB scores drop by around 0.01; with CAKE-A alone, my model score barely changes (down 0.0001); with CAKE-B alone, my model score improves by 0.01. I think this is because the CAKE-A feature may have strong correlations with my current features, while CAKE-B is essentially the MT2 variable in physics, which can help for the hadronic (jet) final state. I did not include these submissions among my final scoring ones, but thanks to the CAKE team for providing these features.

3. Conclusion and lessons learned

What I haven’t used:

  • A loss function using the AMS score: these two posts (post 1 and post 2) proposed AMS-based loss functions. XGBoost has a good interface for customized loss functions, but I did not have the chance to tune the parameters.
  • Tricky non-linear, non-physical features.
  • Vector PT summations of tau-lep and other particle pairs, and their open angles with other particles. They are physically meaningful, but my model did not pick them up :-(
  • One-hot categorization of the jet multiplicity (PRI_jet_num). Usually this categorization technique works well, since it increases the feature dimension for better separation, but not this time, maybe because of my model parameters.
  • Splitting models by PRI_jet_num. In a common Higgs analysis, the work is divided into num-of-jets categories, e.g. 0-jet, 1-jet, 2-jets, because each partition has a different physical meaning in Higgs production. XGBoost caught this partition nicely through the features, and it handles missing values gracefully.

Lesson learned

  • Automate the process: a script for the CV folds, a script for the full training + testing workflow, a script for parameter scans, and a library of feature builders so CV and training use consistent features. It saves a lot of time.
  • Discuss with team members and check the forum.
  • Renting a cloud machine is easier than buying one.
  • I should learn to ensemble classifiers for better scores in the future.
  • Spend enough time: studying the features and the model takes time. I paused my submissions for 2 months to prepare my O-1A visa application (I had no luck in this year's H-1B lottery, so I had to write 'a lot of a lot' for this visa instead) and only fully resumed about 2 weeks before the deadline, when my visa was approved. Since then my VM instance ran like crazy on feature work, model tuning and CV while I slept or worked. Fortunately, this work (plus the electricity bill paid to Google) paid off in the ranking.

4. Acknowledgment

I want to thank @Tianqi Chen (the author of XGBoost), @yr, @Bing Xu, and @Luboš Motl for their nice work and the discussions with them. I also want to thank Google Compute Engine for the $500 free credit for their cloud computers, so I did not have to buy my own machine and could just rent a 16-core VM.

5. Suggestions to CMS, ATLAS, ROOT and Kaggle

To CMS and ATLAS: in physics analyses, ML used to be called 'multivariate analysis' (MVA, which is why ROOT's ML package is called TMVA), and features were 'variates'. The method of selecting variates (features) came from the traditional cut-based analysis, where each variate had to be strongly physically meaningful so that it was explainable; for example, in CMS's tau-tau analysis, the 2-jet VBF channel required the jet-jet momentum to be greater than 110 GeV, etc. Carrying this cut-based selection idea into the MVA technique limits the feature work: features are carefully selected by physicists using their experience and knowledge, which is good but very limited. In this competition, I came up with some intuitive 'magic' features using my physics intuition, and the model + feature selection techniques from ML helped remove the nonsense ones as well as find good new features. So my suggestion to CMS and ATLAS on feature work is: we should introduce more ML techniques to help the feature/variate engineering and use the machine's power to find more good features.

To ROOT TMVA: XGBoost is a great example of parallelizing GBM learning. ROOT's current TMVA is single-threaded, which is slow; ROOT should bring similar ideas from XGBoost into the next version of TMVA.

To Kaggle: it might be a good idea to offer competitors sponsored computing credits from cloud providers, e.g. Google Cloud and Amazon AWS. It would remove the obstacle of computing resources for competitors, and it would also be good advertisement for the providers.

6. Our background

My teammate @dlp78 and I are both data scientists and physicists; we used to work on the CMS experiment. He worked on the Higgs->ZZ and Higgs->bb discovery channels, and I worked on the Higgs->gamma gamma discovery channel and some supersymmetry/exotic particle searches. My PhD dissertation was a search for a long-lived particle decaying into photons, with a method inspired by the tau discovery method; my advisor is one of the scientists who discovered the tau particle (she also discovered many others, for example: the Psi, the D0, the tau, jets, and the Higgs). My LinkedIn page is linked on my Kaggle profile page, and I would like to connect with you great Kaggle competitors.

The best photography spots in San Francisco: what can data tell you?

Yesterday morning, I was browsing Danny Dong's wedding photos and was amazed by his shots at the Golden Gate Bridge and San Francisco City Hall. That raised a question: what are the best photography spots in San Francisco, besides these two places? If we knew, photographers would not waste time searching for a good spot. To answer this question, I needed some data.

Fortunately, on SF OpenData, I found this dataset of film locations in San Francisco. About 200 movies have been filmed in San Francisco, from the 1950s to the present. My reasoning: if professionals shot films at some spots, those spots are probably great photography spots.

The addresses themselves were not enough; I wanted to put them on the map. Google Maps has a geocoding service that translates an address into a latitude/longitude location, for example:

Metro (1997), filmed at "Bush Street at Jones", can be translated into "Bush Street & Jones Street, San Francisco, CA 94109, USA", at location 37.7895596, -122.4137276.

With Python, this is easy. I installed geopy (via pip) and used this simple piece of code:

from geopy import geocoders
g = geocoders.GoogleV3()

# ...

location += ', San Francisco, CA'
# I modified geopy's source so geocode() returns multiple candidate
# locations when the address is ambiguous; take the first one.
place, (lat, lng) = g.geocode(location)[0]

We can use a Google Fusion Table to draw these locations on the map. It looks like this:

Using the filming location from 200+ famous movies filmed in SF, we can extract the best photography locations from these movie professionals.


One can click this link for the interactive map and go to the location (with streetview!): https://www.google.com/fusiontables/embedviz?q=select+col2+from+1nYwvzx2bNvANTwV03mQ-gge0imwW1iSz7WIxveg&viz=MAP&h=false&lat=37.801855527505346&lng=-122.43532960000005&t=1&z=13&l=col2&y=2&tmplt=2&hml=GEOCODABLE

However, SF is so beautiful that there are too many locations to read. A heat map works better. I used myheatmap and generated this hot-spot map of photography locations. The pier area looks super hot: quite consistent with my opinion.

Using the filming location from 200+ famous movies filmed in SF, we can extract the best photography locations from these movie professionals.


One can click this link for the details: http://www.myheatmap.com/maps/u6o5WNbTIrM= (one may need to change the 'decay' to a smaller value at the top right of the interactive map).

In short, we now know the best photography spots, from data. Enjoy photography in San Francisco.

A data scientist’s way to solve the 8809=6 problem

Last time, one of my colleagues posted this question. It is related to feature extraction of machine learning.

The question starts simple:

8809=6  7111=0  2172=0  6666=4  1111=0
3213=0  7662=2  9313=1  0000=4  2222=0
3333=0  5555=0  8193=3  8096=5  7777=0
9999=4  7756=1  6855=3  9881=5  5531=0
So, 2581=?

The answer itself is easy: the number of circles in each digit. For this kind of numeric puzzle, I think there should be a general solution that solves it automatically.

By checking the pattern of the equation list, one can guess that the answer is a weighted sum over digit patterns; so one can encode each number by the frequency of its digits, and the problem becomes a linear regression in a 10-dimensional (x0-x9) space.

For example, 8809=6 becomes 1000000021 = 6 when listing the frequency of each digit, where the 1, 2, 1 are the counts of '0', '8' and '9'; the weighted sum of the digit counts equals 6. The regression function is then a dot product with a weight vector, which tells us why 8+8+0+9 gives 6.
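The encoding step can be sketched as follows (the helper name is mine):

```python
def digit_frequencies(number_string):
    """Encode a number string as a 10-dimensional digit-frequency vector."""
    freq = [0] * 10
    for ch in number_string:
        freq[int(ch)] += 1
    return freq

# '8809' contains one '0', two '8's and one '9':
# digit_frequencies('8809') == [1, 0, 0, 0, 0, 0, 0, 0, 2, 1]
```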

Thus, one can solve it by linear regression. The residual function can be:

def __residual(params, xdata, ydata):
    # residual of the linear model: y - X.w (the model is a dot product)
    return (ydata - numpy.dot(xdata, params))

And one can use SciPy's least squares (scipy.optimize.leastsq) for the linear regression:

leastsq(__residual, x0, args=(xdata, ydata))

The full source code in python can be found at https://github.com/phunterlau/8809-6
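Putting the pieces together, here is a minimal end-to-end sketch. It uses numpy.linalg.lstsq rather than scipy's leastsq, which reaches the same least-squares solution for this linear system:

```python
import numpy

# the puzzle's equation list
puzzle = {'8809': 6, '7111': 0, '2172': 0, '6666': 4, '1111': 0,
          '3213': 0, '7662': 2, '9313': 1, '0000': 4, '2222': 0,
          '3333': 0, '5555': 0, '8193': 3, '8096': 5, '7777': 0,
          '9999': 4, '7756': 1, '6855': 3, '9881': 5, '5531': 0}

def freq(s):
    """10-dimensional digit-frequency vector."""
    v = [0] * 10
    for ch in s:
        v[int(ch)] += 1
    return v

X = numpy.array([freq(k) for k in puzzle], dtype=float)
y = numpy.array(list(puzzle.values()), dtype=float)
w, _, _, _ = numpy.linalg.lstsq(X, y, rcond=None)
# w[8] comes out ~2 (two circles); the unseen digit '4' stays ~0 here,
# since lstsq returns the minimum-norm solution for the unconstrained digit
answer = round(float(numpy.dot(freq('2581'), w)))
```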

Some discussions:

1. About number ‘4’

The output:

(array([ 1.00000000e+00, 4.75790411e-24, 4.75709632e-24, 4.75588463e-24, 1.00000000e+00, 4.75951970e-24, 1.00000000e+00, 4.75790411e-24, 2.00000000e+00, 1.00000000e+00]), 2)

where each value in the array is the fitted weight of the corresponding digit. One can see that '0', '4', '6' and '9' have weight 1, and '8' has weight 2.

However, there is a trap here, coming from the initial value:

x0=numpy.array([1,1,1,1,1,1,1,1,1,1])#initial guess

The initial guess gives every digit weight 1. Checking the input, the digit '4' never appears, so its weight is unconstrained by the data and simply stays at the initial guess; the reported weight 1 for '4' should be discarded.

2. Why so complicated?

Why is this associated with the topic of 'feature extraction' in machine learning?

If one is smart enough, one can tell that, the pattern is actually the number of circles in each digit, so ‘8’ has two circles and its weight is 2.

However, no one is smart all the time, and some patterns stay hidden. The procedure in this example is a kind of latent feature extraction: one does not have to know that the pattern is the number of circles in each digit; one can still learn that '8' has weight 2 using this automatic code.

Some interview questions for data scientists

Last time, one of my friends asked me for interview questions to test candidates for data-scientist jobs. I think it is good to share the questions. Later on, if I get some free time, I may post detailed solutions and discussions.

1-D Facebook (difficulty: medium, problem solving)

We live in a 3-D world, with x, y and z coordinates. In a 1-D world, there is only the x coordinate, and people can only see left and right. A startup social network company, call it '1-D-facebook.com', wants to build a 'find your friends' program for 1-D. In the 1-D world, a person has no size, just a dot (no need to worry about diets :-) ). The company has the location info of 1M people, represented as an unsorted 1-D float array of length 1M. Now, given a person at array index N, find their two closest (left and right) friends. The array can be very long (but fits in memory) and is unsorted, so sorting plus search works, but is not yet the optimal solution. Preprocessing is OK; there is no space limit.

Reference: this problem comes from some real-life cases. For example, it is the final step of Amazon's collaborative filtering for 'people who bought this also bought', after all the probability combinations are calculated (this problem uses distance, while Amazon uses division). Another example is a spelling corrector, which needs to find the closest-spelled words in a big dictionary, where the distance is the edit distance. A real spelling corrector is much trickier in its data structures; here I simplified edit distance to position difference. PS: Google's spelling correction uses Bayesian models instead of pure edit distance.

2-D Facebook (difficulty: hard, problem solving + algorithm)

Since you have solved 1-D Facebook, a 2-D-world Facebook company, 'flat-land.com', wants to hire you to build the same program for 2-D people: each person, at position (x, y), can see left/right and up/down, and wants to find their 4 closest friends. It does not have to be one friend per direction; any direction is OK.

Reference: this is a real Facebook problem (not an interview, no time limit) called 'small world', from Facebook Puzzles (the page has since been taken down by Facebook).

DVD Auto-classification (difficulty: medium, problem solving + machine learning)

A DVD-rental company wants to beat Netflix, so it wants a smart machine learning algorithm to auto-classify DVD movies rather than labeling all of them manually. Fortunately, it only hosts romantic movies and action movies, so things are easier than at Netflix. The company observed that in romantic movies people kiss a lot, and in action movies people fight a lot, but romantic movies can have fights too. Can you use this information to build a classifier that tells whether a new movie is action or romantic?

Reference: from the book 'Machine Learning in Action'. It also comes from a real problem in my current project, simplified to numerical features.

Super long feature similarity (difficulty: medium-hard, programming+machine learning)

Some machine learning models produce a list of features for two soft drinks, for example the sugar content, etc. One wants to compare the similarity of the two drinks using these features; how? (The interviewee should mention cosine similarity, dot product, or some other distance function to compare two feature vectors.)

Let's take cosine similarity as an example. In the real situation, the models produce millions of features, and some drinks are missing many of them; in other words, the feature vectors are very sparse. So, if we want to compare two drinks with sparse features, where one drink has many features the other lacks, do we really need to multiply all those zero values?

Calculate square-root of integer N. (difficulty: medium-hard. Numerical methods and programming)

This question can have some variations:

  • (easy) How to tell whether an integer N is a perfect square (N = k*k where k is an integer).
  • (medium) Given a very large integer N, find the number m such that m*m <= N and (m+1)*(m+1) > N.
  • (hard, needs a hint) How to determine whether a number is a Fibonacci number? The hint should be given by the interviewer: N is a Fibonacci number if and only if 5*N**2+4 or 5*N**2-4 is a perfect square, so simply test those two values.
  • (medium-hard) How to determine whether a number is the sum of two perfect squares?

What if N is very large, and one can not build a table of square numbers?

Essay-copying (difficulty: medium-hard, NLP, machine learning, modeling)

In the final exam of the university, the professor received 200 essays from students, about 1000 words each. Unfortunately, he found some students copying other people's essays. But these students were smart: they did not copy entire essays; they might change words in some sentences, and might copy from 2-3 other people (but surely not from all of the other 200 students; not enough time :-) ). Please build a machine learning system to help the professor find these bad students.

Reference: clustering with natural language processing is very important in real life. This is one example.


Check if a number is a Fibonacci number: speed comparison for two Quora answers in python

It is interesting to read this Quora thread, "What is the most efficient algorithm to check if a number is a Fibonacci Number?". The top two answers are both very good and efficient.

Background: what is a Fibonacci number? See Wikipedia.

Anders gave a nice matrix-exponentiation algorithm in Haskell, while John posted his Java solution using perfect squares. There was an argument about which one was faster, so I made this comparison using Python.

First of all, John needed a faster algorithm to determine whether a number is a perfect square. He looped from 2 to sqrt(N), which is surely not an efficient way.

A better way is Newton's method for the integer square root (Wikipedia); in short, it amounts to using Newton's method to solve x*x - n = 0 over the integers. In Python, it can be written like this (original code from this link):

def isqrt(x):
    # integer square root via Newton's method on x*x - n = 0
    if x < 0:
        raise ValueError('square root not defined for negative numbers')
    n = int(x)
    if n == 0:
        return 0
    # initial guess: the power of two just above sqrt(n)
    a, b = divmod(n.bit_length(), 2)
    x = 2**(a+b)
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y

Then, John's method is simply testing whether 5*N*N+4 or 5*N*N-4 is a perfect square. If either is, the number is a Fibonacci number.
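John's test can be sketched in a few lines; here I use Python 3's math.isqrt instead of the hand-rolled Newton isqrt above:

```python
import math

def is_perfect_square(n):
    r = math.isqrt(n)
    return r * r == n

def is_fib(n):
    # n is a Fibonacci number iff 5n^2+4 or 5n^2-4 is a perfect square
    return is_perfect_square(5 * n * n + 4) or is_perfect_square(5 * n * n - 4)
```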

Actually the square-root algorithm can be optimized further using the 64-bit magic number 0x5fe6eb50c7b537a9 (the fast inverse-square-root trick popularized by Quake III); see Wikipedia for the interesting details. To stay platform independent, I just used the plain Newton's method here.

Secondly, Anders' code was in Haskell, so I rewrote it in Python for a fair comparison.

def fibPlus((a, b), (c, d)):
    bd = b*d
    return (bd - (b-a)*(d-c), a*c + bd)

def unFib((a, b), n):
    if n < a:
        return (0, 0, 1)
    else:
        (k, c, d) = unFib(fibPlus((a, b), (a, b)), n)
        (e, f) = fibPlus((a, b), (c, d))
        if n < e:
            return (2*k, c, d)
        else:
            return (2*k + 1, e, f)

def isfib(n):
    (k, a, b) = unFib((1, 1), n)
    return n == a

The full source code can be found on my github https://github.com/phunterlau/checkFibonacci

To test the two algorithms, I downloaded the first 500 Fibonacci numbers from http://planetmath.org/listoffibonaccinumbers and ran each algorithm 100 times over the list. The result is interesting: the Python runtime makes the difference. Units are seconds.

Run with CPython 2.7, John's method wins by about 10%:

python is_fib.py
Anders method: 1.52931690216
John method: 1.36000704765

Run with PyPy 1.9, Anders' method is heavily optimized and about 2x faster:

pypy is_fib.py
Anders method: 0.799499988556
John method: 2.0126721859

To conclude, both algorithms are very good.

One more question to follow up:

Given a number N, if N is not a Fibonacci number, print out the largest Fibonacci smaller than N.

Hint: John’s method.

[challenge accepted] palindromic prime number of 5 digits

Thanks to the first comment on the last post, which directed me to a math problem (http://contestcoding.wordpress.com/2013/03/22/palindromic-primes/). From a scientist's point of view, the answer itself, 98689, is simple, but the way to reach it is interesting. Let's look, step by step, at how a scientist solves it.

1. How to generate a list of palindromic number?

'A palindromic number is a number that reads the same backwards as forwards (1991, for example).' One can iterate over all numbers and check whether each is palindromic:

str(num)==str(num)[::-1]

Alternatively, a smarter way is to generate them directly:

from itertools import product

def palindromeNum(n):
   return [n*'%s'%tuple(list(i)+list(i[n*(n-1)/2%(n-1)-1::-1])) for i in product(*([range(1,10)]+[range(0,10)]*((n+1)/2-1)))]

2. Now that we have a list of palindromic numbers, how do we determine whether each one is prime?

Simple: the is_prime() function from textbooks works fine; one can loop from 2 to the square root of N and test divisibility. But let me introduce a better way, the Miller-Rabin primality test, a pro's way.

wikipedia: http://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test

In general, it is a faster and more efficient way to test whether a number is prime. Other methods include the AKS primality test (Wikipedia) and so on. They are good to know :-)

There is a nice Python implementation of Miller-Rabin here; one can have a look at this beautiful algorithm.
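For reference, here is a compact deterministic variant, a sketch of my own rather than the linked implementation; the fixed base set below is known to be sufficient for 64-bit integers:

```python
def miller_rabin(n):
    """Deterministic Miller-Rabin for n < 3.3e24 using fixed witnesses."""
    if n < 2:
        return False
    small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small_primes:
        if n % p == 0:
            return n == p
    # write n-1 = d * 2^r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in small_primes:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```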

So, we loop from the end of the palindromic-number list generated in step 1 and test each candidate with the Miller-Rabin primality test; we easily find the answer, 98689.

3. Some additional thoughts

  • The problem asked about 5-digit palindromic primes. Such primes also exist for 3 digits, 7 digits, 9 digits and so on, but never for 4 digits or 6 digits. Can you guess why? Hint: something about the number 11.
  • There is a nice page, http://oeis.org/A114835, for the set of palindromic primes, and some other interesting sets of numbers there.
  • Did you know that the largest 11-digit palindromic prime is 99999199999, and the largest 13-digit one is 9999987899999?

Source code:

Actually the function palindromeNum() can be optimized with a generator and yield, which computes values lazily rather than unnecessarily generating the whole list.

from itertools import product
import miller_rabin

def palindromeNum(n):
    return [n*'%s'%tuple(list(i)+list(i[n*(n-1)/2%(n-1)-1::-1])) for i in product(*([range(1,10)]+[range(0,10)]*((n+1)/2-1)))]

for i in palindromeNum(5)[::-1]:
    if miller_rabin.miller_rabin(int(i)):
        print i
        break

A scientist’s ‘attitude’

The source code and data file can be found at https://github.com/phunterlau/attitude; please let me know before using it.

You might have heard this piece of junk about 'attitude' many times; now a real scientist will show you the secret behind it.

First of all, it is like this:

A small truth to make our Life 100% successful.. ……..

If A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
is equal to 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Then H+A+R+D+W+O+ R+K = 8+1+18+4+23+ 15+18+11 = 98%

K+N+O+W+L+E+ D+G+E = 11+14+15+23+ 12+5+4+7+ 5 = 96%

L+O+V+E=12+15+ 22+5=54%

L+U+C+K = 12+21+3+11 = 47%

(None of them makes 100%)
………… ……… ……… .
Then what makes 100%
Is it Money? ….. No!!!!!
Leadership? …… NO!!!!

Every problem has a solution, only if we perhaps change our “ATTITUDE”.
It is OUR ATTITUDE towards Life and Work that makes
OUR Life 100% Successful..

A+T+T+I+T+U+ D+E = 1+20+20+9+20+ 21+4+5 = 100%

Well, it might be true. But is 'attitude' the only word that makes 100%? Let's have a look.

What you need is Python and a list of English words. Some googling found this nice list of 109,582 English words: http://www-01.sil.org/linguistics/wordlists/english/ .

Open the list with Python, and type in this piece of code:

print ','.join([x.strip() for x in words_f.readlines() if sum([ord(i)-96 for i in x.strip()])==100])

We easily get 1297 English words that make 100%. Besides 'attitude', we also have, for example:

alienation,inefficient,infernos……
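The one-liner unpacked into a small reusable function (a Python 3 sketch; the helper name is mine):

```python
def word_score(word):
    """Sum of letter values a=1 .. z=26."""
    return sum(ord(ch) - 96 for ch in word.lower())

# word_score('attitude') == 100, word_score('hardwork') == 98,
# word_score('knowledge') == 96, word_score('love') == 54
```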

This is a scientist's 'attitude': never trust 100%. There are things in life much more important than 100%, just like 'love' = 54%.