Tag Archives: wilmott

My columns published in the Wilmott Magazine – a well-known publication targeted at quantitative finance professionals.

Benford and Your Taxes

Nothing is certain but death and taxes, they say. On the death front, we are making some inroads with all our medical marvels, at least in postponing it if not actually avoiding it. But when it comes to taxes, we have no defense other than a bit of creativity in our tax returns.

Let’s say Uncle Sam thinks you owe him $75k. In your honest opinion, the fair figure is about the $50k mark. So you comb through your tax-deductible receipts. After countless hours of hard work, you bring the number down to, say, $65k. As a quant, you can estimate the probability of an IRS audit. And you can put a number (an expectation value in dollars) to the pain and suffering that can result from it.

Let’s suppose that you calculate the risk of a tax audit to be about 1% and decide that it is worth the risk to get creative in your deduction claims to the tune of $15k. You send in the tax return and sit tight, smug in the knowledge that the odds of your getting audited are fairly slim. You are in for a big surprise. You will get well and truly fooled by randomness, and the IRS will almost certainly want to take a closer look at your tax return.

The calculated creativity in tax returns seldom pays off. Your calculations of expected pain and suffering are never consistent with the frequency with which the IRS audits you. The probability of an audit is, in fact, much higher if you try to inflate your tax deductions. You can blame Benford for this skew in probability stacked against you.

Skepticism

Benford presented something very counter-intuitive in his article [1] in 1938. He asked the question: What is the distribution of the first digits in any numeric, real-life data? At first glance, the answer seems obvious. All digits should have the same probability. Why would there be a preference for any one digit in random data?

Figure 1. The frequency of occurrence of the first digits in the notional amounts of financial transactions. The purple curve is the predicted distribution. Note that the slight excesses at 1 and 5 above the purple curve are expected because people tend to choose notionals like 1/5/10/50/100 million. The excess at 8 is also expected because it is considered a lucky number in Asia.

Benford showed that the first digit in a “naturally occurring” number is much more likely to be 1 than any other digit. In fact, each digit has a specific probability of being in the first position. The digit 1 has the highest probability; the digit 2 is about 40% less likely to be in the first position, and so on. The digit 9 has the lowest probability of all; it is about six times less likely than 1 to be in the first position.

When I first heard of this first digit phenomenon from a well-informed colleague, I thought it was weird. I would have naively expected to see roughly the same frequency of occurrence for all digits from 1 to 9. So I collected a large amount of financial data, about 65000 numbers (as many as Excel would permit), and looked at the first digit. I found Benford to be absolutely right, as shown in Figure 1.

The probability of the first digit is pretty far from uniform, as Figure 1 shows. The distribution is, in fact, logarithmic. The probability of any digit d being the first digit is given by log10(1 + 1/d), where log10 is the base-10 logarithm. This is the purple curve in Figure 1.
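To make the numbers concrete, here is a minimal sketch that tabulates the formula with a base-10 logarithm. The function name `benford_probability` is my own choice for illustration; it is not taken from Benford's paper.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the first digit is d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("the first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the predicted frequency of each first digit.
    for d in range(1, 10):
        print(f"{d}: {benford_probability(d):.3f}")
```

The nine probabilities add up to one, since the sum telescopes to log10(10).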

This skewed distribution is not an anomaly in the data that I happened to look at. It is the rule in any “naturally occurring” data. It is Benford’s law. Benford collected a large amount of naturally occurring data (including population, areas of rivers, physical constants, numbers from newspaper reports and so on) and showed that this empirical law is respected.

Simulation

As a quantitative developer, I tend to simulate things on a computer with the hope that I may be able to see patterns that will help me understand the problem. The first question to be settled in the simulation is to figure out what the probability distribution of a vague quantity like “naturally occurring numbers” would be. Once I have the distribution, I can generate numbers and look at the first digits to see their frequency of occurrence.

To a mathematician or a quant, there is nothing more natural than the natural logarithm. So the first candidate distribution for naturally occurring numbers is something like RV exp(RV), where RV is a uniformly distributed random variable (between zero and ten). The rationale behind this choice is an assumption that the number of digits in naturally occurring numbers is uniformly distributed between zero and an upper limit.

Indeed, you can choose other, fancier distributions for naturally occurring numbers. I tried a couple of other candidate distributions using two uniformly distributed (between zero and ten) random variables RV1 and RV2: RV1 exp(RV2) and exp(RV1+RV2). All these distributions turn out to be good guesses for naturally occurring numbers, as illustrated in Figure 2.
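A rough sketch of such a simulation might look like the following. It is not the code behind the figures; the sample size, the seed, and all function names are assumptions made for illustration. The three generators follow the candidate distributions described above.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def rv_exp_rv(rng: random.Random) -> float:
    rv = rng.uniform(0, 10)
    return rv * math.exp(rv)


def rv1_exp_rv2(rng: random.Random) -> float:
    rv1 = rng.uniform(0, 10)
    rv2 = rng.uniform(0, 10)
    return rv1 * math.exp(rv2)


def exp_rv1_plus_rv2(rng: random.Random) -> float:
    return math.exp(rng.uniform(0, 10) + rng.uniform(0, 10))


CANDIDATES = {
    "RV exp(RV)": rv_exp_rv,
    "RV1 exp(RV2)": rv1_exp_rv2,
    "exp(RV1+RV2)": exp_rv1_plus_rv2,
}


def simulate(generator, n: int = 100_000, seed: int = 0) -> dict:
    """Observed first-digit frequencies for n draws from a generator."""
    rng = random.Random(seed)
    counts = Counter()
    drawn = 0
    while drawn < n:
        x = generator(rng)
        if x <= 0:  # a draw of exactly zero has no first digit; skip it
            continue
        counts[first_digit(x)] += 1
        drawn += 1
    return {d: counts[d] / n for d in range(1, 10)}


if __name__ == "__main__":
    for name, generator in CANDIDATES.items():
        freq = simulate(generator)
        print(name)
        for d in range(1, 10):
            print(f"  {d}: simulated {freq[d]:.3f}, predicted {math.log10(1 + 1 / d):.3f}")
```

Reading the first digit from the formatted string is a convenience, not a necessity; dividing by the appropriate power of ten would work just as well.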

Figure 2. The distribution of the first digits in the simulation of “naturally occurring” numbers, compared to the prediction.

The first digits of the numbers that I generated follow Benford’s law to an uncanny degree of accuracy. Why does this happen? One good thing about computer simulation is that you can dig deeper and look at intermediate results. For instance, in our first simulation with the distribution RV exp(RV), we can ask the question: What are the values of RV for which we get a certain first digit? The answer is shown in Figure 3a. Note that the ranges in RV that give the first digit 1 are much larger than those that give 9. About six times larger, in fact, as expected. Notice how the pattern repeats itself as the simulated natural numbers “roll over” from the first digit of 9 to 1 (like an odometer tripping).
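The ranges can also be worked out without random numbers. Since RV exp(RV) increases steadily with RV, we can solve for the values of RV at which the first digit changes. The sketch below does this by bisection; the helper names and the cut-off at values below 10^-6 (whose ranges are negligibly small) are my assumptions for illustration.

```python
import math


def solve_rv(target: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Find rv in [lo, hi] with rv * exp(rv) == target by bisection.

    rv * exp(rv) is increasing for rv >= 0, so bisection is safe here.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def digit_ranges(digit: int, upper: float = 10.0) -> list:
    """Intervals of rv in (0, upper] where rv * exp(rv) has the given first digit."""
    max_value = upper * math.exp(upper)
    ranges = []
    # Decades below 1e-6 are ignored; their ranges in rv are tiny.
    for k in range(-6, math.floor(math.log10(max_value)) + 1):
        start_value = digit * 10.0 ** k
        end_value = (digit + 1) * 10.0 ** k
        if start_value >= max_value:
            break
        start = solve_rv(start_value, 0.0, upper)
        end = solve_rv(min(end_value, max_value), 0.0, upper)
        ranges.append((start, end))
    return ranges


def total_length(digit: int) -> float:
    """Total length of the rv ranges that give the specified first digit."""
    return sum(end - start for start, end in digit_ranges(digit))


if __name__ == "__main__":
    for d in (1, 9):
        print(f"first digit {d}: total range in RV = {total_length(d):.3f}")
        for start, end in digit_ranges(d):
            print(f"    {start:.4f} to {end:.4f}")
```

Each pass through the loop handles one "roll over" of the odometer, that is, one power of ten in the simulated number.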

Figure 3a. The ranges in a uniformly distributed (between 0 and 10) random variable RV that result in different first digits in RV exp(RV). Note that the first digit of 1 occurs much more frequently than the rest, as expected.

A similar trend can be seen in our fancier simulation with two random variables. The regions in their joint distributions that give rise to various first digits in RV1 exp(RV2) are shown in Figure 3b. Notice the large swathes of deep blue (corresponding to the first digit of 1) and compare their area to the red swathes (for the first digit 9).

Figure 3b. The regions in the joint distribution of two uniformly distributed (between 0 and 10) random variables RV1 and RV2 that result in different first digits in RV1 exp(RV2).

This exercise gives me the insight I was hoping to glean from the simulation. The reason for the preponderance of smaller digits in the first position is that the distribution of naturally occurring numbers is usually a tapering one; there is usually an upper limit to the numbers, and as you get closer to the upper limit, the probability density becomes smaller and smaller. As you pass the first digit of 9 and then roll over to 1, suddenly its range becomes much bigger.

While this explanation is satisfying, the surprising fact is that it doesn’t matter how the probability of natural distributions tapers off. It is almost like the central limit theorem. Of course, this little simulation is no rigorous proof. If you are looking for a rigorous proof, you can find it in Hill’s work [3].

Fraud Detection

Although our tax evasion troubles can be attributed to Benford, the first digit phenomenon was originally described in an article by Simon Newcomb [2] in the American Journal of Mathematics in 1881. It was rediscovered by Frank Benford in 1938, to whom all the glory (or the blame, depending on which side of the fence you find yourself) went. In fact, the real culprit behind our tax woes may have been Theodore Hill. He brought the obscure law to the limelight in a series of articles in the 1990s. He even presented a statistical proof [3] for the phenomenon.

In addition to causing our personal tax troubles, Benford’s law can play a crucial role in many other fraud and irregularity checks [4]. For instance, the first digit distribution in the accounting entries of a company may reveal bouts of creativity. Employee reimbursement claims, check amounts, salary figures, grocery prices — everything is subject to Benford’s law. It can even be used to detect market manipulations because the first digits of stock prices, for instance, are supposed to follow the Benford distribution. If they don’t, we have to be wary.
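As an illustration of how simple such a check can be, here is a minimal sketch that compares observed first-digit counts with the Benford prediction using a chi-squared statistic. The choice of test, the 5% significance level and the function names are my assumptions; this is not a description of what any auditor or tax authority actually runs.

```python
import math
from collections import Counter

# Chi-squared critical value for 8 degrees of freedom at the 5% level
# (nine digit bins minus one). The 5% level is an assumed threshold.
CHI2_CRITICAL_5PCT_8DOF = 15.507


def first_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    return int(f"{abs(x):.15e}"[0])


def benford_chi2(amounts) -> float:
    """Chi-squared distance between observed first digits and Benford's law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2


def looks_suspicious(amounts) -> bool:
    """True if the first digits deviate from Benford's law at the assumed 5% level."""
    return benford_chi2(amounts) > CHI2_CRITICAL_5PCT_8DOF


if __name__ == "__main__":
    # Amounts spread evenly on a log scale follow Benford's law closely.
    natural = [10 ** (6 * i / 1000) for i in range(1000)]
    # Amounts picked evenly between 100 and 999 have uniform first digits.
    invented = list(range(100, 1000))
    print("natural:", looks_suspicious(natural))
    print("invented:", looks_suspicious(invented))
```

A chi-squared test needs a reasonably large sample to be meaningful, so a check like this suits ledgers with hundreds of entries rather than a handful of receipts.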

Moral

Figure 4. The joint distribution of the first and second digits in a simulation, showing correlation effects.

The moral of the story is simple: Don’t get creative in your tax returns. You will get caught. You might think that you can use this Benford distribution to generate a more realistic tax deduction pattern. But this job is harder than it sounds. Although I didn’t mention it, there is a correlation between the digits. The probability of the second digit being 2, for instance, depends on what the first digit is. Look at Figure 4, which shows the correlation structure in one of my simulations.
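The dependence between the digits can be seen in the standard two-digit generalization of Benford's law, in which the probability that a number starts with the digits d1 d2 is log10(1 + 1/(10 d1 + d2)). The sketch below uses that theoretical formula, not my simulation, and the function names are chosen only for illustration.

```python
import math


def joint_probability(d1: int, d2: int) -> float:
    """Probability that the first two digits are d1 (1-9) and d2 (0-9)."""
    return math.log10(1 + 1 / (10 * d1 + d2))


def second_digit_given_first(d2: int, d1: int) -> float:
    """Conditional probability of the second digit d2 given the first digit d1."""
    return joint_probability(d1, d2) / math.log10(1 + 1 / d1)


if __name__ == "__main__":
    # The chance of a second digit of 2 shifts with the first digit.
    for d1 in (1, 5, 9):
        print(f"P(second digit = 2 | first digit = {d1}) = "
              f"{second_digit_given_first(2, d1):.4f}")
```

A second digit of 2 comes out somewhat more likely after a first digit of 1 than after a first digit of 9, which is exactly the kind of structure that a naively invented set of numbers would miss.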

Besides, the IRS system is likely to be far more sophisticated. For instance, they could be using advanced data mining or pattern recognition systems such as neural networks or support vector machines. Remember that the IRS has labeled data (tax returns of those who unsuccessfully tried to cheat, and those of good citizens) and they can easily train classifier programs to catch budding tax evaders. If they are not using these sophisticated pattern recognition algorithms yet, trust me, they will, after seeing this article. When it comes to taxes, randomness will always fool you because it is stacked against you.

But seriously, Benford’s law is a tool that we have to be aware of. It may come to our aid in unexpected ways when we find ourselves doubting the authenticity of all kinds of numeric data. A check based on the law is easy to implement and hard to circumvent. It is simple and fairly universal. So, let’s not try to beat Benford; let’s join him instead.

References
[1] Benford, F. “The Law of Anomalous Numbers.” Proc. Amer. Phil. Soc. 78, 551-572, 1938.
[2] Newcomb, S. “Note on the Frequency of the Use of Digits in Natural Numbers.” Amer. J. Math. 4, 39-40, 1881.
[3] Hill, T. P. “A Statistical Derivation of the Significant-Digit Law.” Stat. Sci. 10, 354-363, 1996.
[4] Nigrini, M. “I’ve Got Your Number.” J. Accountancy 187, pp. 79-83, May 1999. http://www.aicpa.org/pubs/jofa/may1999/nigrini.htm.

Photo by LendingMemo

Quant Life in Singapore

Singapore is a tiny city-state. Despite its diminutive size, Singapore has considerable financial muscle. It has been rated the fourth most active foreign exchange trading hub, and a major wealth management center in Asia, with funds amounting to almost half a trillion dollars, according to the Monetary Authority of Singapore. This mighty financial clout has its origins in a particularly pro-business atmosphere, world class (well, better than world class, in fact) infrastructure, and the highly skilled, cosmopolitan workforce–all of which Singapore is rightfully proud of.

Among the highly skilled workforce are scattered a hundred or so typically timid and self-effacing souls with bulging foreheads and dreamy eyes behind thick glasses. They are the Singaporean quants, and this short article is their story.

Quants command enormous respect for their intellectual prowess and mathematical knowledge. With flattering epithets like “rocket scientists” or simply “the brain,” quants silently go about their jobs of validating pricing models, writing C++ programs and developing complicated spreadsheet solutions.

But knowledge is a tricky thing to have in Asia. If you are known for your expertise, it can backfire on you at times. Unless you are careful, others will take advantage of your expertise and dump their responsibilities on you. You may not mind it as long as they respect your expertise. But, they often hog the credit for your work and present their ability to evade work as people management skills. And people managers (who may not actually know much) do get better compensated. This paradox is a fact of quant life in Singapore. The admiration that quants enjoy does not always translate to riches here.

This disparity in compensation may be okay. Quants are not terribly interested in money for one logical reason–in order to make a lot of it, you have to work long hours. And if you work long hours, when do you get to spend the money? What does it profit a man to amass all the wealth in the world if he doesn’t have the time to spend it?

Besides, quants seem to play by a different set of rules. They are typically perfectionists by nature. At least, I am, when it comes to certain aspects of work. I remember once when I was writing my PhD thesis, I started the day at around nine in the morning and worked all the way past midnight with no break. No breakfast, lunch or dinner. I wasn’t doing ground-breaking research on that particular day, just trying to get a set of numbers (branching ratios, as they were called) and their associated errors consistent. Looking back at it now, I can see that one day of starvation was too steep a price to pay for the consistency.

Similar bouts of perfectionism might grip some of us from time to time, forcing us to invest inordinate amounts of work for incremental improvements, and propelling us to higher levels of glory. The frustrating thing from the quants’ perspective is when the glory gets hogged by a middle-level people manager. It does happen, time and again. The quants are then left with little more than their flattering epithets.

I’m not painting all people managers with the same unkindly stroke; not all of them have been seduced by the dark side of the force. But I know some of them who actively hone their ignorance as a weapon. They plead ignorance to pass their work on to other unsuspecting worker bees, including quants.

The best thing a quant can hope for is fair compensation for his hard work. Money may not be important in and of itself, but what it says about you and your station in the corporate pecking order may be of interest. Empty epithets are cheap, but when it comes to showing real appreciation, hard cash is what matters, especially in our line of work.

Besides, corporate appreciation breeds confidence and a sense of self-worth. I feel that confidence is lacking among Singaporean quants. Some of them are really among the cleverest people I have met. And I have traveled far and wide and met some very clever people indeed. (Once I was in a CERN elevator with two Nobel laureates, as I will never tire of mentioning.)

This lack of confidence, and not lack of expertise or intelligence, is the root cause behind the dearth of quality work coming out of Singapore. We seem to keep ourselves happy with fairly mundane and routine tasks of implementing models developed by superior intelligences and validating the results.

Why not take a chance and dare to be wrong? I do it all the time. For instance, I think that there is something wrong with a Basel II recipe and I am going to write an article about it. I have published a physics article in a well-respected physics journal implying, among other things, that Einstein himself may have been slightly off the mark! See for yourself at http://TheUnrealUniverse.com.

Asian quants are the ones closest to the Asian market. For structures and products specifically tailored to this market, how come we don’t develop our own pricing models? Why do we wait for the Mertons and Hulls of the world?

In our defense, maybe some of the confident ones who do develop pricing models move out of Asia. The CDO guru David Li is a case in point. But, on the whole, the intellectual contribution to modern quantitative finance looks disproportionately lopsided in favor of the West. This may change in the near future, when the brain banks in India and China open up and smell blood in this niche field of ours.

Another quality that is missing among us Singaporean practitioners is an appreciation of the big picture. Clichés like the “Big Picture” and the “Value Chain” have been overused by the aforementioned middle-level people managers on techies (a category of dubious distinction into which we quants also fall, to our constant chagrin) to devastating effect. Such phrases have rained terror on techies and quants and relegated them to demoralizing assignments with challenges far below their intellectual potential.

Maybe it is a sign of my underestimating the power of the dark side, but I feel that the big picture is something we have to pay attention to. Quants in Singapore seem to do what they are asked to do. They do it well, but they do it without questioning. We should be more aware of the implications of our work. If we recommend Monte Carlo as the pricing model for a certain option, will the risk oversight manager be in a pickle because his VaR report takes too long to run? If we suggest capping methods to renormalize divergent sensitivities of certain products due to discontinuities in their payoff functions, how will we affect the regulatory capital charges? Will our financial institution stay compliant? Quants may not be expected to know all these interconnected issues. But an awareness of such connections may add value (gasp, another managerial phrase!) to our office in the organization.

For all these reasons, we in Singapore end up importing talent. This practice opens up another can of polemic worms. Are they compensated a bit too fairly? Do we get blinded by their impressive labels, while losing sight of their real level of talent? How does the generous compensation scheme for the foreign talents affect the local talents?

But these issues may be transitory. The Indians and Chinese are waking up, not just in terms of their economies, but also by unleashing their tremendous talent pool in an increasingly globalizing labor market. They (or should I say we?) will force a rethinking of what we mean when we say talent. The trickle of talent we see now is only the tip of the iceberg. Here is an illustration of what is in store, from a BBC report citing the Royal Society of Chemistry.

China Test
National test set by Chinese education authorities for pre-entry students.

As shown in the figure, in square prism $ABCD-A_1B_1C_1D_1$, $AB = AD = 2$, $DC = 2\sqrt{3}$, $AA_1 = \sqrt{3}$, $AD \perp DC$, $AC \perp BD$, and the foot of the perpendicular is $E$.

  1. Prove: $BD \perp A_1C$
  2. Determine the angle between the two planes $A_1BD$ and $BC_1D$
  3. Determine the angle formed by lines $AD$ and $BC_1$, which are in different planes.

UK Test
Diagnostic test set by an English university for first-year students.

In the diagram (not drawn to scale), angle ABC is a right angle, AB = 3 m, BC = 4 m.

  1. What is the length AC?
  2. What is the area of triangle ABC (above)?
  3. What is the tan of the angle ABC (above) as a fraction?

The end result of such demanding pre-selection criteria is beginning to show in the quality of the research papers coming out of the selected ones, both in China and India. This talent show is not limited to fundamental research; applied fields, including our niche of quantitative finance, are also getting a fair dose of this oriental medicine.

Singapore will only benefit from this regional infusion of talent. Our young nation has an equally young (professionally, that is) quant team. We will have to improve our skills and knowledge. And we will need to be more vocal and assertive before the world notices us and acknowledges us. We will get there. After all, we are from Singapore–an Asian tiger used to beating the odds.

Photo by hslo