Road to PhD

Saturday, May 8, 2010

First week at TD Securities

After one term of no blogging, I'm still feeling lazy to write down stuff. Even today, as I gather of my will to write something, facing my huge window of a 48th floor condo with a view on Lake Ontario and the CN Tower, I really don't feel like putting in the pain to format the following math stuff in LaTeX...

For this work term, I had the choice between being a "Financial Software Developer" at Bloomberg, NY, or being a "Quantitative Financial Engineer" at TD Securities, TO, in the Equity Derivatives group. I ditched Bloomberg. Best choice ever.

The first day at TD Securities was pretty awful. Only one of my two bosses came in to work, and he was the one introducing me around. But since he was kind of socially awkward, I didn't get to know anybody really. Very quickly we sat down at our desks, with me not even knowing where the washrooms or the kitchen are. The environment is pretty nice. Not for anyone though. Completely open space on the trading floor with phones ringing, traders screaming, beeping here and there, I sat in front of my dual screens, probably 24 inch each, with two other screens for a Bloomberg terminal. My boss, W, introduced me to the whole software framework. Base code in C++, wrappers to Excel, Murex, Polaris, and I don't know what else. Without any background in "industrial coding", I look incredibly stupid listening to W and simply nodding. He showed me various APIs, version control with WinCVS, debugging and all those features with Visual C++, and WinMerge for differencing code. Holy crap that was information overload... Good thing he then talked to me about something slightly more recognizable after a few hours of me clicking around the codebase. He showed me the problem the last coop was working on maybe three weeks before I came in:

All valid correlation matrices are positive definite, but with inevitable errors in the data, or maybe someone fiddling with the data for experimentations, a "correlation matrix" doesn't always turn out to be positive definite. What do we do in those cases?

Since the set of correlations generated a non positive definite matrix, the last coop tried to find the maximal subset of correlations such that the resulting matrix was still positive definite. I didn't go through his code in depth, but it seems that he was using a Cholesky decomposition on a larger and larger set of correlations, and when it didn't work (A matrix is Cholesky decomposable iff it is positive semidefinite and symmetric) he would swap the last stock and tried another set, until he couldn't add more correlations to the matrix.

For the rest of the day I worked on this problem. I kept the idea of trying to find the maximal subset that still generates a positive definite matrix, but I found the way he was trying to find such subset was incredibly brute-force-stupid, and it wouldn't give any insight on "why" the matrix was not positive definite.

Second day comes. My other boss D, the one who interviewed me, arrives. He says hi and goes on working at his desk, next to W who is next to me. Great... I continue trying to familiarize myself with the code for another 3-4 hours, have lunch at my desk, and move on to the correlation problem.

After finding inspiration from teh internetz, and fiddling around with Matlab, I came up with an intuitive method.

Given a non positive definite correlation matrix S (symmetric, diagonals of 1, off-diagonals of (-1,1) but with at least one negative eigenvalue), we diagonalize it S = QLQ* where L is the diagonal matrix of real eigenvalues. This was possible because S is real symmetric. Let L' = L, except when L(i,i) is smaller than 0. For those cases we let L'(i,i) = eps. Now by construction, S' = QL'Q* is positive definite. Also, S' is the closest positive definite matrix in terms of the operator norm. Now we normalize S', a covariance matrix, to get S'' the new correlation matrix. Because of the normalization, S'' is not the closest valid correlation matrix to S, but it seems to be close enough when the S is not too badly behaved. There is an exchange between mathematical rigour and practicality. With this new valid correlation matrix S'', we look at the entry-wise difference squared with S, and plot it. This plot seems to show spikes of large differences in the rows/columns for certain stocks. Some correlations stand out by a large amount and are usually the "cause" for non positive definiteness. So we can take out those stocks associated with those problematic correlations and check if the resulting matrix is positive definite. What I have found experimentally is that by removing such stocks, the matrix really become positive definite. With a 90 stocks sample, 5 seem to show high spikes in the difference plot. Removing those 5 made the matrix positive definite, and re-adding any one of them made it non positive definite. Of course it doesn't show that this is the maximal subset generating a positive definite matrix, but it is a nice step away from being clueless, and this is definitely not a brute-force algorithm.

I arrived to this point around 4pm on Tuesday and immediately informed D. After my short presentation, he seemed very satisfied and excited. He wanted me to code it up immediately in C++ and make a new function for Excel for the traders to use. So he spent the next two and a half hours setting up my workspace and showing me how to compile stuff and test it on Excel (wtf.. if I didn't come up with those results how would I even start working with nothing set up?). It was a good day.

Third day, I start to actually fiddle with the code, modifying stuff and learning how a big project is organized. I start writing my newly designed test and I obviously encounter a myriad of little problems, each of which teach me something new about programming. Weirdly enough, I had to write my own matrix multiplication function in the matrix class, and I had to find a diagonalization method online to copy it in our code (diagonalization is not as simple as it looks theoretically, especially when you try to account for time and memory efficiency, and all the itty-gritty details).

Thursday was great. Thursday May 6th 2010, greatest intraday drop in the DOW due to some error, apparently by some unknown trader entering billion instead of million. PG spiked down by 30%, and various other smaller stocks plummeted to less than one measly cent and bounced right back up. That trader probably lost his job and ended his career by now as the SEC decides to intervene by cancelling trades that made/lost more than 60%. I continue working on my code in the midst of traders almost screaming as loud as in the movies and telephones ringing endlessly. Magical experience.

Friday I finish coding with all the output formatting, documentation, clean up, etc. At the end of the day I do some proofreading on some methodology documentation. D tells me I'll present this to the traders next Tuesday. Cool.

It's raining outside now.

Thursday, January 7, 2010

Winter '10

I decided to not take six classes. Five is a good balance between academics and non-academics.

Winter '10

CS 240 Data Structures

AM 250 Intro to Differential Equations

CO 350 Linear Optimization

ACTSC 371 Corporate Finance I

CS 476 Numeric Computation for Financial Modelling

First week of classes aren't very interesting as I know most of the material.

I'm also trying to attend psychology seminars. First attempt failed miserably today when I sat through Psych450 Close Relationships, and had to introduce myself, revealing my non-artsy reality. The prof then told me I couldn't audit. I'll try for Psych457 Hypnosis.

Friday, December 11, 2009

On financial advisors

I have become interested in finance relatively recently (say less than two years). At first, it was obviously because of the potential big pay, big bonuses, cash-cow financial industry. I don't think it's a shameful thing to say. So many people around me are clearly driven by the potential monetary returns of their respective field of study. However, the more I think about it, the more the whole financial sector seems like a huge corrupt monopoly with almost unsurmountable barriers of entry. This sorta makes things even more interesting to me.

The first doubt I had was back in summer 2008. I worked for a financial advisor in hopes to learn about stock picking, portfolio building, analyzing companies, and the such. I believed back then that one could, if smart enough, consistently outperform the market just by doing research, or whatever people did. To my surprise, almost none of the financial advisors at the firm had an exclusive finance education. My boss was a former engineer, and his colleague, a former criminologist. Sometimes, I would see the criminologist play poker on PokerStar. I wasn't long before I connected the dots.

Being a financial advisor has nothing to do with building a good stock portfolio. Most of their revenue is generated from attracting new clients to the company (that is where I came in, cold calling random people), and a bit of fees from their existing bank of clients. They have absolutely no incentive to do any research to pick good stocks (if they could actually do that). Briefly put, a financial advisor does not necessarily have more knowledge in how to outperform the market. They are mostly salesmen, with a four-month education of finance based on the Canadian Securities Course.

Now I have to say, to the financial advisor's defense, that they are not completely useless. In fact, most people have no clue about how things work in the financial industry (foreshadowing a future topic). Also, people's intuition on probabilities and exponentials is usually very weak. They do not know much about balancing risk and returns. I have seen people accepting to have a near zero rate of interest on their savings during my time at the firm, due the the sole fact that their investment was guaranteed. I have also seen people take huge bets on options, without seeing the incredibly leveraged potential losses.

Hence, a good financial advisor's job is to truly find his client's goals and risk tolerances, and find an appropriate balance of investment vehicles with his knowledge of risk management. They should also educate the client; but that's a bit much to ask. Obviously this is not your everyday financial advisor. They get paid in commissions when they sell products from the firm, and collect fees on transactions. When one say they want to put all their money in this, this and that stock; for the advisor, it's kind of hard to say no. Clients are still clients. And clients will go elsewhere if they are not satisfied. It is very difficult for a good advisor to show that a portfolio appreciation does not automatically validates the actions one took to get there. In an uneducated client's mind, or even anybody's mind, it is unnatural to frown on excessive risk-taking when given spectacular returns. These are the many problems an advisor must face.

[DM]

Sunday, December 6, 2009

Discrete Fourier Transform, intro to

I have been reluctant on writing a post about the discrete Fourier transform (DFT) for a while because I don't feel comfortable with all the details. In fact, a few weeks ago was the first time I saw the DFT in an academic setting. But even then, the introduction given in CS371 was a high-level overview of the theory. So here it goes. Reader discretion is advised.

It can be shown that a "nice" function $f(x)$ (the true meaning of "nice" is beyond the scope of my current knowledge) defined on the interval $x \in [a,b]$ can be approximated by a Fourier series:

$$g(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}\bigg[a_k\cos(k\frac{2\pi x}{b-a})+b_k\sin(k\frac{2\pi x}{b-a})\bigg]$$

with

$$a_k = \frac{2}{b-a}\int_a^bf(x)\cos(k\frac{2\pi x}{b-a})dx$$

$$b_k = \frac{2}{b-a}\int_a^bf(x)\sin(k\frac{2\pi x}{b-a})dx$$

Just like the Taylor series, this result is quite miraculous if not more. The way I see it is that the set of all continuous functions $u: D \subseteq \mathbb{R} \mapsto \mathbb{R}$ form an infinite dimensional vector space. Then, we define the distance between two functions $u(x)$ and $v(x)$ to be $\sqrt{\int_a^b (u(x)-v(x))^2dx}$. Now that we have a notion of distance, we can take look at the set $B = \{1, \sin(2\pi x), \cos(2\pi x), \sin(4\pi x), \cos{4\pi x}, \cdots\}$ and grind it through Gram-Schmidt to get an orthonormal set of vectors. It's easy to see that the elements were linearly independent, and we could have started with an arbitrary finite number of them. Now, we just proceed to project our initial function $f(x)$ onto the subspace generated by $B$. The result is $g(x)$ the Fourier series of $f(x)$.

We can now interpret the coefficients of $g(x)$ as "how much of certain frequency of oscillation does the initial function contains". Now to finally define the Fourier transform, a FT is an operator $\mathcal{F}$ which takes in a "nice" function $f(x)$ and spits out two functions $a_n$ and $b_n$ defined on the non-negative integers representing the coefficients of the sines and cos.

In practice, one main problem. We never have continuous data in the real world. To deal with that, we simply discretize the integral into a $N$ part Riemann sum, where $N$ is the number of "equally distanced" data points we have, depending on the context. This is where the "discrete" part of the DFT comes in. Usually, when we perform a DFT, we don't care much about the higher frequency terms because they are usually small.

Now one will never see the Fourier transform under this form because it's so ugly to write out. We bring in Euler's Identity to simplify a few things with complex numbers. We can write:

$$g(x) = \sum_{k=-\infty}^{\infty}c_ke^{2\pi ikx}$$

$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-2\pi ikx}dx$$

It is easy, but tedious to work out how the coefficients $c_k$ are related to $a_k$, $b_k$. One thing is for sure, the coefficients with the same $k$ correspond to the same frequencies.

Finally, we discretize and cut off high frequencies. Assume we get a time-series $f_k$ for $k = 1\cdots N$, we apply the DFT:

$$F_k = \frac{1}{N}\sum_{n=N/2+1}^{N/2}f_ne^{\frac{-2\pi ikn}{N}}$$

We go a bit further to make things look nice and substitute $\omega = e^{\frac{2\pi i}{N}}$. So,

$$F_k = \frac{1}{N}\sum_{n=N/2+1}^{N/2}f_n\omega^{-kn}$$

Before I end the post, I'd like to give one application of the DFT. The reason we want to perform the DFT is to distinguish between various frequencies by looking at the corresponding coefficients.

One application of this is filtering out noise from a signal. Usually, noise is of much higher frequency than the signal we wish to process. A simple procedure would be to take the DFT of this signal, multiply the coefficients corresponding to "high" frequencies by zero (the meaning of high depends on the context), and take the inverse DFT (I haven't talked about the inverse DFT, but this post is getting long). This will yield a nice signal devoid of most of the unwanted noise.

Other applications of the DFT will be subject of future posts (and maybe a description of how to perform the DFT efficiently, an algorithm called the Fast Fourier Transform).

Friday, November 6, 2009

Generating Functions and Regular Languages (i)

Combinatorics is, relatively speaking, a new field of mathematics. And as with any new field of study, it still remains very unorganized. How unorganized? There are many many topics, subtopics, side-topics, stand-alone problems and ungeneralized results laying everywhere, just like my room. One can witness this very early in combinatorics. Many of the math competition problems are about counting, and every problem seems to require a clever trickery in order to solve. For this reason many mathematicians tend to frown on combinatorics as simply some random collection of brainteasers, and not a "deep" field to study. Nonetheless, I think combinatorics is a promising field. Timothy Gowers, Fields medalist, is an example of a supporter of combinatorics. In fact, his award-winning research made surprising and insightful links between problems in combinatorics and other fields of mathematics.

The topic of generating functions of sets is an example of combinatorics boiling down many counting problems. Before talking about generating functions, let's look at a motivation from computer science. There are many more motivations, but this one I think fits best my knowledge at the moment.

In computer science, one pillar of research is dedicated to formal languages (automata theory). It is clearly an important topic since it studies the representation of information and of machines that computes with it as input. This field answers questions such as computability, and gives a solid theoretical framework for actually creating good computer languages along with their compilers and parsers.

Among the many classes of languages, one is called a "regular language".

For example, $\Sigma = \{0, 1\}$ is an alphabet. $101101$ and $\epsilon$ are words over that alphabet, and $L = \{101101, \epsilon\}$ can be a language. We allow languages with infinitely many words such as $L = \{\epsilon, 0, 1, 00, 01, 10, 11, 100, 101, \cdots\}$, which is basically every single possible word from the alphabet. Notice if we took out the words with leadings zeros except the first zero, we get the language of binary numbers.

Definition: A regular language $L$ is a set of words $w$ which are finite strings of symbols over a finite alphabet $\Sigma$. A word can be the empty word, which we denote $\epsilon$. A language can contain uniquely the empty word $L = \{\epsilon\}$, or can contain nothing, the empty language $L = \emptyset$.

A language $L$ can be formed from other languages say $L_a$ and $L_b$ from three rules.

Union: $L = L_a \cup L_b$, where $L = \{a | a \in L_a$or$a \in L_b\}$.

Concatenation: $L = L_aL_b$, where $L = \{ab | a \in L_a, b \in L_b\}$

Repetition: $L = \{L_a\}^{\star}$, where $L = \{\epsilon \cup L_a \cup L_aL_a \cup \cdots\}$

The base cases are sets containing one element in the alphabet.

For example, $\{0, 1\}^{\star}$ is the set of all binary strings (0's and 1's, whereas $\{0, 11\}^{\star}$ is the set of all binary strings with even length blocks of 1's.

[to be cont'd...]