My Computer
So I finally got my computer assembled. That was a pretty large investment (yes, an investment, since I am hoping for it to pay off). I should point out that the direct monetary investment is roughly comparable to the time investment. I probably spent about 30 hours choosing parts, looking for benchmarks, and making up my mind. Due to inexperience, it probably took another 10 hours to assemble the machine and install software. So that's one whole work week spent on this computer. But it was a fun experience, and I would say it is a must-go-through process for any computer person. I hope that being able to do things faster, together with the CUDA power, will eventually pay off this investment.
On the software side, I am doing something interesting. I installed Win7 on my SSD, then Ubuntu 10.04 on a hard drive. The interesting thing is that I can boot into Ubuntu OR virtualize that same Ubuntu installation under Win7. The point is that I expect to do most of my core work in Ubuntu, but I want to access that machine even when I am working in Windows. The truth is that you have to give up a lot to live without Windows, and Linux distributions are still far behind in terms of device support, ease of use, etc. And of course, there is no reason to switch from Windows to Mac, even setting aside the legal issues of running OSX on a PC.
Prize
Recently I was invited to attend the year-end celebration of the math department. It was a celebration in which I knew nobody, and the average age there was probably twice mine. I'm still happy, since I got a Chapters gift card as a prize for the undergrad math contest. I thought that since it is a prize awarded by the math department, it should contribute to math, so I used it to buy the Princeton Companion to Mathematics. It is a thick and beautiful book filled with introductory articles. In my studies, I always come across concepts that I'm not familiar with. Nowadays I can go to Wikipedia or any of the myriad online resources, which of course helps a lot. They will give me a definition and maybe a few examples. But to someone quite new to the idea, a potentially complicated definition does not help much. It is often more helpful and more efficient to get an idea of the big picture first, before filling in all the details. Even if I understand a complicated definition, what I really want to know is why this concept is important, what I can do with it, and what the intuition behind the formalism is. From the few articles I have read so far, this book does a good job of supplying the intuition that is missing from online articles and other reference works.
I just read the section on computational number theory. It gives a probabilistic heuristic for why the Fermat theorem is true. I then looked up ergodic theorems, since I came across them in some papers I read. I had gone to the Wikipedia article a few times and could not get much from it, but the short article in this book actually gave me an idea of what they are. From now on, when I need to learn some new math, I will look at this book first to get the big picture and then dive into the details if necessary.
New computer
So I just bought a computer from newegg.ca. It is my first time shopping there, and it is also my biggest purchase.
From planning to comparing options, I probably spent 20-30 hours, and the price came to about 80 hours' worth of my wage.
I have not been excited about any purchase for a while, but this time I am pretty excited. It feels like when I got my Lego set after saving coins for 8 months back when I was 9 years old. Now I find myself habitually looking at deals on newegg and browsing for benchmarks. Bad habits.
I am sometimes worried about the drain on my time. I would rather spend this time doing useful work, reading papers, etc. After all, this computer is for my research, and I am hopeful that it will help in the long run despite being a time sink in the short run.
Bruno Olshausen
With an over-complete set of basis vectors and a prior over coefficients, we can try to maximize the posterior probability of the coefficients (the author actually mentions MAP, so what's the difference between ML and MAP? As far as I can tell, MAP maximizes the posterior, which is the likelihood multiplied by the prior, whereas ML maximizes the likelihood alone and ignores the prior). We can do gradient ascent on the space of coefficients and get to a set of likely coefficients (however, gradient ascent usually does not guarantee finding the global maximum, although the author seems to imply it does. Is it because of the choice of a convex cost function? I am not sure). Moreover, if the prior over coefficients is heavy-tailed and peaked at 0, the resulting coefficients should be sparse.
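To make the inference step concrete for myself, here is a minimal sketch of gradient ascent on the log-posterior of the coefficients. It is not the author's code. I am assuming Gaussian reconstruction noise and a Cauchy-like prior with cost log(1 + a^2), which is heavy-tailed and peaked at 0; the names `log_posterior`, `infer_coefficients`, `noise_var` and `sparsity` are all made up for illustration. With this particular prior the objective is not concave, so the loop only climbs to a local maximum.

```python
import numpy as np

def log_posterior(coeffs, image, basis, noise_var=0.01, sparsity=1.0):
    """Unnormalized log-posterior: Gaussian likelihood plus Cauchy-like prior (assumed)."""
    residual = image - basis @ coeffs
    log_likelihood = -0.5 * residual @ residual / noise_var
    log_prior = -sparsity * np.sum(np.log1p(coeffs ** 2))
    return log_likelihood + log_prior

def infer_coefficients(image, basis, noise_var=0.01, sparsity=1.0, n_steps=500):
    """Gradient ascent on the log-posterior, starting from all-zero coefficients."""
    # Step size 1/L, where L bounds the curvature of the objective:
    # the quadratic term contributes ||basis||_2^2 / noise_var and the
    # prior term contributes at most 2 * sparsity.
    lipschitz = np.linalg.norm(basis, 2) ** 2 / noise_var + 2.0 * sparsity
    step = 1.0 / lipschitz
    coeffs = np.zeros(basis.shape[1])
    for _ in range(n_steps):
        residual = image - basis @ coeffs
        grad_likelihood = basis.T @ residual / noise_var
        grad_prior = -sparsity * 2.0 * coeffs / (1.0 + coeffs ** 2)
        coeffs = coeffs + step * (grad_likelihood + grad_prior)
    return coeffs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(16, 32))          # 16 pixels, 32 basis vectors (over-complete)
    basis /= np.linalg.norm(basis, axis=0)     # unit-length basis vectors
    true_coeffs = np.zeros(32)
    true_coeffs[[3, 17]] = [2.0, -1.5]         # a sparse ground truth
    image = basis @ true_coeffs
    estimate = infer_coefficients(image, basis)
    print(np.round(estimate, 2))
```

The step size is chosen conservatively so that each iteration cannot decrease the log-posterior; a larger step or a smarter optimizer would likely converge faster.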
The more interesting part is that the set of basis vectors is also learnt. The basis vectors should maximize the probability of generating the images in the dataset if we draw the coefficients from the fixed prior that encourages sparseness. In practice, the author takes the difference between an actual image and the maximum likelihood reconstruction of that image. This difference is multiplied by the coefficients of the maximum likelihood reconstruction, and the product is used as the error signal for the basis vectors (it looks like the chain rule). The author might take random images from the dataset and compute the error signal this way (if this is the case, then he is doing stochastic gradient ascent).
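Here is how I would sketch one learning step for the basis vectors, under my reading of the description above. The error signal is the outer product of the reconstruction residual and the inferred coefficients, averaged over a random batch of images. The helper `map_coefficients`, the batch size, the learning rate, and the renormalization of each basis vector to unit length are my own assumptions for illustration, not details taken from the author.

```python
import numpy as np

def map_coefficients(image, basis, noise_var=0.01, sparsity=1.0, n_steps=200):
    """Hypothetical inference helper: gradient ascent on an assumed log-posterior
    with Gaussian noise and a Cauchy-like prior cost log(1 + a^2)."""
    lipschitz = np.linalg.norm(basis, 2) ** 2 / noise_var + 2.0 * sparsity
    coeffs = np.zeros(basis.shape[1])
    for _ in range(n_steps):
        residual = image - basis @ coeffs
        grad = basis.T @ residual / noise_var - sparsity * 2.0 * coeffs / (1.0 + coeffs ** 2)
        coeffs += grad / lipschitz
    return coeffs

def basis_update_step(images, basis, learning_rate=0.1, batch_size=16, rng=None):
    """One stochastic gradient step on the basis vectors."""
    rng = np.random.default_rng() if rng is None else rng
    batch = images[rng.choice(len(images), size=batch_size, replace=False)]
    error_signal = np.zeros_like(basis)
    for image in batch:
        coeffs = map_coefficients(image, basis)
        residual = image - basis @ coeffs           # actual image minus its reconstruction
        error_signal += np.outer(residual, coeffs)  # residual times coefficients
    new_basis = basis + learning_rate * error_signal / batch_size
    # Assumed simplification: keep every basis vector at unit length so the
    # basis cannot grow without bound while the coefficients shrink.
    return new_basis / np.linalg.norm(new_basis, axis=0, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(64, 16))              # stand-in data, not natural images
    basis = rng.normal(size=(16, 32))
    basis /= np.linalg.norm(basis, axis=0)
    for _ in range(5):
        basis = basis_update_step(images, basis, rng=rng)
    print(basis.shape)
```

On random stand-in data like this, nothing Gabor-like should be expected to emerge; the sketch only shows the shape of the update, alternating between inferring coefficients and nudging the basis.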
The really interesting part is the nature of these basis vectors. They are all Gabor-filter-like edge detectors. And they arise solely from using natural images and insisting on a sparse representation.
Then the author suggests measuring whether real neurons sparsify their activities, and he tried the same measurement on this model. He showed that in this model, the effect of sparsifying is to suppress small coefficients and to magnify large coefficients. I do not really understand what he means by reverse correlation and linear prediction. My guess is that he obtained the MAP coefficients for white noise images, and then represented another image as a linear combination of these white noise images. The linear prediction of a coefficient (in my guess) is the same linear combination of the coefficients of the white noise images. I do not understand figure 13.4b.
Essentially the same method is then extended to video, except that the basis vectors have a time component as well. The basis vectors look even more interesting now. They are translating versions of those Gabor-like filters. This is not too surprising given the insistence on sparseness, since most short intervals of natural video tend to show motion in a fixed direction at a relatively constant velocity.