Difference between revisions of "Main Page"

From LQ's wiki
Line 22: Line 22:
 
This is a project I came up with when I had about two weeks of free time during Christmas break back in 2013. I felt that my JavaScript was getting a bit rusty and I wanted to explore something related to HTML5, so after looking at the emerging technologies for a while, I settled on a project that explores music technology for the web.
 
  
Progress on the project is now frozen partly due to the scope of the project being too big for two weeks (I couldn't gauge how much work it was going to be as the technology is new to me), and partly due to poor scaling of a DOM-based UI. As it stands, it has a reasonable suite of instruments, speed control, equalizers for individual instruments, an undo/redo stack (which was non-trivial to implement for a program like this), file import/export, and most importantly, you can sequence simple music with it!
+
Progress on the project is now frozen partly due to the scope of the project being too big for two weeks (I couldn't gauge how much work it was going to be as the technology is new to me), and partly due to poor scaling of a DOM-based UI. As it stands, it has a reasonable suite of musical instruments, speed control, equalizers for individual instruments, an undo/redo stack (which was non-trivial to implement for a program like this), file import/export, and most importantly, you can sequence simple music with it!
  
*For the developer's diary, see [[Dev:SynthJS]].
+
* For the developer's diary, see [[Dev:SynthJS]].
*To play with it, [http://lqkhoo.com/synthjs click here]. Have fun!
+
* To play with it, [http://lqkhoo.com/synthjs click here]. Have fun!
 
==Internships==
 
  
 
===Microsoft Research Cambridge===
 
 +
I joined Microsoft for 8 weeks through the Bright Minds Internship Competition programme for undergraduates. I was supervised by [http://research.microsoft.com/en-us/um/people/pkohli/ Pushmeet Kohli] and [http://research.microsoft.com/en-us/people/yobach/ Yoram Bachrach].
 +
 +
  
 
===UniEntry===
 
 +
UniEntry is a startup company which I worked at in summer 2013. It aims to better inform students when picking UK universities by aggregating user ratings and information from the Higher Education Statistics Agency, and to recommend a good spread of choices based on the student's grades. It was founded as a part-time venture by two individuals, who hired me and another developer to develop a pilot site over the summer.
 +
 +
The pilot site was meant to serve as a proof of concept to get schools involved and to look for funding. We used the agile development model, so we had daily stand-ups, sprint planning, progress burndown charting, the works. The site was developed in ASP.NET, and I was the primary front-end developer and designer. After the initial three months, I maintained contact with the company and submitted bugfixes when requested.
  
 
===Other===
 
My other experiences are related to my brief stint in medical school, rather than computer science.
+
My other experiences are related to my brief stint in medical school rather than computer science, but I consider them to be valuable and one-of-a-kind experiences.
* Work shadowing in van Andel Institute, Michigan, USA. I generally observed the activity within a biomedical research lab - automatic sequencing, running DNA microarrays etc.
+
* Work shadowing at the Van Andel Institute, Michigan, USA. I generally observed the environment and working atmosphere within a biomedical research lab. I learned about what the researchers do on a regular basis: things like automatic sequencing, running DNA microarrays, etc. I learned how they made knockout mice (mice with certain genes deactivated) to study the effects of those genes, and how they highlight sections of DNA by binding highly specific fluorescent molecules to them, using techniques (with really fancy names) like spectral karyotyping and fluorescence in situ hybridization. Looking back at my time in the lab always reminds me of how different the nature of the work in the various fields of science can be.
* In Malaysia, I had a work placement in a hospital's critical care unit and department of anaesthesia, and then later on, in the Department of Public Health of Penang.
+
* In Malaysia, I had a work placement in a hospital's critical care unit and department of anaesthesia. I remember how meticulous everything was - the cleanliness precautions we had to take, the nurses charting each patient's statistics every few hours, etc. Later on, I was attached to the Department of Public Health of Penang, and we went off checking the safety of water supplies and fogging areas with reported cases of dengue fever.
  
 
==Research==
 
Line 48: Line 54:
  
 
===SmartFence===
 
*Please see [[#Microsoft Research Cambridge]]
+
* Please see [[#Microsoft Research Cambridge]]
  
 
===Task Identification using Search Engine Query Logs===
 
Line 58: Line 64:
 
We ran into problems of ambiguity - for example, <tt>Java</tt> may mean the programming language, or the place in Indonesia, or a dozen other things. We disambiguated by counting how many classes the terms in a search session have in common. For example, if a session contains the terms <tt>Scala</tt> and <tt>Java</tt>, we can be fairly sure that <tt>Java</tt> means the programming language. We ended up discarding many sessions that did not give us enough data to disambiguate, and in the end we didn't have enough data to populate the tree beyond the first 3 to 4 layers. We were extremely time-constrained (3 months), so we couldn't refine our methods to improve the results, but for our efforts the project was awarded best research project in our year.
  
*[[Task Identification Using Search Engine Query Logs|Task Identification Using Search Engine Query Logs (Lit review coursework)]]
+
* [[Task Identification Using Search Engine Query Logs|Task Identification Using Search Engine Query Logs (Lit review coursework)]]
*[[:File:Task Identification Using Search Engine Query Logs - Report.pdf|Task Identification Using Search Engine Query Logs (Results report)]]
+
* [[:File:Task Identification Using Search Engine Query Logs - Report.pdf|Task Identification Using Search Engine Query Logs (Results report)]]
  
 
==Pastimes==
 

Revision as of 15:18, 4 October 2014

Page currently under reconstruction (4th October 2014). Expected to finish in several hours. Please check back later :)


Welcome!

I'm Li, a 4th-year student at University College London currently working on an MEng in Computer Science. I'm most interested in applications of machine learning to large data sets. I haven't decided on a specific research area, primarily because I don't think I've seen enough of the field yet. However, my current interests slant towards applying machine learning to areas related to data mining, semantic computation, and natural language processing. The data I've worked with in the past has been web-based (AOL search logs, Bing session data, mined Twitter data, YAGO2).

Why computer science? At first, I entered the field because I love building things. Stacks. Factories. Interfaces. Semaphores. Software systems are teeming cities running like clockwork on top of layers and layers of abstraction. I thought that I wanted to be a developer for sure, but then I began to see some really interesting problems and approaches to solving them in the field, so I focused my efforts on research too. Computer science (and AI / machine learning) is very much in the middle of interdisciplinary research, and I think this is where the most exciting things are happening. Before this, I was a medical student at Imperial College London - I left after two years - but that's a story for another time ;)

For people unfamiliar with computer science, machine learning really is just pattern recognition. If you can reduce a problem to a pattern recognition problem, then you can apply machine learning to solve it. It is a powerful technique that we can use to try and find features / trends / patterns hidden within huge amounts of data (DNA, stock ticks, the internet), or to classify that data into different categories (think of algorithms that recognize faces, road signs, or system intrusions based on anomalous behaviour patterns).

Resume

Projects

SynthJS


This is a project I came up with when I had about two weeks of free time during Christmas break back in 2013. I felt that my JavaScript was getting a bit rusty and I wanted to explore something related to HTML5, so after looking at the emerging technologies for a while, I settled on a project that explores music technology for the web.

Progress on the project is now frozen partly due to the scope of the project being too big for two weeks (I couldn't gauge how much work it was going to be as the technology is new to me), and partly due to poor scaling of a DOM-based UI. As it stands, it has a reasonable suite of musical instruments, speed control, equalizers for individual instruments, an undo/redo stack (which was non-trivial to implement for a program like this), file import/export, and most importantly, you can sequence simple music with it!

Internships

Microsoft Research Cambridge

I joined Microsoft for 8 weeks through the Bright Minds Internship Competition programme for undergraduates. I was supervised by Pushmeet Kohli and Yoram Bachrach.


UniEntry

UniEntry is a startup company which I worked at in summer 2013. It aims to better inform students when picking UK universities by aggregating user ratings and information from the Higher Education Statistics Agency, and to recommend a good spread of choices based on the student's grades. It was founded as a part-time venture by two individuals, who hired me and another developer to develop a pilot site over the summer.

The pilot site was meant to serve as a proof of concept to get schools involved and to look for funding. We used the agile development model, so we had daily stand-ups, sprint planning, progress burndown charting, the works. The site was developed in ASP.NET, and I was the primary front-end developer and designer. After the initial three months, I maintained contact with the company and submitted bugfixes when requested.

Other

My other experiences are related to my brief stint in medical school rather than computer science, but I consider them to be valuable and one-of-a-kind experiences.

  • Work shadowing at the Van Andel Institute, Michigan, USA. I generally observed the environment and working atmosphere within a biomedical research lab. I learned about what the researchers do on a regular basis: things like automatic sequencing, running DNA microarrays, etc. I learned how they made knockout mice (mice with certain genes deactivated) to study the effects of those genes, and how they highlight sections of DNA by binding highly specific fluorescent molecules to them, using techniques (with really fancy names) like spectral karyotyping and fluorescence in situ hybridization. Looking back at my time in the lab always reminds me of how different the nature of the work in the various fields of science can be.
  • In Malaysia, I had a work placement in a hospital's critical care unit and department of anaesthesia. I remember how meticulous everything was - the cleanliness precautions we had to take, the nurses charting each patient's statistics every few hours, etc. Later on, I was attached to the Department of Public Health of Penang, and we went off checking the safety of water supplies and fogging areas with reported cases of dengue fever.

Research

I was fortunate enough to have the opportunity to be involved in several short-term research projects (2 to 6 months each) during my undergraduate years. Generally, internship opportunities for undergraduate students in the UK tend to be limited to development work.

Big Five Personality Classification of Twitter Profile by Machine Learning

This is the title of my Master's dissertation. At the time of writing, I've just begun to work on it, so everything is still highly tentative. Supervisor: Emine Yilmaz. Personal tutor: Dr. Kevin Bryson.

By mining the text corpus of individual Twitter profiles, we hope to classify each user along the five categories of the Big Five model. We plan to do so by identifying adjectives in the text that are labeled with a "weight" towards one end of each category. Such labels can be found in the seminal Allport-Odbert 1936 list and in similar works. We are scoping the project to consider only Twitter profiles in English.
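
As a rough illustration of the weighted-adjective idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the lexicon entries and their weights are invented, the function name score_profile is hypothetical, and a plain dictionary lookup stands in for proper adjective identification. Since the project is still tentative, this is not the method the dissertation will necessarily use.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: adjective -> {trait: weight in [-1, 1]}.
# These entries are invented for illustration; real weights would come
# from a labelled adjective list.
LEXICON = {
    "outgoing": {"extraversion": 1.0},
    "quiet": {"extraversion": -0.8},
    "organized": {"conscientiousness": 0.9},
    "careless": {"conscientiousness": -0.9},
    "anxious": {"neuroticism": 0.8},
    "curious": {"openness": 0.9},
    "kind": {"agreeableness": 0.9},
}

TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


def score_profile(tweets):
    """Return the mean lexicon weight per trait over all matched words."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        # Crude tokenizer: lowercase runs of ASCII letters only.
        for word in re.findall(r"[a-z]+", tweet.lower()):
            for trait, weight in LEXICON.get(word, {}).items():
                totals[trait] += weight
                counts[trait] += 1
    # Traits with no matched adjectives default to a neutral 0.0.
    return {t: (totals[t] / counts[t] if counts[t] else 0.0) for t in TRAITS}
```

Calling score_profile(["I am so outgoing today", "Feeling quiet and anxious"]) returns one score per trait, with positive values leaning towards the named end of that trait. Averaging is only one possible way to combine the weights; a real classifier would likely learn the combination from labelled profiles.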

We hope that the findings form a basis for further research into identifying individuals with potential signs of depression based on their Twitter activity. Depending on the speed of progress, we might have some time to consider this part of the problem.

SmartFence

Task Identification using Search Engine Query Logs

Root node of tree. Only selected subclass nodes (blue / red) are displayed. Orange nodes are the entities most often searched for in a class. Green nodes are the most frequently searched-for strings. Visualized using D3.js

This was my undergraduate university-based research project. The goal was to find out what users are most interested in when they search for a certain class of things. For example, using Google's related searches, if you search for Hawaii, it gives you results along the lines of Hotels in Hawaii, or Flights to Hawaii. Based on our understanding of how Google's system works, these are the most common strings appearing with Hawaii in searches.

We wanted to go one step further. By using a knowledge base like YAGO and the set of AOL search logs released in 2006, we did the same, but with semantics. For example, Hawaii would be determined to be a place. Using relations like these (Hawaii {hasClass} place), we built up a tree of classes - Root --> Organism --> Human --> Artist --> Musician --> Singer --> Michael Jackson, for example - and we aggregated the related search strings using this tree. Hence, by querying nodes of this tree, we can find out the most popular search strings when people search for a human being, for instance. I thought this was an extremely interesting problem to tackle.
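
The aggregation over the class tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PARENT table is a hand-written stand-in for relations pulled from a knowledge base, the (entity, related string) pairs are assumed to be already extracted from the logs, and the function names are hypothetical rather than taken from the project's code.

```python
from collections import Counter, defaultdict

# Hypothetical child -> parent links, standing in for hasClass / subclass
# relations from a knowledge base such as YAGO.
PARENT = {
    "Michael Jackson": "Singer",
    "Singer": "Musician",
    "Musician": "Artist",
    "Artist": "Human",
    "Human": "Organism",
    "Organism": "Root",
}


def ancestors(node):
    """Yield the node itself, then every class above it up to the root."""
    while node is not None:
        yield node
        node = PARENT.get(node)


def aggregate(queries):
    """Map each tree node to a Counter of strings searched alongside it.

    `queries` is an iterable of (entity, related_string) pairs, assumed to
    be already extracted from the search logs.
    """
    by_node = defaultdict(Counter)
    for entity, related in queries:
        # Credit the related string to the entity and to all its classes.
        for node in ancestors(entity):
            by_node[node][related] += 1
    return by_node
```

With the pairs ("Michael Jackson", "songs") twice and ("Michael Jackson", "albums") once, aggregate(...)["Human"].most_common(1) gives [("songs", 2)], which is the kind of per-class query described above.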

We ran into problems of ambiguity - for example, Java may mean the programming language, or the place in Indonesia, or a dozen other things. We disambiguated by counting how many classes the terms in a search session have in common. For example, if a session contains the terms Scala and Java, we can be fairly sure that Java means the programming language. We ended up discarding many sessions that did not give us enough data to disambiguate, and in the end we didn't have enough data to populate the tree beyond the first 3 to 4 layers. We were extremely time-constrained (3 months), so we couldn't refine our methods to improve the results, but for our efforts the project was awarded best research project in our year.
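
A minimal Python sketch of this kind of class-overlap disambiguation follows. The SENSES table, the class names, and the min_overlap threshold are all assumptions made for the example, and the tie-breaking rule (give up on ties) is a guess at how "not enough data" might be handled, not a description of the project's actual rule.

```python
# Hypothetical candidate senses per term and the classes each sense has.
SENSES = {
    "java": {
        "Java (programming language)": {"programming language", "software"},
        "Java (island)": {"island", "place"},
    },
    "scala": {
        "Scala (programming language)": {"programming language", "software"},
    },
}


def disambiguate(term, session_terms, min_overlap=1):
    """Pick the sense of `term` sharing the most classes with the session.

    Returns None when no sense reaches `min_overlap` shared classes or when
    the best score is tied, i.e. the session cannot settle the meaning.
    """
    # Collect every class of every sense of the other terms in the session.
    context = set()
    for other in session_terms:
        if other != term:
            for classes in SENSES.get(other, {}).values():
                context |= classes

    scores = {
        sense: len(classes & context)
        for sense, classes in SENSES.get(term, {}).items()
    }
    if not scores:
        return None
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best < min_overlap or len(winners) > 1:
        return None
    return winners[0]
```

Here disambiguate("java", ["scala", "java"]) picks the programming-language sense, while disambiguate("java", ["java"]) returns None because the session offers no other terms to compare against, which corresponds to a session that would be discarded.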

Pastimes