Machine Learning

with Ed Borasky (@znmeb on Twitter)

What are the benefits of machine learning?

We can let the machine do its own work. It's a hands-off approach to managing.

Our interests in machine learning:

Natural Language Processing
Data Hacks
Genetic Algorithms
K-Nearest Neighbor algorithms
Clustering
Support Vector Machines
Scalability (a huge problem, since some algorithms run in O(n^3) time or worse)

Projects we are working on or interested in:

Categorizing articles in RSS feeds to make a daily paper from the blogs you read.
Finding new, eye-opening news sources and having them brought to us.
Sentiment analysis: commercial applications that determine whether someone has a positive or negative opinion of the product they are talking about. This is a difficult problem, complicated by sarcasm and other features of language use.
How can a machine bridge the gap between correlation and causation?

Applications to Twitter:

Latent Semantic Analysis to reduce the last 200 tweets to simple commonalities (shared subjects) using singular value decomposition. We treat each person's tweets as a single document and build a matrix of the terms they used. This can be very slow in R.
Bayesian classifiers to filter out annoying tweets. Every tweet is run through a constant-time calculation to determine its class.
RSS vs. Twitter as a data source: blogs are more focused on specific topics.
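
The Latent Semantic Analysis idea above can be sketched roughly as follows. This is only a minimal illustration, not the code discussed in the session: it uses Python with numpy rather than R, the sample "tweets" and the function names (term_document_matrix, lsa, cosine) are made up for the example, and it uses raw term counts with no stop-word removal or weighting. Each person's tweets are treated as one document, a term-by-person matrix is built, and a truncated singular value decomposition maps each person into a small "topic" space.

```python
import numpy as np


def term_document_matrix(docs):
    """Build a term-by-document count matrix (one column per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(tokenized)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok], j] += 1.0
    return vocab, matrix


def lsa(matrix, k):
    """Truncated SVD: keep only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    term_vectors = u[:, :k] * s[:k]    # one row per term, k columns
    doc_vectors = vt[:k, :].T * s[:k]  # one row per document, k columns
    return term_vectors, doc_vectors


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Hypothetical data: each string stands in for all of one person's recent tweets.
people = {
    "alice": "python numpy svd matrix python clustering",
    "bob": "numpy matrix svd python bayes",
    "carol": "coffee bike portland rain coffee",
}
names = list(people)
vocab, matrix = term_document_matrix(list(people.values()))
term_vectors, doc_vectors = lsa(matrix, k=2)

sim_alice_bob = cosine(doc_vectors[0], doc_vectors[1])
sim_alice_carol = cosine(doc_vectors[0], doc_vectors[2])
print("alice vs bob:  ", round(sim_alice_bob, 3))
print("alice vs carol:", round(sim_alice_carol, 3))
```

In this toy data alice and bob share most of their terms while carol shares none, so alice and bob should land close together in the reduced space and carol far from both. The full SVD is the expensive step, which is where the scalability concern raised earlier comes in; for real tweet volumes a sparse or truncated solver would probably be needed.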

What were the ideas behind the Netflix contest?

The benefit to Netflix is in the hundreds of millions of dollars.

Next Steps:

Come see the Write Your Own Bayesian Classifier talk by John Meleski at Open Source Bridge.

Possibility of an R language school that would meet twice. The first day would cover how to install and set up R; the second day, some modeling and data analysis.

Further Reading:

Toby Segaran’s book, “Programming Collective Intelligence” (O'Reilly, 2007). ?ADD that blog here?

Tools:

R has a comprehensive NLP library that allows clustering and other techniques.
Python helper libraries for faster numeric computing: scipy, numpy

b0d.txt · Last modified: 2009/05/03 03:37 by 67.168.197.55