# Machine Learning

**with Ed Borasky (@znmeb on twitter)**

### What are the benefits of machine learning?

We can let the machine do its own work. It's a hands-off approach to managing.

### Our interests in machine learning:

- Natural language processing
- Data hacks
- Genetic algorithms
- k-nearest-neighbor algorithms
- Clustering
- Support vector machines
- Scalability (a huge problem, since some algorithms run in O(n^3) time or worse)

### Projects we are working on or interested in:

- Categorizing articles in RSS feeds to make a daily paper from the blogs you read.
- Finding new, eye-opening news sources and having them brought to us.
- Sentiment analysis: commercial applications of determining whether someone has a positive or negative opinion of a product they are talking about. This is a difficult problem, complicated by sarcasm and other features of language use.
- How can a machine bridge the gap between correlation and causality?

### Applications to Twitter:

- Latent Semantic Analysis: use singular value decomposition to reduce the last 200 tweets to simple commonalities, such as shared subjects. We treat each person's tweets as a single document and build a matrix of the terms they used. This can be very slow in R.
- Bayesian classifiers to filter out annoying tweets. Every tweet is run through a constant-time calculation to determine its class.
- RSS vs. Twitter as a data source: blogs are more focused on specific topics.
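
The Latent Semantic Analysis idea above can be sketched in Python with NumPy. This is only an illustration under assumptions, not the code discussed in the session: the user names and tweet text are made up, tokenization is a plain whitespace split, and the helper names (`term_document_matrix`, `lsa`, `cosine`) are hypothetical.

```python
from collections import Counter

import numpy as np


def term_document_matrix(docs):
    """Build a terms x documents count matrix from whitespace-tokenized text."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[term] for c in counts] for term in vocab], dtype=float)
    return vocab, matrix


def lsa(matrix, k):
    """Truncated SVD: keep only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical data: each string stands for all of one person's recent tweets,
# treated as a single document.
users = {
    "alice": "python numpy matrix svd python data",
    "bob": "numpy svd matrix data clustering python",
    "carol": "coffee bike portland rain coffee",
    "dave": "rain portland bike coffee beer",
}
names = list(users)
vocab, matrix = term_document_matrix(list(users.values()))
term_topics, strengths, topic_docs = lsa(matrix, k=2)

# One row per user: that user's position in the reduced 2-D "topic" space.
doc_vectors = (strengths[:, None] * topic_docs).T

for other in names[1:]:
    sim = cosine(doc_vectors[0], doc_vectors[names.index(other)])
    print(f"alice vs {other}: {sim:.2f}")
```

Users whose tweets share vocabulary end up pointing in the same direction in the reduced space, so cosine similarity between rows of `doc_vectors` is one way to surface shared subjects. A dense SVD like this one is among the cubic-time steps that make scalability a concern for large term matrices.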

### What were the ideas behind the Netflix contest?

The benefit to Netflix is in the hundreds of millions of dollars.

### Next Steps:

Come see the Write Your Own Bayesian Classifier talk by John Meleski at Open Source Bridge.
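
As a taste of the kind of classifier mentioned under Applications to Twitter, here is a minimal naive Bayes sketch. It is not the method from the talk; the class name `NaiveBayesFilter`, the labels, and the training tweets are all hypothetical, and it assumes add-one smoothing and whitespace tokenization.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, labels=("annoying", "ok")):
        self.word_counts = {label: Counter() for label in labels}
        self.total_words = Counter()  # words seen per label
        self.doc_counts = Counter()   # tweets seen per label
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.total_words[label] += len(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        # Assumes every label has at least one training tweet;
        # otherwise the log of the prior would fail.
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = self.total_words[label] + len(self.vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data.
tweet_filter = NaiveBayesFilter()
tweet_filter.train("win free followers now", "annoying")
tweet_filter.train("free prize click now", "annoying")
tweet_filter.train("new numpy release notes", "ok")
tweet_filter.train("svd tutorial in python", "ok")

print(tweet_filter.classify("free followers click"))
print(tweet_filter.classify("numpy svd tutorial"))
```

Because the running totals are updated during training, scoring a tweet costs time proportional to the number of words in that tweet and does not grow with the number of training examples. With tweet length capped, that is effectively constant per tweet.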

There is a possibility of an R language school that would meet twice. The first day would cover how to install and set up R; the second day would cover some modeling and data analysis.

### Further Reading:

Toby Segaran’s book, “Programming Collective Intelligence” (O'Reilly, 2007). ?ADD that blog here?

### Tools:

- R has a comprehensive NLP library that supports clustering and other techniques.
- Python helper libraries for faster numeric computing: SciPy and NumPy.