A book about programming collective intelligence

Translated literally into Korean, the book's title comes out sounding pretty grandiose. lol

I've been reading a lot of books lately, but I haven't really been introducing them on this blog. Nothing has quite grabbed me, I guess?

Then I found a book that really grabbed me.

Programming Collective Intelligence: Building Smart Web 2.0 Applications

I can't believe O'Reilly put out a book like this... really impressive.

If the Machine Learning and Data Mining taught in school have so far felt academic and somewhat detached from reality, I think this book will connect those concepts to the real world.

How long have I been waiting for a book like this?

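From the title you might think it's a book about web programming, but it absolutely is not.
I just looked through the book's contents on Safari Books. For example, if you want to build a system for an online shopping site that recommends products based on users' purchase logs, this is the book you'd use to design the logic of that application. So the title only refers to the final form the application takes. (The web is all the rage these days, after all...)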
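To make the purchase-log idea a bit more concrete, here is a minimal sketch of my own, not code from the book. It assumes a made-up log of (user, item) pairs and scores items by how much their buyers overlap with the buyers of things the user already owns, using the Tanimoto coefficient that shows up in the book's Appendix B. The names `PURCHASE_LOG`, `buyers_by_item`, and `recommend` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase log: (user, item) pairs, invented for illustration.
PURCHASE_LOG = [
    ("alice", "keyboard"), ("alice", "mouse"), ("alice", "monitor"),
    ("bob", "keyboard"), ("bob", "mouse"),
    ("carol", "mouse"), ("carol", "monitor"),
    ("dave", "keyboard"),
]


def buyers_by_item(log):
    """Map each item to the set of users who bought it."""
    buyers = defaultdict(set)
    for user, item in log:
        buyers[item].add(user)
    return buyers


def tanimoto(a, b):
    """Tanimoto similarity of two sets: shared members / all members."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(log, user, top_n=3):
    """Rank items the user has not bought by similarity to items they own."""
    buyers = buyers_by_item(log)
    owned = {item for u, item in log if u == user}
    scores = {}
    for candidate, candidate_buyers in buyers.items():
        if candidate in owned:
            continue
        # Sum the buyer-overlap similarity against every item the user owns.
        scores[candidate] = sum(
            tanimoto(candidate_buyers, buyers[item]) for item in owned
        )
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    print(recommend(PURCHASE_LOG, "dave"))
```

With this toy data, dave has only bought a keyboard, so the mouse (which shares two buyers with the keyboard) ranks above the monitor (which shares one). A real site would need to deal with far larger logs and with popular items dominating the scores, which this sketch ignores.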

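Really understanding this material in depth requires a grasp of the math, but I could hardly find any formulas in the body of the book; it just explains the various algorithms conceptually with figures and moves on. So I suspect this book alone won't be enough for applying these algorithms in depth in real work, but I think it's quite a good choice for a first approach to them.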

Below is the table of contents.

[#M_Show table of contents |Hide table of contents|

Praise for Programming Collective Intelligence

Copyright
Preface

Chapter 1. Introduction to Collective Intelligence
Section 1.1. What Is Collective Intelligence?
Section 1.2. What Is Machine Learning?
Section 1.3. Limits of Machine Learning
Section 1.4. Real-Life Examples
Section 1.5. Other Uses for Learning Algorithms
Chapter 2. Making Recommendations
Section 2.1. Collaborative Filtering
Section 2.2. Collecting Preferences
Section 2.3. Finding Similar Users
Section 2.4. Recommending Items
Section 2.5. Matching Products
Section 2.6. Building a del.icio.us Link Recommender
Section 2.7. Item-Based Filtering
Section 2.8. Using the MovieLens Dataset
Section 2.9. User-Based or Item-Based Filtering?
Section 2.10. Exercises
Chapter 3. Discovering Groups
Section 3.1. Supervised versus Unsupervised Learning
Section 3.2. Word Vectors
Section 3.3. Hierarchical Clustering
Section 3.4. Drawing the Dendrogram
Section 3.5. Column Clustering
Section 3.6. K-Means Clustering
Section 3.7. Clusters of Preferences
Section 3.8. Viewing Data in Two Dimensions
Section 3.9. Other Things to Cluster
Section 3.10. Exercises
Chapter 4. Searching and Ranking
Section 4.1. What’s in a Search Engine?
Section 4.2. A Simple Crawler
Section 4.3. Building the Index
Section 4.4. Querying
Section 4.5. Content-Based Ranking
Section 4.6. Using Inbound Links
Section 4.7. Learning from Clicks
Section 4.8. Exercises
Chapter 5. Optimization
Section 5.1. Group Travel
Section 5.2. Representing Solutions
Section 5.3. The Cost Function
Section 5.4. Random Searching
Section 5.5. Hill Climbing
Section 5.6. Simulated Annealing
Section 5.7. Genetic Algorithms
Section 5.8. Real Flight Searches
Section 5.9. Optimizing for Preferences
Section 5.10. Network Visualization
Section 5.11. Other Possibilities
Section 5.12. Exercises
Chapter 6. Document Filtering
Section 6.1. Filtering Spam
Section 6.2. Documents and Words
Section 6.3. Training the Classifier
Section 6.4. Calculating Probabilities
Section 6.5. A Naïve Classifier
Section 6.6. The Fisher Method
Section 6.7. Persisting the Trained Classifiers
Section 6.8. Filtering Blog Feeds
Section 6.9. Improving Feature Detection
Section 6.10. Using Akismet
Section 6.11. Alternative Methods
Section 6.12. Exercises
Chapter 7. Modeling with Decision Trees
Section 7.1. Predicting Signups
Section 7.2. Introducing Decision Trees
Section 7.3. Training the Tree
Section 7.4. Choosing the Best Split
Section 7.5. Recursive Tree Building
Section 7.6. Displaying the Tree
Section 7.7. Classifying New Observations
Section 7.8. Pruning the Tree
Section 7.9. Dealing with Missing Data
Section 7.10. Dealing with Numerical Outcomes
Section 7.11. Modeling Home Prices
Section 7.12. Modeling “Hotness”
Section 7.13. When to Use Decision Trees
Section 7.14. Exercises
Chapter 8. Building Price Models
Section 8.1. Building a Sample Dataset
Section 8.2. k-Nearest Neighbors
Section 8.3. Weighted Neighbors
Section 8.4. Cross-Validation
Section 8.5. Heterogeneous Variables
Section 8.6. Optimizing the Scale
Section 8.7. Uneven Distributions
Section 8.8. Using Real Data—the eBay API
Section 8.9. When to Use k-Nearest Neighbors
Section 8.10. Exercises
Chapter 9. Advanced Classification: Kernel Methods and SVMs
Section 9.1. Matchmaker Dataset
Section 9.2. Difficulties with the Data
Section 9.3. Basic Linear Classification
Section 9.4. Categorical Features
Section 9.5. Scaling the Data
Section 9.6. Understanding Kernel Methods
Section 9.7. Support-Vector Machines
Section 9.8. Using LIBSVM
Section 9.9. Matching on Facebook
Section 9.10. Exercises
Chapter 10. Finding Independent Features
Section 10.1. A Corpus of News
Section 10.2. Previous Approaches
Section 10.3. Non-Negative Matrix Factorization
Section 10.4. Displaying the Results
Section 10.5. Using Stock Market Data
Section 10.6. Exercises
Chapter 11. Evolving Intelligence
Section 11.1. What Is Genetic Programming?
Section 11.2. Programs As Trees
Section 11.3. Creating the Initial Population
Section 11.4. Testing a Solution
Section 11.5. Mutating Programs
Section 11.6. Crossover
Section 11.7. Building the Environment
Section 11.8. A Simple Game
Section 11.9. Further Possibilities
Section 11.10. Exercises
Chapter 12. Algorithm Summary
Section 12.1. Bayesian Classifier
Section 12.2. Decision Tree Classifier
Section 12.3. Neural Networks
Section 12.4. Support-Vector Machines
Section 12.5. k-Nearest Neighbors
Section 12.6. Clustering
Section 12.7. Multidimensional Scaling
Section 12.8. Non-Negative Matrix Factorization
Section 12.9. Optimization
Appendix A. Third-Party Libraries
Section A.1. Universal Feed Parser
Section A.2. Python Imaging Library
Section A.3. Beautiful Soup
Section A.4. pysqlite
Section A.5. NumPy
Section A.6. matplotlib
Section A.7. pydelicious
Appendix B. Mathematical Formulas
Section B.1. Euclidean Distance
Section B.2. Pearson Correlation Coefficient
Section B.3. Weighted Mean
Section B.4. Tanimoto Coefficient
Section B.5. Conditional Probability
Section B.6. Gini Impurity
Section B.7. Entropy
Section B.8. Variance
Section B.9. Gaussian Function
Section B.10. Dot-Products
Colophon
Index

_M#]

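Looking at the table of contents, the Python libraries stand out, and you can see that the code throughout the book is written in Python.
Back when I went to a Python meetup, Perky mentioned that Python is used a great deal in biological classification work, and sure enough, it turns out this book provides all of its code in Python. Part of the reason is probably that no other language has Machine Learning libraries as actively developed as Python's, but I can't help being glad to see Python used, since its code reads like pseudocode all by itself.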

from future import dream

You are my hot potato!
