05/27/2014 - 15:20 to 16:00
long talk (40 min)
- Understanding search quality: relevancy, snippets, user interface(?)
- How to measure search quality: metrics, query-by-query comparison of two search systems, classic evaluation of the top N results, pairwise evaluation with a Swiss-system tournament. The cheapest approach.
- Examples of search quality problems.
- Production system: what data is available (clicks, queries, impressions in SERPs).
- Text relevance ranking: different approaches, no silver bullet. BM25, tf*idf, hits of different types, language models, quorum, word proximity in the query and the document.
- How to mix all signals effectively: a hand-tuned linear model, a polynomial model, gradient boosted decision trees, known implementations. Where to get labels?
- Doing snippets well: candidate labeling, blind tests, infrastructure for candidate features and ranking, feature examples.
- How to measure search quality using clicks?
- Other signals: comments, likes.
- Example project: Filesystem path classifier based on search results.
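For the "classic evaluation of top N" and the query-by-query comparison of two systems, one common metric is nDCG@N. The sketch below is illustrative, not necessarily the metric used in the talk: it assumes each query has assessor grades for the returned results, in rank order, and it normalizes only against those returned grades (a fuller version would use every judged document for the query). The function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def dcg_at_n(grades: Sequence[float], n: int) -> float:
    """Discounted cumulative gain of the first n results.

    grades[i] is the assessor's relevance grade for the result at rank i + 1.
    """
    return sum(
        (2 ** g - 1) / math.log2(i + 2)  # i is 0-based, so rank 1 gets log2(2) = 1
        for i, g in enumerate(grades[:n])
    )


def ndcg_at_n(grades: Sequence[float], n: int) -> float:
    """DCG divided by the DCG of the same grades in ideal (descending) order."""
    ideal = dcg_at_n(sorted(grades, reverse=True), n)
    return dcg_at_n(grades, n) / ideal if ideal > 0 else 0.0


def compare_systems(
    grades_a: Dict[str, Sequence[float]],
    grades_b: Dict[str, Sequence[float]],
    n: int = 10,
) -> Tuple[int, int, int]:
    """Query-by-query comparison: (queries A wins, queries B wins, ties)."""
    wins_a = wins_b = ties = 0
    for query in grades_a.keys() & grades_b.keys():  # only queries judged for both
        score_a = ndcg_at_n(grades_a[query], n)
        score_b = ndcg_at_n(grades_b[query], n)
        if math.isclose(score_a, score_b):
            ties += 1
        elif score_a > score_b:
            wins_a += 1
        else:
            wins_b += 1
    return wins_a, wins_b, ties
```

For example, `compare_systems({"q1": [3, 0], "q2": [0, 3]}, {"q1": [0, 3], "q2": [0, 3]})` returns `(1, 0, 1)`: system A wins on `q1` and the systems tie on `q2`. Counting per-query wins, rather than only averaging scores, shows whether a change helps many queries a little or a few queries a lot.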
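Pairwise evaluation with a Swiss system can be sketched as a small tournament. Each round pairs items with similar scores, and one pairwise judgment decides each pair. A full round-robin over n items needs n(n-1)/2 judgments, while each Swiss round needs only about n/2. In this sketch the `prefer` callback is a hypothetical stand-in for an assessor's judgment. The pairing rule (nearest-ranked opponent not met before) and the bye for an odd item out are simplifying assumptions borrowed from chess tournaments.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple


def swiss_rank(
    items: Sequence[str],
    prefer: Callable[[str, str], bool],
    rounds: int,
) -> List[Tuple[str, int]]:
    """Rank items with a Swiss-system tournament of pairwise judgments.

    prefer(x, y) stands in for one assessor judgment: True if x is better than y.
    """
    points = {item: 0 for item in items}
    played: Set[FrozenSet[str]] = set()
    for _ in range(rounds):
        # Current standings; sorted() is stable, so ties keep input order.
        unpaired = sorted(items, key=lambda it: -points[it])
        while len(unpaired) >= 2:
            first = unpaired.pop(0)
            # Nearest-ranked opponent not met before; fall back to the nearest one.
            idx = next(
                (i for i, other in enumerate(unpaired)
                 if frozenset((first, other)) not in played),
                0,
            )
            second = unpaired.pop(idx)
            played.add(frozenset((first, second)))
            winner = first if prefer(first, second) else second
            points[winner] += 1
        if unpaired:  # odd item out gets a bye (a free point), as in chess
            points[unpaired[0]] += 1
    return sorted(points.items(), key=lambda kv: -kv[1])
```

With four items and a simulated assessor that prefers the alphabetically smaller item, `swiss_rank(["a", "b", "c", "d"], lambda x, y: x < y, rounds=2)` uses four judgments instead of six and returns `[("a", 2), ("b", 1), ("c", 1), ("d", 0)]`. It separates the best and worst items, but ties in the middle remain until more rounds are played.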
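BM25, listed among the text relevance approaches, adds two things to plain tf*idf: saturation of term frequency (controlled by `k1`) and document-length normalization (controlled by `b`). This minimal sketch assumes documents are already tokenized. It uses the Lucene-style idf with `+1` inside the logarithm so that very common terms never get negative weight, and the usual default values `k1 = 1.2`, `b = 0.75`. Other BM25 variants differ in the idf formula.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query: Sequence[str],
    docs: Sequence[Sequence[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[float]:
    """Okapi BM25 score of every (already tokenized) document for the query."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0  # guard against all-empty docs
    df = Counter()  # document frequency: in how many docs each term occurs
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # "+1" inside the log keeps idf positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls document-length normalization.
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

For the query `["cat"]` over `[["cat", "sat"], ["dog", "ran"], ["cat", "cat", "cat"]]`, the second document scores 0 and the third scores above the first. Because of saturation, however, three occurrences of "cat" give well under three times the first document's score. Signals such as hit types, proximity, or quorum would be computed separately and combined with this score by the ranking model.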
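To measure search quality using clicks, two simple aggregate signals are the abandonment rate (share of SERPs with no click) and the mean reciprocal rank of the highest clicked result. The `SerpImpression` record below is a hypothetical log format with one entry per shown SERP. Neither metric is claimed to be the one used in the talk, and a no-click SERP can also mean the snippet already answered the query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class SerpImpression:
    """One SERP shown to a user (hypothetical log record)."""
    query: str
    clicked_ranks: List[int] = field(default_factory=list)  # 1-based ranks clicked


def click_metrics(impressions: Sequence[SerpImpression]) -> Dict[str, float]:
    """Abandonment rate and mean reciprocal rank of the highest clicked result."""
    n = len(impressions)
    if n == 0:
        return {"abandonment_rate": 0.0, "mrr_top_click": 0.0}
    abandoned = sum(1 for imp in impressions if not imp.clicked_ranks)
    # An abandoned SERP contributes 0 to the reciprocal-rank sum.
    rr_sum = sum(1.0 / min(imp.clicked_ranks) for imp in impressions if imp.clicked_ranks)
    return {"abandonment_rate": abandoned / n, "mrr_top_click": rr_sum / n}
```

For three impressions with clicks at rank 1, no clicks, and clicks at ranks 4 and 2, the abandonment rate is 1/3 and the reciprocal-rank mean is (1 + 1/2) / 3 = 0.5. Comparing these numbers between two systems on the same query stream gives a cheap complement to assessor labels.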