Search quality in practice

05/27/2014 - 15:20 to 16:00
long talk (40 min)

Session abstract: 

  1. Understanding search quality: relevancy, snippets, user interface(?)
  2. How to measure search quality: metrics, request-by-request comparison of two search systems, classic top-N evaluation, pairwise evaluation with the Swiss system. The cheapest way.
  3. Examples of search quality problems.
  4. Production system. Which data is available: clicks, queries, impressions in SERPs.
  5. Text relevance ranking: different approaches and the absence of a silver bullet. BM25, tf*idf, using hits of different types, using language models, quorum, word proximity in the query and document.
  6. How to mix all signals effectively: manual linear model, polynomial model, gradient boosted decision trees, known implementations. Where to get labels?
  7. Doing snippets well: candidate labeling, blind tests, infrastructure for candidate features and ranking, feature examples.
  8. How to measure search quality using clicks?
  9. Other signals: comments, likes.
  10. Example project: Filesystem path classifier based on search results.
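
As an illustration of the text relevance signals named in item 5, here is a minimal sketch of Okapi BM25 scoring. It is not taken from the talk: the function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.2` and `b=0.75` are assumptions chosen for the example, and the IDF uses the non-negative `log(1 + ...)` variant rather than the classic formula.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values from the talk.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant: rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by
            # document length relative to the average via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for the example).
docs = [
    "search quality metrics for ranking".split(),
    "cooking recipes with fresh tomatoes".split(),
    "ranking signals and search snippets".split(),
]
scores = bm25_scores("search ranking".split(), docs)
```

In this toy corpus the second document shares no terms with the query and scores zero, while the first and third score equally because they have the same length and the same term matches. In a production ranker a score like this would be one feature among many, not the final ordering.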