Job Description

Required

  • Solid grounding in Algorithms, Data Structures, Asymptotic Analysis
  • Search Engine Internals:
    • (Lucene) Text Analysis pipeline
      • Stemming, Lemmatisation, Shingles, Payloads
    • Index Format
      • Field types, storage formats and implications
      • Term Dictionary
      • Posting List
      • Field data
      • Index Files
      • Compression
    • Query Parsing, Query Processing Logic
      • Span/Phrase queries
      • Fuzzy Queries
      • Finite State Transducers
      • Matching/Filtering
      • Scoring: BoolQuery, DisMax, TF-IDF, Vector Space, DefaultSimilarity
      • Collectors
    • Understanding of performance and relevance tradeoffs between index-time and query-time processing
    • Lucene Plugin Points
  • Distributed Computing
    • CAP
    • Sharding & Replication
    • Consensus, Consistency models
    • SolrCloud/Elasticsearch
  • JVM, Java/Scala
  • 1+ years of dedicated Search Engine coding experience, working inside and directly with Lucene
  • Self-written, working search code in production

 

Desired

  • Text Analytics / Feature Engineering / Sparsity / IR Concepts / Machine Learning
    • vector space similarity, vector / matrix math (linear algebra)
    • probability distributions
    • n-grams, Markov models
    • dimensionality reduction
    • text classification
    • clustering and other unsupervised techniques
    • feature engineering for improving search relevance
    • learning to rank / MLIR
  • Computer Science
    • Graph algorithms
    • Approximation Algorithms
    • Randomized Algorithms
    • Computational Geometry / Convex Optimization
    • Complexity, NP-Completeness
  • IIIT Hyderabad: computational linguistics / KDD groups
 

Contact

Kanika Aggarwal
+91 (0)12 0412 5927
India
 