
AI Engineer

OpusClip
👥51-200
AI/ML & Data
Palo Alto, CA
$142k - $300k
1 week ago
Tags: 🤖 AI-First · 🛠️ Cursor-friendly · 🚀 Startup · 💻 Open Source

Explicitly requires vibe coding skills — uses Cursor and Copilot and expects AI-assisted code to meet production-quality engineering standards.

About the Role

Design and build scalable, low-latency AI inference microservices for high-volume video processing, and deploy production pipelines for video understanding and LLMs. The focus is on throughput, cost-efficiency, performance optimization, and turning R&D into production-ready features while holding AI-assisted code to high engineering standards.

Job Description

Role

The AI Engineer will design and build scalable, low-latency AI inference microservices and production pipelines for video understanding and LLMs. The role emphasizes engineering-first model deployment, throughput and cost-efficiency, and converting experimental research into stable production features.

Key Responsibilities

  • Architect and implement scalable, low-latency inference microservices for high-volume video processing.
  • Build production pipelines integrating Video Understanding models and LLMs, focusing on throughput, cost, and backend integration.
  • Ensure high-standard “vibe coding”: use AI-assisted tools (e.g., Cursor, Copilot) while producing modular, type-safe, and well-tested code.
  • Profile and optimize Python/C++ code and model inference (quantization, batching, caching) to reduce GPU costs and latency.
  • Collaborate across teams to deploy models, integrate with backend services, and maintain operational reliability.
  • Conduct R&D on LLMs and multimodal models and rapidly refactor experimental prototypes into production-ready systems.
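The throughput work described above (batching requests to cut GPU cost and latency) can be sketched with a minimal asyncio micro-batcher. This is an illustrative example only, not OpusClip's actual stack; `MicroBatcher` and `fake_model` are hypothetical names, and a real service would wrap this behind FastAPI and call a GPU batch forward pass instead of a Python function.

```python
import asyncio


class MicroBatcher:
    """Collects concurrent requests into batches so the model runs once
    per batch instead of once per request (hypothetical sketch)."""

    def __init__(self, infer_batch, max_batch=8, max_wait_ms=5):
        self.infer_batch = infer_batch      # callable: list[input] -> list[output]
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()
        self._worker = None

    async def infer(self, item):
        # Lazily start the background batching worker on first use.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                    # resolved when the batch finishes

    async def _run(self):
        while True:
            item, fut = await self.queue.get()
            batch = [(item, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            # Fill the batch until it is full or the wait budget expires.
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [i for i, _ in batch]
            outputs = self.infer_batch(inputs)   # one model call per batch
            for (_, f), out in zip(batch, outputs):
                f.set_result(out)


async def main():
    calls = []

    def fake_model(xs):                     # stand-in for a GPU batch forward pass
        calls.append(len(xs))
        return [x * 2 for x in xs]

    batcher = MicroBatcher(fake_model, max_batch=4)
    results = await asyncio.gather(*(batcher.infer(i) for i in range(8)))
    return results, calls


results, calls = asyncio.run(main())
```

Eight concurrent requests are served with far fewer model invocations than requests, which is the core batching trade-off (a small added wait per request in exchange for much higher GPU utilization); production serving frameworks like vLLM and Triton implement a more sophisticated version of this idea.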

Requirements

  • Bachelor’s degree or above in Computer Science or a related field.
  • 3+ years of relevant work experience (strong interns/new graduates with solid project experience considered).
  • Strong system design sense, including distributed systems, API design (REST/gRPC), asynchronous processing, and database interactions.
  • Fluent in Python (C++ or JavaScript is a plus).
  • Ability to write clean, SOLID, testable code.
  • Proficiency with Docker/containerization and CI/CD workflows.
  • Proficient with PyTorch or TensorFlow.
  • Familiarity with model serving frameworks (e.g., vLLM, TGI, Triton) and ONNX.
  • Experience in at least one of: Video Understanding/Computer Vision, LLM fine-tuning/RAG systems, or backend systems for AI (FastAPI, vector DBs, microservices).
  • Strong communication, self-motivation, and ownership.

Preferred (Bonus Points)

  • End-to-end full-stack AI experience, from prompt engineering to API deployment and DB schema design.
  • Inference optimization experience (TensorRT, quantization methods like AWQ/GPTQ, FlashAttention).
  • Experience managing vector stores at scale (Pinecone, Milvus, Weaviate).
  • Experience building APIs/services or open-source tools with ChatGPT/OpenAI APIs.
  • Publications or projects in top-tier conferences (ACL, CVPR, NeurIPS, etc.).
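The vector-store experience listed above boils down to nearest-neighbor retrieval over embeddings, the core operation behind RAG systems. A toy in-memory sketch (illustrative only; `TinyVectorStore` is a hypothetical stand-in for a managed store like Pinecone, Milvus, or Weaviate, and real embeddings would come from a model rather than hand-written vectors):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


class TinyVectorStore:
    """In-memory stand-in for a vector DB (illustrative only)."""

    def __init__(self):
        self.items = []                     # (doc_id, embedding) pairs

    def upsert(self, doc_id, embedding):
        self.items.append((doc_id, embedding))

    def query(self, embedding, top_k=3):
        # Score every stored vector and return the top_k closest matches.
        scored = [(doc_id, cosine(embedding, emb)) for doc_id, emb in self.items]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


store = TinyVectorStore()
store.upsert("doc-a", [1.0, 0.0, 0.0])
store.upsert("doc-b", [0.9, 0.1, 0.0])
store.upsert("doc-c", [0.0, 1.0, 0.0])
hits = store.query([1.0, 0.05, 0.0], top_k=2)   # nearest neighbors first
```

Production stores replace the linear scan with approximate-nearest-neighbor indexes (e.g. HNSW) so queries stay fast at millions of vectors; the API shape (upsert, query, top_k) is what carries over.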

Compensation

Compensation Range: $142K - $300K

Tech Stack

Python, C++, JavaScript, PyTorch, TensorFlow, vLLM, TGI, Triton, ONNX, Celery, Redis, Docker, FastAPI, Pinecone, Milvus, Weaviate, TensorRT, AWQ, GPTQ, FlashAttention, ChatGPT/OpenAI APIs, REST, gRPC, CI/CD, Cursor, Copilot, Vector DBs, Microservices

Skills

System Design, API Design, Distributed Systems, Asynchronous Processing, Database Interactions, Software Testing, Performance Optimization, Containerization, CI/CD, Research to Production, Communication, Ownership, Prompt Engineering, Model Serving

Experience Level

Mid

Salary

USD 142,000 - 300,000/year

Employment Type

Full-time