AI Engineer
This role explicitly requires vibe coding skills: the team uses Cursor and Copilot and expects AI-assisted code to meet production-quality engineering standards.
About the Role
Design and build scalable, low-latency AI inference microservices for high-volume video processing, and deploy production pipelines for Video Understanding and LLMs. The role focuses on throughput, cost-efficiency, and performance optimization, and on turning R&D into production-ready features while maintaining high engineering standards for AI-assisted code.
Job Description
Role
The AI Engineer will design and build scalable, low-latency AI inference microservices and production pipelines for video understanding and LLMs. The role emphasizes engineering-first model deployment, throughput and cost-efficiency, and converting experimental research into stable production features.
Key Responsibilities
- Architect and implement scalable, low-latency inference microservices for high-volume video processing.
- Build production pipelines integrating Video Understanding models and LLMs, focusing on throughput, cost, and backend integration.
- Practice “vibe coding” to a high standard: use AI-assisted tools (e.g., Cursor, Copilot) while producing modular, type-safe, and well-tested code.
- Profile and optimize Python/C++ code and model inference (quantization, batching, caching) to reduce GPU costs and latency.
- Collaborate across teams to deploy models, integrate with backend services, and maintain operational reliability.
- Conduct R&D on LLMs and multimodal models and rapidly refactor experimental prototypes into production-ready systems.
Requirements
- Bachelor’s degree or above in Computer Science or a related field.
- 3+ years of relevant work experience (strong interns/new graduates with solid project experience considered).
- Strong system design sense, including distributed systems, API design (REST/gRPC), asynchronous processing, and database interactions.
- Fluent in Python (C++ or JavaScript is a plus).
- Ability to write clean, SOLID, testable code.
- Proficiency with Docker/containerization and CI/CD workflows.
- Proficient with PyTorch or TensorFlow.
- Familiarity with model serving frameworks (e.g., vLLM, TGI, Triton) and ONNX.
- Experience in at least one of: Video Understanding/Computer Vision, LLM fine-tuning/RAG systems, or backend systems for AI (FastAPI, vector DBs, microservices).
- Strong communication, self-motivation, and ownership.
Preferred (Bonus Points)
- End-to-end, full-stack AI experience, from prompt engineering to API deployment and DB schema design.
- Inference optimization experience (TensorRT, quantization methods like AWQ/GPTQ, FlashAttention).
- Experience managing vector stores at scale (Pinecone, Milvus, Weaviate).
- Experience building APIs/services or open-source tools with ChatGPT/OpenAI APIs.
- Publications or projects in top-tier conferences (ACL, CVPR, NeurIPS, etc.).
Compensation
Compensation Range: USD 142,000 - 300,000/year