
AI Engineer

OpusClip
👥51-200
AI/ML & Data
Palo Alto, CA
$142k - $300k
1 week ago
Tags: 🤖 AI-First · 🛠️ Cursor-friendly · 🚀 Startup · 💻 Open Source

Explicitly requires vibe coding skills — uses Cursor and Copilot and expects AI-assisted code to meet production-quality engineering standards.

About the Role

Design and build scalable, low-latency AI inference microservices for high-volume video processing, and deploy production pipelines for video understanding and LLMs. The focus is on throughput, cost-efficiency, performance optimization, and turning R&D into production-ready features while holding AI-assisted code to high engineering standards.

Job Description

Role

The AI Engineer will design and build scalable, low-latency AI inference microservices and production pipelines for video understanding and LLMs. The role emphasizes engineering-first model deployment, throughput and cost-efficiency, and converting experimental research into stable production features.

Key Responsibilities

  • Architect and implement scalable, low-latency inference microservices for high-volume video processing.
  • Build production pipelines integrating Video Understanding models and LLMs, focusing on throughput, cost, and backend integration.
  • Ensure high-standard “vibe coding”: use AI-assisted tools (e.g., Cursor, Copilot) while producing modular, type-safe, and well-tested code.
  • Profile and optimize Python/C++ code and model inference (quantization, batching, caching) to reduce GPU costs and latency.
  • Collaborate across teams to deploy models, integrate with backend services, and maintain operational reliability.
  • Conduct R&D on LLMs and multimodal models and rapidly refactor experimental prototypes into production-ready systems.
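The throughput work described above (batching requests to cut GPU cost and latency) can be sketched with a minimal asyncio micro-batcher. This is an illustrative example only, not OpusClip's actual stack; `MicroBatcher` and `fake_model` are hypothetical names, and a real service would wrap this behind FastAPI and call a GPU batch forward pass instead of a Python function.

```python
import asyncio


class MicroBatcher:
    """Collects concurrent requests into batches so the model runs once
    per batch instead of once per request (hypothetical sketch)."""

    def __init__(self, infer_batch, max_batch=8, max_wait_ms=5):
        self.infer_batch = infer_batch      # callable: list[input] -> list[output]
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()
        self._worker = None

    async def infer(self, item):
        # Lazily start the background batching worker on first use.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut                    # resolved when the batch finishes

    async def _run(self):
        while True:
            item, fut = await self.queue.get()
            batch = [(item, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            # Fill the batch until it is full or the wait budget expires.
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [i for i, _ in batch]
            outputs = self.infer_batch(inputs)   # one model call per batch
            for (_, f), out in zip(batch, outputs):
                f.set_result(out)


async def main():
    calls = []

    def fake_model(xs):                     # stand-in for a GPU batch forward pass
        calls.append(len(xs))
        return [x * 2 for x in xs]

    batcher = MicroBatcher(fake_model, max_batch=4)
    results = await asyncio.gather(*(batcher.infer(i) for i in range(8)))
    return results, calls


results, calls = asyncio.run(main())
```

Eight concurrent requests are served with far fewer model invocations than requests, which is the core batching trade-off (a small added wait per request in exchange for much higher GPU utilization); production serving frameworks like vLLM and Triton implement a more sophisticated version of this idea.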

Requirements

  • Bachelor’s degree or above in Computer Science or a related field.
  • 3+ years of relevant work experience (strong interns/new graduates with solid project experience considered).
  • Strong system design sense, including distributed systems, API design (REST/gRPC), asynchronous processing, and database interactions.
  • Fluent in Python (C++ or JavaScript is a plus).
  • Ability to write clean, SOLID, testable code.
  • Proficiency with Docker/containerization and CI/CD workflows.
  • Proficient with PyTorch or TensorFlow.
  • Familiarity with model serving frameworks (e.g., vLLM, TGI, Triton) and ONNX.
  • Experience in at least one of: Video Understanding/Computer Vision, LLM fine-tuning/RAG systems, or backend systems for AI (FastAPI, vector DBs, microservices).
  • Strong communication, self-motivation, and ownership.

Preferred (Bonus Points)

  • End-to-end full-stack AI experience, from prompt engineering to API deployment and DB schema design.
  • Inference optimization experience (TensorRT, quantization methods like AWQ/GPTQ, FlashAttention).
  • Experience managing vector stores at scale (Pinecone, Milvus, Weaviate).
  • Experience building APIs/services or open-source tools with ChatGPT/OpenAI APIs.
  • Publications or projects in top-tier conferences (ACL, CVPR, NeurIPS, etc.).
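The vector-store experience listed above boils down to nearest-neighbor retrieval over embeddings, the core operation behind RAG systems. A toy in-memory sketch (illustrative only; `TinyVectorStore` is a hypothetical stand-in for a managed store like Pinecone, Milvus, or Weaviate, and real embeddings would come from a model rather than hand-written vectors):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


class TinyVectorStore:
    """In-memory stand-in for a vector DB (illustrative only)."""

    def __init__(self):
        self.items = []                     # (doc_id, embedding) pairs

    def upsert(self, doc_id, embedding):
        self.items.append((doc_id, embedding))

    def query(self, embedding, top_k=3):
        # Score every stored vector and return the top_k closest matches.
        scored = [(doc_id, cosine(embedding, emb)) for doc_id, emb in self.items]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


store = TinyVectorStore()
store.upsert("doc-a", [1.0, 0.0, 0.0])
store.upsert("doc-b", [0.9, 0.1, 0.0])
store.upsert("doc-c", [0.0, 1.0, 0.0])
hits = store.query([1.0, 0.05, 0.0], top_k=2)   # nearest neighbors first
```

Production stores replace the linear scan with approximate-nearest-neighbor indexes (e.g. HNSW) so queries stay fast at millions of vectors; the API shape (upsert, query, top_k) is what carries over.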

Compensation

Compensation Range: $142K - $300K

Tech Stack

Python, C++, JavaScript, PyTorch, TensorFlow, vLLM, TGI, Triton, ONNX, Celery, Redis, Docker, FastAPI, Pinecone, Milvus, Weaviate, TensorRT, AWQ, GPTQ, FlashAttention, ChatGPT/OpenAI APIs, REST, gRPC, CI/CD, Cursor, Copilot, Vector DBs, Microservices

Skills

System Design, API Design, Distributed Systems, Asynchronous Processing, Database Interactions, Software Testing, Performance Optimization, Containerization, CI/CD, Research to Production, Communication, Ownership, Prompt Engineering, Model Serving

Experience Level

Mid

Salary

USD 142,000 - 300,000/year

Employment Type

Full-time