Build agentic LLM systems with memory, tool integrations, and observability, drawing on Langfuse/LangSmith tracing and LangGraph concepts.
About the Role
Join Joist AI as an Agentic Systems Engineer to build production-grade agentic applications that automate proposal writing for the AEC industry, focusing on multi-agent orchestration, memory layers, tool integrations, and observability. The role emphasizes Python engineering, LLM/agent concepts, production debugging, and reliable evals and tests.
Job Description
Role
Joist AI is hiring an Agentic Systems Engineer with ~2–4 years of experience to help build the next generation of agentic applications that streamline proposal writing for the AEC industry. The role centers on building modular agents, adding memory layers, wiring tool integrations, and ensuring production-quality behavior through tests, evals, and observability.
Key Responsibilities
- Build agents as modular, plug-and-play components that integrate with the broader stack.
- Implement memory layers (short-term, long-term, summarization, retrieval-backed) in running systems.
- Wire up tool integrations, MCP servers, and skills.
- Own feature quality: tests, evaluation harnesses, observability, and monitoring.
- Investigate production traces, diagnose issues, and implement fixes.
Requirements
- 2–4 years of production software engineering experience.
- Strong Python skills with emphasis on clean OOP, separation of concerns, idiomatic syntax, and meaningful tests.
- Solid understanding of agentic and LLM concepts (RAG, prompting patterns, tool use, structured outputs, streaming, context management, generative AI fundamentals).
- Experience building non-trivial projects with modern agent toolkits (side projects, prototypes, or production work).
- Ability to read and navigate unfamiliar codebases quickly.
- Attention to detail and a methodical problem-solving approach.
- Data-driven mindset: comfortable using production traces, eval numbers, and logs to inform decisions.
- Hands-on experience with LLM tracing/observability tools such as Langfuse or LangSmith.
Preferred / Nice-to-have Experience
- Search and retrieval experience: embeddings, vector databases, hybrid retrieval, and rerankers.
- End-to-end LLM evaluation experience: designing evals, choosing metrics, building harnesses, and keeping scores consistent as models and prompts evolve.
- Deep familiarity with LangGraph: building custom graphs, context-management nodes (summarizers, windowing, state pruning), and checkpointers.
Interview Process
- 30-minute introductory Zoom meeting.
- 45-minute Python / agentic coding proficiency test (two problems: one coded by hand, one using generative AI).
- 60-minute project deep dive with a short presentation (20–25 minutes) plus Q&A.
- 45-minute interview on generative AI / LLM fundamentals.
- 30-minute culture-fit interview.