Senior Production Systems Engineer, Edge AI

Remote - Canada

$170,000 - $220,000 + ~0.1-0.5%

Your opportunity

Our client is a well-funded, seed-stage AI startup that builds agents for the factory floor. They develop and distribute a software-first agent layer that plugs into the cameras and machines factories already have. Their models run at the edge so agents can see, decide, and act in real time. Events and metrics flow into a dashboard that gives plant teams immediate visibility. They’re approaching a large (~$14B) and underserved market with a disruptive, asset-light alternative to hardware-heavy robotics and batch analytics, and they’ve already found early traction with clients in the food & beverage, pharma/cosmetics, and materials processing verticals.

As a senior production systems engineer, you’ll own the control and data planes for a fleet of edge devices. You’ll ship over-the-air updates and versioned model releases, provision and monitor devices, instrument and stream telemetry, and backhaul prioritized data for retraining. You’ll integrate with cameras and industrial equipment, enforce safe rollout/rollback policies, and lead incident response through root-cause analysis and remediation.

You’ll be joining a flat, dynamic environment in the midst of its scale-up phase, led by an accomplished ex-DeepMind researcher specializing in reinforcement learning, deep learning, and robotics. The company closed a $13.9M CAD seed round in March 2025 and is scaling R&D and delivery to meet accelerating demand, with headcount on track to double by year-end.

Please note that this role involves occasional travel to client sites across Canada and the US.

Key responsibilities

  • Systems architecture & reliability: Design and implement fault-tolerant services on Linux across on-prem and cloud environments; apply resilience patterns (circuit breakers, retries, bulkheads, failover) to meet deterministic SLOs under real-world load

  • Real-time performance & optimization: Profile hotspots, tune latency and throughput, and optimize concurrency, memory, and I/O paths to sustain strict timing constraints for production workloads

  • Edge integration & platform engineering: Integrate software with embedded/edge hardware and interfaces; package and deploy services via containers to bare metal/VMs/Kubernetes while accounting for resource constraints at the edge

  • Observability, on-call & continuous improvement: Instrument systems with metrics, tracing, and logs (Prometheus/Grafana/ELK); define actionable alerts, lead incident response and post-mortems, and convert findings into reliability upgrades

  • CI/CD, testing & secure delivery: Own unit/integration test strategy and automated pipelines; enforce secure coding practices, guardrails, and reviews that keep releases fast, stable, and auditable

Tech stack

  • Operating system: Linux

  • Orchestration & compute: Kubernetes, on-prem bare metal, VMs

  • Cloud providers: AWS, Azure, GCP

  • Containers: Docker

  • Monitoring, observability & logging: Prometheus, Grafana, ELK

  • Messaging & IoT: MQTT, HTTP/REST, RabbitMQ, Apache Kafka

  • Edge platforms: NVIDIA Jetson, Raspberry Pi (ARM)

  • Cameras & vision I/O: GenICam, GigE Vision, USB3 Vision

  • Industrial automation: PLC integration; protocols: EtherNet/IP, Modbus, PROFINET, OPC UA

  • Backend: Python (Flask, FastAPI), TypeScript/Node.js

  • Frontend: TypeScript/React

  • Databases & storage: SQL, InfluxDB, MongoDB

  • Scientific computing: NumPy, Pandas

  • Computer vision: OpenCV

  • GPU/acceleration: CUDA, TensorRT, ONNX, OpenVINO

  • ML/DL frameworks: PyTorch, TensorFlow, Keras, scikit-learn

Your know-how

  • You have 3+ years of experience designing and operating scaled production environments for manufacturing, robotics, IoT and/or industrial automation applications

  • You have a software engineering skillset and a fantastic command of C/C++, Python or TypeScript

  • You have experience building latency-sensitive, deterministic systems

  • You have experience with monitoring, observability and alerting stacks and best practices

  • You have experience integrating software with embedded/edge hardware

  • You have experience collaborating effectively within and across cross-functional delivery teams

  • You are a contagiously curious person with entrenched learning habits

It’s a bonus if

  • You are predisposed to mentorship and crafting a culture of continuous improvement

  • You have deep expertise in computer vision, robotics, or manufacturing automation

  • You have production experience deploying AI models at the edge

  • You have experience scaling an AI and/or B2B SaaS venture

  • You have an academic research background in machine learning, computer vision, and/or artificial intelligence (likely, but not necessarily, reflected in a graduate degree in these fields)

Interested in learning more?

Please upload your resume or a PDF export of your LinkedIn profile using the “Apply Now” button below, or send your resume or LinkedIn profile URL to talent@lutrapartners.com with “Senior Production Systems Engineer, Edge AI” as the subject line. One of our talent partners will be in contact shortly.

Apply Now