Devire

AI Performance Engineer

What we offer

Full-time office work in our R&D Lab
Contract of employment with annual bonus
Private healthcare package. We offer a premium private healthcare package for our employees.
Sports cards. Our employees can choose from a range of sports subscriptions and sports association memberships.
Benefit Platform. You can choose your benefits on our Benefit Platform, such as cinema/theater tickets and discounts, shopping cards, and many more.
Special discounts for employees. We cooperate with various local companies to offer unique promotions available only to our employees.
Office massages. A 15-minute chair-based massage in which the therapist focuses on your back, neck, shoulders, and arms.

Requirements

Deep understanding of GPU or NPU architecture, including execution units, memory hierarchy, interconnects, and thread scheduling, as well as performance bottleneck analysis methodologies.
Familiarity with mainstream deep learning frameworks such as PyTorch, TensorFlow, or JAX.
Hands-on experience in deep learning operator/kernel development and performance tuning, with the ability to implement and optimize complex operators.
Proficiency with performance analysis and profiling tools (e.g., Nsight Compute, nvprof, torch.profiler), and ability to conduct quantitative analysis and performance modeling.
Strong system design and software engineering skills, with the ability to balance performance, maintainability, and generality in complex systems.

Education

Master’s or Ph.D. degree in Computer Architecture, Compiler Design, High Performance Computing, or a related field.

Responsibilities

Lead performance optimization of AI models on Ascend NPUs, including performance analysis, bottleneck identification, and optimization implementation for both training and inference workloads.
Analyze performance bottlenecks of multimodal models and large language models (LLMs) on the Ascend platform, covering operators, kernels, memory access patterns, and scheduling; design and implement optimization strategies.
Develop and optimize critical operators/kernels, continuously improving execution efficiency, memory access patterns, parallelization strategies, and hardware resource utilization.
Research and apply advanced techniques such as auto-tuning, operator fusion, graph optimization, and scheduling optimization in real-world production scenarios.
Build and lead an NPU performance optimization team; communicate findings to cross-functional teams and leadership, and contribute to the evolution of next-generation Ascend NPU architecture.


We are Devire, a recruitment company whose goal is to connect great people with great employers.

Whether you are looking for a new permanent job or a project on a B2B basis, you can rely on our support every step of the way.

We work with employers across Poland and carry out recruitment in all key technology areas.