Performance Evaluation of LLM Inference on RISC-V-based Tenstorrent AI Accelerators


This thesis examines the performance of Large Language Model (LLM) inference on next-generation AI accelerators beyond GPUs, focusing on Tenstorrent’s RISC-V-based AI hardware. The study evaluates several performance aspects, including inference latency, memory access patterns, and power consumption, to understand how these accelerators compare to traditional AI hardware.

Different LLM architectures will be analyzed to explore how their computational characteristics impact performance on specialized AI accelerators. The research will provide insights into the effectiveness of these accelerators in handling diverse AI workloads and identify potential optimizations for improving inference efficiency.

By benchmarking multiple models and assessing key performance trade-offs, this thesis aims to contribute to the broader understanding of AI hardware acceleration and guide future developments in efficient LLM deployment on emerging AI architectures. 

Requirements

Basic knowledge of

- LLMs

- computer architecture

- neural network compression

- AI compilers

as well as a solid programming background.

Please send your application email with your CV and transcripts (in English) to binqi.sun@tum.de.

(Students from CIT are welcome to apply for this topic as an IDP or Master's thesis)

Thesis Type

Bachelor's thesis | Semester thesis | Master's thesis

Contact

Binqi Sun

Building 5501, Room 2.102a

+49 (89) 289 - 55183

binqi.sun@tum.de