Performance Evaluation of LLM Inference on RISC-V-Based Tenstorrent AI Accelerators
This thesis examines the performance of Large Language Model (LLM) inference on next-generation AI accelerators beyond GPUs, focusing on Tenstorrent's RISC-V-based AI hardware. The study evaluates key performance aspects, including inference latency, memory access patterns, and power consumption, to understand how these accelerators compare to traditional AI hardware.
Different LLM architectures will be analyzed to explore how their computational characteristics impact performance on specialized AI accelerators. The research will provide insights into the effectiveness of these accelerators in handling diverse AI workloads and identify potential optimizations for improving inference efficiency.
By benchmarking multiple models and assessing key performance trade-offs, this thesis aims to contribute to the broader understanding of AI hardware acceleration and guide future developments in efficient LLM deployment on emerging AI architectures.
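As an illustration of the kind of latency metrics such benchmarks typically collect, the sketch below measures time-to-first-token and decode throughput for a generic token generator. The generator here is a placeholder, not a Tenstorrent or LLM API; in the actual thesis work it would be replaced by a real inference call.

```python
import time

def generate_tokens(prompt, n_tokens=8):
    # Placeholder for a real LLM decode loop; simulates per-token work.
    for _ in range(n_tokens):
        time.sleep(0.001)  # stand-in for one decode step on the accelerator
        yield "tok"

def benchmark(prompt, n_tokens=8):
    """Measure time-to-first-token (TTFT) and overall decode throughput."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in generate_tokens(prompt, n_tokens):
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start  # latency until first token
    total = time.perf_counter() - start
    return {"ttft_s": ttft, "tokens_per_s": count / total}

metrics = benchmark("example prompt", n_tokens=16)
```

The same loop structure extends naturally to batched prompts or repeated runs for averaging, which is how latency/throughput trade-offs across models and hardware would be compared.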
Requirements
Basic knowledge of
- LLMs
- computer architecture
- neural network compression
- AI compilers

A solid programming background is required.
Please send your application email with your CV and transcripts (in English) to binqi.sun@tum.de.
(Students from CIT are welcome to apply for this topic as an IDP or Master's thesis.)
Thesis Type
Bachelor's Thesis | Semester Thesis | Master's Thesis
Contact
Building 5501, Room 2.102a
+49 (89) 289 - 55183