If you run a chatbot with a large user base that relies on your own NLP model, the back-end service can become a bottleneck because it struggles to handle that many concurrent inference requests. To address this kind of problem, NVIDIA released Triton, an open-source inference server that lets you deploy AI models on GPU or CPU. It is designed to maximize GPU/CPU utilization for faster inference.
In this session, I will introduce Triton Inference Server and deploy an NLP model on Triton with a practical sample.
Date and Time: November 5, 2022 / 14:15-14:45 (UTC+8)
Language: English
Speaker: Mr. Ko Ko / Chatbot Developers Taiwan / Taiwan
Speaker Introduction
Mr. Ko Ko
Ko Ko is a Microsoft AI MVP dedicated to sharing AI and chatbot-related technology. He is a well-known technical lecturer who has been invited to speak at many large conferences, such as COSCUP, .NET CONF, and PyCon APAC. Ko Ko is also a core member of Chatbot Developers Taiwan.