Setup
This guide builds on our guides for finding the best GPU and for deploying GPU containers. We have a Python notebook ready to go for you to deploy this model, which you can find here:

basic_serving_vllm.ipynb

The requirements are simple: a Python environment with requests (and, optionally, openai) installed. You will also need to input your Shadeform API key.
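Installing the dependencies is a one-liner:

```shell
# requests is required; openai is only needed if you prefer its client
pip install requests openai
```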
Serving a Model
Once we have an instance, we deploy a model serving container with this request payload.

Checking on our Model server
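A simple way to check on the server is to poll it until it answers. Here is a minimal sketch using vLLM's standard OpenAI-compatible /v1/models route; the address is a placeholder, so substitute your instance's IP and exposed port:

```python
import time
import requests

# Placeholder: replace with your instance's public IP and exposed port.
BASE_URL = "http://<instance-ip>:8000"

def wait_until_ready(timeout_s: int = 1800, poll_s: int = 15) -> bool:
    """Poll vLLM's OpenAI-compatible /v1/models route until it responds."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            r = requests.get(f"{BASE_URL}/v1/models", timeout=5)
            if r.ok:
                print("Server is up; models:", [m["id"] for m in r.json()["data"]])
                return True
        except requests.RequestException:
            pass  # still provisioning the VM, building the image, or starting vLLM
        time.sleep(poll_s)
    return False
```

Connection errors are expected while the earlier stages are still running, so the sketch swallows them and keeps polling.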
There are three main steps that we need to wait for: VM provisioning, image building, and spinning up vLLM.

Watch via the notebook
Once the model is ready, this code will output the model list and a response to our query. We can use either requests or OpenAI’s completions library.

Watching with the Shadeform UI
Alternatively, once we’ve made the request, we can watch the logs under Running Instances. Once the server is ready to serve, it should look something like this:
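However you watch it, once the server is up, querying it with requests is straightforward. This is a sketch against vLLM's OpenAI-compatible /v1/chat/completions route; the address and model name are placeholders:

```python
import requests

# Placeholder: replace with your instance's public IP and exposed port.
BASE_URL = "http://<instance-ip>:8000/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Payload shape for vLLM's OpenAI-compatible chat completions route."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def query(model: str, prompt: str) -> str:
    payload = build_chat_request(model, prompt)
    r = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```

The same request can be made with the openai client by pointing its base_url at the server; vLLM accepts any API key by default.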