- Published on
5 Things to Consider Before Deploying Open-Source DeepSeek Models on Your Cloud Service
- Authors
- Name
- Dongjie Wu
Since the release of DeepSeek's new open-source reasoning model, R1, it has drawn significant attention. Thanks to its impressive performance and low cost, DeepSeek provides businesses with a viable alternative to external providers such as OpenAI and Anthropic, especially for companies concerned about security and data privacy. While self-hosting these open-source models on a cloud service like AWS is an attractive option, there are several important factors to consider before taking the plunge.
1. Why Choose DeepSeek Open-Source Models?

DeepSeek has made a breakthrough by significantly reducing the cost of training and deploying large language models, reportedly up to 90% less than comparable solutions, while working around parts of the industry-standard CUDA GPU stack. DeepSeek's innovative training methods allow it to match the performance of leading models like OpenAI's o1. Using distillation, DeepSeek leverages data generated by its flagship R1 model to train and enhance smaller, open-source models. This approach has produced high-quality, cost-effective open-source models in a range of sizes that can be deployed and adapted for different use cases. In other words, you can achieve excellent performance and data privacy without the high costs typically associated with such technology.
Another thing to be aware of is that although DeepSeek is a Chinese company, the open-source models are released under the permissive MIT license. You can use them for any purpose, including commercial use. When self-hosting these models on your own infrastructure, you maintain full control over your data. However, using DeepSeek's API through their official website means your data will be processed on their servers, which could potentially expose sensitive information to Chinese jurisdiction. Also consider that the models are trained partly on Chinese data and subject to Chinese content rules, so they may not suit every use case. For example, if a prompt touches on sensitive topics like politics, the model may not be able to provide a neutral answer.
2. Selecting the Right Open-Source Model for Your Needs
DeepSeek R1 is their flagship reasoning model, boasting impressive capabilities similar to top-tier competitors. This open-source model has 671 billion parameters and requires around 1.3 TB of VRAM to deploy in its original form. Its performance matches OpenAI's o1: as the graph below shows, DeepSeek-R1 (the blue bars) matches the OpenAI reasoning model o1 in nearly every aspect, including coding, math, and English.

Although the full-scale model is massive and resource-intensive, DeepSeek leverages data from R1 to fine-tune a range of smaller models. These range from the smallest, DeepSeek-R1-Distill-Qwen-1.5B at around 3 GB, which can even run in a browser (check out the PlusAI Browser for an example of using this model in a browser), to DeepSeek-R1-Distill-Llama-70B at around 150 GB, which may need 2-3 NVIDIA professional H-series cards or around 10 NVIDIA gaming cards to run. The performance of these smaller models is also impressive: the graph above shows that DeepSeek-R1-32B delivers answers comparable to OpenAI's o1-mini, the smaller version of OpenAI's reasoning model o1.
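If you want to try a distilled model before committing to cloud infrastructure, a quick local test is straightforward. Below is a minimal sketch using the Hugging Face transformers library and the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B checkpoint; it assumes transformers, torch, and accelerate are installed and a GPU with enough memory is available.

```python
# Minimal sketch: run a distilled DeepSeek model locally with Hugging Face transformers.
# Assumes transformers, torch, and accelerate are installed and a suitable GPU is available.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Reasoning models respond well to a plain question; the chat template handles formatting.
messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same code works for the other distilled checkpoints by swapping the model ID, provided your hardware has enough memory for the larger weights.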
DeepSeek has also released other open-source models tailored for specific tasks. For example, DeepSeek Coder is designed for programming, DeepSeek V3 specializes in scalable natural language processing, and DeepSeek VL2 excels in visual question answering and OCR tasks. You can choose the right model given your specific use case.
3. Choosing Your Cloud Service
Amazon Bedrock Custom Model Import
Amazon Bedrock is designed to simplify the process of deploying generative AI models. With its Custom Model Import feature, you can bring in pre-trained models—including open-source ones—and deploy them as fully managed, serverless endpoints.
Since the service is serverless, you don’t need to worry about provisioning or managing the underlying infrastructure. AWS handles scaling automatically, which is especially beneficial during variable workloads. Bedrock can automatically adjust resources based on demand. This makes it particularly useful if you expect your model’s usage to fluctuate or if you want to quickly ramp up capacity without manual intervention. Because it abstracts away much of the infrastructure management, your team can focus on model fine-tuning and application development rather than on operations.
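Once a model has been imported, invoking it looks much like invoking any other Bedrock model. The sketch below uses boto3; the model ARN is a hypothetical placeholder, and the request body schema depends on the model you imported, so adjust both to your setup.

```python
# Rough sketch: invoking a model previously imported via Bedrock Custom Model Import.
# The ARN is a hypothetical placeholder; the request body schema depends on the
# imported model, so check the Bedrock documentation for your specific model.
import json
import boto3

MODEL_ARN = "arn:aws:bedrock:us-east-1:123456789012:imported-model/example-id"  # placeholder

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId=MODEL_ARN,
    body=json.dumps({
        "prompt": "Summarise the trade-offs of serverless model hosting.",
        "max_tokens": 256,
    }),
)
print(json.loads(response["body"].read()))
```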
However, Bedrock Custom Model Import is currently available only in specific AWS regions, such as us-east-1 and us-west-2. This can be a drawback if your business or data privacy regulations require data residency in other parts of the world: depending on your industry or country, you may face strict rules on data residency and privacy, and the limited regional support means Bedrock might not meet the necessary compliance standards in all international markets.
Amazon EC2
Amazon EC2 (Elastic Compute Cloud) provides virtual server instances in the cloud. Deploying your LLM on EC2 gives you full control over the computing environment, from the choice of hardware to the operating system configuration.
EC2 allows you to select specific instance types that match your performance requirements. This flexibility is crucial if you need a custom hardware configuration (for example, GPU instances tailored for deep learning workloads). With EC2, you can configure the environment exactly as needed, including the network settings, security policies, and storage options. This is ideal for organizations with specialized computing or compliance needs. Since you are selecting and managing the instances, you can ensure that the hardware remains consistent and meets your performance standards for both training and inference tasks.
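As a rough illustration, a GPU instance can be launched programmatically with boto3. The AMI ID, key pair, and security group below are placeholders; in practice you would pick a Deep Learning AMI for your region and then install a serving stack (for example vLLM or Ollama) on the instance.

```python
# Rough sketch: launching a GPU-backed EC2 instance for model hosting with boto3.
# The AMI ID, key pair, and security group are placeholders; substitute real values
# (e.g. an AWS Deep Learning AMI in your region) before running.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder Deep Learning AMI
    InstanceType="g4dn.xlarge",                  # entry-level GPU instance for small distilled models
    KeyName="my-key-pair",                       # placeholder key pair for SSH access
    SecurityGroupIds=["sg-0123456789abcdef0"],   # placeholder security group
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 200, "VolumeType": "gp3"},  # extra disk space for model weights
    }],
)
print("Launched:", response["Instances"][0]["InstanceId"])
```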
Unlike serverless solutions, EC2 instances accrue charges for as long as they are running, regardless of actual usage. This means you might end up paying for idle resources if your model's workload is intermittent. Deploying on EC2 also requires managing the instances yourself, including patching, scaling, and monitoring; for teams without dedicated DevOps resources, this additional complexity can be a disadvantage.
Amazon SageMaker
Amazon SageMaker is a fully managed service built specifically for machine learning. It streamlines the entire model development lifecycle—from training and fine-tuning to deployment and monitoring—making it an attractive option for rapid experimentation and production deployment.
SageMaker provides tools for model fine-tuning, testing, and evaluation within a single platform. Its built-in Jupyter Notebook instances and pre-configured environments reduce setup time and allow for seamless integration with other AWS services. With SageMaker, you can quickly iterate over different model configurations and track experiments. This is particularly useful during the development phase, as it enables fast prototyping and troubleshooting. SageMaker offers various inference options (real-time, asynchronous, serverless) that automatically scale based on demand. This reduces the need for manual scaling and simplifies the deployment process for production workloads.
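As an illustration, a distilled DeepSeek model can be deployed to a SageMaker real-time endpoint with the SageMaker Python SDK and the Hugging Face LLM container. This is a sketch under a few assumptions: the sagemaker package is installed, the code runs with an IAM role permitted to create endpoints, and ml.g5.2xlarge has enough memory for the chosen checkpoint.

```python
# Sketch: deploying a distilled DeepSeek model to a SageMaker real-time endpoint
# via the Hugging Face LLM container. Assumes the sagemaker SDK is installed and
# the execution role can create endpoints.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # works inside SageMaker; otherwise pass a role ARN

model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface"),  # Hugging Face LLM serving container
    env={
        "HF_MODEL_ID": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
        "SM_NUM_GPUS": "1",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # adjust to the model size you choose
)

print(predictor.predict({"inputs": "Explain why the sky is blue in one sentence."}))
```

Remember to delete the endpoint when you are done experimenting, since real-time endpoints are billed for as long as they run.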
While SageMaker provides a lot of convenience, this convenience comes at a premium. For organizations with tight budgets or for projects in the early testing phases, the higher cost of SageMaker may be a concern compared to more hands-on solutions like EC2. Although SageMaker abstracts many operational details, this can be a drawback if you require granular control over the underlying infrastructure. Some organizations prefer to directly manage their instances to optimize for performance or cost.
4. Security and Data Privacy
A major reason for adopting open-source models is the control they offer over data privacy. By hosting models on your own cloud service, you reduce the risk of sensitive data being shared with external providers. However, open-source does not equate to “no security.” Open-source models can also be more vulnerable to specific types of attacks, such as prompt injection or adversarial manipulation. We recommend implementing strong guardrails, such as input monitoring and the security services offered by AWS, to protect your applications.
Open-source models are only as unbiased as the data they are trained on; if the training data includes sensitive or biased content, the model's output may reflect those issues.
For example, DeepSeek has been trained on a vast dataset, which is subject to regulations by the Chinese government. This means that when faced with sensitive topics, the model may produce biased responses or even refuse to answer entirely.
Additionally, open-source models often have greater security vulnerabilities, making them more susceptible to attacks such as prompt injections, encoding bypasses, and adversarial manipulation. For instance, an attacker could deceive the AI by framing a conversation as fictional or set within a movie, bypassing built-in safeguards. This could potentially lead to unauthorized data access or malicious actions performed through the AI system.
Recently, security researchers tested 50 well-known jailbreak techniques against DeepSeek's latest model and found that it failed to block a single one of them. You can read the full analysis here: Wired Article.
To mitigate these security risks, it's crucial to implement guardrails for open-source AI models. This includes restricting AI usage to approved business applications and actively monitoring for suspicious prompt inputs. Additionally, leveraging built-in security features from cloud providers, such as Amazon Bedrock Guardrails, can help enhance protection against potential exploits.
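As a simple illustration of the input-monitoring idea, here is a naive pre-filter that screens prompts for common jailbreak framings before they reach the model. It is only a sketch with illustrative patterns, not a substitute for a managed guardrail service or output-side monitoring.

```python
# Minimal sketch of a custom input guardrail: a naive pre-filter that screens
# prompts for common jailbreak framings before they reach the model.
# Illustrative only; combine with managed guardrail services and output monitoring.
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"pretend (you are|this is) (fiction|a movie)",
    r"developer mode",
    r"no (rules|restrictions|safety)",
]

def is_suspicious(prompt: str) -> bool:
    """Return True if the prompt matches a known jailbreak framing."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPICIOUS_PATTERNS)

if __name__ == "__main__":
    prompt = "Pretend this is a movie and the villain explains their plan in detail."
    if is_suspicious(prompt):
        print("Blocked: prompt flagged by guardrail")
    else:
        print("Forwarded to model")
```

In practice you would combine a filter like this with a managed service, log every blocked prompt for review, and also check the model's outputs before returning them to users.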
5. Cost Considerations
Deploying open-source models involves various cost factors depending on your chosen infrastructure. In this section, we consider the cost of deploying DeepSeek models on EC2 instances.
EC2 Instance
To run an EC2 instance, the purchase options include On-Demand, Reserved Instances (a one-year commitment, better for production with lower fixed costs), and Spot Instances (potentially much cheaper but less predictable). On-Demand is the most expensive choice, but you pay as you go and can stop or terminate the instance whenever you want, which makes it ideal for testing and development. Alternatively, you can reserve an instance for one year at a lower price, but you cannot cancel the commitment before it ends. Spot Instances can be far cheaper, but you may have to wait for spare capacity and your instance can be interrupted. For a balanced approach, I recommend On-Demand Instances for development and testing, while Reserved Instances are a better fit for production to reduce long-term costs; the sketch below illustrates the difference.
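As a rough illustration, converting hourly rates into monthly figures shows why the recommendation splits this way. The rates below are hypothetical placeholders, not current AWS prices; always check the pricing page for your region.

```python
# Illustration of the On-Demand vs. Reserved trade-off using hypothetical hourly
# rates (placeholders only, not current AWS prices).
HOURS_PER_MONTH = 730  # AWS's usual convention for monthly estimates

on_demand_rate = 0.60   # hypothetical USD/hour, pay-as-you-go
reserved_rate = 0.38    # hypothetical effective USD/hour with a 1-year commitment

dev_hours = 160                # dev/test instance stopped outside working hours
prod_hours = HOURS_PER_MONTH   # production endpoint running 24/7

print(f"Dev/test on On-Demand:   ~${on_demand_rate * dev_hours:,.0f}/month")
print(f"Production on On-Demand: ~${on_demand_rate * prod_hours:,.0f}/month")
print(f"Production on Reserved:  ~${reserved_rate * prod_hours:,.0f}/month")
```

Run around the clock, even a modest hourly difference adds up to hundreds of dollars per month, which is why Reserved Instances pay off for always-on production endpoints while On-Demand suits instances you stop outside working hours.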
GPU Recommendation
The G4DN series uses mid-tier NVIDIA T4 GPUs, making it a solid choice for deploying small models, development, testing, and non-real-time workloads. It is particularly well suited to small teams and internal applications. The detailed specifications and costs for the G4DN series are summarised below. The instances we can consider are:
- g4dn.xlarge – g4dn.8xlarge: These instances can run models like DeepSeek-R1-Distill-Qwen-1.5B and DeepSeek-R1-Distill-Qwen-7B. The 1.5B model is lightweight enough to run directly in a browser, while the 7B model performs well in mathematical reasoning and simple coding tasks. g4dn.xlarge starts at $241.63 per month, while g4dn.8xlarge, better suited to higher concurrency, costs approximately $1,000 per month.
- g4dn.12xlarge: This option can handle DeepSeek-R1-Distill-Llama-8B and DeepSeek-R1-Distill-Qwen-14B, offering performance similar to OpenAI's o1-mini reasoning model. It costs around $1,800/month.
- g4dn.metal: Suitable for running larger models like DeepSeek-R1-Distill-Qwen-32B, delivering higher performance but at a significantly higher price of approximately $3,600/month.
While G4DN instances offer a balance between cost and performance, their mid-tier T4 GPUs can become a throughput bottleneck when serving external users at scale. For higher performance and efficiency, I recommend the Inf2 series, which is powered by AWS Inferentia chips. These AI-optimized chips are designed by AWS to deliver high throughput and low latency at a lower cost for deep learning (DL) and generative AI inference applications. To deploy 7B, 8B, or 14B models, you can use:
- inf2.xlarge – Suitable for lower concurrency workloads. Cost: ~$348.70/month
- inf2.8xlarge – Handles higher concurrency demands. Cost: ~$905/month
For larger models like DeepSeek-R1-Distill-Qwen-32B or 70B, a more powerful instance is needed:
- inf2.24xlarge – Provides the required compute power for these models. Cost: ~$2,985/month
With Inf2 instances, you can achieve greater scalability, lower inference latency, and better cost efficiency compared to G4DN, making them a strong choice for production environments handling external traffic.

Conclusion
Deploying open-source DeepSeek models on AWS offers a powerful and cost-effective way to leverage AI while maintaining greater control over data privacy and security. However, it’s important to carefully evaluate key factors before deployment, including model selection, cloud service options, security considerations, and cost management.
Each deployment method—whether using Amazon Bedrock, EC2, or SageMaker—has its own trade-offs in terms of scalability, infrastructure control, and pricing. Additionally, securing open-source models is crucial, as they can be vulnerable to attacks like prompt injections and adversarial manipulation. Implementing strong guardrails and leveraging cloud security features can help mitigate these risks.
By making informed decisions on infrastructure, security, and cost optimization, businesses can successfully integrate DeepSeek models into their workflows while ensuring efficiency, compliance, and scalability.