1. Enterprise AI Platform

TensorOpera is a next-generation cloud service for LLMs and generative AI. It helps developers and AI/ML teams launch complex model training, deployment, and federated learning anywhere (decentralized GPUs, multi-clouds, edge servers, and smartphones) easily, economically, and securely.

You can purchase our Advanced plan or Enterprise service for your specific needs, including on-premise deployment and dedicated support across the entire ML pipeline. More details are as follows:

| Plan | Price |
| --- | --- |
| Starter | $0 / month + compute |
| Advanced | $199 / month + compute |
| Enterprise | Contact us |

Features Overview

2. Model Serving

2.1 Serverless Endpoints for Open Source Models

Pay as you go for popular open-source LLM and generative AI models. Rapid API integration in minutes, with no GPU server setup. Easy upgrade to dedicated endpoints with MLOps support.
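As a rough sketch of what "API integration in minutes" looks like, the helper below assembles an HTTP chat request. The URL path, header names, and payload shape are assumptions modeled on common OpenAI-compatible endpoints, not confirmed TensorOpera specifics; check the platform docs for the real values.

```python
# Hypothetical serverless-endpoint request builder. API_BASE and the
# /v1/chat/completions path are placeholders, not TensorOpera's real API.
import json

API_BASE = "https://api.example-endpoint.invalid"  # hypothetical base URL

def build_chat_request(model: str, prompt: str, api_key: str):
    """Assemble (url, headers, body) for a chat-completion style call."""
    url = f"{API_BASE}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return url, headers, json.dumps(body)
```

The returned triple can be passed directly to any HTTP client (e.g. `requests.post(url, headers=headers, data=body)`).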

| Model Type | Total Parameter or Image Size | Price |
| --- | --- | --- |
| Large Language, Code, and Voice Models | ≤ 3.5B | $0.10 |
| | 3.5B - 7B | $0.20 |
| | 7B - 14B | $0.30 |
| | 14B - 40B | $0.45 |
| | 40B - 70B | $0.90 |
| Mixture of Experts | ≤ 56B | $0.50 |
| | 56B - 154B | $1.20 |
| Embedding Models | ≤ 270M | $0.016 |
| Image Models | 512 x 512 | $0.0001 |
| | 1024 x 1024 | $0.0002 |
| Video Models | 576 x 1024 | $0.009 |
| Audio Models | ≤ 1.5B | $0.001 |

* The prices listed are for 1 million tokens, which includes both the input and output tokens for chat, voice, and code models. For embedding models, the price is based only on the input tokens. Furthermore, the price is determined by the number of steps for image/video models and the number of output seconds for audio models. For multi-modal models like LLaVA, each image is counted as the equivalent of 576 prompt tokens for billing purposes.
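The billing rules above reduce to simple arithmetic. The sketch below applies them for chat models, including the LLaVA rule that each image bills as 576 prompt tokens; the function and tier names are illustrative, not a TensorOpera API.

```python
# Per-million-token billing sketch using rates from the table above.
PRICE_PER_M_TOKENS = {
    "llm_le_3.5b": 0.10,   # Large Language/Code/Voice, <= 3.5B params
    "llm_7b_14b": 0.30,    # 7B - 14B tier
}
LLAVA_TOKENS_PER_IMAGE = 576  # each image bills as 576 prompt tokens

def chat_cost(tier: str, input_tokens: int, output_tokens: int,
              images: int = 0) -> float:
    """USD cost: input + output tokens (plus image equivalents) at the tier rate."""
    total = input_tokens + output_tokens + images * LLAVA_TOKENS_PER_IMAGE
    return total / 1_000_000 * PRICE_PER_M_TOKENS[tier]
```

For example, 600k input + 400k output tokens on a 7B-14B model comes to exactly the listed $0.30 per million tokens.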

2.2 Dedicated Endpoints for Your Own Model

You can host your own model on the TensorOpera Secure Cloud with the TensorOpera Platform, which supports autoscaling across clouds, manual scaling, model version updates, logging, and system monitoring. We support both Docker images and custom Python APIs for easy integration.
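To give a sense of the custom Python API option, here is the general shape such a predictor often takes: load the model once at startup, then map request payloads to inference. The class and method names here are assumptions for illustration; consult the TensorOpera docs for the exact serving interface.

```python
# Illustrative predictor shape for a dedicated endpoint (names are
# hypothetical, not TensorOpera's actual serving API).

class MyModelPredictor:
    def __init__(self):
        # Load weights/tokenizer once at startup (stubbed here).
        self.ready = True

    def predict(self, request: dict) -> dict:
        # Map the request payload to model inference (stubbed as an echo).
        text = request.get("text", "")
        return {"generated_text": text.upper()}
```

A serving framework would typically construct one predictor per replica and route each HTTP request through `predict`.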

| Hardware | Price | GPU | GPU Memory | CPU | RAM |
| --- | --- | --- | --- | --- | --- |
| RTX 3090 | $0.44 / Hour | 1 x NVIDIA-RTX-3090 | 24GB | 6x | 48GB |
| RTX 4090 | $0.47 / Hour | 1 x NVIDIA-RTX-4090 | 24GB | 8x | 128GB |
| A100 80GB | $1.65 / Hour | 1 x NVIDIA-A100-80GB-PCIe | 80GB | 16x | 256GB |
| H100 80GB | $3.00 / Hour | 1 x NVIDIA-H100-80GB | 80GB | 14x | 256GB |

Enterprise service: If you need help with complex custom deployment (multiple models, cross-server workflows, distributed serving across multiple clouds, etc.) or throughput/latency optimization, don't hesitate to contact us.

3. AI Agent API

The AI Agent API is similar to the OpenAI Assistants API but can be customized with your own LLM or vector database. You can try it on the TensorOpera Platform (click the "AI Agent" tab on the left). The LLM-based agent combines LLMs, tools, and knowledge to respond to user queries: it uses the LLM as its "brain", learns to call external APIs (tools) for information missing from the model weights, and leverages vector-database-backed RAG (retrieval-augmented generation) as its "memory".
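The brain/tools/memory pattern described above can be sketched in a few lines. Everything below is a stub for illustration: the routing is hard-coded where a real agent would let the LLM decide, and all names are mine rather than the AI Agent API's.

```python
# Minimal sketch of the agent pattern: an LLM "brain" that can call a
# tool or consult a RAG "memory" before answering. All parts stubbed.

def retrieve(query: str, memory: dict) -> str:
    """Stand-in for a vector-DB RAG lookup (exact match here)."""
    return memory.get(query, "")

def call_tool(name: str, arg: str) -> str:
    """Stand-in for external API / tool calls."""
    tools = {"calculator": lambda s: str(eval(s, {"__builtins__": {}}))}
    return tools[name](arg)

def agent_answer(query: str, memory: dict) -> str:
    # A real agent lets the LLM choose; this routing is hard-coded.
    if query.replace(" ", "").replace("+", "").isdigit():
        return call_tool("calculator", query)       # missing from weights -> tool
    context = retrieve(query, memory)               # RAG as "memory"
    return f"Based on memory: {context}" if context else "I don't know."
```

The design point is the split: tools supply facts the model weights lack, while RAG supplies facts from your own documents.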

LLM token usage is billed at the chosen endpoint, which can be either a dedicated or a serverless endpoint. The additional costs are as follows:

| Tool | Price |
| --- | --- |
| Code Interpreter | $0.03 / session |
| Retrieval | $0.20 / GB / agent / day |
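The add-on charges above combine per-session and per-GB-per-agent-per-day terms; a small calculator makes the units concrete. The helper name is illustrative.

```python
# Agent add-on cost sketch using the rates from the table above.
CODE_INTERPRETER_PER_SESSION = 0.03   # $ / session
RETRIEVAL_PER_GB_AGENT_DAY = 0.20     # $ / GB / agent / day

def agent_addon_cost(sessions: int, storage_gb: float,
                     agents: int, days: int) -> float:
    """Total add-on cost in USD for the given usage."""
    return (sessions * CODE_INTERPRETER_PER_SESSION
            + storage_gb * agents * days * RETRIEVAL_PER_GB_AGENT_DAY)
```

For example, 10 interpreter sessions plus 2 GB of retrieval storage for one agent over 30 days comes to $0.30 + $12.00 = $12.30.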

Enterprise service: If you are not satisfied with the performance of your LLM Agent, we can provide guidance on LLM fine-tuning, RAG optimization, and prompt engineering (contact us).

4. Serverless Training, Fine-tuning, or Federated Learning

TensorOpera Platform supports serverless AI jobs through TensorOpera® Launch, so you pay only for what your job uses. We provide many free pre-built job templates (training, fine-tuning, or federated learning) in Studio and the Job Store on the TensorOpera platform.

TensorOpera Launch swiftly pairs AI jobs with economical GPU resources, auto-provisioning and running the job while eliminating complex environment setup and management. Check https://docs.tensoropera.ai/launch for more details.
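As a toy illustration of "pairing jobs with economical GPUs", the sketch below picks the cheapest listed GPU whose memory fits a job. Prices and memory come from the pricing table below; the matching logic is mine, not TensorOpera's actual scheduler.

```python
# Toy GPU matcher: cheapest listed GPU that satisfies the job's memory
# requirement. Illustrative only; not the Launch scheduling algorithm.

GPUS = [  # (name, $/hour, memory_gb) from the serverless-job table
    ("RTX 3090", 0.54, 24),
    ("RTX 4090", 0.55, 24),
    ("A100 80GB", 1.98, 80),
    ("H100 80GB", 3.25, 80),
]

def cheapest_gpu(required_memory_gb: int):
    """Return the cheapest (name, price, memory) tuple that fits, or None."""
    candidates = [g for g in GPUS if g[2] >= required_memory_gb]
    return min(candidates, key=lambda g: g[1]) if candidates else None
```

A 20 GB job lands on the RTX 3090, while a 40 GB job needs the A100 80GB tier.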

| Hardware | Price | GPU | GPU Memory | CPU Cores | RAM | Communication Backend |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 3090 | $0.54 / Hour | 1 x NVIDIA-RTX-3090 | 24GB | 6x | 48GB | PCIe |
| RTX 4090 | $0.55 / Hour | 1 x NVIDIA-RTX-4090 | 24GB | 8x | 128GB | PCIe |
| A100 80GB | $1.98 / Hour | 1 x NVIDIA-A100-80GB | 80GB | 16x | 256GB | NVLink, InfiniBand: 8x 200Gbps ConnectX-6 |
| H100 80GB | $3.25 / Hour | 1 x NVIDIA-H100-80GB | 80GB | 14x | 256GB | NVLink, InfiniBand: 8x 400Gbps ConnectX-7 |
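Since jobs are billed per GPU-hour, total cost is the hourly rate times GPU count times duration. The helper below uses rates from the table above; the linear multi-GPU scaling is my simplifying assumption.

```python
# Back-of-envelope job cost from the hourly rates above.
RATES = {  # $/hour per GPU
    "RTX 3090": 0.54,
    "RTX 4090": 0.55,
    "A100 80GB": 1.98,
    "H100 80GB": 3.25,
}

def job_cost(gpu: str, num_gpus: int, hours: float) -> float:
    """USD cost assuming price scales linearly with GPU count."""
    return RATES[gpu] * num_gpus * hours
```

For example, a 24-hour fine-tuning run on 8 x H100 80GB works out to 3.25 x 8 x 24 = $624.00.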

Enterprise service: If you need help optimizing your training, fine-tuning, or federated learning jobs, we can provide guidance on distributed training setup and performance tuning (contact us).

5. Compute

5.1 Serverless Secure GPU Cloud (On-demand)

TensorOpera provides your ML team with a fully managed GPU cluster with the TensorOpera Platform pre-installed. The platform provides useful features to accelerate your AI development, including GPU job scheduling, training, deployment, experiment tracking, and monitoring.

| Hardware | Price | GPU | GPU Memory | CPU Cores | RAM | Communication Backend |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 3090 | $0.54 / Hour | 1 x NVIDIA-RTX-3090 | 24GB | 6x | 48GB | PCIe |
| RTX 4090 | $0.55 / Hour | 1 x NVIDIA-RTX-4090 | 24GB | 8x | 128GB | PCIe |
| A100 80GB | $1.98 / Hour | 1 x NVIDIA-A100-80GB | 80GB | 16x | 256GB | PCIe |
| H100 80GB | $3.25 / Hour | 1 x NVIDIA-H100-80GB-SXM | 80GB | 14x | 256GB | NVLink, InfiniBand: 8x 400Gbps ConnectX-7 |

5.2 Dedicated Secure GPU Cloud

TensorOpera provides your ML team with a dedicated, fully managed GPU cluster with the TensorOpera Platform pre-installed. The platform provides useful features to accelerate your AI development, including GPU job scheduling, training, deployment, experiment tracking, and monitoring.

| Hardware | Price | GPU | GPU Memory | CPU Cores | RAM | Communication Backend |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 3090 | $0.44 / Hour | 1 x NVIDIA-RTX-3090 | 24GB | 6x | 48GB | PCIe |
| RTX 4090 | $0.47 / Hour | 1 x NVIDIA-RTX-4090 | 24GB | 8x | 128GB | PCIe |
| A100 80GB | $1.65 / Hour | 1 x NVIDIA-A100-80GB-PCIe | 80GB | 16x | 256GB | PCIe |
| H100 80GB | $3.00 / Hour | 1 x NVIDIA-H100-80GB-SXM | 80GB | 14x | 256GB | NVLink, InfiniBand: 8x 400Gbps ConnectX-7 |

Note: approximate pricing based on GPU availability in the TensorOpera Secure Cloud.

Enterprise service: If you need help with complex custom deployment (multiple models, cross-server workflows, distributed serving across multiple clouds, etc.) or throughput/latency optimization, don't hesitate to contact us.

6. Monetization Service

6.1 Monetize Your Model

We value model creators' effort, whether the model is open-source or closed-source. TensorOpera provides model owners with a payment and serving platform to monetize their models: serve your model, turn it into a model API, and link your payment details; you receive earnings whenever users call and pay for your APIs.
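The earnings mechanics reduce to revenue times the owner's share. Note that the revenue-share fraction is not stated in this document, so `owner_share` below is a hypothetical placeholder, not TensorOpera's actual rate.

```python
# Model-owner earnings sketch. The share fraction is a placeholder;
# the actual revenue-sharing terms are not specified here.

def owner_earnings(api_revenue_usd: float, owner_share: float) -> float:
    """Earnings = revenue from paid API calls x owner's revenue share."""
    return api_revenue_usd * owner_share
```

With a hypothetical 80% share, $1,000 of API revenue would yield $800 for the model owner.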

[Figure: Monetize Your Model]

Enterprise service: If you want to customize the revenue-sharing mechanism, or the endpoint API and related landing page for your own customers, please contact us.

6.2 Share Your GPUs and Earn

Share your GPU(s) through our GPU marketplace and earn money to recoup your investment. More details are at https://docs.tensoropera.ai/launch/share-and-earn.

[Figure: GPU marketplace flow — developers who need GPUs run jobs through TensorOpera, which sources capacity from cloud GPU providers, edge GPU providers, and individual GPU providers]

Enterprise service: If you want to integrate a large number of GPUs into the TensorOpera Cloud backend so that AI developers can run TensorOpera Launch serverless jobs on them, please contact us.