What is Oracle Cloud Infrastructure Streaming (OSS)?
The Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable storage option for continuous, high-volume streams of data that you can consume and process in near real-time.
What does OSS manage on my behalf?
OSS is fully managed: Oracle takes care of everything from the underlying infrastructure to provisioning, deployment, maintenance, security patches, replication, and consumer groups, which makes application development easier.
How does Oracle Cloud Infrastructure Streaming provide resiliency?
When you create a stream in Oracle Cloud Infrastructure Streaming, Oracle automatically creates and manages three streaming nodes distributed across three availability domains (ADs), or across fault domains in single-AD regions, so your streams stay highly available and your data stays durable.
What can I do with OSS?
OSS lets you emit data and retrieve it in near real time. The number of use cases is nearly unlimited, from messaging to complex data stream processing.
Here are some of the many possible uses for Streaming:
- Messaging: Use streaming to decouple components of large systems. Streaming provides a pull/buffer-based communication model with sufficient capacity to flatten load spikes and the ability to feed multiple consumers with the same data independently. Key-scoped ordering and guaranteed durability provide reliable primitives for implementing a variety of messaging patterns, while high throughput potential allows such a system to scale well.
- Metric and log ingestion: Use streaming as an alternative to traditional file-scraping approaches, making critical operational data available more quickly for indexing, analysis, and visualization.
- Web/Mobile activity data ingestion: Use streaming to capture activity from websites or mobile apps (such as page views, searches, or other actions users take). This information can be used for real-time monitoring and analytics, as well as in data warehousing systems for offline processing and reporting.
- Infrastructure and apps event processing: Use streaming as a unified entry point for cloud components to report their life cycle events for audit, accounting, and related activities.
How do I use OSS?
Start using OSS by:
- Creating a new stream through the OSS Console or the CreateStream API
- Emitting data from producers to the stream (see the documentation)
- Building consumers to read and process data from your stream
What are the limits of OSS?
There is no overall limit on the throughput you can access; you just need to design your stream up front with the right number of partitions.
The hard limits of the system are:
- Maximum retention period: 7 days
- Maximum size of a single message: 1 MB
- Each partition handles up to 1,000 emit API calls per second and 5 read API calls per second
- Each partition supports a maximum total write rate of 1 MB per second and a maximum total read rate of 2 MB per second
- Each tenancy is limited to 5 partitions by default; contact us to request more
What is a stream?
A stream can be viewed as an append-only log file that contains your messages.
Streams are divided into partitions for scalability. Partitions let you distribute a stream by splitting its messages across multiple nodes (or brokers); each partition can be placed on a separate machine, so multiple consumers can read from the stream in parallel.
A message is a Base64-encoded payload that you emit into a stream.
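Messages are exchanged with the service as Base64-encoded text. The minimal sketch below, assuming both the key and the value are Base64-encoded on the wire, uses Python's standard `base64` module to show how a producer might encode an entry and a consumer might decode it. The `encode_entry`/`decode_entry` helpers and the `{"key": ..., "value": ...}` dict layout are hypothetical illustrations, not the OSS SDK's types.

```python
import base64
from typing import Optional, Tuple


def encode_entry(key: Optional[bytes], value: bytes) -> dict:
    """Build a hypothetical message entry with Base64-encoded key and value."""
    return {
        # A null key stays None instead of being encoded.
        "key": base64.b64encode(key).decode("ascii") if key is not None else None,
        "value": base64.b64encode(value).decode("ascii"),
    }


def decode_entry(entry: dict) -> Tuple[Optional[bytes], bytes]:
    """Reverse encode_entry: return the raw (key, value) bytes."""
    key = base64.b64decode(entry["key"]) if entry["key"] is not None else None
    return key, base64.b64decode(entry["value"])


entry = encode_entry(b"user-42", b'{"event": "page_view"}')
print(entry["key"])  # dXNlci00Mg==
```

Keep in mind that Base64 text is roughly a third larger than the raw bytes it encodes.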
What is an offset?
Each message within a partition has an identifier called its offset. Consumers can start reading from any offset they choose. Consumers can also commit the latest processed offset so that, if they stop and then restart, they can resume their work without replaying or missing a message.
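To illustrate how a committed offset lets a consumer resume without replaying or skipping messages, here is a minimal in-memory sketch of one partition. The `Partition` class and its `read_from`/`commit`/`resume_offset` methods are hypothetical stand-ins, not OSS API calls; the sketch assumes the committed offset is the last processed message, so reading resumes at the next one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Partition:
    """Hypothetical append-only log for a single partition."""
    messages: List[str] = field(default_factory=list)
    committed: Optional[int] = None  # offset of the last processed message

    def append(self, msg: str) -> int:
        # A message's offset is its position in the log.
        self.messages.append(msg)
        return len(self.messages) - 1

    def read_from(self, offset: int, limit: int = 10) -> List[Tuple[int, str]]:
        # Consumers may start at any offset they choose.
        end = min(offset + limit, len(self.messages))
        return [(o, self.messages[o]) for o in range(offset, end)]

    def commit(self, offset: int) -> None:
        self.committed = offset

    def resume_offset(self) -> int:
        # Restart right after the last committed offset, or at the beginning.
        return 0 if self.committed is None else self.committed + 1


p = Partition()
for m in ["a", "b", "c", "d"]:
    p.append(m)

batch = p.read_from(p.resume_offset(), limit=2)  # [(0, 'a'), (1, 'b')]
p.commit(batch[-1][0])                           # consumer stops after offset 1
print(p.read_from(p.resume_offset()))            # [(2, 'c'), (3, 'd')]
```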
What is a key?
A key is an identifier used to group related messages.
Creating a Stream
How do I create a new stream?
You can create a new stream using the Console or the API (see the API documentation).
Your stream is created in a particular region and tenancy and, optionally, in a dedicated compartment. A stream's data is replicated across the entire region, so it can tolerate the loss of an AD or a network split without disrupting the service, providing built-in high availability within the region.
How long does the provisioning take?
The time to provision depends on the number of partitions. Creating a new partition takes up to 10 seconds.
How do I decide the number of partitions I need?
The number of partitions for your stream depends on the throughput expectations of your application (expected throughput = average record size × maximum number of records written per second).
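As a worked example of that formula, the sketch below estimates a partition count from the per-partition limits listed earlier (1 MB/s in, 2 MB/s out, 1,000 emit calls/s, 5 read calls/s). It assumes 1 MB = 1,000,000 bytes for simplicity. The emit-call rate is kept separate from the record rate because a single emit call can carry a batch of messages. The `headroom` factor, reflecting the advice to allocate slightly more partitions than your peak requires, is a hypothetical parameter.

```python
import math

# Per-partition limits (1 MB treated as 10**6 bytes in this sketch).
WRITE_BYTES_PER_SEC = 1_000_000
READ_BYTES_PER_SEC = 2_000_000
EMIT_CALLS_PER_SEC = 1_000
READ_CALLS_PER_SEC = 5


def partitions_needed(avg_record_bytes: int, max_records_per_sec: int,
                      read_bytes_per_sec: int, emit_calls_per_sec: int,
                      read_calls_per_sec: int, headroom: float = 1.2) -> int:
    """Estimate partitions so that no single per-partition limit is exceeded."""
    # Expected write throughput = average record size x max records per second.
    write_bytes_per_sec = avg_record_bytes * max_records_per_sec
    required = max(
        write_bytes_per_sec / WRITE_BYTES_PER_SEC,
        read_bytes_per_sec / READ_BYTES_PER_SEC,
        emit_calls_per_sec / EMIT_CALLS_PER_SEC,
        read_calls_per_sec / READ_CALLS_PER_SEC,
    )
    # Add headroom for spikes; a stream always has at least one partition.
    return max(1, math.ceil(required * headroom))


# 2 KB records at 1,500 records/s = 3 MB/s written; two consumers each read it all.
print(partitions_needed(2_000, 1_500, read_bytes_per_sec=6_000_000,
                        emit_calls_per_sec=300, read_calls_per_sec=10))  # 4
```

Here the write and read rates both require 3 partitions, and the 20% headroom rounds that up to 4.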
What is the minimum throughput I can request for a stream?
The throughput of an Oracle Cloud Infrastructure stream is defined by its partitions, so the minimum is a single partition, which provides 1 MB/s of data input and 2 MB/s of data output.
What is the maximum throughput I can request for a data stream?
The throughput of an Oracle Cloud Infrastructure stream is designed to scale without limits as you add partitions. By default, each tenancy can provision 5 partitions; contact us if you want to increase your tenancy's partition limit.
How many requests can I send to a partition?
You can send 1,000 requests per second to a partition.
Publishing Data to a Stream
How do I emit data into a stream?
Once a stream is created and active, you can publish messages to it using the Write API (PutMessages). Each message is published to a partition of the stream; if the stream has more than one partition, the target partition is calculated from the message's key.
How does OSS store data if I send a null key?
If the key is null, the partition is calculated using a subset of the value. Don't rely on messages with the same value landing in the same partition, because the partitioning scheme may change; sending a null key effectively puts the message in a random partition.
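The sketch below illustrates the general idea of key-based partitioning: hashing the key modulo the partition count sends every message with the same key to the same partition, while a null key lands on an effectively random one. The CRC32 hash and the `choose_partition` function are assumptions for illustration only; OSS's actual partitioning scheme is internal and may change.

```python
import random
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Hypothetical partitioner: stable for a given key, random for a null key."""
    if num_partitions == 1:
        return 0  # with a single partition there is nothing to choose
    if key is None:
        # No key: any partition may be picked, so placement is unpredictable.
        return random.randrange(num_partitions)
    # Same key -> same hash -> same partition (while the partition count is fixed).
    return zlib.crc32(key) % num_partitions


p1 = choose_partition(b"order-1001", 4)
p2 = choose_partition(b"order-1001", 4)
print(p1 == p2)  # True: related messages share a partition
```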
How do I ensure ordering of messages in OSS?
Messages are ordered within a partition. To keep related messages in order, give them the same key: messages with the same key always go to the same partition.
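The following sketch shows why the same key gives key-scoped ordering: each partition is an append-only list, so even if a consumer drains the partitions in an arbitrary order, the events for any single key come out in the order they were emitted. The CRC32 routing is a hypothetical stand-in for OSS's partitioner, and the event data is made up for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2

# Interleaved events for two keys, in the order the producer emits them.
events = [("cart-A", "add"), ("cart-B", "add"), ("cart-A", "pay"),
          ("cart-B", "remove"), ("cart-A", "ship")]

# Route each event by key (hypothetical CRC32 partitioner); each partition
# is an append-only list, so arrival order is preserved within it.
partitions = defaultdict(list)
for key, action in events:
    partitions[zlib.crc32(key.encode()) % NUM_PARTITIONS].append((key, action))

# A consumer may drain partitions in any order; here, highest partition first.
consumed = [evt for p in sorted(partitions, reverse=True) for evt in partitions[p]]

# Per-key order survives even though the global order may not.
for key in ("cart-A", "cart-B"):
    print(key, [a for k, a in consumed if k == key])
# cart-A ['add', 'pay', 'ship']
# cart-B ['add', 'remove']
```

Ordering holds per key, not across keys: the relative order of `cart-A` and `cart-B` events is not guaranteed.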
How do I ensure that my message is durable?
As soon as the OSS API acknowledges your PutMessages request without an error, the message is durable.
How do you ensure the consistency of data in an OSS stream?
OSS guarantees linearizable reads and writes for a given partitioning key.
What happens if I emit more data than the allowed limits?
When client requests exceed the limits, OSS rejects the request and returns an error.
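A common way to handle rejected requests on the client side is to retry with exponential backoff and jitter. This is a generic pattern, not an OSS-specific API: `ThrottledError` and the `send` callable are hypothetical placeholders for whatever error your SDK raises and whatever call it makes.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical error raised when a request exceeds the service limits."""


def with_backoff(send: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.1, max_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying with capped exponential backoff plus full jitter."""
    attempt = 0
    while True:
        try:
            return send()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the error to the caller
            # Delay ceiling doubles on each retry (base * 2^(attempt-1)), up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


calls = {"n": 0}


def flaky_send() -> str:
    # Simulated producer call that is throttled twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("over partition limit")
    return "ok"


print(with_backoff(flaky_send, sleep=lambda s: None))  # ok
```

The jitter spreads retries from many producers over time instead of having them all retry at once; the injectable `sleep` makes the function easy to test.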
Consuming Data from a Stream
How do I read and consume data from a stream?
Consuming messages requires you to:
- Create a cursor
- Use the cursor to read messages
Refer to the technical documentation for a step-by-step guide to consuming data from a stream.
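The create-a-cursor-then-read loop can be sketched in memory as follows. `Cursor`, `create_cursor`, and `get_messages` are hypothetical stand-ins that model the idea (a cursor marks a position in a partition, and each read returns a batch plus the cursor for the next read); they are not the OSS SDK's classes or signatures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cursor:
    """Hypothetical cursor: a position (partition, next offset) in a stream."""
    partition: int
    offset: int


def create_cursor(log: List[str], partition: int, start: str = "earliest") -> Cursor:
    # "earliest" starts at the oldest retained message, "latest" at new messages only.
    if start not in ("earliest", "latest"):
        raise ValueError("start must be 'earliest' or 'latest'")
    offset = 0 if start == "earliest" else len(log)
    return Cursor(partition, offset)


def get_messages(log: List[str], cursor: Cursor, limit: int = 2) -> Tuple[List[str], Cursor]:
    """Return a batch of messages and the cursor to use for the next read."""
    batch = log[cursor.offset:cursor.offset + limit]
    return batch, Cursor(cursor.partition, cursor.offset + len(batch))


log = ["m0", "m1", "m2"]
cursor = create_cursor(log, partition=0)
while True:
    batch, cursor = get_messages(log, cursor)
    if not batch:
        break  # caught up; a real consumer would wait and poll again
    print(batch)
# ['m0', 'm1']
# ['m2']
```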
What are the different ways I can consume data from an OSS Stream?
OSS provides two kinds of consumer API:
- Low-level access, to precisely control which partitions and offsets to read data from
- Consumer groups to simplify application development by offloading load balancing, coordination, and offset tracking to the service
How do consumer groups work?
Consumers can be configured to consume messages as part of a group. Stream partitions are distributed among members of a group so that messages from any single partition will only be sent to a single consumer.
Partition assignments are re-balanced as consumers join or leave the group.
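Round-robin is one simple way to distribute partitions so that each partition has exactly one consumer in the group; rerunning it when membership changes models a rebalance. The service's actual assignment algorithm is not described here, so `assign_partitions` is a hypothetical illustration of the idea.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions across group members; each partition gets one member."""
    if not members:
        return {}  # no consumers, nothing to assign
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


parts = [0, 1, 2, 3, 4]
print(assign_partitions(parts, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
print(assign_partitions(parts, ["c1", "c2", "c3"]))  # c3 joins: rebalance
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2]}
```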
How do I avoid delivering duplicate messages to my consumers?
We recommend that consumer applications handle duplicates themselves, for example by making message processing idempotent.
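One way to make processing idempotent is to remember which messages have already been handled. This sketch keys on `(partition, offset)`, which identifies a message within a stream and therefore catches replays after a consumer restart; if a producer may resend the same payload as a new message, you would need an application-level ID inside the message instead. `IdempotentConsumer` is hypothetical, and its in-memory set stands in for durable storage.

```python
from typing import Callable, Set, Tuple


class IdempotentConsumer:
    """Hypothetical wrapper that skips messages it has already processed."""

    def __init__(self, handler: Callable[[bytes], None]) -> None:
        self.handler = handler
        # In production, persist this alongside the processing results.
        self.seen: Set[Tuple[int, int]] = set()

    def process(self, partition: int, offset: int, value: bytes) -> bool:
        """Handle a message once; return False for a duplicate delivery."""
        msg_id = (partition, offset)
        if msg_id in self.seen:
            return False
        self.handler(value)
        self.seen.add(msg_id)
        return True


results = []
consumer = IdempotentConsumer(results.append)
consumer.process(0, 7, b"charge $10")
consumer.process(0, 7, b"charge $10")  # replayed after a restart: skipped
print(results)  # [b'charge $10']
```

The `seen` set grows without bound here; a real consumer would typically keep only offsets above its last committed offset.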
How do I know whether consumers are falling behind?
To find out whether your consumer is falling behind (that is, you are producing faster than you are consuming), compare each message's timestamp with the current time. If this difference keeps growing, consider spawning a new consumer to take over some of the partitions from your first consumer.
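The lag check described above can be sketched with Python's `datetime` module. The "above a threshold and still growing" rule in `should_scale_out`, the threshold value, and the sample timestamps are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def consumer_lag(message_timestamp: datetime, now: Optional[datetime] = None) -> timedelta:
    """How far behind the consumer is: current time minus the message's timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - message_timestamp


def should_scale_out(lags: List[timedelta], threshold: timedelta) -> bool:
    """Hypothetical rule: add a consumer when lag is above threshold and still growing."""
    if len(lags) < 2:
        return False
    growing = all(earlier < later for earlier, later in zip(lags, lags[1:]))
    return growing and lags[-1] > threshold


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Made-up samples: (message timestamp, time the consumer read it).
samples = [(t0, t0 + timedelta(seconds=2)),
           (t0 + timedelta(seconds=10), t0 + timedelta(seconds=15)),
           (t0 + timedelta(seconds=20), t0 + timedelta(seconds=40))]
lags = [consumer_lag(ts, now=read_at) for ts, read_at in samples]
print([lag.total_seconds() for lag in lags])  # [2.0, 5.0, 20.0]
print(should_scale_out(lags, threshold=timedelta(seconds=10)))  # True
```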
Managing an OSS Stream
Can I change the number of partitions later on?
We recommend allocating slightly more partitions than your maximum expected throughput requires. This helps absorb spikes in your application, because changing the number of partitions after a stream is created is currently not supported.
Can I change the retention period of my stream?
By default, we store data for 24 hours. You can set a retention period of up to 7 days when creating a stream. Once the retention period is set, it can't be changed.
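Because retention is fixed at creation time, it can help to validate it up front. This minimal sketch assumes the 24-hour default is also the minimum, and `still_retained` is only an approximation of when a message stops being readable; both helpers are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(hours=24)
MAX_RETENTION = timedelta(days=7)


def validate_retention(retention: timedelta) -> timedelta:
    """Check a retention period before creating a stream (it can't be edited later).

    Assumes the 24-hour default is also the minimum allowed value.
    """
    if not DEFAULT_RETENTION <= retention <= MAX_RETENTION:
        raise ValueError("retention must be between 24 hours and 7 days")
    return retention


def still_retained(written_at: datetime, retention: timedelta, now: datetime) -> bool:
    """Approximation: a message stays readable while its age is below the retention period."""
    return now - written_at < retention


retention = validate_retention(timedelta(days=3))
written = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(still_retained(written, retention, written + timedelta(days=2)))  # True
print(still_retained(written, retention, written + timedelta(days=4)))  # False
```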
How do I monitor the operations and performance of my OSS stream?
The Oracle Cloud Infrastructure Streaming console provides both operational and performance metrics, such as throughput of data input and output. OSS also integrates with Oracle Cloud Infrastructure Telemetry so that you can collect, view, and analyze telemetry metrics for your streams.
Security & Encryption
How do I manage and control access to my stream?
All streams in the same tenancy have unique, immutable names, and every stream is assigned to a compartment. As a result, you can use the full power of Oracle Cloud Infrastructure access control policies to describe fine-grained rules at the tenancy, compartment, or single-stream level.
Access policies are written in the form "Allow <subject> to <verb> <resource-type> in <location> where <conditions>".
How do I authenticate when emitting or consuming data from OSS?
Our public API uses the Oracle Identity service, which provides a convenient way to authenticate users and authorize access to these APIs, both from a browser (username and password) and from code (API key).
See the documentation for details.
When I use OSS, how secure is my data?
OSS is secure by default: user data is encrypted both at rest and in motion. Only the account and data stream owners have access to the stream resources they create. OSS supports user authentication to control access to data, and you can use Oracle Cloud Infrastructure IAM policies to selectively grant permissions to users and groups of users. You can securely put and get your data through SSL endpoints using the HTTPS protocol.
Can I encrypt my data?
Yes. You own the data you emit, so you can also encrypt it yourself before sending it to OSS.
Can you walk me through the encryption life cycle of my data from the point in time I send it to an OSS Stream to when I retrieve it?
- Ingestion (from your producer to the Streaming gateway): data is encrypted in motion by SSL (HTTPS).
- Inside the Streaming service: SSL is terminated at the gateway; the data is encrypted on arrival with a per-stream AES-128 key and sent to the storage layer for persistence.
- On consumption: encrypted data is read from OSS, decrypted by the gateway node, and sent to the consumer over SSL.
What encryption algorithm is used for OSS encryption?
OSS uses the AES-GCM algorithm with 128-bit keys for encryption.
Pricing & Billing
How much does it cost?
OSS uses simple pay-as-you-go pricing. There are no upfront costs or minimum fees, and you pay only for the resources you use:
- GET/PUT requests, priced per gigabyte of data transferred
- Extended data retention: an optional cost based on the number of additional days of retention beyond the default 24 hours, priced per gigabyte of storage per hour
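The two billing components combine into a simple estimate. In this sketch, the prices are placeholders rather than OCI's actual rates (check the current price list), and expressing extended retention as a single GB-hours figure is an assumption about how you would total that usage.

```python
def estimate_cost(gb_transferred: float, price_per_gb_transferred: float,
                  extended_retention_gb_hours: float = 0.0,
                  price_per_gb_hour: float = 0.0) -> float:
    """Rough estimate: GET/PUT data transferred plus optional extended retention."""
    if min(gb_transferred, price_per_gb_transferred,
           extended_retention_gb_hours, price_per_gb_hour) < 0:
        raise ValueError("quantities and prices must be non-negative")
    # GET/PUT requests are billed per gigabyte of data transferred.
    transfer_cost = gb_transferred * price_per_gb_transferred
    # Retention beyond the default 24 hours is billed per GB of storage per hour.
    retention_cost = extended_retention_gb_hours * price_per_gb_hour
    return transfer_cost + retention_cost


# Placeholder prices for illustration only; they are NOT real OCI rates.
HYPOTHETICAL_PRICE_PER_GB = 0.5
HYPOTHETICAL_PRICE_PER_GB_HOUR = 0.25
print(estimate_cost(100, HYPOTHETICAL_PRICE_PER_GB,
                    extended_retention_gb_hours=40,
                    price_per_gb_hour=HYPOTHETICAL_PRICE_PER_GB_HOUR))  # 60.0
```

With the default 24-hour retention, the second term is zero and only the transferred data is billed.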
Is there a free tier for OSS?
OSS doesn't have a free tier.