Streams

Unlike other Data Sources, streams cannot be created independent of a project. The option to add a stream only exists when you:

  • Have an existing SymetryML project

  • Create a new SymetryML project from scratch

SymetryML supports two types of streaming data sources: Kafka and NATS.

Kafka Streams

Creating a Kafka stream requires you to specify at minimum the following information:

  • Bootstrap Servers

  • Schema Registry

  • Kafka Topic

Stream Info

Time btw. Persists is an optional parameter that controls the frequency at which SymetryML persists a project with a stream. Default value of 300 seconds, should be sufficient unless your schema contains a large number of attributes (1000+).

Start from Beginning checkbox would read the earliest possible data from the stream. Unselecting this option would only update the project with the data that will be generated from this point in time.

Advanced Kafka parameters could be specified by clicking the Kafka Options button and adding the corresponding parameter/value pair.

Adding a Kafka Stream to an existing Project

Right-click the project node, and then click Add Stream.

Specify the required stream info (bootstrap servers, topic, and etc.)

When choosing a stream topic, the topic names will be displayed in the following format : TOPIC_NAME:NUM_PARTITIONS. Ensure the correct partitions have been selected by clicking Kafka Options button and verifying the prefilled kafka.partitions field.

Stream Topic Browser
Stream Parameters

Ensure your information is valid and click Next

Verify the stream in the preview panel. Click Next to continue.

In the final panel specify the correct type mapping and click Finish

NATS Streams

NATS is a lightweight, high-performance messaging system. Creating a NATS stream requires you to specify the following information:

Parameter
Required
Description

NATS Hosts

Yes

Comma-separated list of NATS server addresses

NATS Subject

Yes

The NATS subject to subscribe to

Data Format

Yes

Format of the incoming data: json, csv, or protobuf

Protobuf Schema

If protobuf

The Protobuf schema definition (required when Data Format is protobuf)

Protobuf Message Type

If protobuf

The Protobuf message type name (required when Data Format is protobuf)

Authentication

No

One of None, User/Password, Token, or NKey

Request Timeout

No

Timeout in seconds (default: 30)

Max Memory

No

Maximum memory for the NATS stream (default: 1GB)

NATS Stream Info

Time btw. Persists and Start from Beginning options function the same as with Kafka streams.

Adding a NATS Stream to an existing Project

Right-click the project node, and then click Add Stream. Select NATS as the stream type, then fill in the required parameters listed above.

NATS Stream w/ Protocol

Ensure your information is valid and click Next.

Verify the stream in the preview panel. Click Next to continue.

In the final panel specify the correct type mapping and click Finish.

Creating a Project with Stream

When creating a new project, adding a stream (Kafka or NATS) can be performed by simply:

  1. Selecting New Data Source on the data selection panel

  2. Specifying the required stream info for your chosen stream type

See Creating New Project for more information.

Stopping and Restarting a Stream

Once added to a project, a stream will be continuously polled and will cause the project to be updated automatically with new incoming data. This behavior can be paused and restarted again.

Stream Actions

To stop the update behavior:

  1. Right-click on the stream node

  2. Click Stop Stream

While this does stop the project from being updated, it has no effect on the underlying stream. The stream will continue to ingest new data in the background.

Resuming project update behavior can be done in one of two ways:

  • A stream can be resumed from the point where it was paused (Resume Stream)

  • It can be restarted from the very beginning. (Start from Beginning)

The correct restart point will depend on the architecture of your streaming cluster. If the cluster allows for storage of large volume of data, simply resuming might be the best option. On the other hand, if the data in the cluster is changing rapidly, restarting from the beginning might be more optimal.

Stream Metrics

Information about a stream can be obtained by right clicking on the stream node and selecting Stream Metrics

Stream Metrics

Stream Errors

Error Log allows the user to diagnose any potential problems that might occur when interfacing with a stream. This is the first place you should look when troubleshooting.

Stream Error Log

Last updated