About Federated Learning

This section provides the background needed to understand the business objects behind the Federated Learning functionality. The next section, Federated Learning API, covers the REST API itself in detail.

Federated Project Terminology

SymetryML projects can easily be merged. For example, suppose you have two projects: (a) a project p1 that processed dataset d1 and (b) a project p2 that processed dataset d2. You can merge p2 into p1, and the resulting p1 project will be the same as if p1 had processed both d1 and d2. A SymetryML Federated project leverages this capability. A federation consists of n Symetry projects that each process their own private data and share their results at a given interval. This can be seen in the following picture:

Example of a 3-node federation
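To make the merge property concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a project keeps only a row count, per-feature sums, and sums of pairwise products. The `Project` class and its methods are hypothetical and are not SymetryML's actual internal representation or API. The point is only that such summaries are additive, so merging p2 into p1 gives the same result as having p1 process both d1 and d2.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project that keeps additive summaries, not raw rows."""
    n_features: int
    count: int = 0
    sums: list = field(default_factory=list)
    cross: list = field(default_factory=list)  # sums of x_i * x_j

    def __post_init__(self):
        k = self.n_features
        self.sums = [0.0] * k
        self.cross = [[0.0] * k for _ in range(k)]

    def process(self, rows):
        """Fold raw rows into the summaries; the rows themselves are discarded."""
        for row in rows:
            self.count += 1
            for i, xi in enumerate(row):
                self.sums[i] += xi
                for j, xj in enumerate(row):
                    self.cross[i][j] += xi * xj

    def merge(self, other):
        """Merge another project's summaries into this one by addition."""
        self.count += other.count
        for i in range(self.n_features):
            self.sums[i] += other.sums[i]
            for j in range(self.n_features):
                self.cross[i][j] += other.cross[i][j]


d1 = [[1.0, 2.0], [3.0, 4.0]]
d2 = [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

p1, p2 = Project(2), Project(2)
p1.process(d1)
p2.process(d2)
p1.merge(p2)  # p1 now summarizes d1 and d2

single = Project(2)
single.process(d1 + d2)  # one project that saw both datasets directly
```

In this toy setting, `p1` and `single` end up holding identical summaries, even though `p1` never saw the rows of d2.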

To fully understand the federated learning REST API, you need to know a few concepts and terms.

Federation Terminology

| Term | Definition |
| --- | --- |
| peer or node | A node is a member of a federation. It is essentially a Federated Symetry Project. |
| federated project | A Federated Symetry Project contains two Symetry projects: one local project and one federated project. The federated project is rebuilt from time to time according to the Federation Schedule defined by the federation admin. |
| local project | A Federated Symetry Project contains two projects: one local project and one federated project. The local project is responsible for processing the data that is local to this project. |
| Federation | A federation is a set of nodes that communicate and share Symetry project information. |
| Federation Info | Information that describes a federation. |
| Federation Admin | The user who creates a federation automatically becomes the federation admin. |
| Federation Contract | A set of Boolean rules used to enforce the quality of each individual peer's data. Please see the Federation Contract section for details. |
| Federation secret key | An AES secret key that is used to encrypt communication between the peers/nodes of a federation. |
| Federation Schedule | Peers in a federation send updates to other peers according to a schedule. The federation admin defines this schedule when the federation is created. Example schedules: m30 (synchronize every 30 minutes), h3 (synchronize every 3 hours), d7 (synchronize every 7 days). |
| scheduled synchronization message | A periodic message sent by a peer to the other peers in a federation. The period is defined by the federation schedule. |
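The schedule strings in the table above appear to combine a unit letter (m, h, or d) with a whole number. The following sketch turns such a string into a time interval. The format is inferred only from the three examples given, and `parse_schedule` is a hypothetical helper, not part of the SymetryML API.

```python
import re
from datetime import timedelta

# Unit letters as they appear in the examples m30, h3, and d7.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_schedule(spec: str) -> timedelta:
    """Convert a schedule string such as 'm30' into a timedelta."""
    match = re.fullmatch(r"([mhd])(\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognized schedule: {spec!r}")
    unit, amount = match.group(1), int(match.group(2))
    if amount <= 0:
        raise ValueError(f"schedule period must be positive: {spec!r}")
    return timedelta(**{_UNITS[unit]: amount})


every_30_minutes = parse_schedule("m30")
every_3_hours = parse_schedule("h3")
every_7_days = parse_schedule("d7")
```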

A SymetryML Federation can use either Amazon AWS services or NATS in the backend to transmit the various messages to support its functionality.

AWS Backed Federation

In the AWS implementation, under the hood, the federation service uses many AWS services:

  • Each federation node has an AWS SQS queue to receive messages

  • A Federation has an AWS SNS topic that allows fanout messages to be sent to multiple SQS queues.

  • Nodes in the federation publish messages to the SNS topic to communicate with other nodes

  • SNS messages are lightweight and contain pointers to Amazon S3 files that are used to temporarily store message content.

  • AWS STS credentials are used to allow other users to access a user’s file on S3.

The following figure illustrates this:

Federated SML AWS Integration
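The pointer-based messaging described above can be sketched without any AWS dependency. The following is an in-memory stand-in, not real AWS code: the `object_store` dict plays the role of S3, the `Topic` class plays the role of an SNS topic, and each `deque` plays the role of a node's SQS queue. All names are hypothetical, and STS credentials and encryption are left out.

```python
import uuid
from collections import deque

object_store = {}  # stand-in for S3: key -> payload bytes


class Topic:
    """Stand-in for an SNS topic that fans each message out to every queue."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, node_id):
        """Create and return the queue (SQS stand-in) for one node."""
        self.queues[node_id] = deque()
        return self.queues[node_id]

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)


def send_update(topic, sender, payload: bytes):
    """Store the payload, then publish only a lightweight pointer to it."""
    key = f"{sender}/{uuid.uuid4()}"
    object_store[key] = payload
    topic.publish({"sender": sender, "pointer": key})


def receive_updates(queue, node_id):
    """Drain a node's queue and fetch the payload behind each pointer."""
    payloads = []
    while queue:
        message = queue.popleft()
        if message["sender"] == node_id:
            continue  # ignore our own fanned-out message
        payloads.append(object_store[message["pointer"]])
    return payloads


topic = Topic()
queues = {name: topic.subscribe(name) for name in ("node_a", "node_b", "node_c")}

send_update(topic, "node_a", b"summary statistics from node_a")
received_b = receive_updates(queues["node_b"], "node_b")
received_a = receive_updates(queues["node_a"], "node_a")
```

Keeping the published message small and storing the bulky content separately is what lets one publish reach every node's queue cheaply.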

NATS Based Federation

A NATS-based federation uses the NATS 'connective technology' to create a federation. For more details on NATS, please consult www.nats.io. Under the hood, SymetryML uses NATS to send messages as well as synchronization messages between all the peers in the federation.

NATS based SML Federation

Peers can authenticate to the NATS network using either a user/password combination or a token. Please consult https://docs.nats.io/developing-with-nats/security for more details.

Federated Project Use Cases

Create a Federation

The user who creates a federation will become the administrator of it.

Join a federation

In order to join a federation one must:

  1. Make sure that your clock is correctly synchronized using an NTP service or something similar. If the clock of a computer in a federation is not correctly synchronized, that computer will have problems receiving messages from other nodes: the service will ignore many messages because of the discrepancy between the time a message was sent and the internal clock of the computer receiving it. These errors can be seen using the Get Error Log REST endpoint.

  2. Receive the one-time encrypted federation info along with the password to decrypt it. This can be done over email, Skype, or any other means that allows transferring base64-encoded encrypted text. The federation administrator can get this encrypted federation info using the Get Encrypted REST endpoint.

  3. Invoke the REST endpoint to join the federation (FedJoin) with the encrypted message and the password received from the federation admin. This message must also be encrypted using the user secret key.

  4. After step 3 succeeds, you can start syncing with other nodes in the federation by invoking the Start Pulse REST endpoint.
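Step 1 matters because a receiving node compares the time a message was sent with its own clock. The sketch below shows that kind of check. The 5-minute tolerance and the `is_acceptable` helper are assumptions made for illustration; the source does not state the actual threshold or logic.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # hypothetical tolerance, not a documented value


def is_acceptable(sent_at: datetime, receiver_now: datetime,
                  max_skew: timedelta = MAX_SKEW) -> bool:
    """Accept a message only if its send time is close to the receiver's clock."""
    return abs(receiver_now - sent_at) <= max_skew


sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# A receiver whose clock is in sync sees the message a couple of seconds later.
in_sync_clock = sent + timedelta(seconds=2)

# A receiver whose clock has drifted by 20 minutes would ignore the same message.
drifted_clock = sent + timedelta(minutes=20)

accepted_when_in_sync = is_acceptable(sent, in_sync_clock)
accepted_when_drifted = is_acceptable(sent, drifted_clock)
```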

Federation Contracts

SymetryML's federated learning capabilities allow peers to share statistical information without sharing raw data. This shared statistical representation supports:

  • supervised and unsupervised machine learning and

  • various exploration APIs.

Some of these exploration APIs can be used to enforce quality requirements on the data that peers contribute to a federation. Data is never shared directly; only statistical knowledge of the data is shared. This knowledge is nonetheless sufficient to enforce rules such as "at least 40% of the rows with positive cancer are female" or "at least 500 examples of fraud are part of this dataset".

This enforcement is done via what we call 'Federation Contracts'. A Federation Contract is a list of rules to be enforced on shared statistical data for it to be validated. These rules are effectively Boolean predicates that evaluate to true or false. For a contract to be validated, all of its rules need to evaluate to true.
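The all-rules-must-hold semantics can be sketched as follows. This does not use SymetryML's contract syntax. The summary dictionary, its keys and values, and the two rules (modeled on the examples above) are hypothetical.

```python
# Hypothetical summary statistics a peer might share; no raw rows are involved.
summary = {
    "cancer_positive_rows": 1200,
    "cancer_positive_female_rows": 540,
    "fraud_rows": 650,
}

# Each rule is a named Boolean predicate over the shared summary.
rules = [
    ("at least 40% of cancer-positive rows are female",
     lambda s: s["cancer_positive_female_rows"] / s["cancer_positive_rows"] >= 0.40),
    ("at least 500 fraud examples",
     lambda s: s["fraud_rows"] >= 500),
]


def validate_contract(rules, summary):
    """Return (contract_valid, per_rule_results); valid only if every rule holds."""
    results = {name: bool(rule(summary)) for name, rule in rules}
    return all(results.values()), results


valid, details = validate_contract(rules, summary)

# The same contract fails as soon as a single rule evaluates to false.
too_few_fraud = {**summary, "fraud_rows": 100}
valid_with_few_fraud, _ = validate_contract(rules, too_few_fraud)
```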

Federation Contract Rules

Federation Contracts are defined using the following Backus-Naur notation. The table that follows describes the individual functions that can be used in a Federation Contract.

Federation Contract Backus-Naur Notation

Functions That Can Be Used in a Federation Contract

  • F1 / F2 mean 'Feature 1 type' and 'Feature 2 type'

  • C means Continuous type

  • B means Binary type

| Rule | F1 | F2 | Business Rule Interpretation |
| --- | --- | --- | --- |
| COUNT | C or B | | How many times a feature was seen |
| MEAN | C or B | | The mean value of a feature |
| STDDEV | C or B | | The standard deviation of a feature |
| VARIANCE | C or B | | The variance of a feature |
| STDDEV_UNBIASED | C or B | | The unbiased standard deviation of a feature |
| VARIANCE_UNBIASED | C or B | | The unbiased variance of a feature |
| COVAR | C or B | C or B | The covariance of two features |
| LINCORR | C or B | C or B | The linear correlation of two features |
| COND_STDDEV | C or B | B | Standard deviation of Feature 1 when Feature 2 is '1' or true |
| COND_VARIANCE | C or B | B | Variance of Feature 1 when Feature 2 is '1' or true |
| COND_STDDEV_UNBIASED | C or B | B | Unbiased standard deviation of Feature 1 when Feature 2 is '1' or true |
| COND_VARIANCE_UNBIASED | C or B | B | Unbiased variance of Feature 1 when Feature 2 is '1' or true |
| COMPL_COND_STDDEV | C or B | B | Standard deviation of Feature 1 when Feature 2 is '0' or false |
| COMPL_COND_VARIANCE | C or B | B | Variance of Feature 1 when Feature 2 is '0' or false |
| COMPL_COND_STDDEV_UNBIASED | C or B | B | Unbiased standard deviation of Feature 1 when Feature 2 is '0' or false |
| COMPL_COND_VARIANCE_UNBIASED | C or B | B | Unbiased variance of Feature 1 when Feature 2 is '0' or false |
| PCT_OF_TRUE | B | B | Percentage of occurrences where Feature 1 is '1' or true and Feature 2 is '1' or true |
| PCT_OF_FALSE | B | B | Percentage of occurrences where Feature 1 is '1' or true and Feature 2 is '0' or false |
| NUM_OCCURENCE_WHEN_TRUE | B | B | Number of occurrences where Feature 1 is '1' or true and Feature 2 is '1' or true |
| NUM_OCCURENCE_WHEN_FALSE | B | B | Number of occurrences where Feature 1 is '1' or true and Feature 2 is '0' or false |
| MEAN_WHEN_TRUE | C | B | Mean of Feature 1 when Feature 2 is '1' or true |
| MEAN_WHEN_FALSE | C | B | Mean of Feature 1 when Feature 2 is '0' or false |
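The table distinguishes plain and _UNBIASED variants of the variance and standard deviation. Assuming these follow the standard statistical definitions (an assumption, since the source does not spell out the formulas), the plain variance divides the sum of squared deviations by n and the unbiased variance divides it by n - 1. The sketch below computes both from a count, a sum, and a sum of squares, which is the kind of summary that can be shared without raw data. The function name is hypothetical.

```python
import math
import statistics


def variance_from_sums(n, total, total_sq, unbiased=False):
    """Variance computed from a count, a sum, and a sum of squares."""
    if n < (2 if unbiased else 1):
        raise ValueError("not enough observations")
    sum_sq_dev = total_sq - total * total / n  # sum of squared deviations
    return sum_sq_dev / (n - 1 if unbiased else n)


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Only these three summaries are needed; the raw values are not.
n = len(values)
total = sum(values)
total_sq = sum(v * v for v in values)

variance = variance_from_sums(n, total, total_sq)
variance_unbiased = variance_from_sums(n, total, total_sq, unbiased=True)
stddev = math.sqrt(variance)
stddev_unbiased = math.sqrt(variance_unbiased)

# Cross-check against the standard library, which does see the raw values.
matches_population = math.isclose(variance, statistics.pvariance(values))
matches_sample = math.isclose(variance_unbiased, statistics.variance(values))
```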

Examples of Federation Contract

Here is a small example with the Iris data set. For a Federation Contract to be valid, all of its rows must evaluate to TRUE.

Another example uses multiple predicates on each line, which the Backus-Naur notation permits:

Federation Contract Failure Action

Each peer can evaluate Federation Contracts at two points: first, when sharing its own statistical data with other peers in a federation, and second, when receiving other peers' statistical data. You can control what SymetryML does when a validation failure occurs at either point. Each peer specifies this when creating or joining a federation by setting the following parameters inside the Federation Key Map Value. Please consult the sections mentioned in Federated Learning: Creating Federation for details:

| Action Type | Action Choice |
| --- | --- |
| fed_psr_contract_snd_fail_action | fed_psr_contract_snd_fail_action_block (default), fed_psr_contract_snd_fail_action_allow |
| fed_psr_contract_rcv_fail_action | fed_psr_contract_rcv_fail_action_block (default), fed_psr_contract_rcv_fail_action_allow |
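How these two settings might drive a peer's behavior can be sketched as below. The key and value names come from the table above. The `should_proceed` helper, the plain-dict form of the Federation Key Map Value, and the reading of "block" as "do not send or accept the data" are assumptions for illustration only.

```python
# Defaults as listed in the table: block on both send and receive.
DEFAULTS = {
    "fed_psr_contract_snd_fail_action": "fed_psr_contract_snd_fail_action_block",
    "fed_psr_contract_rcv_fail_action": "fed_psr_contract_rcv_fail_action_block",
}


def should_proceed(key_map: dict, direction: str, contract_valid: bool) -> bool:
    """Decide whether a send ('snd') or receive ('rcv') should go ahead."""
    if direction not in ("snd", "rcv"):
        raise ValueError(f"unknown direction: {direction!r}")
    if contract_valid:
        return True  # the failure action only matters when validation fails
    key = f"fed_psr_contract_{direction}_fail_action"
    action = key_map.get(key, DEFAULTS[key])
    return action == f"{key}_allow"


# A peer that tolerates contract failures on received data but not on sent data.
peer_settings = {
    "fed_psr_contract_rcv_fail_action": "fed_psr_contract_rcv_fail_action_allow",
}

send_after_failure = should_proceed(peer_settings, "snd", contract_valid=False)
receive_after_failure = should_proceed(peer_settings, "rcv", contract_valid=False)
```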

Peer Exploration

Another functionality enabled by SymetryML federated learning is 'peer exploration'. It allows you to use SymetryML's full suite of exploration APIs against the statistical data of a peer. This can be used to perform various univariate and bivariate comparisons between different peers' data without ever seeing the raw data of that peer.

If a particular peer wishes to block such functionality please consult this section to learn how to disable / enable this functionality.

Secure Multi-Party Computation Mode

SymetryML's federated learning allows sharing certain summary features of the data without ever sharing the raw data. However, for the shared statistical representation not to be invertible (that is, not to allow reconstruction of the original data), it must have processed a minimum number of rows. This minimum threshold depends on the number of attributes and equals the following:

Minimum number of rows = Number of Attributes + 5

If this minimum is not met on a given peer at the time of syncing, the peer will not share its current statistical data with the other nodes in the federation. The same logic applies to incremental synchronization: the delta of each sync (the amount of new data since the last synchronization) must also satisfy this rule for the synchronization to be allowed.
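The threshold rule is simple enough to write down directly. In the sketch below, `minimum_rows` and `can_share` are hypothetical helper names; only the formula (number of attributes + 5) comes from the text above.

```python
def minimum_rows(num_attributes: int) -> int:
    """Minimum number of rows needed before statistics may be shared."""
    return num_attributes + 5


def can_share(num_attributes: int, new_rows: int) -> bool:
    """Check a sync, where new_rows is the data added since the last sync."""
    return new_rows >= minimum_rows(num_attributes)


# A project with 20 attributes needs at least 25 new rows per synchronization.
threshold = minimum_rows(20)
sync_with_25_rows = can_share(20, 25)
sync_with_24_rows = can_share(20, 24)
```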

This can be a limitation for some federations where each peer does not have lots of data. To circumvent this limitation, it's possible to use secure multi-party computation when peers share their statistical data. The protocol will only complete if the resulting shared data is not invertible.

Federated Learning with SMPC can be enabled by adding the key-value pair fed_use_smpc = true inside the Federation Key Map Value when an administrator creates a federation. Please consult the sections mentioned in Federated Learning: Creating Federation for details.
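The source does not say which secure multi-party computation protocol SymetryML uses. As a generic illustration only, the sketch below uses additive secret sharing, a classic technique for secure summation: each peer splits its private value into random shares that individually reveal nothing, and only the combined total is reconstructed. The modulus, the function names, and the final row-count check are assumptions made for this example.

```python
import secrets

MODULUS = 2**61 - 1  # arbitrary large modulus chosen for this sketch


def make_shares(value: int, n_parties: int) -> list:
    """Split a value into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list) -> int:
    """Sum private values so that no party sees another party's value."""
    n = len(private_values)
    # Row i holds the shares peer i hands out; column j is what peer j receives.
    distributed = [make_shares(v, n) for v in private_values]
    # Each peer publishes only the sum of the shares it received.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Three peers, each below the 20 + 5 = 25 row threshold on its own.
num_attributes = 20
row_counts = [8, 10, 9]

total_rows = secure_sum(row_counts)
pooled_ok = total_rows >= num_attributes + 5
```

In this toy example no single peer has enough rows to share on its own, but the pooled total does meet the threshold, which is the situation the SMPC mode is meant to address.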

Federated Project REST API at a Glance

Besides creating and joining a federation via REST endpoints, other operations are available. The following table lists all available REST endpoints for federated learning. The following functionality of normal SymetryML projects is available in Federated Project

Limitation of Federated Project

  • Feature hashing is not available

  • By default, Random Forest models are disabled. They can be enabled by changing the SymetryML server configuration. For details, please see the rtlm.option.sml.fed.strict.mode key in the Installation Guide - SymetryML REST Configuration section.

  • If your project has more than 2000 attributes, you should be careful about how frequently you sync your projects. Please consult the Federation Terminology section for more information.

Federated Project Actions
Definition

This REST endpoint is used to create a new federation. The user performing this operation will become the owner of the federation.

This is a map of properties for a federation.

Returns the federation information encrypted with a password. This is needed in order to share federation information with other peers that the federation admin wants to invite to join the federation. The response will contain a token that can only be used once.

This REST endpoint allows a peer to join an existing federation.

This endpoint instructs your federated project to start pulsing: the project will periodically poll for messages from other nodes in the federation and send its scheduled synchronization message.

Stops synchronizing with the federation.

Returns the error log for this project. Since many messages between nodes are exchanged asynchronously, this allows the user to see whether an error occurred while communicating with the other peers in the federation.

Returns a log of when this federated project was updated.

For AWS-based federations, this returns information about the AWS SNS topic, SNS subscriptions, and SQS queues. This is for troubleshooting purposes.
