Select Model with SymetryML

Select Model introduction

SymetryML Select Model is a feature selection facility. It automatically selects the best features for a given model algorithm, leveraging SymetryML's ability to build many predictive models quickly. It builds a series of models, each with a different set of input attributes chosen by a predefined heuristic, scores each model on out-of-sample data, and retains the best one. The following table describes the available heuristics:

Select Heuristic

| Name | Description |
| --- | --- |
| Forward Backward | A heuristic that (1) iteratively adds as many features as possible while keeping the best model, (2) iteratively removes as many features as possible while keeping the best model, and (3) repeats both steps a specified number of times. |
| Brute Force | Tries all possible combinations of the input attributes. It should not be used with more than 17-18 attributes. |
| Max Number of Iterations | Builds models from random combinations of the features, up to a specified number of attempts, and keeps the best one. |
| Max. Number of Seconds | Builds models from random combinations of the features until a specified number of seconds has elapsed, and keeps the best one. |
| Simple | Starts with one feature and adds one feature at a time until all features have been tried, keeping track of the best model along the way. |
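The Forward Backward heuristic can be pictured with a small sketch. This is only an illustration of the general greedy add/remove idea, not SymetryML's implementation: the function name `forward_backward`, the `score` callback (standing in for "fit a model and score it on out-of-sample data"), and the toy scoring rule are all assumptions made for the example.

```python
from typing import Callable, FrozenSet, Iterable

Score = Callable[[FrozenSet[str]], float]

def forward_backward(features: Iterable[str], score: Score, rounds: int = 5) -> FrozenSet[str]:
    """Greedy forward/backward feature search; a higher score is better."""
    all_features = list(features)
    best: FrozenSet[str] = frozenset()
    best_score = float("-inf")

    for _ in range(rounds):
        improved = False
        # Forward pass: keep each feature whose addition improves the best score.
        for f in all_features:
            if f in best:
                continue
            candidate = best | {f}
            s = score(candidate)
            if s > best_score:
                best, best_score, improved = candidate, s, True
        # Backward pass: drop each feature whose removal improves the best score.
        for f in sorted(best):
            candidate = best - {f}
            if not candidate:
                continue
            s = score(candidate)
            if s > best_score:
                best, best_score, improved = candidate, s, True
        if not improved:
            break  # a full round changed nothing, so further rounds cannot help
    return best

# Hypothetical scoring rule: "a" and "c" are informative, every other feature costs 0.1.
USEFUL = frozenset({"a", "c"})

def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)

selected = forward_backward(["b", "a", "c", "d"], toy_score)
print(sorted(selected))  # ['a', 'c']
```

In this run the forward pass picks up the uninformative feature "b" first, and the backward pass is what removes it again, which is the reason for alternating the two passes.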

Selector Types

| Parameter | Description |
| --- | --- |
| selector_type_fw_bw | Forward / Backward heuristic. The number of iterations defaults to 5 and can be changed with the selector_max_iterations parameter. |
| selector_type_simple | Simple heuristic. |
| selector_type_brute | Brute force selector. |
| selector_type_iteration | A selector that tries either a specific number of random combinations or random combinations for a specific number of seconds. selector_max_iterations or selector_max_seconds must also be specified with this selector type. |
| selector_type_genetic | (Experimental) Genetic Algorithm feature selector. Uses evolutionary optimization to find optimal feature subsets. See the Genetic Algorithm Selector section. |
| selector_type_bayesian | (Experimental) Bayesian Optimization feature selector. Uses probabilistic modeling to efficiently search the feature space. See the Bayesian Optimization Selector section. |
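The budgeted behaviour of selector_type_iteration can be sketched as a random search that stops on either an iteration count or a wall-clock limit. This is a minimal sketch under assumptions: `random_search`, its arguments, and the way a random subset is drawn are hypothetical and only mirror the roles of selector_max_iterations and selector_max_seconds.

```python
import random
import time
from typing import Callable, FrozenSet, Optional, Sequence, Tuple

def random_search(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_iterations: Optional[int] = None,
    max_seconds: Optional[float] = None,
    seed: Optional[int] = None,
) -> Tuple[FrozenSet[str], int]:
    """Try random feature subsets until a budget runs out; return (best subset, tries)."""
    if max_iterations is None and max_seconds is None:
        raise ValueError("set max_iterations or max_seconds")
    rng = random.Random(seed)
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    pool = list(features)
    best: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    tried = 0
    while True:
        if max_iterations is not None and tried >= max_iterations:
            break
        if deadline is not None and time.monotonic() >= deadline:
            break
        # Draw a random non-empty subset: random size, then random members.
        size = rng.randint(1, len(pool))
        candidate = frozenset(rng.sample(pool, size))
        s = score(candidate)
        tried += 1
        if s > best_score:
            best, best_score = candidate, s
    return best, tried

# Hypothetical scoring rule, used only to exercise the loop.
def demo_score(subset: FrozenSet[str]) -> float:
    return len(subset & {"a", "c"}) - 0.1 * len(subset - {"a", "c"})

best_subset, tries = random_search(["a", "b", "c"], demo_score, max_iterations=200, seed=7)
print(tries, sorted(best_subset))
```

Requiring at least one of the two limits mirrors the rule that this selector type must be given selector_max_iterations or selector_max_seconds.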

Selector Grid

The Elastic Net model has two hyperparameters that can be optimized: eta and lambda. The auto-select algorithm tries combinations of these parameters using a grid search. The size of the grid is controlled by the autoselect_grid_type extra parameter in the MLContext request body. See this section for an example.

| Parameter | Grid values (eta x lambda) |
| --- | --- |
| autoselect_grid_type_tiny | eta [0, 0.5, 1.0] x lambda [1e-3, 1e-2, 0.1] |
| autoselect_grid_type_small | eta [0, 0.5, 1.0] x lambda [1e-3, 1e-2, 0.1, 1] |
| autoselect_grid_type_normal | eta [0, 0.3333, 0.6666, 1.0] x lambda [1e-4, 1e-3, 1e-2, 0.1, 1, 10] |
| autoselect_grid_type_large | eta [0, 0.2, 0.4, 0.6, 0.8, 1.0] x lambda [1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1, 10, 100, 1000] |
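To get a feel for how much work each grid type implies, the grids above can be expanded into explicit (eta, lambda) pairs. The values are copied from the table; the `GRIDS` dictionary and the `grid_points` helper are names invented for this sketch.

```python
from itertools import product
from typing import List, Tuple

# (eta values, lambda values) per grid type, as listed in the table above.
GRIDS = {
    "autoselect_grid_type_tiny": ([0, 0.5, 1.0], [1e-3, 1e-2, 0.1]),
    "autoselect_grid_type_small": ([0, 0.5, 1.0], [1e-3, 1e-2, 0.1, 1]),
    "autoselect_grid_type_normal": (
        [0, 0.3333, 0.6666, 1.0],
        [1e-4, 1e-3, 1e-2, 0.1, 1, 10],
    ),
    "autoselect_grid_type_large": (
        [0, 0.2, 0.4, 0.6, 0.8, 1.0],
        [1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1, 10, 100, 1000],
    ),
}

def grid_points(grid_type: str) -> List[Tuple[float, float]]:
    """Return every (eta, lambda) pair in the named grid."""
    etas, lambdas = GRIDS[grid_type]
    return list(product(etas, lambdas))

for name in GRIDS:
    print(name, len(grid_points(name)))
# tiny: 9, small: 12, normal: 24, large: 78 combinations
```

Moving from the tiny grid to the large one multiplies the number of hyperparameter combinations by almost nine, so the larger grids are best reserved for cases where the extra run time is acceptable.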

Select Model Rest API

Invokes the select model functionality, using an external data source, identified by its ID, as the out-of-sample data for model assessment.

URL

Query Parameters

| Parameter | Required / Optional | Description |
| --- | --- | --- |
| modelid | Required | ID to assign to the new model. |
| algo | Required | Algorithm to fit. |

MLContext Build Parameters

| Parameter | Required / Optional | Type | Description |
| --- | --- | --- | --- |
| rnd_seed | Optional | Integer | Seed for the random number generator. |
| selector_type | Optional | String | Default is selector_type_fw_bw. See the Select Heuristic and Selector Types sections for details. |
| autoselect_grid_type | Optional | String | Default is autoselect_grid_type_tiny. See the Selector Grid table for details. |
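Client code may want to validate these build parameters before sending a request. The sketch below applies the documented defaults and the rule that selector_type_iteration needs an iteration or time budget. The helper name `build_select_params` is hypothetical, and it only returns flat key/value pairs: where these pairs sit inside the MLContext request body is not shown here and is deliberately left out.

```python
from typing import Any, Dict, Optional

SELECTOR_TYPES = {
    "selector_type_fw_bw",
    "selector_type_simple",
    "selector_type_brute",
    "selector_type_iteration",
    "selector_type_genetic",
    "selector_type_bayesian",
}
GRID_TYPES = {
    "autoselect_grid_type_tiny",
    "autoselect_grid_type_small",
    "autoselect_grid_type_normal",
    "autoselect_grid_type_large",
}

def build_select_params(
    selector_type: str = "selector_type_fw_bw",
    autoselect_grid_type: str = "autoselect_grid_type_tiny",
    rnd_seed: Optional[int] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Validate select-model build parameters and return them as a flat dict."""
    if selector_type not in SELECTOR_TYPES:
        raise ValueError(f"unknown selector_type: {selector_type}")
    if autoselect_grid_type not in GRID_TYPES:
        raise ValueError(f"unknown autoselect_grid_type: {autoselect_grid_type}")
    budget_keys = {"selector_max_iterations", "selector_max_seconds"}
    if selector_type == "selector_type_iteration" and not (budget_keys & extra.keys()):
        raise ValueError("selector_type_iteration needs selector_max_iterations "
                         "or selector_max_seconds")
    params: Dict[str, Any] = {
        "selector_type": selector_type,
        "autoselect_grid_type": autoselect_grid_type,
    }
    if rnd_seed is not None:
        params["rnd_seed"] = int(rnd_seed)
    params.update(extra)  # e.g. selector_max_iterations, genetic_* or bayesian_* keys
    return params

print(build_select_params(selector_type="selector_type_iteration",
                          selector_max_iterations=50, rnd_seed=42))
```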

HTTP Responses

| HTTP Status Code | HTTP Status Message | Description |
| --- | --- | --- |
| 202 | OK | Job accepted. |
| 400 | BAD REQUEST | Unknown SymetryML project. Example body: {"statusCode":"BAD_REQUEST","statusString":" + Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}} |
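A client can branch on the two documented status codes and surface the statusString from an error body. The following is a minimal sketch: the function name `interpret_response` and the returned strings are invented for illustration, and only the 202 and 400 cases and the error body shape come from the table above.

```python
import json

def interpret_response(status_code: int, body: str) -> str:
    """Turn a select-model HTTP response into a short human-readable outcome."""
    if status_code == 202:
        # The job was accepted; the model is built asynchronously.
        return "accepted"
    if status_code == 400:
        try:
            payload = json.loads(body)
        except ValueError:
            # Body was not JSON; fall back to the raw text.
            return "bad request: " + body.strip()
        if isinstance(payload, dict):
            message = str(payload.get("statusString", "")).strip()
            return "bad request: " + message
        return "bad request: " + body.strip()
    return f"unexpected status {status_code}"

sample_error = ('{"statusCode":"BAD_REQUEST","statusString":" + Cannot Find SYMETRYML '
                'id[r2] for Customer id [c1]","values":{}}')
print(interpret_response(400, sample_error))
```

Because a 202 only signals that the job was accepted, a successful call does not by itself mean the selected model is ready.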

Sample Request Response Classifier

Sample Request Response Regression

Select Model Dataframe Rest API

Invokes the select model functionality, using a DataFrame passed in the request body as the out-of-sample data for model assessment.

URL

Query Parameters

| Parameter | Required / Optional | Description |
| --- | --- | --- |
| modelid | Required | ID to assign to the new model. |
| algo | Required | Algorithm to fit. |

MLContext Build Parameters

| Parameter | Required / Optional | Type | Description |
| --- | --- | --- | --- |
| rnd_seed | Optional | Integer | Seed for the random number generator. |
| selector_type | Optional | String | Default is selector_type_fw_bw. See the Select Heuristic and Selector Types sections for details. |
| autoselect_grid_type | Optional | String | Default is autoselect_grid_type_tiny. See the Selector Grid table for details. |

HTTP Responses

| HTTP Status Code | HTTP Status Message | Description |
| --- | --- | --- |
| 202 | OK | Job accepted. |
| 400 | BAD REQUEST | Unknown SymetryML project. Example body: {"statusCode":"BAD_REQUEST","statusString":" + Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}} |

Sample Request Response Classifier

Sample Request Response Regression


Genetic Algorithm Selector (Experimental)

The Genetic Algorithm selector uses evolutionary optimization to find optimal feature subsets. It evolves a population of candidate feature sets over multiple generations, using selection, crossover, and mutation operations to discover high-performing feature combinations.

When to Use

  • When you have a large number of features and want to explore the feature space more thoroughly than forward/backward selection

  • When feature interactions are important and simple greedy approaches may miss optimal combinations

  • When you can afford more computation time for potentially better results

Genetic Algorithm Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| genetic_population_size | Integer | 50 | Number of candidate feature sets in each generation |
| genetic_num_generations | Integer | 100 | Maximum number of generations to evolve |
| genetic_mutation_rate | Double | 0.05 | Probability of flipping each feature (gene) during mutation |
| genetic_crossover_rate | Double | 0.8 | Probability of performing crossover between two parents |
| genetic_elite_count | Integer | 2 | Number of top-performing individuals preserved unchanged each generation |
| genetic_tournament_size | Integer | 3 | Number of individuals competing in tournament selection |
| genetic_initial_feature_prob | Double | 0.1 | Probability that each feature is included in the initial random population |
| genetic_min_features | Integer | 1 | Minimum number of features allowed in any individual |
| genetic_max_features | Integer | unlimited | Maximum number of features allowed in any individual |
| genetic_parallel_threads | Integer | 4 | Number of parallel threads for model evaluation |
| genetic_stagnation_limit | Integer | 20 | Number of generations without improvement before early stopping |
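The parameters above map onto a textbook genetic loop roughly as follows. This is a generic sketch and not SymetryML's code: the function `genetic_select`, the single-point crossover, and the `fitness` callback (standing in for an out-of-sample model score) are assumptions, and genetic_max_features and genetic_parallel_threads are left out to keep it short.

```python
import random
from typing import Callable, Optional, Tuple

Mask = Tuple[bool, ...]  # one flag per feature: True means the feature is used

def genetic_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    population_size: int = 50,
    num_generations: int = 100,
    mutation_rate: float = 0.05,
    crossover_rate: float = 0.8,
    elite_count: int = 2,
    tournament_size: int = 3,
    initial_feature_prob: float = 0.1,
    min_features: int = 1,
    stagnation_limit: int = 20,
    seed: Optional[int] = None,
) -> Tuple[Mask, float]:
    rng = random.Random(seed)

    def repair(mask):
        # Enforce min_features by switching on randomly chosen unused features.
        mask = list(mask)
        off = [i for i, on in enumerate(mask) if not on]
        rng.shuffle(off)
        while sum(mask) < min_features and off:
            mask[off.pop()] = True
        return tuple(mask)

    def tournament(scored):
        # Pick the fittest of a few randomly drawn individuals.
        return max(rng.sample(scored, tournament_size), key=lambda p: p[0])[1]

    population = [
        repair(tuple(rng.random() < initial_feature_prob for _ in range(n_features)))
        for _ in range(population_size)
    ]
    best_mask, best_fit, stale = population[0], float("-inf"), 0

    for _ in range(num_generations):
        scored = sorted(((fitness(m), m) for m in population),
                        key=lambda p: p[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_mask, stale = scored[0][0], scored[0][1], 0
        else:
            stale += 1
            if stale >= stagnation_limit:
                break  # early stop: no improvement for too many generations
        next_gen = [m for _, m in scored[:elite_count]]  # elites survive unchanged
        while len(next_gen) < population_size:
            a, b = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                a = a[:cut] + b[cut:]
            child = tuple((not g) if rng.random() < mutation_rate else g for g in a)
            next_gen.append(repair(child))
        population = next_gen
    return best_mask, best_fit

# Hypothetical fitness: features 0 and 3 help, every other selected feature costs 0.1.
def toy_fitness(mask: Mask) -> float:
    return sum(1.0 for i in (0, 3) if mask[i]) - 0.1 * sum(
        1 for i, on in enumerate(mask) if on and i not in (0, 3))

mask, fit = genetic_select(6, toy_fitness, population_size=20, num_generations=30, seed=1)
print(mask, fit)
```

The cost of a run is roughly population size times number of generations model evaluations, which is why the stagnation limit matters for keeping run time bounded.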

Sample Request


Bayesian Optimization Selector (Experimental)

The Bayesian Optimization selector uses probabilistic modeling to efficiently search the feature space. It builds a surrogate model of the objective function and uses an acquisition function to balance exploration and exploitation when selecting which feature combinations to evaluate.

When to Use

  • When model evaluation is expensive and you want to minimize the number of evaluations

  • When you want a more sample-efficient search compared to random or genetic approaches

  • When the feature space is large but you suspect good solutions exist in specific regions

Bayesian Optimization Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bayesian_num_iterations | Integer | 100 | Total number of optimization iterations |
| bayesian_initial_random | Integer | 20 | Number of random samples before starting Bayesian optimization |
| bayesian_exploration_weight | Double | 0.1 | Exploration weight for the UCB (Upper Confidence Bound) acquisition function |
| bayesian_num_candidates | Integer | 100 | Number of candidate feature sets evaluated per iteration |
| bayesian_local_search_steps | Integer | 10 | Number of local search steps for solution refinement |
| bayesian_embedding_dim | Integer | 50 | Dimension of the random embedding used for high-dimensional feature spaces |
| bayesian_top_k_memory | Integer | 200 | Number of top observations kept in memory for the surrogate model |
| bayesian_stagnation_limit | Integer | 30 | Number of iterations without improvement before early stopping |
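The role of bayesian_exploration_weight is easiest to see in the acquisition step. The sketch below uses the common UCB form, predicted mean plus weight times predicted uncertainty. Whether SymetryML applies the weight in exactly this way is an assumption, and the names `ucb` and `pick_next` as well as the candidate values are made up for the example.

```python
from typing import Dict, Tuple

def ucb(mean_score: float, std: float, exploration_weight: float = 0.1) -> float:
    """Upper Confidence Bound: reward a high predicted score and high uncertainty."""
    return mean_score + exploration_weight * std

def pick_next(candidates: Dict[str, Tuple[float, float]],
              exploration_weight: float = 0.1) -> str:
    """candidates maps a label to (predicted mean, predicted std) from a surrogate."""
    return max(candidates, key=lambda k: ucb(*candidates[k], exploration_weight))

# Hypothetical surrogate predictions for two candidate feature sets.
candidates = {
    "A": (0.80, 0.01),  # well explored: good score, little uncertainty
    "B": (0.75, 0.30),  # poorly explored: lower score, high uncertainty
}

print(pick_next(candidates, exploration_weight=0.1))  # A
print(pick_next(candidates, exploration_weight=1.0))  # B
```

With a small weight the search exploits the candidate it already believes is best; with a larger weight it prefers the uncertain candidate, trading immediate score for information.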

Sample Request
