SessionOptions

The SessionOptions class allows you to configure various aspects of an InferenceSession, including graph optimization, thread pool sizes, execution providers, and profiling.

Constructor

SessionOptions()
Creates a new SessionOptions object with default settings.

Properties

graph_optimization_level
GraphOptimizationLevel
Controls the level of graph optimizations applied.
  • ORT_DISABLE_ALL - No optimizations
  • ORT_ENABLE_BASIC - Basic, semantics-preserving rewrites such as removing redundant nodes
  • ORT_ENABLE_EXTENDED - Basic optimizations plus more complex node fusions
  • ORT_ENABLE_ALL - All optimizations including layout transformations (default)
intra_op_num_threads
int
Number of threads used to parallelize execution within a single operator. Default is 0, which lets ONNX Runtime choose.
inter_op_num_threads
int
Number of threads used to run independent operators concurrently. Takes effect only when execution_mode is ORT_PARALLEL. Default is 0, which lets ONNX Runtime choose.
execution_mode
ExecutionMode
Controls whether operators are executed sequentially or in parallel.
  • ORT_SEQUENTIAL - Execute operators sequentially (default)
  • ORT_PARALLEL - Execute operators in parallel when possible
execution_order
ExecutionOrder
Controls the order in which graph nodes are executed.
  • DEFAULT - Use default topological order
  • PRIORITY_BASED - Use priority-based scheduling
enable_profiling
bool
Enable profiling to collect performance data. Default is False.
optimized_model_filepath
str
Path to save the optimized model. If set, the optimized graph will be saved to this location.
log_severity_level
int
Logging verbosity level (0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal). Default is 2.
log_verbosity_level
int
VLOG level. Applies only in debug builds when log_severity_level is 0. Default is 0.
enable_mem_pattern
bool
Enable memory pattern optimization. Default is True.
enable_mem_reuse
bool
Enable memory reuse optimization. Default is True.
enable_cpu_mem_arena
bool
Enable CPU memory arena allocator. Default is True.
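
The logging and memory properties are plain attributes, so they can be grouped in a small helper. The sketch below is illustrative only: apply_diagnostic_settings is a hypothetical name, and the idea that disabling the arena and memory patterns lowers memory use at some cost in speed is an assumption to verify on your own workload. The helper accepts any object that exposes these attributes, such as ort.SessionOptions().

```python
def apply_diagnostic_settings(options, verbose=False, low_memory=False):
    """Set logging and memory attributes on a SessionOptions-like object.

    Hypothetical helper, not part of onnxruntime.
    """
    # Severity scale from the property list: 0=Verbose ... 4=Fatal, default 2 (Warning).
    options.log_severity_level = 0 if verbose else 2
    if low_memory:
        # Assumption: turning these off reduces retained memory; measure before relying on it.
        options.enable_cpu_mem_arena = False
        options.enable_mem_pattern = False
    return options
```

With onnxruntime installed, a call such as apply_diagnostic_settings(ort.SessionOptions(), verbose=True) returns an object that can be passed to InferenceSession through sess_options.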

Methods

add_session_config_entry()

Add custom session configuration entry.
add_session_config_entry(
    key: str,
    value: str
)
key
str
required
Configuration key.
value
str
required
Configuration value.
Common Configuration Keys:
  • session.load_model_format - Set to “ONNX” or “ORT”
  • session.use_env_allocators - Use allocators registered on the environment (“1” to enable)
  • session.record_ep_graph_assignment_info - Record EP graph assignment (“1” to enable)
  • session.disable_prepacking - Disable weight prepacking (“1” to disable prepacking)
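
Because both key and value must be strings, boolean flags have to be written as "1" or "0". A minimal sketch of a wrapper that does this conversion is shown below; apply_config_entries is a hypothetical helper name, and it works with any object that provides add_session_config_entry().

```python
def apply_config_entries(options, entries):
    """Forward a dict of settings to add_session_config_entry() as strings.

    Hypothetical helper, not part of onnxruntime.
    """
    for key, value in entries.items():
        if isinstance(value, bool):
            # Boolean-style keys take "1" to enable and "0" to disable.
            value = "1" if value else "0"
        options.add_session_config_entry(key, str(value))
```

For example, apply_config_entries(sess_options, {"session.load_model_format": "ORT", "session.disable_prepacking": True}) issues two add_session_config_entry calls with the values "ORT" and "1".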

register_custom_ops_library()

Register a shared library containing custom operators.
register_custom_ops_library(library_path: str)
library_path
str
required
Path to the shared library (.so, .dll, or .dylib).
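
The library extension depends on the operating system, so portable code usually builds the path at run time. The following sketch assumes a hypothetical library called custom_ops and the common naming convention of a lib prefix on Linux and macOS; adjust both to match how your library is actually built.

```python
import sys
from pathlib import Path


def custom_ops_library_path(directory, stem, platform=sys.platform):
    """Return the platform-specific file path for a custom-ops library.

    Hypothetical helper; the "lib" prefix on Linux/macOS is an assumed convention.
    """
    if platform.startswith("win"):
        filename = f"{stem}.dll"
    elif platform == "darwin":
        filename = f"lib{stem}.dylib"
    else:
        filename = f"lib{stem}.so"
    return str(Path(directory) / filename)
```

The result can be passed directly to register_custom_ops_library(), for example sess_options.register_custom_ops_library(custom_ops_library_path("build", "custom_ops")).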

add_external_initializers()

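Supply in-memory data for initializers that the model stores as external data. Entries in names and values are matched by position, so both lists must have the same length.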
add_external_initializers(
    names: list[str],
    values: list[OrtValue]
)
names
list[str]
required
Names of the initializers.
values
list[OrtValue]
required
OrtValue objects containing initializer data.
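
Since the two lists are matched by position, keeping the data in a single mapping avoids mismatched lengths or ordering. A small sketch follows, assuming a hypothetical helper named add_initializers_from_mapping; it works with any object that provides add_external_initializers().

```python
def add_initializers_from_mapping(options, tensors):
    """Split a name -> OrtValue mapping into the two parallel lists the method expects.

    Hypothetical helper, not part of onnxruntime.
    """
    names = list(tensors.keys())
    # Build values in the same order as names so positions line up.
    values = [tensors[name] for name in names]
    options.add_external_initializers(names, values)
    return names
```

The mapping values would typically be created with ort.OrtValue.ortvalue_from_numpy(array), keyed by the initializer names used in the model.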

add_free_dimension_override_by_name()

Fix a free (symbolic) dimension, identified by its name in the model, to a specific size.
add_free_dimension_override_by_name(
    dim_name: str,
    dim_value: int
)
dim_name
str
required
Name of the dimension to override.
dim_value
int
required
Value to use for the dimension.
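
When a model has several symbolic dimensions, the overrides can be applied in one pass. The sketch below is an illustration under assumptions: pin_free_dimensions is a hypothetical helper, and dimension names such as batch_size or sequence_length are placeholders for whatever names your model actually declares.

```python
def pin_free_dimensions(options, overrides):
    """Apply several named free-dimension overrides after validating them.

    Hypothetical helper, not part of onnxruntime.
    """
    # Validate everything first so a bad entry does not leave options half-configured.
    for dim_name, dim_value in overrides.items():
        is_int = isinstance(dim_value, int) and not isinstance(dim_value, bool)
        if not is_int or dim_value <= 0:
            raise ValueError(f"{dim_name!r} needs a positive integer, got {dim_value!r}")
    for dim_name, dim_value in overrides.items():
        options.add_free_dimension_override_by_name(dim_name, dim_value)
```

For example, pin_free_dimensions(sess_options, {"batch_size": 1, "sequence_length": 128}) makes one add_free_dimension_override_by_name call per entry.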

Example Usage

Basic Configuration

import onnxruntime as ort

sess_options = ort.SessionOptions()

# Enable all optimizations
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Set thread counts
sess_options.intra_op_num_threads = 4
sess_options.inter_op_num_threads = 2

# Enable parallel execution
sess_options.execution_mode = ort.ExecutionMode.ORT_PARALLEL

# Create session with options
sess = ort.InferenceSession("model.onnx", sess_options=sess_options)

Enable Profiling

sess_options = ort.SessionOptions()
sess_options.enable_profiling = True

sess = ort.InferenceSession("model.onnx", sess_options=sess_options)

# Run inference; `inputs` is a dict mapping each model input name to a NumPy array
outputs = sess.run(None, inputs)

# Get profiling results
profile_file = sess.end_profiling()
print(f"Profiling data saved to: {profile_file}")
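
The file returned by end_profiling() is JSON. As a rough sketch, assuming it holds a list of trace events in which per-operator timings have cat set to "Node", a name ending in _kernel_time, a dur field in microseconds, and an args.op_name field, the time spent per operator type can be totalled as follows. Both helper names are hypothetical, and the field layout should be checked against a real profile before relying on it.

```python
import json
from collections import defaultdict


def load_profile(path):
    """Read the JSON trace written by end_profiling()."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_profile(events):
    """Total kernel time per operator type, largest first.

    Assumes the event layout described in the lead-in; not an official schema.
    """
    totals = defaultdict(int)
    for event in events:
        # Skip session-level events and anything that is not a kernel timing.
        if event.get("cat") != "Node":
            continue
        if not event.get("name", "").endswith("_kernel_time"):
            continue
        op_name = event.get("args", {}).get("op_name", "unknown")
        totals[op_name] += event.get("dur", 0)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Calling summarize_profile(load_profile(profile_file)) returns a dict ordered from the most to the least expensive operator type, which is a quick way to see where inference time goes.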

Save Optimized Model

sess_options = ort.SessionOptions()
sess_options.optimized_model_filepath = "model_optimized.onnx"
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

sess = ort.InferenceSession("model.onnx", sess_options=sess_options)
# The optimized model is written to model_optimized.onnx when the session is created

Custom Configuration

sess_options = ort.SessionOptions()

# Load ORT format model
sess_options.add_session_config_entry("session.load_model_format", "ORT")

# Record EP graph assignment for debugging
sess_options.add_session_config_entry("session.record_ep_graph_assignment_info", "1")

sess = ort.InferenceSession("model.ort", sess_options=sess_options)

# Get graph assignment info
assignment = sess.get_provider_graph_assignment_info()

Register Custom Operators

sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library("custom_ops.so")

sess = ort.InferenceSession("model_with_custom_ops.onnx", sess_options=sess_options)

Performance Tuning

# For CPU inference
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 8  # Use 8 threads per op
sess_options.inter_op_num_threads = 1   # Sequential op execution
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

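# For GPU inference
# The CUDA execution provider does not support ORT_PARALLEL and falls back to
# sequential execution, so request sequential mode explicitly.
sess_options = ort.SessionOptions()
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL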