Benchmarks
Benchmark results
In our release publication we compare the performance of Qibo with other publicly available libraries for quantum circuit simulation and provide results from different hardware configurations. For convenience, the results can be found in the following examples for various tasks related to circuit or adiabatic evolution simulation:
The libraries used in these benchmarks are shown in the table below with their respective default simulation precision and supported hardware configurations.
| Library | Precision | Hardware |
|---|---|---|
| | single/double | multi-thread CPU, GPU, multi-GPU |
| | single | single-thread CPU |
| | single | single-thread CPU |
| | double | single-thread CPU |
| | double | single-thread CPU |
| | double | multi-thread CPU |
| | single | multi-thread CPU, GPU |
| | double | multi-thread CPU, GPU |
The default precision and hardware configuration are used for all libraries. Single-thread Qibo numbers were obtained using the taskset utility to restrict the number of threads.
All results presented in the above pages are produced with an NVIDIA DGX Station. The machine specification includes 4x NVIDIA Tesla V100 with 32 GB of GPU memory each, and an Intel Xeon E5-2698 v4 with 2.2 GHz (20-Core/40-Threads) with 256 GB of RAM. The operating system of this machine is the default Ubuntu 18.04-LTS with CUDA/nvcc 10.1, TensorFlow 2.2.0 and g++ 7.5.
The following sections describe how to run Qibo benchmarks using the scripts found at: https://github.com/qiboteam/qibo/tree/master/examples/benchmarks.
How to run circuit benchmarks?
The main benchmark script is `main.py`. This can be executed as `python main.py (OPTIONS)` where `(OPTIONS)` can be any of the following options:
- `--nqubits` (`int`): Number of qubits in the circuit.
- `--circuit` (`str`): Circuit to execute. Read the next section for a list of available circuits. Some circuit types support additional options which are described below. Quantum Fourier Transform is the default benchmark circuit.
- `--backend` (`str`): Qibo backend to use for the calculation. See Simulation backends for more information on the calculation backends. `qibojit` is the default backend.
- `--precision` (`str`): Complex number precision to use for the benchmark. Available options are single and double precision. Default is double.
- `--nreps` (`int`): Number of repetitions for the circuit execution.
- `--nshots` (`int`): Number of measurement shots. This will benchmark the sampling of frequencies, not individual shot samples. If not given, no measurements will be performed and the benchmark will terminate once the final state vector is found.
- `--fuse` (`bool`): Use Circuit fusion to reduce the number of gates in the circuit. Default is `False`.
- `--transfer` (`bool`): Transfer the final state vector from GPU to CPU and measure the required time.
- `--device` (`str`): Device to use for the benchmarks. Example: `--device /GPU:0` or `--device /CPU:0`. Note that GPU is not supported by all backends. If a GPU and a supporting backend are available, the GPU will be the default choice.
- `--accelerators` (`str`): Devices to use for distributed execution of the circuit. Example: `--accelerators 1/GPU:0,1/GPU:1` will distribute the execution on two GPUs. The coefficient of each device denotes the number of times to reuse this device. See `qibo.core.distcircuit.DistributedCircuit` for more details on the distributed implementation.
- `--memory` (`int`): Limits GPU memory used for execution. Relevant only for TensorFlow backends, as TensorFlow uses the full GPU memory by default.
- `--threading` (`str`): Selects numba threading layer. Relevant for the qibojit backend on CPU only. See Numba threading layers for more details.
- `--compile` (`bool`): Compile the circuit using `tf.function`. Available only when using the tensorflow backend. Default is `False`.
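The `--accelerators` format described above pairs a reuse count with a device name. As a rough illustration, a string of this form could be turned into a device-to-count mapping as sketched below. The function name `parse_accelerators` and the returned dictionary layout are assumptions made for this example and are not confirmed to match what the benchmark scripts do internally.

```python
def parse_accelerators(spec):
    """Turn a string such as "1/GPU:0,1/GPU:1" into {"/GPU:0": 1, "/GPU:1": 1}.

    Hypothetical helper: the name and the return format are assumptions.
    """
    if not spec:
        return None
    devices = {}
    for entry in spec.split(","):
        # Each entry looks like "<count>/<device>", e.g. "1/GPU:0".
        count, _, name = entry.strip().partition("/")
        device = "/" + name
        # Sum the counts if the same device is listed more than once.
        devices[device] = devices.get(device, 0) + int(count)
    return devices


print(parse_accelerators("1/GPU:0,1/GPU:1"))
```

Under this reading, `2/GPU:0` would mean that the single device `/GPU:0` is reused twice.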
When a benchmark is executed, the total simulation time will be printed in the terminal once the simulation finishes. Optionally, execution times can be saved by passing the `--filename` (`str`) flag. All benchmark details are logged in a Python dictionary and saved in a text file using `json.dump`. The logs include circuit creation and simulation times. If the given `filename` already exists it will be updated, otherwise it will be created.
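The update-or-create behaviour of the log file can be pictured with the following minimal sketch. It assumes the file holds a JSON list with one dictionary per run. The helper name `update_logs` and the keys used in the example entry are hypothetical, so the actual layout written by the scripts may differ.

```python
import json
import os
import tempfile


def update_logs(filename, new_entry):
    """Append new_entry to the JSON list stored in filename.

    Hypothetical helper: assumes the file contains a list of dictionaries.
    """
    logs = []
    if os.path.isfile(filename):
        # File exists: load the previous runs so they are preserved.
        with open(filename) as f:
            logs = json.load(f)
    logs.append(new_entry)
    with open(filename, "w") as f:
        json.dump(logs, f)
    return logs


# Demo in a temporary directory; the keys below are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logs.json")
    update_logs(path, {"nqubits": 20, "creation_time": 0.1, "simulation_time": 1.5})
    logs = update_logs(path, {"nqubits": 21, "creation_time": 0.1, "simulation_time": 3.0})
    print(len(logs), "runs stored")
```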
Available circuit types
As explained above, the circuit to be used in the benchmarks can be selected using the `--circuit` flag. This accepts one of the following options:
- `qft`: Circuit for Quantum Fourier Transform. The circuit contains SWAP gates that rearrange output qubits to their original input order.
- `variational`: Example of a variational circuit. Contains a layer of parametrized `RY` gates followed by a layer of entangling `CZ` gates. The parameters of `RY` gates are sampled randomly from 0 to 2pi. Supports the following options:
  - `--nlayers`: Total number of layers.
- `opt-variational`: Same as `variational` using the `qibo.abstractions.gates.VariationalLayer`. This gate optimizes execution by fusing the parametrized with the entangling gates before applying them to the state vector. Supports the following options:
  - `--nlayers`: Total number of layers.
- `one-qubit-gate`: Single one-qubit gate applied to all qubits. Supports the following options:
  - `--gate-type`: Which one-qubit gate to use.
  - `--nlayers`: Total number of layers.
  - `--theta`: Value of the free parameter (for parametrized gates).
- `two-qubit-gate`: Single two-qubit gate applied to all qubits. Supports the following options:
  - `--gate-type`: Which two-qubit gate to use.
  - `--nlayers`: Total number of layers.
  - `--theta` (and/or `--phi`): Value of the free parameter (for parametrized gates).
- `ghz`: Circuit that prepares the GHZ state.
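To make the structure of the `variational` circuit more concrete, the sketch below lists the gates of such a circuit as plain tuples, without using Qibo. Two things are assumptions here, since the text above does not specify them: which qubit pairs the `CZ` gates act on (neighbouring pairs with an alternating offset are used), and that one "layer" means one `RY` layer followed by one `CZ` layer. The function name `variational_layers` is hypothetical.

```python
import math
import random


def variational_layers(nqubits, nlayers, seed=None):
    """Return a list of (gate name, qubits, parameter) tuples.

    Illustrative only: the CZ pairing pattern is an assumption.
    """
    rng = random.Random(seed)
    gates = []
    for layer in range(nlayers):
        # Parametrized RY gate on every qubit, angle sampled from [0, 2pi].
        for q in range(nqubits):
            gates.append(("RY", (q,), rng.uniform(0, 2 * math.pi)))
        # Entangling CZ gates on neighbouring pairs; the starting qubit
        # alternates between 0 and 1 from one layer to the next.
        for q in range(layer % 2, nqubits - 1, 2):
            gates.append(("CZ", (q, q + 1), None))
    return gates


for gate in variational_layers(nqubits=4, nlayers=2, seed=0):
    print(gate)
```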
How to run VQE benchmarks?
It is possible to run a VQE optimization benchmark using `vqe.py`. This attempts to find the ground state of the `qibo.hamiltonians.XXZ` Hamiltonian using a variational circuit ansatz consisting of RY and CZ gates and supports the following options:
- `--nqubits` (`int`): Number of qubits in the circuit.
- `--nlayers` (`int`): Total number of layers in the circuit.
- `--method` (`str`): Optimization method. Default is scipy’s Powell method.
- `--maxiter` (`int`): Maximum number of iterations for the optimizer. Default is `None`.
- `--backend` (`str`): Qibo backend to use. See Simulation backends for more information on the calculation backends. Default is `qibojit`.
- `--varlayer` (`bool`): If `True` the `qibo.abstractions.gates.VariationalLayer` will be used to construct the circuit, otherwise plain `RY` and `CZ` gates will be used. Default is `False`.
- `--filename` (`str`): Name of the file to save benchmark logs.
The script will perform the VQE minimization and will print the optimal energy found and its difference with the exact ground state energy. It will also show the total execution time.
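When many benchmark configurations have to be run, it can be convenient to assemble the command line from Python. The sketch below builds the argument list for `vqe.py` from a dictionary of the options listed above. The helper `build_command` and the option values are made up for illustration, and the example only uses options that take a value, since the text does not state how boolean flags are passed.

```python
import sys


def build_command(script, options):
    """Build an argument list such as [python, script, "--nqubits", "6", ...].

    Hypothetical helper, not part of the benchmark scripts.
    """
    command = [sys.executable, script]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


command = build_command(
    "vqe.py",
    {"nqubits": 6, "nlayers": 4, "maxiter": 100, "filename": "vqe_logs.json"},
)
print(" ".join(command))
# To launch the benchmark from the scripts directory:
# import subprocess; subprocess.run(command, check=True)
```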
How to run QAOA benchmarks?
It is possible to run a QAOA optimization benchmark using `qaoa.py`. This attempts to find the ground state of the `qibo.hamiltonians.XXZ` Hamiltonian using the Quantum Approximate Optimization algorithm and supports the following options:
- `--nqubits` (`int`): Number of qubits in the circuit.
- `--nangles` (`int`): Number of variational parameters in the QAOA ansatz. The parameters are initialized from a uniform distribution in [0, 0.1].
- `--dense` (`bool`): If `True` it uses the full Hamiltonian matrix to perform the unitaries, otherwise it will use the Trotter decomposition of the operators. Default is `False`.
- `--solver` (`str`): Solver to use for applying the exponential operators.
- `--method` (`str`): Optimization method. Default is scipy’s Powell method.
- `--maxiter` (`int`): Maximum number of iterations for the optimizer. Default is `None`.
- `--filename` (`str`): Name of the file to save benchmark logs.
The script will perform the QAOA minimization and will print the optimal energy found and its difference with the exact ground state energy. It will also show the total execution time.
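The initialization of the QAOA parameters mentioned under `--nangles` amounts to drawing each angle uniformly from [0, 0.1]. A minimal standard-library sketch is shown below. The function name `initial_parameters` is hypothetical, and the benchmark script may well use a different random number generator.

```python
import random


def initial_parameters(nangles, seed=None):
    """Draw nangles values uniformly from [0, 0.1] (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.uniform(0, 0.1) for _ in range(nangles)]


print(initial_parameters(4, seed=0))
```

Starting from small angles keeps every exponential operator in the ansatz close to the identity at the first optimizer step.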
How to run time evolution benchmarks?
Time evolution benchmarks can be run using `evolution.py`. This performs an adiabatic evolution with `qibo.hamiltonians.X()` as the easy Hamiltonian and `qibo.hamiltonians.TFIM()` as the problem Hamiltonian and supports the following options:
- `--nqubits` (`int`): Number of qubits in the circuit.
- `--dt` (`float`): Time step for the evolution algorithm.
- `--solver` (`str`): Solver to use for evolving the state.
- `--dense` (`bool`): If `True` it uses the full Hamiltonian matrix to evolve the system, otherwise it will perform the Trotter decomposition. Default is `False`.
- `--accelerators` (`str`): Devices to use for distributed execution of the circuit. See `qibo.core.distcircuit.DistributedCircuit` for more details on the distributed implementation.
- `--maxiter` (`int`): Maximum number of iterations for the optimizer. Default is `None`.
- `--filename` (`str`): Name of the file to save benchmark logs.
The script will perform the adiabatic evolution and will print the optimal energy found and its difference with the exact ground state energy. It will also show the total execution time.