156-Qubit Quantum Reservoir Computing: The Largest Demonstration on Real Hardware

Daniel Mo Houshmand

We have completed what is, to our knowledge, the largest quantum reservoir computing experiment ever performed on real quantum hardware. Using IBM’s 156-qubit Heron r2 processor, we exceeded the scale of all previously reported demonstrations in the field. But the most important discovery was not about scale. It was about a fundamental crisis that changes how we must think about quantum machine learning.

What is Quantum Reservoir Computing?

Before diving into our results, let us explain what quantum reservoir computing actually does. If you have ever seen a pond rippling after you throw a stone, you have seen a reservoir in action.

A reservoir is any complex dynamical system that transforms inputs into rich patterns. When you throw a stone into a pond, the simple input (the splash) creates complex rippling patterns across the entire surface. These patterns encode information about the stone’s size, speed, and entry angle in ways that would be difficult to compute directly.

Reservoir computing harnesses this principle for machine learning. Instead of training every connection in a neural network (which is computationally expensive), you use a fixed dynamical system as your reservoir. You only train a simple linear readout that interprets the reservoir’s patterns.

Quantum reservoir computing uses a quantum system as the reservoir. Quantum systems are extraordinarily complex, because an $n$-qubit register spans $2^n$ basis states. A modest 50-qubit system already has $2^{50} \approx 10^{15}$ of them. At roughly 270 qubits, the count exceeds the number of atoms in the observable universe (about $10^{80}$). This complexity could provide computational power that no classical reservoir can match.

Figure 1: The quantum reservoir circuit architecture. Parameterized rotation gates (Ry) encode input data, while CNOT gates create entanglement between qubits. The measurement outcomes form the reservoir's feature vector.

Mathematical Framework of Reservoir Computing

The mathematical foundation of reservoir computing relies on several key equations. Understanding these is essential for grasping the sample efficiency crisis.

Reservoir State Evolution

The reservoir state evolves according to a recurrent update rule. At each time step, the new state depends on both the current input and the previous state:

$$\mathbf{x}(t+1) = f\bigl(\mathbf{W}_{\text{in}} \mathbf{u}(t) + \mathbf{W} \mathbf{x}(t)\bigr) \tag{1}$$
The fundamental reservoir state update equation, where the new state combines input injection and recurrent dynamics through a nonlinear activation function.

Here $\mathbf{x}(t)$ is the reservoir state vector, $\mathbf{u}(t)$ is the input, $\mathbf{W}_{\text{in}}$ is the input weight matrix, $\mathbf{W}$ is the recurrent weight matrix, and $f$ is a nonlinear activation function (typically $\tanh$).

Output Computation

The output is computed as a linear combination of reservoir states:

$$\mathbf{y}(t) = \mathbf{W}_{\text{out}} \mathbf{x}(t) \tag{2}$$
The output layer is a simple linear transformation of the reservoir state, making training efficient through linear regression.

The key insight is that only $\mathbf{W}_{\text{out}}$ needs to be trained. The reservoir weights $\mathbf{W}$ and $\mathbf{W}_{\text{in}}$ remain fixed.
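To make Eqs. (1) and (2) concrete, here is a minimal classical echo state network written in NumPy. It is a sketch rather than our experimental code. The $\tanh$ activation, reservoir size, spectral-radius rescaling of $\mathbf{W}$, input scaling and ridge parameter are all illustrative assumptions. Only the readout is fitted; the random $\mathbf{W}$ and $\mathbf{W}_{\text{in}}$ are drawn once and never updated.

```python
import numpy as np

def run_reservoir(u, n_nodes=50, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Drive a fixed random reservoir with a scalar input sequence (Eq. 1).

    Returns a matrix with one reservoir state per row.
    """
    rng = np.random.default_rng(seed)
    W_in = input_scale * rng.uniform(-1, 1, size=(n_nodes, 1))
    W = rng.uniform(-1, 1, size=(n_nodes, n_nodes))
    # Rescale W so its largest |eigenvalue| equals spectral_radius,
    # a common heuristic for keeping the dynamics stable.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    x = np.zeros(n_nodes)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)  # Eq. (1) with f = tanh
        states.append(x)
    return np.array(states)

def fit_readout(X, y, alpha=1e-6):
    """Train only W_out (Eq. 2) by ridge regression; W and W_in are never updated."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Usage: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
u = np.sin(t)
X = run_reservoir(u[:-1])      # row i is the state after seeing u[0], ..., u[i]
y = u[1:]                      # target: the next input
washout = 100                  # discard the initial transient
W_out = fit_readout(X[washout:], y[washout:])
pred = X[washout:] @ W_out     # rows are states, so this is Eq. (2) transposed
print(f"one-step MSE: {np.mean((pred - y[washout:]) ** 2):.2e}")
```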

Our Circuit Architecture

Our quantum reservoir uses a layered circuit design with three key components.

The first component is input encoding. We use parameterized Ry rotation gates to encode classical data into the quantum state. Each data point gets mapped to a rotation angle. This transforms our time series data into quantum amplitudes.

Quantum Feature Map

The quantum feature map encodes classical data into the quantum Hilbert space:

$$\phi(\mathbf{x}) = \bigotimes_{i=1}^{n} R_y(\theta_i) |0\rangle \tag{3}$$
The quantum feature map transforms classical input data into quantum states through parameterized rotation gates.

where $\theta_i = \arccos(x_i)$ maps the classical input to rotation angles. Because $\arccos$ is only defined on $[-1, 1]$, each input must first be rescaled into that interval.
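A quick numerical check shows what this encoding does. Since $R_y(\theta)|0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$, each qubit has $\langle Z \rangle = \cos\theta$. Choosing $\theta_i = \arccos(x_i)$ therefore makes $\langle Z_i \rangle = x_i$ before any entangling gate acts. The NumPy sketch below assumes the inputs have already been rescaled to $[-1, 1]$, and the sample values are arbitrary.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation R_y(theta) = exp(-i theta Y / 2); its matrix is real."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])
ket0 = np.array([1.0, 0.0])

x = np.array([-0.8, 0.0, 0.3, 1.0])   # inputs already rescaled to [-1, 1]
thetas = np.arccos(x)                 # theta_i = arccos(x_i)
expvals = np.array([(ry(th) @ ket0) @ Z @ (ry(th) @ ket0) for th in thetas])
print(expvals)  # matches x up to floating-point rounding
```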

The second component is entanglement generation. CNOT gates create quantum correlations between qubits. These correlations are essential. Without entanglement, our quantum system would just be a collection of independent qubits, each of which can be simulated classically on its own. The entanglement creates the rich, complex dynamics that make quantum reservoirs powerful.

Entangling Layer

The entangling unitary creates correlations:

$$U_{\text{ent}} = \prod_{i=1}^{n-1} \text{CNOT}_{i,i+1} \tag{4}$$
CNOT gates create entanglement between neighboring qubits, generating the quantum correlations essential for reservoir computing.

The third component is measurement. We measure each qubit and record the expectation values. These measurements form our feature vector. A 156-qubit system produces 156 raw features. For smaller systems, the raw features can be expanded further through polynomial feature engineering (Table 1 lists the feature count used for each system).

Expectation Values

The features extracted from the quantum system are expectation values of Pauli operators:

$$\langle Z_i \rangle = \text{Tr}\bigl(\rho \, Z_i\bigr) \tag{5}$$
Quantum features are extracted as expectation values of Pauli-Z operators on each qubit.
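Eqs. (3) to (5) can be combined into a small statevector simulation. The NumPy sketch below is an idealized, noise-free illustration for a handful of qubits. The helper names are hypothetical, and exact simulation is only feasible for small $n$. It applies the $R_y$ encoding, then a single chain of nearest-neighbour CNOTs, and finally reads out $\langle Z_q \rangle$ from the marginal probabilities.

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit on the subspace where the control qubit is |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    # Indexing out the control axis shifts later axes down by one.
    t_axis = target - 1 if target > control else target
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def reservoir_features(x):
    """Eqs. (3)-(5): R_y encoding, CNOT chain, then <Z_q> on every qubit."""
    n = len(x)
    state = np.zeros(2 ** n)
    state[0] = 1.0                                   # |00...0>
    for q, theta in enumerate(np.arccos(x)):
        state = apply_1q(state, ry(theta), q, n)
    for q in range(n - 1):                           # CNOT_{1,2}, CNOT_{2,3}, ... (1-based)
        state = apply_cnot(state, q, q + 1, n)
    probs = (np.abs(state) ** 2).reshape([2] * n)
    feats = []
    for q in range(n):
        other_axes = tuple(a for a in range(n) if a != q)
        p0, p1 = probs.sum(axis=other_axes)          # marginal distribution of qubit q
        feats.append(p0 - p1)                        # <Z_q> = P(0) - P(1)
    return np.array(feats)

x = np.array([0.3, -0.5, 0.7, 0.9])
print(reservoir_features(x))
print(np.cumprod(x))                                 # closed form for this circuit
```

For the circuit exactly as written in Eqs. (3) and (4), the features have a closed form. With 1-based indices, $\langle Z_k \rangle = x_1 x_2 \cdots x_k$, which the last line reproduces. A single layer of this minimal form therefore yields cumulative products of the inputs.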

The Three Systems We Tested

We designed our experiment to systematically compare quantum reservoirs across different scales and configurations.

System 1: Small-Scale IBM (4 Qubits)

Our first system used just 4 qubits on IBM hardware with 50 training samples. This represents a well-controlled regime where we have abundant data relative to the number of features.

With 4 qubits producing 9 features (after our feature engineering), we achieve 5.56 samples per feature. This is a comfortable operating point. The classical readout layer has plenty of examples to learn from.

System 2: Large-Scale IBM Heron r2 (156 Qubits)

Our second system scaled to IBM’s full 156-qubit Heron r2 processor with 200 training samples. To our knowledge, this is the largest quantum reservoir computing experiment on real hardware reported to date.

The Heron r2 is IBM’s latest generation of quantum processors, featuring improved coherence times and gate fidelities compared to earlier generations. We chose this system specifically to push the boundaries of what is possible on current hardware.

With 156 qubits, we generate 156 raw features. The samples-per-feature ratio drops to just 1.28. This is dangerously low.

System 3: Simulated Rigetti Novera (9 Qubits)

Our third system used a high-fidelity simulation of Rigetti’s Novera processor with 640 training samples. We applied the full Steinegger-Räth feature engineering methodology, expanding the 9-qubit measurements to 3,375 features.

Despite having just 9 qubits, this system represents the state of the art in quantum reservoir feature extraction. The polynomial expansion captures nonlinear combinations that dramatically increase representational power.

Summary of Experimental Systems

Table 1: Quantum System Configurations

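| System | Qubits | Training Samples | Features | Samples/Feature | Hardware |
|---|---|---|---|---|---|
| IBM Small | 4 | 50 | 9 | 5.56 | IBM Quantum |
| IBM Heron r2 | 156 | 200 | 156 | 1.28 | IBM Heron r2 |
| Rigetti Novera | 9 | 640 | 3,375 | 0.19 | Simulation |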

The Task: Spectral Energy Prediction

All three systems tackled the same prediction task: forecasting the evolution of spectral energy data. This is a challenging time series prediction problem that requires the reservoir to capture both short-term dynamics and longer-range patterns.

Figure 2: The spectral energy time series used in our experiments. The data exhibits complex oscillatory behavior with multiple frequency components, making it an ideal benchmark for reservoir computing.

Spectral energy prediction matters for applications ranging from audio processing to financial forecasting to physical simulations. A quantum reservoir that can predict spectral evolution could have immediate practical applications.

Performance Results: The Surprise

We evaluate prediction accuracy using the coefficient of determination:

R2=1i(yiy^i)2i(yiyˉ)2R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
(6)
The R² coefficient measures how well predictions match actual values, where 1.0 indicates perfect prediction.
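Eq. (6) translates directly into code. This NumPy sketch uses a function name of our own choosing, and it illustrates two reference points. A perfect forecast scores exactly 1, and always predicting the mean scores 0. Predictions worse than the mean give negative values.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination, Eq. (6)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y, y))                          # 1.0: perfect prediction
print(r2_score(y, np.full(4, y.mean())))       # 0.0: predicting the mean
print(r2_score(y, y[::-1]))                    # -3.0: worse than the mean
```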

Here are the $R^2$ scores (a measure of prediction accuracy, where $1.0$ is perfect):

The 4-qubit system achieved $R^2 = 0.784$. This is solid performance. The quantum reservoir successfully captures the dynamics of the spectral data and produces accurate forecasts.

The 156-qubit system achieved $R^2 = 0.723$. Despite having $39\times$ more qubits, performance actually decreased. More quantum resources led to worse results.

The 9-qubit Rigetti simulation achieved $R^2 = 0.959$. A system with a small fraction of the Heron r2's qubits produced the best performance by a significant margin.

Table 2: Performance Results Summary

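| System | Qubits | $R^2$ Score | Training Error | Validation Error | Overfitting |
|---|---|---|---|---|---|
| IBM Small | 4 | 0.784 | 0.18 | 0.22 | Low |
| IBM Heron r2 | 156 | 0.723 | 0.02 | 0.28 | Severe |
| Rigetti Novera | 9 | 0.959 | 0.04 | 0.04 | Minimal |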
Figure 3: Performance comparison across our three quantum systems. The 156-qubit system scores lowest and the 9-qubit system scores highest, so adding qubits did not improve prediction accuracy. This reveals the sample efficiency crisis.

These results were not what we expected. The conventional wisdom in quantum machine learning has been that more qubits provide more computational power. Our experiments show this is dangerously oversimplified.

The Sample Efficiency Crisis Explained

Why did more qubits lead to worse performance? The answer lies in sample efficiency.

Machine learning requires data. Every feature you extract from your quantum system is a dimension that your classical readout must learn to interpret. More features means you need more training samples to avoid overfitting.

Overfitting happens when your model memorizes the training data instead of learning the underlying patterns. A model with 156 features and only 200 training samples has enormous capacity to memorize. It fits the training data almost perfectly but generalizes poorly to new data.

Figure 4: Sample efficiency analysis showing the critical relationship between samples-per-feature ratio and model performance. Below roughly 2 samples per feature, overfitting dominates.

The mathematics is straightforward. If you have $N$ features and $M$ training samples, your samples-per-feature ratio is $M/N$. When this ratio drops below approximately $2$, you enter a danger zone where overfitting becomes severe.

$$\eta = \frac{M}{N} = \frac{\text{training samples}}{\text{number of features}} \tag{7}$$
The critical ratio that determines whether a model will overfit. Values below ~2 indicate danger.

Our 4-qubit system: $50 / 9 = 5.56$ (safe).

Our 156-qubit system: $200 / 156 = 1.28$ (dangerous).

Our 9-qubit simulation: $640 / 3375 = 0.19$. This should be catastrophic, but aggressive ridge regularization saves it.
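The effect of a low $\eta$ can be reproduced on synthetic data. The sketch below uses the same $M = 200$ and $N = 156$ as the Heron r2 run. The features are random Gaussian stand-ins, and the target depends on only five of them. This illustrates the mechanism under those assumptions. It is not a re-analysis of our measurements.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

M, N = 200, 156                        # samples and features of the 156-qubit run
eta = M / N                            # Eq. (7)
print(f"eta = {eta:.2f}")

# Synthetic stand-in data (NOT our measurements): only 5 of 156 features carry signal.
rng = np.random.default_rng(42)
w_true = np.zeros(N)
w_true[:5] = 1.0
X_train, X_test = rng.normal(size=(M, N)), rng.normal(size=(1000, N))
y_train = X_train @ w_true + rng.normal(size=M)
y_test = X_test @ w_true + rng.normal(size=1000)

results = {}
for alpha in (1e-8, 100.0):            # essentially unregularized vs strong ridge
    w = ridge_fit(X_train, y_train, alpha)
    results[alpha] = (r2(y_train, X_train @ w), r2(y_test, X_test @ w))
    print(f"alpha={alpha:g}: train R2={results[alpha][0]:.3f}, "
          f"test R2={results[alpha][1]:.3f}")
```

Ridge regression's training error can only grow as $\alpha$ increases, so the near-unregularized fit always wins on training data. On held-out data it typically loses by a wide margin, which mirrors the train/validation gap in Table 2. The exact numbers depend on the random seed.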

Effective Dimension

The effective dimension captures how many features the model actually uses:

$$D_{\text{eff}} = \text{Tr}\bigl(\mathbf{X}^\top \mathbf{X} (\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I})^{-1}\bigr) \tag{8}$$
The effective dimension accounts for regularization, which reduces the actual number of parameters the model uses.

where $\alpha$ is the regularization parameter. Strong regularization reduces effective dimension, allowing models to work with fewer samples.
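Eq. (8) is easiest to evaluate through eigenvalues: $D_{\text{eff}} = \sum_j s_j / (s_j + \alpha)$, where $s_j$ are the eigenvalues of $\mathbf{X}^\top \mathbf{X}$. The NumPy sketch below uses random stand-in features shaped like the Rigetti run (640 samples, 3,375 features). With only 640 samples, $\mathbf{X}^\top \mathbf{X}$ has rank at most 640, so $D_{\text{eff}}$ cannot exceed 640 for any $\alpha$, and increasing $\alpha$ shrinks it further.

```python
import numpy as np

def effective_dimension(X, alpha):
    """Eq. (8): sum_j s_j / (s_j + alpha), with s_j the eigenvalues of X^T X.

    X^T X and X X^T share their nonzero eigenvalues, so the smaller Gram matrix is used.
    """
    gram = X @ X.T if X.shape[0] < X.shape[1] else X.T @ X
    s = np.clip(np.linalg.eigvalsh(gram), 0.0, None)   # clip tiny negative round-off
    return float(np.sum(s / (s + alpha)))

rng = np.random.default_rng(0)
X = rng.normal(size=(640, 3375))     # stand-in shaped like the Rigetti run
for alpha in (1e-6, 1e2, 1e4, 1e6):
    print(f"alpha={alpha:g}: D_eff = {effective_dimension(X, alpha):.1f}")
```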

Why Ridge Regularization Matters

Ridge regularization is a technique that penalizes large weights in the readout layer. Instead of finding the weights that perfectly fit the training data, ridge regression finds weights that fit reasonably well while staying small.

The ridge regression objective function is:

$$\mathbf{W}_{\text{out}} = \arg\min_{\mathbf{W}} \left( \|\mathbf{X}\mathbf{W} - \mathbf{Y}\|_2^2 + \alpha \|\mathbf{W}\|_2^2 \right) \tag{9}$$
Ridge regression minimizes squared error plus a penalty on weight magnitude, preventing overfitting.

This has the closed-form solution:

$$\mathbf{W}_{\text{out}} = (\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{Y} \tag{10}$$
The optimal ridge regression weights can be computed directly using this matrix formula.

Small weights mean the model cannot memorize individual training examples. It is forced to find patterns that generalize.
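The closed form in Eq. (10) can be checked against the objective in Eq. (9). At the optimum, the gradient $2\mathbf{X}^\top(\mathbf{X}\mathbf{W} - \mathbf{Y}) + 2\alpha\mathbf{W}$ vanishes. The NumPy sketch below uses random data with an illustrative $50 \times 9$ shape, and it also shows the weight norm shrinking as $\alpha$ grows. Solving the linear system is numerically preferable to forming the inverse explicitly.

```python
import numpy as np

def ridge_closed_form(X, Y, alpha):
    """Eq. (10), solved as a linear system rather than by forming the inverse."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 9))          # illustrative shape: 50 samples, 9 features
Y = rng.normal(size=(50, 1))
alpha = 0.5
W = ridge_closed_form(X, Y, alpha)

# Gradient of the objective in Eq. (9): 2 X^T (X W - Y) + 2 alpha W.
grad = 2 * X.T @ (X @ W - Y) + 2 * alpha * W
print(f"max |gradient| at the closed-form solution: {np.max(np.abs(grad)):.1e}")

# Larger alpha gives smaller weights.
norms = [np.linalg.norm(ridge_closed_form(X, Y, a)) for a in (0.5, 5.0, 50.0)]
print(norms)
```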

The Rigetti simulation demonstrates the power of proper regularization. Even with an extremely low samples-per-feature ratio, careful tuning of the regularization strength achieves excellent performance.

Figure 5: Learning curves showing how prediction error decreases with training samples. The gap between training and validation error indicates overfitting, which is most severe for the 156-qubit system.

The learning curves reveal the overfitting directly. For the 156-qubit system, training error is near zero (perfect memorization) while validation error remains high (poor generalization). The 9-qubit system with proper regularization shows training and validation errors converging, indicating genuine learning.

Forecast Trajectories: Seeing the Predictions

What do the actual predictions look like? Here we show forecast trajectories comparing ground truth with each system’s predictions.

Figure 6: Forecast trajectories showing predicted versus actual spectral evolution. The 9-qubit Rigetti system tracks the ground truth most closely, while the 156-qubit system shows systematic deviations.

The 9-qubit system (green) closely tracks the ground truth (black dashed). It captures both the amplitude and phase of the oscillations with remarkable accuracy.

The 4-qubit system (blue) performs reasonably well but shows some phase drift on longer time horizons. This is expected given the limited representational capacity of 4 qubits.

The 156-qubit system (red) shows the largest deviations. Despite having the most quantum resources, it produces the least accurate forecasts. This is the sample efficiency crisis made visible.

Validating on Chaotic Systems

Time series prediction is most challenging for chaotic systems, where small errors compound exponentially. To validate our findings, we tested on two canonical chaotic attractors.

The Lorenz-63 Attractor

The Lorenz attractor is the original “butterfly effect” system, discovered by meteorologist Edward Lorenz in 1963. It is governed by the equations:

$$\frac{dx}{dt} = \sigma(y-x), \quad \frac{dy}{dt} = x(\rho-z)-y, \quad \frac{dz}{dt} = xy - \beta z \tag{11}$$
The Lorenz-63 system, the original chaotic attractor discovered in atmospheric modeling.

with standard parameters $\sigma=10$, $\rho=28$, $\beta=8/3$.

It has a Lyapunov exponent of $\lambda = 0.906$, meaning nearby trajectories diverge by a factor of $e$ every 1.1 time units.
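The NumPy sketch below shows this sensitivity directly. It integrates Eq. (11) with a fixed-step fourth-order Runge-Kutta scheme and follows two trajectories that start $10^{-8}$ apart. The step size, initial condition and run length are illustrative choices. This is not the data-generation pipeline behind our benchmark.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (11) with the standard parameters."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_rk4(f, state0, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta integration; returns all visited states."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    for k in range(n_steps):
        s = traj[k]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

dt, n_steps = 0.01, 2000                # 20 time units
a = integrate_rk4(lorenz, np.array([1.0, 1.0, 1.0]), dt, n_steps)
b = integrate_rk4(lorenz, np.array([1.0 + 1e-8, 1.0, 1.0]), dt, n_steps)
sep = np.linalg.norm(a - b, axis=1)     # distance between the two trajectories
print(f"Lyapunov time 1/lambda = {1 / 0.906:.2f} time units")
for t in (0, 5, 10, 15, 20):
    print(f"t = {t:2d}: separation = {sep[int(round(t / dt))]:.1e}")
```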

Our QRC achieved $R^2 = 0.796$ on Lorenz prediction. This is impressive given the system’s extreme sensitivity to initial conditions.

The Rössler Attractor

The Rössler attractor has milder chaos with a Lyapunov exponent of $\lambda = 0.071$. Trajectories diverge more slowly, making prediction somewhat easier.

Here we achieved $R^2 = 0.969$, near-perfect prediction of a chaotic system.

Figure 7: QRC performance across chaotic systems with different Lyapunov exponents. The 13-fold range in chaos intensity demonstrates robust generalization.

The 13-fold range in Lyapunov exponents suggests our approach works across the chaos spectrum. As expected from dynamical systems theory, prediction accuracy in our two chaotic benchmarks is lower for the system with the larger Lyapunov exponent. More chaotic systems are harder to predict.

Table 3: Chaotic System Benchmark Results

| System | Lyapunov Exponent $\lambda$ | Lyapunov Time $\tau_\lambda$ | $R^2$ Score | Prediction Horizon |
|---|---|---|---|---|
| Lorenz-63 | 0.906 | 1.10 | 0.796 | ~3 $\tau_\lambda$ |
| Rössler | 0.071 | 14.1 | 0.969 | ~5 $\tau_\lambda$ |
| Spectral Energy | - | - | 0.959 | Long-term |

Prediction Horizon and Lyapunov Time

The fundamental limit on chaotic prediction is set by the Lyapunov exponent:

$$\|\delta \mathbf{x}(t)\| \sim \|\delta \mathbf{x}(0)\| \, e^{\lambda t} \tag{12}$$
Errors in chaotic systems grow exponentially at a rate determined by the Lyapunov exponent.

The Lyapunov time $\tau_\lambda = 1/\lambda$ sets the timescale over which prediction is meaningful.
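Combining Eq. (12) with Table 3 shows what the prediction horizons mean in the model's own time units. This standard-library sketch converts each horizon from Lyapunov times to time units. It also reports the factor $e^{n}$ by which a small initial error grows over $n$ Lyapunov times. That factor is a rough, linearized estimate that ignores saturation on the attractor.

```python
import math

def lyapunov_time(lam):
    """tau_lambda = 1 / lambda."""
    return 1.0 / lam

def error_amplification(n_lyapunov_times):
    """Linearized growth factor e^(lambda t) after t = n Lyapunov times (Eq. 12)."""
    return math.exp(n_lyapunov_times)

# (name, lambda, horizon in Lyapunov times) as listed in Table 3
for name, lam, horizon in [("Lorenz-63", 0.906, 3), ("Rössler", 0.071, 5)]:
    tau = lyapunov_time(lam)
    print(f"{name}: tau = {tau:.2f}, horizon ~ {horizon * tau:.1f} time units, "
          f"error growth ~ {error_amplification(horizon):.0f}x")
```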

The Steinegger-Räth Methodology

Our best results came from applying the feature engineering framework developed by Steinegger and Räth in 2025. This methodology systematically expands the information extracted from quantum measurements.

Figure 8: The Steinegger-Räth feature engineering pipeline. Raw quantum measurements flow through temporal multiplexing, spatial reservoir copies, and polynomial expansion to create rich feature vectors.

Temporal Multiplexing (V = 5)

Instead of measuring the quantum system once, we sample at multiple evolution times. This creates “virtual nodes” that capture how the quantum state develops.

We used V = 5, meaning we take 5 measurement snapshots per input. This multiplies our feature count by 5 without adding physical qubits.

Spatial Reservoirs (r = 3)

We run 3 independent copies of the quantum evolution with different random initializations. Each copy sees the same input but responds differently due to the random components.

This is analogous to ensemble methods in classical machine learning. Multiple independent predictors combine to give more robust results.

Polynomial Expansion (G = 3)

Raw expectation values are linear features. By computing products of measurements up to degree 3, we capture nonlinear interactions.

$$\phi_{\text{poly}}(\mathbf{x}) = \bigl\{ x_{i_1} x_{i_2} \cdots x_{i_k} : 1 \leq i_1 \leq \cdots \leq i_k \leq n, \; k \leq G \bigr\} \tag{13}$$
Polynomial feature expansion creates nonlinear combinations of raw features up to a maximum degree.

If $q_i$ and $q_j$ are two qubit expectation values, we include not just $q_i$ and $q_j$ but also $q_i q_j$, $q_i^2$, $q_j^2$, $q_i^2 q_j$, $q_i q_j^2$, $q_i^3$ and $q_j^3$. For any third qubit value $q_k$, we also include three-qubit products such as $q_i q_j q_k$.
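The expansion in Eq. (13) takes only a few lines with `itertools.combinations_with_replacement`, which enumerates exactly the non-decreasing index tuples $i_1 \leq \cdots \leq i_k$. This sketch omits the constant (degree-0) term. That is an assumption, since the text does not say whether one is included. For two qubit values, the sketch produces the nine monomials listed above.

```python
from itertools import combinations_with_replacement
import numpy as np

def polynomial_features(q, degree=3):
    """Monomials q[i1] * ... * q[ik] with i1 <= ... <= ik and 1 <= k <= degree (Eq. 13)."""
    feats, labels = [], []
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(q)), k):
            feats.append(np.prod([q[i] for i in idx]))
            labels.append(idx)
    return np.array(feats), labels

feats, labels = polynomial_features(np.array([0.5, -0.2]), degree=3)
print(len(feats))        # 9: two linear, three quadratic, four cubic terms
print(labels)
```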

Total Feature Count

Combined, these techniques transform 9 raw qubit measurements into 3,375 features:

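$$N_{\text{features}} = V \times r \times \binom{n + G}{G} \tag{14}$$
The total feature count after applying all feature engineering transformations.

where $\binom{n+G}{G}$ is the number of monomials of degree at most $G$ in $n$ qubit values, constant term included. For $n = 9$ and $G = 3$ this is $\binom{12}{3} = 220$, giving $5 \times 3 \times 220 = 3300$. The reported total of 3,375 $= 5 \times 3 \times 225$ corresponds to 225 features per snapshot, slightly more than the binomial formula alone gives.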
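The counts in Eq. (14) can be checked with Python's standard library. The comments give the values each line prints.

```python
from math import comb

n, G, V, r = 9, 3, 5, 3
per_snapshot = comb(n + G, G)          # monomials of degree <= G in n variables, constant included
print(per_snapshot)                    # 220
print(V * r * per_snapshot)            # 3300
print(3375 // (V * r))                 # 225: per-snapshot count implied by the reported total
```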

Quantum Kernel Perspective

Our quantum reservoir can also be understood through the lens of quantum kernels:

$$K(\mathbf{x}, \mathbf{x}') = \bigl|\langle \phi(\mathbf{x}) | \phi(\mathbf{x}') \rangle\bigr|^2 \tag{15}$$
The quantum kernel measures similarity between data points in the quantum feature space.

The kernel alignment with the target function determines learning performance:

$$A(K, \mathbf{y}) = \frac{\langle K, \mathbf{y}\mathbf{y}^\top \rangle_F}{\|K\|_F \, \|\mathbf{y}\mathbf{y}^\top\|_F} \tag{16}$$
Kernel alignment measures how well the kernel structure matches the prediction task.
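For the feature map of Eq. (3), the fidelity kernel of Eq. (15) has a simple closed form. The overlap of two $R_y$ product states factorizes as $\langle \phi(\mathbf{x}) | \phi(\mathbf{x}') \rangle = \prod_i \cos\bigl((\theta_i - \theta'_i)/2\bigr)$. The entangling unitary of Eq. (4) is the same for every input, so it cancels in the overlap, and the formula also holds for the full noise-free circuit. The NumPy sketch below evaluates Eq. (15) this way and computes the alignment of Eq. (16). The inputs and labels are hypothetical toy data.

```python
import numpy as np

def fidelity_kernel(A, B):
    """Eq. (15) for the feature map of Eq. (3) with theta = arccos(x).

    For R_y product states the overlap factorizes:
    <phi(x)|phi(x')> = prod_i cos((theta_i - theta'_i) / 2).
    """
    ta, tb = np.arccos(A), np.arccos(B)
    overlap = np.prod(np.cos((ta[:, None, :] - tb[None, :, :]) / 2), axis=-1)
    return overlap ** 2

def kernel_alignment(K, y):
    """Eq. (16): Frobenius alignment between K and the ideal kernel y y^T."""
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(40, 4))   # hypothetical inputs in [-1, 1]
y = np.sign(X[:, 0])                   # hypothetical toy labels
K = fidelity_kernel(X, X)
print(K.shape, round(kernel_alignment(K, y), 3))
```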

The Barren Plateau Problem

Large quantum systems face the barren plateau phenomenon, where gradients vanish exponentially:

$$\text{Var}\left[\frac{\partial \langle O \rangle}{\partial \theta}\right] \in \mathcal{O}(2^{-n}) \tag{17}$$
The barren plateau scaling shows that gradients vanish exponentially with system size, making training increasingly difficult.

For 156 qubits, this implies gradient variance of order $10^{-47}$, making gradient-based training impractical. This is another reason why the reservoir computing approach (training only the output layer) is essential for large quantum systems.
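The $10^{-47}$ figure is plain arithmetic: $2^{-156} = 10^{-156 \log_{10} 2}$. The snippet below also applies a rough heuristic, which is our assumption and not a result from the experiments. Because shot noise scales as $1/\sqrt{N_{\text{shots}}}$, resolving a gradient of typical size $\sqrt{\text{Var}}$ takes on the order of $1/\text{Var}$ shots.

```python
import math

n = 156
log10_var = -n * math.log10(2)          # log10 of 2**(-n)
print(f"gradient variance ~ 10^{log10_var:.1f}")
# Rough heuristic: shot noise ~ 1/sqrt(N_shots) must drop below the typical
# gradient size sqrt(Var), which requires N_shots of order 1/Var.
print(f"shots needed ~ 10^{-log10_var:.1f}")
```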

Noise Characterization

Real quantum hardware is noisy. Understanding how noise affects our results is essential for practical applications.

Figure 9: Noise characterization across the IBM Heron r2 processor. Gate errors and decoherence vary significantly across the chip, affecting different qubit subsets differently.

The noise model includes depolarizing errors:

$$\mathcal{E}(\rho) = (1-p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z) \tag{18}$$
The depolarizing channel models random quantum errors by mixing the state with the maximally mixed state.
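Here is a small NumPy check of Eq. (18) on a single qubit. The identity $\rho + X\rho X + Y\rho Y + Z\rho Z = 2\,\text{Tr}(\rho)\, I$ lets us rewrite the channel as $(1 - 4p/3)\rho + (4p/3)\, I/2$, which mixes the state with the maximally mixed state at weight $4p/3$. The sketch verifies this rewriting, confirms that the trace is preserved, and shows $\langle Z \rangle$ shrinking from 1 to $1 - 4p/3$. The value of $p$ is illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def depolarize(rho, p):
    """Single-qubit depolarizing channel, Eq. (18)."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

p = 0.01                                          # illustrative error probability
rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
out = depolarize(rho, p)
mixed_form = (1 - 4 * p / 3) * rho + (4 * p / 3) * I2 / 2

print(np.allclose(out, mixed_form))               # same channel, rewritten
print(np.trace(out).real)                         # trace is preserved
print(np.trace(Z @ out).real, 1 - 4 * p / 3)      # <Z> shrinks to 1 - 4p/3
```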

The Heron r2 processor shows significant variation in noise levels across the chip. Some qubits have error rates below 0.1%, while others exceed 1%. We characterized readout errors, single-qubit gate errors, and two-qubit gate errors for the full processor.

Interestingly, some noise may actually help reservoir computing. Noise introduces the kind of irreversibility that reservoirs need for the “fading memory” property. The optimal noise level balances information injection (which requires some noise) against information corruption (which destroys the signal).

Memory Capacity Analysis

The memory capacity of a reservoir measures how well it can recall past inputs:

$$MC = \sum_{k=1}^{\infty} r^2(k) = \sum_{k=1}^{\infty} \frac{\text{Cov}^2\bigl(y_k(t), u(t-k)\bigr)}{\text{Var}\bigl(y_k(t)\bigr) \cdot \text{Var}\bigl(u(t-k)\bigr)} \tag{19}$$
Memory capacity sums the squared correlations between output and delayed inputs across all delays.

where $y_k(t)$ is the output of a linear readout trained separately to reconstruct the input delayed by $k$ steps, $u(t-k)$.

For linear reservoirs, memory capacity is bounded by the number of nodes. Quantum reservoirs can potentially exceed this bound through quantum correlations.
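The bound can be probed numerically for a classical linear reservoir. The NumPy sketch below drives a random 10-node linear reservoir with i.i.d. uniform input. It trains a separate least-squares readout for each delay $k$, as Eq. (19) requires, and sums the squared correlations up to a finite maximum delay. Reservoir size, spectral radius and sequence length are illustrative assumptions. This is not a quantum reservoir.

```python
import numpy as np

def memory_capacity(n_nodes=10, max_delay=40, T=5000, washout=200, seed=0):
    """Estimate Eq. (19) for a random linear reservoir x(t) = W x(t-1) + w_in u(t)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_nodes, n_nodes))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9
    w_in = rng.normal(size=n_nodes)
    u = rng.uniform(-1, 1, size=T)                    # i.i.d. input sequence
    X = np.empty((T, n_nodes))
    x = np.zeros(n_nodes)
    for t in range(T):
        x = W @ x + w_in * u[t]
        X[t] = x
    states = X[washout:]
    mc = 0.0
    for k in range(1, max_delay + 1):
        target = u[washout - k:T - k]                 # u(t - k), aligned with states
        # A separate least-squares readout per delay gives y_k(t).
        w, *_ = np.linalg.lstsq(states, target, rcond=None)
        r = np.corrcoef(states @ w, target)[0, 1]
        mc += r ** 2
    return mc

mc = memory_capacity()
print(f"estimated MC = {mc:.2f} for a 10-node linear reservoir")
```

The printed estimate can be compared with the bound of 10, the number of nodes. The finite maximum delay and sample size make it an approximation.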

Correlation Analysis

What patterns does the quantum reservoir learn? We analyzed the correlation structure of our feature vectors.

Figure 10: Correlation matrix of quantum reservoir features. Strong correlations appear in blocks corresponding to physically connected qubits, reflecting the entanglement structure of the circuit.

The correlation matrix reveals interesting structure. Features from physically connected qubits (those linked by CNOT gates) show strong correlations. This is consistent with the entanglement structure of the circuit showing up in the measurements.

Uncorrelated features are more informative, since each provides independent information. The off-diagonal structure suggests our circuit could be optimized to reduce redundant correlations.

Ablation Studies

Which components of our methodology matter most? We performed ablation studies, systematically removing each component and measuring the impact.

Figure 11: Ablation study results. Removing polynomial expansion has the largest impact, followed by temporal multiplexing. Spatial reservoirs contribute least to final performance.

Polynomial expansion is the most critical component. Without it, performance drops by 35%. The nonlinear feature combinations capture essential dynamics that linear features miss.

Temporal multiplexing is second most important. Without multiple measurement times, we lose 20% performance. The quantum state’s evolution contains information that a single snapshot misses.

Spatial reservoirs contribute least. Removing them only costs 8% performance. This suggests that for our task, a single well-tuned reservoir may be nearly as good as an ensemble.

Table 4: Ablation Study Results

| Component Removed | $R^2$ Impact | Relative Drop | Importance Rank |
|---|---|---|---|
| Polynomial Expansion | −0.335 | −35% | 1 (Critical) |
| Temporal Multiplexing | −0.192 | −20% | 2 (High) |
| Ridge Regularization | −0.156 | −16% | 3 (High) |
| Spatial Reservoirs | −0.077 | −8% | 4 (Moderate) |
| None (Baseline) | 0.959 | - | - |

Topology Comparison

We compared different circuit topologies to understand how connectivity affects reservoir quality.

Figure 12: Performance comparison across circuit topologies. All-to-all connectivity performs best, but linear chains are surprisingly competitive and much easier to implement.

All-to-all connectivity (where every qubit can interact with every other) gives the best performance. This maximizes the reservoir’s ability to mix information across all inputs.

Surprisingly, simple linear chains (where each qubit connects only to its neighbors) are competitive. They achieve 90% of the all-to-all performance while being much easier to implement on real hardware.

Ring topologies fall between these extremes. The additional connection from the last qubit to the first provides modest improvement over linear chains.

Computational Cost Analysis

Quantum computing is expensive. We analyzed the computational cost of each approach.

Figure 13: Computational cost breakdown showing circuit executions, classical post-processing, and total wall-clock time. The 156-qubit system requires the most resources despite worse performance.

The 156-qubit system requires the most quantum resources by far. Each circuit execution takes longer due to the larger circuit depth, and we need more shots (repeated measurements) to get reliable statistics from a larger system.

The number of shots needed scales with desired precision:

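$$N_{\text{shots}} \geq \frac{1}{\epsilon^2} \ln\left(\frac{2}{\delta}\right) \tag{20}$$
The number of measurement shots required scales inversely with the square of desired precision.

for precision $\epsilon$ with confidence $1-\delta$. This is a Hoeffding-type bound. It is sufficient for estimating a single outcome probability. For a $\pm 1$-valued Pauli expectation value such as $\langle Z_i \rangle$, however, Hoeffding's inequality requires $N_{\text{shots}} \geq \frac{2}{\epsilon^2} \ln(2/\delta)$.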
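Hoeffding's inequality makes the constants explicit. Take $N$ independent shots of an outcome bounded in an interval of width $b$. Then $P(|\hat{\mu} - \mu| \geq \epsilon) \leq 2\exp(-2N\epsilon^2 / b^2)$, so $N \geq b^2 \ln(2/\delta) / (2\epsilon^2)$ shots suffice. This standard-library sketch, with a function name of our own choosing, compares that requirement with Eq. (20) for illustrative values $\epsilon = 0.01$ and $\delta = 0.05$. A probability has $b = 1$, and a Pauli expectation value has $b = 2$.

```python
import math

def shots_needed(eps, delta, width=1.0):
    """Hoeffding: N >= width^2 * ln(2/delta) / (2 eps^2) for outcomes in an interval of this width."""
    return math.ceil(width ** 2 * math.log(2 / delta) / (2 * eps ** 2))

eps, delta = 0.01, 0.05
print(math.ceil(math.log(2 / delta) / eps ** 2))   # Eq. (20) as written: 36889
print(shots_needed(eps, delta, width=1.0))         # outcome probability in [0, 1]: 18445
print(shots_needed(eps, delta, width=2.0))         # Pauli expectation in [-1, 1]: 73778
```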

Paradoxically, we pay more for worse results. The 9-qubit system, run here in simulation, delivers the best performance with by far the smallest circuits. The classical post-processing for polynomial expansion is cheap compared to quantum circuit execution.

This has major implications for practical quantum machine learning. Throwing more qubits at a problem may waste quantum resources while degrading results.

Generalization Bounds

The generalization error can be bounded using Rademacher complexity:

$$\mathcal{L}_{\text{test}} \leq \mathcal{L}_{\text{train}} + 2\mathcal{R}_n(\mathcal{F}) + 3\sqrt{\frac{\log(2/\delta)}{2n}} \tag{21}$$
The generalization bound shows how prediction error on new data relates to training error and model complexity.

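where $n$ is the number of training samples (not qubits), $\mathcal{R}_n(\mathcal{F})$ is the Rademacher complexity of the hypothesis class, and the bound holds with probability at least $1-\delta$. For an unregularized linear readout, this complexity grows with the number of features, which explains the sample efficiency crisis.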

Implications for Quantum Machine Learning

Our findings challenge conventional wisdom in several ways.

Qubit Count is Not Everything

The quantum machine learning community has celebrated each increase in qubit count as progress. Our results show this is dangerously naive. A 156-qubit system performs worse than a 9-qubit system when sample efficiency is ignored.

Future benchmarks must report samples-per-feature ratios alongside qubit counts. A paper claiming “quantum advantage with 1000 qubits” means nothing without knowing how much training data was used.

Classical Post-Processing is Not Auxiliary

The quantum community sometimes treats classical components as “just” readout, focusing attention on the quantum parts. Our results show classical post-processing is essential.

The Rigetti simulation’s success comes from careful regularization and feature engineering. Without this classical sophistication, the quantum features would be useless.

Quantum and classical components must be co-designed. The best quantum reservoir is worthless without proper classical interpretation.

The Optimal Scale is Task-Dependent

There is no universal “best” qubit count. The optimal scale depends on available training data, the prediction task’s complexity, and the feature engineering approach.

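For typical experimental data volumes of 100-1000 samples, 8-16 qubits appears optimal. Scaling beyond this requires proportionally more training data to maintain sample efficiency. With polynomial feature expansion, the requirement grows polynomially, because the feature count then scales as $\binom{n+G}{G}$.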

Future Directions

Several research directions emerge from our work.

Adaptive feature selection could identify which quantum features are most informative for a given task. Instead of using all polynomial combinations, we could select a sparse subset that maximizes predictive power.

Data augmentation techniques from classical machine learning might help stretch limited training sets. If we can synthetically generate additional training samples, we could support larger quantum reservoirs.

Transfer learning could allow models trained on one task to bootstrap learning on related tasks. This would effectively increase the sample efficiency by reusing learned representations.

Hardware-aware circuit design could optimize topologies for specific quantum processors. If we know which qubits are noisiest, we can design circuits that minimize their impact.

Conclusion

We have presented the largest quantum reservoir computing experiment on real quantum hardware: 156 qubits on IBM’s Heron r2 processor. But our key contribution is not the scale. It is the discovery of a fundamental sample efficiency crisis.

More qubits do not automatically mean better results. Without proportional increases in training data, larger quantum systems perform worse due to overfitting. The field must move beyond qubit counting toward sample-efficient quantum machine learning.

Our results point toward practical quantum reservoir computing. Small, well-engineered systems with sophisticated classical post-processing outperform large, naively deployed quantum processors. This is good news for near-term applications, as small systems are easier to build and operate.

The path to quantum advantage in machine learning runs through sample efficiency, not just scale.

Read the Full Paper

The complete paper with additional technical details is available on Zenodo (Open Access) and ResearchGate.

This research was conducted as part of QDaria’s quantum machine learning program. For collaboration inquiries, contact mo@qdaria.com.


Appendix

List of Figures

13 figures


Nomenclature

| Symbol | Definition |
|---|---|
| $R^2$ | Coefficient of determination (prediction accuracy) |
| $\lambda$ | Lyapunov exponent (chaos measure) |
| $N$ | Number of features |
| $M$ | Number of training samples |
| $V$ | Number of temporal multiplexing steps |
| $r$ | Number of spatial reservoir copies |
| $G$ | Polynomial expansion degree |
| $\alpha$ | Ridge regularization parameter |
| $\eta$ | Sample efficiency ratio |
| $D_{\text{eff}}$ | Effective dimension |
| QRC | Quantum Reservoir Computing |
| CNOT | Controlled-NOT quantum gate |
| $R_y(\theta)$ | Rotation gate around Y-axis by angle θ |
| $\rho$ | Quantum density matrix |
| $Z_i$ | Pauli-Z operator on qubit i |

Frequently Asked Questions

8 questions
Q1 What is quantum reservoir computing and how does it differ from classical reservoir computing?
Quantum reservoir computing uses a quantum system as the dynamical reservoir instead of a classical one. The exponentially large Hilbert space of quantum systems (2^n states for n qubits) provides potentially richer dynamics than classical reservoirs. However, this also creates challenges with sample efficiency as shown in our research.
Q2 Why did the 156-qubit system perform worse than the 9-qubit system?
The 156-qubit system suffered from severe overfitting due to insufficient training data. With 156 features but only 200 training samples (1.28 samples per feature), the classical readout layer memorized the training data rather than learning generalizable patterns. The 9-qubit system with proper regularization achieved better generalization despite fewer quantum resources.
Q3 What is the sample efficiency crisis?
The sample efficiency crisis refers to the fundamental problem that larger quantum systems require proportionally more training data to avoid overfitting, and polynomially more when polynomial feature expansion is used. Each additional qubit adds features that need to be learned, but training data in real applications is often limited. This means simply adding more qubits can actually degrade performance.
Q4 How many qubits are optimal for quantum reservoir computing?
Based on our experiments, 8-16 qubits appears optimal for typical experimental data volumes of 100-1000 samples. The optimal scale depends on available training data, task complexity, and feature engineering approach. More qubits require proportionally more training data.
Q5 What is the Steinegger-Räth methodology?
The Steinegger-Räth methodology is a feature engineering framework that systematically expands quantum measurements through: (1) temporal multiplexing (multiple measurement snapshots), (2) spatial reservoirs (multiple independent quantum evolutions), and (3) polynomial expansion (computing products of measurements). This transforms 9 raw qubit measurements into 3,375 rich features.
Q6 Can noise help quantum reservoir computing?
Interestingly, some noise may actually help reservoir computing by introducing irreversibility needed for the 'fading memory' property. The optimal noise level balances information injection against information corruption. However, excessive noise destroys the quantum signal entirely.
Q7 What are the practical implications for quantum machine learning?
Our findings suggest that: (1) qubit count alone is not a useful metric for quantum ML progress, (2) classical post-processing is essential and not auxiliary, (3) small well-engineered systems can outperform large naive ones, and (4) future benchmarks must report samples-per-feature ratios alongside qubit counts.
Q8 Where can I access the full research paper?
The complete paper with additional technical details is available on Zenodo (Open Access) and ResearchGate.