Bringing AI robots into every household.
MODEL RELEASE · 2026

OneModel 1.7 FrontoStria-RL
A World Action Model Proven in Real-World Tests

A Latent Frontostriatal Policy Loop with Reinforcement Learning

A self-developed world action model for real household robot deployment. Through implicit policy modulation, it connects high-level world understanding with action execution, and uses a reinforcement learning closed loop to turn real-world feedback into continuous success-rate gains.

99% · LIBERO Benchmark
6/7 · Best Under LIBERO-plus Perturbations
99% · Average Success Rate on Daily Tasks

Part 01

Summary

In May 2026, OneRobotics officially released its self-developed world action model, OneModel 1.7 FrontoStria-RL. Built for scaled deployment in real-world scenarios, it serves as the core model foundation behind OneRobotics' push to commercialize household and service robots.

OneModel 1.7 FrontoStria-RL adopts OneRobotics' self-designed RL-Latent World Action Model architecture (RL-LWAM): the World Model supports cross-scenario generalization, the Understand Expert handles task understanding and Skill scheduling, and the Action Expert delivers precise execution. The three modules are implicitly connected through Predictive Policy Latent. On top of this, a reinforcement learning closed loop and success memory mechanism allow every piece of real deployment feedback to flow back into the model, enabling capability to accumulate through use.

On LIBERO, a standard embodied intelligence benchmark, OneModel 1.7 reaches an average task success rate of 99%, leading mainstream public models including π0.5, GR00T-N1.5, and OpenVLA-OFT. In tests closer to real deployment, OneModel 1.7 also performs strongly: approximately 99% average success on daily manipulation tasks, approximately 97% on high-precision tasks, and a receive success rate above 90% in the highly dynamic, high-precision scenario of table-tennis rallying with a human.

For embodied robots to enter real living environments, task understanding is not enough; stable task completion is the threshold. OneModel 1.7 FrontoStria-RL integrates generalizable understanding, action success rate, and real-world feedback learning into a single world action model system, covering scenarios from everyday household operations to high-precision and highly dynamic interactions. It is not a one-off demo, but a model platform validated by measured results and built for scalable delivery.

Part 02

Research Background

The Home Is Embodied Intelligence's Hardest and Most Valuable Arena

The home is the key entry point for robots to reach large-scale adoption, but it is also one of the most demanding settings for intelligence. Unlike factory lines or warehouse picking, no two kitchens are exactly alike, no two living rooms are arranged the same way, and the task mix changes every day. A robot must not only complete concrete actions such as opening doors, folding clothes, carrying bowls, and tidying items, but also understand tasks and act reasonably when it encounters unfamiliar objects, changing lighting, and different home layouts.

At the same time, the capability boundary for robots is moving toward harder tasks: plugging and unplugging test tubes and pouring coffee beans require extremely small end-effector error and stable force control, while rallying table tennis with a human imposes extreme requirements on real-time perception, dynamic prediction, and high-speed response. This means embodied intelligence models must simultaneously provide strong generalization, high success rates, and dynamic adaptability in extreme scenarios.

The VLA Route: Precise Actions, Limited Generalization

Over the past two years, Vision-Language-Action (VLA) models have developed rapidly in embodied intelligence and have become an important paradigm for action generation. Their strength is mapping visual observations and language instructions directly to robot actions end to end. When training data is sufficiently covered and task boundaries are clear, VLA models can achieve high execution success rates.

But relying purely on end-to-end action mapping also creates challenges. These models are better at reproducing action patterns covered in demonstration data, while explicit modeling of task structure, stage goals, and reusable Skills remains insufficient. When object placement, tabletop material, or lighting changes, model performance can be affected; in complex multi-stage and multi-constraint tasks, the global goal is also easier to lose.

The World Model Route: Deeper Understanding, Harder Deployment

The World Model route stands in contrast to VLA. A World Model attempts to build predictive capability over environment state and task evolution on top of visual and language information, including spatial relationships among objects, the staged structure of a task, and the possible consequences of actions. In theory, this type of modeling carries stronger generalization potential.

At the current stage, however, World Models still face practical bottlenecks. Model size and inference cost are high, which makes the latency requirements of real-time control hard to meet. Uncertainty in generative prediction can lead to incorrect estimates of environment state. Most importantly, there is often no effective transmission mechanism from high-level world understanding to low-level action execution: there is no ready-made bridge between "understanding the world" and "moving accurately."

The Missing Middle Layer: Tasks Are Hard to Decompose, Skills Are Hard to Reuse

Whether using VLA or a World Model, one key link is easily overlooked: structured understanding of the task itself. A robot without it is like a factory with advanced production equipment and a strong environmental monitoring system, but without SOPs or operating manuals. A Skill system fills this role for robots: it tells the model how a complex task should be decomposed, ordered, and solved by reusing existing capabilities.

Real manipulation tasks often contain clear stages, subgoal dependencies, and skill-combination logic: folding clothes requires flattening first, folding next, and finally aligning the edges; dishwasher operation requires identifying dishware types, selecting placement positions, and confirming the door-closing action. These structured operating procedures belong neither purely to environment modeling in the World Model nor to action generation in VLA. They are the middle layer that connects understanding and execution.

The Missing Training Loop: Train Once, Use Once, Hard to Keep Evolving

Current mainstream embodied models share another problem: after training ends, capability improvement mainly depends on collecting new data and retraining offline. Many systems still rely heavily on imitation learning, learning policies from human demonstrations. But imitation learning is constrained by the coverage, quality, and long-tail distribution of demonstration data, making it difficult to continuously correct failure modes during real deployment.

Once deployed in real environments, a model encounters boundary cases not covered in training: the gripper slips, an object deforms, or a user intervenes unexpectedly. This is where reinforcement learning becomes valuable. With clear rewards, safety constraints, and human-in-the-loop supervision, the model can optimize policies through real task feedback, not only correcting mistakes but gradually discovering more robust and efficient execution paths.

OneModel 1.7's Path: Making World Understanding Act on Execution

These problems - insufficient generalization, difficulty grounding understanding, missing task planning, and lack of continuous evolution - are not isolated issues. They are systemic bottlenecks faced by embodied intelligence as it moves from the lab to real deployment. OneModel 1.7's RL-LWAM architecture is an integrated answer to these problems.

It uses the World Model to provide cross-scenario generalization, the Understand Expert + Skill system to handle task understanding and structured planning, and the Action Expert to ensure action-execution precision. Through Predictive Policy Latent, world understanding implicitly modulates action policy. Combined with a reinforcement learning closed loop and success memory mechanism, real-world feedback becomes model capability that can accumulate continuously. Ultimately, OneModel 1.7 is not a single-point capability display, but a world action model system built for scaled deployment.

Part 03

OneModel 1.7 FrontoStria-RL: Model Architecture

Overall Architecture

OneModel 1.7 FrontoStria-RL adopts the RL-Latent World Action Model (RL-LWAM) architecture. Its complete information flow is as follows:

Instruction / Observation / Skill → World Model → Predictive Policy Latent → Understand Expert → Action Expert → Robot Execution → RL / Success Memory / HITL ↺

OneModel 1.7 uses the RL-LWAM architecture to form a complete embodied intelligence closed loop: the World Model builds a generalizable representation of the environment; Predictive Policy Latent implicitly transmits that representation to the Understand Expert for task decomposition and Skill scheduling; the Action Expert then generates and executes precise actions. Execution results are used for policy optimization through reinforcement learning, successful experiences are written into memory for later reuse, and human-in-the-loop supervision provides safety constraints - forming an auditable, controllable, and continuously improving loop.

Figure: Full architecture of OneModel 1.7 FrontoStria-RL. Predictive Policy Latent acts as the core transmission mechanism connecting the World Model, Understand Expert, and Action Expert.

World Model: Understanding the World to Build Generalization

The World Model is the architecture's cognitive layer. It receives environmental observations from visual sensors and natural-language task instructions, then builds a deep understanding of the current scene, including object recognition, spatial-relation reasoning, task-stage decomposition, and action-consequence estimation.

The World Model is the core source of the system's generalization capability. Even in unseen scene layouts or with unfamiliar manipulation objects, the system can still form reasonable high-level task plans - exactly the capability that purely end-to-end action mapping struggles to cover reliably.

Predictive Policy Latent: Implicit Transmission Across the Full Chain

The World Model's understanding must be transmitted to downstream modules before it can create value. Traditional approaches often rely on explicit intermediate representations, such as generated future images or target coordinate points, but these representations lose information, create tight coupling, and struggle to carry the rich, abstract understanding produced by a World Model.

One core innovation of RL-LWAM is the use of Predictive Policy Latent - an implicit policy modulation layer - to connect the World Model, Understand Expert, and Action Expert. Here, the latent is not an image or a set of explicit coordinates. It is a physical-reasoning representation learned during training with the help of future observations: during training, the model can "see" the result after an action is executed, shaping its understanding of task consequences; during deployment, it no longer depends on future information and can form equivalent action expectations from current observations alone. It transfers the World Model's understanding of scene structure and motion trends to the Understand Expert and Action Expert in a compressed, efficient, and learnable form.

This mechanism allows high-level generalizable understanding to efficiently drive task decomposition and action execution. Compared with explicit image generation, implicit modulation skips redundant pixels and generative noise, preserving only the information that truly matters for decision-making.
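The training/deployment asymmetry described above can be illustrated with a deliberately tiny sketch. It is not the RL-LWAM implementation: the one-dimensional "observations", the linear student weight `w`, the choice of `future - current` as the target latent, and the function names `train_student` and `deploy_latent` are all assumptions made for illustration. The only point it shows is that a target built from future observations can shape a predictor that, once trained, needs only the current observation.

```python
def train_student(pairs, lr=0.1, epochs=200):
    """Fit a weight w so that w * current approximates a future-derived latent.

    pairs: list of (current_observation, future_observation) scalars.
    The future observation is used ONLY here, at training time.
    """
    w = 0.0
    for _ in range(epochs):
        for current, future in pairs:
            target_latent = future - current   # hypothetical latent built from the future
            predicted = w * current            # student sees the current observation only
            grad = 2.0 * (predicted - target_latent) * current
            w -= lr * grad                     # plain SGD on squared error
    return w


def deploy_latent(w, current):
    """At deployment, the latent is computed from the current observation alone."""
    return w * current


# Toy world (assumed): the future observation is always 1.5x the current one.
training_pairs = [(c, 1.5 * c) for c in (0.5, 1.0, -1.0)]
w = train_student(training_pairs)
latent = deploy_latent(w, 2.0)  # no future observation is needed here
```

In this toy setting the student recovers the regularity "the scene changes by half of its current value" and reproduces the future-derived latent from present input, which is the property the paragraph attributes to Predictive Policy Latent.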

Understand Expert: Task Decomposition and Skill Scheduling

The Understand Expert is the architecture's planning layer. It receives modulation signals from Predictive Policy Latent and performs structured decomposition of the current task - identifying task stages, determining subgoal dependencies, and scheduling the corresponding Skill sequence - so the robot always knows which stage it is in and what to do next when facing complex long-horizon tasks.

This module enables the system to reuse existing Skills for new task combinations instead of learning from scratch each time. In long workflows, it preserves goal consistency and avoids losing the global objective due to disturbances in intermediate steps.
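As a rough illustration of stage dependencies and Skill scheduling, the sketch below orders the clothes-folding stages mentioned earlier (flatten, fold, align edges) with a topological sort. The dictionary layout, the skill names, and the helpers `schedule_skills` and `next_skill` are hypothetical; the source does not describe how the Understand Expert represents or schedules Skills internally.

```python
from graphlib import TopologicalSorter

# Hypothetical Skill graph: each skill maps to the skills it depends on.
FOLD_CLOTHES = {
    "flatten": set(),
    "fold": {"flatten"},
    "align_edges": {"fold"},
}


def schedule_skills(dependencies):
    """Return one execution order that respects every subgoal dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def next_skill(dependencies, completed):
    """Return the first scheduled skill that has not been completed yet."""
    for skill in schedule_skills(dependencies):
        if skill not in completed:
            return skill
    return None  # every stage is done


order = schedule_skills(FOLD_CLOTHES)
current_stage = next_skill(FOLD_CLOTHES, completed={"flatten"})
```

Keeping the dependency graph separate from the skills themselves is what allows the same skill, for example "flatten", to be reused in a different task graph without being relearned.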

Action Expert: Precise Actions to Ensure Success Rate

The Action Expert is the architecture's execution layer. It receives Skill instructions from the Understand Expert and real-time visual observations, then generates continuous action plans through flow matching. The model does not learn single-step absolute displacement; it learns a continuous velocity field from noise to real action, generating a complete action sequence, or action chunk, which is then converted by the robot adapter into executable robot commands.
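A minimal sketch of flow-matching inference, assuming a straight-line path from noise to the action chunk and simple Euler integration. The function `toy_velocity` is a hypothetical stand-in for the learned network: it computes the ideal velocity directly from a known target, whereas a trained Action Expert would predict the velocity from observations and Skill instructions. The chunk is flattened into a single list for brevity.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned velocity field.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target at t = 1 is (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action_chunk(target_chunk, steps=10, seed=0):
    """Integrate the velocity field from Gaussian noise (t=0) toward t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target_chunk]  # start from pure noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = toy_velocity(x, t, target_chunk)
        x = [xi + dt * vi for xi, vi in zip(x, v)]   # one Euler step
    return x


# A flattened chunk: e.g. 2 timesteps x 2 action dimensions (assumed layout).
demo_target = [0.1, -0.2, 0.3, 0.0]
chunk = sample_action_chunk(demo_target)
```

The whole chunk is produced in one integration pass, which is what distinguishes this from predicting a single-step displacement at every control tick.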

At the action-parameterization level, OneModel 1.7 further adopts MCF-Proto (Motion-Centric Action Frame): instead of directly predicting displacement in a fixed world coordinate system, it organizes action prototypes around task-relevant local motion structures - such as door hinges, rails, holes, and folding lines - then maps them back to real robot actions. This design keeps action generation highly stable under camera-view perturbations and robot initial-pose deviations.
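The idea of expressing actions in a frame attached to a local motion structure can be sketched in two dimensions. The prototype values, the frame angle, and the helpers `combine_prototypes` and `local_to_world` are assumptions for illustration only; the source does not specify how MCF-Proto defines its frames or prototypes.

```python
import math


def combine_prototypes(weights, prototypes):
    """Blend local-frame action prototypes with the given weights."""
    dx = sum(w * p[0] for w, p in zip(weights, prototypes))
    dy = sum(w * p[1] for w, p in zip(weights, prototypes))
    return (dx, dy)


def local_to_world(local_delta, frame_angle):
    """Rotate a 2D displacement from the motion-centric frame into the world frame."""
    c, s = math.cos(frame_angle), math.sin(frame_angle)
    dx, dy = local_delta
    return (c * dx - s * dy, s * dx + c * dy)


# Hypothetical prototype: slide 0.1 m along a drawer rail (the local x-axis).
SLIDE_ALONG_RAIL = (0.1, 0.0)

# The same local action, with the rail oriented at 0 and at 90 degrees in the world.
world_a = local_to_world(SLIDE_ALONG_RAIL, 0.0)
world_b = local_to_world(SLIDE_ALONG_RAIL, math.pi / 2)
```

The policy output (`SLIDE_ALONG_RAIL`) is identical in both cases; only the frame transform changes. A policy regressing world-frame displacement would instead have to learn two different outputs for what is physically the same motion.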

Reinforcement Learning Closed Loop and Success Memory: Continuous Evolution

In real deployment, no model can be perfect on day one. OneModel 1.7 builds a complete continuous-optimization loop into the architecture.

Reinforcement learning (RL) uses real task feedback for autonomous exploration and policy optimization, enabling the model to go beyond imitation and discover better execution paths.

Success Memory, based on Retrieve-then-Steer, writes action segments that succeed during deployment into an online memory bank. When a similar scenario appears again, the system automatically retrieves verified successful experience to guide the next round of action generation, improving success rate without updating model parameters.

Human-in-the-loop supervision (HITL) provides safety constraints for high-risk tasks, balancing autonomous RL exploration with safety boundaries. Together, these components form a continuous-evolution engine that improves through use.
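The Success Memory behaviour described above (write successful segments, retrieve them in similar scenes, guide the next action without touching model parameters) can be sketched as follows. The cosine-similarity lookup, the `min_similarity` threshold, and the linear blending rule in `steer` are assumptions; the source names the Retrieve-then-Steer idea but does not give its retrieval metric or steering rule.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SuccessMemory:
    """Online bank of (scene embedding, action segment) pairs that succeeded."""

    def __init__(self, min_similarity=0.9):
        self.entries = []
        self.min_similarity = min_similarity  # assumed retrieval threshold

    def write(self, embedding, action_segment):
        self.entries.append((list(embedding), list(action_segment)))

    def retrieve(self, embedding):
        """Return the most similar stored segment, or None if nothing is close enough."""
        best, best_sim = None, -1.0
        for key, segment in self.entries:
            sim = cosine(key, embedding)
            if sim > best_sim:
                best, best_sim = segment, sim
        return best if best_sim >= self.min_similarity else None


def steer(proposed, retrieved, weight=0.5):
    """Pull the policy's proposed action toward a retrieved success (assumed rule)."""
    if retrieved is None:
        return list(proposed)
    return [(1 - weight) * p + weight * r for p, r in zip(proposed, retrieved)]


memory = SuccessMemory()
memory.write([1.0, 0.0], [0.2, 0.4])              # a segment that succeeded
hit = memory.retrieve([0.9, 0.1])                 # similar scene
miss = memory.retrieve([0.0, 1.0])                # unrelated scene
steered = steer([0.0, 0.0], hit)
```

Nothing in this loop updates policy weights: adaptation comes entirely from what is written to and read from the memory bank, which is the property the text highlights.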

Part 04

Core Technical Innovations

Figure: Four core technical modules of OneModel 1.7 FrontoStria-RL.

The module diagram above summarizes the four key technical innovations of OneModel 1.7 FrontoStria-RL: Predictive Policy Latent, the Understand Expert + Skill system, MCF-Proto action parameterization, and the RL closed loop + Success Memory. These modules correspond respectively to implicit transmission from world understanding to action policy, structured task decomposition and Skill reuse, action parameterization driven by local motion structures, and a continuous-evolution loop jointly built by reinforcement learning and success memory.

1. Predictive Policy Latent

Innovation: Replaces explicit future-image generation with implicit physical-reasoning representations, enabling zero-redundancy transmission from world understanding to action policy.

Difference from existing approaches: Traditional methods rely on generated future images or explicit target coordinates to bridge high-level understanding and action execution, introducing pixel redundancy, generative hallucination, and high latency. Predictive Policy Latent uses future observations during training to shape the representation, while at deployment it can output equivalent modulation signals from current observations alone - higher information density, faster inference, and no generative noise.

2. Understand Expert + Skill System

Innovation: Introduces an independent task-planning layer between the World Model and Action Expert, enabling Skill-level structured decomposition and reuse.

Difference from existing approaches: Mainstream VLA models treat perception-to-action as an end-to-end mapping and lack stage management for long-horizon tasks. The Understand Expert gives the system the ability to "read the SOP": it can reuse existing Skills in new task combinations and preserve stage goals in long workflows instead of relearning every task from scratch.

3. MCF-Proto Action Parameterization

Innovation: Builds action coordinate frames around local motion structures, such as hinges, rails, and folding lines, and replaces direct displacement regression in world coordinates with combinations of action prototypes.

Difference from existing approaches: Action prediction in a fixed coordinate system is highly sensitive to camera-view changes and robot initial-pose deviations. MCF-Proto aligns action representation with the task's physical constraints, reducing distribution shift caused by geometric perturbations at the source and maintaining high stability under viewpoint and pose changes.

4. RL Closed Loop + Success Memory

Innovation: A dual-channel continuous-evolution mechanism: reinforcement learning breaks through the ceiling of imitation learning, while success memory enables test-time adaptation without parameter updates.

Difference from existing approaches: After offline training ends, many models mainly improve by collecting new data and retraining. OneModel 1.7 uses RL for policy-level optimization, breaking the upper bound of demonstration data, and Retrieve-then-Steer for lightweight experience reuse without retraining. The two paths complement each other: one addresses capability ceiling, the other deployment adaptation.

Part 05

Model Evaluation Results

LIBERO is a widely used standardized benchmark for embodied manipulation, evaluating capabilities such as instruction understanding, spatial-relation judgment, object interaction, and long-horizon execution, with task success rate as the core metric. Building on this, LIBERO-plus introduces perturbations in camera view, robot pose, language, lighting, background, noise, and layout to test the robustness of Action Expert / MCF-Proto under environmental change. SimplerEnv is used to observe the gains provided by Success Memory / Retrieve-then-Steer in simulation.

Taken together, these benchmarks validate OneModel 1.7 from three angles: system-level headline performance, perturbation robustness, and simulation verification. OneModel 1.7 reaches a 99.0% average success rate on standard LIBERO. LIBERO-plus perturbation tests show that MCF-Proto achieves better results in six of seven perturbation categories and maintains higher stability under geometric perturbations such as camera-view changes and robot initial-pose deviations. SimplerEnv results show that Retrieve-then-Steer can deliver meaningful gains in simulation.

Benchmark Overview

Figure 1: Average success-rate comparison on standard LIBERO.

Average Success-Rate Comparison on Standard LIBERO

Figure 1 compares OneModel 1.7 with mainstream public embodied manipulation models on average task success rate in the standard LIBERO benchmark. Standard LIBERO mainly evaluates comprehensive execution capability across spatial understanding, object interaction, goal reasoning, and long-horizon manipulation tasks.

OneModel 1.7 achieves a 99% average task success rate on this benchmark, leading the listed public baselines including π0.5, GR00T-N1.5, and OpenVLA-OFT. This result comes from collaboration among RL-LWAM modules: the World Model's generalizable representation, Understand Expert's task planning, MCF-Proto's action parameterization, and continuous optimization through the RL closed loop.

Figure 2: Robustness comparison across seven LIBERO-plus perturbations.

LIBERO-plus Robustness Comparison Across Seven Perturbations

Figure 2 shows MCF-Proto's robustness across seven LIBERO-plus perturbation tests, compared with the strongest baseline in each category. LIBERO-plus introduces Camera, Robot, Language, Light, Background, Noise, and Layout perturbations on top of standard LIBERO to test model stability under deployment-environment changes.

High benchmark performance does not automatically mean a model can handle the unexpected variability of real deployment. LIBERO-plus is designed as a stress test for exactly this. MCF-Proto achieves higher success rates than the strongest baseline in six of the seven perturbation categories; in the remaining category, Language, it comes close to the best result (80.1% vs. 81.5%). The most important categories are Camera and Robot. Both are geometric perturbations, among the most common and most action-critical changes in home environments. MCF-Proto reaches 69.7% on Camera (strongest baseline: 66.4%) and 66.0% on Robot (strongest baseline: 50.3%), leading by 3.3 and 15.7 percentage points respectively. This suggests that local motion coordinate frames are more resistant to geometric interference than fixed world-frame action regression.
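The percentage-point gaps quoted above can be re-derived from the reported success rates. The snippet uses only the three categories for which the text gives both numbers; the dictionary layout and the helper name `point_gap` are illustrative.

```python
# (MCF-Proto success rate, strongest baseline) in percent, as reported in the text.
REPORTED = {
    "Camera": (69.7, 66.4),
    "Robot": (66.0, 50.3),
    "Language": (80.1, 81.5),
}


def point_gap(ours, baseline):
    """Difference in percentage points, rounded to one decimal place."""
    return round(ours - baseline, 1)


gaps = {name: point_gap(*rates) for name, rates in REPORTED.items()}
# gaps == {"Camera": 3.3, "Robot": 15.7, "Language": -1.4}
```

A negative gap marks the one category, Language, where the strongest baseline remains ahead.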

Figure 3: Average success-rate comparison on SimplerEnv.

SimplerEnv Simulation Validation

Figure 3 shows the average success-rate comparison on SimplerEnv. Retrieve-then-Steer improves CogACT's average success rate from 75.8%±0.3 to 79.5%±0.2, a 3.7 percentage-point gain. Compared with baselines such as RT-1, RT-2-X, and OpenVLA, CogACT + Retrieve-then-Steer remains ahead on this average success-rate metric.

Part 06

Real-Robot Validation Results

The following evaluations validate OneModel 1.7 FrontoStria-RL from three angles: daily operations, high-precision tasks, and extreme dynamic scenarios. The results show that the RL-LWAM architecture delivers consistently high success rates and strong robustness across a broad range of real-world tasks, from folding clothes to rallying table tennis. These results support OneRobotics' core view: scaled deployment requires not just a larger monolithic model, but a model system that unifies generalizable understanding, task planning, precise execution, a data flywheel, and continuous evolution.

Figure 4: Success rates for daily operations and high-precision tasks.

Real-World Tasks: Daily Operations and High-Precision Manipulation

Figure 4 shows OneModel 1.7's success rates across multiple tasks on real robot platforms. The evaluation covers two difficulty levels: daily operation tasks and high-precision manipulation tasks.

Daily operation tasks average around 99% success. They cover laundry handling, clothes folding, dishwasher operation, and picking objects from a conveyor belt. These tasks involve deformable-object manipulation, multi-stage workflows, and environmental diversity, requiring the model to balance generalizable understanding with stable execution.

High-precision tasks average around 97% success. They cover plugging and unplugging test tubes, stacking paper cups, and pouring coffee beans. These tasks demand high end-effector position accuracy, pose control, and stable force control, with very little tolerance for error. MCF-Proto's design of organizing action prototypes around local motion structures shows clear advantages in this class of tasks.

Figure 5: Action-stage success rates in human table-tennis rallying.

Extreme Dynamic Scenario: Table-Tennis Rallying with a Human

Figure 5 shows OneModel 1.7's action-stage success rates in human table-tennis rallying. Table tennis is an especially challenging embodied intelligence task: the ball moves fast, trajectories vary widely, the response window is extremely short, and precise hit position and angle control are required. It is a typical "high dynamics + high precision" scenario.

OneModel 1.7 reaches a 91.2% receive success rate in this scenario, which is the basis for the "over 90%" figure quoted in the summary. This result validates the RL-LWAM architecture under extreme time constraints: the World Model provides fast prediction of incoming-ball trajectories, the Action Expert generates precise actions within a short time window, and the reinforcement learning closed loop continuously optimizes hitting strategies through large-scale rally training.

Task Execution Records

Laundry Handling
Clothes Folding
Dishwasher Operation
Test-Tube Handling
Cup Stacking
Table-Tennis Rally

Part 07

Conclusion: Model Innovations and Their Meaning for Household Robots

The core challenge of home environments is openness: object categories are diverse, spatial layouts vary, lighting and contact states keep changing, and tasks often involve multi-step long-horizon operations. This requires embodied intelligence models to balance high-level generalization, low-level action precision, and continuous adaptation after deployment.

OneModel 1.7 FrontoStria-RL makes four targeted architectural choices. First, the World Model provides high-level task and environment representations as the basis for cross-scenario and cross-object generalization. Second, Predictive Policy Latent replaces explicit future images with implicit physical-reasoning representations: it uses future information during training and only current observations during deployment, so that world understanding efficiently modulates action policies. Third, the Action Expert generates continuous action plans through flow matching and, together with MCF-Proto's Motion-Centric Action Frame and prototype-based action parameterization, maps high-level goals stably into action generation around local motion structures. Fourth, Success Memory, based on Retrieve-then-Steer, reuses successful segments verified by the environment during deployment without updating model parameters, improving closed-loop stability for long-horizon tasks.

Experimentally, OneModel 1.7 reaches a 99.0% average task success rate on the standard LIBERO benchmark as a complete model system, and real-robot validation covers daily operations, high-precision manipulation, and table-tennis rallying with a human, showing consistent performance from standardized benchmarks to real tasks. Module-level validation further demonstrates the contributions of key technologies: MCF-Proto achieves better results in six of seven LIBERO-plus perturbation categories, reflecting robustness to geometric and perceptual perturbations; Retrieve-then-Steer validates the test-time adaptation capability of success memory in SimplerEnv and other simulation evaluations. Together, system-level results and module-level validation push OneModel 1.7 to industry-leading performance across multiple core evaluations.