
How a 10-Person Team Built 35 AI Agents

iSoftBao Engineering | March 3, 2026 | ~14 min read

The iSoftBao engineering team at their Shanghai office

When people hear that iSoftBao has 35 AI agents in production, serving over 200 enterprise clients, they usually assume a large engineering organization. The reality is more surprising: a single team of ten people, working out of a modest office in Shanghai, built the entire system over three years. No massive funding. No army of PhDs. Just a small team making a sequence of deliberate architectural and organizational choices that compounded over time.

This is not a story about AI hype. It is a story about how to build a complex AI system with a small team — the architecture decisions that mattered, the ones that didn't, and the hard lessons learned along the way. If you are building an AI product with a small team, or considering doing so, this deep dive is for you.

1. The Constraint That Shaped Everything

The ten-person team was not a choice; it was a constraint. "When we started, we simply could not compete with big tech on hiring," said the CTO. "They were offering triple our salaries plus equity. So we had to build differently."

This constraint turned into the team's greatest advantage. It forced them to make decisions that reduced complexity at every level. Every new agent, every new feature, every architectural decision had to pass a simple test: Can a single engineer own this end-to-end, from prompt design to deployment to monitoring?

The answer to that question drove the team toward a micro-agent architecture: each agent is an independent, self-contained unit with its own prompt, its own model selection, its own evaluation suite, and its own deployment pipeline. One engineer can fully understand, debug, and extend any agent in the system without having to coordinate with five other people.
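As a rough illustration of what "self-contained" means here, the sketch below models one agent as a single record bundling the four things listed above plus exactly one owner, and checks it against the end-to-end ownership test. The `AgentSpec` fields, the tier names, and `ownership_problems` are hypothetical names chosen for this example, not iSoftBao's actual schema.

```python
from dataclasses import dataclass

VALID_TIERS = {"light", "mid", "strong"}  # hypothetical tier names


@dataclass(frozen=True)
class AgentSpec:
    """Everything one engineer needs to own a single agent end-to-end (hypothetical schema)."""
    name: str
    owner: str                         # exactly one accountable engineer
    prompt_path: str                   # prompt file, versioned in Git with the agent
    model_tier: str                    # one of VALID_TIERS
    eval_cases: tuple[str, ...] = ()   # IDs of the agent's regression test cases
    deploy_pipeline: str = ""          # name of the agent's own CI/CD pipeline


def ownership_problems(spec: AgentSpec) -> list[str]:
    """List the reasons a spec fails the 'one engineer owns it end-to-end' test; empty means it passes."""
    problems = []
    if not spec.owner or "," in spec.owner:
        problems.append("agent must have exactly one owner")
    if spec.model_tier not in VALID_TIERS:
        problems.append(f"unknown model tier: {spec.model_tier!r}")
    if not spec.eval_cases:
        problems.append("agent has no evaluation suite")
    if not spec.deploy_pipeline:
        problems.append("agent has no deployment pipeline of its own")
    return problems


spec = AgentSpec(
    name="order-status",
    owner="alice",
    prompt_path="prompts/order_status.md",
    model_tier="mid",
    eval_cases=("os-001", "os-002"),
    deploy_pipeline="deploy-order-status",
)
print(ownership_problems(spec))  # []
```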

The team's architecture whiteboard session

"The biggest productivity killer in small teams is not lack of skill — it is coordination overhead. Every time someone needs to ask someone else 'how does this work?' or 'can you change this for me?', you lose hours. Micro-agent architecture eliminates that. One engineer, one agent, full ownership." — CTO

2. The Team Composition Nobody Expected

The composition of the ten-person team is counterintuitive. There is not a single dedicated machine learning researcher on the team. Instead, the team breaks down as follows: three full-stack engineers who handle agent logic, backend services, and deployment; two frontend engineers who build the customer-facing portal and agent monitoring dashboards; one prompt engineer who focuses exclusively on prompt design, testing, and versioning; two solution engineers who work directly with clients to understand their workflows and translate them into agent requirements; one QA engineer who built the automated evaluation framework; and the CTO, who serves as the system architect and floating problem solver.

"People ask us why we don't have ML researchers. The answer is simple: the hard part of making AI agents work in production is not model training — it is integrating AI with real business systems," explained the CTO. "Our agents need to query databases, call APIs, validate outputs against business rules, handle edge cases gracefully. That requires strong software engineering, not deep learning research."

The most distinctive role belongs to the solution engineers, two people whose full-time job is to sit with clients, understand their workflows in detail, and translate real business pain into clear agent specifications. "Without them, our engineers would build technically impressive but commercially useless agents. They are the bridge between code and customer value."

3. The Architecture: Micro-Agents Over Monoliths

The EIOS platform's architecture can be summarized in one sentence: each agent is an independently deployable microservice that communicates with other agents only through a standardized message bus.

This architecture emerged from a painful early lesson. "Our first attempt was a monolithic agent system — one giant prompt that tried to handle customer service, order processing, and inventory queries all at once," recalled the CTO. "It was a disaster. The prompt was 8,000 tokens long, the context window kept overflowing, and when one part broke, the entire system went down."

The refactored architecture has four layers. The routing agent at the top classifies incoming requests and dispatches them to the appropriate specialist agent. The specialist agents — 35 of them across four domains (customer interaction, process automation, data analysis, knowledge management) — each handle one specific task. The message bus provides standardized agent-to-agent communication. And the orchestration agent coordinates multi-step workflows that span multiple specialist agents.
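A minimal in-process sketch of the four layers follows. It assumes a `MessageBus` class as a stand-in for a real broker, a keyword rule in place of the model-based routing classifier, and two stub specialist agents (`order_status_agent`, `inventory_agent`) that return canned data; all of these names are hypothetical. The point is the shape: agents only ever exchange `Message` objects through the bus, and the orchestration agent builds a multi-step workflow out of ordinary bus calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    topic: str      # which specialist should handle it, e.g. "order_status"
    payload: dict


class MessageBus:
    """In-process stand-in for a message broker: agents see Messages, never each other."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[Message], dict]] = {}

    def subscribe(self, topic: str, handler: Callable[[Message], dict]) -> None:
        self._handlers[topic] = handler

    def publish(self, message: Message) -> dict:
        handler = self._handlers.get(message.topic)
        if handler is None:
            return {"error": f"no agent subscribed to {message.topic!r}"}
        return handler(message)


# Specialist agents: each handles exactly one task (real ones would call a model or a backend).
def order_status_agent(msg: Message) -> dict:
    return {"order_id": msg.payload["order_id"], "status": "shipped"}


def inventory_agent(msg: Message) -> dict:
    return {"sku": msg.payload["sku"], "in_stock": 12}


def routing_agent(bus: MessageBus, request: dict) -> dict:
    """Classify the request and dispatch it; a keyword rule stands in for a model classifier."""
    topic = "order_status" if "order" in request["text"].lower() else "inventory"
    return bus.publish(Message(topic, request))


def orchestration_agent(bus: MessageBus, order_id: str, sku: str) -> dict:
    """Coordinate a multi-step workflow by sending one message per step over the bus."""
    status = bus.publish(Message("order_status", {"order_id": order_id}))
    stock = bus.publish(Message("inventory", {"sku": sku}))
    return {**status, **stock}


bus = MessageBus()
bus.subscribe("order_status", order_status_agent)
bus.subscribe("inventory", inventory_agent)
print(routing_agent(bus, {"text": "Where is my order?", "order_id": "A17"}))
# {'order_id': 'A17', 'status': 'shipped'}
```

In production the bus would be a network broker and each specialist a separately deployed service, but the contract between agents stays the same.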

This architecture provides three critical properties for small-team operations: isolation, meaning a bug in the inventory-prediction agent never affects the customer-service agent; independent iteration, meaning one engineer can release a new version of their agent without coordinating with the rest of the team; and incremental deployment, meaning new agents can be added one at a time without touching existing code.
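In the real system, isolation comes from running each agent as a separate service. The sketch below shows the same contract inside a single process, assuming a hypothetical `isolated` wrapper and two simulated agents: an exception in the inventory-prediction agent becomes an error reply for that one request, while the customer-service agent keeps answering normally.

```python
import logging
from typing import Callable

logger = logging.getLogger("agents")
Handler = Callable[[dict], dict]


def isolated(agent_name: str, handler: Handler) -> Handler:
    """Wrap one agent so its exceptions are logged and turned into an error reply."""
    def wrapper(payload: dict) -> dict:
        try:
            return handler(payload)
        except Exception:
            logger.exception("agent %s failed", agent_name)
            return {"agent": agent_name, "error": "agent_unavailable"}
    return wrapper


def buggy_inventory_prediction(payload: dict) -> dict:
    raise ValueError("bad forecast input")  # simulated bug


def customer_service(payload: dict) -> dict:
    return {"agent": "customer_service", "reply": f"Hello, {payload['name']}!"}


AGENTS: dict[str, Handler] = {
    "inventory_prediction": isolated("inventory_prediction", buggy_inventory_prediction),
    "customer_service": isolated("customer_service", customer_service),
}

print(AGENTS["inventory_prediction"]({}))
# {'agent': 'inventory_prediction', 'error': 'agent_unavailable'}
print(AGENTS["customer_service"]({"name": "Li"}))
# {'agent': 'customer_service', 'reply': 'Hello, Li!'}
```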

4. The Prompt Engineering Discipline

Perhaps the most underappreciated aspect of the team's work is their approach to prompt engineering. "Most people think prompt engineering is writing text. It is not. It is a software engineering discipline with version control, regression testing, and performance evaluation," said the prompt engineer.

The team maintains between 50 and 200 test cases for each core agent. Every time a prompt is modified, even by a single word, the full regression suite runs automatically. The evaluation framework measures three dimensions: accuracy (did the agent produce the correct output?), completeness (did it address all aspects of the request?), and consistency (does it produce output of the same quality across similar inputs?).
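The three dimensions can be turned into numbers in many ways. The sketch below uses one simple, assumed scoring scheme, not the team's actual framework: accuracy checks whether the expected answer appears in the output, completeness is the fraction of required terms the output covers, and consistency is the share of paraphrase groups for which the agent returns identical output. `TestCase`, `evaluate`, and `toy_agent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    input: str
    expected: str                     # the correct answer
    required_terms: tuple[str, ...]   # aspects the reply must mention
    group: str                        # cases in one group are paraphrases of one request


def evaluate(agent: Callable[[str], str], cases: list[TestCase]) -> dict[str, float]:
    outputs = [agent(c.input) for c in cases]
    pairs = list(zip(cases, outputs))

    # Accuracy: the expected answer appears in the output.
    accuracy = sum(c.expected.lower() in out.lower() for c, out in pairs) / len(cases)

    # Completeness: average fraction of required terms covered by each output.
    completeness = sum(
        (sum(t.lower() in out.lower() for t in c.required_terms) / len(c.required_terms))
        if c.required_terms else 1.0
        for c, out in pairs
    ) / len(cases)

    # Consistency: share of paraphrase groups where every output is identical.
    groups: dict[str, set[str]] = {}
    for c, out in pairs:
        groups.setdefault(c.group, set()).add(out)
    consistency = sum(len(outs) == 1 for outs in groups.values()) / len(groups)

    return {"accuracy": accuracy, "completeness": completeness, "consistency": consistency}


def toy_agent(question: str) -> str:
    if "refund" in question.lower():
        return "Refunds take 5 business days to your original payment method."
    return "Please contact support."


cases = [
    TestCase("How long do refunds take?", "5 business days", ("payment method",), "refund-time"),
    TestCase("When will I get my refund?", "5 business days", ("payment method",), "refund-time"),
    TestCase("Can I change my address?", "account settings", ("address",), "address"),
]
print(evaluate(toy_agent, cases))
```

Substring matching is deliberately crude; a real suite would more likely compare structured outputs or use a grader, but the harness shape (run every case, aggregate per dimension) stays the same.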

The internal prompt testing and evaluation dashboard

"We learned this the hard way. Early on, we would tweak a prompt to fix one customer's edge case, only to break it for ten other customers. Without automated regression testing, you are flying blind." The team now treats prompts with the same engineering rigor as code — version-controlled in Git, peer-reviewed, and deployed through a CI/CD pipeline with automated gates.
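One way an automated gate can work is to compare the candidate prompt's scores against the last accepted version and block the release on any meaningful drop. The baseline numbers, the 0.01 tolerance, and the `gate` and `exit_code` helpers below are illustrative assumptions.

```python
# Scores of the last accepted prompt version (hypothetical; in CI they would come from a stored artifact).
BASELINE = {"accuracy": 0.92, "completeness": 0.88, "consistency": 0.95}
TOLERANCE = 0.01  # assumed allowed drop per metric before the gate blocks a release


def gate(candidate: dict[str, float], baseline: dict[str, float],
         tolerance: float = TOLERANCE) -> list[str]:
    """Return the metrics that regressed beyond the tolerance; empty means the prompt may ship."""
    return [
        f"{metric}: {baseline[metric]:.3f} -> {candidate.get(metric, 0.0):.3f}"
        for metric in baseline
        if candidate.get(metric, 0.0) < baseline[metric] - tolerance
    ]


def exit_code(candidate: dict[str, float], baseline: dict[str, float]) -> int:
    """Print each regression and return a process exit code for the CI job (0 = pass)."""
    failures = gate(candidate, baseline)
    for failure in failures:
        print("REGRESSION", failure)
    return 1 if failures else 0


exit_code({"accuracy": 0.93, "completeness": 0.84, "consistency": 0.95}, BASELINE)
# prints "REGRESSION completeness: 0.880 -> 0.840" and returns 1
```

A CI step would run the regression suite and then call `sys.exit(exit_code(candidate, BASELINE))`, so a nonzero status fails the pipeline.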

5. Model Selection: Matching Capability to Need

One of the most impactful decisions was a tiered model selection strategy. Not all tasks need the most powerful model. The team categorizes every agent interaction into one of three tiers: simple tasks (keyword extraction, format validation, simple classification) use lightweight models; medium tasks (intent classification, document summarization, basic reasoning) use mid-tier models; complex tasks (multi-step planning, long-document understanding, ambiguous queries) use the strongest models.
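The tier assignment can be as simple as a lookup table. This sketch assumes placeholder model names (`light-model`, `mid-model`, `strong-model`) and snake_case task labels derived from the examples above. Sending unknown task types to the strongest tier is a design choice of the sketch, made so that a new kind of request is never silently downgraded.

```python
from enum import Enum


class Tier(Enum):
    LIGHT = "light"
    MID = "mid"
    STRONG = "strong"


# Task categories taken from the three tiers described above.
TASK_TIERS = {
    "keyword_extraction": Tier.LIGHT,
    "format_validation": Tier.LIGHT,
    "simple_classification": Tier.LIGHT,
    "intent_classification": Tier.MID,
    "document_summarization": Tier.MID,
    "basic_reasoning": Tier.MID,
    "multi_step_planning": Tier.STRONG,
    "long_document_understanding": Tier.STRONG,
    "ambiguous_query": Tier.STRONG,
}

# Placeholder model identifiers; substitute the models each tier actually uses.
TIER_MODELS = {
    Tier.LIGHT: "light-model",
    Tier.MID: "mid-model",
    Tier.STRONG: "strong-model",
}


def select_model(task_type: str) -> str:
    """Look up the task's tier; unknown task types default to the strongest tier."""
    tier = TASK_TIERS.get(task_type, Tier.STRONG)
    return TIER_MODELS[tier]


print(select_model("format_validation"))  # light-model
print(select_model("brand_new_task"))     # strong-model
```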

"When we first started, we routed everything to the most capable model available. Then we got the bill." The tiered strategy reduced AI API costs by approximately 70% while maintaining output quality — because 80% of customer interactions in production fall into the simple and medium tiers.
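A back-of-the-envelope check shows how an 80% share of simple and medium traffic can translate into savings of that size. The relative prices and the 40/40/20 split between tiers below are hypothetical; the article reports only the combined 80% figure and the roughly 70% outcome.

```python
# Hypothetical relative price per call (strongest model = 1.0); not iSoftBao's actual pricing.
RELATIVE_PRICE = {"light": 0.05, "mid": 0.2, "strong": 1.0}

# Hypothetical traffic mix: simple + medium = 80% (as reported), split 40/40 by assumption.
TRAFFIC_MIX = {"light": 0.4, "mid": 0.4, "strong": 0.2}


def blended_cost(mix: dict[str, float], price: dict[str, float]) -> float:
    """Average cost per call when each tier's share of traffic is billed at that tier's price."""
    return sum(share * price[tier] for tier, share in mix.items())


all_strong = blended_cost({"strong": 1.0}, RELATIVE_PRICE)  # every call goes to the strongest model
tiered = blended_cost(TRAFFIC_MIX, RELATIVE_PRICE)
print(f"cost reduction: {1 - tiered / all_strong:.0%}")  # prints "cost reduction: 70%"
```

With these assumed numbers the blended cost per call falls to 0.3 of the all-strong baseline. Other price ratios or traffic mixes give different savings, so this only shows that the reported figure is arithmetically plausible.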

Beyond model tiering, the team implemented semantic caching: when a new request is semantically similar to a previously processed one (cosine similarity between their embeddings above a tuned threshold), the cached result is returned without calling the model. "In customer service scenarios, about 30% of queries are variations of the same few questions. Semantic caching alone saves us thousands of API calls per day."
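A minimal semantic cache is sketched below. It assumes a toy bag-of-words `embed` function as a stand-in for a real sentence-embedding model and a placeholder threshold of 0.8; `SemanticCache` and its methods are hypothetical names. On each request it finds the most similar cached query and returns the stored answer if the cosine similarity clears the threshold; otherwise it calls the model and caches the new result.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, call_model: Callable[[str], str], threshold: float = 0.8) -> None:
        self.call_model = call_model
        self.threshold = threshold  # placeholder; tuned per use case in practice
        self.entries: list[tuple[Counter, str]] = []
        self.model_calls = 0

    def answer(self, query: str) -> str:
        vec = embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]  # cache hit: no model call
        self.model_calls += 1
        result = self.call_model(query)
        self.entries.append((vec, result))
        return result


cache = SemanticCache(lambda q: f"answer to: {q}")
first = cache.answer("How do I reset my password?")
second = cache.answer("How can I reset my password?")  # similarity 5/6, about 0.83: cache hit
third = cache.answer("What are your opening hours?")   # no shared words: cache miss
print(cache.model_calls)  # 2
```

The linear scan is fine for a sketch; a production cache would typically keep embeddings in a vector index and expire stale entries.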

6. What We Would Do Differently

When asked what they would change if they could start over, the team was unanimous on three points. First, they would invest in automated evaluation infrastructure from day one, not six months in. "The technical debt of untested prompts is invisible until it breaks something in production. Then it is very visible."

Second, they would hire solution engineers earlier. "For the first year, our engineers were talking directly to customers. That was a mistake. Engineers optimize for technical elegance; solution engineers optimize for customer value. You need both perspectives from the beginning."

Third, they would resist the temptation to build "clever" architectures. "Every time we tried to be clever — custom DSLs, auto-orchestration, dynamic prompt composition — we ended up reverting to simpler designs within a few months. The boring architecture is almost always the right architecture for a small team."

The lesson is clear: with the right architecture, the right team composition, and ruthless prioritization, a small team can build AI systems that compete with the output of organizations ten times their size. The key is not to hire more people — it is to eliminate the overhead that makes more people necessary.