Observation 1: Python Programming vs. Natural Language Programming
As the capabilities of large language models (LLMs) continue to grow, we anticipate that natural language will increasingly dominate agent design. In this new paradigm, designers only need to provide high-level instructions and domain-specific service knowledge, leaving the language model to handle the intricacies of user interaction. We have the following observations:

By their nature, the tasks performed by (human) agents are often expressed in natural language, while Python programs serve better as a bridge between agents and the computer world. Whenever there is a temptation to use Python to enforce control, the same control logic can often be achieved solely through natural language. Here is an example of a natural-language command line implemented with an LLM in MICA versus with Python+LLM. The bot translates a user's natural language input into Bash commands, executes them, and returns the results with a natural language explanation. MICA achieves this functionality with a single LLM agent:
tools:
  - executor.py
middleman:
  type: llm agent
  description: Ask GPT and get command/status/next action
  prompt: |
    You are the middleman AI, which sits between the user and the bash command line of a recent Ubuntu system.
    Your input will be interleaved user input, the commands you generate, and system feedback from executing any of these commands.
    Your behavior:
    1. You should translate the user's request into a command.
    2a. If we need explicit user confirmation before running the command, the user will be asked: "Run this command? (yes/no)";
        If the user declines, we will pass that feedback into your next input.
    2b. If we don't need explicit user confirmation, call the "run_command" function.
    3. Provide system feedback (stdout, stderr, or user-declined commands).
    4. Terminate.
    Remember: The commands you generate should run without any user inputs.
  args:
    - command
  uses:
    - run_command
meta:
  type: ensemble agent
  description: You can select an agent to respond to the user's question.
  contains:
    - middleman
main:
  steps:
    - call: meta
The tool code responsible for command execution, executor.py, is the same in both the MICA and the Python+LLM implementations of Middleman:
import subprocess

def run_command(command):
    """
    Executes a shell command, returning (stdout, stderr, returncode).
    """
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout, result.stderr, result.returncode
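As a quick, illustrative sanity check (not part of the original listing), the tool can be invoked directly; "echo hello" is an arbitrary example command:

# Illustrative usage of run_command with an arbitrary example command.
stdout, stderr, returncode = run_command("echo hello")
print(stdout)      # "hello\n"
print(returncode)  # 0 on success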
For comparison, here is the Python+LLM implementation of Middleman:
import openai
from typing import Literal, Optional
from pydantic import BaseModel
import subprocess
import json
SYSTEM_PROMPT = """
You are the middleman AI, which sits between the user and the bash command line of a recent Ubuntu system.
Your input will be interleaved user input, the commands you generate, and system feedback from executing any of these commands.
Your behavior:
1. You always generate a response as a JSON object with the following schema:
   ...
2. The field 'type' can be:
   - "plain": A message that only displays 'content' to the user.
   - "command": A message containing a 'command' field to be executed on the system.
   - "terminate": A message indicating the conversation should end after displaying 'content' to the user.
3. If you provide a "command" of type "command":
   - The "content" field must always include an explanation or reason.
   - The "confirm" field specifies whether we need explicit user confirmation before running the command.
   - If "confirm" is true, the user will be asked: "Run this command? (yes/no)".
   - If the user declines, we will pass that feedback into your next input.
   - If "confirm" is false, we will run the command without asking for confirmation.
4. If "type" is "terminate":
   - We will display the "content" message to the user and then end the session.
5. You must always respond in valid JSON.
Your role is to:
- Interpret user requests and any system feedback (stdout, stderr, or user-declined commands).
- Provide either an explanation, a new command, or end the conversation, based on the user's needs.
Remember:
- Always respond in valid JSON.
- Do not include any additional text or markdown outside the JSON structure.
- The commands you generate should run without any user inputs.
"""
class Response(BaseModel):
    type: Literal["plain", "command", "terminate"]
    content: str
    command: Optional[str] = None  # only present when type == "command"
    confirm: bool = False
client = openai.OpenAI()

# ANSI color codes
RED = "\033[31m"
GREEN = "\033[32m"
YELLOW = "\033[33m"
BLUE = "\033[34m"
RESET = "\033[0m"

def prompt_user():
    return input(">> ")
def format_message(user_input=None, command=None, stdout=None, stderr=None, declined=False):
    """
    Constructs a message with sections only if they have content:
    --- User Input:
    --- COMMAND:
    --- STDOUT:
    --- STDERR:
    --- DECLINED:
    """
    msg = ""
    if user_input:
        msg += f"--- User Input:\n{user_input}\n"
    if command:
        msg += f"--- COMMAND:\n{command}\n"
    if declined:
        msg += "--- DECLINED\n"
    if stdout:
        msg += f"--- STDOUT:\n{stdout}\n"
    if stderr:
        msg += f"--- STDERR:\n{stderr}\n"
    return msg
def ask_chatgpt(context):
    """
    Sends 'context' to the OpenAI chat completions API and parses the
    result into a Response object with the schema:
    {
        "type": "plain" | "command" | "terminate",
        "content": "...",
        "command": "...",  # if type == "command"
        "confirm": bool    # if type == "command"
    }
    """
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=context,
        response_format=Response
    )
    parsed = response.choices[0].message.parsed
    content = response.choices[0].message.content
    return parsed, content
def run_command(command):
    """
    Executes a shell command, returning (stdout, stderr, returncode).
    """
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout, result.stderr, result.returncode
def main():
    context = []
    carry_over_message = None  # Stores message for the next loop iteration
    # System instruction:
    context.append({"role": "system", "content": SYSTEM_PROMPT})
    while True:
        # Either use carry_over_message as user input or prompt the user
        if carry_over_message:
            user_input = carry_over_message
            carry_over_message = None
        else:
            user_input = prompt_user()
        # Add the user's message to the context
        context.append({"role": "user", "content": user_input})
        # Ask the AI for a response
        ai_response, content = ask_chatgpt(context)
        print(ai_response.content)
        # Add the AI's response back to the context
        context.append({"role": "assistant", "content": content})
        msg_type = ai_response.type
        if msg_type == "terminate":
            print("Session terminated by AI.")
            break
        elif msg_type == "command":
            command = ai_response.command
            confirm = ai_response.confirm
            # Ask user for confirmation if needed
            if confirm:
                user_confirmation = input(f"Run this command? {command} (yes/no): ")
                if user_confirmation.lower() != "yes":
                    declined_message = format_message(command=command, declined=True)
                    carry_over_message = declined_message
                    continue
            # Execute the command
            print(f"{YELLOW}{command}{RESET}")
            stdout, stderr, returncode = run_command(command)
            # Color-code the output
            colored_stdout = f"{GREEN}{stdout}{RESET}" if stdout else ""
            colored_stderr = f"{RED}{stderr}{RESET}" if stderr else ""
            # Format the output message
            output_message = format_message(
                command=command,
                stdout=stdout if stdout else None,
                stderr=stderr if stderr else None
            )
            # Display the color-coded output to the user
            if stdout:
                print(f"{YELLOW}--- STDOUT:{RESET}")
                print(colored_stdout, end="")
            if stderr:
                print(f"{YELLOW}--- STDERR:{RESET}")
                print(colored_stderr, end="")
            # Feed the raw (uncolored) output back to the AI as the next
            # user turn via the carry-over message.
            carry_over_message = output_message
        # If type == "plain", just loop again

if __name__ == "__main__":
    main()
"""
Observation 2: Rigid Control Flow vs. Full Flexibility
Mainstream agent frameworks such as AutoGen, CrewAI, LangChain, Amazon MAO, and Swarm remain predominantly Python-centric. In contrast, MICA moves away from Python programming as much as possible, embracing the belief that LLMs will continue to improve, becoming more powerful, accurate, and user-friendly.

Service bots are traditionally developed with rigid flow control. As soon as users are given a little more freedom, such bots fall apart, because user input is hard to predict. Achieving true flexibility requires leveraging LLMs. MICA shifts away from traditional flow control, embracing the power of LLMs to handle complex, open-ended interactions. While rigid flow control may provide short-term benefits, such as reducing hallucinations and offering a sense of controllability, it becomes a liability in the long term if the goal is to give users greater freedom to interact with the system.
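To see why, consider a minimal sketch of a rigid slot-filling loop (hypothetical code, not taken from any of the frameworks discussed here). Each prompt accepts exactly one slot value in a fixed order, so an utterance like "send 50 dollars to Alice", which fills two slots at once, or an off-script question such as "what's my balance?", cannot be handled:

# Hypothetical rigid slot-filling flow: one question, one slot, fixed order.
def rigid_transfer_flow():
    # "send 50 dollars to Alice" typed here would be stored verbatim as
    # the recipient; the amount it contains is silently ignored.
    recipient = input("Who do you want to transfer money to? ")
    amount = input("How much money do you want to transfer? ")
    confirm = input(f"Transfer {amount} to {recipient}? (yes/no) ")
    if confirm.lower() != "yes":
        print("Transfer cancelled.")
        return
    print(f"Transferring {amount} to {recipient}...")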
Suppose we want to implement a money-transfer chatbot. Using a mixture of traditional flow control and LLM support, Rasa requires explicitly defining slots, bot responses, and decision logic; it takes at least 180 lines of YAML (excluding some function code) to complete the task. Developing and testing this code is challenging, not to mention the subsequent maintenance and upgrade effort. In contrast, MICA needs only ~40 lines.
Here is the Rasa code:
flows:
  transfer_money:
    description: send money to friends and family
    name: transfer money
    always_include_in_prompt: True
    steps:
      - collect: transfer_money_recipient
        description: the name of a person
      - id: ask_amount # we keep this id, because we use it for a jump
        collect: transfer_money_amount_of_money
        description: the amount of money without any currency designation
      - action: check_transfer_funds
        next:
          - if: not slots.transfer_money_has_sufficient_funds
            then:
              - action: utter_transfer_money_insufficient_funds
              - set_slots:
                  - transfer_money_amount_of_money: null
                  - transfer_money_has_sufficient_funds: null
                next: ask_amount
          - else: transfer_money_final_confirmation
      - id: transfer_money_final_confirmation
        collect: transfer_money_final_confirmation
        description: accepts True or False
        ask_before_filling: true
        next:
          - if: not slots.transfer_money_final_confirmation
            then:
              - action: utter_transfer_cancelled
                next: END
          - else: execute_transfer
      - id: execute_transfer
        action: execute_transfer
        next:
          - if: slots.transfer_money_transfer_successful
            then:
              - action: utter_transfer_complete
                next: END
          - else:
              - action: utter_transfer_failed
                next: END

actions:
  - check_transfer_funds
  - execute_transfer

slots:
  transfer_money_transfer_successful:
    type: bool
    mappings:
      - type: custom
        action: execute_transfer
  transfer_money_has_sufficient_funds:
    type: bool
    mappings:
      - type: custom
        action: check_transfer_funds
  transfer_money_recipient:
    type: text
    mappings:
      - type: from_llm
  transfer_money_amount_of_money:
    type: text
    mappings:
      - type: from_llm
  transfer_money_final_confirmation:
    type: text
    mappings:
      - type: from_llm

responses:
  utter_transfer_money_insufficient_funds:
    - text: You don't have so much money on your account!
  utter_transfer_failed:
    - text: Something went wrong transferring the money.
  utter_out_of_scope:
    - text: Sorry, I'm not sure how to respond to that. Type "help" for assistance.
  utter_ask_transfer_money_amount_of_money:
    - text: How much money do you want to transfer?
  utter_ask_transfer_money_recipient:
    - text: Who do you want to transfer money to?
  utter_transfer_complete:
    - text: Successfully transferred {transfer_money_amount_of_money} to {transfer_money_recipient}.
  utter_transfer_cancelled:
    - text: Transfer cancelled.
  utter_ask_transfer_money_final_confirmation:
    - buttons:
        - payload: "yes"
          title: "Yes"
        - payload: "no"
          title: "No, cancel the transaction"
      text: Would you like to transfer {transfer_money_amount_of_money} to {transfer_money_recipient}?
If you use MICA, it looks like the following:
transfer_money:
  type: llm agent
  description: This agent lets users transfer money to a recipient.
  prompt: |
    You are a smart agent for handling money transfer requests. When a user asks to transfer money,
    sequentially collect the recipient's information and the transfer amount.
    Then call the function "check_transfer_funds" to check whether the account balance is sufficient to cover the transfer. If the balance is insufficient, return to the step of requesting the transfer amount.
    Finally, before proceeding with the transfer, confirm with the user that the transfer should be made.
  args:
    - recipient
    - amount_of_money
  uses:
    - check_transfer_funds
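The prompt above references the check_transfer_funds tool. A minimal sketch of what such a tool could look like follows; the get_balance lookup is a placeholder of our own, and MICA's actual tool and return conventions may differ:

# Hypothetical sketch of the check_transfer_funds tool referenced above.
def get_balance(account_id: str = "default") -> float:
    # Placeholder: a real implementation would query the account backend.
    return 100.0

def check_transfer_funds(amount_of_money: str) -> dict:
    """Check whether the account balance covers the requested transfer."""
    try:
        amount = float(amount_of_money)
    except ValueError:
        return {"has_sufficient_funds": False, "error": "unparsable amount"}
    return {"has_sufficient_funds": get_balance() >= amount}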
Observation 3: Multiple Agents vs. One Gigantic LLM Agent
While it is possible to put all the constraints, business logic, and knowledge into one gigantic LLM agent, in practice this causes many problems with testing, debugging, reusability, and so on. Modern engineering principles emphasize designing and testing individual components before integrating them. The same principle applies to agent development.

In summary, MICA builds on these observations and advocates an agent-centric framework as the future of customer service. While it retains flow control and tool use to facilitate interaction with traditional programming interfaces, MICA prioritizes natural-language agents as its core element. This agent-centric approach also paves the way for advances in automated testing and evaluation, addressing an increasingly critical need in service bot development. We will explore these benefits once MICA's auto-testing capabilities come online.