Many administrators come to Raisetalk with the habits of manual Quality Monitoring, where "broad criteria" mixing several topics were common. This is the main pitfall to avoid.
On the human evaluator side, broad criteria make evaluation unstable:
- each evaluator interprets them in their own way, which forces you to organise regular calibration sessions
- biases pile up and comparing teams becomes difficult
- the time spent arbitrating quickly exceeds the time spent coaching
On the AI side, the problem is the same: a criterion that bundles several uncorrelated conditions forces the AI to arbitrate over a fuzzy set, which degrades the reliability of the results and makes reports hard to act on.
The rule is therefore simple, with humans as with AI: one criterion = one topic, phrased in a specific and observable way.
Anatomy of a criterion

In Raisetalk, a criterion is made of the following elements, configured by the administrator:
- Category: to organise criteria and find your way in the repository
- Name: a short label, readable in reports
- Assertion: the statement (the "truth") the AI will try to verify during analysis
- Orientation: if the assertion is verified, is the criterion achieved or not achieved?
- Points: the weight of the criterion in the overall score
- Evaluation mode: automatic (AI) or manual (human)
- Evaluation conditions: optional, to apply the criterion only in certain cases
- Language(s): so the criterion appears in the repository in several languages
The trickiest part, and the one that really drives a criterion's effectiveness, is writing the assertion. The following sections detail each element, starting with that one.
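Taken together, these elements map naturally onto a structured record. As a purely illustrative sketch (the field names below are assumptions, not Raisetalk's actual schema), a criterion could look like this:

```typescript
// Illustrative sketch only: field names are assumptions, not Raisetalk's schema.
type Orientation = "achieved" | "not_achieved";

interface Criterion {
  category: string;                       // groups criteria in the repository
  name: string;                           // short label shown in reports
  assertion: string;                      // the statement the AI tries to verify
  orientation: Orientation;               // state triggered when the assertion is verified
  points: number;                         // weight in the overall score
  evaluationMode: "automatic" | "manual"; // AI or human supervisor
  evaluationCondition?: string;           // optional: the case in which the criterion applies
  translations?: Record<string, string>;  // optional: assertion text per language code
}
```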
A repository provided by default
When a Raisetalk instance is created, it ships with a set of standard criteria that provide a solid baseline you can then evolve as you see fit. There is no need to start from a blank page: you get examples that are already written and structured, which you can keep as is, adapt to your business, disable or extend.
Two business repositories are covered:
- Support and customer relations: a repository inspired by the grids of Voted Customer Service of the Year, covering the key points of handling quality: greeting, active listening, reformulation, clarity of the answer, conversation closing, and more.
- Sales: a repository based on the MEDDIC framework (Metrics, Economic Buyer, Decision Criteria, Decision Process, Identify Pain, Champion), which structures sales conversations around the classic evaluation criteria of B2B selling.
Beyond the time saved, these criteria also serve as concrete examples of how a clear assertion is written: reviewing them is a good way to get familiar with the principles detailed in the following sections.
Writing a clear assertion
The assertion is the text the AI will try to verify against the transcript of the conversation. For it to be usable, it must follow a simple principle:
WHO does WHAT: an actor (the customer, the agent, the salesperson, the prospect, ...), an action verb, and a precise object.
In other words, the assertion must describe an observable behaviour on the transcript, not a value judgement or a broad concept.
Examples: avoid / prefer
| ❌ Avoid | ✅ Prefer | Why |
|---|---|---|
| Churn | The customer expresses their intention to terminate their contract | A keyword is not an assertion. No subject, no verb, no object: neither the AI nor a human supervisor would know precisely what to evaluate. |
| The agent was professional | The agent uses a greeting formula at the start of the conversation | "Professional" is subjective and covers many notions (politeness, rigour, empathy, and so on). Replace with a precise, verifiable behaviour. |
| The agent handled the request well | The agent rephrases the customer's problem before proposing a solution | "Handled well" is too vague to be evaluated. Describe a concrete action that can be found in the transcript. |
| The agent shows empathy and proposes a suitable solution and complies with the commercial policy | Split into 3 distinct criteria: empathy, suitable solution, compliance with the commercial policy | One criterion = one topic. Mixing several conditions in the same assertion makes evaluation unpredictable and reports unreadable. |
The last case is the most common among teams coming from manual Quality Monitoring. As soon as an assertion contains "and", "as well as", "while", etc., it is likely that it deserves to be split.
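To make that last row concrete, here is a hypothetical version of the split, reusing the illustrative Criterion shape sketched earlier (the assertions and point values are examples, not wording from the default repository):

```typescript
// Hypothetical split of "empathy AND suitable solution AND commercial policy"
// into three single-topic criteria. Reuses the illustrative Criterion type.
const splitCriteria: Criterion[] = [
  {
    category: "Relationship",
    name: "Empathy",
    assertion: "The agent acknowledges the customer's frustration",
    orientation: "achieved",
    points: 10,
    evaluationMode: "automatic",
  },
  {
    category: "Resolution",
    name: "Suitable solution",
    assertion: "The agent proposes a solution that addresses the stated problem",
    orientation: "achieved",
    points: 20,
    evaluationMode: "automatic",
  },
  {
    category: "Compliance",
    name: "Commercial policy",
    assertion: "The agent grants a commercial gesture outside the authorised policy",
    orientation: "not_achieved", // negative behaviour: verified means not achieved
    points: 30,
    evaluationMode: "automatic",
  },
];
```

Note that the compliance assertion is phrased as a negative behaviour, which is where orientation, covered in the next section, comes in.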
Orienting the criterion: achieved or not achieved
Once the assertion is written, you must indicate what happens when it is verified: is the criterion achieved or not achieved?
The rule is simple:
- Assertion describing a positive behaviour → achieved when it is verified. Example: The agent rephrases the customer's problem → achieved
- Assertion describing a negative behaviour → not achieved when it is verified. Example: The customer expresses their intention to terminate their contract → not achieved
In other words, it is the fact that the assertion is verified that triggers the state chosen by the administrator. This lets you phrase both positive signals and alert signals naturally.
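Expressed as a minimal sketch: the verified case follows the rule above, while the behaviour when the assertion is not verified is an assumption on our part (the opposite state), not something stated here.

```typescript
type Orientation = "achieved" | "not_achieved";

// Sketch of the orientation rule. The non-verified branch is an ASSUMPTION.
function resolveState(assertionVerified: boolean, orientation: Orientation): Orientation {
  if (assertionVerified) return orientation; // verified: the state chosen by the administrator
  return orientation === "achieved" ? "not_achieved" : "achieved"; // assumed opposite
}
```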
Weighting the criterion
Each criterion receives points that determine its weight in the overall score of the conversation:
- From 0 to 100 points when the criterion is achieved
- From 0 to -100 points when it is not achieved
A criterion worth 100 will therefore weigh ten times more than a criterion worth 10 in the final score.
Best practice: keep a consistent scale within a single grid. If most criteria are weighted between 10 and 30, a lone criterion at 100 will drown out all the others and make the score hard to interpret. Defining 3 or 4 tiers ahead of time (e.g. minor / important / major / blocking) and sticking to them gives much more readable reports.
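As an illustration of that tiering practice, and of how points might combine into a score (the aggregation below is an assumption, not a description of Raisetalk's actual computation):

```typescript
// Illustrative weighting tiers, fixed ahead of time and reused across the grid.
const TIER_POINTS = { minor: 10, important: 20, major: 30, blocking: 100 } as const;

// Hypothetical aggregation: each evaluated criterion contributes its positive
// points when achieved, its negative points when not achieved.
interface CriterionResult {
  state: "achieved" | "not_achieved" | "not_applicable";
  pointsWhenAchieved: number;    // 0 to 100, e.g. TIER_POINTS.important
  pointsWhenNotAchieved: number; // 0 to -100
}

function overallScore(results: CriterionResult[]): number {
  return results.reduce((sum, r) => {
    if (r.state === "achieved") return sum + r.pointsWhenAchieved;
    if (r.state === "not_achieved") return sum + r.pointsWhenNotAchieved;
    return sum; // not applicable (see conditional criteria below): excluded
  }, 0);
}
```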
Automatic or manual evaluation
By default, criteria are evaluated automatically by the AI. This is the mode that makes conversation analysis scalable: every conversation is processed without human intervention.
In some cases, it may be relevant to keep one or two manual criteria per grid, to be evaluated by a human supervisor. This remains the exception and typically concerns criteria that are too subjective to be evaluated reliably without additional business context.
This choice is made when the criterion is created and cannot be changed afterwards.
Conditioning the evaluation
Some criteria only make sense in certain cases. Raisetalk lets you condition the evaluation so it is only triggered when that case is detected.
- Criterion always evaluated (no condition): The customer is satisfied with the answer provided
- Conditional criterion: If a hold occurs, the agent notifies the customer before interrupting the conversation. This criterion is only evaluated if a hold actually occurred during the conversation.
This avoids penalising conversations on criteria that do not apply to them, which would distort the overall score.
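A sketch of that gating logic, with a hypothetical condition identifier and detection flag (how the case is actually detected is not specified here):

```typescript
// Hypothetical gating: a conditional criterion is only evaluated when its
// trigger case is detected in the conversation; otherwise it is skipped and
// contributes nothing to the score.
interface ConversationFacts {
  holdOccurred: boolean; // hypothetical flag: did a hold take place?
}

function shouldEvaluate(
  evaluationCondition: string | undefined, // e.g. "hold_occurred" (illustrative)
  facts: ConversationFacts,
): boolean {
  if (evaluationCondition === undefined) return true; // no condition: always evaluated
  if (evaluationCondition === "hold_occurred") return facts.holdOccurred;
  return false; // unknown condition in this sketch: skip rather than guess
}
```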
Multilingual criteria
A criterion can be entered in several languages. This translation is purely editorial: it does not change anything in the AI processing. Its sole purpose is to make the criterion appear in the repository in each of the languages supported by the account, so that every administrator, supervisor, and agent can consult the repository in their working language.
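For example, the per-language texts might be carried as a simple map (an illustrative shape only; the French wording is just an example translation):

```typescript
// Editorial translations only: the displayed language has no effect on the
// AI analysis, which works from the conversation itself.
const assertionByLanguage: Record<string, string> = {
  en: "The agent rephrases the customer's problem before proposing a solution",
  fr: "L'agent reformule le problème du client avant de proposer une solution",
};
```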
