At HuuliTech, we're building tools to democratize legal education in Mongolia. One of our most challenging and rewarding features is the Case Study Solver and Analyzer - a system that helps law students practice legal reasoning by generating worked solutions to case studies and evaluating student-written solutions.
In this article, I'll walk through the architecture, design decisions, and technical challenges we faced while building this system.
The Problem Space
Legal education in Mongolia relies heavily on case studies. Law students must:
- Analyze fact patterns to identify legal issues
- Research relevant laws and court precedents
- Apply legal frameworks (criminal law, civil law, administrative law, constitutional law)
- Write structured solutions following court-specific methodologies
- Receive feedback on their analysis quality
Traditionally, this process requires:
- Hours of manual research through legal databases
- Deep knowledge of legal methodology
- Access to mentors for feedback
- Practice with real bar exam problems
We wanted to make this process accessible, instant, and scalable using AI.
System Architecture
Our case study system consists of two main flows:
graph TB
User[User Interface] --> Modal[Case Study Modal]
Modal --> |Problem Solving| Solver[Case Study Solver]
Modal --> |Solution Review| Analyzer[Solution Analyzer]
Solver --> Classifier[Court Type Classifier]
Solver --> Examples[Example Selector]
Solver --> Laws[Law Search Service]
Solver --> Courts[Court Ruling Service]
Analyzer --> Classifier2[Court Type Classifier]
Analyzer --> Rubric[Rubric Evaluator]
Analyzer --> Laws2[Law Recommendation Service]
Classifier --> LLM[Gemini LLM]
Examples --> VectorDB[(Vector Embeddings)]
Laws --> VectorDB
Courts --> VectorDB
Laws2 --> VectorDB
Solver --> Prompt[Prompt Builder]
Analyzer --> Prompt2[Evaluation Prompt Builder]
Prompt --> Stream[Streaming Response]
Prompt2 --> Stream
Stream --> ChatUI[Chat Interface]
High-Level Flow
Case Study Solver:
- User inputs a legal problem statement
- System classifies the court type (criminal, civil, administrative, constitutional)
- Fetches relevant context: laws, court rulings, and few-shot examples
- Builds a comprehensive prompt with legal methodology guides
- Streams a structured solution back to the user
Solution Analyzer:
- User inputs both problem and their solution
- System loads court-specific evaluation rubrics
- Fetches relevant laws for recommendations
- Evaluates solution against rubric dimensions
- Streams detailed feedback with scores and improvement suggestions
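To make the analyzer flow concrete, here is a condensed sketch of what that pipeline can look like; all names, types, and signatures below are illustrative rather than our exact production code. The solver follows the same pattern and is shown in full in the implementation section further down.
// Minimal sketch of the analyzer pipeline (illustrative names, not the production API)
interface AnalyzerRequest { problemStatement: string; studentSolution: string; }
interface RubricDimension { dimension: string; description: string; maxScore: number; }
interface AnalyzerDeps {
  classifyCourtType(problem: string): Promise<{ courtType: string }>;
  loadRubric(courtType: string): RubricDimension[];
  fetchRelevantLaws(problem: string, courtType: string): Promise<string[]>;
  buildEvaluationPrompt(req: AnalyzerRequest, rubric: RubricDimension[], laws: string[]):
    { systemPrompt: string; userPrompt: string };
}

async function processAnalysis(request: AnalyzerRequest, deps: AnalyzerDeps) {
  // 1. Pick the rubric by classifying the court type
  const { courtType } = await deps.classifyCourtType(request.problemStatement);
  const rubric = deps.loadRubric(courtType);

  // 2. Fetch relevant laws to ground the recommendations
  const laws = await deps.fetchRelevantLaws(request.problemStatement, courtType);

  // 3. Build the evaluation prompt; the caller streams the LLM response
  const { systemPrompt, userPrompt } = deps.buildEvaluationPrompt(request, rubric, laws);
  return { systemPrompt, userPrompt, courtType, rubric };
}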
Core Design Decisions
1. Court Type Classification
Challenge: Legal analysis differs dramatically across court types. Criminal law focuses on the elements of a crime and criminal responsibility, while civil law emphasizes dispute resolution and the burden of proof.
Solution: We built an automatic classifier using Gemini Flash with structured output:
flowchart LR
Input[Problem Statement] --> Gemini[Gemini Flash LLM]
Gemini --> Schema[Structured JSON Schema]
Schema --> Result{Classification Result}
Result --> Criminal[eruu - Criminal]
Result --> Civil[irgen - Civil]
Result --> Admin[zahirgaa - Administrative]
Result --> Constitutional[undes - Constitutional]
Result --> NotCase[not_case_study]
The classifier returns:
- Court type (with 95%+ accuracy based on our testing)
- Confidence score
- Reasoning for transparency
Users can override the classification if needed, but auto-detection works remarkably well.
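For readers who want a starting point, a classifier along these lines can be sketched with Gemini's structured JSON output; this is a minimal sketch using the @google/generative-ai SDK, with the prompt wording, model name, and field names as illustrative assumptions rather than our exact implementation.
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Ask Gemini Flash for a JSON object matching a fixed schema
const classifierModel = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        courtType: {
          type: SchemaType.STRING,
          description: "One of: eruu, irgen, zahirgaa, undes, not_case_study",
        },
        confidence: { type: SchemaType.NUMBER },
        reasoning: { type: SchemaType.STRING },
      },
      required: ["courtType", "confidence", "reasoning"],
    },
  },
});

export async function classifyCourtType(problemStatement: string) {
  const result = await classifierModel.generateContent(
    `Classify the following legal problem by court type and explain briefly:\n\n${problemStatement}`
  );
  // The model returns JSON because of the response schema above
  return JSON.parse(result.response.text()) as {
    courtType: string;
    confidence: number;
    reasoning: string;
  };
}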
2. Context Retrieval Pipeline
Challenge: Legal analysis requires multiple types of context:
- Laws (primary legal sources)
- Court rulings (precedents and interpretations)
- Few-shot examples (solved bar exam problems)
Solution: Parallel fetching with semantic search:
sequenceDiagram
participant Service
participant Embeddings
participant LawDB
participant CourtDB
participant ExampleDB
Service->>Embeddings: Generate embedding for problem
Service->>+LawDB: Fetch 10 matching laws
Service->>+CourtDB: Fetch 10 court rulings
Service->>+ExampleDB: Select 2 best examples
LawDB-->>-Service: Relevant laws by type
CourtDB-->>-Service: Summarized rulings
ExampleDB-->>-Service: Similar solved problems
Service->>Service: Build comprehensive prompt
Key optimizations:
- Use the same embedding for all searches (generated once)
- Fetch resources in parallel using Promise.all()
- Load example metadata first, hydrate solutions on demand
- Limit context size to prevent token overflow
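The "generated once" embedding is the cheapest optimization on that list. A minimal sketch of a generateGeminiEmbedding helper using the @google/generative-ai SDK (model name assumed):
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });

// Generate the problem embedding once; every downstream search reuses the same vector.
export async function generateGeminiEmbedding(text: string): Promise<number[]> {
  const result = await embedder.embedContent(text);
  return result.embedding.values;
}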
3. Few-Shot Learning Architecture
Challenge: LLMs need examples of high-quality legal analysis to follow the correct format and depth.
Solution: We maintain a curated library of solved bar exam problems:
backend/src/services/case-study/few-shot/
├── eruu/ # Criminal law examples
│ ├── problem1.txt
│ ├── problem1_solution.txt
│ ├── problem2.txt
│ └── problem2_solution.txt
├── irgen/ # Civil law examples
├── zahirgaa/ # Administrative law examples
└── undes/ # Constitutional law examples
Selection process:
- Load all example metadata (problems only - ~500 tokens each)
- Use Gemini to select 2 most relevant examples
- Hydrate with full solutions only for selected examples
This approach keeps costs low while maintaining quality.
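As a rough sketch, the metadata-first loading can look like this; the file layout matches the tree above, while the helper names are illustrative:
import { promises as fs } from "fs";
import * as path from "path";

const FEW_SHOT_DIR = "backend/src/services/case-study/few-shot";

interface ExampleMeta { id: string; problem: string; }

// Step 1: load problem statements only (~500 tokens each), never the solutions
async function loadExampleMetadata(courtType: string): Promise<ExampleMeta[]> {
  const dir = path.join(FEW_SHOT_DIR, courtType);
  const files = (await fs.readdir(dir)).filter(
    (f) => f.endsWith(".txt") && !f.includes("_solution")
  );
  return Promise.all(
    files.map(async (f) => ({
      id: path.basename(f, ".txt"),
      problem: await fs.readFile(path.join(dir, f), "utf8"),
    }))
  );
}

// Step 3: hydrate full solutions only for the examples the LLM selected
async function hydrateExamples(courtType: string, selectedIds: string[]) {
  const dir = path.join(FEW_SHOT_DIR, courtType);
  return Promise.all(
    selectedIds.map(async (id) => ({
      problem: await fs.readFile(path.join(dir, `${id}.txt`), "utf8"),
      solution: await fs.readFile(path.join(dir, `${id}_solution.txt`), "utf8"),
    }))
  );
}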
4. Domain-Specific Rubrics
Challenge: Evaluating legal reasoning is not a matter of free-form judgment; it must follow the well-defined criteria used in Mongolian bar exams, and the system has to apply those criteria consistently.
Solution: Court-specific rubrics with weighted dimensions:
// Criminal Law Rubric (20 points total)
const eruuRubric = [
  {
    dimension: "Crime composition and classification",
    description: "Correctly identify crime elements...",
    maxScore: 8
  },
  {
    dimension: "Co-participants",
    description: "Identify participant types...",
    maxScore: 4
  },
  {
    dimension: "Criminal responsibility and law application",
    description: "Apply sentencing guidelines...",
    maxScore: 5
  },
  {
    dimension: "Criminal procedure",
    description: "Address procedural requirements...",
    maxScore: 3
  }
];
Each court type has 4 rubric dimensions totaling 20 points. The LLM evaluates each dimension and provides:
- Score for that dimension
- Reasoning for the score
- Specific improvements needed
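For reference, that per-dimension feedback maps naturally onto a small structured type; the field names below are illustrative:
interface DimensionEvaluation {
  dimension: string;      // e.g. "Crime composition and classification"
  score: number;          // points awarded, 0..maxScore
  maxScore: number;       // from the rubric definition
  reasoning: string;      // why this score was given
  improvements: string[]; // concrete suggestions for the student
}

interface SolutionEvaluation {
  courtType: string;
  dimensions: DimensionEvaluation[];
  totalScore: number;     // sum of dimension scores, out of 20
  overallFeedback: string;
}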
5. Constitutional Law Methodology
Challenge: Constitutional law analysis follows specific multi-step methodologies (5-step or 6-step analysis).
Solution: Load methodology guides as part of the system prompt:
graph LR
User[User selects methodology] --> Five[5-Step Analysis]
User --> Six[6-Step Analysis]
Five --> Step1[1. Right Identification]
Five --> Step2[2. Restriction Identification]
Five --> Step3[3. Legal Basis]
Five --> Step4[4. Proportionality Test]
Five --> Step5[5. Conclusion]
Six --> Step1
Six --> Step2
Six --> Step2b[3. Scope of Protection]
Six --> Step3
Six --> Step4
Six --> Step5
These guides are embedded in the system prompt, ensuring the LLM follows the correct analytical framework.
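Loading a guide can be as simple as reading a text file and prepending it to the system prompt; a sketch with assumed file paths:
import { promises as fs } from "fs";

// Methodology guides stored as plain text (file paths assumed for illustration)
const METHODOLOGY_GUIDES: Record<string, string> = {
  "5-step": "guides/undes/five_step_analysis.txt",
  "6-step": "guides/undes/six_step_analysis.txt",
};

async function buildConstitutionalSystemPrompt(
  methodology: "5-step" | "6-step",
  basePrompt: string
): Promise<string> {
  const guide = await fs.readFile(METHODOLOGY_GUIDES[methodology], "utf8");
  // The guide is embedded verbatim so the LLM follows the chosen analytical framework
  return `${basePrompt}\n\nFollow this methodology exactly:\n${guide}`;
}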
Technical Implementation Highlights
Service Layer Architecture
We use a clean service-oriented architecture:
class CaseStudyServiceV2 {
  async processCaseStudy(request: CaseStudyRequest): Promise<CaseStudyResult> {
    // 1. Classify court type
    const classification = await courtService.classifyCourtType(
      request.problemStatement
    );
    const courtType = classification.courtType;

    // 2. Generate embedding once and reuse it for every search
    const embedding = await generateGeminiEmbedding(
      request.problemStatement
    );

    // 3. Parallel resource fetching
    // (exampleMetas = problem-only metadata for this court type, loaded as in the few-shot section)
    const [examples, laws, courtRulings] = await Promise.all([
      selectAndHydrateExamples(exampleMetas, request.problemStatement),
      fetchRelevantLaws(request.problemStatement, courtType, embedding),
      courtService.fetchAndSummarizeForCaseStudy(
        request.problemStatement, courtType, embedding, 10
      )
    ]);

    // 4. Build comprehensive prompt
    const { systemPrompt, userPrompt } = await buildCaseStudyPrompt(
      request, courtType, examples, laws, courtRulings
    );

    return { systemPrompt, userPrompt, courtType, classification };
  }
}
Benefits:
- Clear separation of concerns
- Easy to test each component independently
- Centralized error handling and logging
- Type-safe with TypeScript
Streaming Response Pattern
Legal analysis can be lengthy. We stream responses to improve perceived performance:
// Controller handles streaming
async handleCaseStudy(req, res) {
  const request = req.body as CaseStudyRequest;
  const result = await caseStudyService.processCaseStudy(request);

  // Stream LLM response chunks to the client as they arrive
  const stream = await generateLLMStream(
    result.systemPrompt,
    result.userPrompt
  );
  for await (const chunk of stream) {
    res.write(chunk);
  }
  res.end();
}
Users see analysis appearing in real-time, making the wait feel shorter.
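On the client, the streamed body can be consumed incrementally with the Fetch API; a minimal sketch, with the endpoint path and callback as illustrative assumptions:
// Read the streamed solution chunk by chunk in the browser
async function streamCaseStudy(
  problemStatement: string,
  onChunk: (text: string) => void
): Promise<void> {
  const response = await fetch("/api/case-study", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ problemStatement }),
  });
  if (!response.body) throw new Error("Streaming not supported");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true })); // append to the chat UI as it arrives
  }
}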
Vector Search Strategy
We combine vector similarity with exact court-type filtering when searching legal documents:
-- Semantic similarity (pgvector) with a court-type filter
SELECT *
FROM legal_documents
WHERE court_type = $1
  AND 1 - (embedding <=> $2) > 0.7   -- cosine similarity threshold
ORDER BY embedding <=> $2            -- nearest (most similar) first
LIMIT 10
This ensures we get:
- Semantically relevant documents (vector similarity)
- Court-type filtered results (exact match)
- Fast retrieval (indexed searches)
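From the backend, the same query can be issued through node-postgres with the embedding bound as a pgvector literal; the index in the comment is what makes the "fast retrieval" point hold. Table and column names follow the query above; everything else is illustrative.
import { Pool } from "pg";

const pool = new Pool(); // connection details come from the PG* environment variables

// One-time setup (assumed):
//   CREATE INDEX ON legal_documents USING hnsw (embedding vector_cosine_ops);

export async function searchLegalDocuments(courtType: string, queryEmbedding: number[]) {
  const vectorLiteral = `[${queryEmbedding.join(",")}]`; // pgvector accepts '[x,y,...]' text
  const { rows } = await pool.query(
    `SELECT *, 1 - (embedding <=> $2::vector) AS similarity
       FROM legal_documents
      WHERE court_type = $1
        AND 1 - (embedding <=> $2::vector) > 0.7
      ORDER BY embedding <=> $2::vector
      LIMIT 10`,
    [courtType, vectorLiteral]
  );
  return rows;
}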
Performance Considerations
Latency Breakdown
Typical case study analysis takes 8-12 seconds:
| Stage | Time | Optimization |
|---|---|---|
| Classification | 1-2s | Gemini Flash (lightweight model) |
| Embedding | 0.5-1s | Cached when possible |
| Context Retrieval | 2-3s | Parallel fetching |
| Prompt Building | 0.5s | In-memory operations |
| LLM Generation | 5-8s | Streaming for perceived speed |
Optimization strategies:
- Use Gemini Flash for classification (10x faster than Pro)
- Parallel I/O operations wherever possible
- Lazy load example solutions (only when needed)
- Stream responses immediately
Cost Optimization
Running LLMs at scale requires careful cost management:
pie title Token Usage Distribution
"Context (Laws, Examples)" : 4000
"System Prompt (Methodology)" : 2000
"User Problem" : 500
"Generated Solution" : 1500
Cost reduction techniques:
- Limit laws to top 10 (not 50)
- Summarize court rulings before including
- Use example metadata for selection
- Choose appropriate model sizes (Flash vs Pro)
Challenges and Solutions
Challenge 1: Context Window Limits
Problem: Including 10 laws + 10 court rulings + 2 examples + methodology guides = 12,000+ tokens
Solution:
- Prioritize matching-type laws over other-type laws
- Summarize court rulings to key points only
- Use truncation strategies for very long laws
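A simple token-budget guard is enough to enforce the truncation point above; this sketch uses a rough characters-per-token heuristic rather than a real tokenizer:
// Roughly cap a piece of context at a token budget.
// Assumes ~4 characters per token as a crude heuristic; swap in a real tokenizer if needed.
function truncateToTokenBudget(text: string, maxTokens: number): string {
  const approxTokens = Math.ceil(text.length / 4);
  if (approxTokens <= maxTokens) return text;
  const maxChars = maxTokens * 4;
  return text.slice(0, maxChars) + "\n[... truncated to fit the context window ...]";
}

// Example: keep each law under ~800 tokens before adding it to the prompt
// const trimmedLaws = laws.map((law) => truncateToTokenBudget(law, 800));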
Challenge 2: Evaluation Consistency
Problem: LLM evaluations can be inconsistent between runs
Solution:
- Provide detailed rubric with specific scoring criteria
- Use examples of each score level (0, 50%, 100%)
- Request explicit reasoning for each score
- Log evaluations for quality monitoring
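One way to bake the score-level anchors into the prompt is to render each rubric dimension with explicit reference points; a sketch (wording illustrative, and the real prompts are in Mongolian):
interface RubricDimension { dimension: string; description: string; maxScore: number; }

// Turn a rubric dimension into prompt text with explicit score anchors,
// so the LLM grades against the same reference points on every run.
function renderDimension(d: RubricDimension): string {
  return [
    `Dimension: ${d.dimension} (max ${d.maxScore} points)`,
    `Criteria: ${d.description}`,
    `Score 0: the dimension is not addressed at all.`,
    `Score ${Math.round(d.maxScore / 2)}: partially addressed, with gaps or errors.`,
    `Score ${d.maxScore}: fully and correctly addressed.`,
    `Give a score, your reasoning, and specific improvements.`,
  ].join("\n");
}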
Challenge 3: Mongolian Language Support
Problem: Legal terminology in Mongolian requires careful handling
Solution:
- All prompts and rubrics in Mongolian
- Use Gemini models (better multilingual support)
- Test with native Mongolian legal experts
- Maintain glossary of legal terms
Results and Impact
Since launching the case study feature:
- 2,500+ case studies solved
- 85% user satisfaction (based on thumbs up/down)
- Average 10 minutes per case study (vs 60+ minutes manually)
- Students practice 5x more due to instant feedback
The most impactful feature is the solution analyzer - students can now:
- Attempt a problem on their own
- Get detailed rubric-based feedback
- See specific areas for improvement
- Learn from recommended laws and precedents
Future Directions
We're exploring several enhancements:
1. Adaptive Difficulty
Track student performance and suggest appropriately challenging problems
2. Comparative Analysis
Show how other students approached the same problem (anonymized)
3. Interactive Clarification
Allow students to ask follow-up questions about their feedback
4. Multi-Turn Dialogue
Support conversational problem-solving instead of single-shot analysis
5. Personalized Study Plans
Generate practice schedules based on weak rubric dimensions
Key Takeaways
Building an AI-powered legal education tool taught us:
- Domain expertise matters: Understanding legal methodology was crucial for prompt engineering
- Context is king: Quality context (laws, examples, precedents) beats model size
- Structured output: Use JSON schemas and rubrics for consistent evaluations
- Performance optimization: Parallel fetching and streaming make a huge UX difference
- Iterative improvement: Start simple, gather feedback, enhance based on real usage
Conclusion
The Case Study Solver and Analyzer represents our commitment to making legal education accessible to every Mongolian law student. By combining LLMs, vector search, and domain-specific rubrics, we've created a tool that provides instant, structured, and actionable feedback on legal reasoning.
The system is far from perfect - legal reasoning is nuanced and context-dependent. But by focusing on clear methodology, transparent evaluation criteria, and continuous improvement based on user feedback, we're building something that genuinely helps students learn.
If you're building AI-powered education tools, I hope our architecture and design decisions provide useful insights. Feel free to reach out with questions or suggestions!
Want to try the Case Study Solver? Visit huuli.tech and click the "Бодлого" ("Problem") button in the chat interface. It's free to use with our trial program.
Technical Appendix
For those interested in implementation details:
Tech Stack:
- Backend: Express.js + TypeScript
- LLM: Google Gemini (Flash for classification, Pro for generation)
- Vector DB: PostgreSQL with pgvector extension
- Frontend: Next.js 15 + React Query
- Streaming: Server-Sent Events (SSE)
Open Questions:
- How to handle multi-language legal systems? (English common law vs Mongolian civil law)
- Can we fine-tune smaller models on legal reasoning?
- What's the right balance between automation and human feedback?
Resources: