
Building a Requirements Similarity Analyzer with Python and Vue


Functional programmers prefer data to calculations and prefer calculations to actions.

I discovered this concept while reading Grokking Simplicity in 2021, and it completely changed how I approach software development. The shift is particularly useful when building tools that analyze document similarity: breaking the problem into data, calculations, and actions makes each piece easier to reason about.

The Problem I Needed to Solve

Software requirements documents are often written by different teams or departments, sometimes for closely related projects. This leads to:

  • Redundant specifications that waste development effort
  • Inconsistent implementations of the same requirement
  • Difficulty tracking changes across documents
  • Maintenance headaches when requirements evolve differently

I needed a solution that could analyze multiple documents and identify similar requirements to maintain consistency.

My Approach: Breaking Down the Solution

I built a Requirements Similarity Analyzer that follows functional programming principles by separating:

  • Data: The requirements documents and their extracted content
  • Calculations: Similarity algorithms and requirement classification
  • Actions: Document processing, report generation, and user interactions
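To make the three categories concrete, here is a minimal sketch. The names `REQUIREMENTS`, `count_mandatory`, and `print_summary` are hypothetical and not part of the analyzer; they only illustrate where the boundaries fall.

```python
# Data: plain values with no behavior attached.
REQUIREMENTS = [
    "The system shall encrypt all stored passwords.",
    "Users should be able to reset their password.",
]


# Calculation: the same input always gives the same output, no side effects.
def count_mandatory(requirements):
    """Count requirements that contain the word 'shall'."""
    return sum(1 for text in requirements if "shall" in text.lower())


# Action: touches the outside world (here, standard output).
def print_summary(requirements):
    print(f"{count_mandatory(requirements)} of {len(requirements)} are mandatory")


if __name__ == "__main__":
    print_summary(REQUIREMENTS)
```

Only `print_summary` needs to be exercised against real I/O; the calculation can be checked with plain values.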

The solution has three key capabilities:

  1. Upload and analyze multiple requirements documents
  2. Compare specifications using configurable similarity thresholds
  3. Generate reports highlighting common and unique requirements

The Technical Architecture

I combined a Python FastAPI backend with a Vue frontend:

Backend (Python/FastAPI)
├── Core Services
│   ├── Document Processing (PDF, DOCX, TXT)
│   ├── Text Analysis & Similarity Detection
│   ├── Requirements Extraction & Classification
│   └── Report Generation
└── API Endpoints
    ├── Document Upload & Management
    ├── Analysis Control
    └── Results & Export

Frontend (Vue.js)
├── Document Management Interface
├── Analysis Configuration
├── Results Visualization
└── Report Export Options

Key Components: Data, Calculations, and Actions

1. Data Processing (Actions)

I use Python libraries to extract text from documents. Reading the files is an action; once the text is in hand, picking out requirement statements is a pure calculation:

# Example showing document processing for requirements
import re

def extract_requirements(text):
    # Split text into paragraphs
    paragraphs = text.split('\n\n')

    # Identify potential requirement paragraphs
    requirements = []
    for para in paragraphs:
        if is_requirement(para):
            requirements.append({
                'text': para,
                # classify_requirement_type is not shown in this post
                'type': classify_requirement_type(para)
            })

    return requirements

def is_requirement(text):
    # Look for requirement indicators (shall, must, should, etc.)
    requirement_indicators = [
        r'\bshall\b', r'\bmust\b', r'\brequired\b',
        r'\bshould\b', r'\bneeds to\b'
    ]
    return any(re.search(pattern, text, re.IGNORECASE) for pattern in requirement_indicators)
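The code above calls `classify_requirement_type` without showing it. One plausible shape is a keyword lookup like the sketch below; the categories and patterns here are assumptions chosen for illustration, not the analyzer's actual rules.

```python
import re

# Hypothetical keyword map: these categories and patterns are illustrative
# assumptions, not the rules used by the real analyzer.
TYPE_PATTERNS = {
    "security": r"\b(encrypt\w*|authenticat\w*|authoriz\w*|password)\b",
    "performance": r"\b(latency|throughput|response time|within \d+)\b",
    "usability": r"\b(user interface|accessib\w*|usab\w*)\b",
}


def classify_requirement_type(text):
    """Return the first category whose pattern matches, else 'functional'."""
    for req_type, pattern in TYPE_PATTERNS.items():
        if re.search(pattern, text, re.IGNORECASE):
            return req_type
    return "functional"
```

Because the function depends only on its argument and a constant table, it stays on the calculation side of the split.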

2. Similarity Analysis (Calculations)

The core calculation is a pure function. It takes inputs and returns outputs, with no side effects:

from difflib import SequenceMatcher

def calculate_similarity(req1, req2):
    """Calculate similarity between two requirement texts"""
    # Use similarity ratio algorithm from SequenceMatcher
    return SequenceMatcher(None, req1.lower(), req2.lower()).ratio()

def find_similar_requirements(requirements, threshold=0.8):
    """Group similar requirements based on similarity threshold"""
    groups = []
    processed = set()

    for i, req1 in enumerate(requirements):
        if i in processed:
            continue

        similar_reqs = []
        for j, req2 in enumerate(requirements):
            if i != j and j not in processed:
                similarity = calculate_similarity(req1['text'], req2['text'])
                if similarity >= threshold:
                    similar_reqs.append({
                        'requirement': req2,
                        'similarity': similarity
                    })
                    processed.add(j)

        # Every unprocessed requirement becomes a group, even with no matches
        groups.append({
            'primary': req1,
            'similar': similar_reqs
        })
        processed.add(i)

    return groups
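To get a feel for how the threshold behaves, it helps to score a near-duplicate pair and an unrelated pair. The sketch below uses the same `SequenceMatcher` ratio; the helper names `similarity` and `is_duplicate` and the sample sentences are made up for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_duplicate(a, b, threshold=0.8):
    """True when two requirement texts meet the similarity threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    near = ("The system shall encrypt all stored passwords.",
            "The system must encrypt all stored passwords.")
    far = ("The system shall encrypt all stored passwords.",
           "Reports should be exportable as CSV.")
    for a, b in (near, far):
        print(f"{similarity(a, b):.2f}", is_duplicate(a, b))
```

The ratio is computed on characters, so rewording a single modal verb barely moves the score, while a paraphrase with different vocabulary can fall well below the threshold even if it means the same thing.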

3. Report Generation (Calculations to Data)

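After analyzing documents, I turn the grouped results into a report. Apart from the timestamp, which reads the system clock, the function is a straightforward calculation over its input: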

import json
from datetime import datetime

def generate_similarity_report(grouped_requirements, format='json'):
    """Generate a report of similar requirements"""
    if format == 'json':
        return json.dumps({
            'timestamp': datetime.now().isoformat(),
            'total_requirements': sum(1 + len(g['similar']) for g in grouped_requirements),
            'unique_requirements': len(grouped_requirements),
            'duplicate_count': sum(len(g['similar']) for g in grouped_requirements),
            'groups': grouped_requirements
        }, indent=2)
    elif format == 'csv':
        # CSV generation code...
        pass
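The call to `datetime.now()` is the one place where the report function reads from the outside world. If strict purity matters (for example, to compare reports in tests), one option is to pass the timestamp in. This is a sketch of that variation, not the analyzer's code; `build_report` and `generate_report_json` are hypothetical names.

```python
import json
from datetime import datetime, timezone


def build_report(groups, generated_at):
    """Pure calculation: the timestamp is passed in, not read from the clock."""
    duplicate_count = sum(len(g["similar"]) for g in groups)
    return {
        "timestamp": generated_at.isoformat(),
        "total_requirements": len(groups) + duplicate_count,
        "unique_requirements": len(groups),
        "duplicate_count": duplicate_count,
        "groups": groups,
    }


def generate_report_json(groups):
    """Action: reads the current time, then delegates to the calculation."""
    return json.dumps(build_report(groups, datetime.now(timezone.utc)), indent=2)
```

With this split, `build_report` returns identical output for identical arguments, and only the thin wrapper depends on the clock.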

Implementation: Putting It All Together

Let's look at how I implemented the document processor service, which extracts requirements from text documents:

# file_processor.py
import re
from typing import List, Dict, Any, Tuple
import fitz  # PyMuPDF

async def extract_text_from_pdf(content: bytes) -> Tuple[str, List[int]]:
    """Extract text and page numbers from PDF content."""
    doc = fitz.open(stream=content, filetype="pdf")
    text = ""
    page_numbers = []

    # Keep only pages that contain text, numbering pages from 1
    for page_num, page in enumerate(doc, 1):
        page_text = page.get_text()
        if page_text.strip():
            text += page_text + "\n\n"
            page_numbers.append(page_num)

    doc.close()  # release the in-memory document
    return text, page_numbers

async def extract_requirements_from_text(text: str) -> List[Dict[str, Any]]:
    """Extract requirement statements from text."""
    # Split into paragraphs
    paragraphs = text.split('\n\n')

    # Filter potential requirements
    # (preprocess_text and classify_requirement_type are not shown in this post)
    requirements = []
    for para in paragraphs:
        para = para.strip()
        if para and is_requirement_statement(para):
            requirements.append({
                'text': preprocess_text(para),
                'type': classify_requirement_type(para)
            })

    return requirements

def is_requirement_statement(text: str) -> bool:
    """Identify if text is likely a requirement statement."""
    keywords = ['shall', 'must', 'will', 'should', 'may', 'required']
    return any(re.search(rf'\b{keyword}\b', text.lower()) for keyword in keywords)

Text extraction and requirement detection live in separate functions, and each one is a plain transformation from input to output, which keeps the module in line with the functional approach.
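The processor calls `preprocess_text` before storing each requirement, but the helper is not shown. For PDF-extracted text, a reasonable guess is whitespace cleanup along these lines; treat this as an assumption about what the helper does rather than its real body.

```python
import re


def preprocess_text(text: str) -> str:
    """Hypothetical normalizer: rejoin hyphenated line breaks, collapse whitespace."""
    # "require-\nment" -> "requirement" (common in PDF-extracted text)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Any run of spaces, tabs, or newlines becomes a single space
    return re.sub(r"\s+", " ", text).strip()
```

Normalizing before comparison matters because the similarity ratio is character-based, so stray line breaks would otherwise lower the score for identical sentences. Note that the hyphen rule would also join a genuine compound word that happens to break across lines.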

Visualizing Requirements: The Vue.js Frontend

Here's a simplified Vue component that visualizes requirements, separating data, calculations, and actions:

<!-- RequirementsAnalyzer.vue -->
<template>
  <div class="requirements-analyzer">
    <h1>Requirements Similarity Analyzer</h1>

    <!-- Upload Form (UI for Actions) -->
    <div class="upload-section" v-if="!isAnalyzing && !analysisComplete">
      <h2>Upload Documents</h2>
      <div class="upload-form">
        <div class="file-input">
          <label for="file-upload">Select files (PDF, DOCX, TXT)</label>
          <input
            type="file"
            id="file-upload"
            multiple
            @change="handleFileSelection"
            accept=".pdf,.docx,.txt"
          >
          <div class="selected-files" v-if="selectedFiles.length">
            <p>Selected {{ selectedFiles.length }} files:</p>
            <ul>
              <li v-for="(file, index) in selectedFiles" :key="index">
                {{ file.name }}
              </li>
            </ul>
          </div>
        </div>

        <!-- Similarity Threshold Input (Data) -->
        <div class="similarity-settings">
          <label for="similarity-threshold">Similarity Threshold: {{ similarityThreshold }}%</label>
          <input
            type="range"
            id="similarity-threshold"
            v-model="similarityThreshold"
            min="50"
            max="100"
            step="5"
          >
        </div>

        <!-- Action Trigger -->
        <button
          class="analyze-button"
          @click="startAnalysis"
          :disabled="selectedFiles.length === 0"
        >
          Analyze Requirements
        </button>
      </div>
    </div>

    <!-- Results Display (Visualization of Data) -->
    <div class="results-section" v-if="analysisComplete">
      <!-- Content display here -->
    </div>
  </div>
</template>

<script>
import { ref, computed } from 'vue';
import { analyzeRequirements, getAnalysisStatus, exportAnalysisResults } from '../services/api';

export default {
  name: 'RequirementsAnalyzer',
  setup() {
    // Data
    const selectedFiles = ref([]);
    const similarityThreshold = ref(85);
    const isAnalyzing = ref(false);
    const analysisComplete = ref(false);
    const analysisProgress = ref(0); // percent, 0-100
    const analysisResults = ref({
      common_requirements: [],
      unique_requirements: []
    });

    // Calculations (computed values)
    const analysisProgressMessage = computed(() => {
      if (analysisProgress.value < 50) {
        return `Extracting requirements from documents (${Math.round(analysisProgress.value)}%)`;
      } else {
        return `Analyzing requirement similarities (${Math.round(analysisProgress.value)}%)`;
      }
    });

    // Actions (side effects)
    const handleFileSelection = (event) => {
      selectedFiles.value = Array.from(event.target.files);
    };

    const startAnalysis = async () => {
      // Implementation details
    };

    return {
      // Expose data, calculations, and actions to the template
      selectedFiles,
      similarityThreshold,
      isAnalyzing,
      analysisComplete,
      analysisResults,
      analysisProgressMessage,
      handleFileSelection,
      startAnalysis
    };
  }
};
</script>
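One detail worth noting: the slider works in percent (50 to 100), while `find_similar_requirements` on the backend expects a ratio such as 0.8. The post does not show where that conversion happens, so the following is only a sketch of how the backend might do it; `threshold_from_percent` is a hypothetical helper.

```python
def threshold_from_percent(percent, minimum=50, maximum=100):
    """Convert a slider percentage (e.g. 85 or "85") to a 0-1 ratio."""
    value = float(percent)  # range inputs commonly arrive as strings
    if not minimum <= value <= maximum:
        raise ValueError(f"threshold must be between {minimum} and {maximum}")
    return value / 100
```

Validating at the boundary keeps the similarity calculation itself free of input checks.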

Adapting What I Learned

I've found that this functional approach to building software is particularly effective for document analysis tools. By separating data, calculations, and actions, the code becomes more:

  • Testable: Pure functions are easy to test
  • Maintainable: Separation of concerns makes code easier to understand
  • Extensible: New functionality can be added without disrupting existing code
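As an illustration of the testability point, a pure function such as `is_requirement_statement` needs no fixtures, mocks, or files. The tests below are a sketch written in the style pytest collects; the function is copied from file_processor.py so the example runs on its own.

```python
import re


def is_requirement_statement(text: str) -> bool:
    """Copied from file_processor.py so the example is standalone."""
    keywords = ['shall', 'must', 'will', 'should', 'may', 'required']
    return any(re.search(rf'\b{keyword}\b', text.lower()) for keyword in keywords)


def test_detects_modal_verbs():
    assert is_requirement_statement("The system SHALL log every login.")


def test_ignores_plain_description():
    assert not is_requirement_statement("This chapter describes the login page.")


def test_matches_whole_words_only():
    # "mayor" contains "may" but is not a requirement keyword
    assert not is_requirement_statement("The mayor approved the budget.")
```

Each test is a single call with a literal input, which is about as cheap as a test can get.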

Future Enhancements

I'm planning to extend the Requirements Similarity Analyzer with:

  1. Advanced NLP techniques: Using word embeddings or BERT for semantic similarity
  2. Requirement categorization: Auto-classifying requirements by type
  3. Integration with management tools: Connecting with Jira, Azure DevOps, etc.
  4. Change tracking: Identifying when similar requirements evolve differently
  5. Impact analysis: Assessing how changes affect related requirements
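As a small step toward the first item, a word-level cosine similarity can stand in before embeddings are introduced. It is still lexical rather than semantic, but unlike the character-based ratio it ignores word order. This is a standard-library sketch with hypothetical function names, not a planned implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors, in [0, 1]."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Words missing from counts_b contribute 0 to the dot product
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Swapping the count vectors for sentence embeddings would keep the same cosine step while capturing paraphrases that share no vocabulary.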

Conclusion

Building this Requirements Similarity Analyzer has reinforced how effective functional programming principles can be for document analysis applications. By breaking down the problem into data, calculations, and actions, I've created a tool that's both powerful and maintainable.

With documents parsed, similar requirements grouped, and the results shown in one interface, teams can see where their specifications overlap and decide what to standardize or consolidate.

As I continue to work on this tool, I'm excited to see how these principles can be applied to other document analysis problems.

Credits