Batch Image Processing: Efficient OCR Solutions and Best Practices
Learn how to batch process multiple images with OCR, including sequential loops, concurrent requests, error handling, and performance optimization.
Why Batch Processing?
In real business scenarios, you may need to process large numbers of images at once:
- Scanning archived historical documents (hundreds to thousands of pages)
- Batch recognition of invoices and receipts for financial entry
- Processing multiple user-uploaded images
- Scheduled tasks to automatically process new files
Processing images manually, one at a time, is too slow at this scale; an automated batch processing solution is needed.
Method 1: Sequential Loop Calls
The simplest approach is to use a loop to process each image sequentially. This method is easy to implement and suitable for scenarios with fewer images or lower speed requirements.
JavaScript/Node.js Example
async function batchOCR(images) {
  const results = [];
  for (const image of images) {
    const formData = new FormData();
    formData.append('file', image);
    try {
      const response = await fetch('https://api.easyocr.org/ocr', {
        method: 'POST',
        body: formData
      });
      // fetch() does not reject on HTTP error statuses, so check explicitly
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      const result = await response.json();
      results.push({
        filename: image.name,
        success: true,
        data: result
      });
    } catch (error) {
      // Record the failure and continue with the next image
      results.push({
        filename: image.name,
        success: false,
        error: error.message
      });
    }
    // Add a delay to avoid sending requests too frequently
    await new Promise(resolve => setTimeout(resolve, 200));
  }
  return results;
}
Python Example
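import requests
import time

def batch_ocr(image_paths):
    results = []
    for path in image_paths:
        try:
            with open(path, 'rb') as f:
                files = {'file': f}
                response = requests.post(
                    'https://api.easyocr.org/ocr',
                    files=files,
                    timeout=30  # avoid hanging forever on a stalled request
                )
            # Treat HTTP error statuses (4xx/5xx) as failures
            response.raise_for_status()
            result = response.json()
            results.append({
                'filename': path,
                'success': True,
                'data': result
            })
        except Exception as e:
            # Record the failure and continue with the next image
            results.append({
                'filename': path,
                'success': False,
                'error': str(e)
            })
        # Add a delay to avoid sending requests too frequently
        time.sleep(0.2)
    return results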
Method 2: Concurrent Request Processing
For faster processing, you can use concurrent requests to process multiple images simultaneously. However, be careful to control the concurrency level to avoid triggering API rate limits.
JavaScript Concurrent Example
async function batchOCRConcurrent(images, concurrency = 3) {
  const results = [];
  // Split images into batches of `concurrency` items
  for (let i = 0; i < images.length; i += concurrency) {
    const batch = images.slice(i, i + concurrency);
    const promises = batch.map(async (image) => {
      const formData = new FormData();
      formData.append('file', image);
      try {
        const response = await fetch('https://api.easyocr.org/ocr', {
          method: 'POST',
          body: formData
        });
        // fetch() does not reject on HTTP error statuses, so check explicitly
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }
        return {
          filename: image.name,
          success: true,
          data: await response.json()
        };
      } catch (error) {
        return {
          filename: image.name,
          success: false,
          error: error.message
        };
      }
    });
    // Each promise catches its own errors, so Promise.all never rejects here
    const batchResults = await Promise.all(promises);
    results.push(...batchResults);
    // Add a delay between batches (skipped after the last one)
    if (i + concurrency < images.length) {
      await new Promise(resolve => setTimeout(resolve, 500));
    }
  }
  return results;
}
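Python Concurrent Example (sketch)
For Python scripts, the same bounded-concurrency idea can be sketched with the standard library's concurrent.futures.ThreadPoolExecutor. This is a minimal sketch rather than an official client: ocr_one is a hypothetical callable that you supply (for example, a function wrapping the requests.post call from Method 1), and max_workers=3 simply mirrors the JavaScript default above.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_ocr_concurrent(image_paths, ocr_one, max_workers=3):
    """Run ocr_one(path) for every path with at most max_workers in flight.

    ocr_one is a hypothetical callable supplied by the caller; it should
    return the parsed OCR result or raise an exception on failure.
    """
    def safe_call(path):
        # Capture failures per file so one bad image does not abort the batch
        try:
            return {'filename': path, 'success': True, 'data': ocr_one(path)}
        except Exception as e:
            return {'filename': path, 'success': False, 'error': str(e)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as image_paths
        return list(pool.map(safe_call, image_paths))
```

Unlike the batch-by-batch JavaScript version, a thread pool starts the next request as soon as a worker is free and adds no pause between requests, so if the API needs spacing, put a short sleep inside ocr_one.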
Error Handling and Retry Mechanism
During batch processing, some requests may fail due to network issues or temporary server errors. Retrying failed requests with an increasing delay can improve the overall success rate.
async function ocrWithRetry(image, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const formData = new FormData();
      formData.append('file', image);
      const response = await fetch('https://api.easyocr.org/ocr', {
        method: 'POST',
        body: formData
      });
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      return await response.json();
    } catch (error) {
      console.log(`Attempt ${attempt}/${maxRetries} failed: ${error.message}`);
      if (attempt === maxRetries) {
        throw error;
      }
      // Exponential backoff: double the wait time on each retry (1s, 2s, 4s, ...)
      await new Promise(resolve =>
        setTimeout(resolve, 1000 * Math.pow(2, attempt - 1))
      );
    }
  }
}
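Python Retry Example (sketch)
The same retry-with-exponential-backoff pattern can be sketched in Python. The helper below is an illustration under stated assumptions: func is a hypothetical zero-argument callable that performs one OCR request and raises on failure, and the sleep parameter exists only so the waiting can be replaced in tests.

```python
import time


def call_with_retry(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to max_retries times with exponential backoff.

    func is a hypothetical zero-argument callable that raises on failure.
    sleep is injectable so tests do not have to wait.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as e:
            print(f'Attempt {attempt}/{max_retries} failed: {e}')
            if attempt == max_retries:
                # Out of attempts: re-raise the last error to the caller
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

A call such as call_with_retry(lambda: send_one(path)) would wrap a single upload, where send_one stands in for whatever request function you already have.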
Progress Tracking
When processing large numbers of images, providing progress feedback improves user experience.
async function batchOCRWithProgress(images, onProgress) {
  const results = [];
  const total = images.length;
  for (let i = 0; i < images.length; i++) {
    const image = images[i];
    // processImage is a placeholder for your own OCR call, e.g. ocrWithRetry
    const result = await processImage(image);
    results.push(result);
    // Report progress after the image completes, so 100% means all images are done
    onProgress({
      current: i + 1,
      total: total,
      percentage: Math.round(((i + 1) / total) * 100),
      currentFile: image.name
    });
  }
  return results;
}

// Usage example
batchOCRWithProgress(images, (progress) => {
  console.log(`Progress: ${progress.percentage}% (${progress.current}/${progress.total})`);
  console.log(`Current file: ${progress.currentFile}`);
});
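Python Progress Example (sketch)
A Python counterpart might look like the following. It is a minimal sketch in which process and on_progress are hypothetical callables supplied by the caller. This version reports after each item finishes, so 100% is only reported once every item has actually been processed.

```python
def batch_with_progress(items, process, on_progress):
    """Process items in order, reporting progress after each one finishes.

    process and on_progress are hypothetical callables supplied by the caller.
    """
    results = []
    total = len(items)
    for i, item in enumerate(items, start=1):
        results.append(process(item))
        # i items are complete at this point
        on_progress({
            'current': i,
            'total': total,
            'percentage': round(i / total * 100),
            'current_file': item,
        })
    return results
```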
Performance Optimization Tips
1. Image Preprocessing
Compressing and optimizing images before upload can reduce transfer time:
- Resize large images to an appropriate size (recommended width: no more than 2000px)
- Use JPEG format with moderate quality reduction (80-90%)
- Crop out areas that don't need recognition
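As an illustration of the resizing rule, the helper below computes the dimensions to resize to while keeping the aspect ratio. It is a sketch, not part of any OCR API: target_size is a hypothetical name, and the 2000px default simply follows the recommendation above. The actual resizing and JPEG encoding are left to whichever imaging library you use.

```python
def target_size(width, height, max_width=2000):
    """Return (width, height) scaled down so that width <= max_width.

    Keeps the aspect ratio and never upscales smaller images.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    # Keep at least 1px of height for extremely wide images
    return max_width, max(1, round(height * scale))
```

For example, a 4000x3000 scan would be reduced to 2000x1500, while a 1200x800 image would be left unchanged.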
2. Set Appropriate Concurrency
- Too low concurrency: Slow processing speed
- Too high concurrency: May trigger rate limits or cause server pressure
- Recommended: 3-5 concurrent requests
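To get a feel for the trade-off, a rough estimate of total run time can be computed from the number of images, the per-request latency, and the concurrency level. The function below is a back-of-the-envelope sketch: it assumes every request takes the same time and that each batch finishes before the next starts, as in the batched JavaScript example. Any latency value you pass in is your own measurement or guess, not a documented figure.

```python
import math


def estimate_seconds(n_images, latency, concurrency=1, batch_delay=0.0):
    """Rough wall-clock estimate for batch-by-batch processing.

    Assumes each request takes `latency` seconds and that a pause of
    `batch_delay` seconds is inserted between consecutive batches.
    """
    batches = math.ceil(n_images / concurrency)
    return batches * latency + max(0, batches - 1) * batch_delay
```

With 100 images and an assumed 1-second latency, the sequential estimate with no delay is 100 seconds; with 4 concurrent requests and a 0.5-second pause between batches it is 25 × 1 + 24 × 0.5 = 37 seconds.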
3. Implement Checkpoint Resume
For large batch tasks, save progress to support resuming after interruption:
// Save progress to local storage
function saveProgress(processedIds) {
  localStorage.setItem('ocr_progress', JSON.stringify(processedIds));
}

// Restore progress
function loadProgress() {
  const saved = localStorage.getItem('ocr_progress');
  return saved ? JSON.parse(saved) : [];
}
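localStorage is only available in browsers. For a server-side or command-line batch job, the same idea can be sketched with a JSON checkpoint file. The names below (load_progress, save_progress, run_resumable) and the file layout are illustrative assumptions, and process is a hypothetical callable that handles one image and raises on failure.

```python
import json
import os


def load_progress(path):
    """Return the set of already-processed filenames, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, 'r', encoding='utf-8') as f:
        return set(json.load(f))


def save_progress(path, processed):
    """Write to a temp file first so an interrupted write cannot
    leave the real checkpoint half-written."""
    tmp = path + '.tmp'
    with open(tmp, 'w', encoding='utf-8') as f:
        json.dump(sorted(processed), f)
    os.replace(tmp, path)


def run_resumable(image_paths, process, checkpoint_path):
    """Process only the paths not yet recorded in the checkpoint file."""
    done = load_progress(checkpoint_path)
    for p in image_paths:
        if p in done:
            continue  # finished in an earlier run
        process(p)
        done.add(p)
        save_progress(checkpoint_path, done)
    return done
```

Saving after every image keeps the checkpoint accurate at the cost of one small file write per image; for very large batches you might save every N images instead.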
Important Notes
- Follow API usage guidelines: Don't send requests too frequently
- Add appropriate delays: Recommended interval of 200ms or more between requests
- Implement error handling: Network requests may fail, so catch and record each failure instead of aborting the batch
- Monitor processing status: Record success and failure counts for troubleshooting
- Consider memory usage: Loading many images simultaneously may cause memory issues
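For the monitoring point, success and failure counts can be derived from the result lists the earlier examples return. The helper below is a small sketch: summarize is a hypothetical name, and it assumes each result is a dict with 'filename' and 'success' keys, as in the Python example from Method 1.

```python
def summarize(results):
    """Count successes and failures in a list of result dicts.

    Expects each dict to carry 'filename' and 'success' keys.
    """
    failed = [r['filename'] for r in results if not r['success']]
    return {
        'total': len(results),
        'succeeded': len(results) - len(failed),
        'failed': len(failed),
        'failed_files': failed,  # candidates for a retry pass
    }
```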