Batch Image Processing: Efficient OCR Solutions and Best Practices

Learn how to batch-process images for OCR, covering sequential loops, concurrent requests, error handling, and performance optimization.

Why Batch Processing?

In real-world business scenarios, you may need to process large numbers of images at once:

  • Scanning archived historical documents (hundreds to thousands of pages)
  • Batch recognition of invoices and receipts for financial entry
  • Processing multiple user-uploaded images
  • Scheduled tasks to automatically process new files

Handling these images manually, one at a time, is too slow; an automated batch process is needed.

Method 1: Sequential Loop Calls

The simplest approach is to use a loop to process each image sequentially. This method is easy to implement and suitable for scenarios with fewer images or lower speed requirements.

JavaScript/Node.js Example

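// Requires a browser or Node.js 18+ (global fetch and FormData).
// Each `image` is expected to be a File-like Blob with a `name` property.
async function batchOCR(images) {
  const results = [];

  for (const image of images) {
    const formData = new FormData();
    formData.append('file', image);

    try {
      const response = await fetch('https://api.easyocr.org/ocr', {
        method: 'POST',
        body: formData
      });

      // fetch only rejects on network errors, so check the HTTP status explicitly
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      const result = await response.json();
      results.push({
        filename: image.name,
        success: true,
        data: result
      });
    } catch (error) {
      // Record the failure and continue with the next image
      results.push({
        filename: image.name,
        success: false,
        error: error.message
      });
    }

    // Pause between requests to avoid hitting rate limits
    await new Promise(resolve => setTimeout(resolve, 200));
  }

  return results;
}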

Python Example

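import requests
import time

def batch_ocr(image_paths):
    results = []

    for path in image_paths:
        try:
            with open(path, 'rb') as f:
                response = requests.post(
                    'https://api.easyocr.org/ocr',
                    files={'file': f},
                    timeout=30  # seconds; avoids hanging on a stalled connection
                )
            # Treat 4xx/5xx responses as failures instead of parsing an error body
            response.raise_for_status()
            results.append({
                'filename': path,
                'success': True,
                'data': response.json()
            })
        except Exception as e:
            # Record the failure and continue with the next image
            results.append({
                'filename': path,
                'success': False,
                'error': str(e)
            })

        # Pause between requests to avoid hitting rate limits
        time.sleep(0.2)

    return results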

Method 2: Concurrent Request Processing

For faster processing, you can use concurrent requests to process multiple images simultaneously. However, be careful to control the concurrency level to avoid triggering API rate limits.

JavaScript Concurrent Example

async function batchOCRConcurrent(images, concurrency = 3) {
  const results = [];

  // Process images in fixed-size batches; each batch waits for its slowest request
  for (let i = 0; i < images.length; i += concurrency) {
    const batch = images.slice(i, i + concurrency);

    const promises = batch.map(async (image) => {
      const formData = new FormData();
      formData.append('file', image);

      try {
        const response = await fetch('https://api.easyocr.org/ocr', {
          method: 'POST',
          body: formData
        });

        // fetch only rejects on network errors, so check the HTTP status explicitly
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }

        return {
          filename: image.name,
          success: true,
          data: await response.json()
        };
      } catch (error) {
        return {
          filename: image.name,
          success: false,
          error: error.message
        };
      }
    });

    // Errors are caught per image, so Promise.all does not reject here
    const batchResults = await Promise.all(promises);
    results.push(...batchResults);

    // Pause between batches to avoid hitting rate limits
    if (i + concurrency < images.length) {
      await new Promise(resolve => setTimeout(resolve, 500));
    }
  }

  return results;
}
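The batch approach above waits for the slowest image in each batch before starting the next batch, so one slow upload leaves the other slots idle. A common alternative is a worker pool that keeps a fixed number of requests in flight at all times. The sketch below is a generic helper; the name `mapWithConcurrency` is illustrative, not part of any API, and it assumes the `worker` function catches its own errors and returns a result object, as the `batch.map` callback above does.

```javascript
// Run `worker` on every item with at most `limit` calls in flight at once.
// Results keep the same order as `items`, regardless of completion order.
// `worker` should catch its own errors; a rejection here rejects the whole call.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner claims the next unprocessed index until none remain.
  // No lock is needed: JavaScript runs this on one thread, so the check and
  // `nextIndex++` happen together before the runner yields at `await`.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `await mapWithConcurrency(images, 3, ocrOneImage)`, where `ocrOneImage` is a hypothetical function holding the fetch-and-catch logic from the `batch.map` callback in `batchOCRConcurrent`.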

Error Handling and Retry Mechanism

During batch processing, some requests may fail because of network issues or temporary server errors. Retrying failed requests with an increasing delay between attempts (exponential backoff) improves the overall success rate.

async function ocrWithRetry(image, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const formData = new FormData();
      formData.append('file', image);
      
      const response = await fetch('https://api.easyocr.org/ocr', {
        method: 'POST',
        body: formData
      });
      
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      
      return await response.json();
    } catch (error) {
      console.log(`Attempt ${attempt}/${maxRetries} failed: ${error.message}`);
      
      if (attempt === maxRetries) {
        throw error;
      }
      
      // Exponential backoff: double wait time each retry
      await new Promise(resolve => 
        setTimeout(resolve, 1000 * Math.pow(2, attempt - 1))
      );
    }
  }
}
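The function above retries every failure, including errors that will fail the same way on every attempt, such as a 400 response for a corrupt image. A refinement is to retry only transient failures (network errors, 408, 429, and 5xx responses) and to add random jitter to the backoff so that many clients do not retry in lockstep. The helpers below (`isRetryableStatus`, `backoffDelay`, `withRetry` are illustrative names) treat that status list as a common HTTP convention, not as documented behavior of this OCR API; check the API's own documentation for which errors it expects clients to retry.

```javascript
// Pause helper used between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Conventional "try again later" statuses: 408 timeout, 429 rate limited,
// and 5xx server errors. This list is an assumption, not documented API behavior.
function isRetryableStatus(status) {
  return status === 408 || status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff with full jitter: a random delay between 0 and
// baseMs * 2^(attempt - 1), capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

// makeRequest must start a fresh request on every call and return a
// Promise<Response>, e.g. () => fetch(url, { method: 'POST', body: formData }).
async function withRetry(makeRequest, { maxRetries = 3, baseMs = 1000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    let response;
    try {
      response = await makeRequest();
    } catch (networkError) {
      // fetch rejects only on network-level failures, which are usually transient
      if (attempt >= maxRetries) throw networkError;
      await sleep(backoffDelay(attempt, baseMs));
      continue;
    }

    if (response.ok) return response;

    // Give up at once on non-retryable statuses, or after the last attempt
    if (!isRetryableStatus(response.status) || attempt >= maxRetries) {
      throw new Error(`HTTP ${response.status}`);
    }
    await sleep(backoffDelay(attempt, baseMs));
  }
}
```

For example: `const response = await withRetry(() => fetch('https://api.easyocr.org/ocr', { method: 'POST', body: formData }));`. A 400 fails after one attempt, while a 503 is retried up to `maxRetries` times.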

Progress Tracking

When processing large numbers of images, providing progress feedback improves user experience.

async function batchOCRWithProgress(images, onProgress) {
  const results = [];
  const total = images.length;

  for (let i = 0; i < total; i++) {
    const image = images[i];

    // processImage is a placeholder for any single-image OCR function,
    // e.g. ocrWithRetry above wrapped in try/catch
    const result = await processImage(image);
    results.push(result);

    // Report progress after the image finishes, so 100% means all images are done
    onProgress({
      current: i + 1,
      total,
      percentage: Math.round(((i + 1) / total) * 100),
      lastFile: image.name
    });
  }

  return results;
}

// Usage example
batchOCRWithProgress(images, (progress) => {
  console.log(`Progress: ${progress.percentage}% (${progress.current}/${progress.total})`);
  console.log(`Finished: ${progress.lastFile}`);
}).then(results => console.log(`Done: ${results.length} images processed`));
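The example above is sequential, so the loop index doubles as a completion count. When images are processed concurrently (as in Method 2), they finish out of order, so it is simpler to count completions as they happen. A minimal sketch follows; `createProgressTracker` is an illustrative name, and the time-remaining estimate naively assumes each remaining image takes the average time observed so far.

```javascript
// Returns markDone(filename): call it each time an image finishes,
// in whatever order images complete. `now` is injectable for testing.
function createProgressTracker(total, onProgress, now = Date.now) {
  const startedAt = now();
  let completed = 0;

  return function markDone(filename) {
    completed++;
    const elapsedMs = now() - startedAt;
    // Naive ETA: average time per finished image times the images still pending
    const etaMs = Math.round((elapsedMs / completed) * (total - completed));

    onProgress({
      current: completed,
      total,
      percentage: Math.round((completed / total) * 100),
      lastFile: filename,
      etaMs
    });
  };
}
```

In a concurrent loop, call `markDone(image.name)` right after each request settles, whether it succeeded or failed.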

Performance Optimization Tips

1. Image Preprocessing

Compressing and optimizing images before upload can reduce transfer time:

  • Downscale large images (a width of at most 2000px is recommended)
  • Save as JPEG at 80-90% quality
  • Crop out areas that don't need recognition
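A browser-side sketch of the first two points follows. It assumes images arrive as `File` objects (for example from an `<input type="file">`); the function names and the 0.85 quality default are illustrative choices within the ranges above. Cropping is left out because it depends on knowing which region to keep.

```javascript
// Compute an output size whose width is at most maxWidth, keeping the aspect ratio.
// Images that are already narrow enough keep their original size.
function fitToMaxWidth(width, height, maxWidth = 2000) {
  if (width <= maxWidth) return { width, height };
  return { width: maxWidth, height: Math.round((height * maxWidth) / width) };
}

// Browser-only: downscale an image File and re-encode it as JPEG before upload.
// Uses createImageBitmap and OffscreenCanvas, which Node.js does not provide.
async function compressForOCR(file, maxWidth = 2000, quality = 0.85) {
  const bitmap = await createImageBitmap(file);
  const { width, height } = fitToMaxWidth(bitmap.width, bitmap.height, maxWidth);

  const canvas = new OffscreenCanvas(width, height);
  const ctx = canvas.getContext('2d');
  // JPEG has no transparency; paint white first so transparent areas don't turn black
  ctx.fillStyle = '#ffffff';
  ctx.fillRect(0, 0, width, height);
  ctx.drawImage(bitmap, 0, 0, width, height);
  bitmap.close(); // release the decoded image's memory

  const blob = await canvas.convertToBlob({ type: 'image/jpeg', quality });
  // Swap the extension so the filename matches the new format
  const name = file.name.replace(/\.[^.]+$/, '') + '.jpg';
  return new File([blob], name, { type: 'image/jpeg' });
}
```

The returned `File` can be appended to `FormData` in place of the original image.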

2. Set Appropriate Concurrency

  • Too low: Processing is slow
  • Too high: Requests may hit rate limits or overload the server
  • Recommended: 3-5 concurrent requests
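Fixed sleeps, as in the earlier examples, space out requests only within a single loop. With several concurrent workers, a shared throttle can enforce a minimum interval between request starts across all of them. A minimal sketch; the `createThrottle` name is illustrative, and the 200ms default follows this article's own recommendation rather than any documented API limit.

```javascript
// Returns waitTurn(): await it before each request. Start times are spaced at
// least intervalMs apart, even when many concurrent tasks call it together.
function createThrottle(intervalMs = 200) {
  let nextSlot = 0; // earliest timestamp (ms) the next request may start

  return async function waitTurn() {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    // Reserve this slot synchronously, before awaiting, so concurrent
    // callers each get a distinct slot
    nextSlot = startAt + intervalMs;

    const delay = startAt - now;
    if (delay > 0) {
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  };
}
```

Create one throttle per batch and call `await waitTurn()` at the top of each concurrent task, just before `fetch`.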

3. Implement Checkpoint Resume

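For large batch jobs, save progress as you go so that an interrupted run can resume where it stopped instead of starting over. In the browser, localStorage works for this:

// Browser only: save the IDs of processed images to localStorage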
function saveProgress(processedIds) {
  localStorage.setItem('ocr_progress', JSON.stringify(processedIds));
}

// Restore progress
function loadProgress() {
  const saved = localStorage.getItem('ocr_progress');
  return saved ? JSON.parse(saved) : [];
}
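`localStorage` exists only in browsers. For a Node.js script, the same idea can be sketched with a JSON file on disk. The checkpoint path, the `getId` and `ocrOne` callbacks, and the function names are illustrative assumptions. Only successful images are marked as done, so failures are retried on the next run. The checkpoint records which images finished, not their OCR output, so in practice you would also write each result to disk as it arrives.

```javascript
const fs = require('fs');

// Read the set of finished IDs; a missing file just means this is the first run.
function loadCheckpoint(path) {
  try {
    return new Set(JSON.parse(fs.readFileSync(path, 'utf8')));
  } catch (err) {
    if (err.code === 'ENOENT') return new Set();
    throw err;
  }
}

// Write to a temp file and rename it over the old checkpoint, so a crash
// mid-write leaves the previous checkpoint intact instead of a truncated file.
function saveCheckpoint(path, doneIds) {
  const tmpPath = `${path}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify([...doneIds]));
  fs.renameSync(tmpPath, path);
}

// getId(item) returns a stable unique ID (e.g. the file path);
// ocrOne(item) returns { success, ... } without throwing.
async function resumableBatch(items, getId, ocrOne, checkpointPath) {
  const done = loadCheckpoint(checkpointPath);
  const results = [];

  for (const item of items) {
    const id = getId(item);
    if (done.has(id)) continue; // finished in an earlier run

    const result = await ocrOne(item);
    results.push(result);

    // Only successes are checkpointed, so failures are retried next run
    if (result.success) {
      done.add(id);
      saveCheckpoint(checkpointPath, done);
    }
  }

  return results;
}
```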

Important Notes

  • Follow API usage guidelines: Don't send requests too frequently
  • Space out requests: Wait at least 200ms between requests
  • Handle errors: Network requests can fail, so catch and record each failure instead of stopping the whole batch
  • Monitor processing status: Record success and failure counts for troubleshooting
  • Watch memory usage: Loading many images at once can exhaust memory; read each file only when it is about to be processed
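To act on the monitoring advice, the result objects built by the batch functions in this article (`{ filename, success, data }` or `{ filename, success, error }`) can be tallied after a run. A small sketch; `summarizeResults` is an illustrative name.

```javascript
// Tally a batch run and collect the failures for logging or a second pass.
function summarizeResults(results) {
  const failures = results.filter((r) => !r.success);
  return {
    total: results.length,
    succeeded: results.length - failures.length,
    failed: failures.length,
    failures: failures.map(({ filename, error }) => ({ filename, error }))
  };
}
```

The `failures` list gives the filenames to feed into a retry pass or to inspect by hand.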
