HTR with VLM

This handwritten text recognition (HTR) pipeline uses Florence-2, fine-tuned for text line detection and OCR. The pipeline has two steps:

  1. Detect text lines in the page image
  2. Recognize the text in each detected line
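
The two steps can be sketched as a small driver function. This is only an illustration, not the code this Space runs: the names `transcribe_page`, `crop`, and `LineResult` are hypothetical, the image is a plain list of pixel rows so the sketch needs no third-party libraries, and the detector and recognizer are passed in as callables standing in for the fine-tuned Florence-2 models. Sorting lines top to bottom is also an assumption about reading order.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]
# Stand-in image type: a sequence of pixel rows (a real app would likely use PIL or NumPy).
Image = Sequence[Sequence[int]]


@dataclass
class LineResult:
    box: Box   # where the line was found on the page
    text: str  # what the recognizer read from that crop


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the rectangle described by `box` out of the page image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def transcribe_page(
    image: Image,
    detect_lines: Callable[[Image], List[Box]],       # step 1: hypothetical detector
    recognize_line: Callable[[List[List[int]]], str],  # step 2: hypothetical recognizer
) -> List[LineResult]:
    """Detect text lines, then recognize each cropped line."""
    # Assumed reading order: top to bottom, then left to right.
    boxes = sorted(detect_lines(image), key=lambda b: (b[1], b[0]))
    return [LineResult(box, recognize_line(crop(image, box))) for box in boxes]
```

Keeping detection and recognition as separate callables means either stage can be swapped or tested with a stub without touching the other.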

This Space does not have access to a GPU. Inference on CPU is extremely slow, so I cached the results for the example images to disk. Some notes:

  • To view example outputs, select an image from the examples and set Use cached result to True. To transcribe an example from scratch, set it to False.
  • New images uploaded will be transcribed from scratch.
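
The caching behaviour described in these notes could look roughly like the sketch below. It is an assumption about how such a lookup might work, not the Space's actual code: the function name `get_transcription`, the one-JSON-file-per-image layout, and keying the cache by the image's file stem are all hypothetical. The slow pipeline is passed in as a `transcribe` callable.

```python
import json
from pathlib import Path
from typing import Callable, List


def get_transcription(
    image_path: Path,
    use_cached: bool,
    cache_dir: Path,
    transcribe: Callable[[Path], List[str]],  # hypothetical slow CPU pipeline
) -> List[str]:
    """Return the transcribed lines for an image, reading from disk when allowed."""
    # Assumed layout: results for "example1.png" live in "<cache_dir>/example1.json".
    cache_file = cache_dir / f"{image_path.stem}.json"
    if use_cached and cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # No cached result (for instance a newly uploaded image) or caching is
    # switched off: run the full pipeline from scratch.
    return transcribe(image_path)
```

With this shape, a new upload falls through to the full pipeline even when the cached option is selected, because no cache file exists for it.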