B.Öztürk
Artificial Intelligence · 2023 · 4 weeks · Completed

CV Parser API

A REST API that accepts PDF and DOCX CVs and parses them using LangChain + GPT-4. Output: clean JSON with name, contact info, work history, education, and skills.

Category: Artificial Intelligence

Year: 2023

Role: Backend Developer

Status: Completed


Problem

CV parsing modules in HR software are expensive and perform poorly on Turkish-language CV formats.

Solution

Custom prompt templates combined with LangChain's structured output feature parse both English and Turkish CVs with 94% accuracy. Parse results are cached in Redis.
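
To illustrate what type-safe parsing of the model's response can look like, here is a minimal sketch using only the Python standard library. It is not the project's LangChain schema, which is not shown on this page. The names `ParsedCV`, `WorkEntry`, and `parse_model_output`, and the JSON field names, are hypothetical and chosen to mirror the output fields listed in the summary.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkEntry:
    # Hypothetical shape of one work-history item.
    company: str
    title: str


@dataclass
class ParsedCV:
    # Hypothetical target schema: name, contact info, work history, education, skills.
    name: str
    email: str = ""
    phone: str = ""
    work_history: list = field(default_factory=list)
    education: list = field(default_factory=list)
    skills: list = field(default_factory=list)


def _string_list(data: dict, key: str) -> list:
    """Return data[key] if it is a list of strings (default: empty list)."""
    value = data.get(key, [])
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"'{key}' must be a list of strings")
    return value


def parse_model_output(raw: str) -> ParsedCV:
    """Convert the model's JSON text into a ParsedCV, rejecting malformed shapes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("missing or empty 'name'")

    raw_work = data.get("work_history", [])
    if not isinstance(raw_work, list):
        raise ValueError("'work_history' must be a list")
    work = []
    for item in raw_work:
        if (
            not isinstance(item, dict)
            or not isinstance(item.get("company"), str)
            or not isinstance(item.get("title"), str)
        ):
            raise ValueError("each work_history item needs string 'company' and 'title'")
        work.append(WorkEntry(company=item["company"], title=item["title"]))

    return ParsedCV(
        name=name.strip(),
        email=str(data.get("email", "")),
        phone=str(data.get("phone", "")),
        work_history=work,
        education=_string_list(data, "education"),
        skills=_string_list(data, "skills"),
    )
```

Rejecting a malformed response at this boundary means downstream consumers only ever see a fully populated `ParsedCV`, which is the property a structured-output layer is meant to provide.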

Outcomes

94% parse accuracy across Turkish and English CVs

Average processing time: 2.3 seconds

2 production integrations with HR software

Technical Challenges

01. Robust text extraction across wildly different CV layouts

02. Type-safe parsing using LangChain structured output

03. Cost optimization with a Redis TTL caching strategy
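
The caching idea in the last challenge can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the key scheme (SHA-256 of the uploaded bytes), the 24-hour TTL, and the names `parse_with_cache`, `cache_key`, and `InMemoryCache` are all hypothetical. The cache argument only needs `get` and `setex` methods, which a redis-py client provides; the in-memory class is a stand-in so the sketch runs without a Redis server.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # assumed value; the real TTL is not stated


class InMemoryCache:
    """Stand-in exposing the two Redis commands used below: GET and SETEX."""

    def __init__(self):
        self._store = {}  # key -> (expiry timestamp, value)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry has outlived its TTL: drop it and report a miss.
            del self._store[name]
            return None
        return value

    def setex(self, name, ttl_seconds, value):
        self._store[name] = (time.monotonic() + ttl_seconds, value)


def cache_key(file_bytes: bytes) -> str:
    """Identical uploads hash to the same key, whatever the filename."""
    return "cv:" + hashlib.sha256(file_bytes).hexdigest()


def parse_with_cache(file_bytes, cache, parse_fn, ttl=CACHE_TTL_SECONDS):
    """Return the cached parse result if present; otherwise parse and store it."""
    key = cache_key(file_bytes)
    cached = cache.get(key)
    if cached is not None:
        # json.loads accepts both str and bytes, so a Redis client that
        # returns bytes works here too.
        return json.loads(cached)
    result = parse_fn(file_bytes)  # the expensive LLM call happens only on a miss
    cache.setex(key, ttl, json.dumps(result, ensure_ascii=False))
    return result
```

Keying on file content rather than filename means a re-uploaded CV skips the model call, and the TTL bounds how long a stale result can be served.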

Tech Stack

FastAPI: API framework

LangChain: LLM orchestration

GPT-4: Parse engine

PyMuPDF: PDF text extraction

python-docx: Word file reading

Redis: Caching
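
Because the stack uses a different extraction library for each file type, an upload has to be routed to the right one. The sketch below shows one way to decide by file content rather than by extension, using only the standard library. The function name `detect_cv_format` is hypothetical, and the actual PyMuPDF and python-docx extraction calls are deliberately left out since the project's extraction code is not shown here.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"       # PDF files start with this header
ZIP_MAGIC = b"PK\x03\x04"  # DOCX files are ZIP archives


def detect_cv_format(file_bytes: bytes) -> str:
    """Return "pdf" or "docx" based on file content; raise ValueError otherwise."""
    if file_bytes.startswith(PDF_MAGIC):
        return "pdf"
    if file_bytes.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
                # Word documents conventionally store their body in this entry;
                # other ZIP-based formats (XLSX, plain ZIP) are rejected.
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass
    raise ValueError("unsupported file type: expected PDF or DOCX")
```

An API handler could call this before extraction and return a client error for anything that is neither format, so that mislabeled uploads never reach the model.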

Tags

FastAPI, Python, LangChain, Redis
