RIS 2025

Urban Pleša | Apr 6, 2025

The Problem

We had to build a model that would extract specific information from medical reports, information needed to create medical records. The medical reports were written by doctors in standard conversational language, while a medical record needs only the type of surgery, the size of artefacts, their location, their number, etc. In the first round we had to extract 10 different variables, and in the second round 22.

How were we doing?

We were doing well! Our program solved the test cases in under 20 minutes, achieving at least 75% accuracy. And that’s not all: we hadn’t even fully trained our model (due to lack of time), so imagine what it could achieve once properly trained!

How does the program work?

Instead of feeding the model the full dataset and asking it to output all variables at once, we opted for a more efficient approach: we give it a snippet of data and ask for one variable at a time. With 10 variables, this means the same snippet is sent to the model 10 times to extract all the information we need.
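A minimal sketch of the one-variable-at-a-time loop, assuming a generic `ask_model` callable that sends a prompt to the LLM and returns its text answer. The variable names and the English question wording are hypothetical stand-ins: the post does not publish its prompts, and the real reports (and presumably prompts) were in Slovenian.

```python
from typing import Callable

# Hypothetical variables; the real competition used 10 (round 1) and 22 (round 2).
QUESTIONS = {
    "surgery_type": "What type of surgery is described?",
    "artefact_size": "What is the size of the artefact?",
    "location": "Where is the artefact located?",
    "artefact_count": "How many artefacts are there?",
}


def build_prompt(snippet: str, question: str) -> str:
    """Combine one report snippet with exactly one question."""
    return (
        "Read the following excerpt from a medical report and answer the question.\n"
        "Answer briefly; reply 'unknown' if the information is missing.\n\n"
        f"Report:\n{snippet}\n\nQuestion: {question}\nAnswer:"
    )


def extract_record(snippet: str, ask_model: Callable[[str], str]) -> dict[str, str]:
    """Query the model once per variable, reusing the same snippet every time."""
    return {
        name: ask_model(build_prompt(snippet, question)).strip()
        for name, question in QUESTIONS.items()
    }
```

Asking one question per call keeps each prompt short and focused, at the cost of one model call per variable per snippet.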

The most critical factor is the model we use. A good model:

  • Understands Slovenian,
  • Is large enough to not be stupid,
  • Is small enough to not be slow or overly demanding for the given hardware.
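To put "small enough" into numbers, a common rule of thumb is that the weights alone take about (number of parameters) × (bytes per parameter). This sketch uses the nominal sizes from the model names (7B and 14B) and the standard byte widths of each data type; the actual parameter counts are slightly higher, and inference needs additional memory for activations and the KV cache.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate memory for the model weights only, in gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n in [("Qwen2.5-7B-Instruct", 7e9), ("Qwen2.5-14B-Instruct", 14e9)]:
        print(name, {d: round(weight_memory_gb(n, d), 1) for d in BYTES_PER_PARAM})
```

For example, `weight_memory_gb(7e9, "bf16")` gives 14.0 GB and `weight_memory_gb(14e9, "bf16")` gives 28.0 GB, so at the same precision the 14B model needs roughly twice the GPU memory of the 7B one.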

After extensive testing, which filled 1 TB of supercomputer storage, the best-performing models turned out to be the Qwen models (developed by Alibaba in China). We were surprised by Qwen’s capabilities and, conversely, disappointed by those of DeepSeek (another Chinese model), even though it is a comparable model. The top performers were Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct, so we decided to use both.

The model isn’t magic: if you feed it garbage, it’ll return even worse garbage. More than with the model itself, we experimented with prompts (instructions) to refine the answers. This step-by-step tweaking led us to our current accuracy. With more time, we could likely improve it by several more percentage points.
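One way to make prompt tweaking systematic is to score each prompt variant on a small set of hand-labelled report excerpts and keep the best one. This is a hypothetical sketch, not the authors' tooling: the `ask_model` callable, the `{report}` placeholder and exact-match scoring are all assumptions.

```python
from typing import Callable

Example = tuple[str, str]  # (report excerpt, gold answer)


def accuracy(template: str, examples: list[Example],
             ask_model: Callable[[str], str]) -> float:
    """Fraction of examples where the model's answer matches the gold value
    (ignoring case and surrounding whitespace)."""
    if not examples:
        return 0.0
    hits = sum(
        ask_model(template.format(report=report)).strip().lower() == gold.strip().lower()
        for report, gold in examples
    )
    return hits / len(examples)


def best_prompt(templates: list[str], examples: list[Example],
                ask_model: Callable[[str], str]) -> tuple[str, dict[str, float]]:
    """Score every template and return the best one together with all scores."""
    scores = {t: accuracy(t, examples, ask_model) for t in templates}
    return max(scores, key=scores.get), scores
```

Exact matching is strict; for free-text variables a looser comparison (for example, after normalising numbers and units) would usually be fairer.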

We also applied some pre- and post-processing techniques to boost otherwise slightly weaker results, though that’s a story for another time. (Statistics!…)
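The post does not say which pre- and post-processing techniques were used, so the following is only an illustration of the kind of step that can help: normalising a free-text size answer to millimetres (accepting the decimal comma common in Slovenian text) and, as a simple statistical fallback, replacing missing answers with the most frequent value seen in training data. All names, units and rules here are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical rule: a number (with "." or "," as decimal separator) followed by mm or cm.
SIZE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def normalise_size_mm(answer: str) -> Optional[float]:
    """Pull the first size out of a free-text answer and convert it to millimetres."""
    m = SIZE_RE.search(answer)
    if m is None:
        return None
    value = float(m.group(1).replace(",", "."))  # "2,5" -> 2.5
    return value * 10 if m.group(2).lower() == "cm" else value


def fill_missing(values: list, training_values: list) -> list:
    """Replace None entries with the most frequent value from the training data."""
    fallback = Counter(training_values).most_common(1)[0][0]
    return [fallback if v is None else v for v in values]
```

For instance, `normalise_size_mm("Velikost 2,5 cm")` returns 25.0, while an answer with no recognisable size returns `None` and can then be handled by `fill_missing`.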

Results

We won!!!