Iris.ai is ML-driven software that helps scientists all over the world throughout their work. We are building a Science Assistant Machine that can read and understand scientific text and give you the full picture of the current state of the art in a human-understandable way.
Our AI facilitates our clients’ research and development process with the help of:
- Text summarisation
- Topic filter/analysis
- Data extraction
- Chatbot
- RAG-as-a-Service
Research Projects for this internship program:
- "Evaluating AI-generated summaries". This research project focuses on developing and refining methods to evaluate the quality of AI-generated document summaries produced by large language models (LLMs). The goal is to identify, implement, and validate metrics for assessing the factuality, completeness, coverage, and readability of summaries, addressing challenges with long and complex documents. Deliverables include a review of existing evaluation methods, Python implementations of metrics, and an analysis of their performance across various datasets.
- "Fine-tuning LLMs on domain-specific data". This research project explores the impact of fine-tuning large language models (LLMs) on domain-specific data to enhance the performance of Retrieval-Augmented Generation (RAG) systems. The study aims to determine whether fine-tuned models outperform larger general-purpose models in specialized domains and to investigate methods for generating synthetic data for fine-tuning. Deliverables include a domain-specific fine-tuning dataset, a Python framework for model fine-tuning, and a performance analysis of fine-tuned models across standard and RAG-specific benchmarks.
- "Training a Computer Vision model for symbol recognition". This project focuses on enhancing Iris.ai’s OCR capabilities to accurately recognize and interpret complex scientific symbols (e.g., mathematical and physical notations) from research documents. It involves creating a specialized dataset, training a computer vision model, and integrating the solution into the document parsing pipeline to overcome current OCR limitations, ensuring efficient and accurate information extraction. Key deliverables include a labeled dataset, a trained model, preprocessing methods, and a report for future development guidance.
- "Implementing Podcast Generation from Datasets". This project aims to generate podcast-style audio summaries from RSpace datasets, integrating open-source models with existing tools. Tasks include script generation, content retrieval, text-to-speech evaluation, and pipeline development. The goal is to deliver a functional prototype and a performance report, creating engaging, accurate audio content tailored to scientific users.
- "Developing a framework for automatic prompt optimization". This project aims to develop an adaptive framework for automatic prompt optimization to improve Iris.ai’s use of large language models. Key tasks include reviewing methods, evaluating frameworks, designing a custom system, and testing in RAG scenarios. Deliverables include a comparative analysis, a custom framework, and guidelines for creating consistent, efficient prompts.
Internship Details:
- 3 months long
- full-time (40 hours a week)
- fully remote
- insightful, deeply practical and progressive internship program
Requirements:
- Being a Bachelor's/Master's student majoring in computer science
- Seeking a career in Machine Learning/NLP
- Interest in and some knowledge of NLP
- Some experience in Python development is mandatory
- Some knowledge and experience in Machine Learning
- Interest in Iris.ai’s research projects and domain
- Ability to work full-time for the period of the internship
- Located in a European time zone (within +/-2 hours of CET)
If you think you match the profile, please send us your CV in English. We appreciate everyone’s effort; however, we will only contact the candidates who best meet the above requirements.