DCC | Introduction to the Degree Project (Introducción al Trabajo de Título)

Advisor	Matías Toro I. mtoro@dcc.uchile.cl
Areas	Data science and engineering, Artificial intelligence
Subareas	Databases, Natural language processing
Status	Available

Description

Behavior-Preserving SQL Translation with LLMs

Recent work such as CrackSQL shows that Large Language Models can effectively translate SQL queries across database dialects, producing queries that are syntactically valid and compatible with the target engine (https://dl.acm.org/doi/10.1145/3725278). However, these translations focus mainly on syntax and feature compatibility, and largely ignore whether the runtime behavior of the query is preserved.

In practice, SQL engines differ significantly in how they handle types, implicit casts, and type errors. As shown in "Elucidating Type Conversions in SQL Engines", the same query can produce different results, fail at runtime, or be rejected statically depending on the engine, even when the syntax is valid (https://link.springer.com/chapter/10.1007/978-3-031-91118-7_16). Current LLM-based translators do not account for these semantic differences.
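As a small illustration of engine-specific implicit casts, the sketch below runs three queries on SQLite through Python's standard sqlite3 module. It shows only SQLite's behavior. The helper name run and the choice of queries are assumptions made for this example and do not come from the cited papers. Stricter engines are expected to reject the second query with a type error rather than return a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def run(sql):
    """Hypothetical helper: return ('ok', rows) or ('error', message)."""
    try:
        return ("ok", conn.execute(sql).fetchall())
    except sqlite3.Error as exc:
        return ("error", str(exc))

# Text that looks numeric is silently converted for arithmetic.
numeric_text = run("SELECT '1' + 1")    # ('ok', [(2,)])

# Non-numeric text is silently treated as 0 instead of raising an error.
non_numeric = run("SELECT 'abc' + 1")   # ('ok', [(1,)])

# Two literals carry no column affinity, so no conversion is applied
# and the integer is not considered equal to the text value.
comparison = run("SELECT 1 = '1'")      # ('ok', [(0,)])

for label, outcome in [("'1' + 1", numeric_text),
                       ("'abc' + 1", non_numeric),
                       ("1 = '1'", comparison)]:
    print(label, "->", outcome)
```

A translator that copies any of these expressions verbatim into an engine with different conversion rules may therefore change the result or turn a silent conversion into an error.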

This project explores how to improve LLM-based SQL translation by making it behavior-aware. The goal is to combine LLM translation with type-aware semantic checks, ensuring that translated queries not only run, but preserve the original query’s behavior, including results and error behavior.
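One way to read "preserving behavior" is as a differential check: run the original query on the source engine and the translation on the target engine, then compare both the rows and whether an error occurred. The sketch below is a minimal version of that idea, not the project's method. The names observe and preserves_behavior are hypothetical, and two in-memory SQLite connections stand in for the source and target engines so that the example stays runnable.

```python
import sqlite3
from typing import List, Optional, Tuple

# ("rows", result rows) on success, ("error", None) on failure.
Outcome = Tuple[str, Optional[List[tuple]]]

def observe(conn: sqlite3.Connection, sql: str) -> Outcome:
    """Run a query and summarize its observable behavior."""
    try:
        return ("rows", conn.execute(sql).fetchall())
    except sqlite3.Error:
        # Error messages differ across engines, so only the fact of failure is kept.
        return ("error", None)

def preserves_behavior(source: sqlite3.Connection, original_sql: str,
                       target: sqlite3.Connection, translated_sql: str) -> bool:
    """True when both queries yield the same rows or both fail."""
    return observe(source, original_sql) == observe(target, translated_sql)

# Assumption: two SQLite connections play the roles of source and target engines.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

original = "SELECT 1 = CAST('1' AS INTEGER)"

# A faithful translation keeps the explicit cast.
same_result = preserves_behavior(source, original, target,
                                 "SELECT 1 = CAST('1' AS INTEGER)")

# Dropping the cast changes the result on SQLite (1 becomes 0).
changed_result = preserves_behavior(source, original, target,
                                    "SELECT 1 = '1'")

# A translation that fails at runtime changes the error behavior.
changed_error = preserves_behavior(source, original, target,
                                   "SELECT 1 = CAST('1' AS INTEGER) FROM missing_table")

print(same_result, changed_result, changed_error)
```

Such a check only covers the inputs it is run on and compares rows in the order returned, which is why it would complement rather than replace type-aware reasoning.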

Possible directions include:

Detecting when a translation silently changes semantics due to implicit casts.
Guiding LLMs to insert explicit casts when needed.
Using formal typing insights to validate or reject translations.
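The directions above can be sketched on a toy expression language. The example below assigns static types to literals, flags operations whose operands have different types (places where an engine would have to cast implicitly), and rewrites them with an explicit cast. Every name in it (Lit, Cast, BinOp, type_of, implicit_casts, make_explicit, to_sql) is hypothetical, and the typing rules are deliberately simplistic. It is not the type system of any real engine or of the cited paper.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Lit:
    value: Union[int, str]

@dataclass(frozen=True)
class Cast:
    expr: "Expr"
    to: str  # "int" or "text"

@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "="
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Cast, BinOp]

def type_of(e: Expr) -> str:
    """Static type of a toy expression: 'int', 'text' or 'bool'."""
    if isinstance(e, Lit):
        return "int" if isinstance(e.value, int) else "text"
    if isinstance(e, Cast):
        return e.to
    # Simplification: "=" yields bool, any other operator yields int.
    return "bool" if e.op == "=" else "int"

def implicit_casts(e: Expr) -> List[BinOp]:
    """Collect every operation whose operands have different static types."""
    if isinstance(e, Lit):
        return []
    if isinstance(e, Cast):
        return implicit_casts(e.expr)
    found = implicit_casts(e.left) + implicit_casts(e.right)
    if type_of(e.left) != type_of(e.right):
        found.append(e)
    return found

def make_explicit(e: Expr) -> Expr:
    """Cast the text operand of mixed int/text operations to int explicitly."""
    if isinstance(e, Lit):
        return e
    if isinstance(e, Cast):
        return Cast(make_explicit(e.expr), e.to)
    left, right = make_explicit(e.left), make_explicit(e.right)
    if {type_of(left), type_of(right)} == {"int", "text"}:
        if type_of(left) == "text":
            left = Cast(left, "int")
        else:
            right = Cast(right, "int")
    return BinOp(e.op, left, right)

def to_sql(e: Expr) -> str:
    """Render a toy expression as a SQL fragment."""
    if isinstance(e, Lit):
        if isinstance(e.value, int):
            return str(e.value)
        return "'" + e.value.replace("'", "''") + "'"
    if isinstance(e, Cast):
        return f"CAST({to_sql(e.expr)} AS {'INTEGER' if e.to == 'int' else 'TEXT'})"
    return f"({to_sql(e.left)} {e.op} {to_sql(e.right)})"

query = BinOp("=", Lit(1), Lit("1"))
flagged = implicit_casts(query)   # the comparison mixes int and text
fixed = make_explicit(query)
print(to_sql(query), "->", to_sql(fixed))
```

In a real pipeline, a non-empty list of flagged operations could be used either to reject a translation or to prompt the LLM to emit the explicit cast itself. Which conversion preserves the source engine's behavior would have to come from that engine's actual rules.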

This project sits at the intersection of databases, programming languages, and AI, and is ideal for students interested in combining formal reasoning with modern LLM systems.