Notes from DL2026

Last week (June 24 to 26) I attended DL2026 in Tromsø. It was three days of talks about deep learning, with most of the speakers coming from academia. What follows are the notes I want to keep for myself. They concentrate on the electronic-health-records (EHR) and transformer talks nearest to my own work.

Topic 1: EHR

Ida Häggström: Forecasting in Medical Imaging

Of all the talks, this is the one that has stayed with me. Häggström articulated a limitation that I think is underappreciated in current EHR modelling: we almost invariably tie a prediction to a single, predetermined horizon. Typical targets are 30-day readmission, mortality at the next visit, or deterioration within 24 hours. In each case the prediction horizon is fixed in advance.

The alternative she advocated is multi-horizon forecasting, where the goal is to predict an entire trajectory across many future time points rather than a single endpoint. The conventional formulation learns

$$ p\bigl(y_{t+\Delta} \mid x_{\le t}\bigr) $$

for one fixed lead time $\Delta$. A multi-horizon model instead estimates the full set of horizons jointly,

$$ \bigl\{\, p\bigl(y_{t+h} \mid x_{\le t}\bigr) \,\bigr\}_{h=1}^{H}. $$

A single trained model can then be queried at arbitrary lead times, such as six months or two years ahead, without retraining a separate classifier for each window. This flexibility is clinically meaningful, since the questions a clinician poses are rarely fixed. This could be an interesting new direction for the EHR domain. The record would be treated as a trajectory to be forecast over a horizon, rather than a static feature vector feeding a single fixed-time classifier.
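
To make the formulation concrete for myself, here is a minimal sketch in Python (NumPy) of a multi-horizon output layer. It is not Häggström's model: the class `MultiHorizonHead`, the mean-over-time `summarise_history` encoder, the untrained random weights, and the monthly horizon grid are all my own illustrative assumptions. The only point is the output shape, where one forward pass yields a probability for every horizon $h = 1, \dots, H$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def summarise_history(events):
    """Stand-in encoder (assumption): average the history x_{<=t} over time.

    events has shape (batch, time, n_features).
    """
    return events.mean(axis=1)


class MultiHorizonHead:
    """Hypothetical toy head: one shared summary, one output per horizon."""

    def __init__(self, n_features, n_horizons, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained random weights, for illustration only.
        self.W = rng.normal(scale=0.1, size=(n_features, n_horizons))
        self.b = np.zeros(n_horizons)

    def predict(self, x_summary):
        # Returns shape (batch, H); column h-1 plays the role of
        # p(y_{t+h} | x_{<=t}) for h = 1..H.
        return sigmoid(x_summary @ self.W + self.b)


rng = np.random.default_rng(1)
history = rng.normal(size=(4, 10, 8))  # 4 patients, 10 time steps, 8 features
model = MultiHorizonHead(n_features=8, n_horizons=24)
probs = model.predict(summarise_history(history))

# Assuming one horizon step per month: query two lead times from one model.
six_months = probs[:, 5]   # h = 6
two_years = probs[:, 23]   # h = 24
```

Nothing in this sketch ties the horizons together; each column is an independent marginal, which matches the set notation above but does not enforce any consistency across lead times (for example, a cumulative risk that never decreases with $h$).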

Some papers I found to read in the future:

Mads Nielsen: CORE-BEHRT and the MEDS standard

Nielsen discussed two contributions I had been meaning to study more carefully. CORE-BEHRT studies BERT-based modelling of electronic health records through incremental optimisation, isolating how key design choices in data representation, technical components, and training procedure affect downstream performance. MEDS (the Medical Event Data Standard) is a lightweight, minimal schema for enabling machine learning over EHR data, designed for interoperability across datasets, existing tools, and model architectures.
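
A small sketch helps me picture how the two pieces fit together. As far as I understand the MEDS core schema, each row is a single event with a `subject_id`, a `time`, a `code`, and an optional `numeric_value`; treat those column names as my reading of the standard and check them against the specification. The codes, values, and the `to_sequences` helper are invented for illustration and are not part of MEDS or CORE-BEHRT.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event rows in a MEDS-like long format (one row per event).
events = [
    {"subject_id": 1, "time": datetime(2024, 3, 1), "code": "LAB//HBA1C", "numeric_value": 7.2},
    {"subject_id": 1, "time": datetime(2024, 1, 5), "code": "DIAG//E11", "numeric_value": None},
    {"subject_id": 2, "time": datetime(2024, 2, 10), "code": "MED//METFORMIN", "numeric_value": None},
]


def to_sequences(rows):
    """Group events by subject and order them in time.

    The result is a per-subject list of codes, the kind of token sequence
    a BERT-style EHR model would consume (numeric values are ignored here).
    """
    by_subject = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["subject_id"], r["time"])):
        by_subject[row["subject_id"]].append(row["code"])
    return dict(by_subject)


sequences = to_sequences(events)
```

The appeal of a long, minimal format is visible even here: turning the table into model input is a sort and a group-by, with no dataset-specific parsing.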

Some papers I found to read in the future:

Topic 2: Transformer

Adín Ramírez Rivera: Sparse Vision Models

This was the talk on tokenisation. Ramírez Rivera’s central observation is that vision transformers rest on a rigid assumption: the image is partitioned into a fixed grid of square patches, each treated as a token. That grid disregards the spatial and semantic structure of the image. His group’s work targets exactly this stage of the pipeline. SPiT uses modular superpixel tokenisation, decoupling tokenisation from feature extraction so that tokens follow image structure rather than a grid. SPoT (Subpixel Placement of Tokens) positions tokens continuously within the image instead of snapping them to patch cells, which allows the model to remain sparse where the image is uninformative. The point I am taking away is that tokenisation is itself a modelling decision and not a fixed preprocessing step. EHR sequences are themselves irregular and structured, so it is worth asking what the right token for them should be.
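
To see the difference between grid tokens and continuously placed tokens, I wrote a toy comparison in Python (NumPy). This is not the SPiT or SPoT implementation: `grid_tokens`, `bilinear`, and `placed_tokens` are hypothetical helpers, the image is single-channel, and bilinear interpolation is my own assumption for how to read pixel values at non-integer positions.

```python
import numpy as np


def grid_tokens(image, patch):
    """Standard ViT-style tokenisation: non-overlapping square patches."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    blocks = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(gh * gw, patch * patch)


def bilinear(image, ys, xs):
    """Sample the image at continuous (y, x) coordinates."""
    h, w = image.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def placed_tokens(image, centres, patch):
    """One token per (cy, cx) centre; centres need not lie on the grid."""
    offsets = np.arange(patch) - (patch - 1) / 2.0
    tokens = []
    for cy, cx in centres:
        ys, xs = np.meshgrid(cy + offsets, cx + offsets, indexing="ij")
        tokens.append(bilinear(image, ys, xs).ravel())
    return np.stack(tokens)


image = np.arange(64, dtype=float).reshape(8, 8)
grid = grid_tokens(image, patch=4)  # always 4 tokens for an 8x8 image
# Two tokens only: one at a grid-cell centre, one shifted by half a pixel.
placed = placed_tokens(image, centres=[(1.5, 1.5), (2.0, 2.0)], patch=4)
```

A token placed exactly at a cell centre reproduces the corresponding grid token, so the grid is a special case of continuous placement. The difference is that the number and location of tokens become free choices, which is what lets a model spend few tokens on uninformative regions.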

Some papers I found to read in the future:

Beyond the talks

Beyond the presentations, I joined the work session on AI Risks, Challenges and Key Opportunities. The discussion was led by Pierre Baldi and Robert Jenssen. Baldi laid out four broad challenges. There are nefarious uses of AI by bad actors. There is disruption to labour markets and the economy. There is the slow erosion of purpose, shared reality, and social connection. And there are outright existential threats.

One line of his stuck with me.

“AI is a tsunami, but in slow motion.”

The impact is huge, but it arrives slowly enough that it is easy to underestimate.

Much of the discussion then turned on the metaphor of an “AI telescope”. The question was whether ambitious AI should be pursued as shared scientific infrastructure rather than as fragmented effort. Slowing down research was dismissed as infeasible. The obstacles are familiar ones: cost and how to share it, the facility itself, data, leadership, the relationship to industry, safety, and public image.

The second half sketched a possible safety framework. The idea is simple. Since AI is inspired by natural intelligence, perhaps AI safety can borrow from what keeps natural intelligence safe. The analogy mapped each safety mechanism in natural intelligence onto an AI counterpart.

Natural intelligence safety	AI counterpart
Evolution	Modular architectures, safety modules
Examples (parents, teachers, principals, role models)	Supervised post-training, RLHF
Principles	Constitutional AI
Laws	AI laws
Societal structure	Agentic AI
Enforcement (police, polygraph tests)	Enforcement (fake detectors)
Enforcement (armies, weapons of mass destruction)	Enforcement (kill switches)

Seeing them side by side makes the comparison click. The analogy is not perfect, but it is a handy way to think about the problem.

Closing thoughts

DL2026 gave me a few things to keep thinking about. The EHR talks made me want to forecast over time, not just predict one fixed point. The transformer talks made me see tokens as a design choice. And the work session was a good reminder that the hardest parts of AI are not only technical. The next edition, DL2027, will be held in Vietnam.