
If you strip away all the buzzwords about enterprise artificial intelligence, such as “agentic AI,” the reality is that companies are learning what works in practice as they experiment with the technology, according to data tools giant Databricks.
“We’re still learning where the right places are to put AI, where you can get the sweet spot of AI to help you solve a problem,” said Databricks’s chief AI scientist, Jonathan Frankle, in a recent interview in New York.
A new kind of enterprise analytics
Databricks chief AI scientist Jonathan Frankle. (Image: Tiernan Ray)
On a basic level, generative AI, such as large language models, is making possible a new kind of enterprise analytics, said Frankle. Unstructured data, such as Word files, images, or videos, had no place in traditional data analytics, he noted. But now, it’s a goldmine.
“Imagine tons and tons of unstructured documents, which are really tricky to analyze in a pre-generative AI or pre-LLM world, and suddenly you can extract meaningful features from them,” he said. “Data that was useless in an analytics world is now incredibly valuable here.”
While many people fixate on generative AI taking over actual programming code, a much simpler use is to analyze a company’s existing computer code.
“All the documentation for all of the code at your company” was “not really that useful as a data source in 2015, but, in 2025, incredibly valuable […] just answering questions about your code for developers.”
Similarly, “You can imagine every single chat log from a customer service application with real humans, doing high-level analytics on that. What is the average number of interactions in a conversation? What is the average time to resolve an issue? Things that would not have been possible ten years ago.”
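The conversation-log analytics Frankle describes become simple arithmetic once the raw transcripts have been turned into structured records. A minimal sketch in Python, using made-up field names that stand in for whatever schema an extraction pipeline would actually produce:

```python
from datetime import datetime

# Hypothetical structured records extracted from raw chat transcripts;
# the field names are illustrative, not any real schema.
conversations = [
    {"turns": 6, "opened": "2025-01-06T09:00", "resolved": "2025-01-06T09:25"},
    {"turns": 10, "opened": "2025-01-06T11:10", "resolved": "2025-01-06T12:00"},
    {"turns": 4, "opened": "2025-01-07T14:30", "resolved": "2025-01-07T14:42"},
]

def minutes(start: str, end: str) -> float:
    """Elapsed minutes between two timestamps."""
    fmt = "%Y-%m-%dT%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# The two metrics Frankle mentions: average interactions per
# conversation, and average time to resolve an issue.
avg_turns = sum(c["turns"] for c in conversations) / len(conversations)
avg_resolution = sum(
    minutes(c["opened"], c["resolved"]) for c in conversations
) / len(conversations)

print(avg_turns)       # average interactions per conversation
print(avg_resolution)  # average minutes to resolution
```

The hard part, of course, is the extraction step that produces those records in the first place; the analytics on top of them are ordinary data work.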
The role of data is central in developing generative AI apps, said Frankle. Frankle came to Databricks when it bought the machine learning startup he was working for, MosaicML, in 2023. MosaicML focused on optimizing the infrastructure for running AI, whereas Databricks is one of the leading purveyors of data lakes and technology to move and shape data.
“The whole thesis for the acquisition was that we had one piece, Databricks had a lot of other pieces, and it made much more sense together,” said Frankle.
“You’re trying to deploy an AI customer service bot. What data is that customer service bot working off of?” Frankle explained. “It’s working off of customer information, it’s working off your documentation, it’s working off your SQL databases. That’s all on Databricks.”
From data to structure
Having the data together in Databricks is the beginning of creating the kinds of new analytics Frankle cites. While LLMs can make use of a pile of unstructured data, it doesn’t hurt to get a company’s data into some kind of structure beforehand.
“If you did the work in advance to use an LLM to pre-process that data into some kind of structured form, like SQL or JSON, you’re asking for less work on the part of the AI — you should always try to make things as easy as possible for the AI because these systems are definitely not infallible.”
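The pre-processing step Frankle describes can be sketched as a single extraction call followed by ordinary JSON parsing. In this sketch the LLM call is a hard-coded stub so the code runs without any API; the prompt wording and the output fields are illustrative assumptions, not any particular product’s schema:

```python
import json

def llm_extract(document: str) -> str:
    """Stand-in for a real LLM call. In practice you would prompt a model
    with something like 'return only JSON with fields customer, issue,
    priority' -- here the response is hard-coded so the sketch runs."""
    return '{"customer": "Acme Corp", "issue": "login failure", "priority": "high"}'

raw_ticket = "Hi, this is Acme Corp, we can't log in since this morning, urgent!"

# Parse the model's output into a structured record that downstream
# systems (SQL tables, dashboards) can consume directly.
record = json.loads(llm_extract(raw_ticket))
print(record["priority"])  # high
```

Doing this once, up front, means every later query runs against clean structured rows rather than asking a model to re-read the raw text each time.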
An important preparatory step is putting the data into what are called “embeddings.”
An “embedding model” is an AI model that is used to turn characters, words, or sentences into a vector — a group of numbers — that captures some of the semantic content of those characters, words, or sentences.
You can think of embeddings as numeric scores representing the relatedness of terms, such as “apple” to “fruit,” or “baby” to “human.”
Simple language models, even relatively small ones, such as Google’s BERT from 2018, can be used to make embeddings. “You don’t need huge models to get great embeddings,” said Frankle.
A lot of embedding models have been developed in the open-source community, noted Frankle, by adapting Meta Platforms’ Llama model via the process known as fine-tuning.
However, “You might need to train a custom embedding model,” given that existing ones are “built on web data,” making them very general.
In specific domains, such as healthcare, for example, a custom embedding model can find relationships between words and phrases better than a generic embedding model.
“We’re finding that customizing embedding models can lead to disproportionately good retrieval improvement,” said Frankle. “We think there’s still a lot of juice to squeeze out of just making them [embedding models] more specific to a domain.”
A well-developed embedding model is exceptionally important because “they will make the heavy lifting that’s done [by the large language model] a lot easier,” he said.
Multiple embedding models can also be chained together, said Frankle. That can allow an AI model used in, for example, document search, to narrow down from a large group of a hundred documents to just a handful that answer a query.
In addition to tuning an embedding model, how data is fed into it is its own area of expertise. “When you provide these documents to an embedding model, you usually don’t want to provide the whole document all at once,” he said.
“You often want to chunk it into pieces,” and how to do so optimally is also a matter of experimenting and trying approaches.
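A minimal chunking sketch shows the two knobs Frankle alludes to — chunk size and overlap. Overlapping chunks ensure a sentence that straddles a boundary appears whole in at least one chunk; the sizes here are tiny, illustrative values:

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks with overlap.
    Chunk size and overlap are exactly the knobs to experiment with;
    production systems often split on sentences or tokens instead."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = ("Databricks stores unstructured documents that generative AI "
       "can now mine for analytics.")
pieces = chunk(doc)
print(len(pieces))
print(pieces[0])
```

Each chunk is then embedded separately, so a retrieval query can surface the specific passage that answers it rather than the whole document.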
Frankle added that Databricks is “doing research on all of these topics because, in a lot of cases, we don’t think the state of the art is good enough,” including embeddings.
While a lot can be “plug and play” via Databricks, says Frankle, “the trickiest part is there’s still a lot of experimentation. There are a lot of knobs that need to be turned. Should you fine-tune, or should you not fine-tune? How many documents should you try to retrieve and put in the context? What is your chunk size?”
The question of what to build
Beyond the techniques, knowing what apps to build is itself a journey and something of a fishing expedition.
“I think the hardest part in AI is having confidence that this will work,” said Frankle. “If you came to me and said, ‘Here’s a problem in the healthcare space, here are the documents I have, do you think AI can do this?’ my answer would be, ‘Let’s find out.'”
From what Frankle is seeing with customers, “Applications that are getting into practice right now tend to look for things that are a little more open-ended,” he said — meaning what the AI model produces can be fuzzy, not necessarily specific. “AI is great at producing an answer, not always great at producing the answer,” he observed.
“With AI, you can do fuzzy things, you can do document understanding in ways that I could never write a Python program for,” Frankle explained.
“I also look for applications where it’s relatively expensive to come to an answer but relatively cheap to check the answer.” An example is the automatic generation of textual notes for a doctor from recordings of patient exams. “A draft set of patient notes can be generated, they [the doctor or doctor’s assistant] can check it, tweak a couple of things, and call it a day.” That’s a useful way to eliminate tedium, he said.
Conversely, “Applications where you need the right answer, and they’re hard to check” may be something to avoid for now. He gave the example of drafting a legal document. “If the AI misses one thing, the human now needs to go and review the whole document to make sure they didn’t miss anything else. So, what was the point of using the AI?” Frankle observed.
On the other hand, there is lots of potential for AI to do things such as take over grunt work for lawyers and paralegals and, as a result, broaden the access people have to lawyers.
“Suppose that AI could automate some of the most boring legal tasks that exist?” offered Frankle, whose parents are lawyers. “If you wanted an AI to help you do legal research, and help you ideate about how to solve a problem, or help you find relevant materials — phenomenal!”
“We’re still in very early days” of generative AI, “and so, kind of, we’re benefiting from the strengths, but we’re still learning how to mitigate the weaknesses.”
The path to AI apps
In the midst of uncertainty, Frankle is impressed with how customers have quickly traversed the learning curve. “Two or three years ago, there was a lot of explaining to customers what generative AI was,” he noted. “Now, when I talk to customers, they’re using vector databases.”
“These folks have a great intuition for where these things are succeeding and where they aren’t,” he said of Databricks customers.
Given that no company has an unlimited budget, Frankle advised starting with an initial prototype, so that investment only proceeds to the extent that it’s clear an AI app will provide value.
“It should be something you can throw together in a day using GPT-4, and a handful of documents you already have,” he offered. The developer can enlist “a couple random people from around the company who can tell you you’re on the right track here or not.”
For managers, Frankle advises making exploration of generative AI a part of the job on a regular basis.
“People are motivated,” such as data scientists, he noted. “It’s even less about the money and more about just giving them the time and saying, as part of your job responsibilities, take a couple weeks, do a two-day hackathon, and just go see what you can do. That’s really exciting for people.”
The motto in enterprise generative AI might be: from tiny acorns grow mighty oaks.
As Frankle put it, “The person who happens to have that GPU in their basement, and is playing with Llama, actually is very sophisticated, and could be the generative AI expert of tomorrow.”
(Except for the headline, this story has not been edited by PostX News and is published from a syndicated feed.)