How to extend Featuretools primitive functions to automatically create features from tabular text columns.

Image by Google. Diagram of a text-to-text framework. Every task uses text as input to the model, which is trained to generate some target text. This allows the same model, loss function, and hyperparameters across a diverse set of tasks, including translation (green), linguistic acceptability (red), sentence similarity (yellow), and document summarization (blue). See “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.”

Originally published on March 29, 2021.

In this article, we demonstrate how to featurize text in tabular data using Google’s state-of-the-art T5 Text-to-Text Transformer. You can follow along using the Jupyter Notebook from this repository.
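As a minimal sketch of the text-to-text idea, the snippet below shows how different tasks can be expressed as plain input strings by prepending a task prefix. The prefix strings follow the style described in the T5 paper, but treat them as assumptions to verify against whichever checkpoint you load. The helper names `to_text_input` and `similarity_input` are hypothetical and are not taken from the article's notebook.

```python
# Task prefixes in the style used by the T5 paper. Treat the exact strings as
# assumptions and check them against the checkpoint you actually load.
TASK_PREFIXES = {
    "translation": "translate English to German: ",
    "acceptability": "cola sentence: ",
    "summarization": "summarize: ",
}


def to_text_input(task: str, text: str) -> str:
    """Prepend a task prefix so that one model can serve many tasks."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"unknown task: {task!r}")
    # Collapse runs of whitespace so multi-line text becomes one input string.
    return TASK_PREFIXES[task] + " ".join(text.split())


def similarity_input(sentence1: str, sentence2: str) -> str:
    """Sentence similarity packs both sentences into a single input string."""
    return f"stsb sentence1: {sentence1} sentence2: {sentence2}"


if __name__ == "__main__":
    print(to_text_input("summarization", "Charming 3-bed home\nwith a new roof."))
    print(similarity_input("The house is large.", "The home is big."))
```

Because every task is reduced to a string in and a string out, the same model call can be reused for each task; only the prefix changes.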

When leveraging real-world data in a machine learning pipeline, it is common to come across written text. For example, when predicting real estate valuations there are many numerical features, such as:

  • “number of bedrooms”
  • “number of bathrooms”
  • “area in sqft”
  • “latitude”
  • “longitude”
  • etc.

But there are also large blobs of written text, such as those found in real estate…
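The mix of numeric columns and free text described above can be sketched as follows. This is only an illustration under stated assumptions: the column names (`bedrooms`, `sqft`, `description`) are hypothetical, and `keyword_score` is a deliberately simple placeholder for a text model such as a fine-tuned T5. It is not the article's method and does not use the Featuretools API.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, Union[float, str]]


def keyword_score(text: str) -> float:
    """Placeholder scorer (NOT T5): count a few hypothetical keywords."""
    keywords = {"renovated", "updated", "new"}
    words = (w.strip(".,!?") for w in text.lower().split())
    return float(sum(w in keywords for w in words))


def featurize_text_column(
    rows: List[Row],
    text_column: str,
    scorer: Callable[[str], float],
    feature_name: str,
) -> List[Row]:
    """Replace a free-text column with one numeric feature per row."""
    featurized = []
    for row in rows:
        # Keep every column except the raw text.
        new_row = {k: v for k, v in row.items() if k != text_column}
        # A missing text value is scored as an empty string.
        new_row[feature_name] = float(scorer(str(row.get(text_column, ""))))
        featurized.append(new_row)
    return featurized


if __name__ == "__main__":
    listings = [
        {"bedrooms": 3.0, "sqft": 1500.0,
         "description": "Newly renovated kitchen, new roof."},
        {"bedrooms": 2.0, "sqft": 900.0, "description": "Needs work."},
    ]
    for row in featurize_text_column(
        listings, "description", keyword_score, "description_score"
    ):
        print(row)
```

Passing the scorer in as a callable keeps the tabular plumbing separate from the text model, so the placeholder can later be swapped for a real model without changing the rest of the pipeline.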


Jan 29, 2020 | Mike Casale

The iBuyer revolution is underway, and it’s already reshaping how investors buy homes, how homeowners sell homes, how quickly real estate transactions happen, and the prices at which they happen. This raises many questions: Is this a good thing for homeowners? When is the best time to sell to an iBuyer? What types of homes are iBuyers looking to buy? Are iBuyers paying a fair value? Perhaps more generally, can we decode the artificial intelligence that is driving their buying decisions?

Our research uncovers the fundamentals of the largest national iBuyers in real estate today: Do iBuyers like Opendoor and Zillow…

“The alchemist knew the legend of Narcissus, a youth who knelt daily beside a lake to contemplate his own beauty. He was so fascinated by himself that, one morning, he fell into the lake and drowned.” — Paulo Coelho, The Alchemist

A Path Toward Automated Social Network Lie Detection Using Transfer Learning with Google’s T5 Unified Text-to-Text Transformer Model

Narcissus drowns while he contemplates the algorithmic reflection of his data. This is because in mathematical terms, a social network is less a connection with others than it is an algorithmic reflection of ourselves. These algorithmic mirrors reflect only the content we are most likely to see, to “Like”, and to “click” — and in this sense, the threat of Facebook is not so much a new terminator-like artificial intelligence, but rather it is the same old threat warned of in ancient myths like Narcissus.

Looking through the lens of a machine learning developer, the real problem with fake news…

“The Coronavirus’s Impact on Housing Is Now Nearly Nationwide” — Redfin

How AI Could Stabilize Real-Estate Prices During Coronavirus Panic

By Mike Casale | Mar. 22, 2020 | Data Science @

Most popular home valuations, like the Zestimate, use state-of-the-art machine learning technologies which, in recent years, have allowed ever more accurate predictions to become pervasive go-to reference points for both home buyers and sellers in residential real estate markets, and this could be good news for homeowners during the coronavirus panic.

“A couple of times a week, Nick Spencer checks the value of his four-bedroom house in Haddon Heights, N.J., on Zillow. …

Climate scientists have largely failed to gain the support of those they need the most — white men who own guns. The American citizen most likely to own a gun is a white male. Republicans are twice as likely as Democrats to be members of a gun-owning household. Political independents also are more likely than Democrats to have a firearm in their homes. White southerners are significantly more likely to have a gun at home than whites in other regions. As a group, Americans who have a gun at home see themselves differently than do other adults. According to the…

Artwork by Beau Stanton — Backfire: CARTOGRAPHY OF THE MACHINE (2014)


Libertarians claim that dollars are backed by nothing — yet, in some sense, it is indisputable that dollars are backed by future human labor, and more specifically, the taxation thereof. Given this — can artificial labor displace human labor respecting taxation?

In response to this, a case is made to answer in the affirmative within the context of the estimated $73 trillion (USD) needed to decarbonize the economy. In sections one through five (1–5), focus is given to justification for such a solution. In section six, a non-technical, pre-fabricated solution is provided using a de-abstracted, hypothetical “real-world” scenario. …

Mike Casale

AVM/Data Scientist | machine learning developer, start-ups, creating business intelligence tools and predictive analytics models. Climate action enthusiast.
