Using recurrent neural networks to segment customers

Understanding consumer segments is key to any successful business. Analytically, segmentations involve clustering a dataset to find groups of similar customers. What “similar” means is defined by the data that goes into the clustering — it could be demographic, attitudinal, or other characteristics. And the data that goes into the clustering is often limited by the clustering algorithms themselves — most require some kind of tabular data structure, and common techniques like k-Means require strictly numeric input. Breaking out of these restrictions has been one of our top priorities since starting the company.

So what do you do when you want to find segments of customers who are “similar” because they behave similarly, meaning their experience with your brand has followed a similar path? How would you define that? Increasingly, companies are collecting sequence data, with each entry representing an interaction with a customer: a purchase, an email read, a website visit, and so on. Given the success of deep learning techniques on sequence-related learning tasks, applying neural networks to customer segmentation seemed like the natural approach.

This post builds off of our previous customer journey segmentation post and demonstrates a prototype of a deep learning approach to behavior sequence segmentation. We wanted to investigate if we could leverage the internal state of a recurrent neural network (RNN) on complex sequences of data to identify distinctive customer segments.

Turns out that we can. And it works well.

Data description

Our client recorded a behavioral dataset in which each row is a customer interaction (receiving an email, opening an email, using the app, and so on), so a single user’s “sequence” looks like the table below. Note that each sequence can have a variable number of rows.

| User ID | Cancel | Sent Email | Open email | Click email | App used | Site visited | Days since last interaction |
|---------|--------|------------|------------|-------------|----------|--------------|-----------------------------|
| 1001    | 0      | 0          | 0          | 0           | 0        | 1            | 0                           |
| 1001    | 0      | 0          | 0          | 0           | 0        | 1            | 2                           |
| 1001    | 0      | 0          | 0          | 0           | 0        | 1            | 4                           |
| 1001    | 0      | 1          | 0          | 0           | 0        | 0            | 5                           |
| 1001    | 0      | 0          | 1          | 0           | 0        | 0            | 7                           |
| 1001    | 0      | 0          | 0          | 0           | 0        | 1            | 1                           |

Developing the Neural Network

We developed a very simple neural network architecture, described below. For this sample of customers, we knew whether or not they had churned by the time the data was collected, so our “X’s” were the sequences of customer behavior and our “Y’s” were 0/1 flags indicating whether the customer had churned.

The network therefore has a recurrent input layer, which can handle variable-length sequences, and a sigmoid output layer, which predicts the probability of churn. We included a dense layer in between to make the network more powerful and to generate the encodings.

 

| Layer     | Input dimension | Output dimension        |
|-----------|-----------------|-------------------------|
| Recurrent | Variable        | 10                      |
| Dense     | 10              | 10 (used for encoding)  |
| Sigmoid   | 10              | 1                       |

 

 

We used Keras (on R) to specify and train the network.

After training the network on the churn data, we used the weights from the Recurrent and Dense layers to produce a set of encodings for each user. After feeding in a user’s sequence, we get a ten-dimensional numeric encoding out:

| User ID | Encoding_1 | Encoding_2 | Encoding_3 | Encoding_4 | Encoding_5 |
|---------|------------|------------|------------|------------|------------|
| 1001    | 0          | 0          | 0.4        | 12.8       | 0.5        |
| 1002    | 0.1        | 1.3        | 0.9        | 14.7       | 141.0      |
| 1003    | 0.1        | 1.3        | 0.9        | 14.7       | 141.0      |
| 1004    | 0.1        | 1.3        | 0.9        | 14.7       | 141.0      |
| 1005    | 0.0        | 0.0        | 0.0        | 0.5        | 0          |
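To make this step concrete, here is a minimal sketch in Python/NumPy of how a trained recurrent-plus-dense stack turns a variable-length sequence into a fixed-length encoding. Our actual model was built with Keras from R; the weights, dimensions and activations below are random stand-ins for the trained values, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, hidden = 7, 10          # 7 columns per interaction, 10 recurrent units

# Random stand-ins for the weights learned during churn training
W_x = rng.normal(0, 0.1, (hidden, n_features))
W_h = rng.normal(0, 0.1, (hidden, hidden))
W_d = rng.normal(0, 0.1, (hidden, hidden))   # dense "encoding" layer

def encode(sequence):
    """Run a variable-length sequence of interaction rows through the
    recurrent layer, then the dense layer, to get a fixed-length encoding."""
    h = np.zeros(hidden)
    for row in sequence:                      # one recurrent step per interaction
        h = np.tanh(W_x @ row + W_h @ h)
    return np.maximum(0, W_d @ h)             # dense-layer output = the encoding

seq = rng.integers(0, 2, (6, n_features)).astype(float)  # 6 interactions
print(encode(seq).shape)                      # a 10-dimensional vector, regardless of length
```

However many rows a user’s sequence has, the output is always the same fixed-size vector, which is what makes the encodings clusterable.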

Clustering the RNN encodings

The encodings distill what the network has learned about each user’s behavior. Although the individual dimensions have no inherent meaning, we can feed them into a clustering algorithm to identify distinct segments, which is exactly what we did.

We decided to run DBSCAN on the encoded sequence data. DBSCAN had the advantage (in this case) of handling non-linearities in the data and of not requiring the number of clusters to be specified in advance. K-means performed similarly.
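In practice we used an off-the-shelf DBSCAN implementation; the stripped-down Python sketch below (with illustrative `eps` and `min_pts` values) shows the key property: clusters emerge from the density of the points, so the number of clusters never has to be specified.

```python
import math

def dbscan(points, eps=1.0, min_pts=3):
    """Naive DBSCAN: returns labels[i] = cluster id, or -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                    # noise (may be claimed by a cluster later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster           # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:        # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels

# Two well-separated groups of (toy 2-D) encodings plus one outlier
encodings = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5), (50, 50)]
print(dbscan(encodings))  # [0, 0, 0, 1, 1, 1, -1]
```

The outlier is labeled -1 (noise) rather than being forced into a cluster, another property that made DBSCAN attractive here.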

Results

The DBSCAN algorithm identified five distinct clusters with significant, and valuable, differences between them.

 

| Segment | Percentage of customers | Avg. E-mails Clicked | Avg. E-mails Opened | Avg. App Actions | Avg. Site Visits | Avg. Churn Date | Churn percentage |
|---------|-------------------------|----------------------|---------------------|------------------|------------------|-----------------|------------------|
| 1       | 0.3%                    | 2.11                 | 22.8                | 16.4             | 18.2             | 325             | 30.1%            |
| 2       | 34.5%                   | 1.13                 | 11.5                | 3.6              | 8.1              | 308             | 16.7%            |
| 3       | 59.5%                   | 0.3                  | 3.2                 | 0.1              | 2.9              | 88              | 98%              |
| 4       | 5.5%                    | 4.0                  | 27.0                | 89.5             | 16.5             | 337             | 0.1%             |
| 5       | 0.2%                    | 0.5                  | 2.0                 | 0.0              | 1.5              | 93              | 93%              |

 

Although the clusters are fairly imbalanced (likely an artifact of clustering encodings learned from a supervised churn objective), the number of days since the first interaction is clearly a strong driver in defining segments. The key takeaway is that the clusters with the highest churn rates have an interaction history of three months or less. This business absolutely must focus on getting customers through their first three months to reduce the likelihood of early churn.

Takeaways

  • Sequence data is increasingly being captured by brands, and methods for exploring it must be developed
  • Recurrent neural networks are an effective way of generating encodings for behavioral sequence data
  • Clustering the encodings (results of intermediate layers) of a neural network can be an effective way of peering inside the black box

We welcome any thoughts or comments you might have, and feel free to share this blog post with your friends and colleagues!

 

 

 

Gradient is now powered by windmills

Meet our newest team member, Stefan, from the Netherlands!

We can all relate to the thoughts of playing in the Major Leagues when we were a kid. It’s an indescribable feeling when you dream about hitting that near 100 mph fastball out of the stadium. This feeling became my reality as soon as I joined Gradient. The high-profile clients and the excellent deliverables, developed with meticulous care using state-of-the-art modeling and analysis techniques, make me feel like I’m batting in the MLB!

Feelings of uncertainty and doubt about the future are inextricably linked to being a recent graduate. However, after my first week at Gradient, these feelings vanished immediately. I’ve been taken in as if I was a long-lost son. At first, the transparency within the company was overwhelming, yet it is quickly becoming my favorite feature. It not only improves internal communication — it also makes me feel that, even after a week, I’m already a fully integrated employee.

Alright, enough about my feelings. Let’s talk about what my role as a Quantitative Analyst means for Gradient. But first, let me take you on my academic journey. I started my university adventure at Eindhoven University of Technology (TU/e). Wait, where? Right, I forgot to mention that I’m a Dutch citizen. I’ll be generating quantitative insights while sitting in a field of tulips wearing clogs. Anyway, I studied Innovation Sciences, a broad term for everything at the intersection of technology and psychology. Afterwards, I finished a more business-related master’s degree in Marketing & Management at Tilburg University (UvT).

As you might know, a data scientist — or quantitative analyst — operates at the intersection of statistics, business and computer science. All these things are right up my alley, and make me the utility player Gradient has been searching for. For each new client, I will immerse myself in their business and understand their underlying goals, motivations and opportunities.

Once we have developed a hypothesis, obtained the required data and transformed the information into a usable format, I’ll get cracking on developing statistically robust models. That’s where a lot of data scientists call it a day — not me though, and especially not Gradient. We will guide you through our development process and surface meaning from the analysis.

Each project will require me to learn new techniques and explore new research topics. Not a single day will be the same. One day I might be coding non-stop to finish a sprint, while on another I’ll be practicing a client presentation in front of the mirror. Variety drives me, be it in my work or in my personal life; I never want to live the same day twice.

Working for an international company like Gradient resonates deeply with me. The cultural dynamic is extremely interesting and provides a challenging yet rewarding workflow. On a more personal level, I enjoy learning about people’s backgrounds and cultural habits. Back in 2010, when I was 18, I went on a life-changing backpacking trip to Australia and New Zealand. An even bigger culture shock came when I did an internship in Dubai.

I’m adventurous by nature, and my passion for the outdoors might be an obsession: climbing in the Scottish Highlands, hiking through the Belgian Ardennes, camping in a French province or actually enjoying an outdoor bootcamp in my local town, ’s-Hertogenbosch. Besides the outdoors, I’ll be listening to music wherever I go. Spending hours on Spotify optimizing a playlist (accompanied, of course, by a metadata analysis) for every single occasion is one of my favorite pastimes.

Now it is time to start my latest adventure with Gradient.

 

 

 

Multi-state churn analysis with a subscription product

Subscriptions are no longer just for newspapers. The consumer product landscape, particularly among e-commerce firms, includes a bevy of subscription-based business models. Internet and mobile phone subscriptions are now commonplace and joining the ranks are dietary supplements, meals, clothing, cosmetics and personal grooming products.

Standard metrics to diagnose a healthy consumer-brand relationship typically include customer purchase frequency and ultimately, retention of the customer demonstrated by regular purchases. If a brand notices that a customer isn’t purchasing, it may consider targeting the customer with discount offers or deploying a tailored messaging campaign in the hope that the customer will return and not “churn”.

The churn diagnosis, however, becomes more complicated for subscription-based products, many of which offer multiple delivery frequencies and the ability to pause a subscription. Brands with subscription-based products need to have some reliable measure of churn propensity so they can further isolate the factors that lead to churn and preemptively identify at-risk customers.

This post shows how to analyze churn propensity for products with multiple states, such as different subscription cadences or a paused subscription.  

Unpacking our box

Assume we have an online subscription-based product that can be bought at set delivery intervals: monthly, quarterly and biannually. The customer also has the option to pause the subscription for any reason.

In our hypothetical example, a customer journey involves five states:

State 1:  Starts a subscription for the first time

State 2: Unsubscribes from receiving promotional emails from the brand

State 3: Pauses subscription because the supply from the previous delivery is not depleted

State 4: Unsubscribes from receiving promotional emails and pauses subscription (combination of States 2 and 3)

State 5: Cancels the subscription because no longer has a need for the product

Customer transition matrix

Like any relationship, the one between a brand and a customer passes through many states and phases. The transitions between states can be represented in a transition matrix; this transition matrix presents an example of transitions between states:

The plot below conveys the various transitions in graphical format. The corners represent states and each arrow represents the direction of a possible transition. Notably, it is possible to move to State 5, churn, from every other state. However, moving from State 1 to State 4 requires passing through State 2 or 3. This is just one hypothetical state structure for our subscription-based product.

Putting our data to use

Let’s also assume the brand has a variety of data points about each customer’s journey. Each customer’s file has:

  • A unique ID
  • The time since an event occurred within a state measured in the number of days (columns St2, St3…)
  • Occurrence of an event within said state (columns St2.s, St3.s…; 0: did not occur; 1: occurred)
  • Demographic and sales data such as year, age, discounts, gender

To gauge the number of customers in each state, we can simply calculate the frequency and proportion of transitions across the full customer base. Here we see that there are 533 churn events.

We can also see that among customers who started their journey in State 1, 640 moved to State 2, 777 to State 3, and 160 to State 5. Furthermore, 332 stayed in State 1, which is classified as a non-event.

The most probable transition from State 1 is to State 3, shown below as a proportion. We see that 46% of customers who were in State 3 end up in State 4, and of those, 25% end up churning.
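The frequency and proportion calculations themselves are straightforward. Here is a small sketch in Python; the journeys below are invented for illustration, not our client’s data:

```python
from collections import Counter

# Hypothetical per-customer state journeys (State 1 = new subscriber, 5 = churned)
journeys = [
    [1, 2, 5],
    [1, 3, 4, 5],
    [1, 3, 3, 4],
    [1, 1, 2, 4, 5],
    [1, 5],
]

# Count every observed state-to-state transition
counts = Counter()
for journey in journeys:
    for a, b in zip(journey, journey[1:]):
        counts[(a, b)] += 1

# Row-normalize to get the proportion of transitions out of each state
totals = Counter()
for (a, _), n in counts.items():
    totals[a] += n
proportions = {(a, b): n / totals[a] for (a, b), n in counts.items()}

print(counts[(1, 3)], round(proportions[(3, 4)], 2))  # 2 0.67
```

Reading off `counts` gives the transition frequencies, and `proportions` gives the row-normalized transition matrix used throughout this post.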

Building a time model

A common approach to modeling time-to-event data is the Cox proportional hazards (PH) model. It identifies the effect of several variables on the time a specified event takes to occur. In other words: what is the likelihood that a particular customer will experience a transition event (e.g., moving from State 1 to State 2)?

Don’t forget, however, that the subscription model has more than just an active and an inactive state — there are many possible states that need to be assessed for risk. With a Cox PH model, each transition (trans) is modeled separately and takes into account the time since entering a transition state.

R has a very useful engine for calculating separate statistics for each transition in a Cox PH model. We use a stratified Cox model where the strata are determined by the transition variable. This means we effectively fit eight separate models, one for each transition.

The following code models just time and does not (yet) include the impact of any other variables:

library(survival)

c0 <- coxph(Surv(Tstart, Tstop, status) ~ strata(trans),
            data = msdata_exp, method = "breslow")

To incorporate these statistics, we apply the workhorse msfit function from the mstate package, which calculates the probability of being in a state given the original state and the time spent in each subsequent state. This is the first approach we use when modeling churn across multiple states.

msfit(object = c0, vartype = "greenwood", trans = trans_mat)

So what did we find?

This next plot shows the cumulative hazard of a customer moving from one state to another with respect to time since the initial subscription began. Keep in mind, however, that this plot is an aggregate of all customers and does not show the impact of a specific variable, such as gender or product type, on the final hazard slope.

From this plot we see that at 1,000 days, customers who began their journey in State 1 have a 75% probability of having transitioned to State 2 and a 70% probability of having transitioned to State 3.

We can also explore the probabilities of a state-to-state transition by creating a probability matrix. This snippet shows the probabilities of customers who started in State 1 at the earliest days of their journey (days 0-6) and the end (days 4560 onward).

By day 5 of the customer journey, there is a 99% likelihood the customer will still be in State 1 and a less than 1% probability of being in State 3. As the journey progresses, however, there is only a 16% probability of still being in State 1 by day 4,787. Of crucial importance to the brand is the likelihood of moving to State 5, the end of the relationship: we see a 33% probability of this occurring.

Here’s one more way to visualize the same trend:

The distance between two adjacent curves represents the probability of being in the corresponding state with respect to time.

Adding more precision with covariates

A full model can be calculated with the workhorse coxph function by introducing stratification by transition and including all additional explanatory variables. From this model we can extract the relevant covariates that explain the likelihood of moving between states for a given demographic or behavioral variable.

In this table, a positive covariate indicates an increase in the hazard of moving from one state to an end state. In other words, the larger the covariate, the more exposed a customer is to the “risk” of transition.

With this model we can predict the distribution of states for a given time since beginning the subscription while taking gender, age or discounts into account. For example, imagine two customers with the following profiles:

Customer A

  • Discount: Yes
  • Gender: Female
  • Joined: 2013-2017
  • Age: Younger than 20

Customer B

  • Discount: No
  • Gender: Male
  • Joined: 2002-2007
  • Age: 20-40

Given these customer profiles, Customer B is more likely over the course of their journey with the brand to churn (State 5) or to pause the subscription and reject promotional emails (State 4). Comparatively, Customer A has a 30% likelihood of remaining an active client with no subscription pauses over the course of 10 years, while Customer B has a 20% likelihood.

Summary

Multistate models are common in business applications. They allow decision makers to see, visually, how states are distributed across the customer journey for a diverse variety of customer segments. Armed with this intelligence, brand decision makers can focus their outreach and acquisition efforts on customers that have a higher probability of remaining active for a longer period of time. The models also show where in the journey customers are vulnerable to churn — knowledge that can be used to implement a strategy to preemptively mitigate that vulnerability before it manifests.

Feature selection with the Boruta Algorithm

One of the most important steps in building a statistical model is deciding which data to include. With very large datasets and models that have a high computational cost, impressive efficiency can be realized by identifying the most (and least) useful features of a dataset prior to running a model. Feature selection is the process of identifying the features in a dataset that actually have an influence on the dependent variable.

High dimensionality of the explanatory variables can cause both high computation times and a risk of overfitting the data. Moreover, it’s difficult to interpret models with a high number of features. Ideally we would be able to select the significant features before performing statistical modeling. This reduces training time and makes it easier to interpret the results.

Some techniques to address the “curse of dimensionality” take the approach of creating new variables in a lower-dimensional space, such as Principal Component Analysis (Pearson 1901) or Singular Value Decomposition (Eckart and Young 1936). While these may be easier to run and more predictive than an un-transformed set of predictors, they can be very hard to interpret.

We’d rather — if possible — select from the original predictors, but only those that have an impact. There are a few sophisticated feature selection algorithms such as Boruta (Kursa and Rudnicki 2010), genetic algorithms (Kuhn and Johnson 2013, Aziz et al. 2013) or simulated annealing techniques (Khachaturyan, Semenovsovskaya, and Vainshtein 1981) that are well known but still carry a very high computational cost — sometimes measured in days on large datasets.

As genuinely curious, investigative minds, we wanted to explore how one of these methods, the Boruta algorithm, performed. Overall, we found that for small datasets, it is a very intuitive and beneficial method to model high dimensional data. Below follows a summary of our approach.

Why such a strange name?

Boruta comes from the mythological Slavic figure that embodies the spirit of the forest. In that spirit, the Boruta R package is based on ranger, which is a fast implementation of the random forests classification method.

How does it work?

We assume you have some knowledge of how Random Forests work—if not, this may be tough.

Let’s assume you have a target vector T (what you care about predicting) and a bunch of predictors P.

The Boruta algorithm starts by duplicating every variable in P — but instead of making a row-for-row copy, it permutes the order of the values within each column. In the copied columns (call them P’), there should therefore be no relationship between the values and the target vector.

Boruta then trains a Random Forest to predict T based on P and P’.

The algorithm then compares the variable importance score for each variable in P with that of its “shadow” in P’. If the distribution of variable importances is significantly greater in P than in P’, the Boruta algorithm considers that variable significant.
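The shadow-feature idea can be sketched in a few lines of Python. Note the stand-ins: to keep the sketch self-contained we use absolute correlation with the target as a toy importance measure in place of random-forest importances, and the feature names and data are invented:

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(7)
n = 300
signal = [random.gauss(0, 1) for _ in range(n)]
features = {
    "signal": signal,                                   # actually drives the target
    "noise_a": [random.gauss(0, 1) for _ in range(n)],  # irrelevant
    "noise_b": [random.gauss(0, 1) for _ in range(n)],  # irrelevant
}
target = [s + random.gauss(0, 0.3) for s in signal]

# Shadow features: same values, shuffled, so any "importance" is pure chance
shadows = {}
for name, col in features.items():
    shuffled = col[:]
    random.shuffle(shuffled)
    shadows["shadow_" + name] = shuffled

# Keep only features that beat the best-performing shadow
threshold = max(abs(pearson(col, target)) for col in shadows.values())
selected = [name for name, col in features.items()
            if abs(pearson(col, target)) > threshold]
print(selected)
```

Real Boruta iterates this comparison many times with fresh permutations and a statistical test, but the core trick, beating your own shuffled shadow, is exactly this.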

Application

The dataset of interest here was a set of records of doctors’ appointments for insurance-related matters, and the target variable was whether or not the patient showed up for their appointment. Part of our task was to find the most significant interactions, and with fifty jurisdictions and thirty doctor specialties, we already had a space of 1,500 potential interactions to search through — not including many other variables.

The set of features can be visualized by creating a set of boxplots for the variable importances for each potential feature.

The three red boxplots represent the distribution of minimum, mean and maximum scores of the randomly duplicated “shadow” variables. This is basically the range of variable importances that can be achieved through chance.

The blue bars are features that performed worse than the best “shadow” variable and should not be included in the model. Purple bars are features with the same explanatory power as the best “shadow” variable; their use in the model is at the discretion of the analyst. The green bars are variables with importances higher than the maximum “shadow” variable — and are therefore good predictors to include in a future classification model.

Code

show_mm <-
  model.matrix( ~ 0 +
                 `Doctor Specialty` + `Business Line` + 
                 Jurisdiction,
               data = show_df,
               contrasts.arg =
                 lapply(
                   show_df[, c('Doctor Specialty',
                               'Business Line',
                               'Jurisdiction')],
                   contrasts,
                   contrasts = FALSE
                  )
               )

show_mm_st <- cbind(status = show_df$`Appt Status`, show_mm)
show_mdf <- as.data.frame(show_mm_st)

library(Boruta)
b_model <- Boruta(status ~ ., data = show_mdf)

cat(getSelectedAttributes(b_model), sep = "\n")
# Doctor SpecialtyChiropractic Medicine
# Doctor SpecialtyNeurology
# Doctor SpecialtyNurse
# Doctor SpecialtyOrthopaedic Surgery
# Doctor SpecialtyOther
# Doctor SpecialtyRadiology
# Business LineDisability
# Business LineFirst Party Auto
# Business LineLiability
# Business LineOther
# Business LineThird Party Auto
# Business LineWorkers Comp
# JurisdictionCA
# JurisdictionFL
# JurisdictionMA
# JurisdictionNJ
# JurisdictionNY
# JurisdictionOR
# JurisdictionOther
# JurisdictionTX
# JurisdictionWA

## Importance plot 

plot(b_model, las = 2, cex.axis = 0.75)

 

Segmenting customers by their purchase histories using non-negative matrix factorization

Businesses often want to better understand their customers by segmenting them along a common set of attributes. In a previous post, we explored how to build segments based on customers’ trajectories of interactions with a brand. In this post, we’ll show how to build segments based purely on the products that customers have purchased; this approach has the added bonus of segmenting not just your customers, but your products too! And this isn’t just a strategic intelligence tool — finding groups of similar customers can lead to advanced recommendation systems personalized for each one of your customers.

How can one find groups of similar customers that purchase similar products? How do we define “similar” in an assortment of thousands (or tens of thousands) of SKUs? Flip the question on its head: how do we find products that are purchased by similar customers? How do we define similarity between customers?

As you’ll see—asking either one of those questions individually is a bit like asking which blade of the scissors cuts the paper. In a product segmentation, customers are said to be similar when they purchase from the same set of products; and products are similar when they are purchased by the same set of customers.

Ok—let’s dive into how it’s done.

We start with what we’re trying to explain, which is our observed customer-by-product matrix:

|       | SKU1 | SKU2 | SKU3 |
|-------|------|------|------|
| CUST1 | 0    | 1    | 1    |
| CUST2 | 0    | 0    | 1    |
| CUST3 | 1    | 0    | 0    |

 

To keep things really simple, we’re going to put a 1 in the cell if the customer has ever purchased that product, and 0 if they never have.
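Building that matrix from a raw purchase log takes only a few lines. Here is a sketch in Python with a hypothetical log that reproduces the table above:

```python
# Hypothetical purchase log: (customer, SKU) pairs, repeats allowed
purchases = [
    ("CUST1", "SKU2"), ("CUST1", "SKU3"), ("CUST1", "SKU3"),
    ("CUST2", "SKU3"),
    ("CUST3", "SKU1"),
]

customers = sorted({c for c, _ in purchases})
skus = sorted({s for _, s in purchases})
bought = set(purchases)   # repeat purchases collapse to "ever purchased"

# Binary customer-by-product matrix V
V = [[1 if (c, s) in bought else 0 for s in skus] for c in customers]
for c, row in zip(customers, V):
    print(c, row)
# CUST1 [0, 1, 1]
# CUST2 [0, 0, 1]
# CUST3 [1, 0, 0]
```

In a real project you would keep purchase counts (or spend) instead of a 0/1 flag; the factorization below works the same way as long as the entries are non-negative.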

Again, let’s remember that we’re trying to explain this data in terms of customers (rows) and products (columns) — sounds like we’re trying to split this matrix in two! In fact we are: the tool we use is called non-negative matrix factorization (NMF), a family of algorithms that approximate the original data matrix (call it V) by two other matrices (W and H) which, when multiplied together, approximately reconstruct the original matrix.

So, let’s say you have 10,000 customers and sell 1,000 products. Your customer-by-product matrix is going to be 10,000 rows and 1,000 columns. But, you could factor this matrix into a:

  • 10,000 row by 2[†] column matrix (W), and a
  • 2[†] row by 1,000 column matrix (H)

†2 is arbitrary—but is typically determined by trying a number of different options

This would mean that for each customer, you have two pieces of information that tell you what kinds of products they purchase (instead of 1,000); and for each product, you have two pieces of information that tell you which kinds of customers purchase them.

 

Link: https://en.wikipedia.org/wiki/Non-negative_matrix_factorization#/media/File:NMF.png License: https://creativecommons.org/licenses/by-sa/3.0/

 

This approach can be thought of as a multidimensional scaling algorithm that has better features than Principal Component Analysis as it is well defined for non-negative values in the data (and counts of purchases are always non-negative).

Working backwards, if you work out the sums, a single cell of the matrix that you arrive at when you multiply the customer- and product-segment matrices together is:

CS1 * PS1 + CS2 * PS2 + … + CS5 * PS5

Where CS1 is that customer’s score for segment 1, and PS1 is that product’s score for being in segment 1—and so on through to segment 5. So if a customer and product have high scores for the same segments, then our factorization is implying that this cell in the customer-by-product matrix has a high value.
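The factor-and-reconstruct idea above can be sketched with the classic Lee and Seung multiplicative updates. The Python/NumPy below uses a toy matrix and rank 2; all numbers are illustrative, and production work (like ours with the NMF package in R) involves many more choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy customer-by-product matrix V (5 customers x 4 products)
V = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

rank = 2                             # number of segments (chosen by trial in practice)
W = rng.random((V.shape[0], rank))   # customer-by-segment scores
H = rng.random((rank, V.shape[1]))   # segment-by-product scores

eps = 1e-9
start_err = np.linalg.norm(V - W @ H)
for _ in range(200):                 # Lee & Seung multiplicative updates
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
end_err = np.linalg.norm(V - W @ H)

print(start_err > end_err)           # reconstruction error shrinks
```

Because every update is a multiplication of non-negative quantities, W and H stay non-negative throughout, which is exactly what lets us read their entries as segment scores.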

Results

Visually, the results can be shown in the form of a heatmap, which shows each customer’s score by each segment (the charts below use the word “basis” in lieu of segment).

The dark entries (in column 2, for example) mean that those customers preferentially buy products from segment 2. And which products are those? Well, we have a corresponding heatmap for our product segments, that looks like this:

Now we have the two pieces of information together—the customers that tend to purchase similar products, and the products that tend to be purchased by similar customers.

With this information in hand, you can start using these scores as the basis for strategic decisions and marketing enhancements, like:

  • Developing product-based customer segments to build need-based personas
  • Deciding which products should be offered together as a bundle
  • Building a product recommendation engine that uses a customer’s segment to determine which products should be merchandised

Code Through

If you’re interested—check out our code-through here.

Challenges

Working through such an analysis, a common challenge is having a very sparse dataset. Typically there are many products, and customers tend to only purchase a few of them. This can usually be addressed by applying some expert knowledge to develop a hierarchy of information about the product—from brand, to style, to size (or SKU), and choosing the appropriate level of the hierarchy to use.

In addition, non-negative matrix factorization takes a number of options and parameters — there are a number of different algorithms, and choices to make for each. One needs to determine which loss function to use and how to initialize the matrices in the estimation process.

Not to mention, you have to choose the number of segments for your analysis—this is typically done by trying a range of possible segments and comparing how well they explain the data (by comparing their errors) and how well they perform for you, the analyst. Too many segments, and the information is hard to digest; too few, and you are not explaining the data well.

Finally, the computations are expensive and tricky. We use the NMF library in R, which is as performant and flexible as they come — but even so, we often encounter hard-to-diagnose errors.

Segmenting customer journeys

Segmentation is the art of understanding your customer base so that you may better serve them. It comes from the recognition that customers are diverse, and that to serve a large market effectively, you must understand how customers differ — and which differences are important.

One thing our clients often want to understand about their customer base is the customer journey: all the phases and key points of a customer’s lifecycle of interactions with the brand.

For example, some customers may start in-store and move exclusively online; others may only shop on mobile; still others may be truly omni-channel and buy online when they’re at work, in-store when they’re in the area, and on mobile when they’re on their commute. Some may become more loyal after redeeming a discount; others less so—and so on. But if you have thousands (or millions) of customers, how can you understand the behavior patterns across your customer base?

How can we do this? Sequence analysis — which, at its core, means analyzing a large set of sequences (one per customer) and drawing meaningful conclusions about which sequences are similar to or different from each other.

Continue reading “Segmenting customer journeys”

America is on the move

Wherever I travel, I ask myself if I could live wherever I’ve found myself. Is the food good? How are the parks? Are the people friendly? And most importantly, can I find a job I like and a decent place to live? Now that we are in prime summer vacation season, perhaps other Americans are asking the same question at their holiday destinations.

As the effects of the financial crisis are waning and housing prices have largely rebounded, Americans are again on the move. On top of this, many struggling cities are seeing population growth for the first time in decades and employers are following along.

We’re curious where and why Americans are packing up and moving. With some creative number crunching of Census Bureau migration data and beautiful mapping applications, we built a few tools to visualize migration patterns.

First, we tackle the simple matter of showing which cities are attracting more people than they are losing. Because cities are complex creatures and have an influence far beyond “downtown”, we used migration data from each of the 374 metropolitan areas (called “MSAs”) across the country to calculate the number of people that moved in and out to arrive at a net migration value.

The map below displays net domestic migration for each metro area. There are clear source and destination clusters. Check out the industrial Midwest and Northeast — massive out-migration. Meanwhile, parts of the South, Texas and the Pacific Northwest are attracting many new residents from other parts of the country. Don’t forget about Puerto Rico — the debt crisis and economic malaise are driving many Puerto Ricans to the mainland.

What surprises do you see? Columbus, Ohio, and Des Moines, Iowa, were surely not on my up-and-coming radar, although both have large, well-regarded public universities that attract the region’s best students and, critically, provide the opportunities and amenities to convince them to stay after graduation.

To me, the biggest surprise is the decline in large coastal cities in the Northeast and California. As a resident of Philadelphia and a frequent visitor to cities up and down the Northeast Corridor, LA, and SF, I see an energy on the streets and sky-high housing costs that seem to indicate an unstoppable resurgence.

So why are more people leaving large, established cities than are moving in? Well, one hypothesis could be that the sky-high prices and endless energy are deterring more people than they attract. Perhaps people who are moving to cities are wealthier and live in less-dense housing than residents who are leaving. This is definitely a question we will attempt to answer using predictive models and algorithms in the next phase of the project. (We’re really excited about this part!)

Having established which metro areas are growing or shrinking, we really wanted to know: “OK, so where are Americans moving from?” and the inverse: “Where are Americans moving to?” The interactive map below shows exactly that. Pick your favorite (or least favorite) metro area and select either Incoming or Outgoing migration. Any surprises?

Most metro areas aren’t attracting many people from far away; a long-distance move is certainly not the norm. There are a few exceptions, however. Take a look at the incoming migration to Atlanta: thousands of people are fleeing NYC, DC, LA, SF, and Chicago for Atlanta.


This is just the beginning of our look into Americans’ migration patterns. Over the next few weeks we will model the drivers of moving and embark on a deep dive into one metro area to identify prime business opportunities using a site suitability analysis.

So while enjoying your summer vacation, be sure to ask yourself if it’s somewhere you could live. You might end up moving there.


If you want to look under the hood, the source code is available here.

Our Principles

One thing all of us at Gradient agreed on was the need to become a principles-driven company. Why is this important? Principles don’t change, even if everything else does. After about a month of discussion, collaboration, and revision, our team came together and developed eleven principles, and we’re so proud of them that we decided to post them on our main website for everyone to see. Take a look below, and let us know what you think.

Gradient leaders…*

  1. Are honest and act with integrity to a fault: If we say something will be done, then it will be done. We conform our words to reality (honesty) and reality to our words (integrity).
  2. Do more with less: We are frugal and look for ways to avoid spending time and money where they aren’t needed.
  3. Think win-win: Success is not zero-sum. Our clients’, partners’, and vendors’ success is our success. We build credible, reliable, and honest relationships with every client, partner, and ally.
  4. Prove themselves wrong: We seek diverse perspectives, look for alternative hypotheses, investigate the details, and stress-test our analyses to ensure that we are right. We never assume we are right.
  5. Are obsessed with constant improvement: Individually, we are always looking to learn new techniques and develop valuable skill sets. We proactively seek feedback to improve our collective performance.
  6. Collaborate and communicate extremely well: We value team contribution over individual contribution. We are excellent team players who go the extra mile to make it easy to work with others.
  7. Deliver results, not work: We don’t value work, we value results. We are always moving toward delivering value to our clients.
  8. Take care of each other: We care for each other’s well-being and celebrate alternative perspectives.
  9. Self-manage: We take ownership of our work by prioritizing and organizing effectively with our colleagues while acting on behalf of the entire company.
  10. Investigate deeply: We are never satisfied with the first layer of understanding, or with fixing symptoms instead of underlying causes. We fix problems so they stay fixed.
  11. Are ambitious risk takers: We push the definition of normal by moving fast and pursuing new, unconventional solutions.

*and we’re all leaders in the company