Building analytics at Cooper Hewitt

Micah Walter, Independent, USA


At Cooper Hewitt, Smithsonian Design Museum, we've been collecting large amounts of data. This data has typically been in the form of Web analytics, ticketing, and CRM data, and more recently data generated by our interactive Pen. As a side project, we have attempted to build a new data analytics tool to allow our staff to look into all of our data sets at once and generate queries and reports in a more holistic and cohesive way by using a technique called "data warehousing."

Keywords: data, analytics

1. Introduction

Cooper Hewitt, Smithsonian Design Museum (CHSDM) is generating and collecting large amounts of data. This data, generated by the use of an interactive pen (the Pen), presents an opportunity for CHSDM to develop a deeper understanding of its visitors. To better understand the data, CHSDM has started to develop new analytics tools that allow its staff to look into all of its data sets at once, generating queries and reports in a more holistic and cohesive way by using a technique known as data warehousing.

This paper will use data collected and generated by CHSDM as a case study. At the time of this writing, The Pen has been issued to over 130,000 visitors and used to collect over three million objects in galleries and through interactive tables. Each time an object is collected, a record is created of the pen that was used to collect it, the time and date it was collected, and whether it was collected by tapping The Pen to a label near an object or through the use of an interactive table.
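The record described above can be pictured as a small data structure. The following is a minimal sketch only: the class and field names are hypothetical and the sample values are illustrative, not taken from CHSDM's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class CollectMethod(Enum):
    LABEL_TAP = "label"  # the Pen was tapped to a label near an object
    TABLE = "table"      # the object was collected at an interactive table


@dataclass(frozen=True)
class CollectEvent:
    pen_id: str             # hypothetical identifier for the Pen that was used
    object_id: int          # the object that was collected
    collected_at: datetime  # time and date of the collection, in UTC
    method: CollectMethod   # how the object was collected


# Illustrative values only.
event = CollectEvent(
    pen_id="pen-0001",
    object_id=18796987,
    collected_at=datetime(2016, 1, 16, 20, 0, tzinfo=timezone.utc),
    method=CollectMethod.LABEL_TAP,
)
```

Every report discussed later in this paper is, in effect, an aggregation over millions of records of roughly this shape.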

CHSDM knows which objects are the most popular at any given time, how long visitors spend in its galleries, and many other quantitative facts about visitor behavior it previously was unable to understand. The museum is beginning to develop a deep understanding around the ways its visitors behave with The Pen, and this paper will attempt to explain how it can continue to develop the tools necessary to dig even deeper.

2. Background

For just about as long as museums have been around, they have been in the business of collecting and storing data. Museums collect data about objects, people, and even time. One could argue that a museum is in a way a kind of data warehouse.

Initially, museums focused their efforts on understanding and making accessible their collection metadata. Today, museum collection metadata is pretty common, and more recently, museums have begun to take steps to make this metadata available to the public.

The publication of this data has allowed the public, students, scholars, and researchers easy access to the knowledge developed by the museum around its holdings. Building rich collections websites allows this knowledge to be more discoverable, offering new ways of learning.

But collection metadata is only one aspect of the kind of data that might be found in and around a place like a museum. Recently, museums have begun collecting and storing metadata about visits to the museum’s website, about ticket sales and attendance at public programs, and even about the temperature and humidity inside its galleries.

In parallel to this extensive work to generate and collect the data is the notion that there is some sort of underlying knowledge living within. The idea is that if institutions like museums are collecting all of this data, they will one day understand more about the museum and visitors, and eventually be able to be more successful as an institution.

In 2014, CHSDM reopened to the public after a long period of renovation and redesign. To coincide with the museum’s reopening, CHSDM launched a reimagined visitor experience, filled with technology and innovation and built from the ground up. One of the more interesting aspects of this new experience (The New Experience) at CHSDM is certainly The Pen. The Pen is an interactive device issued to each visitor allowing one to collect the things he or she finds interesting and store them on a personal Web page for retrieval at a later date.

There have been many writings, reviews, and discussions about The Pen and its implications within the context of a museum. This paper will address the behind-the-scenes aspects of The Pen, all of the data it and its related system generate and how these sources of data may be used to help the museum begin to ask questions around the behavior of its visitors.

3. Beginning analytics

Web traffic data

Analytics is a set of tools and methods that allows one to understand some aspect of a complex data set. Formally, analytics is the systematic or computational analysis of data or statistics (Google definition). It’s important to understand that analytics is not data, and data is not analytics.

For a long while, museums have been using tools such as Google Analytics to better understand the Internet traffic to their websites. Internet traffic to a museum’s website can offer insight into its audience and offerings, and can help museum staff to make informed decisions and develop a strategy for future efforts. Through the use of Google Analytics, a museum is able to easily understand who is coming to its website, what these Web visitors are interested in, and what they might find frustrating or confusing. Overall, Google Analytics is an incredibly powerful tool that has been used for years to uncover just one aspect of a museum’s operations.

One of the more interesting side effects of the use of Google Analytics is that it has raised the expectation about what is possible. The tool itself is very easy to use and after a short tutorial can prove very enlightening to even the least tech-savvy employee.

Collection data

Another major data set within a museum is its collection metadata. This is typically the data stored in collection databases such as The Museum System (TMS), a powerful collections management tool created by Gallery Systems. These types of databases were originally designed as repositories of information to help a museum store and locate objects in their collections. Collections management systems were originally meant to behave like inventory management applications, allowing a museum to track its vast collections and the movements of all of its objects and holdings.

Eventually, collection metadata became interesting in another way. Museums began to use it as a new way of exploring the shapes of their collections. While an individual collection record might tell the story of a single object in the collection, the total set of data becomes useful in telling the story of the museum’s collection as a whole.

One of the most popular analytics tools for exploring collection data is the museum’s collections website. Museums the world over have gone to great lengths to put their collections online, usually to give their visitors easier access. The end result is an extensive tool, similar to Google Analytics, which allows anyone to learn about the shape and purpose of a collection by dividing it up into browsable and searchable parts. At CHSDM, this idea of browsability and discovery is at the forefront of the work, oftentimes leading the creative direction across the entire visitor experience.

4. A system of parts

CHSDM has developed a complex system of parts and services that make the entire museum experience possible. Nearly every aspect of this system generates, collects, and stores some kind of data.

The Web servers powering the museum’s websites create log messages for every request, registrars record entries in TMS for every movement of an object, and ticketing systems record information about visitors and the tickets they purchase.

The Pen generates data each time a visitor receives one and uses it to collect an object, create a digital design, browse their collection at the digital tables, and finally when they return it to the visitor services staff. Each system stores this data for safekeeping and generates log messages to record the events that happen as they unfold. This data is all stored on a long list of servers and systems, each with its own complex configurations and formats.

5. Application programming interfaces

The nexus of all of the systems involved in the experience at CHSDM is its set of application programming interfaces (APIs). These APIs are the glue that holds the entire system together and allows each service and system to communicate with the others. Tap The Pen to a table, and the data on The Pen is transmitted through an API to a database. Every time that interaction happens, log messages are created and stored.

An API is a set of procedures within an application that can be accessed from another application. One can think of an API like an old telephone operator. The phone is used to send instructions, the operator receives these instructions, and as long as the instructions sent are readable, the operator carries out the call, connecting the caller to whomever they were trying to reach.

In the case of the Pen, CHSDM has developed several APIs that allow all necessary components to communicate, so that when a visitor taps their pen at one of the digital tables, the call gets through to its expected destination, and the visitor is instantly connected with the right data on the other end.

Every time this interaction happens, the API makes a record of it in a database. Every time a visitor docks their pen at one of the digital tables, there is a log recorded. Each time an object label is tapped with a Pen and the Pen is read by one of the reader boards, another API method is called, more data is stored, and a log recording this event is saved in a database.

CHSDM now has several APIs. There is an API that deals with the purchasing of tickets at the visitor experience desk and online. Once a ticket has been purchased, the visitor experience staff “pairs” your ticket with a Pen. This act is critical, as it makes sure our systems know which Pen the visitor was using. The pairing procedure calls both the ticketing API and the Pen API, and connects these two sets of data together by storing their respective IDs next to each other in both databases.

6. Foreign keys

CHSDM now has two separate databases (one having to do with tickets, and one having to do with The Pen) in two separate systems. The only connections between these two systems are the ID of the ticket and the ID of the Pen.

The data is stored in completely different physical locations and with completely different underlying systems (MySQL vs. MSSQL), but the connection between the two is there, and this means CHSDM can theoretically connect the two data sets together if it wants to.

In databases, this is called a Foreign Key, and it is one of the most common methods for creating relationships among data both inside the same database and between completely separate databases.
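To make the idea concrete, the sketch below follows a ticket ID from one database into another. It is an illustration under stated assumptions: two in-memory SQLite databases stand in for the separate ticketing and Pen systems, and every table name, column name, and value is hypothetical.

```python
import sqlite3

# Stand-in for the ticketing database.
tickets_db = sqlite3.connect(":memory:")
tickets_db.execute(
    "CREATE TABLE tickets (ticket_id TEXT PRIMARY KEY, pen_id TEXT, sold_on TEXT)"
)
tickets_db.execute("INSERT INTO tickets VALUES ('T-100', 'P-7', '2016-01-16')")

# Stand-in for the Pen database, which stores the same pair of IDs.
pen_db = sqlite3.connect(":memory:")
pen_db.execute(
    "CREATE TABLE pen_visits (pen_id TEXT, ticket_id TEXT, objects_collected INTEGER)"
)
pen_db.execute("INSERT INTO pen_visits VALUES ('P-7', 'T-100', 42)")


def visit_summary(ticket_id):
    """Look up a ticket, then follow its ID into the Pen database."""
    ticket = tickets_db.execute(
        "SELECT ticket_id, pen_id, sold_on FROM tickets WHERE ticket_id = ?",
        (ticket_id,),
    ).fetchone()
    if ticket is None:
        return None
    visit = pen_db.execute(
        "SELECT objects_collected FROM pen_visits WHERE ticket_id = ?",
        (ticket_id,),
    ).fetchone()
    return {
        "ticket_id": ticket[0],
        "pen_id": ticket[1],
        "sold_on": ticket[2],
        "objects_collected": visit[0] if visit else 0,
    }


summary = visit_summary("T-100")
```

Because the two systems cannot be joined in a single query, the join happens in application code, one lookup per database. This is the kind of friction a data warehouse is meant to remove.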

7. Log files

As mentioned previously, the systems and services at CHSDM create millions of log messages. These messages are written as files on disks and are generated for a wide variety of reasons, including the tracking of events happening with the Pen.

As an example, Diagram 01 illustrates a typical log message created when a visitor returns their pen at the end of their visit.

Diagram 01

The first message, which is collapsed in Diagram 01, is the “pen-return” log, indicating that the pen was returned and all the details about its safe return. The second message, also illustrated in Diagram 01, happened just before the “pen-return” and is an “item-collect” event. This refers to an object that the Pen was used to collect. In the boxes below, most of the details (some blurred for privacy reasons) about the item that this Pen was used to collect are visible. The system knows which object it collected (18796987), what time the object was collected, and which visit it has to do with (also blurred). These log messages are stored on a server within CHSDM’s data infrastructure, but as illustrated here, CHSDM has developed a simple interface so staff can view these logs at any time from an administration area on one of its websites.

At the digital tables, a near field communication (NFC) reader board downloads all the data off the visitor’s Pen. This data is then formatted and sent to the Pen API, which processes the data and stores it before responding by displaying the items that the visitor collected on the screen in front of them. This operation invokes several log messages, recording every step in the process. If something were to go wrong during one of these steps, CHSDM staff would be able to see an error message like the one in Diagram 02, which would allow staff to diagnose the problem with the underlying system.

In this case, the API responded with an “invalid user” error. This was probably a configuration problem with one of the underlying systems. A log like this can be used to diagnose such issues, and the ability for CHSDM staff to look at this data from within an administration panel makes it possible for staff to quickly resolve these kinds of problems.

Diagram 02

8. Logstash

CHSDM has begun to accumulate millions of log files stored on a multitude of systems. Log files are typically stored on the systems that generate them, and CHSDM has many of these systems. Log files are stored on server instances within the Amazon Web Services cloud, our data center in Herndon, Virginia, and locally on PCs and Raspberry Pi computers in numerous locations throughout the museum.

In order to better deal with all of these logs, CHSDM has employed an open source system called Logstash. Logstash works as a conduit that log messages travel through. As systems generate log messages, they are sent to Logstash, which formats and processes them into a variety of outputs.

Logstash can be highly customized, allowing CHSDM to store copies of its logs in multiple locations and formats. At CHSDM, logs get processed into text files. Additionally, each log message is formatted into JSON, which is then inserted into an ElasticSearch index for ease of access and searchability. This ElasticSearch index is used to create the administration pages illustrated in Diagrams 01 and 02. These administration pages, leveraging the data generated by the logs, have become one of the main diagnostic and analytics tools for the entire system at CHSDM.
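The formatting step can be sketched as follows. This is not Logstash configuration and not CHSDM's code; it only illustrates turning one log event into a JSON document of the kind that can be indexed. The `@timestamp` and `type` field names follow common Logstash conventions, while the function name and the payload fields are hypothetical.

```python
import json
from datetime import datetime, timezone


def to_json_event(event_type, payload, when=None):
    """Format one log event as a JSON document ready for indexing."""
    when = when or datetime.now(timezone.utc)
    doc = {"@timestamp": when.isoformat(), "type": event_type}
    doc.update(payload)
    return json.dumps(doc, sort_keys=True)


# Illustrative event; the payload field name is an assumption.
line = to_json_event(
    "item-collect",
    {"object_id": 18796987},
    when=datetime(2016, 1, 16, 20, 0, tzinfo=timezone.utc),
)
```

Once every message shares a structure like this, searching and charting by event type or by time becomes a simple query against the index.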

Diagram 03

Logstash comes packaged with an application known as Kibana (illustrated in Diagram 03). Kibana is an analytics tool that allows one to search through data stored in an ElasticSearch index. At CHSDM, Kibana has proven to be a very useful tool. It is easy to set up and a great way to develop ideas around how one might use log data to gain insights.

However, CHSDM eventually found Kibana a little limited in what it could do. It also proved troublesome to expose Kibana on the public Internet so that CHSDM staff could log in from anywhere to investigate a problem. To take control over how they analyze and access the logs, CHSDM staff decided to begin building administration pages that allow one to view the data stored in the Logstash-generated ElasticSearch data set.

Below is a series of diagrams of example log pages. CHSDM has built in graphic functions to provide a simple data visualization of what has been happening over time. CHSDM staff plan to expand the facility of these administration pages in future iterations.

Administration diagrams

Diagram 04

Pictured in Diagram 04 are two graphs representing “visits.” The top graph is “visits by day,” where the green line represents the past twenty-eight days, and the blue represents the twenty-eight days prior. Pictured below are visits by hour. With this simple diagnostic tool, CHSDM staff can easily see which days and times are the most active in the galleries.

Diagram 05

Pictured in Diagram 05 is a series of useful administration charts. CHSDM staff can see data in these charts having to do with objects being collected each day. Staff members can see in the top graph objects collected and created overall by day. Below, data is broken out into “collected with the Pen” and “collected at the tables.”

Diagram 06

Using Diagram 06, CHSDM staff are able to look at objects collected by hour. In this example, staff can see that on January 16, at 20:00 UTC (3:00 p.m. in New York City), visitors used their pens to collect and save 4,772 objects. Studying the diagram closely, staff can see that the next day, on January 17 at 19:00 UTC (2:00 p.m. in New York City), the museum had its daily peak. It is possible that this was the busiest time during these two days, explaining the peak, but it allows CHSDM staff to start asking questions. For example, why is this the busiest time? Do these peaks correspond to ticket sales? As illustrated here, CHSDM log files are starting to give staff insight into its visitors’ behavior.
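The "objects collected by hour" view amounts to bucketing event timestamps by hour and finding the largest bucket. The sketch below shows that calculation on three made-up timestamps. The function names are hypothetical, and a fixed UTC-5 offset stands in for New York time in January, where a production version would use a proper time zone database.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offset for illustration only: New York is UTC-5 in January.
EST = timezone(timedelta(hours=-5), "EST")


def collected_by_hour(timestamps):
    """Count events per UTC hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def peak_hour(timestamps):
    """Return the busiest hour in UTC, the same hour in local time, and its count."""
    hour, count = collected_by_hour(timestamps).most_common(1)[0]
    return hour, hour.astimezone(EST), count


# Made-up sample events.
events = [
    datetime(2016, 1, 16, 19, 10, tzinfo=timezone.utc),
    datetime(2016, 1, 16, 20, 5, tzinfo=timezone.utc),
    datetime(2016, 1, 16, 20, 40, tzinfo=timezone.utc),
]
utc_hour, local_hour, count = peak_hour(events)
```

Storing timestamps in UTC and converting only for display, as the administration pages do, keeps logs from different systems comparable.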

9. Monitoring and notifications

Log messages allow CHSDM staff to know when something has gone wrong. Viewing the figures above, CHSDM staff can easily tell when things are working properly. Just as easily, CHSDM staff can start to notice when things are not working the way they should.

CHSDM staff use numerous services and tools to quickly find out about problems and respond to them (and hopefully fix the problem) as soon as possible. Whenever something fails, it usually means visitors are stuck waiting, becoming frustrated, and in general not having a great experience. These are some of the tools that help CHSDM staff alleviate those frustrations:

  • Supervisord: Monitors many of CHSDM’s services. If something has stopped, it tries to start it back up again.

  • Watchdog: If the service can’t be restarted by Supervisord, Watchdog sends messages. These messages take the form of logs, but also alerts that wind up in CHSDM Slack channels.

  • New Relic: This service monitors the health of CHSDM applications and servers. CHSDM staff can log into a dashboard and inspect all the data having to do with nearly all components of CHSDM systems. If a server is straining to keep up with its load, New Relic sends an alert to Slack so CHSDM staff are alerted to the problem.

  • Pingdom: This service continuously checks if a website is working. If it’s not, this service sends CHSDM staff a text message and posts a message to Slack.

  • Slack: Slack is a tool CHSDM staff are using for internal communications. Additionally, Slack has become a hub for CHSDM staff alerting by many of the systems listed above.
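The restart-then-alert pattern shared by the first two tools in this list can be sketched generically. This is not the API of Supervisord or of any of the services named above. The function, its parameters, and the fake service in the demo are all hypothetical.

```python
def supervise(is_running, restart, alert, max_attempts=3):
    """Return True if the service is up, or comes back after a restart.

    If every restart attempt fails, send one alert and return False.
    """
    if is_running():
        return True
    for _ in range(max_attempts):
        restart()
        if is_running():
            return True
    alert(f"service still down after {max_attempts} restart attempts")
    return False


# Demo with a fake service that comes back on the second restart.
state = {"up": False, "restarts": 0}
alerts = []  # stands in for a chat channel or text message


def fake_restart():
    state["restarts"] += 1
    state["up"] = state["restarts"] >= 2


recovered = supervise(lambda: state["up"], fake_restart, alerts.append)
```

The point of the pattern is that staff are interrupted only when the automatic recovery has already been tried and has failed.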

10. Re-play and failing gracefully

Another incredibly useful aspect of logging data has to do with the concept of re-play and graceful failure. In a complex system with many moving parts, it’s only a matter of time before something breaks. Typically it’s one small piece of the larger complex system. It would be a shame if one single issue prevented CHSDM visitors from having an enjoyable experience, so systems are designed to do their best to “fail gracefully.” This means that if one small part of the system becomes temporarily unavailable, the rest of the parts can continue to operate. Since each system produces log messages for every action it performs, the actions missed by the part that was temporarily unavailable can usually be “re-played” once it has recovered.

The following example will be used to illustrate this concept.

A visitor arrives at the museum and purchases a ticket. This invokes a series of requests to a collection of different services. First, the ticket is created using a Constituent Relationship Management (CRM) system known as Tessitura. Once the ticket has been purchased and printed, it is scanned and tested to ensure it is valid. Then the ticket is “paired” with a pen, which is handed to the visitor for the duration of their visit. If during this process there is a failure with the system that pairs the visitor’s ticket with their Pen, a problem would occur that would normally not allow the visitor services staff to issue the Pen. This could potentially cause a backup at the visitor services desk, resulting in many frustrated customers.

Instead of issuing an error, the system simply writes the log message to disk and allows the system to carry on as if everything has worked. Later, when the pen pairing API is back online, the system can go through these log messages and “re-play” the events that failed to work, ultimately presenting the visitor with a seamless experience.

However, this scenario doesn’t always work as planned. For example, if the visitor’s pen was not successfully paired with their ticket and they went to visit their personal website, they wouldn’t see any of the things they’d collected during their visit. The experience they are expecting will become available once the events have been re-played, ideally before they check.

Diagram 07

Diagram 07 illustrates what this might mean. In this example, one can see that a Pen was used to collect object 18383769. If the visitor got home and didn’t see this object on their website as expected, they would hopefully send CHSDM an email explaining their issue. Because CHSDM has a log of this event taking place, CHSDM staff are able to press the “Re-Play This Activity” button, which would reprocess this event, thus allowing the visitor to see the expected result.

Without this type of intensive logging, CHSDM wouldn’t be able to re-create the experience. In this instance, CHSDM has the potential for a visitor to become temporarily frustrated, but has also ensured that it can eventually alleviate this frustration by re-playing the events and producing the expected outcome.
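The deferred pairing and re-play described in this section can be sketched in a few lines. This is an illustration only: an in-memory list stands in for the log written to disk, and the exception, the function names, and the event fields are hypothetical.

```python
import json


class PairingUnavailable(Exception):
    """Raised by the (hypothetical) pairing call when the API is offline."""


def pair_or_defer(event, pair, pending):
    """Try to pair now; on failure, log the event and carry on."""
    try:
        pair(event)
        return True
    except PairingUnavailable:
        pending.append(json.dumps(event))
        return False


def replay(pending, pair):
    """Re-play deferred events, keeping any that still fail."""
    remaining = []
    replayed = 0
    for line in pending:
        try:
            pair(json.loads(line))
            replayed += 1
        except PairingUnavailable:
            remaining.append(line)
    pending[:] = remaining
    return replayed


# Demo: the pairing API is offline at the desk and recovers later.
state = {"online": False}
paired = []
pending = []


def pair(event):
    if not state["online"]:
        raise PairingUnavailable(event["ticket_id"])
    paired.append((event["ticket_id"], event["pen_id"]))


pair_or_defer({"ticket_id": "T-100", "pen_id": "P-7"}, pair, pending)
state["online"] = True
replayed = replay(pending, pair)
```

The visitor is handed a Pen either way. The cost of the failure is moved from the front desk, where it would hold up the line, to a later point where it can be repaired without anyone waiting.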

11. Constituent relationship management

Within the Smithsonian Data Center is housed CHSDM’s CRM system, Tessitura. Tessitura is an enterprise-class Microsoft SQL Server-based database. Client software is installed on staff members’ desks and at the visitor services stations at the front desk at CHSDM.

The main purpose of Tessitura is to store data about each visitor to CHSDM. This data takes the form of address and contact information, memberships and subscriptions purchased, and tickets purchased for general admission to the museum.

Tessitura is an incredible source of information about CHSDM’s constituents and should be treated with white gloves, making sure that CHSDM and Smithsonian follow strict protocols. It’s critical that CHSDM not only ensure the safety and privacy of its visitors’ data, but also maintain the privacy agreement between CHSDM and its visitors.

Since Tessitura stores personally identifiable information (PII) such as a visitor’s address and phone number, this system is kept in a separate “zone” within the Smithsonian Network, where every aspect of the network and servers within this zone is scrutinized and kept in compliance with Payment Card Industry (PCI) standards. Major aspects of CHSDM data are stored in entirely different physical locations, using entirely different system architectures.

All the data and logs about a visitor’s Pen activity are stored in the Amazon Web Services (AWS) cloud. This data is stored in MySQL databases, along with log files, which are stored in multiple locations and eventually processed with Logstash, which makes them neatly available to CHSDM in an ElasticSearch index, also built on AWS infrastructure.

PII data generated by Tessitura is stored in the PCI-compliant zone of the Smithsonian Data Center. This data is mostly stored in a Microsoft SQL Server database.

How can CHSDM begin to make sense of all of this data if it’s in such a wide variety of forms, and living in such different physical locations? How can the museum connect activity with the Pen with its information about the visitors who are using the Pens? How can CHSDM develop an understanding about visitors who go to its websites after their visit, when that data is stored in Google Analytics and the Pen data and ticketing data are stored elsewhere? How much easier would all of this be if the data were stored all together and in one single format, allowing CHSDM to look at it all at once?

12. Building the warehouse

Data warehousing is not a new concept. The idea of gathering together data sets into a single place, using a single architecture, is pretty common. What is new is the ability to build a data warehouse completely from scratch, scalable at any time, using cloud-based services, with the push of an administration console button.

For CHSDM, the staff chose to experiment with Amazon’s Redshift as its data warehouse. “Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.”

In practical terms, Redshift is a fork of Postgresql, a powerful and common open source relational database system. However, Redshift turns things around a bit by redesigning the underlying architecture of Postgresql so that it acts as a columnar data store. Because Redshift still speaks the Postgresql protocol, one can use a standard Postgresql client (such as psql) to connect to a Redshift server.

Starting up a Redshift cluster takes all of five minutes. Users simply need to log into their AWS consoles, navigate to the Redshift panel, and select from a few options. Like most of the offerings from AWS, users can easily scale their Redshift clusters later if they discover they need more resources. Setting up and connecting to Redshift is the easy part.

13. Data transfer and ingestion

CHSDM experienced difficulty in getting all of its data into Redshift, getting it there safely and securely, and getting it there periodically. Amazon offers a number of tools and tutorials to make this part easier, but this has been one of the biggest hurdles to overcome with using this type of system so far.

The steps are:

  1. Identify the data source
  2. Map the data source to a Redshift-compatible schema
  3. Export the data to a CSV file
  4. Upload the CSV file to Amazon S3
  5. Use the Redshift COPY command to import the CSV from S3

Diagram 08

There is a good deal of engineering to be done around the five steps listed above. Mapping data from multiple databases and log file systems is complex and requires a lot of time and typing. To automate the process defined above, CHSDM staff have developed simple shell scripts that can be run on a nightly basis. These each connect to read replicas of live data, which means that the exports do not disturb the live environments. Pictured in Diagram 08 is an early prototype of a shell script for doing the steps listed above. A few details are blurred out for privacy reasons.
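Steps 3 and 5 of the list above can be sketched as follows. This is not the shell script shown in Diagram 08. It is a minimal Python illustration in which an in-memory SQLite database stands in for a read replica, and the table, bucket, and IAM role names are placeholders. Step 4, the upload to S3, is left out because it needs an S3 client and credentials.

```python
import csv
import io
import sqlite3


def export_csv(conn, query):
    """Step 3: run a query and return the rows as CSV text with a header row."""
    cursor = conn.execute(query)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(col[0] for col in cursor.description)
    writer.writerows(cursor)
    return buf.getvalue()


def copy_statement(table, s3_uri, iam_role):
    """Step 5: build a Redshift COPY statement for a CSV file with a header row."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' CSV IGNOREHEADER 1;"
    )


# Stand-in for a read replica; table and values are made up.
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE visits (id INTEGER, pen_id TEXT)")
replica.execute("INSERT INTO visits VALUES (1, 'P-7')")

csv_text = export_csv(replica, "SELECT id, pen_id FROM visits")
statement = copy_statement(
    "visits",
    "s3://example-bucket/visits.csv",           # placeholder bucket
    "arn:aws:iam::123456789012:role/example",   # placeholder role
)
```

The COPY statement would then be executed against the Redshift cluster through any Postgresql-compatible connection. Writing a header row and skipping it with IGNOREHEADER keeps the exported files readable when something needs to be checked by hand.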

CHSDM staff have been experimenting with one-directional hashing of values such as user ids and personal information using bcrypt and other cryptographic techniques. These techniques work but can be resource intensive. Getting all of the above set up is tedious and is still largely a work in progress at CHSDM.
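As an illustration of one-directional hashing, the sketch below replaces an identifier with a keyed hash before it would be exported. It deliberately uses HMAC-SHA-256 from the Python standard library rather than bcrypt, so it should be read as a stand-in for the general idea and not as the technique CHSDM used. The function name, key, and sample values are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a stable, one-directional token for an identifier."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-secret-key"  # placeholder; a real key would be kept out of the code
token_a = pseudonymize("visitor-123", key)
token_b = pseudonymize("visitor-123", key)
token_c = pseudonymize("visitor-456", key)
```

A keyed hash gives the same token for the same input, so hashed identifiers can still be used to join tables in the warehouse. A salted scheme such as bcrypt normally produces a different output each time, which is stronger for passwords but makes joining harder unless the salt is fixed.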

14. Visualizing the warehouse

Immediately, one can query Redshift using a simple Postgresql-compliant client like psql. psql lets you connect to a Redshift cluster and query the data using standard SQL. The results are sent back in text form. It’s typically the first tool one might use to test a connection and start building basic queries.

However, it’s pretty easy to see that writing SQL on the command line and making sense of results from Redshift can be a little bit of a hassle. Additionally, while it’s pretty easy for a developer or data analyst to log into Redshift via the psql client, it’s pretty much impossible to imagine any other staff member doing the same. The next step in building a data warehouse is to attach some kind of analytics tool on top of it that makes the retrieval of data, generation, and sharing of reports much easier.

There are a number of off-the-shelf solutions for working with data in a warehouse like Redshift. CHSDM has taken a close look at many of them.

  1. Periscope is a powerful set of business intelligence tools that can connect to a wide variety of data sources, including Redshift. It allows you to build SQL-based queries through a Web interface and chart results using a series of customizable chart types. The reports can be shared with colleagues and easily updated to reflect live data.
  2. is very similar to Periscope.
  3. Mode is also very similar to Periscope, but a little more simplistic and much less expensive. After a trial period, CHSDM found Mode to do most of what it wanted to accomplish. The main feature CHSDM was interested in, being able to design an SQL statement and then export the results to Excel, seemed to work perfectly out of the box.

Below is a series of diagrams illustrating some of the reports CHSDM has generated using Mode, connected to its Redshift cluster on AWS. These are mainly first efforts and prototypes, but they illustrate the scenario CHSDM is trying to achieve.

Diagram 09

Popular Objects

One of the first thoughts CHSDM had was to report on the most popular objects collected with the Pen. “Popular Objects” quickly became the topic of many meetings, and it is now available via a public statistics page.

Pictured in Diagram 09 is how Mode can simply generate a table of data. This data can easily be exported to a CSV file or Excel spreadsheet.

Pictured in Diagram 10 is the SQL used to generate this report. It’s pretty straightforward, but the main point is the INNER JOIN on collection_objects and “q,” which in this case refers to collection_visits_items. Within our current topology, these two tables are stored in different databases, making a simple join like this impossible.

Diagram 10
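For readers without access to the diagram, the shape of such a query can be sketched as follows. This is not the SQL pictured in Diagram 10. The table names collection_objects and collection_visits_items and the alias q come from the text, while the column names and sample rows are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection_objects (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE collection_visits_items (visit_id INTEGER, object_id INTEGER);
INSERT INTO collection_objects VALUES (1, 'Poster'), (2, 'Chair'), (3, 'Teapot');
INSERT INTO collection_visits_items VALUES (10, 2), (11, 2), (12, 1), (13, 2), (14, 1);
""")

# Count collections per object in a subquery (q), then join to the
# collection metadata so each row carries a readable title.
POPULAR_OBJECTS = """
SELECT o.id, o.title, q.times_collected
FROM collection_objects AS o
INNER JOIN (
    SELECT object_id, COUNT(*) AS times_collected
    FROM collection_visits_items
    GROUP BY object_id
) AS q ON q.object_id = o.id
ORDER BY q.times_collected DESC, o.id
"""

popular = conn.execute(POPULAR_OBJECTS).fetchall()
```

The join only works because both tables live in the same database here, which is the situation the warehouse creates. Objects that were never collected drop out of the result because the join is an inner join.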

Collected Objects By Department

Here, CHSDM staff have grouped collected objects by department, as seen in Diagram 11. Since the warehouse joined this data with collection data, it’s possible to read the full-text description of each department within the report.

Diagram 11

Collected Objects By Country

Here is another report based on the same data and joined with the collection data. In this case, CHSDM is able to group the collected objects by their country of origin. This type of report can be revealing and helps CHSDM staff to see how its visitor behavior, combined with its curatorial voice, starts to play out and display bias toward one thing or another.

Diagram 12

Below is the simple SQL required to make this query in Redshift via Mode.

Diagram 13

Digital Creations

CHSDM has created a report having to do with Digital Creations, or the things its visitors have created using the interactive tables. This report only uses the data stored in the Pen database, but one can see how CHSDM used simple SQL to modify the report right in Mode, so it has labels on the axis and skips unnecessary items.

Diagram 14

Diagram 15

Diagram 16

Finally, Diagrams 17 and 18 illustrate some of the benefits of using a product like Mode. Here one can see it is possible to easily share reports with other staff members and create a “portfolio” of reports for easy access and sharing among staff.

Diagram 17

Diagram 18

15. Further experimentation

So far, Mode has proven very effective as a tool for visualizing the data stored in the CHSDM Redshift data warehouse. It has plenty of simple-to-use tools and, if nothing else, makes it really easy to export data to a CSV or to Excel. Mode has proven useful as a prototyping and exploration tool. It’s easy to try out new ideas for queries and see the results in a graph right away. This type of playful experimentation is what usually leads to discovery of more and more insight into the data at CHSDM.

There are of course many other tools CHSDM would like to explore. Microsoft Power BI seems to offer a variety of similar tools and utilities, but CHSDM has yet to figure out how to connect up Power BI with its Redshift warehouse. Tessitura’s T-Stats is, in itself, essentially a data warehouse. It seems a little like swimming upstream to try and imagine a method of getting all of CHSDM’s external data into Tessitura, but the one upside would be that it is already stored in Smithsonian’s PCI-compliant zone.

Mainly, CHSDM staff are interested in using tools like Mode to prototype and develop queries quickly. Ultimately, CHSDM would like to develop code within the same administration areas of its collections website mentioned in the beginning of this report. The idea would be that CHSDM would have all of its diagnostics and analytics tools and data available in one convenient place. CHSDM staff would be able to have easy access to these administration areas and could carefully control their design and output. So far, CHSDM staff have developed simple connection code between its admin areas and Redshift and a few simple report pages, but this work is in its infancy.

16. Conclusions

Data warehousing can be an effective method for building analytics within an institution. The concept of collecting all types of data and putting that in a single place where the institution can have easy and secure access seems like the way forward.

Migrating data in and out of the warehouse can be time consuming and can introduce inaccuracies. Steps need to be taken to ensure the data is mapped accurately between its original source and the warehouse. As well, data sent to the warehouse needs to be securely transferred and stored. The periodic nature of these types of data transfers means that data within the warehouse will always lag behind the real-time data being generated on a daily basis.

Once the data is in the warehouse, it can be easily manipulated, visualized, and used to express the answers to many questions. With the warehouse in place, and analytics tools attached, it is possible to prototype visualizations of vast amounts and types of data. This scenario can have a positive impact on museum staff interested in developing questions around visitor behavior.

CHSDM staff are embarking on a yearlong study of visitors and behavior related to the Pen and interactive experiences within the museum. Using a data warehouse to store results from these types of surveys should allow CHSDM staff to better analyze results, pairing the raw data from the study with Pen data, ticketing data, and more, eventually providing a clearer picture of the visitor experience from start to finish.


Cooper Hewitt – New Experience

Data Warehousing – Wikipedia

The API at the center of the museum – Cooper Hewitt Labs, Seb Chan

Foreign Key – Wikipedia

Near Field Communication – Wikipedia

Payment Card Industry standard – Wikipedia

bcrypt – Wikipedia

Cite as:
Walter, Micah. "Building analytics at Cooper Hewitt." MW2016: Museums and the Web 2016. Published February 6, 2016.