Setup
Overview
Time: minObjectives
Install and authenticate Tableau subscription
Install and setup Zoom if needed
Download data files for this workshop
Software setup
Tableau
There are three steps to take for installing Tableau Desktop Version. First, you must request a free student license (access lasts one year) and then install Tableau software and activate it for use using the student license.
Getting a Student License for Tableau
For activating the tableau with Free student key, you will need to register on the below link using university ID:
https://www.tableau.com/university-students
Click on Get Student License for free.
Instructions to Download Tableau Software
We can install the Tableau Desktop version from the link below: https://www.tableau.com/support/releases
- Open the Product Downloads and Release Notes(Link opens in a new window) page and select Tableau Desktop or Tableau Prep Builder from the list on the left side of the page.
- Click on the version you want to install.
- In the Download Files section of the page, click on the installer for your operating system to download it. Select Windows or mac according to your requirements.
Activating Tableau
After installation is complete, you will need to activate and register the product to begin using it.
Fill out the fields on the registration form and then click Activate Tableau.
Or, activate with a product key: Use this option if you are activating using the user interface and have a product key from Free student version product key.
All set you should be set to use Tableau Desktop version.
Run the installer and follow the prompts.
Open the Disk image file (.DMG) and double-click the installer package (.PKG) to start the installation.
Install the videoconferencing client
If you haven't used Zoom before, go to the official website to download and install the Zoom client for your computer.
Set up your workspace
Like other Carpentries workshops, you will be learning by "coding along" with the Instructors. To do this, you will need to have both the window for the tool you will be learning about (a terminal, RStudio, your web browser, etc..) and the window for the Zoom video conference client open. In order to see both at once, we recommend using one of the following set up options:
- Two monitors: If you have two monitors, plan to have the tool you are learning up on one monitor and the video conferencing software on the other.
- Two devices: If you don't have two monitors, do you have another device (tablet, smartphone) with a medium to large sized screen? If so, try using the smaller device as your video conference connection and your larger device (laptop or desktop) to follow along with the tool you will be learning about.
- Divide your screen: If you only have one device and one screen, practice having two windows (the video conference program and one of the tools you will be using at the workshop) open together. How can you best fit both on your screen? Will it work better for you to toggle between them using a keyboard shortcut? Try it out in advance to decide what will work best for you.
Data files:
Please download the following file(s) to particpate in the workshop:
Link to Data Set: https://uofi.box.com/s/6q4p6u601z28b5iacmmydbuw3p6r0ezm
Link to Tableau File: https://uofi.box.com/s/v7gfpbwsxu1zng4f2t30albt56cfv9lk
About the Data Used in this Workshop:
The dataset used is a sample dataset from Tabelau. It is a basic data collection of US supermarkets.
Let us understand the attributes of this data set:
- ‘Row ID’ - This is nothing but Serial No.
- ‘Order ID’ - ID created when a product order is placed.
- ‘Order Date’ - Date on which a customer places his/her order.
- ‘Ship Date’ - Date on which the order is shipped.
- ‘Ship Mode’ - Mode of shipment of each order.
- ‘Customer ID’ - ID assigned to each customer who places an order.
- ‘Customer Name’ - Name of Customer.
- ‘Segment’ - Section from where the order is placed.
- ‘Country’ - Country details of this data set. We are looking only for US store data.
- ‘City’ - Cities of US are listed here.
- ‘State’ - States of US are listed here.
- ‘Postal Code’ - pin code
- ‘Region’ - grouped into region wise
- ‘Product ID’ - Product ID of each product
- ‘Category’ - Category to which each product belongs to.
- ‘Sub-Category’ - Sub-Category of each Category
- ‘Product Name’ - Name of products.
- ‘Sales’ - Selling Price of each product.
- ‘Quantity’ - number of quantity available for a particular product.
- ‘Discount’ - Discount available on each product.
- ‘Profit’ - Profit gained on each product.
Key Points
Introduction
Overview
Time: 0 minObjectives
Creating Relationships
Performing Calculations
Creating Advanced Charts
FIXME
Key Points
Creation of Relationships, performing Calculations and Advanced Charts explained in subsequent lessons.
Relationships in Tableau
Overview
Time: minObjectives
How to build relationships in tableau
- Tableau allows combining more than one table from the dataset. There are two ways in which we can accomplish this –joining the tables or establishing relationship between the tables.Establishing a relationship is more flexible as compared to joins is because you do notneed the specify the join type. Tableau creates a relationship based on matching column names or existing key constraints. It adjusts the join and preserves the LOD (level of detail) of the data. Relationships are dynamic, tables are not merged at data source level but are joined automatically, based on fields in use. The difference between joins and relationship is that –when you create a relationship, the tables are combined at the logical layer whereas when you create a join tables are combined at the physical level. Hence relationships are flexible,and joins are static.
- Let us start building relationships.Start by importing data in Tableau. The Data Source page will have all the tables from the uploaded dataset.
- Now drag the first table that you require on the canvasand you will see all the rows and columns populated.Drag the second table and you can see a connection between the two tables. You have now established a relationship between two tables.
- The relationship is created by connecting common columns from both the tables. You can add more fields/columns you want to connect.The performance options include selecting cardinalityand referential integrity. If you are unsure about any of the parameters,you can stick to the defaults.
- You can establish relation between multiple tables.
-
Suppose you have two tables, employee table with employee_id as the primary key, employee_names, employee_address and employee_hiredate and department table with department_id, department_name and worker_id.Now in these two tables, employee_id and worker_id columns are the same, but they have different column name. In that case, you will have to choose the columns you want to join on. That is,you will have to select employee_id and worker_id to establish the connection.
-
Now, let us say you want to create a join rather than a relationship. In that case double click on the orders table. This will give you access to the physical layer. Now drag the table you want to join.
- To change the type of join, click on the join icon (venn diagram) and you will have all the join options.
- Let us look at all the measures and dimensions populated after we have established the relationship.We can see that the measures and dimensions have been split up table-wise.
- Whereas if we perform join, we can see all the dimensions from both the tables and all the dimensions from both the tables.
Key Points
Performing Calculations in Tableau
Overview
Time: minObjectives
How to perform calculations in tableau
Sometimes, the data lacks some fields that are necessary for analysis. In that case, we can createa new field(column)by performing desired calculations. Those fields are called calculated fields.The calculated field could be a row level calculated field or a column level calculated field.This new calculated field created is saved to the data source.
The 3 broad categories of calculations include – Basic Calculations
- Level of Detail (LOD) expressions
-
Table Calculations
- Basic Calculations - Basic calculations can further be divided into Row-level calculations and aggregate calculations.
1.Row level Calculations –When we apply row level calculation, results are generated for every line in the datasource.
Let us see few examples –Transform/Split –Using split we can split strings and perform string manipulation. Click on customer name, transform, and split. This will split the customer’s name in two fields (first name, last name).
We can also splitthe string based on a delimiter/separator.Let us split the order id column to obtain only the numbers in the id.
The column has a delimiter ‘-‘ in the format. We specify that as the delimiter. Next, we want all the numbers in the id which are at the end of the string. We select split off lastoption.A new calculated field has been created with the number string of the order id for each row in the data source.
Finding strings/pattern -We can also spot substring or strings in the column. We will try to see product ids string that contains “TA” in it. Create a new calculated fieldwhich mentions the keyword contains. This will search for all the rows containing TA as a part of the string and returns a true value for them.
If/Else Conditional Calculation –Let us create a calculated field to see if the sales are profitable or not.
We add order id to the row shelf and profits to text marks card. Next, we add the calculated field created to the row shelf. The output is as shown.
-
Aggregate calculations
-
Generally, adding any measure to row or column shelf aggregates the measure. But we can create calculated field and perform aggregation on the rows in the data source.Let’s count the number of customers. This aggregation will give the count of customers against the dimension we choose in the visualization. Let us add region to the row shelf, profits,and the calculated field (Count_Customer) to the column shelf.
The visualization gives the profits across region and count of customers for each region.
- Level of Detail expression
Allows calculations at data source as well as visualization level(just like basic calculations). But provides more control over the level of granularity. The level of detail could be high-level details (less granularity, summary, more aggregated) or low-level details (more granularity, more details, less aggregated). To understand the how the high-level/low-level details work, let us take an example-
- Drag the sales to the text marks card.We can see the sumof sales of all the products. This has less granularity and more aggregation. This gives the high-level picture of sales for the orders table.
- Add region to the row column shelf. This adds a level of details because now we are viewing the sum of sales across the 4regions.This has added a little more granularity and little less aggregation as compared to the first step.
- Add states to the row shelf. This is a more granular visualization as compared to step 2. So, we started from a high-level visualization in step 1 and kept adding more details to reach to a low-level detail visualization.
There are 3 types of LOD expressions:
- Fixed
- Exclude
- Include
Fixed expression is independent of view. When we use exclude expression, it is like removing some dimensions from the view. When we use include expression, it is like adding dimensions to the view. Syntax –{Type of expression [Dimension list] : Aggregate}
-
- Fixed –It returns a scalar value. It is independent of the dimensions in the view. Let us create a fixed expression field. Start by adding sum of sales to the text marks card.
Next, create a calculated field with the fixed expression for sum of sales.
We created a grand total field in the measures. Double click on the fieldfrom the measures.
This gives us the grand total and sales. Move the measure names from rows shelf to column and add region to rows shelf.
Here we are viewing sales across all the regions. Now remove region and add category from the dimensions. We can see that the grand total is the same irrespective of the whatever dimension we add to the row shelf. The fixed grand total value is independentof the dimensionsas the calculation does not contain any dimension. Let us try and add filter on the region. We can see that, even if we unselect some of the regions the grand total is unaffected.
Exclude LOD –This type of calculation is sensitive to dimensions. Exclude does not return a scalar value (unlike fixed LOD), it returns an aggregated value.It gives dimension in the view minus what is in the expression.Let’s create some visualization using exclude calculation.
In the above visualization, we exclude region, so now if we add region to the filter, the grand total will change when we select/unselect some regions.
Let us add another dimension to the row shelf and observe the values. The exclude calculations values are now changed. Adding another dimension has affected the calculation values. We can add more than one dimension in the exclude LOD calculation. Edit the exclude calculation and add category and sub-category to the exclude calculation. Remove region.
IncludeLOD –These calculations will include dimensions in the view and the dimension mentioned in the include expression.The results are aggregated.Let uscreate a table calculation to understand the topic.We create the calculated fieldas follows. We are including sub-category in it.
Now add the calculated fieldthat we created.
Now the visualization shows maximum of sum of sales as the total for each category. Now we remove sub-category from the row shelf.
So now we don’t have sub-category in the visualization but since it is a part of the calculated field, we still have the max sum of sales of the sub-category for every category.
- Table Calculations Table calculations are the transformation that you apply to the values in the visualization. They are calculated based on what is present in the visualization.So, basically when we create a visualizationand then add the table calculation. They are added on top of the aggregated function used in visualization. Three things that will affect the table calculations are – Layout
- Direction/ addressing
- Scope/ partitioning Let us start with creating a visualization to understand it better. Add Region to the row shelf and order date to the column shelf. Add sum of sales to the text marks card.Click on totals in the analytics tab to get the grand total.
Now when we click on sum of sales, we can see a quick table calculation option. And we have all the commonly used table calculations like running total, percent total etc. So,we are applying aggregation/calculationto the already existing aggregation in the visualization. Select percent of total from the options.
After applying percent of total, the original values in the table are changed to the percent values. Now let us see how the three aspects of table calculation –layout, direction and scope affect the calculation.
Table calculations are sensitive to the layout of the visualization. If we add another dimension let us say sub-category to row shelf and try and move the region dimension from row to column shelf, we see that the values in the tables change.Next, we can see that the percent of total is being calculated from left to right.
That means the calculation is sensitive to the direction of consideration.We can have the calculation from left to right or from top to bottom. To change the direction, we can select an option as follows.We can also see options like cell, order date and region which gives us the scope/partition for the calculation.
Key Points
Building advanced charts in Tableau
Overview
Time: minObjectives
How to develop advanced charts in tableau
- Lollipop Chart – Let us start by adding sub-category to column shelf and sales to the row shelftwice.Select dual axis.
You will get a chart like this.
Next, select bar chart from the marks card sum of sales.
Adjust the size of the bar as follows.
Similarly, you can adjust the size of the bubble. To add labels to the visualization, click on marks card and select show marks label.
To categorize it according to the category, add category from the dimensions to colors marks cardfor the sum of sales.
- Word Cloud – Add sub-category to text marks cardand profits to size marks card.
Add category to colors marks card.
- Donut Chart – Start by selecting pie in the marks cardand ship mode to color.Addsales to angle. The chart now shows the ship mode by sales.Fit the chart in entire view.
To create a donut chart,we now need to duplicate thepie chart. To do so double click on column shelf and type in 1. This will create a fake axis. Change the aggregation to minimum. And create a duplicate of the fake axis by pressing control and dragging the axis in column shelf again.
Increase the size of the pie chart through the size marks card. Remove ship mode from the color in the second min(1) marks card.
Decrease the size of the pie and change the axis to dual axis in the column shelf.
Next step is to synchronize both the axis and remove the header(uncheck the show header option).
Change the color of the center part as whiteand add sales to the label marks card.
Now if we add region to the column shelf, we will have multiple donut chart.
- Waterfall Chart – Start by creating a calculated field which has negative profit values in it.
Add sub-category to columnshelf, profits to row shelfand negative profits (calculated field) to the size marks card.
From sum of profits in the rows column select running total from the quick table calculations.
Select Gantt Bar from the marks card.
- Funnel Chart – Start by adding sub-category to rows shelf and sales to column.Arrange the output in descending order. Duplicate the sales in column shelf.
Click on edit axis on the first chart and select scale as reverse.
This gives us a funnel chart.
Key Points