SQL & Tableau: Data Analytics For an Online Retailer Part 1
Background
I completed Maven Analytic’s ‘Advanced SQL + MySQL for Analytics & Business Intelligence’ course. During this course there were 2 projects we were to complete to get simulated experience with not only the advanced SQL skills taught in the body of the course but also in the context of a business setting. The relational database, mavenfuzzyfactory, was imported into MySQL Workbench and used throughout the course.
I played the role of a data analyst for a new online retailer, Maven Fuzzy Factory. They make stuffed animals. In this database, we had info dealing with website sessions, website pageviews, products, orders, order items, and refunds. The board of directors and executives needed answers to some questions to see if the business was growing and doing so efficiently.
The analysis was performed in MySQL Workbench and the reporting as
follows was done using an R Markdown file in R Studio by connecting to
the database using the RMariaDB
R Package and using SQL chunks.
SQL Concepts Implemented
- The Big 6:
SELECT
–>FROM
–>WHERE
–>GROUP BY
–>HAVING
–>ORDER BY
- Aggregation (
COUNT
,SUM
,MAX
, ect.) LEFT JOIN
VIEW
’s and SubqueriesCASE
andCOUNT(DISTINCT CASE WHEN...)
(count-case pivoting)
Midterm Project
Note: These queries will show only a limited number of rows for the sake of seeing the structure of the output. They are not the full result of the queries in terms of number of rows
Some of the types of inquiries and requests I recieved involve:
- Analyzing Traffic Sources (Conversion Rates, Bid Optimization, Traffic Source Trending)
- Analyzing Website Performance (Analyzing Top Pages & Entry Pages, Bounce Rates & Landing Page Tests, Landing Page Trend Analysis, Conversion Funnels & Testing Conversion Paths)
Objective 1: Order Conversion Rate Trending
Pull monthly trends for ‘gsearch’ sessions and ‘orders’ so that we can showcase Order Rate growth (In terms of Website Session volume, Order volume, and Conversion Rate)
(Note: ‘gsearch’ is a UTM parameter that indicates if a potential customer found the website via the ‘gsearch’ search engine)
SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,
COUNT(DISTINCT website_sessions.website_session_id) AS sessions,
COUNT(DISTINCT orders.order_id) AS orders,
COUNT(DISTINCT orders.order_id)/COUNT(DISTINCT website_sessions.website_session_id) AS order_rate
FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
GROUP BY 1;
yr_mo | sessions | orders | order_rate |
---|---|---|---|
2012/03 | 1843 | 59 | 0.0320 |
2012/04 | 3569 | 93 | 0.0261 |
2012/05 | 3405 | 96 | 0.0282 |
2012/06 | 3590 | 121 | 0.0337 |
2012/07 | 3797 | 145 | 0.0382 |
2012/08 | 4887 | 184 | 0.0377 |
2012/09 | 4487 | 186 | 0.0415 |
2012/10 | 5519 | 237 | 0.0429 |
2012/11 | 9732 | 414 | 0.0425 |
2012/12 | 7033 | 335 | 0.0476 |
2013/01 | 4211 | 244 | 0.0579 |
2013/02 | 5108 | 338 | 0.0662 |
2013/03 | 4445 | 272 | 0.0612 |
2013/04 | 5729 | 395 | 0.0689 |
2013/05 | 5991 | 392 | 0.0654 |
Displaying records 1 - 15
Evidently our Session volume and Conversion Rate has been increasing steadily since the inception of the business, exactly what we would want to see.
Objective 2: Order Conversion Rate Trending by Advertising Campaign
Get a monthly Order Rate trend for ‘gsearch’ but this time split out ‘nonbrand’ and ‘brand’ campaigns (In terms of Website Session volume, Order volume, and Conversion Rate)
(Note: The ‘brand’ and ‘nonbrand’ campaigns are paid advertisements)
SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN website_sessions.website_session_id ELSE NULL END) AS brand_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN orders.order_id ELSE NULL END) AS brand_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN website_sessions.website_session_id ELSE NULL END) AS brand_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN website_sessions.website_session_id ELSE NULL END) AS nonbrand_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN orders.order_id ELSE NULL END) AS nonbrand_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN website_sessions.website_session_id ELSE NULL END) AS nonbrand_order_rt
FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
GROUP BY 1;
yr_mo | brand_sessions | brand_orders | brand_order_rt | nonbrand_sessions | nonbrand_orders | nonbrand_order_rt |
---|---|---|---|---|---|---|
2012/03 | 8 | 0 | 0.0000 | 1835 | 59 | 0.0322 |
2012/04 | 64 | 6 | 0.0938 | 3505 | 87 | 0.0248 |
2012/05 | 113 | 6 | 0.0531 | 3292 | 90 | 0.0273 |
2012/06 | 141 | 6 | 0.0426 | 3449 | 115 | 0.0333 |
2012/07 | 150 | 10 | 0.0667 | 3647 | 135 | 0.0370 |
2012/08 | 204 | 10 | 0.0490 | 4683 | 174 | 0.0372 |
2012/09 | 265 | 16 | 0.0604 | 4222 | 170 | 0.0403 |
2012/10 | 333 | 15 | 0.0450 | 5186 | 222 | 0.0428 |
2012/11 | 463 | 21 | 0.0454 | 9269 | 393 | 0.0424 |
2012/12 | 539 | 34 | 0.0631 | 6494 | 301 | 0.0464 |
2013/01 | 517 | 32 | 0.0619 | 3694 | 212 | 0.0574 |
2013/02 | 381 | 38 | 0.0997 | 4727 | 300 | 0.0635 |
2013/03 | 357 | 18 | 0.0504 | 4088 | 254 | 0.0621 |
2013/04 | 400 | 32 | 0.0800 | 5329 | 363 | 0.0681 |
2013/05 | 476 | 26 | 0.0546 | 5515 | 366 | 0.0664 |
Displaying records 1 - 15
Brand sessions have been increasing but the Conversion Rate has been volatile with an over all upward trend. Nonbrand sessions have had significant growth in Website Sessions peaking around November/December of 2012 and settling in the 4000-5000 range. They have observed consistent growth in Conversion Rate, nearing the 7% mark in latter months.
Objective 3: Order Conversion Rate Trending by Device Type
While on ‘gsearch’, dive into ‘nonbrand’ and pull volumes for Monthly Sessions and Orders along with Conversion Rates split by Device Type (‘mobile’, ‘desktop’)
SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS desktop_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN orders.order_id ELSE NULL END) AS desktop_orders,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS desktop_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN website_sessions.website_session_id ELSE NULL END) AS mobile_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN orders.order_id ELSE NULL END) AS mobile_orders,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN website_sessions.website_session_id ELSE NULL END) AS mobile_order_rt
FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
WHERE website_sessions.utm_source = 'gsearch' AND website_sessions.utm_campaign = 'nonbrand'
GROUP BY 1;
yr_mo | desktop_sessions | desktop_orders | desktop_order_rt | mobile_sessions | mobile_orders | mobile_order_rt |
---|---|---|---|---|---|---|
2012/03 | 1119 | 49 | 0.0438 | 716 | 10 | 0.0140 |
2012/04 | 2135 | 76 | 0.0356 | 1370 | 11 | 0.0080 |
2012/05 | 2271 | 82 | 0.0361 | 1021 | 8 | 0.0078 |
2012/06 | 2678 | 107 | 0.0400 | 771 | 8 | 0.0104 |
2012/07 | 2768 | 121 | 0.0437 | 879 | 14 | 0.0159 |
2012/08 | 3519 | 165 | 0.0469 | 1164 | 9 | 0.0077 |
2012/09 | 3169 | 154 | 0.0486 | 1053 | 16 | 0.0152 |
2012/10 | 3929 | 203 | 0.0517 | 1257 | 19 | 0.0151 |
2012/11 | 7010 | 351 | 0.0501 | 2259 | 42 | 0.0186 |
2012/12 | 4952 | 285 | 0.0576 | 1542 | 16 | 0.0104 |
2013/01 | 2795 | 189 | 0.0676 | 899 | 23 | 0.0256 |
2013/02 | 3594 | 269 | 0.0748 | 1133 | 31 | 0.0274 |
2013/03 | 3088 | 227 | 0.0735 | 1000 | 27 | 0.0270 |
2013/04 | 4046 | 332 | 0.0821 | 1283 | 31 | 0.0242 |
2013/05 | 4255 | 323 | 0.0759 | 1260 | 43 | 0.0341 |
Displaying records 1 - 15
We see fast growth in desktop orders early on and slower growth late, but with consistent growth in order rate throughout. Mobile order sessions are volatile all around but have had fewer drop offs in recent months. There is consistent growth with their order rates despite the volatility in session volume.
Objective 4: Order Conversion Rate Trending by Website Channel Type
Pull monthly trends by volume and Order Conversion Rate for ‘gsearch’ alongside each of our other channels
(Note: If it is a ‘bsearch’ or ‘gsearch’ session, then they got to the website via search engine by clicking on a paid advertisement. If it was through a search engine but instead of a paid advertisement they clicked on a normal link, then that is an ‘organic session’. If they directly typed in the website’s http address then it is a ‘direct session’.)
SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN website_sessions.website_session_id ELSE NULL END) AS gsearch_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN orders.order_id ELSE NULL END) AS gsearch_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN website_sessions.website_session_id ELSE NULL END) AS gsearch_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN website_sessions.website_session_id ELSE NULL END) AS bsearch_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN orders.order_id ELSE NULL END) AS bsearch_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN website_sessions.website_session_id ELSE NULL END) AS bsearch_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN website_sessions.website_session_id ELSE NULL END) AS organic_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN orders.order_id ELSE NULL END) AS organic_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN orders.order_id ELSE NULL END)
/COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN website_sessions.website_session_id ELSE NULL END) AS organic_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN website_sessions.website_session_id ELSE NULL END) AS direct_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN orders.order_id ELSE NULL END) AS direct_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN orders.order_id ELSE NULL END)
/COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN website_sessions.website_session_id ELSE NULL END) AS direct_order_rt
FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
GROUP BY 1;
yr_mo | gsearch_sessions | gsearch_orders | gsearch_order_rt | bsearch_sessions | bsearch_orders | bsearch_order_rt | organic_sessions | organic_orders | organic_order_rt | direct_sessions | direct_orders | direct_order_rt |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2012/03 | 1843 | 59 | 0.0320 | 2 | 0 | 0.0000 | 8 | 0 | 0.0000 | 9 | 0 | 0.0000 |
2012/04 | 3569 | 93 | 0.0261 | 11 | 0 | 0.0000 | 76 | 2 | 0.0263 | 71 | 5 | 0.0704 |
2012/05 | 3405 | 96 | 0.0282 | 25 | 0 | 0.0000 | 148 | 3 | 0.0203 | 150 | 8 | 0.0533 |
2012/06 | 3590 | 121 | 0.0337 | 25 | 1 | 0.0400 | 194 | 10 | 0.0515 | 169 | 8 | 0.0473 |
2012/07 | 3797 | 145 | 0.0382 | 44 | 2 | 0.0455 | 206 | 13 | 0.0631 | 188 | 9 | 0.0479 |
2012/08 | 4887 | 184 | 0.0377 | 696 | 19 | 0.0273 | 265 | 14 | 0.0528 | 250 | 11 | 0.0440 |
2012/09 | 4487 | 186 | 0.0415 | 1438 | 74 | 0.0515 | 332 | 13 | 0.0392 | 284 | 12 | 0.0423 |
2012/10 | 5519 | 237 | 0.0429 | 1770 | 83 | 0.0469 | 427 | 22 | 0.0515 | 442 | 26 | 0.0588 |
2012/11 | 9732 | 414 | 0.0425 | 3115 | 155 | 0.0498 | 623 | 26 | 0.0417 | 569 | 25 | 0.0439 |
2012/12 | 7033 | 335 | 0.0476 | 1700 | 90 | 0.0529 | 692 | 46 | 0.0665 | 642 | 38 | 0.0592 |
2013/01 | 4211 | 244 | 0.0579 | 924 | 63 | 0.0682 | 660 | 47 | 0.0712 | 607 | 35 | 0.0577 |
2013/02 | 5108 | 338 | 0.0662 | 1043 | 74 | 0.0709 | 524 | 44 | 0.0840 | 472 | 38 | 0.0805 |
2013/03 | 4445 | 272 | 0.0612 | 954 | 67 | 0.0702 | 477 | 34 | 0.0713 | 409 | 18 | 0.0440 |
2013/04 | 5729 | 395 | 0.0689 | 1228 | 80 | 0.0651 | 510 | 41 | 0.0804 | 478 | 32 | 0.0669 |
2013/05 | 5991 | 392 | 0.0654 | 1284 | 86 | 0.0670 | 627 | 51 | 0.0813 | 567 | 45 | 0.0794 |
Displaying records 1 - 15
We see steady growth across the board with ‘direct’ sessions and ‘organic’ sessions having the most impressive conversion rates. This makes sense since those users were seeking out specifically our company. However, the vast majority of the traffic comes from ‘bsearch’ and ‘gsearch’.
Objective 5: Estimating Order Volume Differential
Estimate the additional orders we have recieved as a result from switching from ‘/home’ to ‘/lander-1’ (June 19 to July 28) as the landing page for ‘gsearch nonbrand’ sessions
(Note: here, these url’s represent the first page a user sees when they get to the site)
This is a multistep problem. First, we need to get the differential in
order rates between the two landing pages. This will require using a
subquery and the following query will be the inner query imbedded in the
FROM
statement:
SELECT
website_sessions.website_session_id,
website_pageviews.pageview_url,
MIN(website_pageviews.website_pageview_id) AS land_pg_view_id
FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.created_at BETWEEN '2012-06-19' AND '2012-07-28'
AND website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url IN ('/home','/lander-1')
GROUP BY 1,2;
website_session_id | pageview_url | land_pg_view_id |
---|---|---|
11660 | /home | 23443 |
11661 | /home | 23444 |
11662 | /home | 23448 |
11663 | /home | 23451 |
11664 | /home | 23454 |
11665 | /home | 23457 |
11666 | /home | 23461 |
11668 | /home | 23464 |
11670 | /home | 23466 |
11671 | /home | 23469 |
11672 | /home | 23473 |
11673 | /home | 23474 |
11674 | /home | 23478 |
11675 | /home | 23481 |
11677 | /home | 23486 |
Displaying records 1 - 15
Now we know what url was the landing page for each website session. Let’s get the order rates for our two landing url’s:
SELECT DISTINCT
land_pg.pageview_url,
COUNT(DISTINCT land_pg.website_session_id) AS sessions,
COUNT(DISTINCT orders.order_id) AS orders,
COUNT(DISTINCT orders.order_id)/COUNT(DISTINCT land_pg.website_session_id) AS order_rt
FROM (
SELECT
website_sessions.website_session_id,
website_pageviews.pageview_url,
MIN(website_pageviews.website_pageview_id) AS land_pg_view_id
FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.created_at BETWEEN '2012-06-19' AND '2012-07-28'
AND website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url IN ('/home','/lander-1')
GROUP BY 1,2
) AS land_pg
LEFT JOIN orders
ON orders.website_session_id = land_pg.website_session_id
GROUP BY 1;
pageview_url | sessions | orders | order_rt |
---|---|---|---|
/home | 2272 | 73 | 0.0321 |
/lander-1 | 2304 | 94 | 0.0408 |
2 records
Evidently ‘/lander-1’ outperforms ‘/home’ by about .0088 orders per session.
We would like to see when the last time was that ‘/home’ was used as a landing page. This will give us our cutoff point for the next query when we get the total number of ‘/lander-1’ sessions.
SELECT
MAX(website_sessions.website_session_id) AS max_session_id
FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url = '/home';
max_session_id |
---|
17145 |
1 records
So the last ‘/home’ session was at the 17145th website session.
Now we will get the number of ‘/lander-1’ sessions
SELECT
COUNT(website_sessions.website_session_id) AS session_count
FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
AND website_sessions.website_session_id > 17145
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url = '/lander-1';
session_count |
---|
35734 |
1 records
35,734 sessions X .0088 differential => 314 more orders with ‘/lander-1’ than what would’ve been with ‘/home’. Evidence that ‘lander-1’ is in the best interest of the company to use as the landing page for the website.
Objective 6: Conversion Tunnel Comparison
Create full conversion tunnels for both the ‘/home’ and ‘/lander-1’ (‘gsearch nonbrand’)
(Note: This conversion tunnel takes you through each necessary page in order to reach a completed order. We would like to compare the tunnels of the ‘/home’ and ‘/lander-1’ landing pages beginning on June 19, 2012 (since that is when ‘/lander-1’ was first created.)
This will also be a multistep problem. If we really wanted to, we could
do this by embedding multiple subqueries. However, that can be difficult
to read for someone else who didn’t write the code but also for the one
who wrote it who may need to refer back to it later. Instead we can make
use of VIEW
’s in SQL that store queries that we can access later by
using SELECT
statements on them like we would on a table like so:
CREATE VIEW home_monthly_conv
AS
SELECT
website_sessions.created_at,
website_sessions.website_session_id,
CASE WHEN website_pageviews.pageview_url = '/home' THEN 1 ELSE 0 END AS home,
CASE WHEN website_pageviews.pageview_url = '/lander-1' THEN 1 ELSE 0 END AS lander,
CASE WHEN website_pageviews.pageview_url = '/products' THEN 1 ELSE 0 END AS prod,
CASE WHEN website_pageviews.pageview_url = '/the-original-mr-fuzzy' THEN 1 ELSE 0 END AS fuzzy,
CASE WHEN website_pageviews.pageview_url = '/cart' THEN 1 ELSE 0 END AS cart,
CASE WHEN website_pageviews.pageview_url = '/shipping' THEN 1 ELSE 0 END AS ship,
CASE WHEN website_pageviews.pageview_url IN ('/billing','/billing-2') THEN 1 ELSE 0 END AS bill,
CASE WHEN website_pageviews.pageview_url = '/thank-you-for-your-order' THEN 1 ELSE 0 END AS thanks_last
FROM website_pageviews
LEFT JOIN website_sessions
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_pageviews.created_at > '2012-06-19'
AND website_sessions.website_session_id < 17145
AND website_sessions.utm_campaign = 'nonbrand'
AND website_sessions.utm_source = 'gsearch';
This VIEW
looks like this:
created_at | website_session_id | home | lander | prod | fuzzy | cart | ship | bill | thanks_last |
---|---|---|---|---|---|---|---|---|---|
2012-06-18 19:05:26 | 11660 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:05:26 | 11660 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:05:26 | 11660 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2012-06-18 19:07:51 | 11661 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:07:51 | 11661 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:36:10 | 11662 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:36:10 | 11662 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:36:10 | 11662 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2012-06-18 19:59:23 | 11663 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-06-18 19:59:23 | 11663 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Displaying records 1 - 10
1’s represent if that session id ever made it to that page. What we would like to do is see this as a funnel, in other words not having seperate rows for each distinct session id so that we can see where the funnel stops more easily. We can do this as follows, making use of our view:
SELECT
website_session_id,
MAX(home) AS home_pg,
MAX(lander) AS lander_pg,
MAX(prod) AS prod,
MAX(fuzzy) AS fuzzy,
MAX(cart) AS cart,
MAX(ship) AS ship,
MAX(bill) AS bill,
MAX(thanks_last) AS thanks
FROM home_monthly_conv
GROUP BY 1;
website_session_id | home_pg | lander_pg | prod | fuzzy | cart | ship | bill | thanks |
---|---|---|---|---|---|---|---|---|
11660 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
11661 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
11662 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
11663 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
11664 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
11665 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11666 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11668 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11670 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
11671 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Displaying records 1 - 10
This is exactly what we wanted. We can clearly see which lander was used
for each session and then how they traversed the next several pages.
Since this query isn’t too complicated, let’s use it as a subquery
instead of making another VIEW
, although that would definitely be a
valid approach. We will use this as a subquery to get our counts and
rates for each step in the order funnel depending on what their lander
was. This final query is as follows:
SELECT
CASE
WHEN home_pg = 1 THEN 'home'
WHEN lander_pg = 1 THEN 'lander'
ELSE 'something else'
END AS landers,
COUNT(DISTINCT website_session_id) AS sessions,
COUNT(DISTINCT CASE WHEN prod = 1 THEN website_session_id ELSE NULL END) AS product_pg,
COUNT(DISTINCT CASE WHEN fuzzy = 1 THEN website_session_id ELSE NULL END) AS fuzzy_pg,
COUNT(DISTINCT CASE WHEN cart = 1 THEN website_session_id ELSE NULL END) AS cart_pg,
COUNT(DISTINCT CASE WHEN ship = 1 THEN website_session_id ELSE NULL END) AS ship_pg,
COUNT(DISTINCT CASE WHEN bill = 1 THEN website_session_id ELSE NULL END) AS bill_pg,
COUNT(DISTINCT CASE WHEN thanks = 1 THEN website_session_id ELSE NULL END) AS order_pg
FROM(
SELECT
website_session_id,
MAX(home) AS home_pg,
MAX(lander) AS lander_pg,
MAX(prod) AS prod,
MAX(fuzzy) AS fuzzy,
MAX(cart) AS cart,
MAX(ship) AS ship,
MAX(bill) AS bill,
MAX(thanks_last) AS thanks
FROM home_monthly_conv
GROUP BY 1
) AS funnel_table
GROUP BY 1;
landers | sessions | product_pg | fuzzy_pg | cart_pg | ship_pg | bill_pg | order_pg |
---|---|---|---|---|---|---|---|
home | 2346 | 975 | 708 | 306 | 208 | 176 | 77 |
lander | 2381 | 1119 | 801 | 361 | 238 | 202 | 96 |
2 records
It is evident that ‘/lander-1’ has had better stepthrough’s than ‘/home’. This indicates to us that we should move forward with featuring the ‘/lander-1’ landing page.
Objective 7: Analyzing Revenue per Session
Quantify the impact of the billing test (‘/billing’ vs. ‘/billing-2’) by analyzing the lift generated from the test (Sept 10 to Nov 10) in terms of revenue per billing page session
We will once again make use of a subquery to get our desired information.
SELECT
website_pageviews.created_at,
website_pageviews.website_session_id,
website_pageviews.pageview_url,
orders.order_id,
orders.price_usd
FROM website_pageviews
LEFT JOIN orders
ON orders.website_session_id = website_pageviews.website_session_id
WHERE website_pageviews.created_at BETWEEN '2012-09-10' AND '2012-11-10'
AND website_pageviews.pageview_url IN ('/billing','/billing-2');
created_at | website_session_id | pageview_url | order_id | price_usd |
---|---|---|---|---|
2012-09-09 21:49:26 | 25306 | /billing | 870 | 49.99 |
2012-09-10 00:13:05 | 25325 | /billing-2 | 871 | 49.99 |
2012-09-10 05:39:09 | 25343 | /billing | 872 | 49.99 |
2012-09-10 07:53:37 | 25353 | /billing | 873 | 49.99 |
2012-09-10 08:36:13 | 25358 | /billing-2 | 874 | 49.99 |
2012-09-10 09:26:12 | 25368 | /billing | 875 | 49.99 |
2012-09-10 11:22:08 | 25393 | /billing | 876 | 49.99 |
2012-09-10 12:22:36 | 25411 | /billing-2 | 877 | 49.99 |
2012-09-10 15:08:37 | 25454 | /billing-2 | 878 | 49.99 |
2012-09-10 15:18:08 | 25459 | /billing-2 | NA | NA |
Displaying records 1 - 10
Here we have information concerning when a purchase was made, which session it is associated with, what billing page they used (key information), and then the order info as well. We can use this now to compare the two billing pages to see which had better perfomance as follows:
SELECT DISTINCT
pageview_url,
COUNT(website_session_id) AS billing_sessions,
COUNT(order_id) AS orders,
SUM(price_usd)/COUNT(pageview_url) AS rev_per_session
FROM(
SELECT
website_pageviews.created_at,
website_pageviews.website_session_id,
website_pageviews.pageview_url,
orders.order_id,
orders.price_usd
FROM website_pageviews
LEFT JOIN orders
ON orders.website_session_id = website_pageviews.website_session_id
WHERE website_pageviews.created_at BETWEEN '2012-09-10' AND '2012-11-10'
AND website_pageviews.pageview_url IN ('/billing','/billing-2')
) AS billing_pg
GROUP BY 1;
pageview_url | billing_sessions | orders | rev_per_session |
---|---|---|---|
/billing | 657 | 301 | 22.90257 |
/billing-2 | 653 | 410 | 31.38729 |
2 records
The billing test indicates that the second billing page-type, ‘/billing-2’ is more effective in converting more orders. There is more orders by volume and also more money being spent persession as a result.