SQL & Tableau: Data Analytics For an Online Retailer Part 1

16 minute read

Background

I completed Maven Analytic’s ‘Advanced SQL + MySQL for Analytics & Business Intelligence’ course. During this course there were 2 projects we were to complete to get simulated experience with not only the advanced SQL skills taught in the body of the course but also in the context of a business setting. The relational database, mavenfuzzyfactory, was imported into MySQL Workbench and used throughout the course.

I played the role of a data analyst for a new online retailer, Maven Fuzzy Factory. They make stuffed animals. In this database, we had info dealing with website sessions, website pageviews, products, orders, order items, and refunds. The board of directors and executives needed answers to some questions to see if the business was growing and doing so efficiently.

The analysis was performed in MySQL Workbench and the reporting as follows was done using an R Markdown file in R Studio by connecting to the database using the RMariaDB R Package and using SQL chunks.

SQL Concepts Implemented

  • The Big 6: SELECT–>FROM–>WHERE–>GROUP BY–>HAVING–>ORDER BY
  • Aggregation (COUNT, SUM, MAX, ect.)
  • LEFT JOIN
  • VIEW’s and Subqueries
  • CASE and COUNT(DISTINCT CASE WHEN...) (count-case pivoting)

Midterm Project

Note: These queries will show only a limited number of rows for the sake of seeing the structure of the output. They are not the full result of the queries in terms of number of rows

Some of the types of inquiries and requests I recieved involve:

  • Analyzing Traffic Sources (Conversion Rates, Bid Optimization, Traffic Source Trending)
  • Analyzing Website Performance (Analyzing Top Pages & Entry Pages, Bounce Rates & Landing Page Tests, Landing Page Trend Analysis, Conversion Funnels & Testing Conversion Paths)

Pull monthly trends for ‘gsearch’ sessions and ‘orders’ so that we can showcase Order Rate growth (In terms of Website Session volume, Order volume, and Conversion Rate)

(Note: ‘gsearch’ is a UTM parameter that indicates if a potential customer found the website via the ‘gsearch’ search engine)

SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,

COUNT(DISTINCT website_sessions.website_session_id) AS sessions,
COUNT(DISTINCT orders.order_id) AS orders,
COUNT(DISTINCT orders.order_id)/COUNT(DISTINCT website_sessions.website_session_id) AS order_rate

FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
GROUP BY 1;
yr_mo sessions orders order_rate
2012/03 1843 59 0.0320
2012/04 3569 93 0.0261
2012/05 3405 96 0.0282
2012/06 3590 121 0.0337
2012/07 3797 145 0.0382
2012/08 4887 184 0.0377
2012/09 4487 186 0.0415
2012/10 5519 237 0.0429
2012/11 9732 414 0.0425
2012/12 7033 335 0.0476
2013/01 4211 244 0.0579
2013/02 5108 338 0.0662
2013/03 4445 272 0.0612
2013/04 5729 395 0.0689
2013/05 5991 392 0.0654

Displaying records 1 - 15

Evidently our Session volume and Conversion Rate has been increasing steadily since the inception of the business, exactly what we would want to see.

Get a monthly Order Rate trend for ‘gsearch’ but this time split out ‘nonbrand’ and ‘brand’ campaigns (In terms of Website Session volume, Order volume, and Conversion Rate)

(Note: The ‘brand’ and ‘nonbrand’ campaigns are paid advertisements)

SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,

COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN website_sessions.website_session_id ELSE NULL END) AS brand_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN orders.order_id ELSE NULL END) AS brand_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'brand' THEN website_sessions.website_session_id ELSE NULL END) AS brand_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN website_sessions.website_session_id ELSE NULL END) AS nonbrand_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN orders.order_id ELSE NULL END) AS nonbrand_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_campaign = 'nonbrand' THEN website_sessions.website_session_id ELSE NULL END) AS nonbrand_order_rt

FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
GROUP BY 1;
yr_mo brand_sessions brand_orders brand_order_rt nonbrand_sessions nonbrand_orders nonbrand_order_rt
2012/03 8 0 0.0000 1835 59 0.0322
2012/04 64 6 0.0938 3505 87 0.0248
2012/05 113 6 0.0531 3292 90 0.0273
2012/06 141 6 0.0426 3449 115 0.0333
2012/07 150 10 0.0667 3647 135 0.0370
2012/08 204 10 0.0490 4683 174 0.0372
2012/09 265 16 0.0604 4222 170 0.0403
2012/10 333 15 0.0450 5186 222 0.0428
2012/11 463 21 0.0454 9269 393 0.0424
2012/12 539 34 0.0631 6494 301 0.0464
2013/01 517 32 0.0619 3694 212 0.0574
2013/02 381 38 0.0997 4727 300 0.0635
2013/03 357 18 0.0504 4088 254 0.0621
2013/04 400 32 0.0800 5329 363 0.0681
2013/05 476 26 0.0546 5515 366 0.0664

Displaying records 1 - 15

Brand sessions have been increasing but the Conversion Rate has been volatile with an over all upward trend. Nonbrand sessions have had significant growth in Website Sessions peaking around November/December of 2012 and settling in the 4000-5000 range. They have observed consistent growth in Conversion Rate, nearing the 7% mark in latter months.

While on ‘gsearch’, dive into ‘nonbrand’ and pull volumes for Monthly Sessions and Orders along with Conversion Rates split by Device Type (‘mobile’, ‘desktop’)

SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,

COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS desktop_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN orders.order_id ELSE NULL END) AS desktop_orders,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'desktop' THEN website_sessions.website_session_id ELSE NULL END) AS desktop_order_rt,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN website_sessions.website_session_id ELSE NULL END) AS mobile_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN orders.order_id ELSE NULL END) AS mobile_orders,
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.device_type = 'mobile' THEN website_sessions.website_session_id ELSE NULL END) AS mobile_order_rt

FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
WHERE website_sessions.utm_source = 'gsearch' AND website_sessions.utm_campaign = 'nonbrand'
GROUP BY 1;
yr_mo desktop_sessions desktop_orders desktop_order_rt mobile_sessions mobile_orders mobile_order_rt
2012/03 1119 49 0.0438 716 10 0.0140
2012/04 2135 76 0.0356 1370 11 0.0080
2012/05 2271 82 0.0361 1021 8 0.0078
2012/06 2678 107 0.0400 771 8 0.0104
2012/07 2768 121 0.0437 879 14 0.0159
2012/08 3519 165 0.0469 1164 9 0.0077
2012/09 3169 154 0.0486 1053 16 0.0152
2012/10 3929 203 0.0517 1257 19 0.0151
2012/11 7010 351 0.0501 2259 42 0.0186
2012/12 4952 285 0.0576 1542 16 0.0104
2013/01 2795 189 0.0676 899 23 0.0256
2013/02 3594 269 0.0748 1133 31 0.0274
2013/03 3088 227 0.0735 1000 27 0.0270
2013/04 4046 332 0.0821 1283 31 0.0242
2013/05 4255 323 0.0759 1260 43 0.0341

Displaying records 1 - 15

We see fast growth in desktop orders early on and slower growth late, but with consistent growth in order rate throughout. Mobile order sessions are volatile all around but have had fewer drop offs in recent months. There is consistent growth with their order rates despite the volatility in session volume.

Pull monthly trends by volume and Order Conversion Rate for ‘gsearch’ alongside each of our other channels

(Note: If it is a ‘bsearch’ or ‘gsearch’ session, then they got to the website via search engine by clicking on a paid advertisement. If it was through a search engine but instead of a paid advertisement they clicked on a normal link, then that is an ‘organic session’. If they directly typed in the website’s http address then it is a ‘direct session’.)

SELECT
DATE_FORMAT(website_sessions.created_at, '%Y/%m') AS yr_mo,

COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN website_sessions.website_session_id ELSE NULL END) AS gsearch_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN orders.order_id ELSE NULL END) AS gsearch_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'gsearch' THEN website_sessions.website_session_id ELSE NULL END) AS gsearch_order_rt,

COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN website_sessions.website_session_id ELSE NULL END) AS bsearch_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN orders.order_id ELSE NULL END) AS bsearch_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN orders.order_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN website_sessions.utm_source = 'bsearch' THEN website_sessions.website_session_id ELSE NULL END) AS bsearch_order_rt,

COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN website_sessions.website_session_id ELSE NULL END) AS organic_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN orders.order_id ELSE NULL END) AS organic_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN orders.order_id ELSE NULL END)
/COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NOT NULL THEN website_sessions.website_session_id ELSE NULL END) AS organic_order_rt,

COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN website_sessions.website_session_id ELSE NULL END) AS direct_sessions,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN orders.order_id ELSE NULL END) AS direct_orders,
COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN orders.order_id ELSE NULL END)
/COUNT(DISTINCT CASE WHEN website_sessions.utm_source IS NULL AND http_referer IS NULL THEN website_sessions.website_session_id ELSE NULL END) AS direct_order_rt

FROM website_sessions
LEFT JOIN orders
ON website_sessions.website_session_id = orders.website_session_id
GROUP BY 1;
yr_mo gsearch_sessions gsearch_orders gsearch_order_rt bsearch_sessions bsearch_orders bsearch_order_rt organic_sessions organic_orders organic_order_rt direct_sessions direct_orders direct_order_rt
2012/03 1843 59 0.0320 2 0 0.0000 8 0 0.0000 9 0 0.0000
2012/04 3569 93 0.0261 11 0 0.0000 76 2 0.0263 71 5 0.0704
2012/05 3405 96 0.0282 25 0 0.0000 148 3 0.0203 150 8 0.0533
2012/06 3590 121 0.0337 25 1 0.0400 194 10 0.0515 169 8 0.0473
2012/07 3797 145 0.0382 44 2 0.0455 206 13 0.0631 188 9 0.0479
2012/08 4887 184 0.0377 696 19 0.0273 265 14 0.0528 250 11 0.0440
2012/09 4487 186 0.0415 1438 74 0.0515 332 13 0.0392 284 12 0.0423
2012/10 5519 237 0.0429 1770 83 0.0469 427 22 0.0515 442 26 0.0588
2012/11 9732 414 0.0425 3115 155 0.0498 623 26 0.0417 569 25 0.0439
2012/12 7033 335 0.0476 1700 90 0.0529 692 46 0.0665 642 38 0.0592
2013/01 4211 244 0.0579 924 63 0.0682 660 47 0.0712 607 35 0.0577
2013/02 5108 338 0.0662 1043 74 0.0709 524 44 0.0840 472 38 0.0805
2013/03 4445 272 0.0612 954 67 0.0702 477 34 0.0713 409 18 0.0440
2013/04 5729 395 0.0689 1228 80 0.0651 510 41 0.0804 478 32 0.0669
2013/05 5991 392 0.0654 1284 86 0.0670 627 51 0.0813 567 45 0.0794

Displaying records 1 - 15

We see steady growth across the board with ‘direct’ sessions and ‘organic’ sessions having the most impressive conversion rates. This makes sense since those users were seeking out specifically our company. However, the vast majority of the traffic comes from ‘bsearch’ and ‘gsearch’.

Objective 5: Estimating Order Volume Differential

Estimate the additional orders we have recieved as a result from switching from ‘/home’ to ‘/lander-1’ (June 19 to July 28) as the landing page for ‘gsearch nonbrand’ sessions

(Note: here, these url’s represent the first page a user sees when they get to the site)

This is a multistep problem. First, we need to get the differential in order rates between the two landing pages. This will require using a subquery and the following query will be the inner query imbedded in the FROM statement:

SELECT 
website_sessions.website_session_id,
website_pageviews.pageview_url,

MIN(website_pageviews.website_pageview_id) AS land_pg_view_id

FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.created_at BETWEEN '2012-06-19' AND '2012-07-28' 
AND website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url IN ('/home','/lander-1')
GROUP BY 1,2;
website_session_id pageview_url land_pg_view_id
11660 /home 23443
11661 /home 23444
11662 /home 23448
11663 /home 23451
11664 /home 23454
11665 /home 23457
11666 /home 23461
11668 /home 23464
11670 /home 23466
11671 /home 23469
11672 /home 23473
11673 /home 23474
11674 /home 23478
11675 /home 23481
11677 /home 23486

Displaying records 1 - 15

Now we know what url was the landing page for each website session. Let’s get the order rates for our two landing url’s:

SELECT DISTINCT
land_pg.pageview_url,

COUNT(DISTINCT land_pg.website_session_id) AS sessions,
COUNT(DISTINCT orders.order_id) AS orders,
COUNT(DISTINCT orders.order_id)/COUNT(DISTINCT land_pg.website_session_id) AS order_rt

FROM (
SELECT 
website_sessions.website_session_id,
website_pageviews.pageview_url,

MIN(website_pageviews.website_pageview_id) AS land_pg_view_id

FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.created_at BETWEEN '2012-06-19' AND '2012-07-28' 
AND website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url IN ('/home','/lander-1')
GROUP BY 1,2
) AS land_pg

LEFT JOIN orders
ON orders.website_session_id = land_pg.website_session_id
GROUP BY 1;
pageview_url sessions orders order_rt
/home 2272 73 0.0321
/lander-1 2304 94 0.0408

2 records

Evidently ‘/lander-1’ outperforms ‘/home’ by about .0088 orders per session.

We would like to see when the last time was that ‘/home’ was used as a landing page. This will give us our cutoff point for the next query when we get the total number of ‘/lander-1’ sessions.

SELECT
MAX(website_sessions.website_session_id) AS max_session_id

FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url = '/home';
max_session_id
17145

1 records

So the last ‘/home’ session was at the 17145th website session.

Now we will get the number of ‘/lander-1’ sessions

SELECT
COUNT(website_sessions.website_session_id) AS session_count

FROM website_sessions
LEFT JOIN website_pageviews
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_sessions.utm_source = 'gsearch'
AND website_sessions.website_session_id > 17145
AND website_sessions.utm_campaign = 'nonbrand'
AND website_pageviews.pageview_url = '/lander-1';
session_count
35734

1 records

35,734 sessions X .0088 differential => 314 more orders with ‘/lander-1’ than what would’ve been with ‘/home’. Evidence that ‘lander-1’ is in the best interest of the company to use as the landing page for the website.

Objective 6: Conversion Tunnel Comparison

Create full conversion tunnels for both the ‘/home’ and ‘/lander-1’ (‘gsearch nonbrand’)

(Note: This conversion tunnel takes you through each necessary page in order to reach a completed order. We would like to compare the tunnels of the ‘/home’ and ‘/lander-1’ landing pages beginning on June 19, 2012 (since that is when ‘/lander-1’ was first created.)

This will also be a multistep problem. If we really wanted to, we could do this by embedding multiple subqueries. However, that can be difficult to read for someone else who didn’t write the code but also for the one who wrote it who may need to refer back to it later. Instead we can make use of VIEW’s in SQL that store queries that we can access later by using SELECT statements on them like we would on a table like so:

CREATE VIEW home_monthly_conv
AS
SELECT
website_sessions.created_at,
website_sessions.website_session_id,

CASE WHEN website_pageviews.pageview_url = '/home' THEN 1 ELSE 0 END AS home,
CASE WHEN website_pageviews.pageview_url = '/lander-1' THEN 1 ELSE 0 END AS lander,
CASE WHEN website_pageviews.pageview_url = '/products' THEN 1 ELSE 0 END AS prod,
CASE WHEN website_pageviews.pageview_url = '/the-original-mr-fuzzy' THEN 1 ELSE 0 END AS fuzzy,
CASE WHEN website_pageviews.pageview_url = '/cart' THEN 1 ELSE 0 END AS cart,
CASE WHEN website_pageviews.pageview_url = '/shipping' THEN 1 ELSE 0 END AS ship,
CASE WHEN website_pageviews.pageview_url IN ('/billing','/billing-2') THEN 1 ELSE 0 END AS bill,
CASE WHEN website_pageviews.pageview_url = '/thank-you-for-your-order' THEN 1 ELSE 0 END AS thanks_last

FROM website_pageviews
LEFT JOIN website_sessions
ON website_sessions.website_session_id = website_pageviews.website_session_id
WHERE website_pageviews.created_at > '2012-06-19' 
AND website_sessions.website_session_id < 17145
AND website_sessions.utm_campaign = 'nonbrand'
AND website_sessions.utm_source = 'gsearch';

This VIEW looks like this:

created_at website_session_id home lander prod fuzzy cart ship bill thanks_last
2012-06-18 19:05:26 11660 1 0 0 0 0 0 0 0
2012-06-18 19:05:26 11660 0 0 1 0 0 0 0 0
2012-06-18 19:05:26 11660 0 0 0 1 0 0 0 0
2012-06-18 19:07:51 11661 1 0 0 0 0 0 0 0
2012-06-18 19:07:51 11661 0 0 1 0 0 0 0 0
2012-06-18 19:36:10 11662 1 0 0 0 0 0 0 0
2012-06-18 19:36:10 11662 0 0 1 0 0 0 0 0
2012-06-18 19:36:10 11662 0 0 0 1 0 0 0 0
2012-06-18 19:59:23 11663 1 0 0 0 0 0 0 0
2012-06-18 19:59:23 11663 0 0 1 0 0 0 0 0

Displaying records 1 - 10

1’s represent if that session id ever made it to that page. What we would like to do is see this as a funnel, in other words not having seperate rows for each distinct session id so that we can see where the funnel stops more easily. We can do this as follows, making use of our view:

SELECT
website_session_id,

MAX(home) AS home_pg,
MAX(lander) AS lander_pg,
MAX(prod) AS prod,
MAX(fuzzy) AS fuzzy,
MAX(cart) AS cart,
MAX(ship) AS ship,
MAX(bill) AS bill,
MAX(thanks_last) AS thanks

FROM home_monthly_conv
GROUP BY 1;
website_session_id home_pg lander_pg prod fuzzy cart ship bill thanks
11660 1 0 1 1 0 0 0 0
11661 1 0 1 0 0 0 0 0
11662 1 0 1 1 0 0 0 0
11663 1 0 1 1 0 0 0 0
11664 1 0 1 1 1 1 1 0
11665 1 0 0 0 0 0 0 0
11666 1 0 0 0 0 0 0 0
11668 1 0 0 0 0 0 0 0
11670 1 0 1 1 1 1 1 1
11671 1 0 1 0 0 0 0 0

Displaying records 1 - 10

This is exactly what we wanted. We can clearly see which lander was used for each session and then how they traversed the next several pages. Since this query isn’t too complicated, let’s use it as a subquery instead of making another VIEW, although that would definitely be a valid approach. We will use this as a subquery to get our counts and rates for each step in the order funnel depending on what their lander was. This final query is as follows:

SELECT
CASE 
      WHEN home_pg = 1 THEN 'home'
    WHEN lander_pg = 1 THEN 'lander'
    ELSE 'something else' 
END AS landers,

COUNT(DISTINCT website_session_id) AS sessions,
COUNT(DISTINCT CASE WHEN prod = 1 THEN website_session_id ELSE NULL END) AS product_pg,
COUNT(DISTINCT CASE WHEN fuzzy = 1 THEN website_session_id ELSE NULL END) AS fuzzy_pg,
COUNT(DISTINCT CASE WHEN cart = 1 THEN website_session_id ELSE NULL END) AS cart_pg,
COUNT(DISTINCT CASE WHEN ship = 1 THEN website_session_id ELSE NULL END) AS ship_pg,
COUNT(DISTINCT CASE WHEN bill = 1 THEN website_session_id ELSE NULL END) AS bill_pg,
COUNT(DISTINCT CASE WHEN thanks = 1 THEN website_session_id ELSE NULL END) AS order_pg

FROM(
SELECT
website_session_id,

MAX(home) AS home_pg,
MAX(lander) AS lander_pg,
MAX(prod) AS prod,
MAX(fuzzy) AS fuzzy,
MAX(cart) AS cart,
MAX(ship) AS ship,
MAX(bill) AS bill,
MAX(thanks_last) AS thanks

FROM home_monthly_conv
GROUP BY 1

) AS funnel_table
GROUP BY 1;
landers sessions product_pg fuzzy_pg cart_pg ship_pg bill_pg order_pg
home 2346 975 708 306 208 176 77
lander 2381 1119 801 361 238 202 96

2 records

It is evident that ‘/lander-1’ has had better stepthrough’s than ‘/home’. This indicates to us that we should move forward with featuring the ‘/lander-1’ landing page.

Objective 7: Analyzing Revenue per Session

Quantify the impact of the billing test (‘/billing’ vs. ‘/billing-2’) by analyzing the lift generated from the test (Sept 10 to Nov 10) in terms of revenue per billing page session

We will once again make use of a subquery to get our desired information.

SELECT 
website_pageviews.created_at,
website_pageviews.website_session_id,
website_pageviews.pageview_url,
orders.order_id,
orders.price_usd

FROM website_pageviews
LEFT JOIN orders
ON orders.website_session_id = website_pageviews.website_session_id
WHERE website_pageviews.created_at BETWEEN '2012-09-10' AND '2012-11-10'
AND website_pageviews.pageview_url IN ('/billing','/billing-2');
created_at website_session_id pageview_url order_id price_usd
2012-09-09 21:49:26 25306 /billing 870 49.99
2012-09-10 00:13:05 25325 /billing-2 871 49.99
2012-09-10 05:39:09 25343 /billing 872 49.99
2012-09-10 07:53:37 25353 /billing 873 49.99
2012-09-10 08:36:13 25358 /billing-2 874 49.99
2012-09-10 09:26:12 25368 /billing 875 49.99
2012-09-10 11:22:08 25393 /billing 876 49.99
2012-09-10 12:22:36 25411 /billing-2 877 49.99
2012-09-10 15:08:37 25454 /billing-2 878 49.99
2012-09-10 15:18:08 25459 /billing-2 NA NA

Displaying records 1 - 10

Here we have information concerning when a purchase was made, which session it is associated with, what billing page they used (key information), and then the order info as well. We can use this now to compare the two billing pages to see which had better perfomance as follows:

SELECT DISTINCT
pageview_url,

COUNT(website_session_id) AS billing_sessions,
COUNT(order_id) AS orders,
SUM(price_usd)/COUNT(pageview_url) AS rev_per_session

FROM(
SELECT 
website_pageviews.created_at,
website_pageviews.website_session_id,
website_pageviews.pageview_url,
orders.order_id,
orders.price_usd

FROM website_pageviews
LEFT JOIN orders
ON orders.website_session_id = website_pageviews.website_session_id
WHERE website_pageviews.created_at BETWEEN '2012-09-10' AND '2012-11-10'
AND website_pageviews.pageview_url IN ('/billing','/billing-2')

) AS billing_pg
GROUP BY 1;
pageview_url billing_sessions orders rev_per_session
/billing 657 301 22.90257
/billing-2 653 410 31.38729

2 records

The billing test indicates that the second billing page-type, ‘/billing-2’ is more effective in converting more orders. There is more orders by volume and also more money being spent persession as a result.