1 Purpose of Data Analysis
2 Basic types of analysis
3 Univariate Analysis
4 Bivariate Analysis
5 Correlation and Causation
6 An example of bivariate analysis
7 Modifying data (sometimes necessary)
8 Creating a graph
9 Predicting Correlation
10 Quantifying your guess
11 Naming regions
12 Create named regions for your project
13 Generating the line
14 Examining the new data
15 Testing Quality of fit
16 Problem with sum(resid)
17 Squaring the residuals
18 Using the trendline tool
19 Exploring other types of regression
20 Your challenge
21 Assignment
22 Deliverables - Paper with:

created using slideshow.cgi by Andy Harris

IUPUI Computer Science: Basic Data Analysis
1. Purpose of Data Analysis
  • Associate variables
  • Ascertain truth
  • Predict relationships
  • Indicate confidence in results

2. Basic types of analysis
  • Univariate
  • Multivariate
  • Regression

3. Univariate Analysis
  • One variable, lots of measurements
  • Histograms
  • Normal curves
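A histogram can be sketched outside the spreadsheet as well. The following is a minimal illustration in Python (an assumption, since these slides use Excel); the measurements and the bin width are made up for the example and are not course data.

```python
from collections import Counter

# Hypothetical measurements of a single variable (made up for illustration).
measurements = [4.1, 5.3, 5.8, 6.0, 6.2, 6.9, 7.1, 7.4, 8.6, 9.2]
bin_width = 2  # assumed bin width

# Map each measurement to the lower edge of its bin, then count per bin.
counts = Counter(int(m // bin_width) * bin_width for m in measurements)

# Print a text histogram, one row per bin.
for low in sorted(counts):
    print(f"{low:>2}-{low + bin_width:<2} {'#' * counts[low]}")
```

If the tallest rows sit in the middle and the counts fall off on both sides, the data may be roughly normal.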

4. Bivariate Analysis
  • Two variables, lots of measurements
  • When one increases, what happens to the other?
  • Attempt to ascertain correlation

5. Correlation and Causation
  • Ice cream causes baseball!
  • Which months have high rates of ice cream consumption?
  • Which months have high incidence of baseball games?
  • How strong is the correlation?
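One common way to put a number on "how strong" is Pearson's correlation coefficient r, which runs from -1 to 1. The sketch below is in Python (an assumption, since these slides use Excel), and the monthly figures are invented purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up monthly figures (January to December), for illustration only.
ice_cream = [1, 1, 2, 4, 6, 9, 10, 9, 6, 3, 2, 1]
baseball = [0, 0, 1, 8, 10, 10, 10, 10, 9, 2, 0, 0]

r = pearson_r(ice_cream, baseball)
print(f"r = {r:.2f}")
```

A value of r near 1 says the two series rise and fall together. It does not say that one causes the other; here both simply follow the summer months.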

6. An example of bivariate analysis

7. Modifying data (sometimes necessary)
  • Use 'Text to Columns' on the Data menu
  • Excel often guesses format correctly
  • You may have to help
  • Usually you'll still need to clean up
  • Best to have only two columns of data
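The same cleanup can be sketched in code. This is a minimal Python illustration (an assumption, since these slides use Excel); the raw text, including the extra "notes" column, is made up to show splitting text into columns and keeping only two of them.

```python
# Made-up raw text, standing in for data pasted from another source.
raw_text = """running speed notes
1 2.1 sunny
2 3.9 windy
3 6.2 sunny
"""

lines = raw_text.strip().splitlines()
header = lines[0].split()[:2]                     # keep only the first two column names
rows = [line.split()[:2] for line in lines[1:]]   # drop the extra column

# Convert the text fields to numbers, one list per column.
running = [float(row[0]) for row in rows]
speed = [float(row[1]) for row in rows]
print(header, running, speed)
```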

8. Creating a graph
  • Highlight the data set
  • Select the chart icon, or choose Chart from the Insert menu
  • Choose a scatter plot

9. Predicting Correlation
  • Look at chart for patterns
  • Can you see a pattern?
  • Does the data look linear? curved? random?

10. Quantifying your guess
  • Goal: create a formula that predicts the data
  • Using the line formula (y = mx + b)
  • Our version: predicted = slope * running + intercept

11. Naming regions
  • Highlight a cell (or cells)
  • Type a name in the text box to the LEFT of the = at the top of the screen
  • To fix mistakes, use the Insert -> Name -> Define menu

12. Create named regions for your project
  • Name running series
  • Name speed series
  • Create a named cell for slope (put 1)
  • Create a named cell for intercept (put 0)

13. Generating the line
  • In the first cell of the predicted column, write the line formula
  • =(slope * running) + intercept
  • Replicate (fill down) this formula in the rest of the column
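For comparison, the fill-down step might look like this in Python (an assumption, since these slides use Excel). The names slope, intercept, running and predicted mirror the named regions; the running values are made up for illustration.

```python
# Made-up sample values standing in for the named "running" series.
running = [1.0, 2.0, 3.0, 4.0, 5.0]

# Starting guesses, as in the named cells: slope = 1, intercept = 0.
slope = 1.0
intercept = 0.0

# Equivalent of filling =(slope * running) + intercept down the column.
predicted = [slope * x + intercept for x in running]
print(predicted)
```

Changing slope or intercept and rerunning plays the same role as editing the named cells and letting the column recalculate.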

14. Examining the new data
  • Delete old chart
  • Make new chart including prediction
  • Modify prediction series in graph to be a line
  • Change slope and intercept values to improve line

15. Testing Quality of fit
  • Residual measures difference between prediction and actual value
  • Create a residual column
  • Get sum of residuals
  • Find slope and intercept with sum(resid) = 0
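A residual column and its sum can be sketched as follows in Python (an assumption, since these slides use Excel). The data values are made up, and the sign convention actual minus predicted is also an assumption; the slides only say "difference".

```python
# Made-up data standing in for the named series.
running = [1.0, 2.0, 3.0, 4.0, 5.0]
speed = [2.1, 3.9, 6.2, 7.8, 10.0]

# A guess at the line.
slope, intercept = 2.0, 0.0

# Predicted column, then residual = actual - predicted (assumed convention).
predicted = [slope * x + intercept for x in running]
resid = [actual - guess for actual, guess in zip(speed, predicted)]

sum_resid = sum(resid)
print(resid, sum_resid)
```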

16. Problem with sum(resid)
  • Negative and positive residuals offset each other
  • Bad fits can look good
  • Each measurement of error should be positive
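A small sketch shows how badly sum(resid) can mislead. It is written in Python (an assumption, since these slides use Excel) with made-up data that lies exactly on the line y = 2x, so the right answer is known in advance.

```python
# Made-up data lying exactly on y = 2x.
running = [1.0, 2.0, 3.0, 4.0, 5.0]
speed = [2.0, 4.0, 6.0, 8.0, 10.0]

def sum_resid(slope, intercept):
    """Sum of (actual - predicted) over all points."""
    return sum(y - (slope * x + intercept) for x, y in zip(running, speed))

good = sum_resid(2.0, 0.0)         # passes through every point
flat = sum_resid(0.0, 6.0)         # flat line at the mean of speed
wrong_way = sum_resid(-2.0, 12.0)  # slopes in the opposite direction

print(good, flat, wrong_way)
```

All three lines give a sum of zero, because each passes through the point (mean of running, mean of speed). The sum alone cannot tell them apart.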

17. Squaring the residuals
  • To get positive values, we could take the absolute value
  • However, squaring works just as well and is easier computationally
  • Make a new column squaring each residual
  • Sum the squared residuals
  • Modify slope and intercept to get the smallest sum of resid^2
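The trial-and-error search can be imitated in Python (an assumption, since these slides use Excel). The data is made up, and the coarse grid of candidate values in steps of 0.1 is only a stand-in for adjusting the two cells by hand.

```python
# Made-up data standing in for the named series.
running = [1.0, 2.0, 3.0, 4.0, 5.0]
speed = [2.1, 3.9, 6.2, 7.8, 10.0]

def sum_sq_resid(slope, intercept):
    """Sum of squared residuals for the line slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(running, speed))

# Candidate slopes 0.0 to 4.0 and intercepts -2.0 to 2.0, in steps of 0.1.
candidates = [(s / 10, b / 10) for s in range(0, 41) for b in range(-20, 21)]

# Keep the pair with the smallest sum of squared residuals.
best_slope, best_intercept = min(candidates, key=lambda sb: sum_sq_resid(*sb))
print(best_slope, best_intercept, sum_sq_resid(best_slope, best_intercept))
```

Unlike the plain sum, the squared sum only shrinks when the line actually gets closer to the points, so a smaller value always means a better fit.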

18. Using the trendline tool
  • Right click on data series in graph
  • Choose add trendline
  • Select linear (for now)
  • On options tab, display equation and R-squared value
  • Compare your estimates with the calculated values
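A linear trendline is generally an ordinary least-squares fit, which has a closed-form solution. The sketch below computes slope, intercept and R-squared in Python (an assumption, since these slides use Excel) on made-up data; the function name linear_fit is hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares line through (xs, ys): returns slope, intercept, R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (squared residuals) / (squared deviations from the mean).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up data standing in for the named series.
running = [1.0, 2.0, 3.0, 4.0, 5.0]
speed = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept, r_squared = linear_fit(running, speed)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.4f}")
```

An R-squared close to 1 means the line accounts for most of the variation in the data.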

19. Exploring other types of regression
  • Other regressions use other kinds of formulas
  • polynomial, logarithmic, exponential often useful
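As one illustration, an exponential curve y = a * e^(b*x) can be fitted by taking the logarithm of y and fitting a straight line to the result. The Python sketch below (an assumption, since these slides use Excel) uses synthetic data generated with known values a = 3 and b = 0.5, so the recovered values can be checked; the function name linear_fit is hypothetical.

```python
from math import exp, log

def linear_fit(xs, ys):
    """Least-squares line through (xs, ys): returns slope, intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data generated from y = 3 * e^(0.5 * x), not real measurements.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * exp(0.5 * x) for x in xs]

# ln(y) = ln(a) + b * x, so a straight-line fit of ln(y) against x gives b and ln(a).
b, ln_a = linear_fit(xs, [log(y) for y in ys])
a = exp(ln_a)
print(a, b)
```

This only works when every y value is positive, since the logarithm is undefined otherwise.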

20. Your challenge

21. Assignment
  • Enter data into a spreadsheet
  • Estimate using line formula
  • Enter m and b values to get a close match
  • Solve by hand first!
  • Check answer by inserting a trendline
  • Look for better fit with another trendline

22. Deliverables - Paper with:
  • Original data
  • Columns for estimate, residual, residual squared
  • Chart showing your estimated trendline and automated trendline
  • A short (one- to two-paragraph) description of what you've learned