MGT 3 UCSD Mathematics Worksheet

Content Type

User Generated

User

Wrssyrr

Subject

Mathematics

Course

MGT 3

School

University of California San Diego

Department

MGT

MGT 3: Quantitative Methods in Business

Extra Credit 1: Naïve Bayes (20 points)

The Breast Cancer Dataset

This dataset contains information about 277 women who were treated for breast cancer; specifically, for the removal of a malignant tumor from one of their breasts. The dataset is well known and has been regularly cited in machine learning literature to explore or demonstrate various approaches to classification modeling. Analysts use the information to predict which of the women experienced a recurrence of their tumor within five years of the initial tumor’s removal.

For this exercise, the data has been split into two sets: a training set containing ~80% of the original data (221 rows), and a testing set containing the remaining ~20% (56 rows). You will build your model on the training set and generate predictions on the testing set.

This dataset was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Nine instances that contained missing data were removed from the original dataset. Thanks go to M. Zwitter and M. Soklic for providing the data.
The names of the variables contained in the dataset are listed in the table below, along with a brief description of each:

Variable	Description
recur	yes/no: whether or not the patient suffered a recurrence of the tumor within five years of the tumor’s removal (yes = recurred)
age	the age of the patient, bucketed by decade (e.g., 20-29, 30-39)
post_meno	yes/no: whether or not the patient is post-menopausal (yes = post-menopausal)
tumor_size	the maximum diameter of the tumor (in millimeters)
inv_nodes	number of axillary (armpit) lymph nodes with visible metastatic breast cancer at the time of diagnosis
node_caps	yes/no: whether or not the cancer metastasized to a lymph node (yes = metastasized)
deg_malig	“degree malignant”: the severity of the malignancy of the tumor, ranked on a 3-point scale (1 = least severe, 3 = most severe)
breast	indicates whether the tumor occurred on the left or right breast
quadrant	the location of the tumor within the breast, categorized by quadrant (e.g., upper-left, lower-right)
radiation	yes/no: whether or not the patient received radiation therapy (yes = received radiation)

© Ryan Wagner, 2020. Do not copy or distribute without permission.

Installing R Packages

R comes with built-in functionality (called base R), and the ability to easily load additional commands that greatly enhance the software’s capabilities. These extra commands are found in packages. The commands needed to run and analyze a Naïve Bayesian classifier are found in two packages: e1071 and caret. Fortunately, installing and loading these packages is typically very fast, and requires only simple commands.

To install these packages, run the following commands:

    install.packages("e1071")
    install.packages("caret")

Note the use of quotes around each package name in the above commands. You only need to run these commands once. After a package has been successfully installed, it stays on your computer unless you manually uninstall it.
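Because installation is a one-time step, a script can check which packages are already present before installing anything. The following is a minimal sketch of that idea, not part of the worksheet; the variable names `pkgs`, `installed`, and `missing_pkgs` are assumptions chosen for illustration, and the choice to install only in an interactive session is likewise an assumption.

```r
# Packages the worksheet relies on.
pkgs <- c("e1071", "caret")

# TRUE for each package that is already installed on this machine.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing, and only in an interactive session
# (an assumption of this sketch, so that a script never downloads silently).
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

Running this at the console installs nothing when both packages are already available.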
Installing a package typically takes between 30 and 60 seconds, though the process may take longer if your internet connection is weak. You will know the package has finished installing when the > symbol re-appears in the console.

Loading R Packages

Every time you start R, only the commands contained within base R are automatically loaded. If you wish to use a command contained in a package, you must first load that package. Loading must be repeated every time you start R (assuming you wish to use that package during your session), but the command only needs to be run once per session. Installing a package does not load it: even after a first-time installation (see above), you still need to load the package each time you begin a new R session.

To load the e1071 and caret packages, run the following commands:

    library(e1071)
    library(caret)

Note that unlike install.packages(), the library() command does not use quotes around the package name. A best practice is to place all relevant library() calls (those required to execute the work contained in a given script) at the beginning of your script, so that if you revisit the script at a later date, you already know which packages are needed, and are reminded to load them.

Training the model

When predictions are generated via Naive Bayes, an expanded version of Bayes’ Theorem is used to calculate a posterior probability distribution of the target variable for each row in the dataset. The terms in the theorem (prior probability, evidence, and likelihood) reflect patterns that are believed to govern the system of variables being analyzed. Model training refers to the step in which these patterns are analyzed and stored, so that they can be used to generate predictions on future cases. To train a Naive Bayes model, use the naiveBayes() command.
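To make the three terms concrete, the sketch below computes a posterior by hand in base R for a single categorical predictor. The data frame `toy` is hypothetical: its eight rows are invented for illustration and are not taken from the breast cancer dataset, although the column names mirror the worksheet's variables.

```r
# Hypothetical toy data (invented values; not the course dataset).
toy <- data.frame(
  recur     = c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
  radiation = c("yes", "yes", "no",  "yes", "no", "no", "no", "no"),
  stringsAsFactors = TRUE
)

# Prior: P(recur), the share of each outcome in the data.
prior <- prop.table(table(toy$recur))

# Likelihood: P(radiation | recur), one row per outcome.
likelihood <- prop.table(table(toy$recur, toy$radiation), margin = 1)

# Prior times likelihood for a new patient with radiation = "yes".
joint <- prior * likelihood[, "yes"]

# Evidence: P(radiation = "yes"), the sum over both outcomes.
evidence <- sum(joint)

# Posterior: P(recur | radiation = "yes").
posterior <- joint / evidence
print(posterior)
```

With several predictors, the "naive" step is to multiply one likelihood term per predictor, treating the predictors as independent given the outcome. In practice, the naiveBayes() command computes and stores quantities of this kind for every predictor in the formula.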
The naiveBayes() command takes the following parameters:

formula: the target variable (the variable whose outcome we would like to predict), and the predictors (the variables we believe contain useful information for predicting the target variable). Formulas take the form y ~ a + b + c, where y is the target variable, and a, b, c are the predictors. The list of predictors can include as many variables as needed, each separated by the + symbol. For example: in the iris dataset, if you wished to predict the species of an iris using information about its petal length and width, you would write the formula as:

    Species ~ Petal.Length + Petal.Width

A few notes on the formula:
• The use of spacing between variable names is optional; you may prefer to insert spaces to make your code easier to read.
• Variable names are written without $ notation, and no quotes are used around them.

Shortcut: If you wish to use all remaining variables (besides the target variable) as predictors, you can simply write a period after the ~ symbol in your formula. For example, the following formula assigns Species as the target variable, and uses all four remaining variables in the iris dataset as predictors:

    Species ~ .

data: the dataset containing the target variable and predictors, to be used to train your model. For example, to train your model on the iris dataset, you would write:

    data = iris

Assign the output of the naiveBayes() command to an object with a name of your choice. Continuing the above example, the complete naiveBayes() call would appear as: nb
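As a hedged illustration of the formula and data parameters described above, the sketch below inspects both formula styles on iris with base R and then, only if e1071 is installed, fits a model. The object names `f_explicit`, `f_all`, and `nb_iris` are assumptions made for this example and do not come from the worksheet.

```r
# The two formula styles described above, written against the iris dataset.
f_explicit <- Species ~ Petal.Length + Petal.Width
f_all      <- Species ~ .

# all.vars() lists the variables in a formula: target first, then predictors.
vars_explicit <- all.vars(f_explicit)

# terms() expands the "." shortcut against the columns of the data.
predictors_all <- attr(terms(f_all, data = iris), "term.labels")

# Fit only if e1071 is installed; nb_iris is a hypothetical object name.
if (requireNamespace("e1071", quietly = TRUE)) {
  nb_iris <- e1071::naiveBayes(Species ~ Petal.Length + Petal.Width, data = iris)
  print(nb_iris$apriori)        # class counts used for the prior
  print(names(nb_iris$tables))  # one conditional table per predictor
}
```

Printing `predictors_all` shows which columns the period shortcut pulls in, which is a quick way to confirm that no unintended column is being used as a predictor.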
Explanation & Answer

Running Head: MGT 3: Quantitative Methods in Business

MGT 3: Quantitative Methods in Business
Name
Institutional Affiliation
Date

Outline: Extra Credit 1—Naïve Bayes
The Breast Cancer Dataset
For this exercise, the data has been split into two sets:
• A training set containing ~80% of the original data (221 rows)
• A testing set containing the remaining ~20% (56 rows)

You will build your model on the training set and generate predictions on the testing set.

Extra Credit 1—Naïve Bayes
Q1. Write the command to train a model on the training dataset that predicts the
recurrence of breast cancer using only information about the patient’s age. Save the output
as an object called nb_1. (2 points)
Solution:
# Required packages
library(e1071)
library(caret)
# Load the training and test data
bc_train <- read.csv("E:\\Expert Paul\\2020\\18th Feb\\bc_dataset\\bc_train.csv")
bc_test <- read.csv("E:\\Expert Paul\\2020\\18th Feb\\bc_dataset\\bc_test.csv")
# Train a model on the training dataset to predict recurrence of breast cancer from age
...
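The tutor's command is not shown here. As a hedged sketch only, a call of the shape Q1 describes can be assembled from the worksheet's own description of the formula and data parameters. The data frame `toy_train` below is hypothetical: its rows are invented so that the example runs without the course files, and `q1_formula` is a name introduced for illustration. This is not necessarily the answer the tutor gave.

```r
# Hypothetical stand-in for bc_train: invented rows, same column names
# as the worksheet's variable table (recur, age).
toy_train <- data.frame(
  recur = factor(c("no", "no", "yes", "no", "yes", "no")),
  age   = factor(c("30-39", "40-49", "40-49", "50-59", "50-59", "60-69"))
)

# Target on the left of ~, the single predictor (age) on the right.
q1_formula <- recur ~ age

if (requireNamespace("e1071", quietly = TRUE)) {
  # With the real data, data = bc_train would replace data = toy_train.
  nb_1 <- e1071::naiveBayes(recur ~ age, data = toy_train)
  print(nb_1$tables$age)  # P(age bucket | recur), estimated from the toy rows
}
```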
