Least Squares Regression Analysis in MATLAB Lab Report

Unformatted Attachment Preview

The Application of Linear Algebra to Recommender Systems: Linear Regression and Singular Value Decomposition Approaches SAMPLE FINAL REPORT Professor E. G. Puckett. MAT 167 Spring Quarter 2014 Table of Contents Introduction and Motivation……………. 1 Data Set…………………………………. 2 Methods…………………………………. 2 Results…………………………………… 6 Summary and Conclusions………………. 9 Figures…………………………………... 10 Tables…………………………………… 15 Appendix: Code…………………………. 17 Introduction and Motivation The ubiquity of Internet access and its ever-growing role in activities ranging from commerce to personal leisure has created an explosion in data over the last decade. Much of this data is gathered from user-generated behavior within a web browser whether or not the user is aware of its collection. The quantity and richness of this data has helped foster demand for a userdriven model of online experiences, particularly with regard to hedonic pursuits. As the realm of human experience has increasingly shifted from the physical space to the digital, businesses have attempted to exploit data on website users in order to increase revenue or provide a better service in order to retain loyal customers. With consumers’ brief attention spans mercilessly pulled from one corner of the Internet to the next, a business’ survival may depend upon delivering the right material to the right people at the right time. A recommender system is one vehicle by which this goal is accomplished. Such systems are algorithms that analyze data for revealed user preferences and translate those preferences into targeted content. User data may come in the form of clicks on links, time spent reading a given page, or ratings provided for a product. Recommendation systems come in forms both subtle and explicit. Services such as Google News analyze consumer interests by browsing habits and recommend news stories tailored to users’ tastes. Advertisers collect data on website visits in order to classify a user into pre-specified market segments. Online shopping goliath Amazon.com records purchases and monitors previously viewed items in order to suggest products that will catch a customer’s eye. Streaming music providers like Pandora use song or artist seeds to create a station suited to an individual listener. And, perhaps most famously of all, Netflix makes use of movie-watching history and user ratings in order to recommend movies the user has not seen but is likely to enjoy. As the first online movie rental service to excel in its niche, Netflix developed a physical and digital delivery system and unique business model that rapidly put pressure on its store-based competitors that retained an antiquated and unpopular regime of due dates and late fees. Netflix has remained competitive by, among other things, maintaining a state-of-the-art recommender system that predicts preferences with reasonable accuracy in order to retain the interest of consumers and to keep the service fresh and relevant. Netflix famously sought to enhance this recommender system through a public competition. Though the Netflix Prize Contest ended in 2009, the idea of movie recommender systems has remained a teaching point and a case study in the field of machine learning. Recommender systems are categorized by the kind of data on which they rely. So-called content-based filtering systems build a profile of users and products; this kind of system relies on the relationship between a single user and the available products. The user profile indicates the kinds of product features a user prefers (e.g. 
a user watches mostly science fiction movies) and the product profile indicates to what degree the product can be placed in a certain category (e.g. a movie may be science fiction or a romantic comedy, but rarely both). In contrast, recommendations made by a collaborative filtering system are formed by an analysis of similarities among users and products; this system relies principally on the relationship of users to users or products to products, rather than users to products. It is common in movie recommendation to use a hybrid contentbased and collaborative filtering system that relies on all the movies to receive at least one rating by some user, and recommends movies to users based on their own preferences. This approach is particularly vital for the regression and singular value decomposition approaches employed below. 1 Data Set Although the Netflix Prize data, including nearly a half-million users and 18,000 movies, were released for the duration of the competition, they are no longer publicly available. Nevertheless the University of Minnesota’s GroupLens Research team has released a similar data set1 with ratings of 1682 movies by 943 users from the MovieLens database. These ratings, like those from Netflix, come in integer values between 1 and 5, sometimes called the number of “stars” a user gives a movie. Thus these data can be used in an identical way to the Netflix Prize data. The MovieLens data come with a variety of additional information, including the genre of each movie in the database. While there are ways to make use of information about movie features, this project is focused on specific applications of linear algebra and will make predictions based only on known user ratings. Movie genres will be discovered ex post, rather than entered ex ante. Due to computational limitations it was impracticable to work with the entire MovieLens data set. Even with its relatively small size of 1.5 million entries (an order of magnitude smaller than that of the Netflix data), the rating matrix is simply too large to analyze in a reasonable amount of time on a humble laptop computer. Instead I elected a much smaller subset of the data, choosing 20 movies and 200 users at random. The purpose of choosing such a small data set was to make the results easy to interpret and present, and to keep the computations brief enough that they could be repeated many times under different conditions to explore the effects of changing model parameters. With enough resources, however, the methods described in this report could be employed with every movie and user provided in the MovieLens data. Methods Why Matrix Factorization Works The primary interest of the movie recommender system is to predict a user’s ratings for movies he has not seen, and recommend those that he is likely to enjoy. At the heart of this method is the assumption that a user’s rating for any given movie is a linear combination of a vector describing the degree to which the movie contains certain characteristics and themes (the genre vector) and the degree to which the user enjoys these characteristics (the preference vector). Each movie, containing a unique combination of elements from different genres, has its own genre vector whose elements correspond to the level of each characteristic; for example, a romantic comedy would contain high values for humor, romance, and feel-good categories and low values for action or thriller categories. 
Similarly, because of the uniqueness of personal preferences, each user has his or her own preference vector; a hopeless romantic would have high values for the romantic comedy characteristics just described and perhaps low values for the action and thriller categories.

In the context of movie ratings, matrix factorization methods essentially break down the ratings matrix into component matrices representing movie genres and user preferences. Each movie and each user is assigned a vector in feature-space (genres for the movies, preferences over those genres for the users). When the interests of a user align with the genres of a movie, that user is likely to enjoy the movie; geometrically, this is like a close alignment of the genre and preference vectors in space. When user preferences do not agree with a movie’s features, the genre and preference vectors are orthogonal. Therefore, the inner product of the movie’s genre vector with the user’s preference vector would provide a prediction about the user’s enjoyment of the movie. However, we do not have the right data in the first place: we lack both the genre vectors of movies and preference vectors of users. All we have is a sparse rating matrix! Moreover, while we require genre and preference vectors, these are not our primary ends in movie recommendation; rather, we want the predicted user ratings themselves. Because the vectors of interest are unknown to us, these genre and preference parameters are called “latent” factors and we must discover them by means of collaborative and content-based filtering. If there is at least one rating for every movie by some user, then we can discover the genre vector that makes the best rating predictions. The converse is also true: if every user has provided at least one rating then we have enough data to discover the preference vector that makes the best prediction. (Footnote 1: the MovieLens data set is available at http://www.grouplens.org/node/73.)

Linear Regression Approach

Given the nature of MovieLens rating data, in which the diversity of user rating strategies makes rating distributions highly non-normal and varied across users, the notion of multiple linear regression for user rating predictions would seem something of a folly. Instead, however, the method is quite powerful with the aid of a carefully controlled gradient descent algorithm. Formally, we would represent the user-specific linear model as

y_{ji} = \sum_{g=1}^{G} m_{jg} p_{ig},

where y_{ji} is the rating given by the i-th user for the j-th movie. The parameters m_{jg} stand for the values of movie j’s genre vector for genre g, where the (arbitrary) number of genres is G. The user’s preference vector is represented by the parameters p_{ig} with similar notation. (The intercept is omitted for reasons that will become clear.) More generally, this linear model defines a (J*I) x 1 rating vector whose entries may be arranged in a J x I rating matrix Y with movies as rows and users as columns, whose entries are defined by

Y = M P^T,

where M is the J x G matrix formed by stacking movie genre vectors and P is the I x G matrix formed by stacking user preference vectors. In reality, this rating matrix is sparse, meaning that most of its entries are empty because not every user has rated every movie. Thus the challenge is to use a form of collaborative filtering in order to impute these missing ratings as accurately as possible. The engine of linear regression is the minimization of the sum of squared errors (SSE), and in order to get errors we need predictions. This problem is one of concurrently predicting the genre and preference matrices M and P.
In order to accomplish this, we define the sum of squared errors as

SSE = \sum_{(i,j) \in \Omega} ( \mathbf{m}_j \cdot \mathbf{p}_i - y_{ji} )^2,

where \Omega indicates the set of all pairs (i, j) for which user i has provided a rating for movie j, where a bold character indicates a vector, and where y_{ji} is the actual rating provided by the user. Our objective is to predict genres and preferences in a way that minimizes this error. Furthermore, we are interested in obtaining small and realistic values of the genre and preference vectors. One danger of unchecked simultaneous optimization is that the parameters may take any values necessary to achieve the objective. This problem can lead to excessively large values of some parameters and exceedingly small values of others, posing the problem of overfitting the model to a specific data set where such parameters may produce good estimates but fail to generalize to other samples. The simple solution is to penalize (or “regularize”) the elements of the genre and preference vectors so that the individual elements do not become too large. Therefore we let \lambda denote the regularization parameter and define the objective function as

J(M, P) = \sum_{(i,j) \in \Omega} ( \mathbf{m}_j \cdot \mathbf{p}_i - y_{ji} )^2 + \lambda \left( \sum_{j} \sum_{g} m_{jg}^2 + \sum_{i} \sum_{g} p_{ig}^2 \right),

where we optimize with respect to genre and preference vectors simultaneously, controlling the size of m_{jg} and p_{ig} by introducing squared terms into the cost function so that their expansion in the positive or negative direction is checked by rapidly increasing penalties. The value of \lambda need not be large, and in fact by the nature of the cost function the root-mean-squared error of the predictions will increase as \lambda increases. Nevertheless, its presence in the objective is critical to a plausible outcome.

With an infinite number of combinations of the parameters to test, the optimization is computationally unwieldy without a well-defined rule for sniffing out the minimum. Rather than map out every value of SSE in parameter space, we compute the error at a given value of our parameters, we determine the slope of the SSE in that neighborhood, and we ride the slope downward toward a lower error and repeat. In calculus the gradient of a function is defined as the direction of steepest ascent for a given value of inputs and is computed as the vector of partial derivatives of all variables evaluated at those inputs. Thus we employ a method known as gradient descent to determine, at some point in parameter space, the appropriate change in parameters that reduces the SSE most efficiently. This procedure employs a parameter known as the learning rate, \alpha, defined as the size of the “step” down the gradient before the next computation. A larger learning rate will seek out the optimum more quickly, but too large a value will make too large a step and will risk overshooting the minimum and increasing the parameters uncontrollably. Because we predict both genres and preferences, the gradient descent is applied to both the genre and preference vectors in each iteration of the algorithm, a process called the gradient descent update. Using the partial derivatives of the objective function and the learning rate, the updates are

m_{jg} := m_{jg} - \alpha \left( \sum_{i : (i,j) \in \Omega} ( \mathbf{m}_j \cdot \mathbf{p}_i - y_{ji} ) p_{ig} + \lambda m_{jg} \right)

p_{ig} := p_{ig} - \alpha \left( \sum_{j : (i,j) \in \Omega} ( \mathbf{m}_j \cdot \mathbf{p}_i - y_{ji} ) m_{jg} + \lambda p_{ig} \right),

which define new values of the parameters m_{jg} and p_{ig}. These are then tested in the objective function again and another round of gradient descent is conducted. Thanks to the gradient descent update, we can in theory start from any point in parameter space and, given enough iterations of the algorithm and assuming we do not fall into a local minimum, we can work our way toward the global minimum of the objective function.
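To make the gradient descent update concrete, the following MATLAB sketch shows one way the regularized update just described could be implemented for a small rating matrix. The variable names (Y, M, P, alpha, lambda) and parameter values are illustrative assumptions, not the report's own code (which appears in its appendix):

```matlab
% Minimal sketch of the regularized gradient descent update described above.
% Assumptions: Y is a J-by-I rating matrix with NaN for missing entries; the
% number of genres G, learning rate alpha, regularization lambda and the
% iteration count are arbitrary illustrative values.
[J, I] = size(Y);
G      = 10;                     % arbitrary number of latent genres
alpha  = 1e-4;                   % learning rate
lambda = 0.1;                    % regularization parameter
nIter  = 10000;
M = 0.1 * randn(J, G);           % genre matrix, small random starting values
P = 0.1 * randn(I, G);           % preference matrix, small random starting values
rated = ~isnan(Y);               % the set Omega of observed (movie, user) pairs
for it = 1:nIter
    E = M * P' - Y;              % prediction error at every cell
    E(~rated) = 0;               % only observed ratings contribute to the gradient
    gradM = E * P  + lambda * M; % partial derivatives with respect to genre entries
    gradP = E' * M + lambda * P; % partial derivatives with respect to preference entries
    M = M - alpha * gradM;       % simultaneous gradient descent update
    P = P - alpha * gradP;
end
Yhat = M * P';                   % predicted ratings for every movie-user pair
```

The constant factor of two that appears when differentiating the squared error is customarily absorbed into the learning rate, which is why it does not appear explicitly in the sketch.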
In this way the machine can “learn” or discover the optimal parameters, once we have specified an arbitrary quantity of genres. This is also the reason that we do not need an intercept in our original linear model: if an intercept term belongs in the model, the algorithm will adapt and learn this feature. Therefore, we assemble the final algorithm in the following way: first, we initialize the values of the genre and preference vectors to small random values; second, we run the gradient descent update until the convergence of the objective function to its minimum; and finally we predict ratings with the inner product of the genre and preference vectors for each user and movie. This process is considered a low-rank matrix factorization procedure because a minimization is used to approximate the actual user ratings while the predicted rating matrix is decomposed into preference and genre matrices, so that the final prediction is given by

\hat{Y} = \hat{M} \hat{P}^T.

Without much surprise we note that this result is simply the estimated factorization proposed by our linear model above. With some exploration of the model’s behavior and the right combination of learning rate and regularization parameter, this method performs astoundingly well in terms of minimizing the difference between prediction and actual rating. The predicted star rating will, of course, not occur in integer values like the ratings supplied by users. This does not pose an obstacle: in fact, Netflix’s “best guess” for ratings has always come in the form of fractions of stars, and there would be tremendous loss of information if predictions were rounded to the nearest star.

Singular Value Decomposition Approach

Many of the principles that make prediction effective in the case of linear regression also apply to another commonly implemented recommendation technique, the singular value decomposition (SVD). Like regression, the SVD also relies on discovering component matrices whose product is the ratings matrix, and on a gradient descent update to refine those factors. Furthermore the component matrices are the genre and preference matrices whose product is the prediction of user ratings for a movie. The singular value decomposition factors any real matrix A into three components: a matrix of (orthogonal) eigenvectors of AA^T, a matrix of (orthogonal) eigenvectors of A^T A, and a diagonal matrix of the singular values (the square roots of the eigenvalues of A^T A). This SVD can serve the purpose of dimensionality reduction because it orders the eigenvectors and singular values by the size of the singular values; that is, by the relative importance (i.e. amount of information stored) of each vector. Then the most important vectors can be chosen and recombined to produce an approximation of the original matrix. This is relevant to the movie rating matrix because the act of discovering the latent genre and preference vectors is exactly the problem of dimensionality reduction: insofar as the assumption holds that movies are categorized into genres and users may have similar tastes, we may use the SVD to discover the hidden features that characterize these similar movies and users into fewer groups. Specifically, if we let our sparse rating matrix be the matrix A just described (filling in missing values with the mean rating by movie), we can decompose it with the SVD. What emerges is a complete picture of all possible genres (as many as there are movies) and all user preferences (as many as there are users).
In other words, by the SVD we obtain

A = U \Sigma V^T,

where, as for linear regression, U represents the matrix of movie genres and V represents the matrix of user preferences. Here, \Sigma represents the diagonal matrix of singular values. Some of these genres and preferences do not actually contain much information on the variety of the movies themselves, but SVD has already done the job of sorting out the important genres by ordering the vectors appropriately. Now we can choose the k most important genres by taking appropriate subsets of the three component matrices. With these subset matrices, we can derive a prediction for all the entries of the original matrix A by multiplying the factors back together. This prediction will be reasonable for a first approximation, but a gradient descent is still needed in order to refine the prediction. Specifically, after subsetting the component matrices, we merge the genre matrix U_k with the matrix of singular values \Sigma_k:

\tilde{U} = U_k \Sigma_k,

where \tilde{U} now plays the role of the genre matrix. We then proceed with an iterative gradient descent update process. In the SVD setting we do not explicitly pursue the objective of minimizing the sum of squared errors as we do in linear regression, but our updates are still guided by seeking the lowest possible difference between predicted and actual ratings. We define the predictive error (for all movies with a rating), e_{ji}, as

e_{ji} = y_{ji} - \tilde{\mathbf{u}}_j \cdot \mathbf{v}_i,

where, as above, we have the genre vector \tilde{\mathbf{u}}_j, the preference vector \mathbf{v}_i, and the actual rating y_{ji}. To find the descent trajectory, we use the partial derivative of the squared sum of errors with respect to each parameter. As before, we subtract this derivative (times a learning rate) from the original parameter in order to update the genres and preferences. The updates are given by

\tilde{u}_{jg} := \tilde{u}_{jg} + \alpha \sum_{i : (i,j) \in \Omega} e_{ji} v_{ig}

v_{ig} := v_{ig} + \alpha \sum_{j : (i,j) \in \Omega} e_{ji} \tilde{u}_{jg},

where \alpha is again the learning rate. After the completion of the gradient descent update, we have a predicted genre matrix \hat{\tilde{U}} and the predicted preference matrix \hat{V}, which we multiply for the final prediction,

\hat{Y} = \hat{\tilde{U}} \hat{V}^T.

Results

Computational Challenges

In order to accomplish the gradient descent updates required for accurate prediction, the algorithms implemented in this study compute anywhere from tens of thousands to hundreds of thousands of matrix multiplications with each run of the custom functions. Thus the machine time required for a single run can quickly escalate when very accurate predictions are desired. The computation time is highly dependent both on the size of the matrix being computed and the number of gradient descent iterations used. Moreover, both the matrix size and number of iterations have a bearing on the accuracy of the final prediction; a larger matrix contains more information about users and movies, and more iterations allow for a further descent into an error minimum. Figures 1 and 2 report the effect of matrix size and number of iterations on computation time for a single run of both predictive approaches. Happily, Figure 1 shows that computational time increased linearly in the number of matrix entries for both the regression and SVD approach, indicating a stable function that operates as efficiently as it can as the matrix grows. The SVD approach has a slight time advantage over the regression approach, an advantage that appears to grow with the number of entries. This result indicates that, for my chosen default matrix of 20 movies and 200 users, a regression approach with 10,000 iterations requires about 5.5 seconds, while an equivalent SVD approach requires about 4.5 seconds.
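Looking back at the SVD-based initialization and refinement described earlier in the Methods, a minimal MATLAB sketch of that procedure might look like the following; the fill-in strategy, variable names, and parameter values are assumptions for illustration rather than the report's actual implementation:

```matlab
% Sketch of the SVD initialization and gradient descent refinement described
% above. Y is assumed to be the J-by-I rating matrix with NaN for missing
% entries; k, alpha and nIter are illustrative values.
A = Y;
movieMeans = mean(Y, 2, 'omitnan');         % mean rating for each movie (row)
for j = 1:size(Y, 1)
    A(j, isnan(Y(j, :))) = movieMeans(j);   % fill missing cells with the movie mean
end
[U, S, V] = svd(A);                         % full singular value decomposition
k      = 10;                                % keep the k most important "genres"
Utilde = U(:, 1:k) * S(1:k, 1:k);           % merge genre matrix with singular values
Vk     = V(:, 1:k);
rated  = ~isnan(Y);
alpha  = 1e-5;
nIter  = 100000;
for it = 1:nIter                            % gradient descent refinement
    E = Y - Utilde * Vk';                   % predictive error
    E(~rated) = 0;                          % only observed ratings contribute
    dU = E  * Vk;                           % gradient steps for both factors
    dV = E' * Utilde;
    Utilde = Utilde + alpha * dU;
    Vk     = Vk     + alpha * dV;
end
Yhat = Utilde * Vk';                        % refined rating predictions
```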
Not surprisingly, the two functions also exhibit linear time consumption trends in the number of iterations in the gradient descent, as demonstrated in Figure 2. Here again the SVD method had a slight advantage that widened as the number of iterations grew. However, the regression algorithm could achieve more accurate results with fewer iterations, giving it a comparative advantage in the end. In fact, regression could perform better with 2,000 iterations than SVD could with 200,000 iterations. Thus regression was a more efficient algorithm. 6 Number of Iterations With the root-mean-squared error (RMSE) as the judge of performance, we can quantify how much better the regression approach performed. Both functions were run using successively larger numbers of iterations ranging from 10,000 to 200,000. The RMSE was computed for each and the results were plotted in Figure 3. Although regression has a slightly longer run time for any given number of iterations, it has a vastly superior predictive capability. In fact, over the range tested, SVD never outperformed regression at a given number of iterations. Moreover, the regression approach appears to have settled into a local minimum at around 50,000 iterations, with the very low RMSE of 0.07. In contrast, even at 200,000 iterations, the RMSE of the SVD approach was still descending, presumably toward some minimum that it could achieve if given enough time. It should be noted that, although these two functions employ similar gradient descent methods, they start with very different matrices. Thus their descent path toward a minimum is also quite different, a fact that may help explain their different behavior. Learning Rate There is another factor that causes the disparity in Figure 3, however. Recall that the learning rate is a control parameter that determines how quickly the gradient descent moves in the direction of minimum error. If this parameter is too small, the steps toward the best predictions are also too small and finding the minimum is too slow. However, if the parameter is too large, the steps will also be too large and will overshoot the minimum, possibly causing a gradient ascent away from the minimum instead of a gradient descent toward the minimum. Figure 4 plots the RMSE by value of the learning rate. Both methods were highly sensitive to the value of the learning rate and pinning down a good range for the learning rate was difficult. Finding the best rate was especially challenging for the SVD approach. Although Figure 4 does not reflect it, the RMSE could not be resolved with a learning rate larger than about 8x10-5 in the SVD case. The learning rates thus had to be quite small in order for SVD to return real results. This is another reason why the SVD method required many more iterations than the regression method. With a relatively smaller learning rate, the gradient descent required more time to locate the minimum. Number of Genres One of the unique features of these recommender algorithms is the ability to decide how many genres should fully characterize the movies in the data set. This is a fairly arbitrary decision but it has implications for the performance of the models, as shown in Figure 5. Most notably, the RMSE decreases in the number of genres, reaching its minimum when the number of genres equals the number of movies. So what is the advantage of choosing fewer genres than movies? First, we want to reduce computation time. 
In a matrix with thousands of movies, reducing the movies to a few dozen genres can result in enormous time savings in practical implementations of these algorithms. Second, we want to label movies. Rather than pay employees to research movies in order to determine their genre designations, a business can instead allow the computer to discover the genres after someone selects how many genres are likely to be present in the data set. Then the movies can be labeled appropriately with genres for which they score highly. Third, and most important for recommendation systems, we want to avoid the problem of overfitting. If we let every movie represent its own “genre” then we are creating a system that makes good predictions for that particular matrix, but will not generalize well to other rating matrices, or may perform a good deal more poorly when new movies and users are added. And of course, in the case of the 7 SVD approach, the method has a built-in mechanism for discovering the most important genres. After a point, there will be rapidly diminishing marginal benefits to additional genres. Predictive Performance When we use the right model parameters, we see a decrease in root-mean-squared error. But the RMSE is just a number and can feel abstract. Instead, it is helpful to visualize how the right choice of parameters leads to a better rating prediction. Figure 6 plots predicted ratings versus actual user ratings for both “good” parameters and “bad” parameters. In both the regression and SVD approaches, we see that bad parameters, leading to a high RMSE, create inaccurate predictions with a wide variance. In contrast, good parameters allow predicted and actual ratings to align so that they are nearly indistinguishable. In this case we can be confident that our predictions are accurate, at least for movies that the user has already rated: the predictions conform exceedingly well with the true values of known ratings. For movies without known ratings, the predictions can take on any value in the rating scale, including non-integer numbers, and so do not indicate the exact star rating the user would give (because the user is limited to an integer selection) but rather indicate the degree to which a user is expected to enjoy a film. If a user were to watch one of these films and rate it, a new implementation of the algorithms would use this information to make all the predictions more accurate. Thus, in these algorithms, every user unknowingly collaborates to produce better recommendations for everyone. Genre Space An underlying principle in recommender dimensionality reduction is that movies will tend to “cluster” together in genre-space. That is, movies that share similar features will be described by similar genre vectors. If we could visualize genre-space, we would see similar films grouping together. Fortunately we can get a glimpse of genre space by running the algorithms, setting the number of genres to two, and plotting the results on a two-way scatterplot. Then each dimension would act as a kind of aggregate genre that describes the films. Figures 7 and 8 plot the films by their values of aggregate genre. While it is difficult to interpret exactly what these two genres really represent, we are more interested in the clustering effects. Indeed, we see this clustering to some extent in Figure 7 (note that the axes merely represent relative levels, so they can take on any value). 
Both The Godfather and The Godfather: Part II fall on the far left, while both Star Trek III and Star Trek VI fall on the far right. Moreover, the comedies Life of Brian and Raising Arizona both appear in the middle. With a few small differences, we see similar movie groupings in Figure 8, which comes from the SVD approach. In fact, Figure 8 looks like nothing more than a 90-degree clockwise rotation of Figure 7, with a few tweaks, suggesting that the two algorithms are discovering the same genres. We must be careful how we interpret these plots when we use only two genres. Did The Lion King and Jaws group together because they are both about ferocious animals? It would be humorous, but probably untrue. Did Twelve Monkeys and Back to the Future group together because they are both about time travel? Possibly. We would need to compare them in higher dimensions to know for sure. And perhaps with more genres we would see The Lion King and Babe pull away from the more serious and adult-oriented movies. Nevertheless, the clustering effects we see here indicate that users with similar tastes rate movies similarly, allowing the algorithm to separate movies into different categories based on this implicit collaboration effect. This is the true secret of the matrix factorization approach! 8 Recommendations We cannot venture into recommender systems without finally making recommendations. In order to demonstrate how the algorithms provide predicted ratings, I generated a small rating matrix of six movies and eight users from the MovieLens data (Table 1). Some users have rated only one movie, some have rated a few, and one has rated all of them. The results of the predictions are presented in Tables 2 and 3. The most striking feature is how closely the predictions match the actual ratings for both methods, though linear regression appears to have done slightly better. Whether the user has rated only one movie or all of them, the predictions are impressively accurate. There is, however, a difference in how the two methods predict ratings for movies the users have not seen. For example, regression predicts user H will give Top Gun a rating of 1.000, while SVD predicts a 3.439 for the same. In effect, one method would tell user H that Top Gun will be his least favorite, and the other method would tell him it will be his favorite among those not yet rated. This is a curious result, especially given the similar genre plots in Figures 7 and 8. This result implies that the two algorithms have found different matrices that provide similarly good predictions for alreadyrated movies but different predictions for unrated movies. It seems that, because the two methods begin from dramatically different component matrices, the gradient descent has identified two different local minima. Returning to the original 20x200 rating matrix, we can again see the difference in the two methods in Table 4. For the same user, the two methods have mixed opinions about what the user will enjoy. In some cases the predictions are similar, in order if not in absolute rating, while in other cases the algorithms make rather different predictions. Again, this result may occur because the methods have settled into different minima. In fact, neither method may be fully correct. It is common in these kinds of machine learning problems to combine or average the results of many different models. It may be that the best predictions come from a blend of both the regression and SVD approaches, but that is an avenue for future work. 
Summary and Conclusion The goal of this project was to explore two different but related methods for building a recommender system around user ratings for movies. The objective was to make accurate predictions of already-rated movies under the assumption that the accompanying predictions for unrated movies would also be accurate. One approach was linear regression with simultaneous parameter estimation. It began with genre and preference matrices initialized to small random values and then used gradient descent in order to minimize the sum of squared errors. The other approach was a singular value decomposition of the original rating matrix and the selection of the most important genres discovered by the decomposition. The predictions that resulted from the SVD were then updated with another form of gradient descent. The two methods performed well but differed in some of their recommendations. This result indicates that the algorithms settled at different local equilibria. It is possible that one is “more correct” than the other, or that they may both be two pieces of the same puzzle whose solution is more complete with a combination of the methods. Indeed, the winners of the Netflix Prize Contest ultimately used an ensemble of 107 different methods to make the winning predictions. Machine learning tools such as my algorithms are rarely complete on their own. Moreover, a larger computer with more processing power would be able to deal with a larger rating matrix and take advantage of the additional information afforded by including more users and movies. Nevertheless, this experiment has laid the foundations for a reasonably reliable recommender system that employs the collaboration of individual users for the benefit of all. 9 Figure 1. Effect of matrix size on computation time. Figure 2. Effect of number of iterations on computation. 10 Figure 3. RMSE for different number of iterations of gradient descent. Figure 4. RMSE for different learning rates. 11 Figure 5. RMSE for different number of genres. Figure 6. Predicted and actual ratings for bad and good models. 12 Figure 7. Movies in genre-space, regression approach. 13 Figure 8. Movies in genre-space, SVD approach. 14 Table 1. Sample data set with actual user ratings. Movie \ User Phenomenon Sneakers Titanic Top Gun Cool Hand Luke Return of the Jedi A B C 3 D 5 4 5 E 3 3 4 2 4 5 F 4 4 5 5 G H 2 4 5 3 2 4 3 G 2.063 1.998 3.982 1.000 3.024 1.993 H 2.841 2.719 5.000 1.000 3.974 3.008 G 2.937 1.887 4.050 3.441 2.986 2.028 H 3.417 2.606 5.000 3.439 3.943 3.013 Table 2. Prediction results from regression approach (already-rated movies in bold). Movie \ User Phenomenon Sneakers Titanic Top Gun Cool Hand Luke Return of the Jedi A 2.675 2.502 3.127 2.545 3.184 3.997 B 3.571 3.320 4.996 2.150 4.465 4.481 C 2.995 2.624 3.838 2.421 3.401 3.265 D 3.305 3.118 3.944 3.008 3.997 4.996 E 2.999 2.996 4.000 2.001 4.000 4.996 F 3.999 3.426 3.997 4.995 4.092 5.000 Table 3. Prediction results from SVD approach (already-rated movies in bold). Movie \ User Phenomenon Sneakers Titanic Top Gun Cool Hand Luke Return of the Jedi A 3.364 2.532 4.380 3.506 3.699 3.999 B 3.672 2.829 4.999 3.714 4.155 4.140 C 3.000 2.293 3.932 3.078 3.338 3.626 D 3.503 2.729 4.287 3.570 3.867 4.999 E 2.885 2.909 4.112 2.023 3.955 5.000 F 3.943 2.371 4.022 5.000 3.486 5.000 15 Table 4. User preferences and ordered predictions with rating in parentheses. 
User Preference (Actual Rating) Regression Prediction (Predicted Rating) SVD Prediction (Predicted Rating) User 1 The Godfather (5) The Godfather II (5) True Lies (5) Goodfellas (4) Back to the Future (4) Silence of the Lambs (4) Jaws (4) First Wives Club (3) Twelve Monkeys (3) Dr. Strangelove (2) Leaving Las Vegas (5.000) Lone Star (4.016) Babe (3.934) Lion King (3.857) Primal Fear (2.915) Star Trek VI (2.293) Die Hard II (2.245) Raising Arizona (2.164) Life of Brian (1.932) Star Trek III (1.609) Star Trek VI (4.730) Primal Fear (4.618) Babe (4.486) Die Hard (4.076) Raising Arizona (3.757) Lone Star (3.559) Star Trek III (3.038) Leaving Las Vegas (1.787) Lion King (1.000) Life of Brian (1.000) User 2 The Godfather (5) The Godfather II (5) Jaws (5) Goodfellas (4) True Lies (4) Silence of the Lambs (4) Die Hard II (3) Twelve Monkeys (3) Star Trek III (2) Star Trek VI (2) Babe (5.000) Lion King (4.504) Lone Star (4.273) Dr. Strangelove (4.067) Back to the Future (3.689) Leaving Las Vegas (3.577) Life of Brian (3.462) First Wives Club (2.303) Raising Arizona (1.969) Primal Fear (1.737) Leaving Las Vegas (5.000) Raising Arizona (5.000) Back to the Future (5.000) Lone Star (4.482) Primal Fear (3.890) Babe (3.083) Lion King (3.071) Life of Brian (2.647) First Wives Club (2.640) Dr. Strangelove (2.574) User 3 Dr. Strangelove (5) Twelve Monkeys (5) The Godfather (4) Back to the Future (4) Silence of the Lambs (4) Jaws (4) Raising Arizona (3) Leaving Las Vegas (3) Babe (2) Star Trek VI (1) Life of Brian (5.000) Goodfellas (4.939) The Godfather II (3.755) True Lies (3.379) Die Hard II (3.219) Lion King (3.190) Lone Star (2.947) First Wives Club (2.057) Primal Fear (1.422) Star Trek III (1.000) Lion King (5.000) Life of Brian (5.000) The Godfather II (3.403) Goodfellas (3.382) Lone Star (2.699) Star Trek III (2.660) True Lies (2.496) Primal Fear (2.481) First Wives Club (2.031) Die Hard II (1.340) 16 APPENDIX: CODE QUICK GUIDE TO R Symbol/Term #

Explanation & Answer

Attached.

Running head: Least squares regression analysis in MATLAB

Student’s Name:
University affiliations:
Instructor’s Name:
Date of submission:


Least squares regression analysis in MATLAB


Introduction
The purpose of this lab exercise is to use MATLAB software to perform least squares linear and quadratic fits of a given data set. Least squares is a regression analysis method that approximates the solution of an overdetermined system by minimizing the sum of the squares of the residuals made in every equation. The original data are to be plotted on the same graph together with the least squares fitted data for both the linear and the quadratic fits. The errors arising from both the linear fit and the quadratic fit are to be calculated and documented, and the differences between the original data and the fitted data are to be noted.
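In symbols, given data points (x_i, y_i), i = 1, ..., n, the linear fit chooses coefficients a and b, and the quadratic fit chooses coefficients a, b and c, to minimize the respective sums of squared residuals:

S_1(a, b) = \sum_{i=1}^{n} ( y_i - (a x_i + b) )^2,

S_2(a, b, c) = \sum_{i=1}^{n} ( y_i - (a x_i^2 + b x_i + c) )^2.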
Methods and procedure
The file CA_02.mat was read into the MATLAB workspace using the command load('CA_02');, after ensuring that CA_02.mat was placed in MATLAB's working directory. The file was found to contain a structure with two fields, ‘x’ and ‘y’. Both x and y were 1000-by-1 column vectors and were plotted on a graph using the ‘plot’ command. Least squares linear regression was performed on the x–y data to obtain a linear relationship between the two variables. The linear equation obtained was used to compute the linear fit data, which were plotted on the same graph as the original data, and the graph was properly labelled to distinguish the fitted data from the original data. The errors between the original data and the data fitted by the linear equation were calculated and recorded.
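The steps above can be summarized in a short MATLAB sketch. The report does not show its code, so the use of polyfit and polyval here, and the assumption that the loaded structure is named CA_02, are illustrative guesses:

```matlab
% Sketch of the data loading, linear fit and plotting steps described above.
% Assumption: load('CA_02') places a structure named CA_02 (fields x and y)
% into the workspace; polyfit/polyval are one possible way to do the fit.
load('CA_02');
x = CA_02.x;                             % 1000-by-1 column vector
y = CA_02.y;                             % 1000-by-1 column vector
p1    = polyfit(x, y, 1);                % least squares straight-line coefficients
yfit1 = polyval(p1, x);                  % linear-fit values at the original x points
figure; hold on;
plot(x, y, 'b.');                        % original data
plot(x, yfit1, 'r-', 'LineWidth', 1.5);  % least squares linear fit
xlabel('x'); ylabel('y');
legend('Original data', 'Linear fit');
title('Original data and least squares linear fit');
```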
Least squares regression was then performed on the data to obtain a parabolic (quadratic) relationship between the two variables. The quadratic equation obtained was used to compute the quadratic fit data, which were plotted on the same graph as the original data, and the graph was properly labelled to distinguish the fitted data from the original data. The errors between the original data and the data fitted by the quadratic equation were calculated and recorded.
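Continuing the sketch above, the quadratic fit and the error calculations could be performed as follows; the exact error definitions used in the report are not stated, so the formulas below are one plausible reading:

```matlab
% Sketch of the quadratic fit and error calculations, continuing from the
% variables defined in the previous sketch (x, y, yfit1).
p2    = polyfit(x, y, 2);              % least squares quadratic coefficients
yfit2 = polyval(p2, x);                % quadratic-fit values
figure; hold on;
plot(x, y, 'b.');
plot(x, yfit2, 'r-', 'LineWidth', 1.5);
xlabel('x'); ylabel('y');
legend('Original data', 'Quadratic fit');
title('Original data and least squares quadratic (parabolic) fit');
% Assumed error measures:
normErr1 = norm(y - yfit1) / norm(y);  % normalized error, linear fit
normErr2 = norm(y - yfit2) / norm(y);  % normalized error, quadratic fit
maxErr1  = max(abs(y - yfit1));        % maximum absolute error, linear fit
maxErr2  = max(abs(y - yfit2));        % maximum absolute error, quadratic fit
```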
Results
The plots of the original data and the data fitted by the linear equation are shown in the figure below:


Figure 1: Graphs of the original data and the least squares linear fit
The plots of the original data and the data fitted by the least squares quadratic equation are shown in the figure below:


Figure 2: Graphs of the original data and the least squares quadratic (parabolic) fit
The errors obtained for both the least squares linear fit and the least squares quadratic (parabolic) fit are summarized in the table below:

Error type                Least squares linear fit    Least squares quadratic fit
Normalized error          0.1239                      0.0343
Maximum absolute error    1.3148                      0.4134

Conclusion
This lab exercise provided good insight into the study of least squares regression analysis. Both a least squares linear polynomial fit and a least squares quadratic polynomial fit were performed on the given data. It was found that the errors resulting from the quadratic polynomial fit were smaller than the errors arising from the linear fit. Thus the higher the de...

