Development of a forecasting model using data mining analysis in the leisure sports industry
- Focusing on the case of horse racing -
Abstract
In the information age, with its flood of data, the analysis and management of data are becoming increasingly important, and the volume of data in the leisure sports industry keeps growing. Some leisure sports, such as basketball, soccer, and baseball, already analyze their data, whereas the analysis and management of data in other leisure sports remain insufficient.
This study demonstrates data analysis and the importance of data by applying data mining to horse racing information from the horse racing sector of the leisure sports industry. The forecasting model was developed from race information collected at Seoul Race Course between 2010 and July 2013. Of the 3,772 races in total, 3,017 were used as training data to develop the forecasting model, and the remaining 755 were used as test data to verify it. The dependent variable was the race record, that is, the result of each race. The independent variables were 52 variables, including the weight of the racehorse, weather, racetrack condition, the weight of the jockey, and winning rate.
For the analysis, the dependent variable was normalized. Forecasting models of horse racing were built on multiple linear regression analysis, together with models obtained by stepwise regression analysis. The resulting models were compared with, and verified against, the actual race results. To make this verification easier to interpret, a performance indicator was defined: the average number of horses that actually finished first or second among the horses forecast to finish first and second, on a daily double basis. The variables selected for horse racing were: racehorse group, race distance, weather, racetrack condition, racetrack humidity, racehorse number, age of the racehorse, burden weight of the racehorse, weight of the racehorse, weight change of the racehorse, number of first-place finishes of the racehorse, number of second-place finishes of the racehorse, number of third-place finishes of the racehorse, winning group score of the racehorse, total number of races of the race director, number of third-place finishes of the race director, first-place winning rate of the race director, winning rate of the race director over the most recent year, daily double rate of the race director over the most recent year, number of first-place finishes of the jockey, winning rate of the jockey, daily double rate of the jockey, first-place winning rate of the jockey over the most recent year, and winning rate of the jockey over the most recent year. The model that used multiple linear regression analysis with a normalized dependent variable was the best model, with a performance indicator of 0.7981 and the highest predictive ability.
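
The performance indicator can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the study's implementation: it assumes that the dependent variable is a race record for which a smaller value means a faster finish, that each race is given as a pair of dictionaries mapping a horse identifier to its predicted and actual record, and that the indicator is the per-race count of forecast top-two horses that actually finished in the top two, averaged over all races. The function names `top_two` and `performance_indicator` and the sample records are hypothetical.

```python
def top_two(records):
    """Return the ids of the two horses with the smallest (fastest) records."""
    return set(sorted(records, key=records.get)[:2])


def performance_indicator(races):
    """Average, over races, of how many forecast top-two horses truly finished top two.

    `races` is a list of (predicted_records, actual_records) pairs, each a dict
    mapping a horse id to a race record (assumed: smaller is faster).
    """
    hits = [len(top_two(pred) & top_two(actual)) for pred, actual in races]
    return sum(hits) / len(hits)


# Hypothetical records for two races with three horses each.
races = [
    # Forecast top two: A, B. Actual top two: A, C. One hit.
    ({"A": 75.1, "B": 75.4, "C": 76.0}, {"A": 75.0, "B": 76.2, "C": 75.5}),
    # Forecast top two: C, B. Actual top two: C, B. Two hits.
    ({"A": 80.0, "B": 79.0, "C": 78.5}, {"A": 80.3, "B": 78.9, "C": 78.7}),
]

indicator = performance_indicator(races)  # (1 + 2) / 2
```

Under this reading the indicator ranges from 0 (no forecast horse finished in the top two in any race) to 2 (both forecast horses finished in the top two in every race), so a value such as 0.7981 would correspond to a little under one correctly forecast horse per race on average.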