What is a text analysis: Data mining

Data mining

This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.

Dividing the customers of a company according to their profitability. Yes, this is a data mining task because it requires data analysis to determine which customers bring the most business to the company.

Computing the total sales of the company. No, this is not a data mining task because no analysis is involved; this information can be pulled out of any bookkeeping program.

Sorting a student database based on student ID numbers. No, this is not a data mining activity because sorting by ID number does not involve any data mining task. This is a simple database query.

Predicting the future stock price of a company using historical records. Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modeling. We could use regression for this modeling, although researchers in many fields have developed a wide variety of techniques for predicting time series.
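As a rough illustration of the regression idea in the stock price answer, the sketch below fits a straight line to a short price history by ordinary least squares and extrapolates one day ahead. The prices, the `fit_line` helper and the five-day window are all made up for the example; real time series forecasting would need far more care than a linear trend.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical closing prices for five consecutive trading days.
days = [1, 2, 3, 4, 5]
prices = [10.0, 10.5, 11.2, 11.4, 12.1]

slope, intercept = fit_line(days, prices)

# Predict the continuous target (the price) for day 6.
next_day_forecast = slope * 6 + intercept
print(f"trend per day: {slope:.2f}, forecast for day 6: {next_day_forecast:.2f}")
```

The target here is a continuous value, which is what separates this kind of predictive modeling from classification.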
Monitoring the heart rate of a patient for abnormalities. Yes. We would build a model of the normal behavior of the heart rate and raise an alarm when unusual heart behavior occurred. This would involve the area of data mining known as anomaly detection. This could also be considered a classification problem if we had examples of both normal and abnormal heart behavior.

For each of the following, identify the relevant data mining task(s).

The Boston Celtics would like to approximate how many points their next opponent will score against them.

A military intelligence officer is interested in learning about the respective proportions of Sunnis and Shias in a particular strategic region.

A NORAD defense computer must decide immediately whether a blip on the radar is a flock of geese or an incoming nuclear missile.

A political strategist is seeking the best groups to canvass for donations in a particular county.

A homeland security official would like to determine whether a certain sequence of financial and residence moves implies a tendency to terrorist acts.

A Wall Street analyst has been asked to find out the expected change in stock price for a set of companies with similar price/earnings ratios.

Question 3

For each of the following meetings, explain which phase in the CRISP-DM process is represented.

Managers want to know by next week whether deployment will take place. Therefore, analysts meet to discuss how useful and accurate their model is. This is the Evaluation phase in the CRISP-DM process. In the Evaluation phase the data mining analysts determine whether the model and technique used meet the business objectives established in the first phase.

The data mining project manager meets with the data warehousing manager to discuss how the data will be collected. This is the Data Understanding phase in the CRISP-DM process. The data warehouse is identified as a resource during the Business Understanding phase; however, the actual data collection takes place during the Data Understanding phase. In this phase data is collected and accessed from the resources listed and identified in the Business Understanding phase.

The data mining consultant meets with the vice president for marketing, who says that he would like to move forward with customer relationship management. Business objectives are reviewed during the Business Understanding phase. After the meeting, therefore, it seems the data mining consultant succeeded in convincing the VP of marketing to approve data mining on the customer relationship management system.

The data mining project manager meets with the production line supervisor to discuss implementation of changes and improvements. The discussion of whether specific improvements or process changes are required to ensure that all important aspects of the business are accounted for is performed under the Evaluation phase.

A meeting is held with the business objective of collecting and cleansing the data to ensure the quality of the data. The analysts meet to discuss whether the neural network or decision tree model should be applied.

Question 4 (10 points)

Describe the possible negative effects of proceeding directly to mine data that has not been preprocessed.

Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a data mart or data warehouse. Pre-processing is essential to analyze the multivariate data sets before data mining. The target set is then cleaned.
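One possible negative effect, offered here as an illustration rather than as part of the original answer, is that uncleaned values distort even simple summaries, so any pattern mined from them is suspect. The sketch below assumes a made-up income column in which -999 is a sentinel for "missing" and one entry is an obvious data-entry error; the plausible range used by `clean` is likewise an assumption.

```python
from statistics import mean

# Hypothetical customer incomes; -999 is a made-up sentinel for "missing"
# and 10_000_000 stands in for a data-entry error.
raw_incomes = [42_000, 51_000, -999, 47_000, 10_000_000, 39_000]


def clean(values, low=0, high=1_000_000):
    """Keep only values inside a plausible range (the bounds are assumptions)."""
    return [v for v in values if low <= v <= high]


naive_mean = mean(raw_incomes)
cleaned_mean = mean(clean(raw_incomes))

print(f"mean without preprocessing: {naive_mean:,.0f}")
print(f"mean after cleaning:        {cleaned_mean:,.0f}")
```

Two bad records out of six are enough to move the average by more than an order of magnitude, which is the kind of damage cleaning the target set is meant to prevent.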
Question 5 (15 points)

Which of the three methods for handling missing values do you prefer? Which method is the most conservative and probably the safest, meaning that it fabricates the least amount of data? What are some drawbacks to this method?

Methods for replacing missing field values with:
User-defined constants
Means or modes
Random draws from the distribution of the variable

Question 6

Describe the differences between the training set, test set, and validation set.

The training set is used to build the model. It contains a set of data that has preclassified target and predictor variables. Typically a hold-out dataset, or test set, is used to evaluate how well the model does with data outside the training set. The test set contains the preclassified results, but they are not used when the test set data is run through the model until the end, when the preclassified data are compared against the model results. The model is adjusted to minimize error on the test set. Another hold-out dataset, or validation set, is used to evaluate the adjusted model in step 2 where, again, the validation set data is run against the adjusted model and the results are compared to the unused preclassified data.

We use the training set (seen data) to build the model (determine its parameters) and the test set (unseen data) to measure its performance (holding the parameters constant). Sometimes we also need a validation set to tune the model (e.g., for pruning a decision tree). The validation set cannot be used for testing, as it is not unseen.

Data Mining

Determine the benefits of data mining to the businesses when employing

1. Predictive analytics to understand the behavior of customers

Predictive analytics is business intelligence technology that produces a predictive score for each customer or other organizational element. Assigning these predictive scores is the job of a predictive model, which has, in turn, been trained over your data, learning from the experience of your organization.
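To make the notion of a predictive score concrete, here is a minimal sketch of a model trained on past customers and then used to score customers it has not seen. It assumes a tiny logistic regression fitted by batch gradient descent; the two features, the labels, the learning rate and the helper names are all hypothetical, and the text does not say which kind of model a given vendor actually uses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def score(customer, weights, bias):
    """Predictive score: estimated probability that the customer responds."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, customer)) + bias)


# Hypothetical training set: (purchases last year, support complaints);
# label 1 = responded to a past campaign, 0 = did not.
train_rows = [(8, 0), (7, 1), (6, 0), (1, 3), (2, 4), (0, 2)]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_rows, train_labels)

# Score customers the model has not seen (a stand-in for a test set).
new_customers = {"A": (9, 0), "B": (1, 5)}
for name, features in new_customers.items():
    print(name, round(score(features, weights, bias), 3))
```

The parameters are fixed after training, so the scores for customers A and B play the role of unseen data in the training and test set distinction from Question 6.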
Predictive analytics optimizes marketing campaigns and website behavior to increase customer responses, conversions and clicks, and to reduce churn. Each customer's predictive score informs actions to be taken with that customer.

2. Associations discovery in products sold to customers

The way in which companies interact with their customers has changed dramatically over the past few years. A customer's continuing business is no longer guaranteed. As a result, companies have found that they need to understand their customers better, and to respond quickly to their wants and needs. In addition, the time frame in which these responses need to be made has been shrinking. It is no longer possible to wait until the signs of customer dissatisfaction are obvious before taking action. To succeed, companies must be proactive and anticipate what a customer desires.

For example, in the old days storekeepers would simply keep track of all of their customers in their heads, and would know what to do when a customer walked into the store. Today store associates face a much more complex situation: more customers, more products, more competitors, and less time to react mean that understanding your customers is now much harder to do. A number of forces are working together to increase the complexity of customer relationships, such as compressed marketing cycles, increased marketing costs, and a stream of new product offers.

There are many kinds of models, such as linear formulas and business rules. And, for each kind of model, there are all the weights or rules or other mechanics that determine precisely how the predictors are combined. In fact, there are so many choices that it is practically impossible for a person to try them all and find the best one. Predictive analytics is data mining technology that uses the company's customer data to automatically build a predictive model specialized for the business. This process learns from the organization's collective experience by leveraging the existing logs of customer purchases, behavior and demographics. The wisdom gained is encoded as the predictive model itself. Predictive modeling software has computer science at its core, undertaking a mixture of number crunching, trial, and error.

3. Web mining to discover business intelligence from Web customers

Fast business growth has made both the business community and customers face a new situation. Due to intense competition on the one hand and the customer's option to choose from a number of alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the Web access logs can help in understanding user behavior and the web structure.

From the business and applications point of view, knowledge obtained from web usage patterns could be directly applied to efficiently manage activities related to e-business, e-services and e-education. Accurate web usage information could help to attract new customers, retain current customers, improve cross marketing/sales, measure the effectiveness of promotional campaigns, track leaving customers, and so on. The usage information can be exploited to improve the performance of Web servers by developing proper prefetching and caching strategies so as to reduce the server response time. User profiles could be built by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content, according to Sonal Tiwari.

4. Clustering to find related customer information

Clustering is a typical unsupervised learning technique for grouping similar data points.
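To make the idea of grouping similar data points concrete, the sketch below runs a plain k-means loop, one example of a partitioning method, on a handful of customers. The customer figures, the choice of k = 2 and the seeding from the first k points are assumptions made for the example; a real analysis would normally scale the features first and try several seeds.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignment[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignment, centroids


# Hypothetical customers: (orders per year, average order value in dollars).
customers = [(2, 20), (25, 210), (3, 25), (24, 190), (1, 15), (27, 200)]

labels, centers = kmeans(customers, k=2)
print(labels)   # cluster index for each customer
print(centers)  # one centroid per cluster
```

No labels are supplied to the algorithm, which is what makes the technique unsupervised: the two groups emerge only from the distances between customers.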
A clustering algorithm assigns a large number of data points to a smaller number of groups such that data points in the same group share the same properties while data points in different groups are dissimilar. Clustering has many applications, including part family formation for group technology, image segmentation, information retrieval, web page grouping, market segmentation, and scientific and engineering analysis. Many clustering methods have been proposed, and they can be broadly classified into four categories: partitioning methods, hierarchical methods, density-based methods and grid-based methods.

Customer clustering is one of the most important data mining methodologies used in marketing and customer relationship management (CRM). Customer clustering would use customer-purchase transaction data to track buying behavior and create strategic business initiatives. Companies want to keep high-profit, high-value, and low-risk customers. This cluster typically represents the 10 to 20 percent of customers who create 50 to 80 percent of a company's profits. A company would not want to lose these customers, and the strategic initiative for the segment is obviously retention. A low-profit, high-value, and low-risk customer segment is also an attractive one, and the obvious goal here would be to increase profitability for this segment. Cross-selling (selling new products) and up-selling (selling more of what customers currently buy) to this segment are the marketing initiatives of choice.

Assess the reliability of the data mining algorithms. Decide if they can be trusted and predict the errors they are likely to produce.

Most methods for validating a data-mining model do not answer business questions directly, but provide the metrics that can be used to guide a business or development decision.
There is no comprehensive rule that can tell you when a model is good enough, or when you have enough data.

Accuracy is a measure of how well the model correlates an outcome with the attributes in the data that has been provided. There are various measures of accuracy, but all of them depend on the data that is used. In reality, values might be missing or approximate, or the data might have been changed by multiple processes. Particularly in the phase of exploration and development, you might decide to accept a certain amount of error in the data, especially if the data is fairly uniform in its characteristics. For example, a model that predicts sales for a particular store based on past sales can be strongly correlated and very accurate, even if that store consistently used the wrong accounting method. Therefore, measurements of accuracy must be balanced by assessments of reliability.

Reliability assesses the way that a data-mining model performs on different data sets. A data-mining model is reliable if it generates the same type of predictions or finds the same general kinds of patterns regardless of the test data that is supplied. For example, the model that you generate for the store that used the wrong accounting method would not generalize well to other stores, and therefore would not be reliable.

Analyze privacy concerns raised by the collection of personal data for mining purposes.

1. Choose and describe three (3) concerns raised by consumers.

Recent surveys on privacy show a great concern about the use of personal data for purposes other than the one for which the data was collected. The handling of misinformation can cause serious and long-term damage, so individuals should be able to challenge the correctness of data about themselves, such as personal records. The last concern is granulated access to personal information, such as access to personal information about someone's health when that person applies for a job.

2. Decide if each of these concerns is valid and explain your decision for each.

These concerns are valid. An extreme case of the first concern occurred in 1989. Having collected over $16 million USD by selling the driver-license data of 19. million Californian residents, the Department of Motor Vehicles in California revised its data selling policy after Robert Bardo used its services to obtain the address of actress Rebecca Schaeffer and later killed her in her apartment. While it is very unlikely that Knowledge Discovery and Data Mining (KDDM) tools will directly reveal precise confidential data, exploratory KDDM tools may correlate or disclose confidential, sensitive facts about individuals, resulting in a significant reduction of possibilities.

The second concern is valid because of an incident in Washington, where Cablevision fired an employee, James Russell Wiggings, on the basis of information obtained from Equifax, Atlanta, about Wiggings' conviction for cocaine possession. The information was actually about James Ray Wiggings, and the case ended up in court. This illustrates a serious issue in defining the ownership of data containing personal records.

For the third issue, for example, employers are obliged to perform a background check when hiring a worker, but it is widely accepted that information about diet and exercise habits should not affect hiring decisions.

3. Describe how each concern is being allayed.

KDDM revitalizes some issues and poses new threats to privacy. Some of these can be directly attributed to the fact that this powerful technique may enable the correlation of separate data sets in order to significantly reduce the possible values of private information.
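The threat from correlating separate data sets can be sketched as follows. The example assumes two entirely made-up tables: a medical table with names removed and a public register that still carries names. Joining them on shared quasi-identifiers (zip code, birth year and sex, all chosen for illustration) narrows each medical record down to a short list of candidate people.

```python
# Hypothetical "anonymized" medical records: names removed, but
# quasi-identifiers (zip code, birth year, sex) are kept.
medical = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1972, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public register that does contain names.
register = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1972, "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_year": 1972, "sex": "F"},
]

KEYS = ("zip", "birth_year", "sex")


def candidates(record, people):
    """Names in `people` whose quasi-identifiers match `record`."""
    return [p["name"] for p in people if all(p[k] == record[k] for k in KEYS)]


for record in medical:
    print(record["diagnosis"], "->", candidates(record, register))
```

Neither table reveals a diagnosis for a named person on its own; the reduction in possible values only appears once the two are combined.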
Other threats can be attributed more to the interpretation, application and actions taken from the inferences obtained with the tools. While this raises concerns, there is a body of knowledge in the field of statistical databases that could potentially be extended and adapted to develop new techniques to balance the right to privacy and the need for knowledge and analysis of large volumes of information. Some of these new privacy protection methods are emerging as the application of KDD tools moves to more controversial datasets.

Provide at least three (3) examples where businesses have used predictive analysis to gain a competitive advantage and evaluate the effectiveness of each business's strategy.

The first advantage is that the analysis helps when it comes to the validity of a product, by making a distinction between the positioning of a product and its ability to satisfy customer requirements. Other important attributes include ease of use, innovation, and how well the product integrates with other technologies that customers need.

The second advantage is what the technology provides to customers. Even if a product is well designed, it must be able to help businesses achieve their business goals. Goals range from gaining insight about customers in order to be more competitive, to using the technology to increase revenue. A key attribute that is measured in this property is how well the product supports companies in meeting their objectives.

The third advantage is the strength of the company's strategy. It is not enough to simply have a good vision; a company must also have a well-designed road map that can support this vision. Vision attributes also include more tactical aspects of the company's strategy, such as a technology platform that can scale, well-articulated messaging, and positioning. A key component of this dimension is clarity: it must be clear what business problem the company is solving for which customer.

Saturday, May 25, 2019
