Introduction
Data mining can be referred as a process of analyzing a huge dataset to find the relation, pattern of data such that, the result of data analysis can be used to solve problems appearing in business or to make crucial data-driven decisions related to business (Han and et.al., 2022). A variety of techniques like, machine learning, statistics and database management systems are used in the process of data mining to analyse the data and extract the information from it. As the world is getting digitalized so, the size of data produced each day is also increasing at a high rate. It is manually impossible to manage and extract information from such a huge data and thus, in this case data mining plays a crucial role and helps in managing the data and extracting information from it. Due to this, data mining is used to analyze, manage, extract information from data and also, to make important decisions in multiple sectors like, healthcare, finance, retail, telecommunication, education, etc. You can take Psychology Assignment Help.
This report mainly deals with in-depth details about the data mining investigation to make an appropriate model for the company Confectionaries. The model focuses on increasing profitability over time, for individual countries and collectively both, based upon different variables.
Main Body
Data mining refers to the analysis of large dataset, generally in terabytes or petabytes, to find out a pattern within the data and to extract useful information from the dataset. Generally, data mining and data warehousing are considered similar but, data warehousing refers to management and compilation of data in a database. While, data mining refers to the analyzing the data from the database to finds trends and patterns to convey a useful information. Data mining is done with the help of programming languages like, Python, R, and SQL. In the project, python is used as programming language for designing data mining model.
Python Introduction:
Python is one of the most popular programming language, developed by Guido van Rossum, and released in the year 1991. It is easy to use and understand as, it has an easy and simple syntax. And thus has a variety of implementation. Python is used for creating web applications, software development, data analysis, system scripting, etc. It is a high level, object oriented and interpreted type of programming language. One of the biggest advantage of using python is its strong community support and huge set of libraries that makes coding simple and fast.
Investigation steps:
The complete process of data mining is divided into six stages, Business Understanding, Understanding Data, Preparation of Data, Modeling, Evaluation of Model and, Deployment.
1) Understanding Business:
This is where the data mining process starts. It includes locating and comprehending the problem statement, which must be fixed with the aid of the data mining procedure (Khder and et. al., 2021). Creating a model to forecast and gradually raise the company's profit is the problem statement for Confectionaries. you can also check free examples of assignment.
-
Data Understanding:
In this phase of data mining, the dataset is explored to learn more about its quality, content, and structure. This step aids in determining whether or not the dataset utilised for the mining is appropriate for the given challenge. At this point, patterns within the given dataset can be discovered using association rule mining. For the analysis of market baskets, this algorithm is helpful. Clustering can also be used to identify patterns in the dataset. This algorithm learns on the datasets that are not labelled and makes the predictions about the categorical variables based on the traits that are given.
This provided dataset is as follows which are related to confectionaries such as Date, Country, Confectionary, Units Sold, Revenue (£), Cost (£), and Profit (£). In the case of Date, just by seeing the date column we can say span of given the data is from 2000 to 2005; Similarly Confectionary contains eight items such as Biscuit, Biscuit Nut, Choclate Chunk, Caramel Nut, Caramel, Plain, Chocolate Chunk, and Caramel Nut. Chcoolate Chunk, Chocolate Chunk are the same but having spelling mistakes so should treat them as the same observation. In addition to that units sold, Revenue (£) Cost (£), and Profit (£) which contains blanks. In order to develop association rule to get proper output need to process the data which is called data preprocessing.
-
Data Preparation:
This data mining phase focuses on the appropriate preparation of the dataset for analysis. Cleaning, organize, Transform and enhance will all be performed. During the data preparation stage, the process assures that the data are clean and validated so that the model can be trained and tested with it (Ilyas and, et. al., 2019). The method used during the data preparation stage to detect these outliers will aid in the identification and elimination of abnormalities in the provided dataset. This algorithm may be used to correct or remove the outliers in order to enhance the model's accuracy. The Confectionaries dataset had missing values in the column Units Sold, Revenue (£), Cost (£), Profit (£). To resolve this issue, the column mean value was attached to the blank location. The units sold column name was also changed based on the convenience:
- Country(UK) -> Country
- Revenue(£) -> Revenue
- Cost(£) -> Cost
- Profit(£) -> Profit
The data in the Date column is changed to the date format because both the records are the same and have the spelling mistakes. The words "Choclate Chunks" and the "Caramel nut" are changed to the "Chocolate Chunk" and the "Caramel Nut" respectively.
- Modeling:
Using machine learning models to create a prediction model is the focus of this data mining process. This is a crucial step in the data mining process because it requires choosing the right model at this point in order to give the user the desired result. The "heart" of the data mining process is another name for the modelling step. Regression or classification models might be used during the modelling phase. The classification model is trained using the labelled dataset approach, which is supervised learning. When dealing with new datasets, this model aids in the prediction of categorical variables. Numerous categorization models exist, such as Random Forest, Naïve Bayes, and Decision Tree. you can take Economics Assignment