Hi friends, let’s discuss the important concept of data mining and its four common tasks: data clustering, data classification, regression and association rule learning.
This is an important topic to learn, and data analysis is a popular career option these days. Lots of people are trying their luck in this field by mastering data analysis skills. It is a growing field, and by 2020 the vacancy graphs for professional data analysts, business analysts and data scientists are expected to be on par with one another.
Hope you guys have checked my previous post on Malicious Programs/Malwares for answering the questions related to this section.
Future Scope of Data Analyst:
Apart from the career perspective, data mining is an important topic for the various government exam vacancies for computer science professionals. So, I have tried to collect every important part of the data mining topic in this blog.
If you guys want any other topic to be covered, please let me know by adding a comment. Now let’s first understand what data mining and data analysis are.
What is Data Mining and the use of Data Mining?
Data mining is the process of extracting patterns from data. It is an important tool used by modern businesses to derive information from data. Data mining is currently used in marketing, profiling, fraud detection, scientific discovery, etc.
Tasks of Data Mining:
1. Data Clustering: It is the task of discovering groups and structures of similar items in the data. Clustering is performed without using known structures (such as predefined labels) in the data.
2. Data Classification: It is the task of generalizing known structures so that they can be applied to new data. Common classification algorithms are:
2.1. Decision tree learning
2.2. Nearest neighbor
2.3. Naïve Bayes classification
2.4. Neural networks
2.5. Support Vector Machines
3. Regression: With regression we attempt to find a function which models the data with the least error. There are different strategies related to regression models.
4. Association Rule Learning: This is used to search for relationships between variables. Here is a well-known example of association rule learning:
With the help of association rule learning, Amazon displays items that are frequently bought together as recommendations. This helps customers and increases its sales.
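To make the "frequently bought together" idea concrete, here is a minimal sketch of the two basic measures used in association rule learning, support and confidence. The baskets and item names are made up for illustration, and the helper names (`count_containing`, `support`, `confidence`) are my own, not from any library.

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"case", "screen guard"},
    {"phone", "case", "screen guard"},
]

def count_containing(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of all baskets that contain the whole itemset."""
    return count_containing(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count_containing(both, baskets) / count_containing(antecedent, baskets)

print(support({"phone", "case"}, baskets))       # 3 of the 5 baskets
print(confidence({"phone"}, {"case"}, baskets))  # 3 of the 4 phone baskets
```

A rule such as "phone => case" is usually kept only when both numbers pass some chosen thresholds. The thresholds are a design decision and are not fixed by the technique.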
Approaches to Data Mining Problems:
- Discovery of sequential patterns
- Analysis of patterns in time series
- Discovery of classification rules
- Neural Networks
- Genetic Algorithms
- Clustering and Segmentation
Goals of Data Mining and Knowledge Discovery:
- Prediction: Data mining can show how certain attributes within the data will behave in the future.
- Identification: Data mining can be used to identify the existence of an item.
- Classification: Data mining can partition the data so that different classes or categories can be identified.
- Optimization: Data mining can be used to optimize the use of limited resources such as time, space, money or materials and to maximize the output.
What is OLTP (Online Transaction Processing)?
In order to understand OLTP, it is very important to be aware of transactions and transaction systems. So, what is a transaction? What are the properties of a transaction system? Let’s go through the theory of transactions and then we will cover OLTP.
Transaction and Transaction System:
A transaction is an interaction between different users, between different systems, or between a user and a system.
Transaction systems: Every organization needs some online application system to handle its day-to-day activities. Some examples of transaction systems are salary processing, library, banking and airline systems.
Every transaction follows the ACID properties. Learn them like this. This is an important section, and government exams often pick multiple questions from it.
Atomicity: This means a transaction should either completely succeed or completely fail.
Consistency: A transaction must preserve the consistency of the database. It must transform the database from one consistent state to another.
Isolation: This simply means the transaction of one user should not interfere with the transactions of any other user in the database.
Durability: Once a transaction is committed, its changes should be permanently written to the database. These changes should be available to all the transactions that follow it.
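Here is a small sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the account names and the `transfer` helper are assumptions made up for this example. The idea is that a money transfer has two updates, and if the second one fails, the first one must be undone as well.

```python
import sqlite3

# In-memory database with a hypothetical `accounts` table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move `amount` from sender to receiver as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was kept.
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "alice", "bob", 30))   # succeeds
print(transfer(conn, "alice", "bob", 500))  # fails, so the credit to bob is rolled back too
print(balances(conn))
```

In the failed transfer the credit to bob runs first and would have succeeded on its own. Because both updates sit inside one transaction, the rollback removes it along with the failed debit.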
I hope the ACID properties are clear to you guys. Please let me know if you need more information on this with examples.
Ever wondered how multiple transactions of different users can be processed simultaneously? If yes, check out the magic below:
Concurrency: Concurrency allows two different independent processes to run simultaneously and thus creates parallelism. This makes good use of the fast processing speed of computers.
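As a rough illustration, the sketch below lets several "users" update one shared balance at the same time using Python threads. The `Account` class and `run_users` function are hypothetical names for this example, and a lock stands in for the much more elaborate concurrency control a real database uses.

```python
import threading

class Account:
    """A shared balance protected by a lock (illustrative, not a real database)."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Only one thread at a time may update the balance.
        with self._lock:
            self.balance += amount

def run_users(account, users=4, deposits_each=1000):
    """Start one thread per user, wait for all of them, and return the final balance."""
    def user():
        for _ in range(deposits_each):
            account.deposit(1)

    threads = [threading.Thread(target=user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

print(run_users(Account()))  # 4 users x 1000 deposits each
```

The lock is what keeps the users from interfering with each other, which is the same concern the isolation property addresses for transactions.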
What is Deadlock?
Deadlock is a situation where one transaction is waiting for another transaction to release the resource it needs, and vice versa. This becomes a hold-and-wait situation in which neither transaction can proceed and processing halts. Each transaction will wait forever for the other to release the resource.
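One common way to picture this is a "wait-for" graph, where each transaction points to the transactions it is waiting on. A deadlock shows up as a cycle in that graph. The sketch below is a minimal cycle check under that assumption. The function name `find_deadlock` and the transaction names are made up for this example.

```python
def find_deadlock(waits_for):
    """Return a list of transactions that form a waiting cycle, or None.

    `waits_for` maps each transaction to the transactions it is waiting on.
    """
    visiting, done = set(), set()

    def visit(txn, path):
        if txn in done:
            return None
        if txn in visiting:
            # We came back to a transaction already on the current path: a cycle.
            return path[path.index(txn):]
        visiting.add(txn)
        path.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other, path)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(txn)
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2 and T2 waits for T1: a deadlock.
print(find_deadlock({"T1": ["T2"], "T2": ["T1"]}))
# T1 waits for T2, but T2 waits for nobody: no deadlock.
print(find_deadlock({"T1": ["T2"], "T2": []}))
```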
How to prevent Deadlock?
The simple rule for dealing with deadlock is that if a deadlock occurs, one of the participating transactions must be rolled back to allow the other to proceed. This way the remaining transactions can complete. There are different kinds of schemes available to decide which transaction should be rolled back.
This decision depends on multiple factors, listed below:
- The run time of the transaction
- Data already updated by the transaction
- Data remaining to be updated by the transaction
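The factors above can be turned into a simple victim-selection rule. The sketch below shows one hypothetical policy only, not a standard one: roll back the transaction that has updated the least data, then break ties by shortest run time, then by most work still remaining. The `Txn` fields and the `choose_victim` function are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    run_time_s: float    # how long the transaction has been running
    rows_updated: int    # data already updated by the transaction
    rows_remaining: int  # data still to be updated by the transaction

def choose_victim(deadlocked):
    """Pick the transaction that loses the least work if rolled back.

    Hypothetical policy: fewest rows updated first, then shortest run time,
    then the most rows still remaining.
    """
    return min(
        deadlocked,
        key=lambda t: (t.rows_updated, t.run_time_s, -t.rows_remaining),
    )

deadlocked = [
    Txn("T1", run_time_s=12.0, rows_updated=500, rows_remaining=20),
    Txn("T2", run_time_s=1.5, rows_updated=10, rows_remaining=400),
]
print(choose_victim(deadlocked).name)  # T2 has done far less work so far
```

Real database systems weigh these factors in their own ways, so treat the ordering here as just one possible choice.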
I have tried to cover this section completely, friends. Learn these concepts about data science and you will be able to solve the questions related to the data mining section.
In order to master this section, please check my next post on the previous year questions of the Data Mining section.
Not sure about Computer Networking concepts? Need to score good marks in Computer network section? If yes, do read my next post on Why Python is the best language for Data Science and Machine Learning. Till then, C yaa friends 🙂