THEORETICAL BACKGROUND

Knowledge discovery in databases (KDD) is the interactive and iterative process of discovering useful knowledge from a collection of data. KDD is a multidisciplinary exercise. The steps involved in the KDD process are listed below:
• Selection – Data relevant to the analysis task are retrieved from the database.
• Pre-processing – Noise and inconsistent data are removed from the large data set. Data cleaning is a fundamental step for resolving inconsistencies and correcting errors in the raw data.
• Transformation – Strategies such as smoothing, aggregation, and normalization transform the data into forms appropriate for mining.
• Data mining – Intelligent methods are applied in order to extract data patterns.
• Interpretation/Evaluation – Data patterns are evaluated and visualized, and redundant patterns are removed from the generated patterns.

Data mining is the core part of the knowledge discovery process: sorting through large data sets to discover correlations among attributes. Several noteworthy data mining techniques have been developed and used in data mining projects, as outlined in the figure below.

Fig. Taxonomy of Data Mining Methods

Description methods concentrate on understanding how the underlying data operate, while prediction-oriented methods aim to build a behavioral model that can take in new, unseen samples and predict the values of one or more variables related to each sample. These strategies fall into two categories, namely supervised and unsupervised learning. In supervised learning a function is inferred from labeled training data, while unsupervised learning finds hidden structure in unlabeled data.

Data Mining Techniques Used for Heart Disease

Two forms of data analysis algorithms are used in data mining: classification and prediction.

1. Classification
Classification is a supervised technique which assigns items in a collection to target categories or classes.
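As a concrete illustration of the classification task, the following is a minimal sketch of a supervised classifier, a one-nearest-neighbour rule over hand-made records; the feature values and labels are hypothetical and are not the data or the algorithm used in this work:

```python
import math

# Hypothetical training records: (age, resting blood pressure) -> class label.
# These values are made up purely for illustration.
training_data = [
    ((29, 120), "absence"),
    ((45, 130), "absence"),
    ((61, 150), "presence"),
    ((58, 165), "presence"),
]

def classify(x):
    """Assign an unseen record x the class of its nearest training record."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]

print(classify((60, 155)))  # nearest training record is (61, 150) -> "presence"
```

The training phase here is trivial (the model simply memorises the labelled records), but it shows the common shape of a classifier: known class assignments in, a predicted class for a new record out.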
Mainly two kinds of classification are present: binary and multi-class. The classification task takes as input the feature vector X and predicts its value for the outcome Y, i.e.

C(X) → Y

where:
X is a feature vector,
Y is a response taking values in the set C,
C(X) is the predicted value in the set C.

Classification is one of several strategies used to analyze substantial datasets effectively. A classification task begins with a data set in which the class assignments are known. In the training phase, a classification algorithm discovers relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for discovering these relationships. The relationships are summarized in a model, which can then be applied, for testing purposes, to a different data set in which the class assignments are unknown.

2. Prediction
Regression is used to predict numeric or continuous values for a given dataset. The following equation shows that regression is the process of estimating the value of a continuous target (p) as a function (F) of one or more predictors (x1, x2, …, xn), a set of parameters (R1, R2, …, Rn), and a measure of error (e):

p = F(x1, x2, …, xn; R1, R2, …, Rn) + e

Regression helps in identifying the behavior of a variable when other variables are changed in the process.

3. Clustering
Clustering is an unsupervised learning technique in which a given set of unlabelled instances is grouped according to the instances' characteristics. Representing the data with fewer clusters loses certain fine details but achieves simplification. Cluster analysis aims to find groups such that the inter-cluster similarity is low and the intra-cluster similarity is high. There are several distinct approaches to clustering: partitioning, hierarchical, density-based, grid-based, and constraint-based methods.

Algorithms Used for Disease Prediction

1.
Decision Tree Algorithm (J48)
A decision tree is a supervised learning algorithm that predicts the class/value of a target variable using decision rules. Each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label. A record's attribute values are repeatedly compared at the internal nodes of the tree until a leaf node is reached, which gives the predicted class value. Decision trees follow a sum-of-products (SOP) representation of the classes. Decision trees can handle both categorical and numerical data. Attribute selection is based on information gain and the Gini index; entropy is used to measure the randomness/uncertainty of a random variable. Decision trees suffer from overfitting: it happens when the algorithm keeps growing deeper and deeper to reduce the training-set error but ends up increasing the test-set error.

J48 is an extension of ID3. Additional features of the J48 algorithm are that it supports tree pruning, can handle missing values, and gives efficient output for prediction analysis in Weka [11].

2. Naive Bayes
The Naive Bayes classifier is based on Bayes' theorem and is particularly suited to cases where the dimensionality of the inputs is high.

Bayes' Theorem – It works on conditional probability: the probability that an event will occur, given that another event has already occurred.

P(H|E) = P(E|H) · P(H) / P(E)

where:
P(H) is the probability of the hypothesis H being true; this is known as the prior probability,
P(E) is the probability of the evidence (regardless of the hypothesis),
P(E|H) is the probability of the evidence given that the hypothesis is true,
P(H|E) is the probability of the hypothesis given the evidence.

Naive Bayes Classifier – It predicts membership probabilities for each class, i.e. the probability that a given record or data point belongs to a particular class. The class with the highest probability is taken as the most likely class.
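The class-probability computation just described can be sketched in a few lines of Python. The labelled records below (chest-pain type and high blood pressure as evidence) are made-up illustrative counts, not results from this work:

```python
from collections import Counter

# Made-up labelled records: (chest_pain, high_bp) -> diagnosis.
records = [
    (("typical", "yes"), "presence"),
    (("typical", "yes"), "presence"),
    (("typical", "no"), "absence"),
    (("atypical", "no"), "absence"),
    (("atypical", "no"), "absence"),
    (("atypical", "yes"), "presence"),
]

def naive_bayes_posteriors(evidence):
    """Return unnormalised P(H|E) ∝ P(H) · Π P(Ei|H) for each class H."""
    class_counts = Counter(label for _, label in records)
    scores = {}
    for label, count in class_counts.items():
        prior = count / len(records)              # P(H)
        likelihood = 1.0
        for i, value in enumerate(evidence):      # naive independence assumption
            matches = sum(1 for feats, lab in records
                          if lab == label and feats[i] == value)
            likelihood *= matches / count         # P(Ei|H)
        scores[label] = prior * likelihood
    return scores

scores = naive_bayes_posteriors(("typical", "yes"))
print(max(scores, key=scores.get))  # class with the highest probability
```

P(E) is omitted because it is the same for every class, so it does not change which class scores highest.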
Naive Bayes is a highly fast and scalable algorithm.

3. Support Vector Machine (SVM)
A support vector machine (SVM) is a supervised learning classifier characterized by a separating hyperplane. In two dimensions the hyperplane is a line that partitions the plane into two parts, with each class lying on one side. There are two types of SVM classifiers:
1. Linear SVM classifier
2. Non-linear SVM classifier
SVMs are powerful when the number of features is very large. Since the SVM algorithm works on numeric attributes, it applies z-score standardization to the numeric attributes.
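The two ideas above, a separating hyperplane and z-score standardization, can be sketched as follows. The weights, bias, and points are arbitrary hand-picked values for illustration, not a trained SVM model:

```python
# Hypothetical separating hyperplane w·x + b = 0 in 2-D (weights chosen by hand).
w = (1.0, -1.0)
b = 0.5

def zscore(values):
    """z-score standardization: (x - mean) / standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def classify(x):
    """Each side of the hyperplane w·x + b = 0 corresponds to one class."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return "positive" if score >= 0 else "negative"

print(classify((3.0, 1.0)))   # 3 - 1 + 0.5 = 2.5  -> "positive"
print(classify((0.0, 2.0)))   # 0 - 2 + 0.5 = -1.5 -> "negative"
```

A real SVM additionally chooses w and b so that the margin between the two classes is maximised; that optimisation step is omitted here.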