36 Matching questions

  1. K-fold cross-validation
  2. Ordinal data
  3. CRISP-DM
  4. Classification
  5. Confidence
  6. Numeric data
  7. RapidMiner
  8. Microsoft Enterprise Consortium
  9. Apriori algorithm
  10. SEMMA
  11. Distance measure
  12. KDD
  13. Sequence mining
  14. Prediction
  15. Knowledge discovery in databases
  16. Area under the ROC curve
  17. Categorical data
  18. Lift
  19. Simple split
  20. Information gain
  21. RHS
  22. Link analysis
  23. Decision tree
  24. Weka
  25. Associations
  26. Bootstrapping
  27. Support
  28. Nominal data
  29. Ratio data
  30. Entropy
  31. Regression
  32. Gini index
  33. Clustering
  34. Data mining
  35. LHS
  36. Interval data
  1. a The most commonly used algorithm to discover association rules by recursively identifying frequent item sets
  2. b Left-hand side, antecedent
  3. c A graphical presentation of a sequence of interrelated decisions to be made under assumed risk
  4. d A sampling technique in which a fixed number of instances from the original data is sampled for training and the rest of the data set is used for testing
  5. e The splitting mechanism used in ID3
  6. f Data is partitioned into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two thirds of the data as the training set and the remaining one third as the test set.
  7. g The act of telling about the future
  8. h Knowledge Discovery in Databases
  9. i A popular open source, free of charge data mining software suite that employs a graphically enhanced user interface, a rather large number of algorithms, and a variety of data visualization features
  10. j A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.
  11. k Data that represents the labels of multiple classes used to divide a variable into specific groups
  12. l Sample, Explore, Modify, Model and Assess
  13. m A type of data that contains measurements of simple codes assigned to objects as labels, which are not measurements. For example, marital status can generally be categorized as single, married, and divorced.
  14. n The linkage among many objects of interest is discovered automatically, such as the link between web pages
  15. o A cross-industry standardized process for conducting data mining projects: a sequence of six steps that starts with a good understanding of the business and the need for the data mining project and ends with the deployment of a solution that satisfies the business needs
  16. p A graphical assessment technique for binary classification models where true positive rate is plotted on the y-axis and the false positive rate is plotted on the x-axis
  17. q A machine learning process that performs rule induction or a related procedure to establish knowledge from large databases
  18. r Worldwide source for access to Microsoft's SQL Server 2012 software suite
  19. s A popular, free of charge, open source suite of machine learning software written in Java
  20. t Partitioning a database into segments in which the members of a segment share similar qualities
  21. u In association rules, the conditional probability of finding the RHS of the rule present in a list of transactions where the LHS of the rule already exists
  22. v A pattern discovery method in which relationships among things are examined in terms of their order of occurrence to identify associations over time
  23. w Data that contains codes assigned to objects or events as labels that also represent the rank order among them. For example, the variable credit score can generally be categorized as low, medium, and high.
  24. x A metric that measures the extent of uncertainty or randomness in a data set
  25. y A type of data that represents the numeric values of specific variables, for example age, number of children, etc.
  26. z A metric that is used in economics to measure the diversity of a population
  27. aa A data mining method for real-world prediction problems where the predicted values are numeric. For example, predicting that the temperature tomorrow will be 68 degrees.
  28. ab The measure of how often products or services appear together in the same transaction. The proportion of transactions in the dataset that contain all of the products and/or services mentioned in a specific rule.
  29. ac A popular accuracy assessment technique for prediction models in which the complete data set is randomly split into k mutually exclusive subsets of approximately equal size. The classification model is trained and tested k times.
  30. ad The ratio of the confidence of the rule to the expected confidence of the rule
  31. ae Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior
  32. af Right-hand side, consequent
  33. ag Commonly co-occurring groupings of things. AKA market-basket analysis.
  34. ah The method used to calculate the closeness between pairs of items in most cluster analysis methods
  35. ai Variables that can be measured on an interval scale
  36. aj Continuous data where both differences and ratios are interpretable. The distinguishing feature of a ratio scale is the possession of a non-arbitrary zero value.
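
Several of the definitions above (support, confidence, lift, entropy, Gini index) are easier to remember with a small worked calculation. The following is a minimal sketch, not taken from the source: the toy transactions and all function names are hypothetical, lift uses the support of the RHS as the "expected confidence", and the Gini function is the impurity form commonly used for decision-tree splits (one minus the sum of squared class proportions), which is an assumption about which variant is meant.

```python
from collections import Counter
from math import log2

# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Conditional probability of finding the RHS where the LHS is present."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Confidence of the rule divided by its expected confidence,
    taken here (assumption) to be the support of the RHS alone."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

def entropy(labels):
    """Uncertainty in a list of class labels, measured in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

if __name__ == "__main__":
    print(support({"bread", "milk"}, transactions))       # rule support
    print(confidence({"bread"}, {"milk"}, transactions))  # bread -> milk
    print(lift({"bread"}, {"milk"}, transactions))
    print(entropy(["yes", "yes", "no", "no"]))
    print(gini(["yes", "yes", "no", "no"]))
```

In this toy data the rule bread -> milk has a lift below 1, meaning milk appears slightly less often alongside bread than it does overall. An evenly split two-class label list gives the maximum entropy (1 bit) and a Gini impurity of 0.5.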