37 Multiple choice questions

  1. In association rules, the conditional probability of finding the RHS of the rule in a list of transactions where the LHS of the rule already exists
  2. Variables that can be measured on an interval scale
  3. An alternative process for data mining projects proposed by the SAS Institute.
  4. Knowledge Discovery in Databases
  5. The most commonly used algorithm to discover association rules by recursively identifying frequent item sets
  6. A graphical presentation of a sequence of interrelated decisions to be made under assumed risk
  7. Data that represents the labels of multiple classes used to divide a variable into specific groups
  8. A popular, free of charge, open source suite of machine learning software written in Java
  9. A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.
  10. A type of data that contains simple codes assigned to objects as labels, which are not measurements. For example, marital status can be generally categorized as single, married, and divorced.
  11. Data that contains codes assigned to objects or events as labels that also represent the rank order among them. For example, the variable credit score can be generally categorized as low, medium, and high.
  12. A graphical assessment technique for binary classification models where the true positive rate is plotted on the y-axis and the false positive rate is plotted on the x-axis
  13. Data is partitioned into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set.
  14. Sample, Explore, Modify, Model and Assess
  15. Left-hand side, antecedent
  16. A metric that measures the extent of uncertainty or randomness in a data set
  17. A pattern discovery method where relationships among things are examined in terms of their order of occurrence to identify associations over time
  18. A machine learning process that performs rule induction or a related procedure to establish knowledge from large databases
  19. The ratio of the confidence of the rule to the expected confidence of the rule
  20. Right-hand side, consequent
  21. A popular accuracy assessment technique for prediction models where the complete data set is randomly split into k mutually exclusive subsets of approximately equal size. The classification model is trained and tested k times.
  22. The linkage among many objects of interest is discovered automatically such as the link between web pages
  23. A cross industry standardized process of conducting data mining projects which is a sequence of six steps that starts with a good understanding of the business and the need for the data mining project and ends with the deployment of the solution that satisfies the business needs
  24. Commonly co-occurring groupings of things. AKA market-basket analysis.
  25. A popular open source, free of charge data mining software suite that employs a graphically enhanced user interface, a rather large number of algorithms, and a variety of data visualization features
  26. Worldwide source for access to Microsoft's SQL Server 2012 software suite
  27. Continuous data where both differences and ratios are interpretable. The distinguishing feature of a ratio scale is the possession of a nonarbitrary zero value.
  28. Partitioning a database into segments in which the members of a segment share similar qualities
  29. A type of data that represents the numeric values of specific variables. For example, age, number of children, etc.
  30. The method used to calculate the closeness between pairs of items in most cluster analysis methods
  31. A data mining method for real-world prediction problems where the predicted values are numeric. For example, predicting that the temperature tomorrow will be 68 degrees.
  32. A sampling technique where a fixed number of instances from the original data is sampled for training and the rest of the data set is used for testing
  33. The measure of how often products or services appear together in the same transaction. The proportion of transactions in the data set that contain all of the products and/or services mentioned in a specific rule.
  34. The act of telling about the future
  35. A metric that is used in economics to measure the diversity of a population
  36. The splitting mechanism used in ID3
  37. Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior
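
The three association-rule measures defined in items 1, 19, and 33 (confidence, lift, and support) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the transactions are invented toy data, the function names are hypothetical, and the "expected confidence" in the lift definition is assumed to be the support of the RHS, which is the usual reading.

```python
# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Conditional probability of finding the RHS where the LHS already exists."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence divided by expected confidence.

    Assumption: expected confidence is taken to be the support of the RHS.
    """
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


if __name__ == "__main__":
    print(support({"bread", "milk"}, transactions))
    print(confidence({"bread"}, {"milk"}, transactions))
    print(lift({"bread"}, {"milk"}, transactions))
```

Under that assumption, a lift below 1 suggests the LHS and RHS co-occur less often than independence would predict, and a lift above 1 suggests they co-occur more often.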
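
Items 16, 35, and 36 describe measures used to choose decision-tree splits. The sketch below shows one common way to compute them over a list of class labels. It assumes that item 16 refers to Shannon entropy, that item 35 refers to the decision-tree form of the Gini index (often called Gini impurity), and that the ID3 splitting mechanism in item 36 is information gain. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


if __name__ == "__main__":
    parent = ["yes", "yes", "no", "no"]
    split = [["yes", "yes"], ["no", "no"]]  # a split that separates the classes perfectly
    print(entropy(parent), gini(parent), information_gain(parent, split))
```

A split that leaves each child subset with a single class removes all uncertainty, so its information gain equals the entropy of the parent.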
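
The two evaluation schemes in items 13 and 21 (a simple two-thirds/one-third split and k-fold cross-validation) differ only in how the data is partitioned. The following sketch partitions a list of rows both ways. The function names, the fixed seed, and the shuffling strategy are illustrative assumptions, and the model training and scoring steps are left out.

```python
import random


def simple_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle the rows and cut them into a training set and a holdout test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def k_fold(rows, k, seed=0):
    """Yield k (train, test) pairs; each row lands in exactly one test fold."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Striding by k gives k mutually exclusive folds of approximately equal size.
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


if __name__ == "__main__":
    data = list(range(9))
    train, test = simple_split(data)
    print(len(train), len(test))
    for train, test in k_fold(data, k=3):
        print(sorted(test))
```

In k-fold cross-validation the model would be trained and tested once per yielded pair, and the k accuracy figures are typically averaged.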