Importing data from disparate sources, such as web archives, databases, and spreadsheets
Cleaning the data by removing outliers and noise, and combining data sets
Developing an accurate predictive ...
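
As a rough illustration of the importing and cleaning steps, here is a minimal Python sketch using only the standard library. The sample data, the column names, and the helper functions `read_rows`, `remove_outliers`, and `combine` are all hypothetical and not taken from the source. Filtering on a modified z-score (median-based) is just one of several possible outlier rules, chosen here as an assumption.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a spreadsheet export and a database table.
SENSOR_CSV = """id,temperature
1,20.1
2,19.8
3,20.4
4,95.0
5,20.0
"""
SITE_CSV = """id,site
1,north
2,north
3,south
4,south
5,east
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def remove_outliers(rows, field, threshold=3.5):
    """Drop rows whose value lies far from the median (modified z-score rule)."""
    values = [float(row[field]) for row in rows]
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return list(rows)  # no spread to measure against, so keep everything
    return [
        row
        for row, v in zip(rows, values)
        if 0.6745 * abs(v - median) / mad <= threshold
    ]


def combine(left, right, key):
    """Inner-join two lists of rows on a shared key column."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


cleaned = remove_outliers(read_rows(SENSOR_CSV), "temperature")
merged = combine(cleaned, read_rows(SITE_CSV), "id")
```

With this sample data, the 95.0 reading is dropped as an outlier and the remaining rows each gain a `site` column. In practice the inputs would come from real files or database queries rather than inline strings, and the threshold would need tuning for the data at hand.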