ECON7930 Analyzing Spatial, Textual, and Network Data
-
Data visualization
-
R basics
-
How to lie with data visualization
-
The aesthetic principle of visualization
-
Six types of data visualization
-
Relationship
-
Comparison
-
Distribution
-
Proportion
-
Time series
-
Map
-
-
Making interactive graphs using highcharter and shiny
-
-
Spatial data
-
Geographic coordinates
-
Working with “geometry”: Shape data
-
Spatial join
-
Buffer
-
Euclidean, network and cost distances
-
Spatial interpolation
-
-
Spatial statistics and site selection
-
Working with “pixels”: Raster data
-
Making beautiful maps
-
Spatial regression discontinuity design
-
-
Network data
-
Summarization: Nodes, edges, and direction
-
Types of networks (directed, undirected, binary, weighted, bipartite)
-
Nodes, ties, dyads
-
Load the network data
-
Cliques and Communities
-
-
Measuring network centrality
-
Network visualization: Fun with links, nodes, and edges
-
-
Text as Data (I)
-
Representing text
-
Vector space model of a document
-
Feature choice/representation
-
Pre-processing text: stemming and stopping
-
Pre-processing Chinese text: tokenization
-
Bag of words (and alternatives)
-
Sparseness
-
-
Descriptive inference (I)
-
Word distribution/Zipf’s Law/Heaps’ Law
-
Co-occurrence, collocations, and phrasemes
-
Keywords in context
-
(Dis)similarity measures and testing for differences
-
-
Descriptive inference (II)
-
Lexical diversity
-
Sophistication/readability/complexity
-
Linguistic style and author attribution
-
Sampling distributions for estimates
-
-
Supervised techniques (I): Dictionary-based approaches
-
Sentiment Analysis
-
Event extraction
-
Lie detection
-
Crowdsourcing
-
-
Supervised techniques (II): Classification
-
Evaluation and cross-validation: Precision and recall
-
Naive Bayes classification
-
Regularized regression
-
Support vector machines
-
Ensemble Classifiers: Boosting, bagging, and ensembles via the random forest/tree model
-
-
Supervised techniques (III): From classification to scaling
-
Ideological scales with “wordscores”
-
-
From supervised to unsupervised (I): Dimension reduction
-
PCA
-
K-NN models
-
Clustering (documents)
-
Latent semantic analysis/indexing
-
Parametric scaling
-
Count models: “wordfish”
-
-
Unsupervised techniques (II): Topic Models
-
LDA
-
Evaluating and selecting models/choosing k
-
Dynamic topic model
-
Structural topic model = LDA + Metadata
-
-
Word embedding: Representing text meaning
-
An Introduction to Deep Learning in Text Analysis
-