ECON7930 Analyzing Spatial, Textual, and Network Data
Data visualization
R basics
How to lie with data visualization
The aesthetic principle of visualization
Six types of data visualization
Time series
Making Interactive Graphs using highcharter and shiny
Spatial data
Geographic coordinates
Working with “geometry”: Shape data
Spatial join
Euclidean, network and cost distances
Spatial interpolation
Spatial statistics and site selection
Working with “pixels”: Raster data
Making beautiful maps
Spatial regression discontinuity design
Network data
Summarization: Nodes and edges and direction
Types of networks (directed, undirected, binary, weighted, bipartite)
Nodes, ties, dyads
Load the network data
Cliques and Communities
Measuring network centrality
Network visualization: Fun with links, nodes, and edges
Text as Data (I)
Representing text
Vector space model of a document
Feature choice/representation
Pre-processing text: stemming and stopping
Pre-processing Chinese text: tokenization
Bag of words (and alternatives)
Descriptive inference (I)
Word distribution/Zipf’s Law/Heaps’ Law
Co-occurrence, collocations, and phrasemes
Keywords in context
(Dis)similarity measures and testing for differences
Descriptive inference (II)
Lexical diversity
Linguistic style and author attribution
Sampling distributions for estimates
Supervised techniques (I): Dictionary-based approaches
Sentiment Analysis
Event extraction
Lie detection
Supervised techniques (II): Classification
Evaluation and cross-validation: Precision and recall
Naive Bayes classification
Regularized regression
Support vector machines
Ensemble Classifiers: Boosting, bagging, and ensembles via the random forest/tree model
Supervised techniques (III): From classification to scaling
Ideological scales with “wordscores”
From supervised to unsupervised (I): Dimension reduction
K-NN models
Clustering (documents)
Latent semantic analysis/indexing
Parametric scaling
Count models: “wordfish”
Unsupervised techniques (II): Topic Models
How to label the topic?
How to choose k
How to visualize topic models?
How to evaluate topic models?
Unsupervised techniques (III): Advanced Topic Models
Imposing structure on the distributions
Dynamic topic model
Correlated topic model
Structural topic model
Author topic model
Increase Interpretability
Textual factors method
Embedded topic model
Add prior knowledge: Keyword Assisted Topic Models
Remove prior knowledge: Top2vec
Increase performance on short text
Biterm topic model
Word embedding: Representing text meaning
An Introduction to Deep Learning in Text Analysis