ECON7930 Analyzing Spatial, Textual, and Network Data
-
Data visualization
-
R basics
-
How to lie with data visualization
-
The aesthetic principle of visualization
-
Six types of data visualization
-
Relationship
-
Comparison
-
Distribution
-
Proportion
-
Time series
-
Map
-
-
Making interactive graphs using highcharter and shiny
-
-
Spatial data
-
Geographic coordinates
-
Working with “geometry”: Shape data
-
Spatial join
-
Buffer
-
Euclidean, network and cost distances
-
Spatial interpolation
-
-
Spatial statistics and site selection
-
Working with “pixels”: Raster data
-
Making beautiful maps
-
Spatial regression discontinuity design
-
-
Network data
-
Summarization: Nodes, edges, and direction
-
Types of networks (directed, undirected, binary, weighted, bipartite)
-
Nodes, ties, dyads
-
Load the network data
-
Cliques and Communities
-
-
Measuring network centrality
-
Network visualization: Fun with links, nodes, and edges
-
-
Text as Data (I)
-
Representing text
-
Vector space model of a document
-
Feature choice/representation
-
Pre-processing text: stemming and stopping
-
Pre-processing Chinese text: tokenization
-
Bag of words (and alternatives)
-
Sparseness
-
-
Descriptive inference (I)
-
Word distribution/Zipf’s Law/Heaps’ Law
-
Co-occurrence, collocations, and phrasemes
-
Keywords in context
-
(Dis)similarity measures and testing for differences
-
-
Descriptive inference (II)
-
Lexical diversity
-
Sophistication/readability/complexity
-
Linguistic style and author attribution
-
Sampling distributions for estimates
-
-
Supervised techniques (I): Dictionary-based approaches
-
Sentiment Analysis
-
Event extraction
-
Lie detection
-
Crowdsourcing
-
-
Supervised techniques (II): Classification
-
Evaluation and cross-validation: Precision and recall
-
Naive Bayes classification
-
Regularized regression
-
Support vector machines
-
Ensemble Classifiers: Boosting, bagging, and ensembles via the random forest/tree model
-
-
Supervised techniques (III): From classification to scaling
-
Ideological scales with “wordscores”
-
-
From supervised to unsupervised (I): Dimension reduction
-
PCA
-
K-NN models
-
Clustering (documents)
-
Latent semantic analysis/indexing
-
Parametric scaling
-
Count models: “wordfish”
-
-
Unsupervised techniques (II): Topic Models
-
LDA
-
How to label the topic?
-
How to choose k
-
How to visualize topic models?
-
How to evaluate topic models?
-
-
Unsupervised techniques (III): Advanced Topic Models
-
Imposing structure on the distributions
-
Dynamic topic model
-
Correlated topic model
-
Structural topic model
-
Author topic model
-
-
Increase Interpretability
-
Textual factors method
-
Embedded topic model
-
Add more prior knowledge: Keyword Assisted Topic Models
-
Remove prior knowledge: Top2Vec
-
-
Increase performance on short text
-
Biterm topic model
-
-
-
Word embedding: Representing text meaning
-
An Introduction to Deep Learning in Text Analysis
-