ECON7930 Analyzing Spatial, Textual, and Network Data

  1. Data visualization 

    1. R basics

    2. How to lie with data visualization 

    3. The aesthetic principle of visualization

    4. Six types of data visualization

      1. Relationship 

      2. Comparison 

      3. Distribution 

      4. Proportion 

      5. Time series 

      6. Map 

    5. Making interactive graphs using highcharter and shiny

  2. Spatial data

    1. Geographic coordinates

    2. Working with “geometry”: Shape data

      • Spatial join

      • Buffer

      • Euclidean, network and cost distances

      • Spatial interpolation

    3. Spatial statistics and site selection

    4. Working with “pixels”: Raster data

    5. Making beautiful maps

    6. Spatial regression discontinuity design

  3. Network data

    1. Summarization: Nodes, edges, and direction

      • Types of networks (directed, undirected, binary, weighted, bipartite)

      • Nodes, ties, dyads

      • Loading network data

      • Cliques and communities

    2. Measuring network centrality

    3. Network visualization: Fun with links, nodes, and edges

  4. Text as Data (I)

    1. Representing text

      • Vector space model of a document

      • Feature choice/representation

      • Pre-processing text: stemming and stopping

      • Pre-processing Chinese text: tokenization

      • Bag of words (and alternatives)

      • Sparseness

    2. Descriptive inference (I)

      • Word distribution/Zipf’s law/Heaps’ law

      • Co-occurrence, collocations, and phrasemes

      • Keywords in context

      • (Dis)similarity measures and testing for differences

    3. Descriptive inference (II)

      • Lexical diversity

      • Sophistication/readability/complexity

      • Linguistic style and author attribution

      • Sampling distributions for estimates

    4. Supervised techniques (I): Dictionary-based approaches

      • Sentiment Analysis

      • Event extraction

      • Lie detection

      • Crowdsourcing

    5. Supervised techniques (II): Classification

      • Evaluation and cross-validation: Precision and recall

      • Naive Bayes classification 

      • Regularized regression 

      • Support vector machines

      • Ensemble classifiers: Boosting, bagging, and random forests (tree-based ensembles)

    6. Supervised techniques (III): From classification to scaling

      • Ideological scales with “wordscores”

    7. From supervised to unsupervised (I): Dimension reduction

      • PCA

      • K-NN models

      • Document clustering

      • Latent semantic analysis/indexing

      • Parametric scaling 

      • Count models: “wordfish”

    8. Unsupervised techniques (II): Topic Models

      • LDA 

      • How to label topics?

      • How to choose k?

      • How to visualize topic models?

      • How to evaluate topic models?

    9. Unsupervised techniques (III): Advanced Topic Models

      • Imposing structure on the distributions

        • Dynamic topic model

        • Correlated topic model

        • Structural topic model

        • Author topic model

      • Increasing interpretability

        • Textual factors method

        • Embedding topic model

        • Adding prior knowledge: Keyword Assisted Topic Models

        • Removing prior knowledge: Top2Vec

      • Improving performance on short texts

        • Biterm topic model

    10. Word embeddings: Representing text meaning

    11. An introduction to deep learning in text analysis