ECON7930 Analyzing Spatial, Textual, and Network Data

  1. Data visualization 

    1. R basics

    2. How to lie with data visualization 

    3. The aesthetic principle of visualization

    4. Six types of data visualization

      1. Relationship 

      2. Comparison 

      3. Distribution 

      4. Proportion 

      5. Time series 

      6. Map 

    5. Making interactive graphs using highcharter and shiny

  2. Spatial data

    1. Geographic coordinates

    2. Working with “geometry”: Shape data

      • Spatial join

      • Buffer

      • Euclidean, network and cost distances

      • Spatial interpolation

    3. Spatial statistics and site selection

    4. Working with “pixels”: Raster data

    5. Making beautiful maps

    6. Spatial regression discontinuity design

  3. Network data

    1. Summarization: Nodes, edges, and direction

      • Types of networks (directed, undirected, binary, weighted, bipartite)

      • Nodes, ties, dyads

      • Load the network data

      • Cliques and Communities

    2. Measuring network centrality

    3. Network visualization: Fun with links, nodes, and edges

  4. Text as Data (I)

    1. Representing text

      • Vector space model of a document

      • Feature choice/representation

      • Pre-processing text: stemming and stopping

      • Pre-processing Chinese text: tokenization

      • Bag of words (and alternatives)

      • Sparseness

    2. Descriptive inference (I)

      • Word distribution/Zipf’s Law/Heaps’ Law

      • Co-occurrence, collocations, and phrasemes

      • Keywords in context

      • (Dis)similarity measures and testing for differences

    3. Descriptive inference (II)

      • Lexical diversity

      • Sophistication/readability/complexity

      • Linguistic style and author attribution

      • Sampling distributions for estimates

    4. Supervised techniques (I): Dictionary-based approaches

      • Sentiment Analysis

      • Event extraction

      • Lie detection

      • Crowdsourcing

    5. Supervised techniques (II): Classification

      • Evaluation and cross-validation: Precision and recall

      • Naive Bayes classification 

      • Regularized regression 

      • Support vector machines

      • Ensemble Classifiers: Boosting, bagging, and ensembles via the random forest/tree model

    6. Supervised techniques (III): From classification to scaling

      • Ideological scales with “wordscores”

    7. From supervised to unsupervised (I): Dimension reduction

      • PCA

      • K-NN models

      • Clustering (documents)

      • Latent semantic analysis/indexing

      • Parametric scaling 

      • Count models: “wordfish”

    8. Unsupervised techniques (II): Topic Models

      • LDA 

      • Evaluating and selecting models/choosing k

      • Dynamic topic model

      • Structural topic model = LDA + Metadata

    9. Word embedding: Representing text meaning

    10. An introduction to deep learning in text analysis


© 2016 by TING CHEN.