Why Visualize Text?

• Understanding — get the “main points” of a document
• Grouping — cluster for overview or classification
• Comparison — compare document collections, or inspect evolution of collection over time
• Correlation — compare patterns in text to those in other data, e.g., correlate with a social network

I remember that during my bachelor's degree we were given a Qantas text mining task and left to work on it creatively by ourselves using SAS Enterprise Miner. It was a great tool, but until now I was not aware of how to use it effectively to generate insights from the dataset. I do not want to repeat the same mistakes, so I want to document my learning journey by taking the initiative to do another text analysis and writing down the steps. This can be my journal of what to improve now and in the future.

Imagine being handed 10k rows of new data to analyse. What a disaster it would be to read it all. The goal is to understand the content of (or gain knowledge from) a document or collection of documents WITHOUT READING them.

I really tried text parsing: separating each text and classifying it by level of worry. With very limited time, I learned from another group to focus on the syntactic side (enhanced presentation of textual information):
• Excentric labeling
• Fluid text
• Document lens

They focused on the presentation of concepts and themes!

Labeling Problem. Yes, it requires asking the right question:
• Where are the labels?
  – Labeling is difficult to do when so many entities exist
  – Can add to the ball-of-string problem

More Specific Tasks
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that might be close enough to be worthwhile?
• What are the main themes of a document?
• How are certain words or themes distributed through a document?

Here are some steps I learned during my uni class:
1. Start with the question: how does one compare the similarity of two documents (for example, in plagiarism analysis)?

2. Split all open-ended responses into single words. In Excel, the Text to Columns feature can do this.

3. One model: make a list of each unique word in the document.
4. Throw out common words (e.g. we, I, you, a, an, the, …).
5. Make different forms the same (bake, bakes, baked).
6. Count the frequency of each word: store how many times each word appeared, alphabetize, and make the counts into a vector.
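
The word-counting steps above can be sketched in a few lines of Python. This is only a minimal illustration, not what SAS Enterprise Miner or Excel does internally: the stop-word list, the `naive_stem` helper, and the `word_counts` function are names I made up for this example, and the suffix stripping is deliberately crude (it produces truncated stems such as "bak").

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a complete one.
STOP_WORDS = {"we", "i", "you", "a", "an", "the"}


def naive_stem(word):
    """Crudely strip a few English endings so bake/bakes/baked collapse."""
    for suffix in ("es", "ed", "s", "e"):
        # Keep at least three letters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_counts(text):
    """Return an alphabetized {stem: count} mapping for one document."""
    # Split the text into lowercase words.
    words = re.findall(r"[a-z]+", text.lower())
    # Throw out common words, then merge different forms of the same word.
    stems = [naive_stem(w) for w in words if w not in STOP_WORDS]
    # Count each stem and sort alphabetically.
    return dict(sorted(Counter(stems).items()))


counts = word_counts("We bake the cake. You bakes, I baked.")
print(counts)
```

For real work a proper stemmer or lemmatizer would replace `naive_stem`; the point here is only the order of the steps.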

Model (continued)
– We want to see how closely two vectors point in the same direction, which the inner product measures.
– We can get the similarity of each document to every other one, then use a mass-spring layout algorithm to position a representation of each document.
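
To make the "same direction" idea concrete, here is a small sketch of cosine similarity, which is the inner product of two word-count vectors divided by their lengths. I am assuming each document is already a word-to-count dictionary; the `cosine_similarity` function and the toy `docs` are hypothetical and only for illustration. The mass-spring layout itself is not shown.

```python
import math


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors (dicts)."""
    # Inner product: only words present in both documents contribute.
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    # Vector lengths (Euclidean norms).
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy word counts, made up for this example.
docs = {
    "doc1": {"flight": 2, "delay": 1},
    "doc2": {"flight": 1, "delay": 1, "refund": 1},
    "doc3": {"meal": 3},
}

# Similarity of each document to every other one.
names = sorted(docs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(cosine_similarity(docs[a], docs[b]), 3))
```

A value near 1 means two documents use words in similar proportions, and 0 means they share no words. These pairwise scores are what a spring layout would use to decide which documents to pull close together.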



Spring Layout

Data visualisation


If text is hard to read, several display techniques can help:

  • Magnifying lens
  • Fish eye view
  • Bifocal display
  • Perspective wall


Big Questions
• What can visualization provide to help analysts in gathering information from text document collections?

• What can visualisation provide to assist analysts in comprehension and understanding of the knowledge mined from text document collections?

Several techniques of visualization for visualizing document collections to explore and study are

My name is Novia. I am based in Sydney, Australia, and currently studying full time. I know the basics of Python, and am proficient in R, HTML/CSS, and some JS. I have a few years of experience in Marketing and CRM, and am comfortable doing user research and idea validation by creating pre-surveys or talking directly to users.

My strengths lie in UI/UX research and front end. I am currently working with Python, SQL, JavaScript, HTML, CSS, and Figma. I am interested in artificial intelligence and reinforcement learning using machine learning; however, I am open to building solutions on any topic and look forward to expanding my knowledge by joining this hackathon.

You can find more about me and the projects I am currently researching:

Curiosity to Data Analytics & Career Journey | Educate and inform myself and others about #LEARNINGTOLEARN and technology automation