Text analysis | Data Visualisation: Wordcloud

Novia Pratiwi - est.2021

4 min read · Aug 30, 2020

Why visualize Text?

• Understanding — get the “main points” of a document
• Grouping — cluster for overview or classification
• Comparison — compare document collections, or inspect evolution of collection over time
• Correlation — compare patterns in text to those in other data, e.g., correlate with a social network

I remember that during my bachelor's degree we were given a Qantas text mining task and worked on it creatively on our own using SAS Enterprise Miner. It was a great tool, but until now I was not aware of how to use it effectively to generate insights from the dataset. I do not want to repeat the same mistakes, so I wanted to document my learning journey by taking the initiative to do another text analysis and writing down the steps. This can serve as my journal on what to improve now and in the future.

Imagine being handed 10k rows of new data to analyse. What a disaster it would be to read all of it. The aim is to understand the content of (or gain knowledge from) a document or collection of documents WITHOUT READING them.

I worked hard on text parsing, separating each text and classifying it by level of worry. With very limited time, I learned from another group to focus instead on syntactic issues (enhanced presentation of textual information): excentric labeling, fluid text, and document lens. They focus on the presentation of concepts and themes!

Labeling Problem. Yes, it requires asking the right question:
• Where are the labels?
– Labeling is difficult to do when so many entities exist
– Can add to the ball of string problem

More Specific Tasks
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that might be close enough to be worthwhile?
• What are the main themes of a document?
• How are certain words or themes distributed through a document?

Semantic Analysis

Similarity Analysis using Vector Space Analysis

Here are some steps I learned during my uni class:
1. Start from the question: how does one compare the similarity of two documents (for example, for plagiarism analysis)?

2. Split the answers to all open-ended questions into single words. In Excel, the Text to Columns feature can do this.

3. One model: make a list of each unique word in the document.
4. Throw out common words (e.g. we, I, you, a, an, the, …)
5. Make different forms the same (bake, bakes, baked).
6. Count the frequency of each word, storing how many times it appeared.
7. Alphabetize the words and make the counts into a vector.
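
The steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the method from my class or from SAS Enterprise Miner. The stop-word list, the `naive_stem` rule, and the function names are my own assumptions for the example; a real analysis would use a fuller stop-word list and a proper stemmer.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"we", "i", "you", "a", "an", "the", "and", "of", "to", "is", "was"}


def naive_stem(word):
    """Toy suffix stripping so that bake, bakes, baked share one form.

    This is a rough rule for illustration, not a real stemming algorithm.
    """
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Only strip when at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_counts(text):
    """Split text into words, drop common words, merge forms, count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)


def to_vector(counts, vocabulary):
    """Turn counts into a list of numbers ordered by the given vocabulary."""
    return [counts.get(word, 0) for word in vocabulary]


counts = word_counts("We bake bread. She bakes bread and I baked a cake.")
vocabulary = sorted(counts)          # alphabetized unique words
vector = to_vector(counts, vocabulary)
print(vocabulary)  # ['bak', 'bread', 'cak', 'she']
print(vector)      # [3, 2, 1, 1]
```

The stems are not real words ("bak", "cak"), which is fine for counting but not for display, so for a word cloud you may prefer to skip the stemming step.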

Model (continued)
– We want to see how closely two vectors point in the same direction, which is measured with the inner product.
– We can get the similarity of each document to every other one, then use a mass-spring layout algorithm to position a representation of each document.
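
To make the "same direction" idea concrete, here is a small sketch of cosine similarity, which is the inner product of two word-count vectors divided by their lengths. I am assuming cosine similarity as the measure, since the class notes only mention the inner product. The function name and the sample word counts are made up for illustration.

```python
import math


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count dictionaries.

    Returns 1.0 for the same direction, 0.0 for no shared words.
    """
    # Build both vectors over the same alphabetized vocabulary.
    vocabulary = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a.get(w, 0) for w in vocabulary]
    vec_b = [counts_b.get(w, 0) for w in vocabulary]

    dot = sum(a * b for a, b in zip(vec_a, vec_b))  # inner product
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical word counts for three short documents.
doc_1 = {"flight": 2, "delay": 1}
doc_2 = {"flight": 1, "delay": 1, "refund": 1}
doc_3 = {"meal": 3}

print(cosine_similarity(doc_1, doc_2))  # shares words, so between 0 and 1
print(cosine_similarity(doc_1, doc_3))  # no shared words, so 0.0
```

Running this for every pair of documents gives the similarity table that a layout algorithm could then use to place similar documents close together.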

I used the web tools below, which I found very useful for creating a word cloud from a paragraph or other text after data cleaning. They cover word cloud keywords and word-frequency visual analysis:

Research Tools:

https://monkeylearn.com/blog/tableau-sentiment-analysis/
https://www.flerlagetwins.com/2019/09/text-analysis.html
https://github.com/flerlagekr/Text-Analysis/blob/master/Text.py

https://www.canva.com/infographics/templates/

Spring Layout

Data visualisation

https://tagcrowd.com/
https://www.online-utility.org/text/analyzer.jsp
https://www.textfixer.com/tools/online-word-counter.php#newText2
https://www.browserling.com/tools/word-frequency
https://worditout.com/word-cloud/create
https://voyant-tools.org/?corpus=d553eb84d452b330d5c5bf847343f995
https://seoscout.com/tools/keyword-analyzer
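
A word cloud generally draws frequent words larger than rare ones. The sketch below shows one simple way to turn word counts into font sizes with linear scaling. It is my own assumption for illustration and the tools listed above may scale words differently. The function name, the size range, and the sample counts are hypothetical.

```python
def font_sizes(counts, min_size=12, max_size=48, top_n=50):
    """Map each of the top_n most frequent words to a font size.

    The most frequent word gets max_size, the least frequent of the
    kept words gets min_size, and the rest are scaled linearly.
    """
    # Sort by count (highest first), then alphabetically to break ties.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    if not top:
        return {}

    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest

    sizes = {}
    for word, count in top:
        # If every word has the same count, draw them all at max_size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical counts after data cleaning.
print(font_sizes({"flight": 10, "delay": 5, "refund": 0}))
# {'flight': 48, 'delay': 30, 'refund': 12}
```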

If the text is hard to read, several display techniques can help:

Magnifying lens
Fish eye view
Bifocal display
Perspective wall

Excentric Labeling for Information Visualization

The widespread use of information visualization is hampered by the lack of effective labeling techniques. We propose…

www.cs.umd.edu

Fisheye Menus

We are investigating techniques to support selection of an item from a long linear list. This issue comes up for…

www.cs.umd.edu

https://treevis.net/

Big Questions
• What can visualization provide to help analysts in gathering information from text document collections?
• What can visualisation provide to assist analysts in comprehension and understanding of the knowledge mined from text document collections?

Several visualization techniques for exploring and studying document collections are:
– Galaxies
– Themescapes
– ThemeRiver

My name is Novia. I am based in Sydney, Australia, and am currently studying full time. I know the basics of Python and am proficient in R, HTML/CSS, and some JS. I have a few years of experience working in Marketing and CRM, and I am comfortable doing user research and idea validation by creating pre-surveys or talking directly to users.
My strengths lie in UI/UX research and front end. I am currently working with Python, SQL, JavaScript, HTML, CSS, and Figma. I am interested in artificial intelligence and in reinforcement learning within machine learning, but I am open to building solutions on any topic and look forward to expanding my knowledge by joining this hackathon.

You can find more about me and the projects I am currently researching:
https://www.linkedin.com/in/noviapratiwi/
https://github.com/noviaayup/Projects
