Text analysis | Data Visualisation: Wordcloud
Why visualize Text?
• Understanding — get the “main points” of a document
• Grouping — cluster for overview or classification
• Comparison — compare document collections, or inspect evolution of collection over time
• Correlation — compare patterns in text to those in other data, e.g., correlate with a social network
I remember that during my bachelor's degree we were given a Qantas text mining assignment and left to work on it creatively by ourselves using SAS Enterprise Miner. It was a great tool, but until now I was not aware of how to use it effectively to generate insights from a dataset. I do not want to repeat the same mistakes, so I am documenting my learning journey: taking the initiative to do another text analysis and writing down the steps. This can serve as my journal of what to improve now and in the future.
Imagine having to analyse 10k rows of new data. What a disaster to read! The aim is to fully understand the content of (or gain knowledge from) a document or collection of documents WITHOUT READING them.
I put real effort into text parsing, separating each text and classifying it by its level of worry. With very limited time, I learned from another group to focus on the syntactic side (enhanced presentation of textual information):
– Excentric labeling
– Fluid text
– Document lens
They focus on the presentation of concepts and themes!
The labeling problem. Yes, it requires asking the right question:
• Where are the labels?
– Labeling is difficult to do when so many entities exist
– Labels can add to the ball-of-string problem
More Specific Tasks
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that might be close enough to be worthwhile?
• What are the main themes of a document?
• How are certain words or themes distributed through a document?
Similarity Analysis using Vector Space Analysis
Here are some steps I learned during my uni class:
1. Start from the question: how does one compare the similarity of two documents (for example, in plagiarism analysis)?
2. Split every open-ended response into single words. Excel's Text to Columns feature can do this.
3. One model: make a list of each unique word in the document.
4. Throw out common words (e.g. we, I, you, a, an, the, …).
5. Make different forms the same (bake, bakes, baked), store a count of how many times each word appeared, then alphabetize the words and turn the counts into a vector.
6. Count the frequency of each word in the text.
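To make these steps concrete for myself, here is a minimal Python sketch of how they could look in code. It is only an illustration under my own assumptions: the stop-word list is the tiny one from step 4, the `WORD_FORMS` lookup is a hypothetical stand-in for a real stemmer or lemmatizer, and the sample sentence is made up.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list taken from the examples in the steps above.
STOP_WORDS = {"we", "i", "you", "a", "an", "the"}

# Assumption: a hand-written lookup standing in for a real stemmer/lemmatizer.
WORD_FORMS = {"bakes": "bake", "baked": "bake"}


def word_counts(text):
    """Split text into words, drop common words, merge word forms, count."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [WORD_FORMS.get(w, w) for w in words if w not in STOP_WORDS]
    return Counter(kept)


def to_vector(counts, vocabulary):
    """Turn word counts into a list of numbers, one per vocabulary word."""
    return [counts[word] for word in vocabulary]


# Made-up sample text, purely for illustration.
counts = word_counts("We baked the bread. I bake bread, you bake a cake.")
vocabulary = sorted(counts)          # alphabetized unique words
vector = to_vector(counts, vocabulary)

print(vocabulary)
print(vector)
```

To compare two documents, both would need to be turned into vectors over the same shared vocabulary, so that each position refers to the same word.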
Model (continued):
– We want to see how closely two vectors point in the same direction, which the inner product measures
– We can get the similarity of each document to every other one, then use a mass-spring layout algorithm to position a representation of each document
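Here is a small Python sketch of the "same direction" idea. Note my assumption: instead of the raw inner product I use cosine similarity (the inner product divided by the two vector lengths), so that a longer document does not score higher just for being longer. The three documents and their word counts are made up, and the mass-spring layout step is not covered here.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Inner product of two word-count vectors, divided by their lengths."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up word counts for three hypothetical documents.
docs = {
    "doc_a": Counter({"bake": 2, "bread": 1}),
    "doc_b": Counter({"bake": 1, "bread": 1, "cake": 1}),
    "doc_c": Counter({"flight": 3, "delay": 1}),
}

# Similarity of each document to every other one.
names = sorted(docs)
similarity = {
    (a, b): cosine_similarity(docs[a], docs[b]) for a in names for b in names
}

for (a, b), score in similarity.items():
    print(a, b, round(score, 3))
```

A score near 1 means two documents use words in nearly the same proportions, and 0 means they share no words at all. A table like this is the kind of input a layout algorithm could use to place similar documents close together.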
I used a web tool that I found very useful for creating a word cloud from a paragraph or text after data cleaning. It produces a keyword word cloud, which is a visual analysis of word frequency.
If the text is hard to read, several techniques can help:
- Magnifying lens
- Fish eye view
- Bifocal display
- Perspective wall
Excentric Labeling for Information Visualization
The widespread use of information visualization is hampered by the lack of effective labeling techniques. We propose…
We are investigating techniques to support selection of an item from a long linear list. This issue comes up for…
• What can visualization provide to help analysts in gathering information from text document collections?
• What can visualisation provide to assist analysts in comprehension and understanding of the knowledge mined from text document collections?
Several visualization techniques are available for exploring and studying document collections.
My name is Novia. I am based in Sydney, Australia, and currently studying full time. I know the basics of Python and am proficient in R, HTML/CSS, and some JS. I have a few years of experience working in Marketing and CRM, and I am comfortable doing user research and idea validation by creating pre-surveys or talking directly to users.