The role of marketing for startups: A literature review based on data mining keywords

To understand the role of marketing for startups, 236 scientific articles were data mined to detect keywords that could give a better understanding of this relationship.

Techniques: Data cleaning, data aggregation, data mining, data visualization.

Tools: Excel and RStudio.

April 17, 2019

(Unavailable, since I am not sure whether I can share the 236 scientific articles on GitHub.)


  1. What is the role of marketing for startups?
  2. How feasible is it to gather relevant information for a literature review by data mining the keywords in scientific articles?

The data-driven answers

  1. By writing the literature review based on mining the scientific articles, a framework (see Framework below) was created that highlights the role marketing should have in startups. Startups should focus on market orientation in order to seek a sustainable competitive advantage by creating value for customers. The literature highlights the importance of having an innovative orientation, building strong marketing relationships, being proactive with a good market reaction and, more recently, taking advantage of digital tools.

  2. Data mining proved feasible as a tool for analysing scientific articles, since it achieved the aim of transforming information into knowledge. We found that the terms “competitive advantage”, “market orientation” and “value creation” were the most frequent keywords in the analysed scientific articles. This allowed us to write a relevant literature review.

Biggest challenges

  • Gathering 236 scientific articles.
  • One step of the process required a qualitative analysis, which is more subject to the author’s preferences and beliefs.

Future research

This research applies to startups in general. Future researchers should, however, focus their resources on different types of startups in different industries and contexts.

The mined data could potentially be used for forecasting, with a linear regression model that relates the number of mentions of each keyword in scientific articles to the demand for a product or service.
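As a rough illustration of the forecasting idea, the sketch below fits a linear regression of demand on keyword mention counts. This was not part of the study: the column names `keyword_mentions` and `demand` are hypothetical, and the numbers are purely synthetic, generated from a known straight line so that the fit is easy to verify.

```r
# Purely synthetic numbers, built from a known line (demand = 5 + 2 * mentions)
keyword_mentions <- c(10, 20, 30, 40, 50)   # hypothetical mention counts per period
demand <- 5 + 2 * keyword_mentions          # hypothetical demand, not real data

history <- data.frame(keyword_mentions = keyword_mentions, demand = demand)

# Ordinary least squares: demand as a linear function of mention counts
fit <- lm(demand ~ keyword_mentions, data = history)
coef(fit)

# Forecast demand for a new, unseen mention count
forecast <- predict(fit, newdata = data.frame(keyword_mentions = 60))
forecast
```

With real data, each row would be one observation period, and more keywords could be added as extra predictors on the right-hand side of the formula.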

Highlighted code

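# Packages: tm (corpus, term-document matrix), RWeka (n-gram tokenizer), wordcloud, RColorBrewer (palette)
library(tm)
library(RWeka)
library(wordcloud)
library(RColorBrewer)

# myextract is assumed to hold the extracted text of the 236 articles in its text column
df <- data.frame(doc_id = 1:236, text = myextract$text, stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(df))
# Tokenize into bigrams (two-word terms)
Tokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
# Terms are rows and documents are columns
dtm <- TermDocumentMatrix(corpus, control = list(tokenize = Tokenizer))
m <- as.matrix(dtm)
BigramDF <- t(m)  # documents as rows, bigrams as columns
# Total frequency of each bigram across all articles, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 25)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words = 200, random.order = FALSE, rot.per = 0.36, colors = brewer.pal(8, "Dark2"))
write.table(d, file = "DM2TS.csv", sep = ",")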
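
The bigram counting step above can also be sketched in base R, without the tm and RWeka packages, to show what the tokenizer and the row sums compute. This is a minimal sketch, not the code used in the study: the three toy documents are made up for illustration, and `bigrams_of` is a hypothetical helper.

```r
# Toy documents standing in for article text (made up for illustration)
docs <- c(
  "market orientation supports competitive advantage",
  "competitive advantage comes from value creation",
  "startups need market orientation"
)

# Split one document into lowercase words and pair each word with its successor
bigrams_of <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Count every bigram across all documents and sort by frequency
all_bigrams <- unlist(lapply(docs, bigrams_of))
counts <- sort(table(all_bigrams), decreasing = TRUE)
bigram_freq <- data.frame(word = names(counts), freq = as.integer(counts),
                          stringsAsFactors = FALSE)
head(bigram_freq, 5)
```

In this toy input, "competitive advantage" and "market orientation" each occur twice and every other bigram occurs once, which mirrors on a small scale how the most frequent keywords surface in the frequency table `d`.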