Index blog topics with rvest and fuzzywuzzyR package

  • Scrape blog titles and corresponding URLs by rvest.
  • Fuzzy search topics by fuzzywuzzyR.

To get the node paths of the titles, right-click one title, then inspect. In the Elements tab, right-click - copy - Copy selector.

library(rvest)
library(purrr)
library(knitr)
th <- read_html("https://takehomessage.com")
th %>% html_nodes("#base-list-layout > div > div:nth-child(1) > div.ui-post-title > a")
## {xml_nodeset (1)}
## [1] <a href="/2019/12/13/rvest-index-blog-topics/" class="nav-link">Index blo ...

Let’s scrape the useful contents.

IndexBlog <- function(blog, pages){
    topics <- set_names(seq(pages)) %>%
        map_dfr(function(p){
            url <- ifelse(p == 1, blog,
                          paste0(blog, "/page/", p))
            th <- read_html(url) %>%
                html_nodes("#base-list-layout div.ui-post-title > a")
            urls <- th %>% html_attr("href")
            titles <- th %>% html_text()
            data.frame(titles = titles,
                       urls = paste0("[", titles, "](", urls, ")"),
                       stringsAsFactors = FALSE)
        })
    return(topics)
}
topics <- IndexBlog("https://takehomessage.com", 4)
kable(head(topics[,"urls",drop=F]))
urls
Index blog topics with rvest and fuzzywuzzyR package
TidyTuesday: Highcharter for time series data
Have some fun with JavaScript emoji library using the V8 R package.
How to read number column with comma as thousands separator in CSV file?
Keeping structure when copy folder/file in Linux.
How to include a htmlWidget to vuepress markdown?

A howto function to search the topics.

library(fuzzywuzzyR)
howto <- function(keywords, blog = "https://takehomessage.com",
                  pages = 4, ...){
    topics <- IndexBlog(blog, pages)
    ms <- GetCloseMatches(string = keywords,
                          sequence_strings = topics$titles,
                          ...)
    topics[match(ms, topics$titles), "urls", drop=F]
}
kable(howto("combine ggplot2", cutoff = 0.1, n = 2))
  urls
11 How to combine multiple ggplots?
10 How to share a legend for combined ggplot2 plots?