```bash
#!/usr/bin/env bash
if [ ! -f "$1" ]; then
  echo "Expected a file at $1, but it doesn't exist." >&2
  exit 1
fi
# …
```

And now the same can be accomplished by running `emails.sh EMAIL_SAMPLES.TXT`. This goes to show just how flexible the standard Unix tools are: they can be connected together to accomplish really neat tasks without the need for more complicated code. Anyone with a Mac OS X or Linux system has everything it takes at their fingertips.

The primary R functions for dealing with regular expressions are:

- `grep()`, `grepl()`: These functions search for matches of a regular expression/pattern in a character vector. `grep()` returns the indices into the character vector that contain a match, or the specific strings that happen to have the match. `grepl()` returns a `TRUE`/`FALSE` vector indicating which elements of the character vector contain a match.
- `regexpr()`, `gregexpr()`: Search a character vector for regular expression matches and return the indices of the string where the match begins and the length of the match.
- `sub()`, `gsub()`: Search a character vector for regular expression matches and replace that match with another string.
- `regexec()`: This function searches a character vector for a regular expression, much like `regexpr()`, but it additionally returns the locations of any parenthesized sub-expressions. This is probably easier to explain through demonstration.

For this chapter, we will use a running example using data from homicides in Baltimore City. The Baltimore Sun newspaper collects information on all homicides that occur in the city (it also reports on many of them). That data is collected and presented in a map that is publicly available. I encourage you to go look at the web site/map to get a sense of what kinds of data are presented there. Unfortunately, the data on the web site are not particularly amenable to analysis, so I’ve scraped the data and put it in a separate file. The file contains data from January 2007 to October 2013.

Here is an excerpt of the Baltimore City homicides dataset:

```r
> homicides
> # Total number of events recorded
> length(homicides)
1571
> homicides
"39.311024, -76.674227, iconHomicideShooting, 'p2', 'Leon Nelson3400 Clifton Ave.Baltimore, MD 21216black male, 17 years oldFound on January 1, 2007Victim died at Shock TraumaCause: shooting'"
> homicides
"39.33626300000, -76.55553990000, icon_homicide_shooting, 'p1200', 'Davon Diggs4100 Parkwood AveBaltimore, MD 21206Race: BlackGender: maleAge: 21 years oldFound on November 5, 2011Victim died at Johns Hopkins Bayview Medical Center Cause: ShootingOriginally reported in 5000 Belair Road later determined to be rear alley of 4100 block Parkwood'"
```

The data set is formatted so that each homicide is presented on a single line of text. So when we read the data in with `readLines()`, each element of the character vector represents one homicide event. Notice that the data are riddled with HTML tags because they were scraped directly from the web site.

A few interesting features stand out:

- the latitude and longitude of where the victim was found
- the street address
- the age, race, and gender of the victim
- the date on which the victim was found
- the hospital in which the victim ultimately died
- the cause of death
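Since these functions are easiest to understand through demonstration, here is a minimal sketch that applies each one to the two homicide records from the excerpt. The records are pasted in by hand, so they lack the HTML markup, exactly as the excerpt appears here. The variable names (`recs`, `found_on`, `years`, and so on) are illustrative assumptions. The small `<dd>…</dd>` string used for `sub()`/`gsub()` is a made-up snippet, not a line from the data set.

```r
## Two records from the excerpt above (hand-copied; `recs` is a hypothetical name)
recs <- c(
  "39.311024, -76.674227, iconHomicideShooting, 'p2', 'Leon Nelson3400 Clifton Ave.Baltimore, MD 21216black male, 17 years oldFound on January 1, 2007Victim died at Shock TraumaCause: shooting'",
  "39.33626300000, -76.55553990000, icon_homicide_shooting, 'p1200', 'Davon Diggs4100 Parkwood AveBaltimore, MD 21206Race: BlackGender: maleAge: 21 years oldFound on November 5, 2011Victim died at Johns Hopkins Bayview Medical Center Cause: ShootingOriginally reported in 5000 Belair Road later determined to be rear alley of 4100 block Parkwood'"
)

## grep(): indices of the matching elements; matching is case-sensitive by default
hit_idx    <- grep("Cause: shooting", recs)                      # 1
hit_idx_ci <- grep("Cause: shooting", recs, ignore.case = TRUE)  # 1 2
## value = TRUE returns the matching strings instead of their indices
hit_str    <- grep("Shock Trauma", recs, value = TRUE)

## grepl(): a logical vector, convenient for subsetting
died_bayview <- grepl("Bayview", recs)                           # FALSE TRUE

## regexpr(): start and length of the first match in each element;
## regmatches() extracts the matched text from that result
first_date <- regexpr("Found on [A-Z][a-z]+ [0-9]+, [0-9]{4}", recs)
found_on   <- regmatches(recs, first_date)

## gregexpr(): every match in each element, not just the first
parkwood   <- regmatches(recs, gregexpr("Parkwood", recs))
n_parkwood <- lengths(parkwood)                                  # 0 2

## sub() replaces only the first match; gsub() replaces all matches.
## Hypothetical markup, used only to show the difference.
tagged           <- "<dd>Cause: shooting</dd>"
one_tag_removed  <- sub("<[^>]*>", "", tagged)   # "Cause: shooting</dd>"
all_tags_removed <- gsub("<[^>]*>", "", tagged)  # "Cause: shooting"

## regexec(): like regexpr(), but also locates each parenthesized group
m     <- regexec("Found on ([A-Z][a-z]+) ([0-9]+), ([0-9]{4})", recs)
parts <- regmatches(recs, m)   # per record: full match, month, day, year
years <- as.integer(vapply(parts, function(p) p[4], character(1)))

print(found_on)
print(years)
```

Because the tags were stripped from these copies, neighbouring fields run together (for example `ShootingOriginally`). That is why the patterns above anchor on literal phrases such as `Found on` rather than on field boundaries.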