Thursday, July 10, 2014
Getting & Cleaning Data - Week 1 Notes
Startup
if (!file.exists("hwdata")) { dir.create("hwdata")}
Internet files
fileURL <- "http://myURL.com/goeshere/la/la/la/uglyFileName.csv"
download.file(fileURL, destfile = "./mydatapath/mydata1.csv", method="curl")
The curl method is needed on a Mac when downloading from an https URL
Local files
read.table()
read.csv()  ## read.table() with sep="," and header=TRUE preset
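A minimal, self-contained sketch of read.table() with its common arguments (it writes a tiny temp CSV first so there is something to read; the column names are made up):

```r
# Write a tiny CSV to a temp file, then read it back with read.table()
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b"), tmp)

myData <- read.table(tmp,
                     sep = ",",          # comma-separated values
                     header = TRUE,      # first row holds column names
                     na.strings = "NA")  # strings to treat as missing
# read.csv() is just read.table() with sep="," and header=TRUE preset
```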
Excel files
library(xlsx)
myData <- read.xlsx(myfile, sheetIndex=1, header=TRUE)
write.xlsx() writes it back out as an Excel file
XML files
install.packages("XML")
library(XML)
library(RCurl)  ## getURL() comes from RCurl, not XML
cleanURL <- getURL(myFileURL)
doc <- xmlTreeParse(cleanURL, useInternalNodes=TRUE)
rootNode <- xmlRoot(doc)
names(rootNode)
rootNode[[1]]
rootNode[[1]][[1]]
xmlSApply(rootNode,xmlValue) **this will dump entire file to screen**
xpathSApply(rootNode,"//node",xmlValue) **just "node" elements**
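A self-contained sketch of the XML workflow above, parsing a small in-memory document (the menu/food element names are invented for illustration):

```r
library(XML)

# Parse XML from a string instead of a URL (asText = TRUE)
xmlText <- "<menu>
              <food><name>Toast</name><price>1.50</price></food>
              <food><name>Eggs</name><price>3.00</price></food>
            </menu>"
doc <- xmlTreeParse(xmlText, useInternalNodes = TRUE, asText = TRUE)
rootNode <- xmlRoot(doc)

xmlSApply(rootNode, xmlValue)              # all values under the root
xpathSApply(rootNode, "//name", xmlValue)  # just the <name> elements
```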
JSON files
library(jsonlite)
jsonData <- fromJSON("myURLGoesHere")
thing <- toJSON(dataFrameIHad)
cat(thing) ##get json!
otherThing <- fromJSON(thing) ## reverse the process
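The round trip above, made concrete with a built-in data frame so it runs as-is:

```r
library(jsonlite)

# Data frame -> JSON text -> data frame again
thing <- toJSON(head(mtcars, 2), pretty = TRUE)
cat(thing)                      ## get json!
otherThing <- fromJSON(thing)   ## reverse the process
```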
Fancy Stuff - data.table parses and subsets faster via C-compiled functions
install.packages("data.table")
library(data.table)
DT <- data.table(x=rnorm(9), y=rep(c("a","b", "c"), each=3),z=rnorm(9))
tables()
** assignment (DT2 <- DT) makes a reference, not a copy! use copy() for an independent table **
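A short sketch of the reference pitfall: modifying DT2 with := also changes DT, because plain assignment did not copy the table.

```r
library(data.table)

DT <- data.table(x = 1:3)
DT2 <- DT            # DT2 is a reference to the SAME table
DT2[, y := x * 2]    # adds column y to DT as well!

DT3 <- copy(DT)      # a true copy; safe to modify independently
DT3[, z := 0]        # DT is untouched by this
```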
Finish up
dateDownloaded <- date()