Thursday, July 10, 2014

Getting & Cleaning Data - Week 1 Notes

Startup
Create a working directory for the data if it doesn't already exist:
    if (!file.exists("hwdata")) { dir.create("hwdata") }

Internet files
    fileURL <- "http://myURL.com/goeshere/la/la/la/uglyFileName.csv"
    download.file(fileURL, destfile = "./mydatapath/mydata1.csv", method = "curl")
method = "curl" is needed if you're trying to download from an https site.

Local files
    read.table() and read.csv()
(read.csv() is just read.table() with header = TRUE and sep = ",".)

Excel files
    library(xlsx)
    myData <- read.xlsx(myfile, sheetIndex = 1, header = TRUE)
Use write.xlsx() to write it back out as Excel again.

XML files
    install.packages("XML")
    library(XML)
    library(RCurl)   # getURL() lives in RCurl, not XML
    cleanURL <- getURL(myFileURL)
    doc <- xmlTreeParse(cleanURL, useInternalNodes = TRUE)
    rootNode <- xmlRoot(doc)
    names(rootNode)
    rootNode[[1]]
    rootNode[[1]][[1]]
    xmlSApply(rootNode, xmlValue)               # dumps the entire file to screen
    xpathSApply(rootNode, "//node", xmlValue)   # just the "node" elements

JSON files
    library(jsonlite)
    jsonData <- fromJSON("myURLGoesHere")
    thing <- toJSON(dataFrameIHad)    # data frame -> JSON
    cat(thing)                        # get JSON!
    otherThing <- fromJSON(thing)     # reverse the process

Fancy Stuff - parse this stuff faster with C-compiled functions
    install.packages("data.table")
    library(data.table)
    DT <- data.table(x = rnorm(9), y = rep(c("a", "b", "c"), each = 3), z = rnorm(9))
    tables()
Warning: assignment makes pointers, not copies! Use the copy() function if you need a real copy.

Finish up
Always record when you pulled the data:
    dateDownloaded <- date()

A few worked sketches of these steps follow.
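Putting the startup, download, and date steps together in one runnable chunk. This is a minimal sketch: the URL and file names are placeholders, not a real dataset.

    # Create the data directory only if it is missing
    if (!file.exists("./data")) {
        dir.create("./data")
    }
    # Placeholder URL -- substitute the file you actually want
    fileURL <- "https://example.com/some/ugly/path/uglyFileName.csv"
    # method = "curl" helps with https (at least on OS X)
    download.file(fileURL, destfile = "./data/mydata1.csv", method = "curl")
    # Record when the file was pulled, since the source may change later
    dateDownloaded <- date()
    # Then read it in
    mydata <- read.csv("./data/mydata1.csv")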
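read.xlsx() can also pull just a sub-rectangle of a sheet via its rowIndex and colIndex arguments. A sketch, assuming a hypothetical local file cameras.xlsx:

    library(xlsx)
    # Read only rows 2-4 and columns 2-3 of the first sheet
    cameraSubset <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1,
                              colIndex = 2:3, rowIndex = 2:4)
    # write.xlsx() goes the other way: data frame -> Excel file
    write.xlsx(cameraSubset, "./data/cameras_subset.xlsx")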
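The XPath call is easier to see on a tiny inline document. A self-contained sketch (the XML string here is made up; asText = TRUE tells xmlTreeParse to parse a string instead of a file):

    library(XML)
    xml <- "<menu><food><name>Waffles</name><price>5</price></food><food><name>Toast</name><price>2</price></food></menu>"
    doc <- xmlTreeParse(xml, asText = TRUE, useInternalNodes = TRUE)
    rootNode <- xmlRoot(doc)
    xmlName(rootNode)                            # "menu"
    xmlSApply(rootNode, xmlValue)                # one concatenated string per child: "Waffles5" "Toast2"
    xpathSApply(rootNode, "//name", xmlValue)    # just the names:  "Waffles" "Toast"
    xpathSApply(rootNode, "//price", xmlValue)   # just the prices: "5" "2"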
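The toJSON()/fromJSON() round trip, shown on a small made-up data frame so it runs without a URL:

    library(jsonlite)
    df <- data.frame(id = 1:2, name = c("alpha", "beta"),
                     stringsAsFactors = FALSE)
    js <- toJSON(df)      # data frame -> JSON text
    cat(js)               # [{"id":1,"name":"alpha"},{"id":2,"name":"beta"}]
    df2 <- fromJSON(js)   # JSON text -> data frame again
    str(df2)              # same two columns back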
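Why the pointer warning matters: a short demonstration of data.table's reference semantics, with copy() as the fix.

    library(data.table)
    DT  <- data.table(x = 1:3)
    DT2 <- DT             # NOT a copy: both names point at the same table
    DT2[, y := x * 10]    # := modifies in place...
    names(DT)             # ...so DT now has a y column too: "x" "y"
    DT3 <- copy(DT)       # copy() makes a genuinely independent table
    DT3[, z := 0]
    names(DT)             # still "x" "y" -- DT was not touched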
