Chapter 5 Import / Export
The import() and export() methods are used to read / write collection dumps via a connection, such as a file, socket or URL.
5.1 JSON
The default format is newline-delimited JSON (aka NDJSON), i.e. one line for each record:
#> { "_id" : { "$oid" : "5b6ca140368aa2856aaf107b" }, "name" : "erik", "age" : 29 }
#> { "_id" : { "$oid" : "5b6ca14047a302fe1310fd2e" }, "name" : "jerry", "age" : 31, "has_age" : true }
#> { "_id" : { "$oid" : "5b6ca14047a302fe1310fd2f" }, "name" : "anna", "age" : 23, "has_age" : true }
#> { "_id" : { "$oid" : "5b6ca14047a302fe1310fd30" }, "name" : "joe", "has_age" : false }
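Output of this shape is what you get when a collection is exported to the console. A minimal sketch, assuming a MongoDB server on the default mongodb://localhost and a hypothetical collection called people (neither is given in the text above):

```r
library(mongolite)

# Assumption: a server is reachable at the default URL and the
# hypothetical collection "people" already holds a few records
people <- mongo("people")

# Write the dump to the console, one JSON record per line
people$export(stdout())
```

Any other writable connection can be passed in place of stdout().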
Usually we will export to a file:
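A minimal sketch of such an export, assuming dmd is the collection handle used in the rest of this chapter and dump.json is a hypothetical file name:

```r
# Assumption: `dmd` was created earlier with mongo();
# "dump.json" is a hypothetical name, written to the working directory
dmd$export(file("dump.json"))
```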
Let’s test this by removing the entire collection, and then importing it back from the file:
#> [1] 0
#> [1] 53940
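The round trip described above can be sketched as follows. This assumes dmd is the collection handle used in this chapter and that dump.json (a hypothetical file name) holds a previous export of that collection:

```r
# Remove the whole collection; the count should now be zero
dmd$drop()
dmd$count()

# Read the records back from the NDJSON dump and count again
dmd$import(file("dump.json"))
dmd$count()
```

The two count() calls correspond to the two numbers printed above: first the empty collection, then the restored one.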
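You can also export data as JSON to a memory buffer using a raw connection:
# Open an in-memory binary connection for writing
con <- rawConnection(raw(), 'wb')
dmd$export(con)
# Copy the buffer into a string, then release the connection
json <- rawToChar(rawConnectionValue(con))
close(con)
# Parse the NDJSON text into a data frame
df <- jsonlite::stream_in(textConnection(json), verbose = FALSE)
head(df)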
#> $oid carat cut color clarity depth table price x y z
#> 1 5b6ca13c47a302fe131029db 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#> 2 5b6ca13c47a302fe131029dc 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
#> 3 5b6ca13c47a302fe131029dd 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
#> 4 5b6ca13c47a302fe131029de 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
#> 5 5b6ca13c47a302fe131029df 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
#> 6 5b6ca13c47a302fe131029e0 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
5.2 Via jsonlite
The jsonlite package also allows for importing/exporting the NDJSON format directly in R via the stream_in and stream_out functions:
#> $oid carat cut color clarity depth table price x y z
#> 1 5b6ca13c47a302fe131029db 0.23 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
#> 2 5b6ca13c47a302fe131029dc 0.21 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
#> 3 5b6ca13c47a302fe131029dd 0.23 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
#> 4 5b6ca13c47a302fe131029de 0.29 Premium I VS2 62.4 58.0 334 4.20 4.23 2.63
#> 5 5b6ca13c47a302fe131029df 0.31 Good J SI2 63.3 58.0 335 4.34 4.35 2.75
#> 6 5b6ca13c47a302fe131029e0 0.24 Very Good J VVS2 62.8 57.0 336 3.94 3.96 2.48
#> 7 5b6ca13c47a302fe131029e1 0.24 Very Good I VVS1 62.3 57.0 336 3.95 3.98 2.47
#> 8 5b6ca13c47a302fe131029e2 0.26 Very Good H SI1 61.9 55.0 337 4.07 4.11 2.53
#> 9 5b6ca13c47a302fe131029e3 0.22 Fair E VS2 65.1 61.0 337 3.87 3.78 2.49
#> [ reached getOption("max.print") -- omitted 53931 rows ]
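To try stream_out and stream_in without a MongoDB server, here is a small self-contained sketch. The data frame and the temporary file are made up for illustration and stand in for a real collection dump:

```r
library(jsonlite)

# Hypothetical stand-in for a collection dump: a temporary NDJSON file
path <- tempfile(fileext = ".json")
original <- data.frame(
  name = c("erik", "jerry", "anna"),
  age = c(29, 31, 23),
  stringsAsFactors = FALSE
)

# Write one JSON object per line
stream_out(original, file(path), verbose = FALSE)

# Each line of the file is a complete JSON record
ndjson <- readLines(path)

# Read the file back into a data frame
restored <- stream_in(file(path), verbose = FALSE)
```

A file produced by export() could be read the same way, by pointing stream_in at that file instead of the temporary one.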
This is a convenient way to exchange data with R users who might not have MongoDB. Similarly, jsonlite allows for exporting data in a way that is easy to import into MongoDB:
jsonlite::stream_out(mtcars, file("mtcars.json"), verbose = FALSE)
mt <- mongo("mtcars")
mt$import(file("mtcars.json"))
mt$find()
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> [ reached getOption("max.print") -- omitted 23 rows ]
5.3 Streaming
Both mongolite and jsonlite also allow for importing NDJSON data from an HTTP stream:
flt <- mongo("flights")
flt$import(gzcon(curl::curl("https://jeroen.github.io/data/nycflights13.json.gz")))
flt$count()
#> [1] 336776
The same operation in jsonlite would be:
flights <- jsonlite::stream_in(
  gzcon(curl::curl("https://jeroen.github.io/data/nycflights13.json.gz")),
  verbose = FALSE)
nrow(flights)
#> [1] 336776
5.4 BSON
MongoDB internally stores data in BSON format, which is a binary version of JSON. Use the bson parameter to dump a collection directly in BSON format:
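A minimal sketch of a BSON dump, assuming flt is the flights collection from the previous section and flights.bson is a hypothetical file name:

```r
# Assumption: `flt` is the handle created with mongo("flights");
# "flights.bson" is a hypothetical name for the output file
flt$export(file("flights.bson"), bson = TRUE)
```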
The same parameter is used to read it back:
#> [1] 336776
#> year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
#> 1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
#> 2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
#> 3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160 1089 5 42
#> 4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183 1576 5 44
#> 5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116 762 5 54
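Reading a BSON dump back can be sketched as follows, assuming flt is the flights collection from the previous section and flights.bson (a hypothetical file name) was written earlier by export() with bson = TRUE:

```r
# Start from an empty collection so the import is easy to verify
flt$drop()

# Load the binary dump; bson = TRUE must match how the file was written
flt$import(file("flights.bson"), bson = TRUE)

# Check the number of records and look at the first few
flt$count()
flt$find(limit = 5)
```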
Using BSON to import/export is faster than JSON, but the resulting file is binary and can only be read by MongoDB or other BSON-aware tools.