Chapter 5 Import / Export

The import() and export() methods read and write collection dumps via a connection, such as a file, socket or URL.

5.1 JSON

The default format is newline-delimited JSON (NDJSON), i.e. one line for each record:
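A minimal sketch of how such a dump can be printed to the console. It assumes a MongoDB server is reachable at the default mongodb://localhost and that a collection named subjects (a hypothetical name) already holds a few records.

```r
library(mongolite)

# Assumption: MongoDB runs locally; "subjects" is a hypothetical collection name
subjects <- mongo("subjects")

# Write every record to the console, one JSON document per line
subjects$export(stdout())
```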

#> { "_id" : { "$oid" : "5b6ca140368aa2856aaf107b" }, "name" : "erik", "age" : 29 }
#> { "_id" : { "$oid" : "5b6ca14047a302fe1310fd2e" }, "name" : "jerry", "age" : 31, "has_age" : true }
#> { "_id" : { "$oid" : "5b6ca14047a302fe1310fd2f" }, "name" : "anna", "age" : 23, "has_age" : true }
#> { "_id" : { "$oid" : "5b6ca14047a302fe1310fd30" }, "name" : "joe", "has_age" : false }

Usually we will export to a file:
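As a sketch, exporting to a file only requires passing a file connection instead of stdout(). The collection name diamonds and the file name dump.json are assumptions chosen for illustration; the collection is assumed to be populated already (for example from ggplot2::diamonds).

```r
library(mongolite)

# Assumption: MongoDB runs locally and the "diamonds" collection is populated
dmd <- mongo("diamonds")

# Dump the whole collection as NDJSON to dump.json in the working directory
dmd$export(file("dump.json"))
```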

Let’s test this by removing the entire collection, and then importing it back from the file:
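The round trip might look like the following sketch. It assumes dump.json was written by an earlier export() of a collection named diamonds (both names are illustrative). The two count() calls report the collection size after the drop and after the import.

```r
library(mongolite)

# Assumption: MongoDB runs locally; dump.json comes from a previous export()
dmd <- mongo("diamonds")

# Remove the entire collection and confirm that it is now empty
dmd$drop()
dmd$count()

# Read the NDJSON dump back in and count the records again
dmd$import(file("dump.json"))
dmd$count()
```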

#> [1] 0
#> [1] 53940

You could also export data as JSON to a memory buffer using raw connections:
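One way to do this, sketched under the assumption that a populated diamonds collection exists on a local server: open a writable rawConnection(), export into it, then parse the buffered text with jsonlite. The variable names are illustrative.

```r
library(mongolite)

# Assumption: MongoDB runs locally and "diamonds" is populated
dmd <- mongo("diamonds")

# Open an in-memory raw connection for writing and export into it
con <- rawConnection(raw(0), "wb")
dmd$export(con)

# Collect the bytes written to the buffer as one string, then release it
json <- rawToChar(rawConnectionValue(con))
close(con)

# Parse the NDJSON text back into a data frame
tc <- textConnection(json)
df <- jsonlite::stream_in(tc, verbose = FALSE)
close(tc)
head(df)
```

Because the connection is already open when it is passed in, it stays open after the export, so its contents can still be retrieved before it is closed.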

#>                       $oid carat       cut color clarity depth table price    x    y    z
#> 1 5b6ca13c47a302fe131029db  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
#> 2 5b6ca13c47a302fe131029dc  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
#> 3 5b6ca13c47a302fe131029dd  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
#> 4 5b6ca13c47a302fe131029de  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
#> 5 5b6ca13c47a302fe131029df  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
#> 6 5b6ca13c47a302fe131029e0  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48

5.2 Via jsonlite

The jsonlite package can also import and export the NDJSON format directly in R via the stream_in() and stream_out() functions:
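For instance, a dump can be read without any MongoDB server at all. This sketch assumes a file dump.json that holds an NDJSON export of the diamonds collection.

```r
library(jsonlite)

# Assumption: dump.json is an NDJSON file written by an earlier export()
mydata <- stream_in(file("dump.json"), verbose = FALSE)
print(mydata)
```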

#>                           $oid carat       cut color clarity depth table price     x     y     z
#> 1     5b6ca13c47a302fe131029db  0.23     Ideal     E     SI2  61.5  55.0   326  3.95  3.98  2.43
#> 2     5b6ca13c47a302fe131029dc  0.21   Premium     E     SI1  59.8  61.0   326  3.89  3.84  2.31
#> 3     5b6ca13c47a302fe131029dd  0.23      Good     E     VS1  56.9  65.0   327  4.05  4.07  2.31
#> 4     5b6ca13c47a302fe131029de  0.29   Premium     I     VS2  62.4  58.0   334  4.20  4.23  2.63
#> 5     5b6ca13c47a302fe131029df  0.31      Good     J     SI2  63.3  58.0   335  4.34  4.35  2.75
#> 6     5b6ca13c47a302fe131029e0  0.24 Very Good     J    VVS2  62.8  57.0   336  3.94  3.96  2.48
#> 7     5b6ca13c47a302fe131029e1  0.24 Very Good     I    VVS1  62.3  57.0   336  3.95  3.98  2.47
#> 8     5b6ca13c47a302fe131029e2  0.26 Very Good     H     SI1  61.9  55.0   337  4.07  4.11  2.53
#> 9     5b6ca13c47a302fe131029e3  0.22      Fair     E     VS2  65.1  61.0   337  3.87  3.78  2.49
#>  [ reached getOption("max.print") -- omitted 53931 rows ]

This is a convenient way to exchange data with R users who might not have MongoDB. Similarly, jsonlite can export data in a format that is easy to import into MongoDB:
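The reverse direction could be sketched as follows, using the built-in mtcars data set. The collection and file names are assumptions; if the collection already contains records, import() adds to them.

```r
library(mongolite)
library(jsonlite)

# Write the data frame as NDJSON, one line per row
stream_out(mtcars, file("mtcars.json"), verbose = FALSE)

# Assumption: MongoDB runs locally; "mtcars" is an illustrative collection name
mt <- mongo("mtcars")
mt$import(file("mtcars.json"))

# Read the records back from MongoDB as a data frame
mt$find()
```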

#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#>  [ reached getOption("max.print") -- omitted 23 rows ]

5.3 Streaming

Both mongolite and jsonlite can also import NDJSON data from an HTTP stream:
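A sketch with mongolite, assuming the curl package is installed. The URL is an assumption about where a gzipped NDJSON copy of the nycflights13 flights data is hosted; substitute any gzipped NDJSON resource. gzcon() decompresses the stream on the fly.

```r
library(mongolite)

# Assumption: this URL serves gzipped NDJSON; replace it if it is unavailable
url <- "https://jeroen.github.io/data/nycflights13.json.gz"

# Assumption: MongoDB runs locally; "flights" is an illustrative collection name
flt <- mongo("flights")

# Stream the download, decompress it, and insert the records as they arrive
flt$import(gzcon(curl::curl(url)))
flt$count()
```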

#> [1] 336776

The same operation in jsonlite would be:
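With jsonlite the stream is parsed straight into a data frame rather than into a collection. As before, the URL is an assumption and the curl package is required.

```r
library(jsonlite)

# Assumption: this URL serves gzipped NDJSON; replace it if it is unavailable
url <- "https://jeroen.github.io/data/nycflights13.json.gz"

# Stream, decompress, and parse the records into a data frame
flights <- stream_in(gzcon(curl::curl(url)), verbose = FALSE)
nrow(flights)
```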

#> [1] 336776

5.4 BSON

MongoDB internally stores data in BSON format, which is a binary version of JSON. Use the bson parameter to dump a collection directly in BSON format:
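A BSON export might look like this sketch, which assumes a populated collection named flights on a local server and uses flights.bson as an illustrative file name.

```r
library(mongolite)

# Assumption: MongoDB runs locally and "flights" is populated
flt <- mongo("flights")

# Dump the collection in binary BSON format instead of NDJSON
flt$export(file("flights.bson"), bson = TRUE)
```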

The same parameter is used to read the dump back:
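Sketched under the assumption that flights.bson was produced by an earlier BSON export of the flights collection, the import side could be:

```r
library(mongolite)

# Assumption: MongoDB runs locally; flights.bson comes from a previous export
flt <- mongo("flights")

# Start from an empty collection, then load the BSON dump
flt$drop()
flt$import(file("flights.bson"), bson = TRUE)

# Check the record count and peek at the first few records
flt$count()
flt$find(limit = 5)
```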

#> [1] 336776
#>   year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
#> 1 2013     1   1      517         2      830        11      UA  N14228   1545    EWR  IAH      227     1400    5     17
#> 2 2013     1   1      533         4      850        20      UA  N24211   1714    LGA  IAH      227     1416    5     33
#> 3 2013     1   1      542         2      923        33      AA  N619AA   1141    JFK  MIA      160     1089    5     42
#> 4 2013     1   1      544        -1     1004       -18      B6  N804JB    725    JFK  BQN      183     1576    5     44
#> 5 2013     1   1      554        -6      812       -25      DL  N668DN    461    LGA  ATL      116      762    5     54

Importing and exporting BSON is faster than JSON, but the resulting binary file can only be read by MongoDB.