R: Read in random rows from file using fread or equivalent?


By : tree
Date : October 17 2020, 03:08 PM
I have a very large multi-gigabyte file that is too costly to load into memory. The ordering of the rows in the file, however, is not random. Is there a way to read in a random subset of the rows using something like fread?

Using the tidyverse (as opposed to data.table), you could do:
code :
library(readr)
library(purrr)
library(dplyr)

# Read the header once; a chunk read with skip > 0 would otherwise
# take its first data row as column names
col_names <- names(read_csv("data_file", n_max = 0))

# generate some random numbers between 1 and how many rows your file has,
# assuming you can ballpark the number of rows in your file
#
# Generating 900 integers because we'll grab 10 rows for each start,
# giving us a total of 9000 rows in the final (blocks may overlap)
start_at <- floor(runif(900, min = 1, max = (n_rows_in_your_file - 10)))

# sort the index sequentially
start_at <- sort(start_at)

# Read in 10 rows at a time, starting at your random numbers,
# binding results rowwise into a single data frame
sample_of_rows <- map_dfr(
  start_at,
  ~ read_csv("data_file", col_names = col_names, n_max = 10, skip = .x)
)



A C# equivalent of C's fread file i/o


By : Jessica Pacilan
Date : March 29 2020, 07:55 AM
There isn't anything wrong with using the P/Invoke marshaller; it is not unsafe and you don't have to use the unsafe keyword. Getting it wrong will just produce bad data. It can be a lot easier to use than explicitly writing the deserialization code, especially when the file contains strings. You can't use BinaryReader.ReadString(), because it assumes that the string was written by BinaryWriter. Make sure, however, that you declare the structure of the data with a struct declaration; this.GetType() is not likely to work out well.
Here's a generic class that will make it work for any structure declaration, followed by a sample declaration for the data in the file and sample code to read the file:
code :
  using System;
  using System.Collections.Generic;
  using System.IO;
  using System.Runtime.InteropServices;

  // Generic reader: fills a buffer of sizeof(T) bytes and marshals it into T
  class StructureReader<T> where T : struct {
    private byte[] mBuffer;
    public StructureReader() {
      mBuffer = new byte[Marshal.SizeOf(typeof(T))];
    }
    public T Read(System.IO.FileStream fs) {
      int bytes = fs.Read(mBuffer, 0, mBuffer.Length);
      if (bytes == 0) throw new InvalidOperationException("End-of-file reached");
      if (bytes != mBuffer.Length) throw new ArgumentException("File contains bad data");
      T retval;
      // Pin the buffer so the garbage collector cannot move it while marshalling
      GCHandle hdl = GCHandle.Alloc(mBuffer, GCHandleType.Pinned);
      try {
        retval = (T)Marshal.PtrToStructure(hdl.AddrOfPinnedObject(), typeof(T));
      }
      finally {
        hdl.Free();
      }
      return retval;
    }
  }

  // Sample declaration for the data in the file
  [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
  struct Sample {
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 42)]
    public string someString;
  }

  // Sample code to read the file data (place inside a method)
  var data = new List<Sample>();
  var reader = new StructureReader<Sample>();
  using (var stream = new FileStream(@"c:\temp\test.bin", FileMode.Open, FileAccess.Read)) {
    while(stream.Position < stream.Length) {
      data.Add(reader.Read(stream));
    }
  }

fread to read top n rows from a large file


By : user1835342
Date : March 29 2020, 07:55 AM
Another workaround is to fetch the first 500 lines with a shell command:
code :
rdata <- fread(
    cmd = paste('head -n 500', csvfile),
    sep = "|", header = FALSE, col.names = colsinfile,
    select = colstoselect, key = "keycolname", na.strings = c("", "NA")
)
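
Where no shell is available, fread's own nrows argument may do the same job without head. A minimal sketch, assuming a pipe-delimited file; the temporary file and its two columns are made up for illustration:

```r
library(data.table)

# Illustrative data only: a pipe-delimited file with 2000 rows
csvfile <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:2000, value = rnorm(2000)), csvfile, sep = "|")

# nrows caps the number of data rows that fread reads
top <- fread(csvfile, sep = "|", nrows = 500)
```

Unlike the head approach, the header line is read as column names here, so col.names is not needed.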

R data.table fread cannot read in irregular column lengths when the larger rows do not appear early in a file


By : user2690548
Date : March 29 2020, 07:55 AM
Note that the file has no header line, so fread has to infer the number of columns from the rows it samples. If we add a header, it will work. First use fread to read the file in as a single field per line, giving the character vector Lines. From that, compute the maximum number of fields n. Finally, re-read Lines after prefixing it with a header.
code :
Lines <- fread("shortLong.csv", sep = "")[[1]]
n <- max(count.fields(textConnection(Lines), sep = ","))
fread(text = c(toString(1:n), Lines), header = TRUE, fill = TRUE)

How to read specific rows of CSV file with fread function


By : Riley
Date : March 29 2020, 07:55 AM
This approach takes a vector v (corresponding to your read_vec), identifies sequences of consecutive rows to read, feeds those to sequential calls to fread(...), and rbinds the results together.
If the rows you want are randomly distributed throughout the file, this may not be faster. However, if the rows are in blocks (e.g., c(1:50, 55, 70, 100:500, 700:1500)) then there will be few calls to fread(...) and you may see a significant improvement.
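
The approach described above can be sketched roughly as follows. This is an illustration, not the original answer's code: read_rows is a hypothetical helper name, v holds 1-based data row numbers (header excluded), and the file is assumed to have a single header line.

```r
library(data.table)

# Hypothetical helper: read only the data rows listed in v
read_rows <- function(file, v) {
  v <- sort(unique(v))
  # Split v into runs of consecutive row numbers, e.g. 1:3, 55, 100:102
  runs <- split(v, cumsum(c(1, diff(v) != 1)))
  # Column names come from the header line only
  header <- names(fread(file, nrows = 0))
  chunks <- lapply(runs, function(r) {
    # skip = r[1] drops the header plus the r[1] - 1 data rows before the run
    fread(file, skip = r[1], nrows = length(r),
          header = FALSE, col.names = header)
  })
  rbindlist(chunks)
}

# Usage on a throwaway copy of iris
csv <- tempfile(fileext = ".csv")
fwrite(iris, csv)
picked <- read_rows(csv, c(1:3, 55, 100:102))
```

Here the seven requested rows form three runs, so fread is called three times for the data plus once for the header.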

Read csv file with selected rows using data.table's fread


By : adit.haha
Date : October 09 2020, 03:00 PM
I was going through some earlier posts. This worked for me on Windows (the Unix alternative is grep):
code :
write.csv(iris,"iris.csv")

fread(cmd = paste('findstr', 'versicolor', 'iris.csv'))

    V1  V2  V3  V4  V5         V6
 1:  51 7.0 3.2 4.7 1.4 versicolor
 2:  52 6.4 3.2 4.5 1.5 versicolor
 3:  53 6.9 3.1 4.9 1.5 versicolor
 4:  54 5.5 2.3 4.0 1.3 versicolor
 5:  55 6.5 2.8 4.6 1.5 versicolor
 6:  56 5.7 2.8 4.5 1.3 versicolor
 7:  57 6.3 3.3 4.7 1.6 versicolor
 8:  58 4.9 2.4 3.3 1.0 versicolor
 9:  59 6.6 2.9 4.6 1.3 versicolor
10:  60 5.2 2.7 3.9 1.4 versicolor
11:  61 5.0 2.0 3.5 1.0 versicolor
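
The output above has generic column names (V1 to V6) because the filter drops the header line. One possible way to keep the names, and to switch between findstr and grep by platform, is sketched below; the temporary file and the variable names are illustrative assumptions.

```r
library(data.table)

csv <- tempfile(fileext = ".csv")
write.csv(iris, csv)

# findstr on Windows, grep elsewhere
tool <- if (.Platform$OS.type == "windows") "findstr" else "grep"

# Read only the header line to recover the column names
header <- names(fread(csv, nrows = 0))

# Filter the rows in the shell, then reattach the names
versicolor <- fread(
  cmd = paste(tool, "versicolor", shQuote(csv)),
  header = FALSE, col.names = header
)
```

The first column holds the row names written by write.csv; its header field is empty, so fread gives it a default name.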