filtering a DataFrame based on rows being less than a percent of sum of any column

By : Suvadeep Banerjee
Date : November 20 2020, 11:01 PM
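Sum each row, divide by the grand total, and keep the rows whose share is above 1%:
code :
row_sum = df.sum(axis=1)                      # total of each row across all columns
total_sum = row_sum.sum()                     # grand total of the whole frame
print(df.loc[row_sum / total_sum > 0.01])    # keep rows contributing more than 1%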
                               Clint   Gibbon  Orangutan   Rhesus    Susie
count augCGP                  2338.0   4178.0     5753.0   4239.0   2740.0
      augTM                   2888.0   4313.0     3656.0   5114.0   2894.0
      augTM,augTMR            1441.0   3882.0     3520.0   3357.0   2789.0
      augTM,augTMR,transMap   8725.0   5839.0     6567.0   6296.0  10196.0
      augTM,transMap         17341.0   6828.0     6568.0  11563.0  10821.0
      augTMR                  2881.0   6550.0     5952.0   4217.0   5399.0
      transMap               39284.0  44285.0    46113.0  39930.0  41300.0
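The title talks about each column's sum, while the answer compares each row's total with the grand total. The sketch below shows both readings on hypothetical toy counts (not the question's data). The per-column variant is an assumption about what "less than a percent of sum of any column" means: it drops a row if any of its cells is under 1% of that column's sum.

```python
import pandas as pd

# Hypothetical toy counts; each column sums to 1000.
df = pd.DataFrame(
    {"a": [500, 3, 402, 95], "b": [20, 2, 973, 5]},
    index=["r1", "r2", "r3", "r4"],
)

# Answer's approach: each row's total as a share of the grand total.
row_sum = df.sum(axis=1)
big_rows = df.loc[row_sum / row_sum.sum() > 0.01]

# Per-column variant: each cell as a share of its own column's sum.
# Dividing a DataFrame by a Series aligns the Series index with the columns.
col_share = df / df.sum()
# Keep a row only if it reaches 1% in every column,
# i.e. drop rows that fall below 1% in any column.
kept = df.loc[(col_share >= 0.01).all(axis=1)]

print(big_rows)
print(kept)
```

Here r4 passes the row-total test (5% of the grand total) but is dropped by the per-column test because its b value is only 0.5% of column b.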


Filtering rows based on column values in spark dataframe scala

By : MinartistJ
Date : March 29 2020, 07:55 AM
The goal is to keep, for each id, the rows up to and including the first row where value is 1. One way is to use monotonically_increasing_id() and a self-join:
code :
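import org.apache.spark.sql.functions.{min, monotonically_increasing_id}
import spark.implicits._  // toDF and the $"col" syntax; assumes a SparkSession named spark

val data = Seq((3,0),(3,1),(3,0),(4,1),(4,0),(4,0)).toDF("id", "value")
data.show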
+---+-----+
| id|value|
+---+-----+
|  3|    0|
|  3|    1|
|  3|    0|
|  4|    1|
|  4|    0|
|  4|    0|
+---+-----+
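// Tag each row with an increasing (unique, not necessarily consecutive) id in the current row order
val dataWithIndex = data.withColumn("idx", monotonically_increasing_id())
// dataWithIndex.cache()  // optional: compute the ids once so both sides of the join reuse them
// For each id, the idx of its first row with value == 1
val minIdx = dataWithIndex
               .filter($"value" === 1)
               .groupBy($"id")
               .agg(min($"idx"))
               .toDF("r_id", "min_idx")
// Keep rows up to and including that first 1; the inner join drops ids that never have value == 1
dataWithIndex.join(
  minIdx,
  ($"r_id" === $"id") && ($"idx" <= $"min_idx")
).select($"id", $"value").show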
+---+-----+
| id|value|
+---+-----+
|  3|    0|
|  3|    1|
|  4|    1|
+---+-----+
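For comparison, the same rule can be written in pandas when the data fits in memory. The rule is to keep each id's rows up to and including its first value == 1, and to drop ids that never reach 1. This is a pandas sketch, not Spark code. It relies on the frame's existing row order, just as the Spark version relies on the order captured by monotonically_increasing_id().

```python
import pandas as pd

data = pd.DataFrame({"id": [3, 3, 3, 4, 4, 4], "value": [0, 1, 0, 1, 0, 0]})

is_one = data["value"].eq(1).astype(int)
# Running count of value == 1 per id, including the current row.
ones_so_far = is_one.groupby(data["id"]).cumsum()
# Count of value == 1 strictly before the current row within its id.
ones_before = ones_so_far - is_one
# Mirror the inner join: ids with no value == 1 at all are dropped.
has_one = is_one.groupby(data["id"]).transform("max").astype(bool)

result = data[(ones_before == 0) & has_one]
print(result)
```

A row survives when no 1 has appeared before it in its group. That covers every leading 0 and the first 1 itself.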
Filtering a multiindex dataframe based on column values dropping all rows inside level

By : Cody Hulett
Date : March 29 2020, 07:55 AM
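To filter a MultiIndex DataFrame on column values and drop every row of an alignment whose AlnCoverage is below 1, try this. Because classifier is a column, all rows of one alignment share the same index label, so dropping those labels removes the whole group: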
code :
In [169]: df = df.drop(df[(df.classifier=='AlnCoverage') & (df.value < 1)].index)

In [170]: df
Out[170]:
                                          classifier    value
AlignmentId          TranscriptId
ENSMUST00000025014-1 ENSMUST00000025014  AlnCoverage  1.00000
                     ENSMUST00000025014  AlnIdentity  0.96382
                     ENSMUST00000025014      Badness  0.03618
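The reason the whole group disappears can be seen on a small hypothetical frame laid out like the one above. The sketch uses the boolean-mask form `~df.index.isin(...)` instead of `df.drop(...)`. Both remove every row whose full index label matches one of the offending labels, but the mask makes that behaviour explicit.

```python
import pandas as pd

# Hypothetical toy data: classifier is a column, so all rows of one alignment
# share the same (AlignmentId, TranscriptId) index label.
index = pd.MultiIndex.from_tuples(
    [("A-1", "A")] * 3 + [("B-1", "B")] * 3,
    names=["AlignmentId", "TranscriptId"],
)
df = pd.DataFrame(
    {
        "classifier": ["AlnCoverage", "AlnIdentity", "Badness"] * 2,
        "value": [1.0, 0.96, 0.04, 0.8, 0.9, 0.1],
    },
    index=index,
)

# Index labels of the offending rows (only B-1 fails the coverage test).
bad_labels = df[(df.classifier == "AlnCoverage") & (df.value < 1)].index
# Remove every row whose full index label is among bad_labels,
# which takes out all three B-1 rows, not just the AlnCoverage row.
result = df[~df.index.isin(bad_labels)]
print(result)
```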
pandas dataframe: filtering on rows based on the string content of a column

By : sagar saxena
Date : March 29 2020, 07:55 AM
To filter rows on the string content of a column, you just need to add .str to use the pandas string methods:
code :
In [12]: df = pd.DataFrame({'s': ['good_1', 'bad_1', 'good_2']})

In [13]: df
Out[13]:
        s
0  good_1
1   bad_1
2  good_2

In [14]: df['s'].str.startswith("good_")
Out[14]:
0     True
1    False
2     True
Name: s, dtype: bool

In [15]: df[df['s'].str.startswith("good_")]
Out[15]:
        s
0  good_1
2  good_2
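The same .str pattern works with the other vectorized string methods. The sketch below uses the example frame from above. It also shows how to invert a mask with `~` to drop the matching rows instead of keeping them.

```python
import pandas as pd

df = pd.DataFrame({"s": ["good_1", "bad_1", "good_2"]})

# Prefix test; na=False makes missing values (none here) count as non-matches.
starts = df[df["s"].str.startswith("good_", na=False)]
# Plain substring test; regex=False stops the pattern being read as a regex.
has_bad = df[df["s"].str.contains("bad", regex=False, na=False)]
# Negate the mask with ~ to drop the matching rows instead of keeping them.
not_good = df[~df["s"].str.startswith("good_", na=False)]

print(starts, has_bad, not_good, sep="\n\n")
```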
Filtering rows in dataframe based on partial column name and mathematical expression

By : Glenn Macvicar
Date : March 29 2020, 07:55 AM
To filter rows using both part of the column names and a mathematical expression (x > 0), select the matching columns with grep and test them row by row. This keeps the rows where every column ending in S or W is positive:
code :
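cols <- grep('[SW]$', names(df), value = TRUE)   # columns whose names end in S or W
# rowSums counts the positive cells per row; keep rows where all of them are positive
df[rowSums(df[, cols, drop = FALSE] > 0) == length(cols),]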
     OTU_ID X3_22L15_S X3_22T10_W X3_22L6_S X3_22Algae
2 denovo147         44        484        28          0
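To require only that at least one W column and at least one S column are positive, use:
wcols <- grep('W$', names(df), value = TRUE)   # columns ending in W
scols <- grep('S$', names(df), value = TRUE)   # columns ending in S

# a non-zero count is coerced to TRUE by &
df[rowSums(df[, wcols, drop = FALSE] > 0) & rowSums(df[, scols, drop = FALSE] > 0),]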
     OTU_ID X3_22L15_S X3_22T10_W X3_22L6_S X3_22Algae
2 denovo147         44        484        28          0
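For readers doing the same in pandas, `DataFrame.filter(regex=...)` plays the role of `grep` on the column names. The sketch below uses hypothetical toy values with the question's column naming scheme. The first row is chosen so that the "all columns positive" and "at least one W and one S positive" rules give different results.

```python
import pandas as pd

# Hypothetical toy data using the column names from the question.
df = pd.DataFrame({
    "OTU_ID": ["denovo1", "denovo147", "denovo2"],
    "X3_22L15_S": [0, 44, 5],
    "X3_22T10_W": [3, 484, 0],
    "X3_22L6_S": [7, 28, 2],
    "X3_22Algae": [1, 0, 9],
})

sw = df.filter(regex="[SW]$")            # columns whose names end in S or W
all_pos = df[sw.gt(0).all(axis=1)]       # every S/W column is positive

w_any = df.filter(regex="W$").gt(0).any(axis=1)  # at least one W column > 0
s_any = df.filter(regex="S$").gt(0).any(axis=1)  # at least one S column > 0
either = df[w_any & s_any]

print(all_pos, either, sep="\n\n")
```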
filtering rows in a datatable or dataframe based on a column that happens to be a list

By : Moody One
Date : March 29 2020, 07:55 AM
I would like to filter this data table, but the column I want to filter by is a list. We can use identical to compare each list element with the target vector:
code :
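# data.table: inside [ ], conditions refers to the list column
data[sapply(conditions, identical, c('rain', 'sleet')), ]
# the same filter with the tidyverse; purrr::map_lgl returns a logical vector
library(tidyverse)

data %>%
  filter(map_lgl(conditions, identical, c('rain', 'sleet')))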
   date conditions
1:    3 rain,sleet
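identical() is an exact comparison, so c('sleet', 'rain') would not match c('rain', 'sleet'). The pandas sketch below assumes a hypothetical frame with a list-valued column. It shows an exact-match filter like the R answer and an order-insensitive variant. The set-based variant also ignores repeated elements.

```python
import pandas as pd

# Hypothetical toy frame with a list-valued column, mirroring the R example.
data = pd.DataFrame({
    "date": [1, 2, 3, 4],
    "conditions": [["rain"], ["snow", "sleet"], ["rain", "sleet"], ["sleet", "rain"]],
})

target = ["rain", "sleet"]
# Exact match: same elements in the same order, like R's identical().
exact = data[data["conditions"].map(lambda c: c == target)]
# Order-insensitive match: ["sleet", "rain"] also counts.
same_set = data[data["conditions"].map(lambda c: set(c) == set(target))]

print(exact, same_set, sep="\n\n")
```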