Create a pandas dataframe with null columns



By : nicky zhu
Date : November 21 2020, 11:01 PM
If you create your dataframe from a dict, any extra columns listed in the columns keyword are initialized as null (NaN). The same happens if you pass no data at all and supply only an index and columns:
code :
In [3]: pd.DataFrame({'col1':['val1','val2','val3']}, 
                     columns=['col1','col2','col3'])
Out[3]:
   col1 col2 col3
0  val1  NaN  NaN
1  val2  NaN  NaN
2  val3  NaN  NaN
In [4]: pd.DataFrame([], index=['val1','val2','val3'], columns=['col1','col2','col3'])
Out[4]:
     col1 col2 col3
val1  NaN  NaN  NaN
val2  NaN  NaN  NaN
val3  NaN  NaN  NaN
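
The two calls above can be checked with a short, self-contained script. This is only a sketch: the variable names from_dict and empty are made up for illustration, and the values are the ones used in the answer.

```python
import pandas as pd

# Columns named in `columns` but missing from the dict are filled with NaN.
from_dict = pd.DataFrame({'col1': ['val1', 'val2', 'val3']},
                         columns=['col1', 'col2', 'col3'])

# With no data at all, every cell of the index x columns grid is NaN.
empty = pd.DataFrame(index=['val1', 'val2', 'val3'],
                     columns=['col1', 'col2', 'col3'])

print(from_dict.isna().sum())     # count of nulls per column
print(empty.isna().all().all())   # whether every cell is null
```

Passing index and columns by keyword makes it easier to see which list is which than relying on positional order.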


Python Pandas Dataframe: How to create columns from existing list in dataframe?



By : Zino Raouf
Date : October 14 2020, 09:27 AM
Probably the easiest way to solve your problem is to iterate over the rows of the dataframe and build a new one. You can do it with two nested for loops: the outer loop walks the rows, the inner loop walks the pairs stored in each row's list column.
code :
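import pandas as pd

# collect one flat record per pair found in each row's list column
rows = []
for row in df.itertuples():
    for pair in row.list:
        rows.append([row.year, row.month, row.day, pair[0], pair[1]])

df_new = pd.DataFrame(rows, columns=['year', 'month', 'day', 'country', 'count'])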
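
To try the nested-loop approach end to end, here is a minimal sketch with made-up input. The sample values are assumptions; only the column names (year, month, day, list, country, count) come from the answer.

```python
import pandas as pd

# Hypothetical input: one row per date, each holding a list of (country, count) pairs.
df = pd.DataFrame({
    'year': [2020, 2020],
    'month': [1, 2],
    'day': [5, 6],
    'list': [[('US', 3), ('FR', 1)], [('DE', 2)]],
})

rows = []
for row in df.itertuples(index=False):
    # Unpack each pair and repeat the date fields for it.
    for country, count in row.list:
        rows.append([row.year, row.month, row.day, country, count])

df_new = pd.DataFrame(rows, columns=['year', 'month', 'day', 'country', 'count'])
print(df_new)
```

Building a plain list of rows and constructing the dataframe once at the end avoids growing a dataframe inside the loop.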
Pandas Create dataframe keeping only columns from another dataframe and appending



By : kn16h7
Date : March 29 2020, 07:55 AM
You can use a very fast NumPy solution (thanks to Divakar for the answer):
code :
# convert df1 to a NumPy array
a = df1.values
# convert the first 10 columns of good_models to a NumPy array of column positions
b = good_models.iloc[:, :10].values
# select the columns for every model at once, reshape to long form,
# and name the columns with all labels except the last
df = pd.DataFrame(a[:, b].swapaxes(0,1).reshape(-1,b.shape[1]), columns=labels[:-1])

# add the new columns by repeating values with tile and repeat
df['target'] = np.tile(df1['target'].values, len(good_models))
df['model_index'] = np.repeat(good_models.index, len(df1))
Timings:
In [251]: %timeit (jez())
100 loops, best of 3: 2.47 ms per loop

In [252]: %timeit (orig())
1 loop, best of 3: 703 ms per loop

Setup:
import numpy as np
import pandas as pd
np.random.seed(452)
df1=pd.DataFrame(np.random.randint(0,100,size=(100,20+1)),columns=list(range(0,20+1)))
df1['target']=np.random.randint(2,size=100)
### These need to be the columns in model_data_long
labels=['var1','var2','var3','var4','var5','var6','var7','var8','var9','var10','target']
### Contains the columns I want to extract from df1 and append to model_data_long
good_models=pd.DataFrame.from_records([(0,1,2,3,4,5,6,7,8,9,'target'),
                                     (9,8,7,6,5,4,3,2,1,0,'target'),
                                     (20,19,18,17,16,15,14,13,12,11,'target')],columns=labels)
# repeat the rows 100 times to get 300 rows
good_models = pd.concat([good_models]*100).reset_index(drop=True)

### works but is slow
def orig():
    model_data_long=pd.DataFrame()
    for i in range(0,len(good_models)):
        ### Extracting the values for a record from good_models
        t_list=good_models[good_models.index==i].values.tolist()[0]
        ### Keeping only the columns from t_list from the df1 frame.
        temp_data=pd.DataFrame(data=df1.filter(items=t_list,axis=1))
        ### renaming the columns in temp_data (a flat list; [labels] would build a MultiIndex)
        temp_data.columns=labels
        ### It is imperative that I have an index variable in the model_data_long dataframe.
        ### Setting the model_index variable, critical.
        temp_data['model_index']=i
        ### Finally, append to a long running dataframe.
        ### (DataFrame.append was removed in pandas 2.0; pd.concat replaces it.)
        model_data_long=pd.concat([model_data_long,temp_data],ignore_index=True)

    return model_data_long


def jez():
    # convert df1 to a NumPy array
    a = df1.values
    # convert the first 10 columns of good_models to a NumPy array of column positions
    b = good_models.iloc[:, :10].values
    # select the columns for every model at once, reshape to long form,
    # and name the columns with all labels except the last
    df = pd.DataFrame(a[:, b].swapaxes(0,1).reshape(-1,b.shape[1]), columns=labels[:-1])

    # add the new columns by repeating values with tile and repeat
    df['target'] = np.tile(df1['target'].values, len(good_models))
    df['model_index'] = np.repeat(good_models.index, len(df1))
    return df
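
The indexing step a[:, b] is the core of the fast version, so it may help to see it on a tiny array. The sketch below uses made-up data and hypothetical names (picked, stacked, model_index); it is not part of the original answer.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns of data
b = np.array([[0, 1],             # model 0 selects columns 0 and 1
              [3, 2]])            # model 1 selects columns 3 and 2

# Indexing with a 2-D integer array gives shape (rows, models, selected columns).
picked = a[:, b]

# Put the model axis first, then flatten models x rows into one long table.
stacked = picked.swapaxes(0, 1).reshape(-1, b.shape[1])

# Each model index is repeated once per data row, matching the stacked layout.
model_index = np.repeat(np.arange(len(b)), len(a))

print(picked.shape)
print(stacked)
print(model_index)
```

All rows for model 0 come first, followed by all rows for model 1, which is why np.repeat is used for the model index and np.tile for values that cycle through the original rows.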
create a single column containing all non-null values from multiple columns in a pandas dataframe



By : Larro Ava
Date : March 29 2020, 07:55 AM
The asker had done this with iterrows() and was hoping for a faster, more elegant way to achieve the desired outcome. Let's try filter and stack:
code :
pd.Series(df.filter(like='Product').stack().values, name='product_list')

0     131.0
1     320.0
2     320.0
3     131.0
4     420.0
5     420.0
...
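Alternatively, flatten the values with NumPy and drop the NaNs (np.int was removed from NumPy, so use the built-in int):
arr = df.filter(like='Product').values.ravel()
pd.Series(arr[~np.isnan(arr)].astype(int), name='product_list')
0     131
1     320
2     320
3     131
4     420
5     420
...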
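
A self-contained sketch of the NumPy variant follows. The sample frame is an assumption: only the "Product" prefix comes from the answer, while the column names Product1 and Product2, the id column, and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two Product columns with some missing entries.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'Product1': [131.0, np.nan, 320.0],
    'Product2': [320.0, 131.0, np.nan],
})

# Keep only the Product columns and flatten them row by row.
arr = df.filter(like='Product').to_numpy().ravel()

# Drop the NaNs, then convert the remaining floats to integers.
product_list = pd.Series(arr[~np.isnan(arr)].astype(int), name='product_list')
print(product_list)
```

The NaNs have to be removed before the integer conversion, because NaN has no integer representation.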
Create a new Dataframe based on Time Difference and a condition on columns in pandas dataframe



By : user2960369
Date : March 29 2020, 07:55 AM
I would first use a groupby on ticketID to compute a rank per ticket (the position of each status change within its ticket), then pivot the dataframe using that rank as the columns and ticketID as the index.
After sorting the columns so that they are grouped by rank, you get the expected layout. Then rename the columns and reset the index to get a tidy dataframe. The code could be:
code :
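# number the status changes within each ticket: 0, 1, 2, ...
df['rank'] = df.groupby('ticketID').cumcount()
# one row per ticket, one block of columns per rank (keyword arguments are required in pandas 2.0+)
resul = df.pivot(index='ticketID', columns='rank').fillna('')
# put rank on the outer column level and group the columns by rank
resul.columns = resul.columns.swaplevel()
resul.sort_index(axis=1,inplace=True, level=0, sort_remaining=False)
# flatten (rank, name) pairs into names such as ChangeDate_0
resul.columns = ['{1}_{0}'.format(*c) for c in resul.columns]
resul.reset_index(inplace=True)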
   ticketID             ChangeDate_0 OldStatus_0                  NewStatus_0             ChangeDate_1                  OldStatus_1                        NewStatus_1             ChangeDate_2                        OldStatus_2     NewStatus_2
0   1012327  2019-03-18 09:00:32.903      R or O   Action mail sent to client  2019-03-18 09:21:34.820   Action mail sent to client                Response Client - R  2019-03-18 09:34:21.890                Response Client - R  Status Updated
1   1012328  2019-03-18 07:00:09.960      R or O         ticket Closed - None  2019-03-18 07:09:31.420         ticket Closed - None                     Status Updated                                                                            
2   1012329  2019-03-18 06:52:03.490      R or O    ticket Closed - Satisfied  2019-03-18 07:09:33.433    ticket Closed - Satisfied                     Status Updated                                                                            
3   1012330  2019-03-18 10:25:13.493      R or O  Action mail sent to Service  2019-03-18 10:55:20.963  Action mail sent to Service  ticket Closed - Service Responded  2019-03-18 11:02:05.327  ticket Closed - Service Responded  Status Updated
4   1012332  2019-03-18 09:00:41.967      R or O   Action mail sent to client  2019-03-18 10:24:20.150   Action mail sent to client                Response Client - R  2019-03-18 10:32:40.717                Response Client - R  Status Updated
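
The rank-then-pivot idea can be tried on a small made-up log. In this sketch the data values and the names wide, fields, ranks and ordered are hypothetical, and the column order is built explicitly from a list instead of sorting a MultiIndex, which is a swapped-in simplification and not the answer's exact method.

```python
import pandas as pd

# Hypothetical status-change log; values are invented for illustration.
df = pd.DataFrame({
    'ticketID': [1, 1, 2],
    'ChangeDate': ['2019-03-18 09:00', '2019-03-18 09:21', '2019-03-18 07:00'],
    'NewStatus': ['mail sent', 'response', 'closed'],
})

# Number the changes within each ticket: 0, 1, 2, ...
df['rank'] = df.groupby('ticketID').cumcount()

# One row per ticket; the columns become (field, rank) pairs.
wide = df.pivot(index='ticketID', columns='rank').fillna('')

# Flatten the pairs into names such as ChangeDate_0.
wide.columns = [f'{name}_{rank}' for name, rank in wide.columns]

# Group the columns by rank, keeping the field order within each rank.
fields = ['ChangeDate', 'NewStatus']
ranks = sorted(df['rank'].unique())
ordered = [f'{field}_{rank}' for rank in ranks for field in fields]
wide = wide[ordered].reset_index()
print(wide)
```

Tickets with fewer changes than the longest one get empty strings in the trailing columns, which mirrors the blank cells in the output shown above.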
Using pandas groupby to create new dataframe containing all columns of parent dataframe



By : Yao Jian Yap
Date : March 29 2020, 07:55 AM