StructType and StructField are the classes used to specify a DataFrame's schema programmatically and to create complex columns such as nested struct, array, and map columns.
A StructType is a collection of StructField objects, each of which defines one column's name, data type, and nullability.
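from pyspark.sql.types import StructField, StructType, IntegerType, StringType
# `spark` is assumed to be an existing SparkSession (as in the pyspark shell or a notebook)
data = [(1, 'Rakesh'), (2, 'Rakesh Yadav')]
# One StructField per column: the column name plus its data type
schema = StructType([
    StructField(name='id', dataType=IntegerType()),
    StructField(name='name', dataType=StringType()),
])
df = spark.createDataFrame(data, schema)
df.show()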
Complex Nested Columns
from pyspark.sql.types import StructField, StructType, IntegerType, StringType
# Each row holds an id and a (firstName, lastName) tuple for the nested struct column
data = [(1, ('Rakesh', 'Yadav')), (2, ('Ramesh', 'Yadav'))]
# Schema of the nested `name` column
nameSchema = StructType([
    StructField('firstName', StringType()),
    StructField('lastName', StringType()),
])
# Top-level schema: `name` uses nameSchema as its data type
schema = StructType([
    StructField(name='id', dataType=IntegerType()),
    StructField(name='name', dataType=nameSchema),
])
df = spark.createDataFrame(data, schema)
df.show()