Troubleshooting
Job is slow or OOMs (throws an OutOfMemoryError) while using an aggregate like collect_list or sample_call_summary_stats
Try disabling the ObjectHashAggregate by setting spark.sql.execution.useObjectHashAggregateExec to false.
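As a minimal sketch, assuming PySpark is installed and the setting is applied per session rather than cluster-wide, the option can be changed from Python before running the aggregation:

```python
from pyspark.sql import SparkSession

# Reuse the active session, or start one if none exists.
spark = SparkSession.builder.getOrCreate()

# Disable ObjectHashAggregate for queries planned after this call.
spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "false")
```

The setting only affects queries planned after the call, so it should run before the DataFrame that uses collect_list or sample_call_summary_stats is evaluated.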
Job is slow or OOMs while writing to a partitioned table
This error can occur when reading from highly compressed files. Try decreasing spark.files.maxPartitionBytes to a smaller value like 33554432 (32 MB).
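One way to apply this, sketched under the assumption that the session is created by your own code, is to pass the value to the session builder. The constant name MAX_PARTITION_BYTES is only illustrative:

```python
from pyspark.sql import SparkSession

# 32 MB expressed in bytes (33554432), the example value from the text above.
MAX_PARTITION_BYTES = 32 * 1024 * 1024

spark = (
    SparkSession.builder
    .config("spark.files.maxPartitionBytes", str(MAX_PARTITION_BYTES))
    .getOrCreate()
)
```

If a session is already running (for example in a managed notebook), the builder may return the existing session without applying this value. In that case the setting likely needs to go into the cluster or job configuration instead.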
My VCF looks weird after merging VCFs and saving with bigvcf
When saving to a VCF, the samples in the genotypes array must be in the same order for each row. This ordering is not guaranteed when using collect_list to join multiple VCFs. Try sorting the array using sort_array.